Apptainer/Singularity issues

The rust_code_server container image provided by this course is unfortunately not compatible with the Singularity HPC container runtime or its Apptainer fork, and there is no easy way to make it compatible. This page explains why.

How containers normally work

To explain the problem with Singularity and Apptainer, let us first summarize the steps that most container runtimes take when you type in an execution command like podman run:

  1. Mount a read-only disk image of a guest Linux system into some filesystem directory. This image contains most of the things you would expect to find in the / root of a Linux filesystem, including an etc subdirectory for global system configuration, a usr subdirectory for libraries, programs and other resources, a home subdirectory for user-specific data, etc.
  2. Use a filesystem overlay to let the user “modify” some files in this directory hierarchy, without actually altering the underlying read-only filesystem image.
  3. Use bind mounts to inject some files from the host system into this directory hierarchy as needed for correct application operation. This typically includes…
    • A subset of the pseudo-files from /dev, /proc and /sys, which are kernel interfaces that some applications need to access. One way containers are sandboxed for security is by restricting this subset so that the container can access fewer host resources.
    • Some configuration from the host /etc directory that should also apply to the guest, typically related to network access or timekeeping.
    • Extra directories chosen by the user, used to inject input data into the container or keep output data around after the container has stopped running.
    • If vendor-specific GPU support is enabled, relevant pseudo-files from /dev are added, together with some vendor libraries from the host’s /usr directory which must be kept in sync with the underlying Linux kernel driver to work correctly.
  4. Once the guest directory hierarchy has reached a satisfactory state, use the chroot() system call to redefine the meaning of the filesystem root / so that all future accesses to absolute paths like /x/y/z end up accessing the x/y/z sub-path of the guest directory hierarchy.
  5. Activate various Linux kernel sandboxing features to restrict what applications can do.
    • This typically includes restricting access to host processes, user accounts and some system calls (like the ptrace() API used by debuggers and profilers). All of these can be used to acquire security-sensitive information on the host system and possibly change its configuration, which is not generally considered desirable.
    • Network connectivity is also restricted by forcing the use of a virtual network interface that can only communicate with the hardware network interface via a virtual routing system with a firewall. This lets the user control the container’s ability to make outgoing network connections and accept incoming ones.
    • These restrictions can be relaxed by the user if needed with CLI options like --privileged and --net=host. They can also be strengthened, e.g. by limiting how much CPU and RAM the containerized processes can use.
  6. Clear environment variables, then set them up as directed by the guest system image metadata and --env command-line options.
  7. Switch to the guest system user specified by the guest image metadata by effectively modifying the active UID/GID seen by running processes (this is harder than it looks).
  8. Execute the user-specified program, or a default program specified by the guest system image metadata, in this well-controlled configuration.
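
The filesystem part of this sequence (steps 1 to 4) can be illustrated with a toy model. The sketch below is not how any real runtime is implemented: it represents files as dictionary entries instead of real mounts, and the GuestFilesystem class and the /run/guest mount point are hypothetical names chosen for illustration. It only shows the lookup order (bind mounts shadow the overlay, which shadows the read-only image) and how chroot() changes the meaning of absolute paths.

```python
import posixpath


class GuestFilesystem:
    """Toy model of the guest directory hierarchy built in steps 1 to 4."""

    def __init__(self, image, guest_root="/run/guest"):
        self.image = dict(image)      # step 1: contents of the read-only image
        self.overlay = {}             # step 2: writable layer, initially empty
        self.binds = {}               # step 3: guest path -> host file contents
        self.guest_root = guest_root  # hypothetical host directory used as the new root

    def bind(self, guest_path, host_contents):
        """Inject a host file at the given guest path."""
        self.binds[guest_path] = host_contents

    def write(self, path, contents):
        """Record a modification without ever touching the image."""
        if path in self.binds:
            # In this model, bind-mounted files are written through to the host copy.
            self.binds[path] = contents
        else:
            self.overlay[path] = contents

    def read(self, path):
        """Bind mounts shadow the overlay, which shadows the image."""
        for layer in (self.binds, self.overlay, self.image):
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def host_path(self, path):
        """Step 4: an absolute guest path names a sub-path of the guest root."""
        relative = posixpath.normpath(path).lstrip("/")
        return posixpath.join(self.guest_root, relative)


fs = GuestFilesystem({"/etc/hostname": "guest\n"})
fs.write("/etc/hostname", "renamed\n")
print(repr(fs.read("/etc/hostname")))   # contents seen from inside the guest
print(repr(fs.image["/etc/hostname"]))  # contents of the underlying image
print(fs.host_path("/x/y/z"))
```

After the write, the guest sees the modified file while the image entry keeps its original contents, which is the property that step 2 relies on.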

The Singularity/Apptainer way

Given the previous background information, we can now explain the ways in which Singularity and Apptainer differ from typical container runtimes like Docker and Podman. Since Singularity and Apptainer are very similar to each other, every sentence in the following text that contains the word “Apptainer” should be understood as applying to Singularity as well.

  • Unlike Docker and Podman, Apptainer does not provide a writable filesystem overlay (step 2) by default. Every file that comes from the container image, as opposed to being bind-mounted from the host system, is meant to be read-only inside of the container. While writable overlays are technically supported as an optional runtime configuration if you really want them, the associated command line UI is crude and beginner-hostile.
  • Bind mounts (step 3) are much more extensive by default. They include all of /dev, /proc and /sys, as well as the system’s temporary directories /tmp and /var/tmp, the current working directory, and the user’s home directory. Furthermore, the /etc/passwd and /etc/group files of the guest system are replaced with new versions generated by Apptainer, which only include a copy of the local system user’s configuration. Combined with home directory mounting and lack of UID/GID switching (see below), this has the effect of deleting all user accounts that were created at container image building time, and replacing them with a copy of the local user account that is executing the container.
  • System sandboxing (step 5) is a lot weaker by default. There are ways to tighten it if needed, but generally speaking an Apptainer container aims to feel like running a binary installed on the host system, in contrast with most container runtimes which instead aim to feel like running code in an isolated environment / virtual machine.
  • Following the same logic, user environment variables are not cleared before the container-specific environment setup is applied (step 6). And for no obvious reason, Apptainer will also carelessly shell-evaluate the contents of your environment variables, which is almost the same as breaking the standard POSIX shell distinction between single- and double-quoted strings, but with delayed evaluation as a confusion bonus. This will result in an environment variable setup that is a strange chimera of the host’s setup and the guest’s intended setup, and may include parts that differ from both due to shell evaluation goofiness.
  • Because the container runtime effectively deleted all user accounts inside of the guest system at bind mount time, it could not switch to the expected user account (step 7) even if it wanted to. So it simply ignores this part of the container image’s configuration.
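
Two of these differences, environment handling and user account replacement, can be contrasted with a small model. This is only a sketch of the behavior described above, not of Apptainer's actual implementation. All function names, the alice and student accounts, and the variable values are hypothetical. It also assumes that image-defined variables win when they conflict with host variables, which the text above does not specify, and it does not attempt to model the shell evaluation of variable contents.

```python
def isolated_env(host_env, image_env, cli_env):
    """Typical runtime: host variables are dropped, then image and CLI values apply."""
    env = {}
    env.update(image_env)
    env.update(cli_env)
    return env


def inherited_env(host_env, image_env):
    """Apptainer-like: host variables are kept, image values are layered on top.

    Assumption: on conflict, the image-defined value takes precedence.
    """
    env = dict(host_env)
    env.update(image_env)
    return env


def replaced_passwd(image_passwd, local_entry):
    """Apptainer-like: the image's accounts are discarded, only the local user remains."""
    return [local_entry]


def find_user(passwd, name):
    """Return the passwd entry for `name`, or None if the account does not exist."""
    for entry in passwd:
        if entry.split(":")[0] == name:
            return entry
    return None


host_env = {"PATH": "/home/alice/bin:/usr/bin", "RUSTFLAGS": "-C target-cpu=native"}
image_env = {"PATH": "/usr/local/cargo/bin:/usr/bin"}
image_passwd = ["root:x:0:0:root:/root:/bin/bash",
                "student:x:1000:1000::/home/student:/bin/bash"]
local_entry = "alice:x:1234:1234::/home/alice:/bin/bash"

print(isolated_env(host_env, image_env, {}))
print(inherited_env(host_env, image_env))
print(find_user(replaced_passwd(image_passwd, local_entry), "student"))
```

In the inherited variant, the host's RUSTFLAGS leaks into the guest even though the image never asked for it, and the student account prepared at image build time can no longer be found.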

Overall, the runtime behavior of an Apptainer/Singularity container is much closer to that of a “fat” binary application package that carries its dependencies with it (like an AppImage), than to that of an isolated guest operating system with a well-controlled execution environment.

Why is this a problem?

Many of the ways in which Apptainer and Singularity deviate from the typical container runtime environment are highly problematic from the perspective of this course.

  • Arbitrarily mounting user home directories and forwarding environment variables into a different Linux system with unrelated configuration is, by the Apptainer developers’ own admission, a recipe for disaster. It defeats the purpose of using Linux containers for this course, which is to provide students with a well-controlled development environment that is not “contaminated” by the specifics of their local configuration (e.g. ~/.bashrc contents) and thus should not exhibit unpredictable deviations from the expected course workflow.
  • By effectively deleting all user accounts from the container and replacing them with a new one that it made up on the spot, Apptainer makes it impossible to set up containers ahead of time using tools that install and configure software within a user’s home directory. This precludes ahead-of-time installation of a growing number of software packages that either have poor support for system-wide installation or do not provide support for it at all.
  • By making it impossible to install such software ahead of time and to make it cache internet downloads ahead of time, Apptainer makes it hard to comply with the strict network access policies of computing centers, which commonly have issues with cluster nodes accessing the Internet. This is especially ironic when Apptainer advertises itself as being designed to address the needs of computing centers.
  • By only providing very poor support for writable filesystem overlays, Apptainer makes it hard to revert its unfortunate user account handling decisions and use the user accounts defined inside of the container as planned. This also breaks core system utilities like ldconfig, which run lazily and write to the filesystem. That is a recipe for unpredictable issues down the line.

These Apptainer design flaws can be worked around through lots of CLI argument and configuration file tuning. But again, this defeats the basic purpose of providing container images for this course, which is to make the life of users easier by making it quicker to set up a working development environment. If manually setting up the required environment is easier than getting the container runtime to use a reasonable configuration, then there is no point in using containers: a local setup will achieve a better result at a lower resource cost.

Thus, after careful consideration, the Apptainer and Singularity container runtimes had to be considered unfit for this course’s needs, and for those of any future course by the same author, for that matter. To fully spell out the author’s opinion, these container runtimes are designed in a profoundly misguided way, not fit for any serious purpose, and should never be used as they provide no significant benefit and have many major drawbacks compared to other software packaging solutions. But since the school pledged to provide some support for them, restricting this support to the simpler rust_list container image seemed like the optimal tradeoff.