Building containers from Conda/pip environments¶
Presenter: Julius Roeder (DeiC)
Content:
- Containers from conda/pip environments
- Recipes for PyTorch, Tensorflow, and JAX/Flax on LUMI
Extra materials¶
- Further reading materials from the slides:
- The additional training materials mentioned in the "Running containers" page are also relevant for this presentation.
Remarks to things mentioned in the recording¶
ROCm compatibility¶
The compatibility situation is actually even more complicated than explained in this presentation. The kernel driver for the GPUs depends on certain kernel versions, and the kernel version in turn depends on the version of the management interface of LUMI. So to upgrade to a newer ROCm driver, we often need to update nearly everything on LUMI.
Furthermore, we need to ensure that MPI also works. GPU-aware MPI also depends on the versions of ROCm and the driver. So before updating to a new ROCm version, we also need versions of the HPE Programming Environment that are compatible with that ROCm version, or all users of traditional HPC simulation codes would be very unhappy. That can also be a factor stopping an update.
What does the tool provided by lumi-container-wrapper do?¶
The lumi-container-wrapper provides a tool that lets you do some pip and conda installations in a file-system-friendly way. It also uses a base container, but that one does not contain ROCm, so it is of little use for AI software unless you can use the ROCm from the system. It does not change the base container, but installs the software in a separate SquashFS file. Furthermore, for each command it can find in the container, it creates a wrapper script outside the container that calls singularity with the right bindings to run that command in the container. It is actually rather hard to start the container "by hand" using the singularity command, as you also have to create the right bind mount for the SquashFS file containing the actual software installation.
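To make the wrapper idea more concrete, here is a minimal Python sketch of what such a generated wrapper script could look like. This is not the actual code or output of lumi-container-wrapper: the function name, the file paths and the mount point `/opt/software` are hypothetical, and the use of Singularity's `image-src=/` bind option to mount the SquashFS file is an assumption about how this could be done.

```python
import shlex


def wrapper_script(command, base_image, squashfs, mount_point):
    """Return the text of a shell wrapper that runs `command` in the container.

    All arguments are illustrative; the real tool chooses its own paths.
    """
    # Bind-mount the SquashFS file that holds the software installation
    # onto `mount_point` inside the (unchanged) base container.
    bind = f"{squashfs}:{mount_point}:image-src=/"
    call = [
        "singularity", "exec",
        "--bind", bind,
        base_image,
        f"{mount_point}/bin/{command}",
    ]
    # "$@" forwards the arguments given to the wrapper to the real command.
    return "#!/bin/bash\nexec " + shlex.join(call) + ' "$@"\n'


script = wrapper_script(
    "python",
    "/project/demo/base.sif",        # hypothetical base container
    "/project/demo/software.sqsh",   # hypothetical SquashFS file
    "/opt/software",                 # hypothetical mount point
)
print(script)
```

The sketch also shows why starting the container "by hand" is awkward: every call needs the same bind option, which is exactly what the wrappers hide.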
The cotainr tool, on the other hand, takes the selected base image and builds a new container from it that can be used the way containers are normally used.
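As a rough illustration of the cotainr workflow, the Python sketch below writes a small Conda environment file and assembles the matching `cotainr build` command without running it. The environment contents, the file names `demo_env.yml` and `demo.sif`, and the helper function are hypothetical; the sketch assumes the `--system` and `--conda-env` options of `cotainr build`, with `lumi-g` as the system name for the GPU partition.

```python
import shlex
import tempfile
from pathlib import Path

# Hypothetical, minimal Conda environment; a real AI environment would list
# a ROCm-enabled framework build that matches the chosen base image.
ENV_YAML = """\
name: demo_env
channels:
  - conda-forge
dependencies:
  - python=3.11
  - pip
  - pip:
      - numpy
"""


def cotainr_build_command(image, conda_env, system="lumi-g"):
    """Assemble (but do not run) a cotainr build command line."""
    return [
        "cotainr", "build", image,
        f"--system={system}",
        f"--conda-env={conda_env}",
    ]


workdir = Path(tempfile.mkdtemp())
env_file = workdir / "demo_env.yml"
env_file.write_text(ENV_YAML)

cmd = cotainr_build_command("demo.sif", str(env_file))
print(shlex.join(cmd))
```

Running the printed command where cotainr is available would produce a regular container image, which can then be started with the usual `singularity exec demo.sif ...` without any wrapper scripts.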
Q&A¶
Remark: MPI was definitely not fully functional in older containers built with cotainr and the base images that were used back then, and I guess there are still issues if you install mpi4py with Conda, as it may not be using the right MPI library that fully recognises the LUMI network. And using the lumi-c base image from cotainr, I would not expect any change compared to the old situation. If you have software that uses a recent MPI implementation that is binary compatible with MPICH, there are tricks that sometimes work to make MPI work efficiently, but it is not always that simple. (Injecting a library in the container sounds nice on paper, but if the library that you inject is not compatible with the glibc library in the container, it can be very hard to fix that.)
- What about cotainr with uv?
  - Cotainr is fully based on Conda. You can build upon some of the existing containers with uv if you want, using the Singularity CE proot unprivileged build process, which is discussed in the container lecture in the regular intro course. See this part of the lecture notes of that lecture (but we do not discuss uv there specifically).
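A possible shape of such a build, sketched in Python: the code generates a Singularity definition file that starts from an existing local image and installs extra packages with uv in the `%post` section. Everything specific here is an assumption for illustration: the base image path, the package list, the file names, and the idea that `python` and `pip` are directly usable in `%post` (some base images may first require activating their Conda environment). Whether `uv pip install --system` targets the right Python also depends on the base image.

```python
import shlex


def uv_definition(base_image, packages):
    """Return the text of a definition file that extends `base_image`.

    The %post commands are only a sketch; adapt them to the base image.
    """
    pkgs = " ".join(shlex.quote(p) for p in packages)
    return (
        "Bootstrap: localimage\n"
        f"From: {base_image}\n"
        "\n"
        "%post\n"
        "    python -m pip install uv\n"
        f"    uv pip install --system {pkgs}\n"
    )


definition = uv_definition(
    "/project/demo/base.sif",   # hypothetical existing container
    ["requests"],               # hypothetical extra package
)
print(definition)

# Command that would build the new image from the definition file,
# once the text above is saved as demo-uv.def (not executed here).
build_cmd = ["singularity", "build", "demo-uv.sif", "demo-uv.def"]
print(shlex.join(build_cmd))
```

The unprivileged build only works where the proot-based build process is available, as described in the lecture notes referred to above.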