Container demo 1: Fooocus¶
Fooocus is an AI-based image-generating package that is available under the GNU General Public License V3.
The version on which we first prepared this demo insists on writing in the directories that hold some of the Fooocus files, so we cannot put Fooocus itself in the container at the moment.
Fooocus is based on PyTorch. However, we cannot use the containers provided on LUMI as-is, as additional system-level libraries are needed for the graphics.
This demo shows:
- Installing one of the containers provided on LUMI with EasyBuild,
- Installing additional software in the container with the SingularityCE "unprivileged proot builds" process and the SUSE Linux zypper install tool,
- Further adding packages in a virtual environment and putting them in a SquashFS file for better file system performance, and
- Using that setup with Fooocus.
Video of the demo¶
Step 1: Checking Fooocus¶
Let's create an installation directory for the demo. Set the environment variable installdir to a value appropriate for the directories on LUMI that you have access to.
installdir=/project/project_465001102/kurtlust/DEMO1
mkdir -p "$installdir" ; cd "$installdir"
We are now in the installation directory, which we first created if it did not yet exist. Let's now download and unpack Fooocus release 2.3.1 (the one we tested for this demo):
fooocusversion=2.3.1
wget https://github.com/lllyasviel/Fooocus/archive/refs/tags/$fooocusversion.zip
unzip $fooocusversion.zip
rm -f $fooocusversion.zip
If we check what's in the Fooocus directory:
ls Fooocus-$fooocusversion
we see a rather messy bunch of mostly Python files, missing the traditional setup scripts that you expect with a Python package. So installing this could become messy...
It also contains a Dockerfile (to build a base Docker container), a requirements_docker.txt and a requirements_versions.txt file that give hints about what exactly is needed.
The Dockerfile suggests close to the top that some OpenGL libraries will be needed. The fact that Fooocus can be fully installed in a Docker container also indicates that there must be ways to run it from read-only directories, but in this demo we'll put Fooocus in a place where it can write. The requirements_docker.txt file also suggests using PyTorch 2.0, but we'll take some risk and use a newer version of PyTorch than suggested, as for AMD GPUs it is often important to use sufficiently recent versions (and because that version comes with a more sophisticated module better suited for what we want to demonstrate).
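If you want to see those hints for yourself, an optional quick check is to search the Dockerfile and the Docker requirements file for graphics- and PyTorch-related entries (the exact output depends on the Fooocus version):
grep -iE 'libgl|mesa|opengl' Fooocus-$fooocusversion/Dockerfile
grep -i torch Fooocus-$fooocusversion/requirements_docker.txt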
Step 2: Install the PyTorch container¶
We can find an overview of the available PyTorch containers on the PyTorch page in the LUMI Software Library. We'll use a version that already has support for Python virtual environments built in as that will make it a lot easier to install extra Python packages. Moreover, as we have also seen that we will need to change the container, we'll follow a somewhat atypical build process.
Rather than installing directly from the available EasyBuild recipes, we'll edit an EasyConfig to change the name to reflect that we have made changes and installed Fooocus with it. First we must prepare a temporary directory to do this work and also set up EasyBuild:
mkdir -p "$installdir/tmp" ; cd "$installdir/tmp"
module purge
module load LUMI/23.09 partition/container EasyBuild-user
We'll now use an EasyBuild feature to copy an existing EasyConfig file to a new location and rename it in one move to reflect the module version that we want:
eb --copy-ec PyTorch-2.2.0-rocm-5.6.1-python-3.10-singularity-20240315.eb PyTorch-2.2.0-rocm-5.6.1-python-3.10-Fooocus-singularity-20240315.eb
This is not enough to generate a module PyTorch/2.2.0-rocm-5.6.1-python-3.10-Fooocus-singularity-20240315; we also need to edit the versionsuffix line in the EasyBuild recipe. Of course you can do this easily with your favourite editor, but to avoid errors we'll use a command in the demo that you only need to copy:
sed -e "s|^\(versionsuffix.*\)-singularity-20240315|\1-Fooocus-singularity-20240315|" -i PyTorch-2.2.0-rocm-5.6.1-python-3.10-Fooocus-singularity-20240315.eb
Let's check:
grep versionsuffix PyTorch-2.2.0-rocm-5.6.1-python-3.10-Fooocus-singularity-20240315.eb
which returns
versionsuffix = f'-rocm-{local_c_rocm_version}-python-{local_c_python_mm}-Fooocus-singularity-20240315'
so we see that the versionsuffix line looks rather strange, but the -Fooocus- part is indeed injected into the name, so we assume everything is OK.
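If you want extra reassurance before the actual installation, an optional EasyBuild dry run (the -D flag) lists the modules that the recipe would install, including the renamed one:
eb -D PyTorch-2.2.0-rocm-5.6.1-python-3.10-Fooocus-singularity-20240315.eb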
We're now ready to install the container with EasyBuild:
eb PyTorch-2.2.0-rocm-5.6.1-python-3.10-Fooocus-singularity-20240315.eb
The documentation on the PyTorch page in the LUMI Software Library suggests that we can now delete the container file in the installation directory, but this is a bad idea in this case: we want to build our own container and hence will not use one of the containers provided on the system while running.
We're now finished with EasyBuild, so we don't need the EasyBuild-related modules anymore. Let's clean the environment and load the PyTorch container module that we just built with EasyBuild:
module purge
module load LUMI/23.09
module load PyTorch/2.2.0-rocm-5.6.1-python-3.10-Fooocus-singularity-20240315
Notice that we don't need to load partition/container anymore. Any partition would do, and in fact, we could even use CrayEnv instead of LUMI/23.09.
Notice that the container module provides the environment variables SIF and SIFPYTORCH, both of which point to the .sif file of the container:
echo $SIF
echo $SIFPYTORCH
We'll make use of that when we add SUSE packages to the container.
Step 3: Adding some SUSE packages¶
To update the singularity container, we need three things.
First, the PyTorch module cannot remain loaded, as it sets a number of Singularity-related environment variables. Yet we want to use the value of SIF, so we will simply save it in a different environment variable before unloading the module:
export CONTAINERFILE="$SIF"
module unload PyTorch/2.2.0-rocm-5.6.1-python-3.10-Fooocus-singularity-20240315
Second, the proot command is not available by default on LUMI, but it can be enabled by loading the systools module in LUMI/23.09 or newer stacks, or systools/23.09 or newer in CrayEnv:
module load systools
Third, we need a file defining the build process for Singularity. Explaining what goes into this file is a bit technical and outside the scope of this tutorial. It can be created with the following shell command:
cat > lumi-pytorch-rocm-5.6.1-python-3.10-pytorch-v2.2.0-Fooocus.def <<EOF
Bootstrap: localimage
From: $CONTAINERFILE
%post
zypper -n install -y Mesa libglvnd libgthread-2_0-0 hostname
EOF
You can check the file with
cat lumi-pytorch-rocm-5.6.1-python-3.10-pytorch-v2.2.0-Fooocus.def
We basically install an OpenGL implementation that renders in software on the CPU, plus some missing tools. Note that the AMD MI250X GPUs are not rendering GPUs, so we cannot use hardware-accelerated rendering on them.
An annoying element of the singularity build procedure is that it is not very friendly for a Lustre filesystem. We'll do the build process on a login node, where we have access to a personal RAM disk area that will also be cleaned automatically when we log out, which is always useful for a demo. Therefore we need to set two environment variables for Singularity, and create two directories, which is done with the following commands:
export SINGULARITY_CACHEDIR=$XDG_RUNTIME_DIR/singularity/cache
export SINGULARITY_TMPDIR=$XDG_RUNTIME_DIR/singularity/tmp
mkdir -p $SINGULARITY_CACHEDIR
mkdir -p $SINGULARITY_TMPDIR
Now we're ready to do the actual magic and rebuild the container with additional packages installed in it:
singularity build $CONTAINERFILE lumi-pytorch-rocm-5.6.1-python-3.10-pytorch-v2.2.0-Fooocus.def
The build process will ask you if you want to continue, as it will overwrite the container file, so confirm with y. The whole build process may take a couple of minutes.
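If you ever script this step and want to avoid the interactive question, singularity build also has a --force flag that overwrites an existing image without asking; check singularity build --help on your system to confirm it is available before relying on it:
singularity build --force $CONTAINERFILE lumi-pytorch-rocm-5.6.1-python-3.10-pytorch-v2.2.0-Fooocus.def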
We'll be kind to our fellow LUMI users and already clean up the directories that we just created:
rm -rf $XDG_RUNTIME_DIR/singularity
Let's reload the container:
module load PyTorch/2.2.0-rocm-5.6.1-python-3.10-Fooocus-singularity-20240315
and do some checks:
singularity shell $SIF
brings us into the container (note that the command prompt has changed).
The command
which python
returns
/user-software/venv/pytorch/bin/python
which shows that the virtual environment pre-installed in the container is indeed active.
We do have the hostname command in the container (one of the packages mentioned in the container .def file that we created), as is easily tested:
hostname
and
ls /usr/lib64/*mesa*
shows that indeed a number of Mesa libraries are installed (the OpenGL implementation that we just added).
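As an extra, optional sanity check you can also verify that PyTorch still imports fine in the rebuilt container, e.g., by printing its version (we are on a login node here, so don't expect to see a GPU yet):
python -c 'import torch; print(torch.__version__)'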
We can now leave the container with the exit command (or the CTRL-D key combination).
So it looks like we are ready to start installing Python packages...
Step 4: Adding Python packages¶
To install the packages, we'll use the requirements_versions.txt file which we found in the Fooocus directories. The installation has to happen from within the container though, so let's go to the Fooocus directory and enter the container again:
cd "$installdir/Fooocus-$fooocusversion"
singularity shell $SIF
We'll install the extra packages simply with the pip tool:
pip install -r requirements_versions.txt
This process may again take a few minutes.
After finishing,
ls /user-software/venv/pytorch/lib/python3.10/site-packages/
shows that indeed a lot of packages have been installed. Though accessible from the container, they are not stored in the container .sif file, as that file cannot be written to.
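Before leaving the container, an optional quick test is to import one of the freshly installed packages; gradio, for instance, is the package Fooocus uses for its web interface (adjust the name if your requirements file differs):
python -c 'import gradio; print(gradio.__version__)'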
Let's leave the container again:
exit
Now try:
ls $CONTAINERROOT/user-software/venv/pytorch/lib/python3.10/site-packages/
and notice that we see the same long list of packages. In fact, a trick to see the number of files and directories is
lfs find $CONTAINERROOT/user-software/venv/pytorch/lib/python3.10/site-packages | wc -l
which prints the names of all files and directories and then counts the number of lines; we see that this is a considerable number, and Lustre isn't really fond of that. However, the module also provides an easy solution: we can convert the $EBROOTPYTORCH/user-software subdirectory into a SquashFS file that can be mounted as a filesystem in the container, and the module provides all the tools to make this easy to do.
All we need to do is to run
make-squashfs
This will also take some time, as the script limits the resources that make-squashfs can use in order to keep the load on the login nodes low.
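If you are curious, you can check that a SquashFS file has indeed appeared in the container installation directory (the exact file name may differ between module versions):
ls -lh $CONTAINERROOT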
Now we can safely remove the user-software subdirectory:
rm -rf $CONTAINERROOT/user-software
Before continuing, we do need to reload the module so that the bindings between the container and files and directories on LUMI are reset:
module load PyTorch/2.2.0-rocm-5.6.1-python-3.10-Fooocus-singularity-20240315
Just check
singularity exec $SIF ls /user-software/venv/pytorch/lib/python3.10/site-packages
and see that our package installation is still there!
However, we can no longer write in that directory. E.g., try
singularity exec $SIF touch /user-software/test
to create an empty file test in /user-software and note that we get an error message.
So now we are ready to run.
The reward: Running Fooocus¶
First confirm that we're in the directory containing the Fooocus package (which should be the case if you followed these instructions):
cd "$installdir/Fooocus-$fooocusversion"
We'll start an interactive job with a single GPU:
srun -psmall-g -n1 -c7 --time=30:00 --gpus=1 --mem=60G -A project_465001102 --pty bash
The necessary modules will still be available, but if you are running from a new shell, you can load them again:
module load LUMI/23.09
module load PyTorch/2.2.0-rocm-5.6.1-python-3.10-Fooocus-singularity-20240315
Also check the hostname if it is not part of your prompt as you will need it later:
hostname
We can now go into the container:
singularity shell $SIF
and launch Fooocus:
python launch.py --listen --disable-xformers
Fooocus provides a web interface. If you're the only one on the node using Fooocus, it should run on port 7865. To access it from your laptop, we need to create an SSH tunnel to LUMI. The precise statement needed for this will depend on your ssh implementation. Assuming you've defined a lumi rule in the ssh config file to make life easy and use an OpenSSH-style ssh client, you can use:
ssh -N -L 7865:nid00XXXX:7865 lumi
replacing nid00XXXX with the node name that we got from the hostname command.
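If you have not yet defined such a lumi rule, a minimal sketch of an entry in your ~/.ssh/config could look like the lines below; the user name and key path are placeholders that you need to adapt to your own account and key:
Host lumi
    HostName lumi.csc.fi
    User your_lumi_username
    IdentityFile ~/.ssh/id_ed25519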
Next, simply open a web browser on your laptop and point to
http://localhost:7865
Alternative way of running¶
We can also launch Fooocus directly from the srun command, e.g., from the directory containing the Fooocus code:
module load LUMI/23.09
module load PyTorch/2.2.0-rocm-5.6.1-python-3.10-Fooocus-singularity-20240315
srun -psmall-g -n1 -c7 --time=30:00 --gpus=1 --mem=60G -A project_465001102 --pty \
bash -c 'echo -e "Running on $(hostname)\n" ; singularity exec $SIF python launch.py --listen --disable-xformers'
It will also print the host name on which Fooocus is running, so you can connect to Fooocus using the same procedure as above.
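If you prefer a batch job over an interactive srun, a minimal sketch of a job script could look like the following; the resource requests simply mirror the ones used above, the project number is the one from this demo, and the script assumes it is submitted from the Fooocus directory:
#!/bin/bash
#SBATCH --partition=small-g
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=7
#SBATCH --gpus=1
#SBATCH --mem=60G
#SBATCH --time=30:00
#SBATCH --account=project_465001102

module load LUMI/23.09
module load PyTorch/2.2.0-rocm-5.6.1-python-3.10-Fooocus-singularity-20240315

# Print the node name so that you can set up the SSH tunnel to it.
echo "Running on $(hostname)"

# Start Fooocus inside the container; connect to port 7865 as before.
singularity exec $SIF python launch.py --listen --disable-xformers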