Moving your AI training jobs to LUMI workshop - UiT, Tromsø, June 11-12, 2026

Course organisation

Setting up for the exercises

During the course

Not relevant anymore as the course has ended.

After the termination of the course project

Setting up for the exercises is a bit more elaborate now.

The exercises as they were during the course are available as the tag ai-20260611 in the GitHub repository. Whereas the repository could simply be cloned during the course, now you have to either:

  • Download the content of the repository as a tar file or a bzip2-compressed tar file, or download it from the GitHub release, which offers a choice of formats,

  • or clone the repository and then check out the tag ai-20260611:

    git clone https://github.com/Lumi-supercomputer/Getting_Started_with_AI_workshop.git
    cd Getting_Started_with_AI_workshop
    git checkout ai-20260611
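As a possible shortcut, the clone and checkout steps above can be combined, since `git clone --branch` also accepts a tag name. The following is a minimal sketch, not part of the original course instructions: the function name `fetch_exercises`, the variable names, and the use of `--depth 1` (which skips the history and fetches only the tagged snapshot) are illustrative choices.

```shell
#!/usr/bin/env bash
# Sketch: fetch the tagged course snapshot in a single step.
set -eu

REPO_URL="https://github.com/Lumi-supercomputer/Getting_Started_with_AI_workshop.git"
COURSE_TAG="ai-20260611"

# Hypothetical helper: clone only the commit the tag points to.
# The optional first argument is the target directory.
fetch_exercises() {
    local target_dir="${1:-Getting_Started_with_AI_workshop}"
    # --branch accepts a tag name; git leaves the clone in detached HEAD state.
    git clone --depth 1 --branch "$COURSE_TAG" "$REPO_URL" "$target_dir"
}

# Nothing is downloaded until the function is called, for example:
#   fetch_exercises my_exercises
```

A shallow clone is enough for running the exercises. Drop `--depth 1` if you also want the full history of the repository.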
    

Note also that any reference to a reservation in Slurm has to be removed.
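One way to find and remove such references is sketched below. It assumes the reservation is requested through an `#SBATCH --reservation` line in a batch script. The sample script, its placeholder account and the reservation name are made up for the demonstration and do not come from the course repository. Reservations passed on an `srun`, `salloc` or `sbatch` command line are not covered and have to be edited by hand.

```shell
#!/usr/bin/env bash
# Sketch: strip "#SBATCH --reservation" lines from a batch script.
set -eu

# Hypothetical batch script standing in for one of the exercise scripts.
workdir="$(mktemp -d)"
cat > "$workdir/train.sh" <<'EOF'
#!/bin/bash
#SBATCH --account=project_XXXXXXXXX
#SBATCH --reservation=example_reservation
#SBATCH --time=00:15:00
srun python train.py
EOF

# Step 1: list every file and line that still mentions a reservation.
grep -rn -- '--reservation' "$workdir" || true

# Step 2: delete the #SBATCH lines that request a reservation.
# The original file is kept next to it with a .bak suffix.
sed -i.bak -E '/^#SBATCH[[:space:]]+--reservation([=[:space:]]|$)/d' "$workdir/train.sh"

# Step 3: show the cleaned script.
cat "$workdir/train.sh"
```

To apply the same idea to the real exercises, run the `grep` command in the cloned repository first and review the matches before editing any file.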

The exercises were thoroughly tested at the time of the course. LUMI is an evolving supercomputer, though, so some exercises are expected to fail over time. The modules that need to be loaded will also change: at every update we have to drop some versions of the LUMI module because the corresponding programming environment is no longer functional. Likewise, at some point the ROCm driver on the system may become incompatible with the ROCm versions used in the containers for the course.

Course materials

Course materials will be made available during the course.

Note: Some links in the table below will remain invalid until after the course when all materials are uploaded.

| Presentation | Slides | Recording |
|---|---|---|
| Welcome and course introduction | / | video |
| Introduction to LUMI | slides | video |
| Using the LUMI web-interface | slides | video |
| Hands-on: Run a simple PyTorch example notebook | / | video |
| Your first AI training job on LUMI | slides | video |
| Hands-on: Run a simple single-GPU PyTorch AI training job | / | video |
| Understanding GPU activity & checking jobs | slides | video |
| Hands-on: Checking GPU usage interactively using rocm-smi | / | video |
| Running containers on LUMI | slides | video |
| Hands-on: Pull and run a container | / | video |
| Building containers from Conda/pip environments | slides | video |
| Hands-on: Creating a conda environment file and building a container using cotainr | / | video |
| Extending containers with virtual environments for faster testing | slides | video |
| Scaling AI training to multiple GPUs | slides | video |
| Hands-on: Converting the PyTorch single GPU AI training job to use all GPUs in a single node via DDP | / | video |
| Extreme scale AI | slides | video |
| Hands-on: Extreme scale AI | / | video |
| Loading training data on LUMI | slides | video |
| Coupling machine learning with HPC simulation | slides | video |
| Hands-on: Advancing your project and general Q&A | / | video |