Your first training job on LUMI

Presenters: Mats Sjöberg (CSC) and Marlon Tobaben (CSC)

Content:

Extra materials

Presentation slides
Hands-on exercises
More extensive training materials on Slurm from the recent introductory "Supercomputing with LUMI" course (April 2026):
- A more detailed introduction to Slurm, but without AI-specific examples, is given in the "Slurm on LUMI" presentation. It also discusses the sacct command, which can be used to get at least some resource usage information from jobs.
- The presentation "Process and Thread Distribution and Binding" is more oriented towards traditional HPC codes, but its discussion of a proper mapping of GPU dies onto CPU chiplets is also relevant for AI applications. That topic, however, is covered on the second day of this course/workshop.

Is the --mem-per-gpu parameter required in the Slurm batch script? If you do not specify it, do you get all available memory for one GPU?
- What you get depends on the partition, so you should always specify how much RAM you need, just to be safe. On standard-g you get all the memory, but on small-g you do not, and you may get much less than you expect: the default there is a fixed quantity, independent of the number of GPUs you request. Another reason for a proper memory request is that it can protect you from getting a node where memory is already taken by, e.g., a memory leak. We had that problem in the early years of LUMI.
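As an illustration of an explicit memory request, here is a minimal sketch of a batch script for small-g. The account name is a hypothetical placeholder, and the values for memory, GPU count and time are assumptions chosen for the example, not recommendations from the presenters; adjust them to your own project and workload.

```shell
#!/bin/bash
# Hypothetical placeholder: replace with your own project account.
#SBATCH --account=project_XXXXXXXXX
#SBATCH --partition=small-g
#SBATCH --nodes=1
#SBATCH --gpus-per-node=1
# Illustrative value: request CPU RAM per allocated GPU explicitly.
#SBATCH --mem-per-gpu=60G
#SBATCH --time=00:15:00

# Slurm sets SLURM_MEM_PER_GPU (in MB) only when --mem-per-gpu was given,
# so this line shows whether the explicit request was picked up.
mem_report="Memory per GPU (MB): ${SLURM_MEM_PER_GPU:-not set}"
echo "$mem_report"
```

Printing the value at the start of the job makes it easy to check in the job output that the request was what you intended.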
How do I add log file parameters to the batch script?
- You can redirect output to a file by adding the following to your Slurm script:
  - To redirect stdout: #SBATCH -o <name of output file>
  - To redirect stderr: #SBATCH -e <name of error output file>
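A minimal sketch of how these two options could look in a batch script. The job name and the function `emit_messages` are hypothetical and only for illustration. The file names use the Slurm filename patterns `%x` (job name) and `%j` (job ID), so each job writes to its own pair of files.

```shell
#!/bin/bash
# Hypothetical job name, used by the %x pattern below.
#SBATCH --job-name=first-training
# stdout goes to e.g. first-training-<jobid>.out
#SBATCH -o %x-%j.out
# stderr goes to e.g. first-training-<jobid>.err
#SBATCH -e %x-%j.err

# Illustrative helper: writes one line to stdout and one to stderr.
emit_messages() {
    echo "this line goes to the stdout file"
    echo "this line goes to the stderr file" >&2
}

emit_messages
```

If only `-o` is given, Slurm writes stderr to the same file as stdout.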