# Exercises 2: Running jobs with Slurm

## Exercises on the Slurm allocation modes
- Run a single task in a job step with `srun` using multiple CPU cores. Inspect the default task allocation with the `taskset` command (`taskset -cp $$` will show you the CPU numbers allocated to the current process). Try with the `standard-g` and `small-g` partitions. Are there any differences? You may need to use a specific reservation for the `standard-g` partition to avoid long waiting times.

Click to see the solution.
```
srun --partition=small-g --nodes=1 --tasks=1 --cpus-per-task=16 --time=5 --account=<project_id> bash -c 'taskset -cp $$'
```

Note that you need to replace `<project_id>` with your actual project account ID, in the form of `project_` followed by a 9-digit number.

```
srun --partition=standard-g --nodes=1 --tasks=1 --cpus-per-task=16 --time=5 --account=<project_id> --reservation=<res_id> bash -c 'taskset -cp $$'
```

The command runs a single process (a `bash` shell invoking the native Linux `taskset` tool, which shows the process's CPU affinity) on a compute node. You can use the `man taskset` command to see how the tool works.
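For reference, `taskset -cp` prints one line per queried process; a minimal sketch of the kind of output to expect (the PID and CPU list below are made up and will differ on your allocation):

```
$ taskset -cp $$
pid 12345's current affinity list: 0-15
```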
- Try Slurm allocations with the `hybrid_check` tool from the LUMI Software Stack. The program is preinstalled on the system. Use the simple job script below to run a parallel program with multiple tasks (MPI ranks) and threads (OpenMP). Test the task/thread affinity with an `sbatch` submission on the CPU partition.

```
#!/bin/bash -l
#SBATCH --partition=small-g       # Partition name
#SBATCH --nodes=1                 # Total number of nodes
#SBATCH --ntasks-per-node=8       # 8 MPI ranks per node
#SBATCH --cpus-per-task=6         # 6 threads per task
#SBATCH --time=5                  # Run time (minutes)
#SBATCH --account=<project_id>    # Project for billing

module load LUMI/22.12
module load lumi-CPEtools

srun hybrid_check -n -r
```

Be careful when copying and pasting the script body, as it may break some special characters.
Click to see the solution.
Save the script contents into a `job.sh` file (you can use the `nano` console text editor, for instance); remember to use a valid project account name.

Submit the job script using the `sbatch` command:

```
sbatch job.sh
```

The job output is saved in the `slurm-<job_id>.out` file. You can view its contents with either the `less` or `more` shell commands.

The actual task/thread affinity may depend on the specific OpenMP runtime, but you should see "block" thread affinity as the default behaviour.
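While the job is queued or running, you can follow its state with standard Slurm commands; a small sketch, where `<job_id>` is the ID printed by `sbatch`:

```
squeue --me                                      # list your pending and running jobs
sacct -j <job_id> --format=JobID,State,Elapsed   # accounting summary once the job has run
```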
- Improve the thread affinity with OpenMP runtime variables. Alter your script and add an MPI runtime variable to see another CPU mask summary.
Click to see the solution.
Export the `SRUN_CPUS_PER_TASK` environment variable in your script to follow the convention of recent Slurm versions. Add this line before the `hybrid_check` call:

```
export SRUN_CPUS_PER_TASK=6
```

Add the OpenMP environment variable definitions to your script:

```
export OMP_NUM_THREADS=${SRUN_CPUS_PER_TASK}
export OMP_PROC_BIND=close
export OMP_PLACES=cores
```

You can also add an MPI runtime variable to see another CPU mask summary:

```
export MPICH_CPUMASK_DISPLAY=1
```

Note that the `hybrid_check` and MPICH CPU masks may not be consistent, which can be confusing.
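Independently of `hybrid_check`, OpenMP 5.0 runtimes can report the thread binding themselves; a small sketch of an extra variable you could add to the script (supported by the Cray and GNU OpenMP runtimes, among others):

```
export OMP_DISPLAY_AFFINITY=TRUE   # each thread prints its binding when the parallel region starts
```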
- Use the `gpu_check` tool in an interactive shell on a GPU node to inspect the device binding. Check on which CCD the task's CPU core and GPU device are allocated (this is shown with the `-l` option of the tool).

Click to see the solution.
Allocate resources for a single task with a single GPU using `salloc`:

```
salloc --partition=small-g --nodes=1 --tasks=1 --cpus-per-task=1 --gpus-per-node=1 --time=10 --account=<project_id>
```

Note that, after the allocation is granted, you receive a new shell, but you are still on the login node. You need to use `srun` to execute commands on the allocated compute node.

You need to load specific modules to access the tools with GPU support:

```
module load LUMI/22.12 partition/G
module load lumi-CPEtools
```

Run `gpu_check` interactively on the compute node:

```
srun gpu_check -l
```

Remember to terminate your interactive session with the `exit` command:

```
exit
```
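Before you exit, you can also reuse the `taskset` check from the first exercise inside the same allocation to see which CPU core the task was pinned to; a minimal sketch:

```
srun bash -c 'taskset -cp $$'
```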
## Slurm custom binding on GPU nodes
- Allocate one GPU node with one task per GPU and bind the tasks to each CCD (an 8-core group sharing the L3 cache). Use 7 threads per task, keeping the low-noise mode of the GPU nodes in mind. Use the `select_gpu` wrapper to map exactly one GPU to each task.

Click to see the solution.
Begin with the example from the slides with 7 cores per task:

```
#!/bin/bash -l
#SBATCH --partition=standard-g    # Partition (queue) name
#SBATCH --nodes=1                 # Total number of nodes
#SBATCH --ntasks-per-node=8       # 8 MPI ranks per node
#SBATCH --gpus-per-node=8         # Allocate one GPU per MPI rank
#SBATCH --time=5                  # Run time (minutes)
#SBATCH --account=<project_id>    # Project for billing
#SBATCH --hint=nomultithread

module load LUMI/22.12
module load partition/G
module load lumi-CPEtools

cat << EOF > select_gpu
#!/bin/bash
export ROCR_VISIBLE_DEVICES=\$SLURM_LOCALID
exec \$*
EOF
chmod +x ./select_gpu

export OMP_NUM_THREADS=7
export OMP_PROC_BIND=close
export OMP_PLACES=cores

srun --cpus-per-task=${OMP_NUM_THREADS} ./select_gpu gpu_check -l
```

You need to add an explicit `--cpus-per-task` option to `srun` to get the correct GPU mapping. If you save the script in `job_step.sh`, simply submit it with `sbatch` and inspect the job output.
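A minimal usage sketch, assuming the script was saved as `job_step.sh` and using the job ID printed by `sbatch`:

```
sbatch job_step.sh
# after the job finishes, inspect the reported binding:
less slurm-<job_id>.out
```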
- Change your CPU binding to leave the first (#0) and last (#7) core of each CCD unused. Run the program with 6 threads per task and inspect the actual task/thread affinity.
Click to see the solution.
Now you need to alter the masks so that the first (#0) and last (#7) core of each group (CCD) are disabled. The base mask is then `01111110`, which is `0x7e` in hexadecimal notation.

Try to apply the new bitmask, change the corresponding variable to spawn 6 threads per task, and check how the new binding works.
```
#!/bin/bash -l
#SBATCH --partition=standard-g    # Partition (queue) name
#SBATCH --nodes=1                 # Total number of nodes
#SBATCH --ntasks-per-node=8       # 8 MPI ranks per node
#SBATCH --gpus-per-node=8         # Allocate one GPU per MPI rank
#SBATCH --time=5                  # Run time (minutes)
#SBATCH --account=<project_id>    # Project for billing
#SBATCH --hint=nomultithread

module load LUMI/22.12
module load partition/G
module load lumi-CPEtools

cat << EOF > select_gpu
#!/bin/bash
export ROCR_VISIBLE_DEVICES=\$SLURM_LOCALID
exec \$*
EOF
chmod +x ./select_gpu

CPU_BIND="mask_cpu:0x7e000000000000,0x7e00000000000000,"
CPU_BIND="${CPU_BIND}0x7e0000,0x7e000000,"
CPU_BIND="${CPU_BIND}0x7e,0x7e00,"
CPU_BIND="${CPU_BIND}0x7e00000000,0x7e0000000000"

export OMP_NUM_THREADS=6
export OMP_PROC_BIND=close
export OMP_PLACES=cores

srun --cpu-bind=${CPU_BIND} ./select_gpu gpu_check -l
```
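For reference, each entry in `CPU_BIND` is simply the base mask `0x7e` shifted left by 8 bits per CCD. A small sketch that prints the per-CCD masks in plain CCD order, for illustration only (the script lists them in the order of the local task ranks instead, so that each rank's mask matches the CCD closest to its GPU):

```
# Sketch only: print the 0x7e base mask shifted to each of the eight CCDs.
for ccd in $(seq 0 7); do
    printf 'CCD %d mask: 0x%x\n' "$ccd" $(( 0x7e << (8 * ccd) ))
done
```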