# Slurm issues on LUMI
Note: Use `sbatch --version` to check the version of Slurm.
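For instance, on a login node (the version string below is only illustrative; yours may differ):

```bash
$ sbatch --version
slurm 22.05.8
```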
## Wrong allocations on small-g when requesting 1 CPU per task
Observed on Slurm 22.05.8.
When requesting a GPU allocation with only 1 CPU per task (`--cpus-per-task=1`) and GPUs via `--gpus-per-task`, we get invalid allocations, at least when the job has to span multiple nodes. The problem disappears as soon as a value larger than 1 is used for `--cpus-per-task`.
Sample job script showing the bug:
```bash
#! /bin/bash
#SBATCH --job-name=map-smallg-1gpt-error
#SBATCH --output %x-%j.txt
#SBATCH --partition=small-g
#SBATCH --ntasks=12
#SBATCH --cpus-per-task=1
#SBATCH --gpus-per-task=1
#SBATCH --hint=nomultithread
#SBATCH --time=5:00

module load LUMI/22.12 partition/G lumi-CPEtools/1.1-cpeCray-22.12

echo "Requested resources as reported through SLURM_ variables:"
echo "- SLURM_NTASKS: $SLURM_NTASKS"
echo "- SLURM_CPUS_PER_TASK: $SLURM_CPUS_PER_TASK"
echo "- SLURM_GPUS_PER_TASK: $SLURM_GPUS_PER_TASK"
echo "Distribution based on SLURM_ variables:"
echo "- SLURM_JOB_NUM_NODES: $SLURM_JOB_NUM_NODES"
echo "- SLURM_JOB_NODELIST: $SLURM_JOB_NODELIST"
echo "- SLURM_TASKS_PER_NODE: $SLURM_TASKS_PER_NODE"
echo "- SLURM_JOB_CPUS_PER_NODE: $SLURM_JOB_CPUS_PER_NODE"
echo
echo "Control: All SLURM_ and SRUN_ variables:"
env | egrep ^SLURM_
env | egrep ^SRUN_
echo
echo -e "Control: Job script\n\n======================================================"
cat $0
echo -e "======================================================\n"

set -x
srun -n $SLURM_NTASKS -c $SLURM_CPUS_PER_TASK --gpus-per-task=$SLURM_GPUS_PER_TASK gpu_check -l
set +x

/bin/rm -f select_gpu_$SLURM_JOB_ID echo_dev_$SLURM_JOB_ID
```
Workaround: Use a value larger than 1 for `--cpus-per-task`. A pure MPI application will not use the additional CPU cores, but little is lost: in an ideal setup a CCD should not be used by more than one task anyway, unless the GPU is also shared by multiple tasks.
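A minimal sketch of the workaround, derived from the sample script above (the job name is changed and the value 2 for `--cpus-per-task` is an arbitrary choice; any value larger than 1 avoids the issue according to the observation above):

```bash
#! /bin/bash
#SBATCH --job-name=map-smallg-workaround
#SBATCH --output %x-%j.txt
#SBATCH --partition=small-g
#SBATCH --ntasks=12
#SBATCH --cpus-per-task=2      # >1 works around the invalid allocations
#SBATCH --gpus-per-task=1
#SBATCH --hint=nomultithread
#SBATCH --time=5:00

module load LUMI/22.12 partition/G lumi-CPEtools/1.1-cpeCray-22.12

# A pure MPI code will simply leave the extra core per task idle.
srun -n $SLURM_NTASKS -c $SLURM_CPUS_PER_TASK --gpus-per-task=$SLURM_GPUS_PER_TASK gpu_check -l
```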