Advanced Placement
Presenter: Jean-Yves Vet (HPE)
Archived materials on LUMI:
- Slides: `/appl/local/training/4day-20241028/files/LUMI-4day-20241028-2_01_Advanced_Application_Placement.pdf`
- Recording: `/appl/local/training/4day-20241028/recordings/2_01_Advanced_Application_Placement.mp4`
These materials can only be distributed to actual users of LUMI (active user account).
Remark

The `lumi-CPEtools` module contains several tools mentioned in the presentation. Note that these commands also have various command-line options that were not mentioned and that show more information about the actual binding.
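A minimal sketch of getting at those tools (assuming one of the LUMI software stacks, e.g. `CrayEnv` or `LUMI`, is already loaded; the `srun` options are illustrative):

```bash
# Make the placement/binding test programs from lumi-CPEtools available.
module load lumi-CPEtools

# Run one of them in a job step to see where the tasks and threads end up;
# hybrid_check is one of the tools provided by the module.
srun --ntasks=4 --cpus-per-task=2 hybrid_check
```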
Q&A
- In the slide with `lscpu | grep -Ei "CPU\ "` we saw the distribution of CPUs over the NUMA nodes. What do the 4 last rows refer to? E.g. "NUMA node0 CPU(s): 0-15, 64-79".
  - (alfio) These are the core IDs, so node0 has the cores with IDs 0 to 15 and then 64 to 79 (hyperthread cores).
  - Ah okay, so it does not have to do with the distance of those nodes between each other, right?
    - No, the "relative" distance is shown by the command `numactl -H` (see the sketch just below this question).
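  A minimal sketch of both commands (the output described in the comments is illustrative of a LUMI-G node; the exact core ranges differ per node type):

  ```bash
  # Show how hardware threads are grouped into NUMA nodes. A line such as
  # "NUMA node0 CPU(s): 0-15,64-79" means NUMA node 0 contains cores 0-15
  # plus their SMT siblings (hyperthreads) 64-79.
  lscpu | grep -Ei "NUMA"

  # Show the CPUs and memory per NUMA node, plus the "node distances" matrix
  # at the end, which expresses the relative distance between the NUMA nodes.
  numactl -H
  ```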
- Does the `--exclusive` flag make the available memory on each node be distributed equally among the CPUs defined by `--cpus-per-node`? Or do we still need to set it via `--mem-per-cpu` or something like that?
  - (alfio) `--exclusive` is only to get the entire node, without sharing it with other users.
  - (Kurt) Defaults set by the sysadmins still apply, so you may want to use, e.g., `--mem=0` to get all CPU memory on the node, or even better, `--mem=224G` or `--mem=480G` for regular LUMI-C and LUMI-G nodes respectively, as that would protect you from getting nodes where a system memory leak may have reduced the amount of memory available to a user. Memory is always a pool per node and not limited per task. This is in fact essential to make communication through shared memory possible and is also why `--gpu-bind` in Slurm does not work: it currently creates a so-called cgroup per GPU, which makes memory-to-memory communication impossible. A job-script sketch is given just below this question.
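  A minimal job-script sketch along those lines (partition name and application binary are illustrative; 224G is the value recommended above for a regular LUMI-C node):

  ```bash
  #!/bin/bash
  # Illustrative LUMI-C job: one full node and all of its usable memory.
  #SBATCH --partition=standard
  #SBATCH --nodes=1
  #SBATCH --exclusive
  #SBATCH --mem=224G
  #SBATCH --ntasks-per-node=16
  #SBATCH --cpus-per-task=8

  # --exclusive only keeps other users off the node; it does not change the
  # memory limit, which is why --mem is set explicitly above. Memory is a
  # per-node pool shared by all tasks, so --mem-per-cpu is not needed here.
  srun ./my_app    # hypothetical application binary
  ```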
- Why is `gpu_check` reporting Bus_ID dc? That is a PCI bridge, not a GPU (the GPU is de). Is it just an output issue in the `gpu_check` binary?
  - (Kurt) I'd have to check the code to see how the Bus_ID is determined. It is basically the code of the ORNL hello_jobstep program, a bit reworked with some extra output (but the determination of the ID is taken from the hello_jobstep code, if I remember correctly). This will be something to look into after the course. It is strange that it happens only for this GPU; the other ones correspond to what I see in `lstopo`.
    Bug located: just a constant in the code with the wrong value. It will be fixed next week or the week after, at the next update of the software stack.
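  Independently of that fix, one way to cross-check which bus ID really belongs to a GPU is to query ROCm and the topology tools directly. A minimal sketch (run inside an allocation on a GPU node; `rocm-smi` and hwloc's `lstopo` are assumed to be available there, as in the answer above):

  ```bash
  # PCI bus ID that the ROCm runtime associates with each GPU/GCD.
  rocm-smi --showbus

  # Hardware topology as seen by hwloc (text-only variant of lstopo):
  # GPUs and PCI bridges appear as separate PCI entries, so they can be told apart.
  lstopo-no-graphics | grep -i "pci"
  ```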