Skip to content

GPU binding issues

Nature of the problem

When using per-task cgroups, P2P IPC doesn't work as a communication mechanism as GPUs cannot find each other. It is used, e.g., by Cray MPICH for all but very short messages when communicating directly between GPUs in a node.

See also the schedMD case #17875.

Setups that work

TODO