Exercise session 7
- See /project/project_465000524/slides/HPE/Exercises.pdf for the exercises.
- Files are in /project/project_465000524/exercises/HPE/day3
- Permanent archive on LUMI (an unpacking sketch follows this list):
    - Exercise notes in /appl/local/training/4day-20230530/files/LUMI-4day-20230530-Exercises_HPE.pdf
    - Exercises as bzip2-compressed tar file in /appl/local/training/4day-20230530/files/LUMI-4day-20230530-Exercises_HPE.tar.bz2
    - Exercises as uncompressed tar file in /appl/local/training/4day-20230530/files/LUMI-4day-20230530-Exercises_HPE.tar
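The archive can be unpacked into a directory of your own with standard tar commands. A minimal sketch; the target path under the project's scratch area is an assumption, so adjust it to wherever you keep your work:

```bash
# Target directory is hypothetical -- any writable location works.
mkdir -p /scratch/project_465000524/$USER/exercises
cd /scratch/project_465000524/$USER/exercises

# -xjf unpacks the bzip2-compressed archive; plain -xf handles the .tar.
tar -xjf /appl/local/training/4day-20230530/files/LUMI-4day-20230530-Exercises_HPE.tar.bz2
```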
Q&A
- I tried perftools-lite on another example and got the following message from pat-report:

  ```
  Observation:  MPI Grid Detection

      There appears to be point-to-point MPI communication in a 4 X 128
      grid pattern. The 24.6% of the total execution time spent in MPI
      functions might be reduced with a rank order that maximizes
      communication between ranks on the same node. The effect of several
      rank orders is estimated below.

      No custom rank order was found that is better than the RoundRobin
      order.

      Rank Order  On-Node    On-Node    MPICH_RANK_REORDER_METHOD
                  Bytes/PE   Bytes/PE%
                             of Total
                             Bytes/PE

      RoundRobin  1.517e+11  100.00%    0
      Fold        1.517e+11  100.00%    2
      SMP         0.000e+00    0.00%    1
  ```

  Normally for this code, SMP rank ordering should make sure that collective communication is all intra-node and that inter-node communication is limited to point-to-point MPI calls. So I don't really get why the recommendation is to switch to RoundRobin (if I understand this remark correctly). Is this recommendation based only on analysing point-to-point communication?
  Answer: Yes, you understood the remark correctly. The warning means that Cray PAT detected a suboptimal communication topology, and according to the tool's estimate a round-robin rank ordering would maximize intra-node communication. There is a session about rank reordering at the beginning of the afternoon; see also the sketches after this thread.
  Reply: I would be very surprised if round-robin rank ordering were beneficial in this case. I tried to run a job with it, but this failed with:

  ```
  srun: error: task 256 launch failed: Error configuring interconnect
  ```

  and similar lines for each task. The job script looks as follows:

  ```
  module load LUMI/22.12 partition/C
  module load cpeCray/22.12
  module load cray-hdf5-parallel/1.12.2.1
  module load cray-fftw/3.3.10.3

  export MPICH_RANK_REORDER_METHOD=0

  srun ${executable}
  ```
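For reference (an addition, not part of the original exchange): on Cray MPICH the built-in placement policies are selected with MPICH_RANK_REORDER_METHOD (0 = round-robin, 1 = SMP, the default, 2 = folded), while method 3 reads a custom order from a MPICH_RANK_ORDER file, which the grid_order utility shipped with perftools-base can generate. A minimal sketch for the 4 X 128 grid reported above; the cell size passed with -c is an assumption about how many ranks share a node:

```bash
# Sketch: generate a custom rank order for the 4 x 128 communication
# grid found by pat-report. The cell size (-c 4,32) assumes 128 ranks
# per node and is purely illustrative -- match it to your own layout.
module load perftools-base
grid_order -R -g 4,128 -c 4,32 > MPICH_RANK_ORDER

# Method 3 makes Cray MPICH read the MPICH_RANK_ORDER file from the
# job's working directory.
export MPICH_RANK_REORDER_METHOD=3
srun ${executable}
```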
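Likewise an addition: whether a rank order actually took effect can be verified with Cray MPICH's placement display variable, which makes rank 0 print the node on which each rank resides at startup:

```bash
# Rank 0 reports the placement of every MPI rank when the job starts.
export MPICH_RANK_REORDER_DISPLAY=1
srun ${executable}
```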