Wrapper-induced issues¶
Tests for Score-P failing on the GPU nodes¶
Reproducing is trivial:
#include <stdlib.h>
#include <omp.h>
#include <inttypes.h>
int main( void )
{
return 0;
}
and compile with LUMI/23.09 partition/G cpeCray
or cpeAMD
.
Answer from Alfio:
-
The problem when using the wrapper
cc
is that the module craype-accel-amd-gfx90a promotes OpenMP to be offloaded to the target GPU. Then, we have to mix with the ROCm files and this can be done after the preprocessing of the OpenMP pragmas.You can use the flag
-craype-verbose
to see what command is executed, e.g.$ cc -fopenmp -craype-verbose --preprocess test_with_includes.c clang -march=znver2 -fopenmp-targets=amdgcn-amd-amdhsa -Xopenmp-target=amdgcn-amd-amdhsa -march=gfx90a -dynamic -D__CRAY_X86_ROME -D__CRAY_AMD_GFX90A -D__CRAYXT_COMPUTE_LINUX_TARGET --gcc-toolchain=/opt/cray/pe/gcc/10.3.0/snos -isystem /opt/cray/pe/cce/16.0.1/cce-clang/x86_64/lib/clang/16/include -isystem /opt/cray/pe/cce/16.0.1/cce/x86_64/include/craylibs -Wl,-rpath=/opt/cray/pe/cce/16.0.1/cce/x86_64/lib -fopenmp --preprocess test_with_includes.c -I/opt/cray/pe/libsci/23.09.1.1/CRAY/12.0/x86_64/include -I/opt/cray/pe/mpich/8.1.27/ofi/cray/14.0/include -I/opt/cray/pe/dsmml/0.2.2/dsmml//include -I/opt/cray/xpmem/2.5.2-2.4_3.50__gd0f7936.shasta/include -L/opt/cray/pe/libsci/23.09.1.1/CRAY/12.0/x86_64/lib -L/opt/cray/pe/mpich/8.1.27/ofi/cray/14.0/lib -L/opt/cray/pe/mpich/8.1.27/gtl/lib -L/opt/cray/pe/dsmml/0.2.2/dsmml//lib -L/opt/cray/pe/cce/16.0.1/cce/x86_64/lib/pkgconfig/../ -L/opt/cray/xpmem/2.5.2-2.4_3.50__gd0f7936.shasta/lib64 -Wl,--as-needed,-lsci_cray_mpi_mp,--no-as-needed -Wl,--as-needed,-lsci_cray_mp,--no-as-needed -ldl -Wl,--as-needed,-lmpi_cray,--no-as-needed -lmpi_gtl_hsa -Wl,--as-needed,-ldsmml,--no-as-needed -lxpmem -Wl,--as-needed,-lstdc++,--no-as-needed -Wl,--as-needed,-lpgas-shmem,--no-as-needed -lquadmath -lmodules -lfi -lcraymath -lf -lu -lcsup -Wl,--as-needed,-lpthread,-latomic,--no-as-needed -Wl,--as-needed,-lm,--no-as-needed -Wl,--disable-new-dtags
As you said, it works if you don't directly use the wrappers (or you remove OpenMP, or if you don't load the GPU target module).
My suspicious is that the problem is due to LLVM (<=16) compatibility.
-
For the CCE, we can apply the following trick:
cc -fno-cray -fopenmp --preprocess test_with_includes.c > test_with_includes.prep.offload.c
Here, the
-fno-cray
gives you the vanilla LLVM compiler without specific Cray optimizations. Note that we need this trick only for the CCE 16.0.1 installed on LUMI, I tried with CCE 17+ and we don't need this flag anymore.The next step is to compile the preprocessor output:
$ cc -fno-cray -fopenmp --preprocess test_with_includes.c > test_with_includes.prep.offload.c && cc -nostdinc -fopenmp test_with_includes.prep.offload.c
In this case I'm using the flag
-nostdinc
to avoid the compiler to reintroduce implicit headers that will conflict with the existing headers of the preprocessor output. -
Concerning the AMD compiler, the version installed on LUMI relies on LLVM 14 (Rocm 5.2) and it gives you the same problem (even if you specify
-nostdinc
). Also in this case, I've tested it with a new ROCM version (6.1) and it works. I don't have a workaround for the version installed on LUMI though, the suggestion can be to directly use amdclang as you already did (you can use the flag--cray-print-opts
for the wrappers to get a list of the options and then exclude the OpenMP offload ones, seeman cc
).