AMD ROCm™ profiling tools
Presenter: Samuel Antao (AMD)
-
Slides:
The slide files are also available on LUMI in /appl/local/training/profiling-20231122/files.
The recordings are also available on LUMI in /appl/local/training/profiling-20231122/recordings.
Q&A
-
Can the tool also be used for profiling ML frameworks, e.g. TensorFlow with Horovod?
- Yes, omnitrace-python is the driver to use in these cases to see the Python call stack alongside the GPU activity.
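As a rough illustration of the answer above, a launch could look like the sketch below. The script name `train.py` is hypothetical, and the snippet assumes an omnitrace installation that provides `omnitrace-python` is already in `PATH`. The `--` separator between omnitrace options and the Python script follows the usage pattern in the omnitrace documentation; check `omnitrace-python --help` for the version you have.

```shell
# Hypothetical training script; replace with your own TensorFlow/Horovod entry point.
SCRIPT=./train.py

if command -v omnitrace-python >/dev/null 2>&1; then
    # Everything after "--" is the Python script and its arguments.
    omnitrace-python -- "$SCRIPT"
else
    # Nothing is profiled in this branch; the omnitrace environment is missing.
    echo "omnitrace-python not found in PATH; load the omnitrace environment first"
fi
```

For a multi-rank Horovod run, the same command would normally be placed behind the job launcher (for example `srun`), so that each rank runs its own instance of the driver.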
-
In the first set of slides it was mentioned that rocprof serializes kernel execution. How does this affect the other tools? Is it possible to use the tools to profile a program that launches multiple kernels on different streams, or even in different processes, and see the overall performance?
- No, rocprof does not serialize kernels. What I tried to explain is that users should serialize kernels for counter readings to be meaningful.
-
Could you check the slide about installing omniperf? I see different paths in CMAKE_INSTALL_PREFIX and "export PATH". There is also a dependency on Python 3.7, but the default on LUMI is Python 3.6. Which module is best for that (e.g. cray-python)?
-
Cray-python should be fine. The exported PATH is a typo; it should be:
export PATH=$INSTALL_DIR/1.0.10/bin:$PATH
For the exercises we use the following to provide omniperf for ROCm 5.4.3:
module use /pfs/lustrep2/projappl/project_462000125/samantao-public/mymodules
module load rocm/5.4.3 omniperf/1.0.10-rocm-5.4.x
source /pfs/lustrep2/projappl/project_462000125/samantao-public/omnitools/venv/bin/activate
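For a self-built installation, the corrected PATH setting and the Python requirement mentioned in the question can be sanity-checked with a small sketch like the one below. The default value of `INSTALL_DIR` is an assumption made for illustration; it should be whatever prefix was passed as CMAKE_INSTALL_PREFIX. The `1.0.10` subdirectory comes from the corrected export line.

```shell
# INSTALL_DIR default is an assumption: use the prefix given as CMAKE_INSTALL_PREFIX.
INSTALL_DIR="${INSTALL_DIR:-$HOME/omniperf}"
export PATH="$INSTALL_DIR/1.0.10/bin:$PATH"

# The omniperf bin directory should now be the first PATH entry.
FIRST_ENTRY="${PATH%%:*}"
echo "first PATH entry: $FIRST_ENTRY"

# omniperf needs Python 3.7 or newer; report whether the active python3 qualifies.
if python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 7) else 1)' 2>/dev/null; then
    PY_OK=yes
else
    PY_OK=no
fi
echo "python3 >= 3.7: $PY_OK"

# Report whether an omniperf executable is actually reachable.
if command -v omniperf >/dev/null 2>&1; then
    echo "omniperf found at: $(command -v omniperf)"
else
    echo "omniperf not found; check INSTALL_DIR and the installed version directory"
fi
```

If the Python check reports `no`, loading a newer Python module such as cray-python before running the check is the likely fix.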
-