Documentation links¶
Note that documentation, and web-based documentation in particular, is very fluid and links change rapidly. The links on this page were correct when it was written right after the course, but there is no guarantee that they still work when you read this. They will only be updated for the next course, on the pages of that course.
This documentation page is far from complete but bundles a lot of links mentioned during the presentations, and some more.
Web documentation¶
-
The HPE Cray Programming Environment web documentation only became available in May 2023 and is a work in progress. It does contain a lot of HTML-processed man pages in an easier-to-browse format than the man pages on the system.
The presentations on debugging and profiling tools frequently referred to pages on this web site. The manual pages mentioned in those presentations are also in the web documentation, which is the easiest way to access them.
-
Cray PE GitHub account with whitepapers and some documentation.
-
Cray DSMML - Distributed Symmetric Memory Management Library
-
Clang documentation (usually for the latest version)
-
Clang 13.0.0 version (basis for aocc/3.2.0)
-
Clang 14.0.0 version (basis for rocm/5.2.3 and amd/5.2.3)
-
Clang 15.0.0 version (cce/15.0.0 and cce/15.0.1 in 22.12/23.03)
-
Clang 16.0.0 version (cce/16.0.0 in 23.09)
-
-
AMD Developer Information. Note that AMD doesn't archive manuals of older versions, which can be a problem. You have to regenerate them from the GitHub repositories.
-
AOCC 4.0 Compiler Options Quick Reference Guide (Version 4.0 compilers will become available once the 23.05 or a later CPE release is installed on LUMI and the system is updated to COS 2.5, as some libraries are missing in COS 2.4.)
-
-
-
rocminfo application for reporting system info.
-
Libraries:
-
Random number generation: rocRAND
-
Iterative solvers: rocALUTION
-
Machine Learning Libraries: MIOpen (similar to cuDNN), Tensile (GEMM Autotuner), RCCL (ROCm analogue of NCCL) and Horovod (Distributed ML)
-
Machine Learning Frameworks: TensorFlow, PyTorch and Caffe
-
Development tools:
-
rocgdb resources:
-
2021 Linux Plumbers Conference presentation, with a YouTube video of part of the presentation
-
-
-
Mentioned in the Lustre presentation: The ExaIO project paper "Transparent Asynchronous Parallel I/O Using Background Threads".
AMD documentation
AMD doesn't archive documentation for past versions of ROCm and their CPU compilers in a way that is ready-to-read. Instead
Man pages¶
A selection of man pages explicitly mentioned during the course:
-
Compilers
PrgEnv | C | C++ | Fortran |
---|---|---|---|
PrgEnv-cray | man craycc | man crayCC | man crayftn |
PrgEnv-gnu | man gcc | man g++ | man gfortran |
PrgEnv-aocc/PrgEnv-amd | - | - | - |
Compiler wrappers | man cc | man CC | man ftn |
-
OpenMP in CCE
-
OpenACC in CCE
-
MPI:
-
MPI itself:
man intro_mpi or man mpi
-
libfabric:
man fabric
-
CXI: man fi_cxi
-
-
LibSci
-
man intro_libsci and man intro_libsci_acc
-
man intro_blas1, man intro_blas2, man intro_blas3, man intro_cblas
-
man intro_lapack
-
man intro_scalapack and man intro_blacs
-
man intro_irt
-
man intro_fftw3
-
-
DSMML - Distributed Symmetric Memory Management Library
man intro_dsmml
-
Slurm manual pages are also all on the web and are easily found with a web search, but the online pages usually describe the latest Slurm version.
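Which of the man pages listed above are actually installed depends on the modules loaded in your session. The following is a minimal sketch for checking a few of them before opening them. It assumes that the local `man` implementation supports the `-w` option (as man-db does) to print the location of a page instead of displaying it. The helper name `have_man_page` is hypothetical and only used for illustration.

```shell
#!/usr/bin/env bash
# Sketch: report which of the man pages mentioned in this section can be found.
# Assumption: "man -w PAGE" prints the page location and fails if the page is missing.

have_man_page() {
    # Succeeds only if man itself exists and can locate the requested page.
    command -v man >/dev/null 2>&1 && man -w "$1" >/dev/null 2>&1
}

# Page names are taken from the list above; extend as needed.
for page in intro_mpi fabric fi_cxi intro_libsci intro_lapack intro_fftw3 intro_dsmml; do
    if have_man_page "$page"; then
        echo "available: man $page"
    else
        echo "not found: man $page"
    fi
done
```

If a page is reported as not found, loading the corresponding module first may help, as modules typically extend the man page search path.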
Via the module system¶
Most HPE Cray PE modules contain links to further documentation. Try, for example, module help cce.
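As a small illustration of the above, the sketch below prints the help text of a few modules in one go. Only `cce` is mentioned on this page; `cray-mpich` and `cray-libsci` are added as example module names and may need to be adapted to your system. The helper name `show_module_help` is hypothetical. The sketch also assumes that `module` is available in the current shell, which is usually only the case in a login shell or after the module system has been initialised.

```shell
#!/usr/bin/env bash
# Sketch: show the built-in help of several modules.
# Assumption: "module" is defined in this shell (it is normally a shell function).

show_module_help() {
    local mod="$1"
    if type module >/dev/null 2>&1; then
        echo "== module help $mod =="
        # Many module implementations print to stderr, so merge it into stdout.
        module help "$mod" 2>&1
    else
        echo "module command not available in this shell" >&2
        return 1
    fi
}

# Example module names; only cce is taken from the text above.
for mod in cce cray-mpich cray-libsci; do
    show_module_help "$mod" || break
done
```

Piping the output through a pager such as `less` makes the longer help texts easier to read.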
From the commands themselves¶
PrgEnv | C | C++ | Fortran |
---|---|---|---|
PrgEnv-cray | craycc --help | crayCC --help | crayftn --help |
 | craycc --craype-help | crayCC --craype-help | crayftn --craype-help |
PrgEnv-gnu | gcc --help | g++ --help | gfortran --help |
PrgEnv-aocc | clang --help | clang++ --help | flang --help |
PrgEnv-amd | amdclang --help | amdclang++ --help | amdflang --help |
Compiler wrappers | cc --help | CC --help | ftn --help |
For the PrgEnv-gnu compiler, the --help
option only shows a little bit of help information, but mentions
further options to get help about specific topics.
Further commands that provide extensive help on the command line:
rocm-smi --help, even on the login nodes.
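The help options in the table above can be tried quickly with a loop such as the following sketch. It prints only the first lines of the `--help` output of each compiler driver that is found on the `PATH`. Which drivers are found depends on the programming environment that is loaded. The helper name `show_help_head` and the default of 5 lines are arbitrary choices for this illustration.

```shell
#!/usr/bin/env bash
# Sketch: print the first lines of the --help output of each available compiler driver.

show_help_head() {
    local cmd="$1" lines="${2:-5}"
    if command -v "$cmd" >/dev/null 2>&1; then
        echo "== $cmd --help =="
        # Ignore the exit status: head may close the pipe before the driver is done.
        "$cmd" --help 2>&1 | head -n "$lines" || true
    else
        echo "== $cmd: not found on PATH =="
    fi
}

# Command names are taken from the table above.
for cmd in cc CC ftn craycc crayCC crayftn gcc g++ gfortran; do
    show_help_head "$cmd" 5
done
```

For the Cray compilers, replacing `--help` with `--craype-help` in the same loop shows the second set of help texts listed in the table.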
Documentation of other Cray EX systems¶
Note that these systems may be configured differently, and this especially applies to the scheduler, so not all documentation for those systems applies to LUMI. Yet these web sites do contain a lot of useful information.
-
Archer2 documentation. Archer2 is the national supercomputer of the UK, operated by EPCC. It is an AMD CPU-only cluster. Two important differences with LUMI are that (a) the cluster uses AMD Rome CPUs, with groups of 4 instead of 8 cores sharing the L3 cache, and (b) the cluster uses Slingshot 10 instead of Slingshot 11, which has its own bugs and workarounds.
It includes a page on cray-python referred to during the course.
-
ORNL Frontier User Guide and ORNL Crusher Quick-Start Guide. Frontier is the first exascale cluster in the USA and is built from nodes that are very similar to the LUMI-G nodes (same CPU and GPUs but a different storage configuration), while Crusher is the 192-node early-access system for Frontier. One important difference is the configuration of the scheduler, which has 1 core reserved in each CCD to have a more regular structure than LUMI.
-
KTH Dardel documentation. Dardel is the Swedish "baby-LUMI" system. Its CPU nodes use the AMD Rome CPU instead of AMD Milan, but its GPU nodes are the same as in LUMI.
-
Setonix User Guide. Setonix is a Cray EX system at Pawsey Supercomputing Centre in Australia. The CPU and GPU compute nodes are the same as on LUMI.