ROCm miscellaneous issues¶
Driver-version compatibility¶
5.2.3 driver version¶
-
ROCm 5.4.3 appears to work
-
ROCm 5.6.1:
- According to Samuel there are some known issues with queries about available memory being off but it hasn't been a blocker for apps.
- Peter mentioned that he couldn't get Neko to work with ROCm 5.,6, with a failure in
hipDeviceGetStreamPriorityRange
right at the start. - The performance of GPU-aware MPI can be very bad in combination with the Cray MPICH modules on the system in early 2024.
rocFFT crashes¶
In some cases, crashes in rocFFT are due to cache overwrite. One workaround that worked in DestinE is using
export ROCFFT_RTC_CACHE_PATH=/dev/null
export ROCFFT_RTC_CACHE_PATH=/tmp