
frontier

Neil Lindquist edited this page Nov 21, 2023 · 7 revisions

Frontier and Crusher at Oak Ridge National Laboratory

Currently, OLCF deploys the same software stack on Frontier and Crusher (the latter being a development version of the former), so the same configuration works on both machines. (Note, however, that account numbers on Crusher are appended with _crusher.)

Compiling

The SLATE repo can be cloned as

git clone --recursive https://github.com/icl-utk-edu/slate.git

Currently, the GCC stack appears to be the most robust. It can be configured as:

module purge
module load craype-accel-amd-gfx90a craype-x86-trento
module load PrgEnv-gnu rocm
export MPICH_GPU_SUPPORT_ENABLED=1
export CPATH=${ROCM_PATH}/include:${CPATH}
export LIBRARY_PATH=${ROCM_PATH}/lib:$LIBRARY_PATH
export LD_LIBRARY_PATH="${CRAY_LD_LIBRARY_PATH}:${LD_LIBRARY_PATH}"

cat > make.inc << END
CXX=CC
FC=ftn
CXXFLAGS+=-I${ROCM_PATH}/include -craype-verbose -g
LDFLAGS+=-L${ROCM_PATH}/lib -craype-verbose
blas=libsci
gpu_backend=hip
hip_arch=gfx90a
mpi=cray
END

Then, the tester can be compiled with nice make -j 8 tester.

GPU-aware MPI

SLATE can use GPU-aware MPI when the SLATE_GPU_AWARE_MPI=1 environment variable is set.

However, there is a known bug in GPU-aware Cray-MPICH that can cause a hang. To avoid it, listBcastMT should be left disabled (as in the configuration above), and the lookahead must be disabled for some routines (including geqrf and getrf_nopiv).
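For example, a QR run with the lookahead disabled might look like the following. This is a sketch based on the tester invocation shown later on this page; it assumes the tester exposes a --lookahead option (confirm with test/tester --help).

```shell
# Hypothetical example: run geqrf with GPU-aware MPI enabled but the
# lookahead disabled, to avoid the Cray-MPICH hang described above.
export SLATE_GPU_AWARE_MPI=1
srun -A $ACCOUNT_NUM -t 5:00 -J slate_example -N 1 -n 8 -c 7 \
     --gpus-per-node=8 --ntasks-per-gpu=1 --threads-per-core=1 \
     --gpu-bind=closest \
     test/tester --type d --target d --nb 512 --lookahead 0 geqrf
```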

There is an alternative workaround: setting the FI_MR_CACHE_MONITOR=kdreg2 environment variable. However, HPE is still testing this setting, so it should not be used for long-running production jobs. With it set, lookaheads appear to work as normal and listBcastMT can be enabled (by adding -DSLATE_HAVE_MT_BCAST to CXXFLAGS during compilation). If this setting causes problems, inform Neil so the issue can be passed on to HPE.
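The two environment settings above can be collected into a small snippet sourced before launching jobs. The variable names come from the text above; grouping them this way is just a convenience sketch:

```shell
#!/bin/bash
# Convenience sketch: enable GPU-aware MPI in SLATE and opt in to the
# kdreg2 memory-registration cache monitor. The latter is still being
# tested by HPE, so avoid it for long-running production jobs.
export SLATE_GPU_AWARE_MPI=1
export FI_MR_CACHE_MONITOR=kdreg2

echo "SLATE_GPU_AWARE_MPI=${SLATE_GPU_AWARE_MPI}"
echo "FI_MR_CACHE_MONITOR=${FI_MR_CACHE_MONITOR}"
```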

Running

The SLATE tests can be run as

srun -A $ACCOUNT_NUM -t 5:00 -J slate_example -N 1 -n 8 -c 7 \
     --gpus-per-node=8 --ntasks-per-gpu=1 --threads-per-core=1 \
     --gpu-bind=closest \
     test/tester  --type d --target d --nb 512 posv

Alternatively, the job can be submitted as a batch script.

#!/bin/bash
#SBATCH -A $ACCOUNT_NUM
#SBATCH -t 5:00
#SBATCH -J slate_example
#SBATCH -o %x-%j.out
#SBATCH -N 1

srun -n 8 -c 7 --gpus-per-node=8 --ntasks-per-gpu=1 --threads-per-core=1 \
     --gpu-bind=closest \
     test/tester --target d --type d --nb 512 posv
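Assuming the script above is saved as, say, slate_job.sh (the filename is arbitrary), it can be submitted with sbatch. Note that the account should be passed on the command line rather than via $ACCOUNT_NUM in the #SBATCH header, since Slurm does not expand shell variables inside #SBATCH directives:

```shell
# Submit the batch script; -A on the command line overrides/supplies the
# account, which a $VARIABLE in an #SBATCH line would not.
sbatch -A $ACCOUNT_NUM slate_job.sh
```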

Tuning notes

For getrf, Cray's OpenMP runtime does not support SLATE's multi-threaded panel factorization, so the panel thread count should be set to 1 (--panel-threads for the tester, or slate::Option::MaxPanelThreads for the routine).
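For example, a getrf run with a single panel thread, using the --panel-threads tester option named above (a sketch based on the earlier srun invocation):

```shell
# Run the LU tester with the panel factorization restricted to one
# thread, as required under Cray's OpenMP runtime.
srun -A $ACCOUNT_NUM -t 5:00 -J slate_example -N 1 -n 8 -c 7 \
     --gpus-per-node=8 --ntasks-per-gpu=1 --threads-per-core=1 \
     --gpu-bind=closest \
     test/tester --type d --target d --nb 512 --panel-threads 1 getrf
```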
