3.4 Hybrid parallel jobs

In some cases it is useful to combine MPI and thread-based parallelization in the same task. cp2k is an example of software that supports this kind of hybrid parallelization. Below is an example of a batch job file that can be used to launch a hybrid cp2k job. Note that for cp2k it is usually best not to use hybrid parallelization.
 

Hybrid parallel job example 1.

#!/bin/bash
#SBATCH -t 00:10:00
#SBATCH -J CP2K
#SBATCH -o ocp2k.%j
#SBATCH -e ecp2k.%j
#SBATCH -p test
#SBATCH -N 4
#SBATCH --no-requeue
# here we just ask SLURM for N full nodes, and tell below (to ALPS/aprun) how to use them

module load cp2k

export OMP_NUM_THREADS=12

aprun -n 8 -d 12 -N 2 -S 1 -ss -cc numa_node cp2k.psmp H2O-32.inp > H2O-32_n8d12.out

# -n 8 -> 8 MPI tasks in total
# -d 12 -> 12 threads per MPI task
#  (n*d must equal the total number of CORES requested: 8x12 = 4x24 in this case)
# -N 2 -> 2 MPI tasks per node
# -S 1 -> 1 MPI task per NUMA node
# -ss -> let threads use memory only from their own CPU (NUMA node)
# -cc numa_node -> threads of one MPI task are assigned to the same physical CPU (NUMA node)
# you may also try -cc cpu (which is the default)

 

In this example, four nodes are reserved ( -N 4 ) for ten minutes. The actual job consists of 8 MPI processes, each using twelve threads. Note that you have to set both the OMP_NUM_THREADS environment variable and the aprun -d option to the same value. The job uses 8*12 = 96 cores in total, which must equal the number of cores allocated by SLURM (4*24). More information about compiling and running hybrid jobs can be found in Chapter 4.4 Shared memory and hybrid parallelization. If you are uncertain of the correct options, contact CSC Service Desk. Note that for the cp2k software used in this example, MPI-only parallelization is often more efficient than the hybrid parallelization above.
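The core-count arithmetic described above can be checked in the batch script itself before the job is launched. The following is only a minimal sketch: the variable names (NODES, CORES_PER_NODE, THREADS_PER_TASK, TASKS_PER_NODE, TOTAL_TASKS) are hypothetical and not part of SLURM or aprun, and the value of 24 cores per node is taken from the example above. The script derives the -n, -d and -N values from the node and thread counts and stops if they do not fill the allocation exactly.

```shell
#!/bin/bash
# Hypothetical helper variables (illustrative names, not SLURM or aprun settings)
NODES=4               # must match the value given to #SBATCH -N
CORES_PER_NODE=24     # cores per node, as assumed in the example above
THREADS_PER_TASK=12   # used for both OMP_NUM_THREADS and aprun -d

# Derive the MPI task layout from the values above
TASKS_PER_NODE=$(( CORES_PER_NODE / THREADS_PER_TASK ))
TOTAL_TASKS=$(( NODES * TASKS_PER_NODE ))

# n*d must equal the total number of allocated cores
if (( TOTAL_TASKS * THREADS_PER_TASK != NODES * CORES_PER_NODE )); then
    echo "Error: tasks x threads does not match the allocated cores" >&2
    exit 1
fi

export OMP_NUM_THREADS=$THREADS_PER_TASK

# Print the resulting aprun geometry; append the remaining options and the program
echo "aprun -n $TOTAL_TASKS -d $THREADS_PER_TASK -N $TASKS_PER_NODE"
```

With the values shown, the script prints the same geometry as the example job (-n 8 -d 12 -N 2). Choosing a thread count that does not divide the cores per node evenly, for example 5, makes the check fail instead of silently leaving cores idle.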
