3.3 Using aprun to execute parallel processes
The command aprun is a Cray Linux Environment utility that launches an executable on the compute nodes. It is analogous to the SLURM command srun, which should not be used on Sisu. The following table lists the most important options of aprun. For a complete description of all options, see the manual page aprun(1):
man aprun

Table 3.4 Most important aprun options.
|-n PEs||The number of processing elements (PEs, in Cray terminology), often the same as the number of cores needed by an application. On Sisu it defines the number of MPI tasks. The default is 1.|
|-N PEs_per_node||The number of PEs per node; on Sisu: the number of MPI tasks per compute node. Not to be confused with the -N option of sbatch, which has a completely different meaning.|
|-m size||Specifies the memory required per PE; on Sisu: the memory required per MPI task. This is the Resident Set Size in megabytes. The K, M, and G suffixes are supported (16M = 16 megabytes, for example). Any truncated or full spelling of "unlimited" is recognized.|
|-d depth||The number of threads per PE; on Sisu: the number of OpenMP threads per MPI task. The default is 1.|
|-j num_cpus||Specifies how many CPUs to use per compute unit for an ALPS job; on Sisu: the number of logical CPU cores per physical CPU core. On Sisu, at most 2 logical cores are available per physical core. If you do not want to use logical CPU cores, the setting -j 1 is recommended.|
|-L node_list||The node_list specifies the candidate nodes to constrain application placement. The syntax allows a comma-separated list of node IDs (node,node,...), a range of nodes (node_x-node_y), or a combination of both formats.|
|-e ENV_VAR=value||Sets an environment variable on the compute nodes; the format VARNAME=value must be used. To set multiple environment variables, use multiple -e options.|
|-S PEs_per_NUMA_node||The number of PEs per NUMA node; on Sisu: the number of MPI tasks per NUMA node. Each compute node in Sisu has two sockets, each holding one 12-core processor. Each socket is a NUMA node that contains the 12-core processor and its local NUMA node memory.|
|-ss||Requests strict memory containment per NUMA node, i.e. memory is allocated only from the local NUMA node.|
|-cc CPU_binding||Controls how tasks are bound to cores and NUMA nodes. -cc none means that no CPU affinity is applied. -cc numa_node constrains MPI tasks and OpenMP threads to their local NUMA node. The default is -cc cpu, which binds each MPI task to a single core; this is the typical setting for an MPI job.|
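As a sketch of how these options combine, a hybrid MPI/OpenMP launch on Sisu's 24-core nodes (two 12-core NUMA nodes per node) might look like the following. The program name my_hybrid_program and the task/thread counts are illustrative assumptions, not values from this guide:

```shell
# Illustrative hybrid MPI/OpenMP run over 4 compute nodes (24 physical cores each):
#   16 MPI tasks in total (-n 16), 4 tasks per node (-N 4),
#   6 OpenMP threads per task (-d 6), so 4 x 6 = 24 cores are used per node.
#   -S 2 places 2 tasks on each 12-core NUMA node, -ss keeps memory allocations local,
#   and -j 1 disables the use of logical (Hyper-Threading) cores.
export OMP_NUM_THREADS=6
aprun -n 16 -N 4 -S 2 -d 6 -j 1 -ss ./my_hybrid_program
```

Note that the value of OMP_NUM_THREADS should match the -d depth, so that each thread gets its own core.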
#!/bin/bash -l
#SBATCH -N 6
#SBATCH -J cputest
#SBATCH -p small
#SBATCH -o /wrk/username/%J.out
#SBATCH -e /wrk/username/%J.err
aprun -n 144 /wrk/username/my_program

Here the batch job reserves 6 compute nodes, and aprun launches 144 MPI tasks, i.e. 24 tasks per node, one per physical core.