3. Using the batch job environment

At CSC, batch job systems are used to execute computing tasks in clusters and supercomputers. The Sisu supercomputer uses the SLURM (Simple Linux Utility for Resource Management) batch job system, which is also used in the Taito cluster. However, the SLURM setup differs significantly between Sisu and Taito. In Sisu, the SLURM definitions are used only to reserve computing resources. Parallel commands are launched through ALPS (Application Level Placement Scheduler): the ALPS command aprun is used instead of the srun command that is normally used in SLURM batch jobs. The aprun command also defines many resource allocation details that in other servers are set with SLURM commands.
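The division of labour described above (SLURM reserves, aprun launches) can be illustrated with a small sketch that renders the text of a batch script. This is only an illustration under stated assumptions, not CSC's official job template: the function name render_sisu_job and the program ./my_program are hypothetical, the figure of 24 cores per node is inferred from the limits in Table 3.1, and the script uses only the common sbatch options --partition, --nodes and --time together with the aprun option -n (number of processes).

```python
def render_sisu_job(partition, nodes, walltime, command, cores_per_node=24):
    """Return the text of a Sisu-style batch script.

    The #SBATCH lines only reserve resources; the parallel program
    itself is started with aprun rather than srun.
    cores_per_node=24 is an assumption inferred from Table 3.1.
    """
    total_cores = nodes * cores_per_node
    lines = [
        "#!/bin/bash",
        f"#SBATCH --partition={partition}",  # which partition to queue in
        f"#SBATCH --nodes={nodes}",          # number of nodes to reserve
        f"#SBATCH --time={walltime}",        # maximum running time
        "",
        # aprun launches the parallel command on the reserved nodes
        f"aprun -n {total_cores} {command}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example: a 10-minute test run on two nodes.
script = render_sisu_job("test", 2, "00:10:00", "./my_program")
print(script)
```

Keeping the reservation lines and the aprun line separate mirrors the setup in Sisu: changing the number of nodes changes both the reservation and the process count passed to aprun.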

Table 3.1 Batch job partitions in Sisu

Partition    Min. nodes  Max. nodes  Max. cores  Max. running time  Notes
test         1           24          576         30 minutes
test_large   1           800         19200       4 hours            Jobs in this partition have very low priority.
small        3           24          576         12 hours
small_long   3           24          576         72 hours
large        24          400         9600        72 hours           Scaling tests are required for jobs that use more than 42 nodes.
gc           -           -           -           -                  Special queue for grand challenge jobs. Limits will be set based on the needs of the projects.
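As a rough aid for reading Table 3.1, the following sketch encodes the node and time limits and lists the partitions that a given request would fit. It is an illustrative helper, not a CSC tool: the names Partition, PARTITIONS, fits and candidates are hypothetical, the gc partition is left out because its limits are set per project, and the value of 24 cores per node is inferred from the table (for example 576 / 24 = 24).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    min_nodes: int
    max_nodes: int
    max_hours: float


# Limits copied from Table 3.1 (gc omitted: its limits are project-specific).
PARTITIONS = {
    "test": Partition(1, 24, 0.5),
    "test_large": Partition(1, 800, 4),
    "small": Partition(3, 24, 12),
    "small_long": Partition(3, 24, 72),
    "large": Partition(24, 400, 72),
}

CORES_PER_NODE = 24  # assumption inferred from the table: max cores / max nodes


def fits(name, nodes, hours):
    """True if a job with this node count and running time fits the partition."""
    p = PARTITIONS[name]
    return p.min_nodes <= nodes <= p.max_nodes and hours <= p.max_hours


def candidates(nodes, hours):
    """Names of all partitions whose limits admit the request, in table order."""
    return [name for name in PARTITIONS if fits(name, nodes, hours)]


print(candidates(2, 0.25))   # ['test', 'test_large']
print(candidates(24, 10))    # ['small', 'small_long', 'large']
print(candidates(100, 48))   # ['large']
```

The check covers only the limits in the table. Notes such as the low priority of test_large or the scaling tests required in large are not modelled.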