3. Batch jobs

CSC uses batch job systems to execute computing tasks on its clusters and supercomputers. This chapter introduces SLURM (Simple Linux Utility for Resource Management), the batch job system used on the Taito supercluster.

Batch job systems are essential for the effective use of large computing servers. First of all, the batch job system makes sure that the server does not get overloaded: users can submit large numbers of jobs, and the batch job system automatically keeps an optimal number of jobs running while the rest wait in the queue until sufficient resources become available. In addition, most batch job systems have a "fair share" functionality which ensures that, in the long run, all users get an equal opportunity to use the resources. For example, if user A has submitted 500 jobs before user B submits a job, user B does not have to wait until all the jobs of user A have been processed. Instead, the batch job system gives the job of user B a higher priority than the jobs of user A, because user A is already using many more computing resources than user B.

When a batch job system is used, the commands to be executed are not started immediately as in normal interactive usage. Instead, the user creates a file that contains the Linux commands to be executed. In addition to the commands, this so-called batch job file normally contains information about the resources that the job needs (for example, the required computing time, memory and number of cores). The batch job file is submitted to the batch job system with a job submission command. The batch job system then checks the resource requirements of the job, sends the job to a suitable queue and starts it when sufficient resources are available. If the job exceeds the requested limits (e.g. it needs more computing time than was requested), the batch job system kills the job. After submitting a job, the user can follow its progress or cancel it if needed.
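
As an illustration of the batch job file described above, the following is a minimal sketch of what such a file could look like for SLURM. The job name, the output file names and all resource values are hypothetical placeholders, not recommendations for Taito. The lines starting with #SBATCH are read by SLURM as resource requests, while the shell treats them as ordinary comments.

```shell
#!/bin/bash
# Hypothetical example of a batch job file; all names and values are placeholders.
#SBATCH --job-name=example_job          # name shown in the queue listing
#SBATCH --time=00:10:00                 # requested computing time (hh:mm:ss)
#SBATCH --ntasks=1                      # number of tasks (one core in this sketch)
#SBATCH --mem-per-cpu=1000              # requested memory per core, in megabytes
#SBATCH --output=example_job_%j.out     # standard output file, %j is replaced by the job ID
#SBATCH --error=example_job_%j.err      # standard error file

# SLURM sets SLURM_JOB_ID for a running job; use a placeholder outside SLURM.
job_id="${SLURM_JOB_ID:-not_in_slurm}"
echo "Job ${job_id} started on $(uname -n)"

# The actual Linux commands of the job go here. This stand-in sums the numbers 1 to 100.
result=$(seq 1 100 | awk '{ total += $1 } END { print total }')
echo "Sum of 1..100 is ${result}"
```

Assuming the file is saved as example_job.sh (a hypothetical name), it would typically be submitted with the SLURM command sbatch example_job.sh. The state of the job can then be checked with squeue -u followed by the user name, and the job can be cancelled with scancel followed by the job ID that sbatch reported.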

 
