
Why is my batch job queueing so long?

Queueing is inevitable when there are more jobs than resources. CSC uses the fair share prioritization algorithm in Slurm: the more resources you have used recently, the lower the priority of your next jobs will be. The priority of a queued job increases as it waits, and eventually it will run.
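If you want to see how fair share affects a particular job, Slurm's sprio and sshare commands show the priority factors of a pending job and the fair share usage of your account. A minimal sketch, assuming these commands are available on the login node (the job ID 123456 is only a placeholder):

    # Show the priority factors (age, fair share, etc.) of a pending job
    sprio -j 123456

    # Show your own fair share usage
    sshare -u $USER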

You can check the load status from the Scientist's User Interface Host Monitor: https://sui.csc.fi/web/guest/host-monitor The Taito view might show that resources are available, but in practice that might not be the case: the cloud nodes cannot be allocated to batch jobs, and some jobs may have reserved so much memory per core that those nodes cannot run any more jobs.

You can also check the current situation of running and pending jobs with the squeue command.
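For example, to list your own jobs and their states (R = running, PD = pending), something along these lines should work:

    # List all of your own jobs in the queue
    squeue -u $USER

    # Show Slurm's estimated start times for your pending jobs (estimates only)
    squeue -u $USER --start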

In general, if you want your jobs to queue as little as possible, reserve only the resources the jobs really need. The requested computing time is not as critical (unless it is very short, say less than 30 minutes, in which case the backfill scheduler may find a slot for your job before it would run based on its priority alone), but requesting too much memory will certainly make the job queue longer.
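As an illustration, a batch script for a small serial job that requests only what it actually needs might look like the sketch below. The partition name, time limit, and memory value are placeholders for this example, not recommendations for any particular system:

    #!/bin/bash
    #SBATCH --job-name=myjob
    #SBATCH --partition=serial      # placeholder partition name
    #SBATCH --time=00:25:00         # a short time limit helps the backfill scheduler
    #SBATCH --ntasks=1
    #SBATCH --mem-per-cpu=2000      # request only the memory the job needs (in MB)

    srun ./my_program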

You might also want to check the FAQ entry: How to estimate how much memory my batch job needs