[slurm-users] introduce short delay starting multiple parallel jobs with srun
Yakupov, Renat /DZNE
Renat.Yakupov at dzne.de
Thu Nov 9 07:09:17 MST 2017
Dear SLURM users,
I would like some suggestions on how to stagger the start of multiple parallel tasks launched with srun.
I have a very basic script that specifies the number of nodes and tasks and runs just one command: srun myjob. The problem is that 10-20 tasks start accessing files at the same time, which causes some tasks to quit.
What I would like to do is somehow tell SLURM to start each task with a delay, for example each task 5 seconds after the previous one. What I have tried so far:
1) Using a random number generator for the delay helps, but it is not 100% safe.
2) If tasks run 1 per node, I can use node hostnames, but that doesn't help if I run all tasks on one node.
3) The parallel module has an option to delay the start, but we don't have it available.
Is there a way to get a task number? I know there is a SLURM_ARRAY_TASK_ID variable, but none of the job array related variables work for me. I guess job array capabilities aren't enabled on our SLURM.
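For illustration, here is a minimal sketch of the kind of staggered start described above. It relies on SLURM_PROCID, which srun sets for every task it launches (0 up to ntasks-1) and which does not depend on job arrays. The wrapper name stagger.sh, the helper stagger_delay, and the STAGGER_STEP variable are hypothetical names chosen for this example, and the 5 second default simply mirrors the delay mentioned above. This is untested on our cluster, so treat it as a starting point rather than a definitive solution.

```shell
#!/bin/bash
# stagger.sh (hypothetical name): delay each task by rank * step seconds,
# then replace this shell with the real command.

# Print the delay in seconds for a given task rank and step.
# Both arguments are optional: rank defaults to 0, step to 5.
stagger_delay() {
    local rank="${1:-0}"
    local step="${2:-5}"
    echo $(( rank * step ))
}

# SLURM_PROCID is exported by srun to each task; fall back to 0 so the
# wrapper also works when run by hand outside of srun.
# STAGGER_STEP is an assumed, optional override for the step in seconds.
if [ "$#" -gt 0 ]; then
    sleep "$(stagger_delay "${SLURM_PROCID:-0}" "${STAGGER_STEP:-5}")"
    exec "$@"
fi
```

It would be used as "srun ./stagger.sh myjob" in place of "srun myjob", so task 0 starts immediately, task 1 after 5 seconds, task 2 after 10 seconds, and so on. If the delay only needs to separate tasks that share a node, SLURM_LOCALID (the node-local task ID) could be used instead of SLURM_PROCID.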
Any other suggestions?
Thanks in advance!