[slurm-users] Pending (Resources) when nodes are available
Brian Andrus
toomuchit at gmail.com
Fri Jun 14 16:57:02 UTC 2019
All,
We have a cluster that runs in Azure, with nodes started up as needed.
I have encountered an interesting situation where a user ran a loop that
launched 100 jobs using srun. Each one was a simple test job that just
runs the 'id' command.
The intention was to run the 100 jobs on 100 machines. The partition has
125 nodes configured, and there are no limits, QOS settings, etc. to
constrain the jobs.
However, Slurm starts only 50 of the nodes, puts one job in Pending
(Resources), and puts the others in Pending (Priority).
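For anyone trying to reproduce this, the test can be approximated with a sketch like the one below. It is an assumption that each srun requested a single node and that the loop backgrounded each srun so all 100 were submitted at once; the partition name "azure" is hypothetical. Each srun call creates its own single-node job allocation.

```python
import subprocess
from typing import List

PARTITION = "azure"  # hypothetical partition name; substitute the real one
NUM_JOBS = 100       # one single-node job per intended machine


def build_srun_command(partition: str) -> List[str]:
    """Build one single-node, single-task job that just runs `id`."""
    return ["srun", "--partition", partition, "--nodes", "1", "--ntasks", "1", "id"]


def launch_all(partition: str, count: int) -> List[subprocess.Popen]:
    """Start `count` srun processes concurrently (like `srun ... &` in a shell loop)."""
    return [subprocess.Popen(build_srun_command(partition)) for _ in range(count)]
```

Calling `launch_all(PARTITION, NUM_JOBS)` and then `wait()` on each returned process submits all 100 jobs without waiting for earlier ones to finish, which is what makes the node power-up behaviour visible.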
I am unable to find the cause. Is there a limit on how many nodes the
ResumeProgram is passed to bring up at once? ResumeRate is at the
default of 300.
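One way to check how many nodes the ResumeProgram is actually handed per call is to have it log the size of its argument. According to the slurm.conf documentation, ResumeProgram receives the node names as a Slurm hostlist expression, and ResumeRate is measured in nodes per minute. The helper below is a simplified, hypothetical expander (one bracket group per host entry only); `scontrol show hostnames` remains the authoritative way to expand the full syntax.

```python
import re
from typing import List


def expand_hostlist(expr: str) -> List[str]:
    """Expand a simple hostlist such as 'azure-[001-003,010],login1'.

    Simplified sketch: handles at most one bracket group per entry.
    """
    hosts: List[str] = []
    # Split on commas that are not inside a [...] group.
    for entry in re.split(r",(?![^\[]*\])", expr):
        m = re.fullmatch(r"([^\[]*)\[([^\]]+)\](.*)", entry)
        if not m:
            hosts.append(entry)  # plain host name, no brackets
            continue
        prefix, ranges, suffix = m.groups()
        for part in ranges.split(","):
            lo, _, hi = part.partition("-")
            hi = hi or lo         # single number like '010'
            width = len(lo)       # keep zero padding, e.g. '001'
            for n in range(int(lo), int(hi) + 1):
                hosts.append(f"{prefix}{n:0{width}d}{suffix}")
    return hosts
```

For example, `expand_hostlist("azure-[001-003,010],login1")` returns `['azure-001', 'azure-002', 'azure-003', 'azure-010', 'login1']`. A wrapper ResumeProgram could append `len(expand_hostlist(sys.argv[1]))` to a log file on each invocation before starting the Azure VMs, showing whether the 50-node ceiling comes from what Slurm passes or from what the script does with it.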
Brian Andrus