[slurm-users] Pending (Resources) when nodes are available

Brian Andrus toomuchit at gmail.com
Fri Jun 14 16:57:02 UTC 2019


All,

We have a cluster that runs on Azure, with nodes started up as needed.

I have run into an interesting situation: a user wrote a loop to launch
100 jobs with srun. Each one was a simple test job that just runs the
'id' command.
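For anyone wanting to reproduce this, here is a minimal sketch of that kind of test loop, written in Python rather than shell. The original loop was not posted, so this assumes each job was a single-node srun allocation started without waiting on the previous one. The partition name "azure" and the exact srun flags are hypothetical. By default it only prints the commands, so it can run on a machine without Slurm.

```python
import subprocess
from typing import List

# Hypothetical partition name; the actual partition was not named in the post.
PARTITION = "azure"
NUM_JOBS = 100


def build_commands(num_jobs: int, partition: str) -> List[List[str]]:
    """Build one single-node srun command per test job, each running `id`."""
    return [["srun", "-N", "1", "-p", partition, "id"] for _ in range(num_jobs)]


def launch(commands: List[List[str]], dry_run: bool = True) -> List[subprocess.Popen]:
    """Start every command without waiting, so all jobs are submitted at once.

    With dry_run=True the commands are only printed and no processes are
    started, so an empty list is returned.
    """
    procs: List[subprocess.Popen] = []
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            procs.append(subprocess.Popen(cmd))
    return procs


if __name__ == "__main__":
    launch(build_commands(NUM_JOBS, PARTITION), dry_run=True)
```

Launching with Popen (instead of subprocess.run) mirrors backgrounding each srun in a shell loop, which is what lets all 100 jobs sit in the queue at the same time.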

The intention was to have 100 jobs on 100 machines. The partition has 
125 nodes configured for it. There are no limits, QOS settings, or
similar constraints on them.

However, Slurm only starts up 50 of the nodes, puts one job in Pending
(Resources), and puts the others in Pending (Priority).

I am unable to find the cause. Is there a limit on how many nodes
ResumeProgram can be passed to bring up at once? ResumeRate is at the
default of 300.
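As a rough check on the ResumeRate question, the sketch below uses an idealized model in which ResumeRate (nodes per minute; 300 by default, 0 meaning no limit) is the only throttle on resumes. It does not reproduce slurmctld's actual resume logic, and the function name is hypothetical. Under that assumption, 100 nodes would be handed to ResumeProgram within about 20 seconds, so the rate alone would not explain a ceiling of 50 nodes.

```python
def seconds_to_resume(nodes: int, resume_rate: int) -> float:
    """Idealized time to hand `nodes` nodes to ResumeProgram.

    Assumes resumes proceed at a steady `resume_rate` nodes per minute and
    nothing else limits them. A rate of 0 is treated as "no limit".
    """
    if resume_rate == 0:
        return 0.0
    return nodes * 60.0 / resume_rate


if __name__ == "__main__":
    for n in (50, 100, 125):
        print(f"{n} nodes at ResumeRate=300: {seconds_to_resume(n, 300):.0f} s")
```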

Brian Andrus
