[slurm-users] Limit on number of nodes user able to request

Brian Andrus toomuchit at gmail.com
Thu Apr 1 18:50:53 UTC 2021


For this one, you want to look closely at the job. Is it targeting a 
specific partition/nodelist?

See what resources it is requesting (scontrol show job <jobid>).
Also check the partition limits and any QOS settings (if you are 
using them).
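
As a rough sketch of those checks, something like the following trims the job record down to the fields that usually explain a node-count limit. The helper name job_limits is made up for illustration, JOBID and PARTITION are placeholders, and the sacctmgr format fields are only the ones I would look at first, not a complete list.

```shell
# Keep only the job fields that usually explain a node-count limit.
# Reads `scontrol show job` output on stdin (job_limits is a hypothetical name).
job_limits() {
    tr ' ' '\n' |
        grep -E '^(JobState|Reason|Partition|QOS|NumNodes|ReqNodeList|ExcNodeList|Features)='
}

# Usage (JOBID and PARTITION are placeholders for your own values):
#   scontrol show job "$JOBID" | job_limits
#   scontrol show partition "$PARTITION"
#   sacctmgr show qos format=Name,MaxTRES,MaxTRESPU,GrpTRES
```

A ReqNodeList or Features value other than (null) would mean the job is restricted to a subset of the nodes.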

Brian Andrus

On 4/1/2021 10:00 AM, Sajesh Singh wrote:
>
> Some additional information after enabling debug3 on the slurmctld daemon:
>
> Logs show that there are enough usable nodes for the job:
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-11
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-12
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-13
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-14
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-15
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-16
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-17
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-18
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-19
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-20
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-21
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-22
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-23
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-24
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-25
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-26
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-27
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-28
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-29
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-30
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-31
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-32
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-33
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-34
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-35
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-36
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-37
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-38
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-39
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-40
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-41
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-42
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-43
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-44
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-45
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-46
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-47
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-48
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-49
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-50
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-51
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-52
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-53
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-54
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-55
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-56
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-57
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-58
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-59
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-60
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-61
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-62
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-63
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-64
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-65
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-66
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-67
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-68
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-69
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-70
>
> [2021-04-01T10:39:14.400] debug2: found 1 usable nodes from config 
> containing node-71
>
> But then the following line is in the log as well:
>
> debug3: select_nodes: JobId=67171529 required nodes not avail
>
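To compare the number of usable nodes slurmctld found against what the job asked for, a small filter like this can total the debug2 lines. It is only a sketch: count_usable is a made-up name, the log path in the usage comment is an assumption about your setup, and it expects each log entry on a single line (not re-wrapped as in the quoted mail).

```shell
# Sum the node counts from slurmctld "debug2: found N usable nodes" lines.
# Reads the log on stdin; lines that do not match are ignored.
count_usable() {
    sed -n 's/.*debug2: found \([0-9][0-9]*\) usable nodes from config.*/\1/p' |
        awk '{ total += $1 } END { print total + 0 }'
}

# Usage (the path is an assumption; check SlurmctldLogFile in slurm.conf):
#   count_usable < /var/log/slurm/slurmctld.log
```

If the log covers several scheduling passes, filter it down to one timestamp first so the same nodes are not counted more than once.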
> --
>
> -Sajesh-
>
> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> *On Behalf 
> Of *Sajesh Singh
> *Sent:* Thursday, March 25, 2021 9:02 AM
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* Re: [slurm-users] Limit on number of nodes user able to request
>
> No nodes are in a down or drained state. These are nodes that are 
> dynamically brought up and down via the powersave plugin. When they are 
> taken offline due to being idle, I believe the state is set to FUTURE 
> by the powersave plugin.
>
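Rather than guessing at the state, it may be worth tallying what sinfo actually reports for each node. A minimal sketch, assuming the node-oriented sinfo output format shown in the usage comment (state_counts is a hypothetical helper name and PARTITION is a placeholder):

```shell
# Tally nodes per state. Reads "NODE STATE" pairs on stdin,
# one node per line, and prints "STATE COUNT" sorted by state.
state_counts() {
    awk 'NF >= 2 { count[$2]++ } END { for (s in count) print s, count[s] }' | sort
}

# Usage (PARTITION is a placeholder; -p avoids listing a node once per partition):
#   sinfo -N -h -p "$PARTITION" -o '%N %T' | state_counts
```

As far as I know, a trailing "~" on the state in that output marks a node that is powered down by power saving, so the counts should show how many nodes the scheduler can still resume.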
> -Sajesh-
>
> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> *On Behalf 
> Of *Brian Andrus
> *Sent:* Wednesday, March 24, 2021 11:02 PM
> *To:* slurm-users at lists.schedmd.com
> *Subject:* Re: [slurm-users] Limit on number of nodes user able to request
>
> Do 'sinfo -R' and see if you have any down or drained nodes.
>
> Brian Andrus
>
> On 3/24/2021 6:31 PM, Sajesh Singh wrote:
>
>     Slurm 20.02
>
>     CentOS 8
>
>     I just recently noticed a strange behavior when using the
>     powersave plugin for bursting to AWS. I have a queue configured
>     with 60 nodes, but if I submit a job to use all of the nodes I get
>     the error:
>
>     (Nodes required for job are DOWN, DRAINED or reserved for jobs in
>     higher priority partitions)
>
>     If I lower the job to request 50 nodes it gets submitted and runs
>     with no problems. I do not have any associations or QOS limits in
>     place that would limit the user. Any ideas as to what could be
>     causing this limit of 50 nodes to be imposed?
>
>     -Sajesh-
>