[slurm-users] Resource assignment problems.
blspcy at mst.edu
Wed Feb 21 12:51:22 MST 2018
I'm running Slurm 15.08 and I'm seeing a problem that I can't explain. I have a user who is submitting a job asking for 64 tasks, and the system-wide default is 1 CPU per task. But the user is getting varying numbers of CPUs, sometimes as few as 2, sometimes all 64.
Here is an example of a job that was given too few resources.
Output from scontrol show job, with user data removed:
Priority=266 Nice=0 Account=mechanical QOS=normal WCKey=*default
JobState=COMPLETING Reason=NonZeroExitCode Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=255:0
RunTime=00:01:12 TimeLimit=6-16:00:00 TimeMin=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
NumNodes=1 NumCPUs=38 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryCPU=3500M MinTmpDiskNode=0
Features=(null) Gres=(null) Reservation=(null)
Shared=OK Contiguous=0 Licenses=(null) Network=(null)
Excerpt from the slurmctld log where the job started...
[2018-02-21T11:57:48.305] sched: Allocate JobID=882939 NodeList=edrcompute-22-12,edrcompute-43-3 #CPUs=64 Partition=free
That suggests the job should have gotten 64 CPUs. However, the scontrol output, together with the fact that the job fails because MPI does not have the right number of threads to run on, suggests that the job is actually getting the number of CPUs listed by scontrol, not the number given by the TRES or the scheduler log. Here is the head of the user's submission file. It's pretty basic, but it should give them 64 tasks, which should give them 64 CPUs.
#SBATCH -J Vasp
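To make the mismatch concrete, here is a minimal sketch of how the two numbers quoted above could be compared automatically. It only parses the text already shown in this message (the NumCPUs field from scontrol and the #CPUs field from the slurmctld line). The helper names parse_scontrol_fields and parse_sched_cpus are made up for illustration and are not part of Slurm.

```python
import re


def parse_scontrol_fields(text):
    """Collect Key=Value tokens from `scontrol show job` output into a dict."""
    fields = {}
    for token in text.split():
        # Split on the first "=" only, so values such as 0:0:*:* stay intact.
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def parse_sched_cpus(log_line):
    """Return the #CPUs value from a slurmctld 'sched: Allocate' line, or None."""
    match = re.search(r"#CPUs=(\d+)", log_line)
    return int(match.group(1)) if match else None


# Excerpts copied from the output quoted in this message.
scontrol_excerpt = """
NumNodes=1 NumCPUs=38 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
MinCPUsNode=1 MinMemoryCPU=3500M MinTmpDiskNode=0
"""
log_line = (
    "[2018-02-21T11:57:48.305] sched: Allocate JobID=882939 "
    "NodeList=edrcompute-22-12,edrcompute-43-3 #CPUs=64 Partition=free"
)

fields = parse_scontrol_fields(scontrol_excerpt)
reported = int(fields["NumCPUs"])      # what scontrol shows for the job
scheduled = parse_sched_cpus(log_line)  # what the scheduler logged at start
mismatch = reported != scheduled

print(f"scontrol NumCPUs={reported}, slurmctld #CPUs={scheduled}, mismatch={mismatch}")
```

With the values above this reports 38 CPUs from scontrol against 64 from the scheduler log, which is the discrepancy I'm asking about.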
Thanks for your input,