[slurm-users] slurm jobs are pending but resources are available
Marius.Cetateanu at sony.com
Marius.Cetateanu at sony.com
Mon Apr 16 04:35:16 MDT 2018
Hi,
I'm having some trouble with resource allocation: based on my reading of the documentation and the way I applied it in the config file, I expect behaviour that does not actually happen.
Here is the relevant excerpt from the config file:
SchedulerType=sched/backfill
SchedulerParameters=bf_continue,bf_interval=45,bf_resolution=90,max_array_tasks=1000
#SchedulerAuth=
#SchedulerPort=
#SchedulerRootFilter=
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
FastSchedule=1
...
NodeName=cn_burebista Sockets=2 CoresPerSocket=14 ThreadsPerCore=2 RealMemory=256000 State=UNKNOWN
PartitionName=main_compute Nodes=cn_burebista Shared=YES Default=YES MaxTime=76:00:00 State=UP
According to the above I have the backfill scheduler enabled, with CPUs and memory configured as the consumable resources. I have 56 CPUs and 256 GB of RAM in my resource pool. I would expect the backfill
scheduler to try to allocate resources so that as many of the cores as possible are filled whenever
multiple jobs ask for more resources than are available. In my case I have the following queue:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
2361 main_comp training mcetatea PD 0:00 1 (Resources)
2356 main_comp skrf_ori jhanca R 58:41 1 cn_burebista
2357 main_comp skrf_ori jhanca R 44:13 1 cn_burebista
Jobs 2356 and 2357 are asking for 16 CPUs each and job 2361 is asking for 20 CPUs, i.e. 52 CPUs in total.
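For reference, job 2361 was submitted with something along these lines (reconstructed from the job record further down; the exact command line may have differed):

sbatch --partition=main_compute --ntasks=1 --cpus-per-task=20 train_classifier.sh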
As seen above, job 2361 (submitted by a different user than the two running jobs) is marked as pending due to lack of resources, although there are plenty of CPUs and memory still available. "scontrol show nodes cn_burebista" gives me the following:
NodeName=cn_burebista Arch=x86_64 CoresPerSocket=14
CPUAlloc=32 CPUErr=0 CPUTot=56 CPULoad=21.65
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=(null)
NodeAddr=cn_burebista NodeHostName=cn_burebista Version=16.05
OS=Linux RealMemory=256000 AllocMem=64000 FreeMem=178166 Sockets=2 Boards=1
State=MIXED ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
BootTime=2018-03-09T12:04:52 SlurmdStartTime=2018-03-20T10:35:50
CapWatts=n/a
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
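Doing the arithmetic on that output: CPUAlloc=32 out of CPUTot=56 leaves 24 CPUs unallocated, and AllocMem=64000 out of RealMemory=256000 leaves 192000 MB unallocated, so on paper the 20 CPUs requested by job 2361 should fit alongside the two running jobs.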
I'm going through the documentation again and again but I cannot figure out what I am doing wrong.
Why do I end up in the above situation? What should I change in my config to make this work?
"scontrol show -dd job <jobid>" shows me the following for the pending job:
JobId=2361 JobName=training_carlib
UserId=mcetateanu(1000) GroupId=mcetateanu(1001) MCS_label=N/A
Priority=4294901726 Nice=0 Account=(null) QOS=(null)
JobState=PENDING Reason=Resources Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:00 TimeLimit=3-04:00:00 TimeMin=N/A
SubmitTime=2018-03-27T10:30:38 EligibleTime=2018-03-27T10:30:38
StartTime=2018-03-28T10:27:36 EndTime=2018-03-31T14:27:36 Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=main_compute AllocNode:Sid=zalmoxis:23690
ReqNodeList=(null) ExcNodeList=(null)
NodeList=(null) SchedNodeList=cn_burebista
NumNodes=1 NumCPUs=20 NumTasks=1 CPUs/Task=20 ReqB:S:C:T=0:0:*:*
TRES=cpu=20,node=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=20 MinMemoryNode=0 MinTmpDiskNode=0
Features=(null) Gres=(null) Reservation=(null)
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/home/mcetateanu/workspace/CarLib/src/_outputs/linux-xeon_e5v4-icc17.0/bin/classifier/train_classifier.sh
WorkDir=/home/mcetateanu/workspace/CarLib/src/_outputs/linux-xeon_e5v4-icc17.0/bin/classifier
StdErr=/home/mcetateanu/workspace/CarLib/src/_outputs/linux-xeon_e5v4-icc17.0/bin/classifier/training_job_2383.out
StdIn=/dev/null
StdOut=/home/mcetateanu/workspace/CarLib/src/_out
I also changed my config to specify the number of CPUs explicitly, rather than letting Slurm compute them
from Sockets, CoresPerSocket, and ThreadsPerCore (a sketch of that node line follows the job output below). The two jobs I am trying to run then show the following
in "scontrol show -dd job <jobid>", but the one asking for 20 CPUs is still pending due to lack of resources:
NumNodes=1 NumCPUs=16 NumTasks=1 CPUs/Task=16 ReqB:S:C:T=0:0:*:*
  TRES=cpu=16,mem=32000M,node=1
  Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
  Nodes=cn_burebista CPU_IDs=0-15 Mem=32000
  MinCPUsNode=16 MinMemoryCPU=2000M MinTmpDiskNode=0

NumNodes=1 NumCPUs=20 NumTasks=1 CPUs/Task=20 ReqB:S:C:T=0:0:*:*
  TRES=cpu=20,node=1
  Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
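For completeness, the node definition with the CPU count given explicitly looks roughly like this (reconstructed; the exact line in my slurm.conf may differ):

NodeName=cn_burebista CPUs=56 RealMemory=256000 State=UNKNOWN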
Thank you
-------------------------------------------------------------------------------------------
Marius Cetateanu
Senior Embedded Software Engineer
Engineering Department 1, Driver & Embedded
Sony Depthsensing Solutions
Tel: +32 (0)28992171
email: Marius.Cetateanu at sony.com
Sony Depthsensing Solutions
11 Boulevard de la Plaine, 1050 Brussels, Belgium