[slurm-users] Sharing a node with non-gres and gres jobs
Philippe Dos Santos
philippe.dos-santos at u-psud.fr
Wed Mar 20 13:29:55 UTC 2019
Hello Peter,
In order to run non-gres and gres jobs side by side on the g1 node, you should limit the memory requested by each job, for instance (if the values suit your workloads):
# sbatch --wrap="sleep 100 && env" -o singlecpu.log -c 3 -J cpu --mem=100
# sbatch --wrap="sleep 6 && env" -o gres.log -c 1 -J gpu --gres=gpu:1 --mem=100
Indeed, it looks like job 11 is using the whole memory of node g1:
# scontrol show job -dd 11
JobId=11 JobName=wrap
...
Nodes=g1 CPU_IDs=0-2 Mem=4000 GRES_IDX=
...
# scontrol show node -dd g1
NodeName=g1 CoresPerSocket=4
CPUAlloc=3 CPUTot=4 CPULoad=N/A
...
RealMemory=4000 AllocMem=4000 FreeMem=N/A Sockets=1 Boards=1
...
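You can also confirm which default Slurm applies when --mem is omitted by querying the running configuration (the grep pattern is just a suggestion):
# scontrol show config | grep -i mem
With neither DefMemPerNode nor DefMemPerCPU set, Slurm allocates the node's entire RealMemory to a job that omits --mem, which matches the 4000M that job 11 was given.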
According to your slurm.conf ( https://raw.githubusercontent.com/psteinb/docker-centos7-slurm/18.08.5-with-gres/slurm.conf ), memory is a consumable resource:
SelectType=select/cons_res SelectTypeParameters=CR_Core_Memory
Therefore, there's no more memory available for incoming jobs (jobs 15 and 16).
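A longer-term fix, assuming you control slurm.conf, is to set a per-CPU default so that a job which forgets --mem no longer reserves the whole node. A minimal sketch (the 1000 MB figure is an assumption, size it to your nodes):
DefMemPerCPU=1000   # default MB per allocated CPU when --mem/--mem-per-cpu is omitted
MaxMemPerCPU=1000   # optional cap so one job cannot starve the rest of the node
After editing slurm.conf, apply it with 'scontrol reconfigure'.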
Regards,
Philippe DS
----- Original Message -----
From: "Peter Steinbach" <steinbac at mpi-cbg.de>
To: slurm-users at lists.schedmd.com
Sent: Wednesday, March 20, 2019 10:19:07
Subject: Re: [slurm-users] Sharing a node with non-gres and gres jobs
Hi Chris,
I changed the initial state a bit (the number of cores per node was
misconfigured):
https://raw.githubusercontent.com/psteinb/docker-centos7-slurm/18.08.5-with-gres/slurm.conf
But that doesn't change things. Initially, I see this:
# sinfo -N -l
Wed Mar 20 09:03:26 2019
NODELIST NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
g1 1 gpu* unknown 4 1:4:1 1000 0 1 (null) none
g2 1 gpu* unknown 4 1:4:1 1000 0 1 (null) none
g3 1 gpu* unknown 4 1:4:1 1000 0 1 (null) none
[root@ernie /]# sbatch --wrap="sleep 600 && env" -o non-gres.log -c 3
Submitted batch job 11
[root@ernie /]# squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
11 gpu wrap root R 0:03 1 g1
[root@ernie /]# scontrol show job -dd 11
JobId=11 JobName=wrap
UserId=root(0) GroupId=root(0) MCS_label=N/A
Priority=4294901750 Nice=0 Account=(null) QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
DerivedExitCode=0:0
RunTime=00:00:12 TimeLimit=5-00:00:00 TimeMin=N/A
SubmitTime=2019-03-20T09:12:45 EligibleTime=2019-03-20T09:12:45
AccrueTime=Unknown
StartTime=2019-03-20T09:12:47 EndTime=2019-03-25T09:12:47 Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
LastSchedEval=2019-03-20T09:12:47
Partition=gpu AllocNode:Sid=ernie:1
ReqNodeList=(null) ExcNodeList=(null)
NodeList=g1
BatchHost=localhost
NumNodes=1 NumCPUs=3 NumTasks=1 CPUs/Task=3 ReqB:S:C:T=0:0:*:*
TRES=cpu=3,mem=4000M,node=1,billing=3
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
Nodes=g1 CPU_IDs=0-2 Mem=4000 GRES_IDX=
MinCPUsNode=3 MinMemoryNode=4000M MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=(null)
WorkDir=/
StdErr=//non-gres.log
StdIn=/dev/null
StdOut=//non-gres.log
Power=
[root@ernie /]# scontrol show node -dd g1
NodeName=g1 CoresPerSocket=4
CPUAlloc=3 CPUTot=4 CPULoad=N/A
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=gpu:titanxp:2
GresDrain=N/A
GresUsed=gpu:titanxp:0(IDX:N/A)
NodeAddr=127.0.0.1 NodeHostName=localhost Port=0
RealMemory=4000 AllocMem=4000 FreeMem=N/A Sockets=1 Boards=1
State=MIXED ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
Partitions=gpu
BootTime=2019-03-18T10:14:18 SlurmdStartTime=2019-03-20T09:07:45
CfgTRES=cpu=4,mem=4000M,billing=4
AllocTRES=cpu=3,mem=4000M
CapWatts=n/a
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
I then filled the 'cluster' with non-gres jobs and submitted a GPU job:
[root@ernie /]# sbatch --wrap="sleep 6 && env" -o gres.log -c 1 --gres=gpu:1 --mem=100
Submitted batch job 15
[root@ernie /]# squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
15 gpu wrap root PD 0:00 1 (Resources)
11 gpu wrap root R 2:02 1 g1
12 gpu wrap root R 1:01 1 g2
13 gpu wrap root R 0:58 1 g3
[root@ernie /]# scontrol show job -dd 15
JobId=15 JobName=wrap
UserId=root(0) GroupId=root(0) MCS_label=N/A
Priority=4294901746 Nice=0 Account=(null) QOS=normal
JobState=PENDING Reason=Resources Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
DerivedExitCode=0:0
RunTime=00:00:00 TimeLimit=5-00:00:00 TimeMin=N/A
SubmitTime=2019-03-20T09:14:45 EligibleTime=2019-03-20T09:14:45
AccrueTime=Unknown
StartTime=Unknown EndTime=Unknown Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
LastSchedEval=2019-03-20T09:14:47
Partition=gpu AllocNode:Sid=ernie:1
ReqNodeList=(null) ExcNodeList=(null)
NodeList=(null)
NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=1,mem=100M,node=1,billing=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=100M MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=(null)
WorkDir=/
StdErr=//gres.log
StdIn=/dev/null
StdOut=//gres.log
Power=
TresPerNode=gpu:1
I am curious what you think!
Peter
P.S. I submitted another single-core job; this one doesn't start either, now due to priority:
[root@ernie /]# sbatch --wrap="sleep 100 && env" -o singlecpu.log -c 1 -J single-cpu --mem=100
Submitted batch job 16
[root@ernie /]# squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
15 gpu wrap root PD 0:00 1 (Resources)
16 gpu single-c root PD 0:00 1 (Priority)
11 gpu wrap root R 5:01 1 g1
12 gpu wrap root R 4:00 1 g2
13 gpu wrap root R 3:57 1 g3
[root@ernie /]# scontrol show job -dd 16
JobId=16 JobName=single-cpu
UserId=root(0) GroupId=root(0) MCS_label=N/A
Priority=4294901745 Nice=0 Account=(null) QOS=normal
JobState=PENDING Reason=Priority Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
DerivedExitCode=0:0
RunTime=00:00:00 TimeLimit=5-00:00:00 TimeMin=N/A
SubmitTime=2019-03-20T09:17:32 EligibleTime=2019-03-20T09:17:32
AccrueTime=Unknown
StartTime=2019-03-25T09:12:47 EndTime=2019-03-30T09:12:47 Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
LastSchedEval=2019-03-20T09:18:44
Partition=gpu AllocNode:Sid=ernie:1
ReqNodeList=(null) ExcNodeList=(null)
NodeList=(null)
NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=1,mem=100M,node=1,billing=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=100M MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=(null)
WorkDir=/
StdErr=//singlecpu.log
StdIn=/dev/null
StdOut=//singlecpu.log
Power=