Greetings,
There are not many questions regarding GPU sharding here, and I am unsure whether I am using it correctly... I have configured it according to the instructions at https://slurm.schedmd.com/gres.html, and it seems to be configured properly:
$ scontrol show node compute01
NodeName=compute01 Arch=x86_64 CoresPerSocket=32
   CPUAlloc=48 CPUEfctv=128 CPUTot=128 CPULoad=10.95
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=gpu:8,shard:32
[truncated]
When running with --gres=gpu, everything works perfectly:
$ /usr/bin/srun --gres=gpu:2 ls
srun: job 192 queued and waiting for resources
srun: job 192 has been allocated resources
(...)
However, when using sharding, it just stays waiting indefinitely:
$ /usr/bin/srun --gres=shard:2 ls
srun: job 193 queued and waiting for resources
The reason it gives for pending is just "Resources":
$ scontrol show job 193
JobId=193 JobName=ls
   UserId=rpcruz(1000) GroupId=rpcruz(1000) MCS_label=N/A
   Priority=1 Nice=0 Account=account QOS=normal
   JobState=PENDING Reason=Resources Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=0 Reboot=0 ExitCode=0:0
   RunTime=00:00:00 TimeLimit=2-00:00:00 TimeMin=N/A
   SubmitTime=2024-06-28T05:36:51 EligibleTime=2024-06-28T05:36:51
   AccrueTime=2024-06-28T05:36:51
   StartTime=2024-06-29T18:13:22 EndTime=2024-07-01T18:13:22 Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2024-06-28T05:37:20 Scheduler=Backfill:*
   Partition=partition AllocNode:Sid=localhost:47757
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=
   NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   ReqTRES=cpu=1,mem=1031887M,node=1,billing=1
   AllocTRES=(null)
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=ls
   WorkDir=/home/rpcruz
   Power=
   TresPerNode=gres/shard:2
Again, I think I have configured it properly - it shows up correctly in scontrol (as shown above). Our setup is pretty simple - I just added shard to /etc/slurm/slurm.conf:

GresTypes=gpu,shard
NodeName=compute01 Gres=gpu:8,shard:32 [truncated]

Our /etc/slurm/gres.conf is also straight-forward (it works fine for --gres=gpu:1):

Name=gpu File=/dev/nvidia[0-7]
Name=shard Count=32
Maybe I am just running srun improperly? Shouldn't it just be "srun --gres=shard:2" to allocate half of a GPU? (Since I am using 32 shards for the 8 GPUs, that is 4 shards per GPU.)
Thank you very much for your attention,
--
Ricardo Cruz - https://rpmcruz.github.io
To help dig into it, can you paste the full output of scontrol show node compute01 while the job is pending? Also 'sinfo' would be good.
It is basically telling you there aren't enough resources in the partition to run the job. Often this is because all the nodes are in use at that moment.
Brian Andrus
Dear Brian,
Currently, 3 of our 8 GPUs are free.
rpcruz@atlas:~$ /usr/bin/srun --gres=shard:2 ls
srun: job 515 queued and waiting for resources
The job shows as PD in squeue. scontrol says that 5 GPUs are allocated out of 8...
rpcruz@atlas:~$ scontrol show node compute01
NodeName=compute01 Arch=x86_64 CoresPerSocket=32
   CPUAlloc=80 CPUEfctv=128 CPUTot=128 CPULoad=65.38
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=gpu:8,shard:32
   NodeAddr=compute01 NodeHostName=compute01 Version=23.11.4
   OS=Linux 6.8.0-36-generic #36-Ubuntu SMP PREEMPT_DYNAMIC Mon Jun 10 10:49:14 UTC 2024
   RealMemory=1031887 AllocMem=644925 FreeMem=701146 Sockets=2 Boards=1
   State=MIXED ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=partition
   BootTime=2024-07-02T14:08:37 SlurmdStartTime=2024-07-02T14:08:51
   LastBusyTime=2024-07-03T12:02:11 ResumeAfterTime=None
   CfgTRES=cpu=128,mem=1031887M,billing=128,gres/gpu=8
   AllocTRES=cpu=80,mem=644925M,gres/gpu=5
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/a ExtSensorsWatts=0 ExtSensorsTemp=n/a
rpcruz@atlas:~$ sinfo
PARTITION  AVAIL  TIMELIMIT  NODES  STATE NODELIST
partition*    up 5-00:00:00      1    mix compute01
The output is the same whether or not an "srun --gres=shard:2" job is pending. I wonder if the problem is that CfgTRES is not showing gres/shard ... it sounds like it should, right?
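One way to check which TRES the controller is configured to track (just a diagnostic idea, assuming a standard setup) is:

rpcruz@atlas:~$ scontrol show config | grep -i AccountingStorageTRES

If gres/shard is not listed there, it presumably will not appear in CfgTRES either.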
The complete last part of my /etc/slurm/slurm.conf (which is, of course, the same on the login and compute nodes):
# COMPUTE NODES
GresTypes=gpu,shard
NodeName=compute01 Gres=gpu:8,shard:32 CPUs=128 RealMemory=1031887 Sockets=2 CoresPerSocket=32 ThreadsPerCore=2 State=UNKNOWN
PartitionName=partition Nodes=ALL Default=YES MaxTime=5-00:00:00 State=UP DefCpuPerGPU=16 DefMemPerGPU=128985
And on the compute node, /etc/slurm/gres.conf is:

Name=gpu File=/dev/nvidia[0-7]
Name=shard Count=32
Thank you!
--
Ricardo Cruz - https://rpmcruz.github.io
Just a thought.
Try specifying some memory. It looks like the running jobs do that, and by default, if memory is not specified, the request is "all the memory on the node", so the job can't start because some of it is already taken.
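For example (the memory figure below is only an arbitrary illustration, not a recommendation):

$ srun --gres=shard:2 --mem=8G ls

Note that the pending job above shows ReqTRES=cpu=1,mem=1031887M,node=1,billing=1 - i.e. it is asking for the full node's memory.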
Brian Andrus
I would try specifying CPUs and memory just to be sure it's not requesting 0/all.
Also, I ran into a weird issue in my lab cluster when playing with shards while I had OverSubscribe=YES:2: jobs would go pending on Resources despite no allocation of GPUs/shards. Once I reverted to my usual FORCE:1, it behaved as expected.
You may also want to make sure there isn't a job_submit script intercepting GRES requests.
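For example (the numbers are only illustrative), an explicit request would look like:

$ srun --gres=shard:2 --cpus-per-task=1 --mem=4G ls

and whether a job_submit plugin is loaded can be checked with:

$ scontrol show config | grep -i JobSubmitPlugins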
Reed
Hi Ricardo,
It should show up like this:
Gres=gpu:gtx_1080_ti:4(S:0-1),shard:gtx_1080_ti:16(S:0-1)
CfgTRES=cpu=32,mem=515000M,billing=130,gres/gpu=4,gres/shard=16
AllocTRES=cpu=8,mem=31200M,gres/shard=1
I can't directly spot any error, however. Our gres.conf is simply `AutoDetect=nvml`.
In slurm.conf we have:

AccountingStorageTRES=gres/gpu,gres/shard
GresTypes=gpu,shard
Did you try restarting slurm?
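Assuming systemd-managed daemons, that would be something along these lines - adding a new GRES type or TRES may need a full restart rather than just "scontrol reconfigure":

sudo systemctl restart slurmctld   # on the controller
sudo systemctl restart slurmd      # on compute01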
Ward
On Fri, Jul 5, 2024 at 12:19 PM Ward Poelmans via slurm-users <slurm-users@lists.schedmd.com> wrote:
> Hi Ricardo,
>
> It should show up like this:
>
> Gres=gpu:gtx_1080_ti:4(S:0-1),shard:gtx_1080_ti:16(S:0-1)
What's the meaning of (S:0-1) here?
Hi Arnuld,
On 5/07/2024 13:56, Arnuld via slurm-users wrote:
> It should show up like this:
> Gres=gpu:gtx_1080_ti:4(S:0-1),shard:gtx_1080_ti:16(S:0-1)
>
> What's the meaning of (S:0-1) here?
The sockets to which the GPUs are associated:
If GRES are associated with specific sockets, that information will be reported. For example if all 4 GPUs on a node are all associated with socket zero, then "Gres=gpu:4(S:0)". If associated with sockets 0 and 1 then "Gres=gpu:4(S:0-1)".
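For reference, that socket association typically comes either from AutoDetect=nvml or from explicit Cores= entries in gres.conf. A hypothetical hand-written example for a 2-socket node with 32 cores per socket (not taken from the configs in this thread) might look like:

# GPUs 0-3 attached to socket 0 (cores 0-31)
Name=gpu File=/dev/nvidia[0-3] Cores=0-31
# GPUs 4-7 attached to socket 1 (cores 32-63)
Name=gpu File=/dev/nvidia[4-7] Cores=32-63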
Ward