At the moment we have 2 nodes on which jobs are seeing long wait times. Generally that happens when the nodes are fully allocated, but what other reasons could make a job wait so long when there is still enough memory and CPU available? Slurm version is 23.02.4 via Bright Computing. Note that the compute nodes have hyperthreading enabled, but that should be irrelevant. Is there a way to determine what else could be holding jobs up?
srun --pty -t 0-01:00:00 --nodelist=node001 --gres=gpu:1 -A ourts -p short /bin/bash
srun: job 672204 queued and waiting for resources
scontrol show node node001
NodeName=m001 Arch=x86_64 CoresPerSocket=48
   CPUAlloc=24 CPUEfctv=192 CPUTot=192 CPULoad=20.37
   AvailableFeatures=location=local
   ActiveFeatures=location=local
   Gres=gpu:A6000:8
   NodeAddr=node001 NodeHostName=node001 Version=23.02.4
   OS=Linux 5.14.0-70.13.1.el9_0.x86_64 #1 SMP PREEMPT Thu Apr 14 12:42:38 EDT 2022
   RealMemory=1031883 AllocMem=1028096 FreeMem=222528 Sockets=2 Boards=1
   State=MIXED ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=ours,short
   BootTime=2024-04-29T16:18:30 SlurmdStartTime=2024-05-18T16:48:11
   LastBusyTime=2024-06-03T10:49:49 ResumeAfterTime=None
   CfgTRES=cpu=192,mem=1031883M,billing=192,gres/gpu=8
   AllocTRES=cpu=24,mem=1004G,gres/gpu=2,gres/gpu:a6000=2
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
grep 672204 /var/log/slurmctld
[2024-06-04T15:50:35.627] sched: _slurm_rpc_allocate_resources JobId=672204 NodeList=(null) usec=852
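(For a pending job like this, a minimal sketch of what one might check next, using the job ID from the srun above; the exact output will of course vary per site:)

# Scheduler's stated reason and estimated start time for the pending job
squeue -j 672204 -O JobID,Partition,State,Reason,StartTime
# Full job record: look at Reason=, Priority= and the requested TRES (cpu/mem/gres)
scontrol show job 672204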
This is relatively true of my system as well, and I believe it's because the backfill scheduler is slower than the main scheduler.
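(One way to check whether backfill really is the bottleneck is sdiag, which reports statistics for the main scheduler and the backfill scheduler separately. A rough sketch; the section headers below are from recent Slurm releases:)

# Cycle times and depth for the main scheduler
sdiag | grep -A 15 'Main schedule statistics'
# Cycle times, depth, and last-cycle duration for the backfill scheduler
sdiag | grep -A 25 'Backfilling stats'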
--
#BlackLivesMatter
Ryan Novosielski - novosirj@rutgers.edu
Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
Office of Advanced Research Computing - MSB A555B, Newark
Rutgers, the State University of NJ
Thanks for the quick response, Ryan!
Are there any recommendations for bf_ options from https://slurm.schedmd.com/sched_config.html that could help with this? bf_continue? Decreasing bf_interval= to a value lower than the default of 30 seconds?
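(Before tuning anything, it may help to confirm which bf_ options are already in effect; SchedulerParameters shows up in the running configuration:)

# Show the backfill-related options currently in effect
scontrol show config | grep -i SchedulerParameters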
We do have bf_continue set, and also bf_max_job_user=50, because we discovered that one user can submit so many jobs that the backfill scheduler hits the limit on the number of jobs it will consider and doesn't run some jobs that it otherwise could.
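(For illustration only, that combination would look roughly like this in slurm.conf; the real line will also carry whatever other options a site already uses:)

# slurm.conf (illustrative): keep working through the pending-job list after the
# backfill scheduler releases its locks, and consider at most 50 jobs per user per cycle
SchedulerType=sched/backfill
SchedulerParameters=bf_continue,bf_max_job_user=50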
Ryan Novosielski via slurm-users <slurm-users@lists.schedmd.com> writes:

> We do have bf_continue set, and also bf_max_job_user=50 [...]
Your bf_window may be too small. From 'man slurm.conf':
bf_window=#
The number of minutes into the future to look when considering jobs to schedule. Higher values result in more overhead and less responsiveness. A value at least as long as the highest allowed time limit is generally advisable to prevent job starvation. In order to limit the amount of data managed by the backfill scheduler, if the value of bf_window is increased, then it is generally advisable to also increase bf_resolution. This option applies only to SchedulerType=sched/backfill. Default: 1440 (1 day), Min: 1, Max: 43200 (30 days).
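(To see what "the highest allowed time limit" actually is on a given cluster, a quick sketch; %l is the partition's time limit:)

# List each partition and its maximum time limit; bf_window (in minutes) should cover the largest
sinfo -o '%P %l'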
--
Dr. Loris Bennett (Herr/Mr)
FUB-IT (ex-ZEDAT), Freie Universität Berlin
> Your bf_window may be too small. From 'man slurm.conf': bf_window=# [...]
So since we have a 5-day maximum time limit, should bf_window=7200? What should bf_resolution be set to then? But how does this affect/improve wait times?
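(Just on the arithmetic: 5 days = 5 x 24 x 60 = 7200 minutes, so bf_window=7200 matches the man page's advice. The bf_resolution value below is purely illustrative; it is in seconds, default 60, and the man page only says to raise it as bf_window grows:)

# Illustrative slurm.conf settings for a 5-day maximum time limit
SchedulerParameters=bf_continue,bf_max_job_user=50,bf_window=7200,bf_resolution=600
# Most SchedulerParameters changes can be picked up without restarting slurmctld
scontrol reconfigure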