[slurm-users] Regression from slurm-22.05.2 to slurm-22.05.7 when using "--gpus=N" option.

Rigoberto Corujo rcorujo at yahoo.com
Thu Jan 12 20:52:31 UTC 2023


Hello,

I have a small 2-compute-node GPU cluster, where each node has 2 GPUs.


$ sinfo -o "%20N  %10c  %10m  %25f  %30G "
NODELIST              CPUS        MEMORY      AVAIL_FEATURES             GRES
o186i[126-127]        128         64000       (null)                     gpu:nvidia_a40:2(S:0-1)


In my batch script, I request 4 GPUs and let Slurm decide automatically how many nodes to allocate. I also tell it I want 1 task per node.


$ cat rig_batch.sh
#!/usr/bin/env bash

#SBATCH --ntasks-per-node=1
#SBATCH --nodes=1-9
#SBATCH --gpus=4
#SBATCH --error=/home/corujor/slurm-error.log
#SBATCH --output=/home/corujor/slurm-output.log

bash -c 'echo $(hostname):SLURM_JOBID=${SLURM_JOBID}:SLURM_PROCID=${SLURM_PROCID}:CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}'
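For reference, the same request expressed as a one-off srun command (shown only to illustrate the options involved; I did not run it this way) would look something like this:

# Roughly equivalent interactive request: 4 GPUs total, 1 task per node,
# with Slurm free to pick anywhere from 1 to 9 nodes.
srun --nodes=1-9 --ntasks-per-node=1 --gpus=4 \
     bash -c 'echo $(hostname):CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES}'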



I submit my batch script on slurm-22.05.2.


$ sbatch rig_batch.sh
Submitted batch job 7


I get the expected results.  That is, since each compute node has 2 GPUs and I requested 4 GPUs, Slurm allocated 2 nodes with 1 task per node.


$ cat slurm-output.log
o186i126:SLURM_JOBID=7:SLURM_PROCID=0:CUDA_VISIBLE_DEVICES=0,1
o186i127:SLURM_JOBID=7:SLURM_PROCID=1:CUDA_VISIBLE_DEVICES=0,1
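To double-check how the GPUs were handed out, the allocation can also be inspected from the job record (job 7 from above; the -d flag asks scontrol for the detailed per-node layout):

# Detailed view of job 7, filtered to the node count and GPU/TRES lines
scontrol -d show job 7 | grep -iE 'NumNodes|TRES|Gres'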


However, when I try to submit the same batch script on slurm-22.05.7, it fails.


$ sbatch rig_batch.sh
sbatch: error: Batch job submission failed: Requested node configuration is not available


Here is my configuration.


$ scontrol show config

Configuration data as of 2023-01-12T21:38:55

AccountingStorageBackupHost = (null)

AccountingStorageEnforce = none

AccountingStorageHost   = localhost

AccountingStorageExternalHost = (null)

AccountingStorageParameters = (null)

AccountingStoragePort   = 6819

AccountingStorageTRES   = cpu,mem,energy,node,billing,fs/disk,vmem,pages

AccountingStorageType   = accounting_storage/slurmdbd

AccountingStorageUser   = N/A

AccountingStoreFlags    = (null)

AcctGatherEnergyType    = acct_gather_energy/none

AcctGatherFilesystemType = acct_gather_filesystem/none

AcctGatherInterconnectType = acct_gather_interconnect/none

AcctGatherNodeFreq      = 0 sec

AcctGatherProfileType   = acct_gather_profile/none

AllowSpecResourcesUsage = No

AuthAltTypes            = (null)

AuthAltParameters       = (null)

AuthInfo                = (null)

AuthType                = auth/munge

BatchStartTimeout       = 10 sec

BcastExclude            = /lib,/usr/lib,/lib64,/usr/lib64

BcastParameters         = (null)

BOOT_TIME               = 2023-01-12T17:17:11

BurstBufferType         = (null)

CliFilterPlugins        = (null)

ClusterName             = grenoble_test

CommunicationParameters = (null)

CompleteWait            = 0 sec

CoreSpecPlugin          = core_spec/none

CpuFreqDef              = Unknown

CpuFreqGovernors        = OnDemand,Performance,UserSpace

CredType                = cred/munge

DebugFlags              = Gres

DefMemPerNode           = UNLIMITED

DependencyParameters    = (null)

DisableRootJobs         = Yes

EioTimeout              = 60

EnforcePartLimits       = ANY

Epilog                  = (null)

EpilogMsgTime           = 2000 usec

EpilogSlurmctld         = (null)

ExtSensorsType          = ext_sensors/none

ExtSensorsFreq          = 0 sec

FederationParameters    = (null)

FirstJobId              = 1

GetEnvTimeout           = 2 sec

GresTypes               = gpu

GpuFreqDef              = high,memory=high

GroupUpdateForce        = 1

GroupUpdateTime         = 600 sec

HASH_VAL                = Match

HealthCheckInterval     = 0 sec

HealthCheckNodeState    = ANY

HealthCheckProgram      = (null)

InactiveLimit           = 0 sec

InteractiveStepOptions  = --interactive --preserve-env --pty $SHELL

JobAcctGatherFrequency  = 30

JobAcctGatherType       = jobacct_gather/none

JobAcctGatherParams     = (null)

JobCompHost             = localhost

JobCompLoc              = /var/log/slurm_jobcomp.log

JobCompPort             = 0

JobCompType             = jobcomp/none

JobCompUser             = root

JobContainerType        = job_container/none

JobCredentialPrivateKey = /apps/slurm/etc/.slurm.key

JobCredentialPublicCertificate = /apps/slurm/etc/slurm.cert

JobDefaults             = (null)

JobFileAppend           = 0

JobRequeue              = 1

JobSubmitPlugins        = (null)

KillOnBadExit           = 0

KillWait                = 30 sec

LaunchParameters        = use_interactive_step

LaunchType              = launch/slurm

Licenses                = (null)

LogTimeFormat           = iso8601_ms

MailDomain              = (null)

MailProg                = /bin/mail

MaxArraySize            = 1001

MaxDBDMsgs              = 20008

MaxJobCount             = 10000

MaxJobId                = 67043328

MaxMemPerNode           = UNLIMITED

MaxNodeCount            = 2

MaxStepCount            = 40000

MaxTasksPerNode         = 512

MCSPlugin               = mcs/none

MCSParameters           = (null)

MessageTimeout          = 10 sec

MinJobAge               = 300 sec

MpiDefault              = pmix

MpiParams               = (null)

NEXT_JOB_ID             = 274

NodeFeaturesPlugins     = (null)

OverTimeLimit           = 0 min

PluginDir               = /apps/slurm-22-05-7-1/lib/slurm

PlugStackConfig         = (null)

PowerParameters         = (null)

PowerPlugin             =

PreemptMode             = OFF

PreemptType             = preempt/none

PreemptExemptTime       = 00:00:00

PrEpParameters          = (null)

PrEpPlugins             = prep/script

PriorityParameters      = (null)

PrioritySiteFactorParameters = (null)

PrioritySiteFactorPlugin = (null)

PriorityType            = priority/basic

PrivateData             = none

ProctrackType           = proctrack/linuxproc

Prolog                  = (null)

PrologEpilogTimeout     = 65534

PrologSlurmctld         = (null)

PrologFlags             = (null)

PropagatePrioProcess    = 0

PropagateResourceLimits = ALL

PropagateResourceLimitsExcept = (null)

RebootProgram           = (null)

ReconfigFlags           = (null)

RequeueExit             = (null)

RequeueExitHold         = (null)

ResumeFailProgram       = (null)

ResumeProgram           = (null)

ResumeRate              = 300 nodes/min

ResumeTimeout           = 60 sec

ResvEpilog              = (null)

ResvOverRun             = 0 min

ResvProlog              = (null)

ReturnToService         = 1

RoutePlugin             = route/default

SchedulerParameters     = (null)

SchedulerTimeSlice      = 30 sec

SchedulerType           = sched/backfill

ScronParameters         = (null)

SelectType              = select/cons_tres

SelectTypeParameters    = CR_CPU

SlurmUser               = slurm(1182)

SlurmctldAddr           = (null)

SlurmctldDebug          = debug

SlurmctldHost[0]        = o186i208

SlurmctldLogFile        = /var/log/slurmctld.log

SlurmctldPort           = 6817

SlurmctldSyslogDebug    = (null)

SlurmctldPrimaryOffProg = (null)

SlurmctldPrimaryOnProg  = (null)

SlurmctldTimeout        = 120 sec

SlurmctldParameters     = (null)

SlurmdDebug             = info

SlurmdLogFile           = /var/log/slurmd.log

SlurmdParameters        = (null)

SlurmdPidFile           = /var/run/slurmd.pid

SlurmdPort              = 6818

SlurmdSpoolDir          = /var/spool/slurmd

SlurmdSyslogDebug       = (null)

SlurmdTimeout           = 300 sec

SlurmdUser              = root(0)

SlurmSchedLogFile       = (null)

SlurmSchedLogLevel      = 0

SlurmctldPidFile        = /var/slurm/run/slurmctld.pid

SlurmctldPlugstack      = (null)

SLURM_CONF              = /apps/slurm-22-05-7-1/etc/slurm.conf

SLURM_VERSION           = 22.05.7

SrunEpilog              = (null)

SrunPortRange           = 0-0

SrunProlog              = (null)

StateSaveLocation       = /var/spool/slurmctld

SuspendExcNodes         = (null)

SuspendExcParts         = (null)

SuspendProgram          = (null)

SuspendRate             = 60 nodes/min

SuspendTime             = INFINITE

SuspendTimeout          = 30 sec

SwitchParameters        = (null)

SwitchType              = switch/none

TaskEpilog              = (null)

TaskPlugin              = task/affinity

TaskPluginParam         = (null type)

TaskProlog              = (null)

TCPTimeout              = 2 sec

TmpFS                   = /tmp

TopologyParam           = (null)

TopologyPlugin          = topology/none

TrackWCKey              = No

TreeWidth               = 50

UsePam                  = No

UnkillableStepProgram   = (null)

UnkillableStepTimeout   = 60 sec

VSizeFactor             = 0 percent

WaitTime                = 0 sec

X11Parameters           = (null)

 

MPI Plugins Configuration:

PMIxCliTmpDirBase       = (null)

PMIxCollFence           = (null)

PMIxDebug               = 0

PMIxDirectConn          = yes

PMIxDirectConnEarly     = no

PMIxDirectConnUCX       = no

PMIxDirectSameArch      = no

PMIxEnv                 = (null)

PMIxFenceBarrier        = no

PMIxNetDevicesUCX       = (null)

PMIxTimeout             = 300

PMIxTlsUCX              = (null)

 

Slurmctld(primary) at o186i208 is UP
 

The only difference when I run this with slurm-22.05.2 is that I have to make the following change, or Slurm will complain.  Other than that, the same configuration is used for both slurm-22.05.2 and slurm-22.05.7.  In both cases, I am running on the same cluster with the same compute nodes, just pointing to different versions of Slurm.

#MpiDefault=pmix
MpiDefault=none

Seems like a regression.
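As a side note, and I have not verified this, spelling out the per-node GPU count explicitly might sidestep the problem in the meantime:

# Possible workaround (untested): state the shape of the request explicitly
# instead of letting Slurm derive it from a total GPU count.
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=1
#SBATCH --gpus-per-node=2    # 2 nodes x 2 GPUs each = 4 GPUs total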


Thoughts?

Thank you,
Rigoberto
