[slurm-users] GPU jobs not running correctly
Andrey Malyutin
malyutinag at gmail.com
Fri Aug 20 21:42:26 UTC 2021
Thank you, Samuel,
Slurm version is 20.02.6. I'm not entirely sure about the host platform; the
RTX6000 nodes are about two years old, and the 3090 node is very recent.
Technically we have four nodes (hence the references to node04 in the info
below), but one of them is down and out of the system at the moment. As you
can see, the job insists on running on the downed node instead of going to
node02 or node03.
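In case it helps, this is how the stuck state shows up from the command line
(standard Slurm tooling; 283 is the pending job shown below):

sinfo -R                   # list down/drained nodes with the admin reason
scontrol show node node04  # full state of the downed node
squeue -j 283 --start      # scheduler's pending reason and expected start time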
Thank you again,
Andrey
*scontrol info:*
JobId=283 JobName=cryosparc_P2_J214
UserId=cryosparc(1003) GroupId=cryosparc(1003) MCS_label=N/A
Priority=4294901572 Nice=0 Account=(null) QOS=normal
JobState=PENDING Reason=ReqNodeNotAvail,_UnavailableNodes:node04
Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
SubmitTime=2021-08-20T20:55:00 EligibleTime=2021-08-20T20:55:00
AccrueTime=2021-08-20T20:55:00
StartTime=Unknown EndTime=Unknown Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2021-08-20T23:36:14
Partition=CSCluster AllocNode:Sid=headnode:108964
ReqNodeList=(null) ExcNodeList=(null)
NodeList=(null)
NumNodes=1 NumCPUs=4 NumTasks=4 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=4,mem=24000M,node=1,billing=4
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=24000M MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=NO Contiguous=0 Licenses=(null) Network=(null)
Command=/data/backups/takeda2/data/cryosparc_projects/P8/J214/queue_sub_script.sh
WorkDir=/ssd/CryoSparc/cryosparc_master
StdErr=/data/backups/takeda2/data/cryosparc_projects/P8/J214/job.log
StdIn=/dev/null
StdOut=/data/backups/takeda2/data/cryosparc_projects/P8/J214/job.log
Power=
TresPerNode=gpu:1
MailUser=cryosparc MailType=NONE
*Script:*
#SBATCH --job-name cryosparc_P2_J214
#SBATCH -n 4
#SBATCH --gres=gpu:1
#SBATCH -p CSCluster
#SBATCH --mem=24000MB
#SBATCH --output=/data/backups/takeda2/data/cryosparc_projects/P8/J214/job.log
#SBATCH --error=/data/backups/takeda2/data/cryosparc_projects/P8/J214/job.log
available_devs=""
for devidx in $(seq 0 15);
do
if [[ -z $(nvidia-smi -i $devidx --query-compute-apps=pid
--format=csv,noheader) ]] ; then
if [[ -z "$available_devs" ]] ; then
available_devs=$devidx
else
available_devs=$available_devs,$devidx
fi
fi
done
export CUDA_VISIBLE_DEVICES=$available_devs
/ssd/CryoSparc/cryosparc_worker/bin/cryosparcw run --project P2 --job J214 \
    --master_hostname headnode.cm.cluster --master_command_core_port 39002 \
    > /data/backups/takeda2/data/cryosparc_projects/P8/J214/job.log 2>&1
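As an aside: when GRES scheduling is working, Slurm exports
CUDA_VISIBLE_DEVICES for the allocated GPU(s) itself, so the nvidia-smi
probing loop above should not be needed. A minimal test job to verify that
(a sketch; the job name is arbitrary):

#!/bin/bash
#SBATCH --job-name gpu_check
#SBATCH -p CSCluster
#SBATCH --gres=gpu:1
# Slurm should set CUDA_VISIBLE_DEVICES to the allocated device index
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
nvidia-smi -L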
*Slurm.conf*
# This section of this file was automatically generated by cmd. Do not edit manually!
# BEGIN AUTOGENERATED SECTION -- DO NOT REMOVE
# Server nodes
SlurmctldHost=headnode
AccountingStorageHost=master
#############################################################################################
#GPU Nodes
#############################################################################################
NodeName=node[02-04] Procs=64 CoresPerSocket=16 RealMemory=257024 Sockets=2 ThreadsPerCore=2 Feature=RTX6000 Gres=gpu:4
NodeName=node01 Procs=64 CoresPerSocket=16 RealMemory=386048 Sockets=2 ThreadsPerCore=2 Feature=RTX3090 Gres=gpu:4
#NodeName=node[05-08] Procs=8 Gres=gpu:4
#
#############################################################################################
# Partitions
#############################################################################################
PartitionName=defq Default=YES MinNodes=1 DefaultTime=UNLIMITED MaxTime=UNLIMITED AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 OverSubscribe=NO PreemptMode=OFF AllowAccounts=ALL AllowQos=ALL Nodes=node[01-04]
PartitionName=CSLive MinNodes=1 DefaultTime=UNLIMITED MaxTime=UNLIMITED AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 OverSubscribe=NO PreemptMode=OFF AllowAccounts=ALL AllowQos=ALL Nodes=node01
PartitionName=CSCluster MinNodes=1 DefaultTime=UNLIMITED MaxTime=UNLIMITED AllowGroups=ALL PriorityJobFactor=1 PriorityTier=1 OverSubscribe=NO PreemptMode=OFF AllowAccounts=ALL AllowQos=ALL Nodes=node[02-04]
ClusterName=slurm
*Gres.conf*
# This section of this file was automatically generated by cmd. Do not edit manually!
# BEGIN AUTOGENERATED SECTION -- DO NOT REMOVE
AutoDetect=NVML
# END AUTOGENERATED SECTION -- DO NOT REMOVE
#Name=gpu File=/dev/nvidia[0-3] Count=4
#Name=mic Count=0
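One thing worth checking: with AutoDetect=NVML the nodes report slightly
different GRES (note the socket affinity on node02, Gres=gpu:4(S:0-1), versus
plain Gres=gpu:4 on node01). An explicit gres.conf is a common alternative; a
sketch, assuming four GPUs per node at the usual device paths:

NodeName=node01      Name=gpu Type=rtx3090 File=/dev/nvidia[0-3]
NodeName=node[02-04] Name=gpu Type=rtx6000 File=/dev/nvidia[0-3]

If a Type is added here, the matching slurm.conf node entries must name it too
(e.g. Gres=gpu:rtx6000:4). Running 'slurmd -G' on each node prints the GRES
that node will report to the controller, which makes mismatches easy to spot.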
*Sinfo:*
PARTITION  AVAIL  TIMELIMIT  NODES  STATE  NODELIST
defq*      up     infinite       1  down*  node04
defq*      up     infinite       3  idle   node[01-03]
CSLive     up     infinite       1  idle   node01
CSCluster  up     infinite       1  down*  node04
CSCluster  up     infinite       2  idle   node[02-03]
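Until node04 is repaired it can be taken out of scheduling explicitly, and
once repaired it must be resumed by hand if ReturnToService is at its default
of 0. For example:

scontrol update NodeName=node04 State=DRAIN Reason="hardware down"
# after repair:
scontrol update NodeName=node04 State=RESUME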
*Node1:*
NodeName=node01 Arch=x86_64 CoresPerSocket=16
CPUAlloc=0 CPUTot=64 CPULoad=0.04
AvailableFeatures=RTX3090
ActiveFeatures=RTX3090
Gres=gpu:4
NodeAddr=node01 NodeHostName=node01 Version=20.02.6
OS=Linux 3.10.0-1160.11.1.el7.x86_64 #1 SMP Fri Dec 18 16:34:56 UTC 2020
RealMemory=386048 AllocMem=0 FreeMem=16665 Sockets=2 Boards=1
State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
Partitions=defq,CSLive
BootTime=2021-08-04T13:59:08 SlurmdStartTime=2021-08-10T09:32:43
CfgTRES=cpu=64,mem=377G,billing=64
AllocTRES=
CapWatts=n/a
CurrentWatts=0 AveWatts=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
*Node2-3*
NodeName=node02 Arch=x86_64 CoresPerSocket=16
CPUAlloc=0 CPUTot=64 CPULoad=0.48
AvailableFeatures=RTX6000
ActiveFeatures=RTX6000
Gres=gpu:4(S:0-1)
NodeAddr=node02 NodeHostName=node02 Version=20.02.6
OS=Linux 3.10.0-1160.11.1.el7.x86_64 #1 SMP Fri Dec 18 16:34:56 UTC 2020
RealMemory=257024 AllocMem=0 FreeMem=2259 Sockets=2 Boards=1
State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
Partitions=defq,CSCluster
BootTime=2021-07-29T20:47:32 SlurmdStartTime=2021-08-10T09:32:55
CfgTRES=cpu=64,mem=251G,billing=64
AllocTRES=
CapWatts=n/a
CurrentWatts=0 AveWatts=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
On Thu, Aug 19, 2021, 6:07 PM Fulcomer, Samuel <samuel_fulcomer at brown.edu>
wrote:
> What SLURM version are you running?
>
> What are the #SLURM directives in the batch script? (or the sbatch
> arguments)
>
> When the single GPU jobs are pending, what's the output of 'scontrol show
> job JOBID'?
>
> What are the node definitions in slurm.conf, and the lines in gres.conf?
>
> Are the nodes all the same host platform (motherboard)?
>
> We have P100s, TitanVs, Titan RTXs, Quadro RTX 6000s, 3090s, V100s, DGX
> 1s, A6000s, and A40s, with a mix of single and dual-root platforms, and
> haven't seen this problem with SLURM 20.02.6 or earlier versions.
>
> On Thu, Aug 19, 2021 at 8:38 PM Andrey Malyutin <malyutinag at gmail.com>
> wrote:
>
>> Hello,
>>
>> We are in the process of finishing up the setup of a cluster with 3
>> nodes, 4 GPUs each. One node has RTX3090s and the other two have RTX6000s.
>> Any job asking for 1 GPU in the submission script will wait to run on the
>> 3090 node, regardless of resource availability. The same job requesting 2
>> or more GPUs will run on any node. I don't even know where to begin
>> troubleshooting this issue; the entries for the 3 nodes are effectively
>> identical in slurm.conf. Any help would be appreciated. (If helpful: this
>> cluster is used for structural biology, with the cryoSPARC and RELION
>> packages.)
>>
>> Thank you,
>> Andrey
>>
>