Hi everyone, I've just set up SLURM on the head node and haven't added any compute nodes yet. I'm trying to run a test job to confirm it's working, but the job stays pending with this reason: 'Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions'. The output from squeue and scontrol is below.
[stsadmin@head ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
6 lab test_slu stsadmin PD 0:00 1 (Nodes required for job are DOWN, DRAINED or reserved for jobs in higher priority partitions)
[stsadmin@head ~]$ scontrol show job 6
JobId=6 JobName=test_slurm
UserId=stsadmin(1000) GroupId=stsadmin(1000) MCS_label=N/A
Priority=1 Nice=0 Account=(null) QOS=normal
JobState=PENDING Reason=Nodes_required_for_job_are_DOWN,_DRAINED_or_reserved_for_jobs_in_higher_priority_partitions Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:00 TimeLimit=01:00:00 TimeMin=N/A
SubmitTime=2024-04-09T10:43:14 EligibleTime=2024-04-09T10:43:14
AccrueTime=2024-04-09T10:43:14
StartTime=Unknown EndTime=Unknown Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2024-04-09T10:43:23 Scheduler=Backfill:*
Partition=lab AllocNode:Sid=head:5147
ReqNodeList=(null) ExcNodeList=(null)
NodeList=
NumNodes=1-1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
ReqTRES=cpu=1,mem=1G,node=1,billing=1
AllocTRES=(null)
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryCPU=1G MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=YES Contiguous=0 Licenses=(null) Network=(null)
Command=/home/stsadmin/Downloads/test.sh
WorkDir=/home/stsadmin
StdErr=/home/stsadmin/test_slurm_output.txt
StdIn=/dev/null
StdOut=/home/stsadmin/test_slurm_output.txt
Power=
[stsadmin@head ~]$ scontrol show node head
NodeName=head CoresPerSocket=6
CPUAlloc=0 CPUEfctv=24 CPUTot=24 CPULoad=0.00
AvailableFeatures=(null)
ActiveFeatures=(null)
Gres=(null)
NodeAddr=head NodeHostName=head
RealMemory=184000 AllocMem=0 FreeMem=N/A Sockets=2 Boards=1
State=DOWN+NOT_RESPONDING ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
Partitions=lab
BootTime=None SlurmdStartTime=None
LastBusyTime=2024-04-09T10:42:53 ResumeAfterTime=None
CfgTRES=cpu=24,mem=184000M,billing=24
AllocTRES=
CapWatts=n/a
CurrentWatts=0 AveWatts=0
ExtSensorsJoules=n/a ExtSensorsWatts=0 ExtSensorsTemp=n/a
Reason=Not responding [slurm@2024-04-09T10:14:10]
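In case it helps anyone checking the same thing on their own setup, the State field in the node output above can be pulled out with a few lines of Python. This is only a minimal sketch, assuming the `scontrol show node` text has already been captured as a string; the helper name `node_state` is hypothetical and not part of SLURM.

```python
import re


def node_state(scontrol_output: str) -> list[str]:
    """Return the flags from the State= field of `scontrol show node` text.

    Hypothetical helper for illustration; it only parses text that was
    already captured and does not call SLURM itself.
    """
    # \b keeps this from matching the "State=" inside "JobState=".
    match = re.search(r"\bState=(\S+)", scontrol_output)
    if match is None:
        return []
    # SLURM joins the base state and its flags with "+".
    return match.group(1).split("+")


# Line copied from the node output above.
sample = "State=DOWN+NOT_RESPONDING ThreadsPerCore=2 TmpDisk=0 Weight=1"
print(node_state(sample))
```

For the output above, this separates the base state (DOWN) from the NOT_RESPONDING flag. Since head is the only node in partition lab, that state is what the pending reason on job 6 is referring to.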
Any advice to point me in the right direction would be appreciated. Thank you!
-- Alison Peterson
IT Research Support Analyst
Information Technology
O: 619-594-3364
5500 Campanile Drive | San Diego, CA 92182-8080