Hi everyone, I'm conducting some tests. I've just set up SLURM on the head node and haven't added any compute nodes yet. I'm trying to test it to ensure it's working, but I'm encountering an error: 'Nodes required for the job are DOWN, DRAINED, or reserved for jobs in higher priority partitions.'
Any guidance will be appreciated, thank you!

--
Alison Peterson
IT Research Support Analyst, Information Technology
apeterson5@sdsu.edu | O: 619-594-3364
San Diego State University | SDSU.edu
5500 Campanile Drive | San Diego, CA 92182-8080
Alison
The error message indicates that there are no resources available to execute jobs. Since you haven't defined any compute nodes, you will get this error.
I would suggest that you create at least one compute node. Once you do that, this error should go away.
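For reference, a compute node is declared in slurm.conf with a NodeName line and then listed in a partition. A minimal illustrative sketch; the hostname and resource figures below are placeholders, not values from this thread:

```
# Illustrative slurm.conf fragment; hostname and sizes are placeholders
NodeName=node01 CPUs=8 RealMemory=16000 State=UNKNOWN
PartitionName=debug Nodes=node01 Default=YES MaxTime=INFINITE State=UP
```

After editing slurm.conf, slurmctld needs a restart (or "scontrol reconfigure") to pick up the new node.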
Jeff
Hi Jeffrey, I'm sorry, I did add the head node in the compute node configuration. This is the slurm.conf:
# COMPUTE NODES
NodeName=head CPUs=24 RealMemory=184000 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2 State=UNKNOWN
PartitionName=lab Nodes=ALL Default=YES MaxTime=INFINITE State=UP OverSubscribe=Force
Alison
Can you provide the output of the following commands:

* sinfo
* scontrol show node name=head

and the job command that you're trying to run?
Yes! Here is the information:

[stsadmin@head ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
lab*         up   infinite      1  down* head

[stsadmin@head ~]$ scontrol show node name=head
Node name=head not found

[stsadmin@head ~]$ sbatch ~/Downloads/test.sh
Submitted batch job 7

[stsadmin@head ~]$ squeue
JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
    7       lab test_slu stsadmin PD  0:00     1 (ReqNodeNotAvail, UnavailableNodes:head)
Alison
The sinfo output shows that your head node is down due to some configuration error.
Are you running slurmd on the head node? If slurmd is running, find its log file and pass along the entries from it.
Can you redo the scontrol command? "node name" should be "nodename", one word.
I need to see what's in the test.sh file to get an idea of how your job is set up.
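The checks described above can be run as shell commands on the head node. This is a sketch: the slurmd log path below is an assumption (the real path is set by SlurmdLogFile in slurm.conf and varies by distribution):

```shell
# Is slurmd installed and running on this node?
systemctl status slurmd

# Query the node record (note: "node" takes the name directly, not "node name=")
scontrol show node head

# Last entries of the slurmd log; the path shown is an assumption,
# use whatever SlurmdLogFile in slurm.conf points at
tail -n 50 /var/log/slurm/slurmd.log
```

These commands only produce useful output on a host with Slurm installed, so run them on the node in question.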
Jeff
Aha! That is probably the issue: slurmd! I know slurmd runs on the compute nodes. I need to deploy this for a lab, but I only have one of the servers with me; I will be adding the others one by one after the first one is set up, so as not to disrupt their current setup. I want to be able to use the resources from the head node and also from the compute nodes once it's completed.
[stsadmin@head ~]$ sudo systemctl status slurmd
Unit slurmd.service could not be found.

[stsadmin@head ~]$ scontrol show node head
NodeName=head CoresPerSocket=6
   CPUAlloc=0 CPUEfctv=24 CPUTot=24 CPULoad=0.00
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=head NodeHostName=head
   RealMemory=184000 AllocMem=0 FreeMem=N/A Sockets=2 Boards=1
   State=DOWN+NOT_RESPONDING ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=lab
   BootTime=None SlurmdStartTime=None LastBusyTime=2024-04-09T13:20:04 ResumeAfterTime=None
   CfgTRES=cpu=24,mem=184000M,billing=24
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/a ExtSensorsWatts=0 ExtSensorsTemp=n/a
   Reason=Not responding [slurm@2024-04-09T10:14:10]
[stsadmin@head ~]$ cat ~/Downloads/test.sh
#!/bin/bash
#SBATCH --job-name=test_slurm
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
#SBATCH --output=test_slurm_output.txt

echo "Starting the SLURM test job on: $(date)"
echo "Running on hostname: $(hostname)"
echo "SLURM_JOB_ID: $SLURM_JOB_ID"
echo "SLURM_JOB_NODELIST: $SLURM_JOB_NODELIST"
echo "SLURM_NTASKS: $SLURM_NTASKS"

# Here you can place the commands you want to run on the compute node
# For example, a simple sleep command or any application that needs to be tested
sleep 60

echo "SLURM test job completed on: $(date)"
Alison
In your case, since you are using head as both the Slurm management node and a compute node, you'll need to set up slurmd on the head node.
Once slurmd is running, use "sinfo" to see what the status of the node is. It will most likely be down, hopefully without an asterisk. If that's the case, then use

scontrol update nodename=head state=resume

and then check the status again. Hopefully the node will show idle, meaning that it should be ready to accept jobs.
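Put together, the recovery might look like the following on the head node. Treat this as a sketch: the package name is an assumption (it differs across distributions, and many sites build Slurm from source):

```shell
# Install slurmd on the head node ('slurm-slurmd' is the RHEL/Rocky
# package name; Debian/Ubuntu call it 'slurmd')
sudo dnf install -y slurm-slurmd

# Start it now and enable it on every boot
sudo systemctl enable --now slurmd

# Clear the node's DOWN state once slurmd is up
sudo scontrol update nodename=head state=resume

# The node should now report "idle" in sinfo
sinfo
```

If the node drops back to down, the slurmd log will usually say why (for example, a hostname or key mismatch with slurmctld).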
Jeff
Thank you so much!!! I have installed slurmd on the head node, started and enabled the service, and restarted slurmctld. I sent 2 jobs and they are running!
[stsadmin@head ~]$ squeue
JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
   10       lab test_slu stsadmin  R  0:01     1 head
    9       lab test_slu stsadmin  R  0:09     1 head
Alison
I’m glad I was able to help. Good luck.
Jeff
From: Alison Peterson apeterson5@sdsu.edu Sent: Tuesday, April 9, 2024 4:09 PM To: Jeffrey R. Lang JRLang@uwyo.edu Cc: slurm-users@lists.schedmd.com Subject: Re: [EXT] RE: [EXT] RE: [EXT] RE: [EXT] RE: [slurm-users] Nodes required for job are down, drained or reserved
Thank you so much!!! I have installed slurmd on the head node. Started and enabled the service, restarted slurmctld. I sent 2 jobs and they are running!
[stsadmin@head ~]$ squeue JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) 10 lab test_slu stsadmin R 0:01 1 head 9 lab test_slu stsadmin R 0:09 1 head
On Tue, Apr 9, 2024 at 1:54 PM Jeffrey R. Lang <JRLang@uwyo.edumailto:JRLang@uwyo.edu> wrote: Alison
In your case since you are using head as both a slurm management node and a compute node you’ll need to setup slurmd on the head node.
Once the slurmd is running use “sinfo” to see what the status of the node is. Most likely down hopefully without an astrick. If that’s the case then use
scontrol update node=head state=resume
and then check the status again. Hopefully the node will show idle, meaning it should be ready to accept jobs.
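[For readers following along later: the whole sequence on the head node might look like the sketch below. This is an illustration, not taken verbatim from the thread; package names vary by distribution, and it assumes a systemd-based install.]

```shell
# Install slurmd on the head node (package name varies by distro,
# e.g. "slurm-slurmd" on RHEL/Rocky or "slurmd" on Debian/Ubuntu)
sudo dnf install slurm-slurmd        # or: sudo apt install slurmd

# Start the compute-node daemon now and enable it at boot
sudo systemctl enable --now slurmd

# Check the node state; a plain "down" (no asterisk) can be resumed
sinfo

# Clear the DOWN state so the node returns to service
sudo scontrol update NodeName=head State=RESUME

# Verify the node now shows "idle"
sinfo
```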
Jeff
From: Alison Peterson <apeterson5@sdsu.edu> Sent: Tuesday, April 9, 2024 3:40 PM To: Jeffrey R. Lang <JRLang@uwyo.edu> Cc: slurm-users@lists.schedmd.com Subject: Re: [EXT] RE: [EXT] RE: [EXT] RE: [slurm-users] Nodes required for job are down, drained or reserved
Aha! That is probably the issue slurmd ! I know slurmd runs on the compute nodes, I need to deploy this for a lab but I only have one of the servers with me. I will be adding them 1 by 1 after the first one is set up, to not disrupt their current setup. I want to be able to use the resources from the head and also the compute nodes once it's completed.
[stsadmin@head ~]$ sudo systemctl status slurmd
Unit slurmd.service could not be found.

[stsadmin@head ~]$ scontrol show node head
NodeName=head CoresPerSocket=6
   CPUAlloc=0 CPUEfctv=24 CPUTot=24 CPULoad=0.00
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=head NodeHostName=head
   RealMemory=184000 AllocMem=0 FreeMem=N/A Sockets=2 Boards=1
   State=DOWN+NOT_RESPONDING ThreadsPerCore=2 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=lab
   BootTime=None SlurmdStartTime=None LastBusyTime=2024-04-09T13:20:04 ResumeAfterTime=None
   CfgTRES=cpu=24,mem=184000M,billing=24
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/a ExtSensorsWatts=0 ExtSensorsTemp=n/a
   Reason=Not responding [slurm@2024-04-09T10:14:10]

[stsadmin@head ~]$ cat ~/Downloads/test.sh
#!/bin/bash
#SBATCH --job-name=test_slurm
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
#SBATCH --output=test_slurm_output.txt

echo "Starting the SLURM test job on: $(date)"
echo "Running on hostname: $(hostname)"
echo "SLURM_JOB_ID: $SLURM_JOB_ID"
echo "SLURM_JOB_NODELIST: $SLURM_JOB_NODELIST"
echo "SLURM_NTASKS: $SLURM_NTASKS"

# Here you can place the commands you want to run on the compute node
# For example, a simple sleep command or any application that needs to be tested
sleep 60

echo "SLURM test job completed on: $(date)"
On Tue, Apr 9, 2024 at 1:21 PM Jeffrey R. Lang <JRLang@uwyo.edu> wrote:
Alison
The sinfo output shows that your head node is down due to some configuration error.
Are you running slurmd on the head node? If slurmd is running, find its log file and pass along the entries from it.
Can you redo the scontrol command? "node name" should be "nodename", one word.
I need to see what's in the test.sh file to get an idea of how your job is set up.
Jeff
From: Alison Peterson <apeterson5@sdsu.edu> Sent: Tuesday, April 9, 2024 3:15 PM To: Jeffrey R. Lang <JRLang@uwyo.edu> Cc: slurm-users@lists.schedmd.com Subject: Re: [EXT] RE: [EXT] RE: [slurm-users] Nodes required for job are down, drained or reserved
Yes! here is the information:
[stsadmin@head ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
lab*         up   infinite      1  down* head

[stsadmin@head ~]$ scontrol show node name=head
Node name=head not found

[stsadmin@head ~]$ sbatch ~/Downloads/test.sh
Submitted batch job 7

[stsadmin@head ~]$ squeue
 JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
     7       lab test_slu stsadmin PD  0:00     1 (ReqNodeNotAvail, UnavailableNodes:head)
On Tue, Apr 9, 2024 at 1:07 PM Jeffrey R. Lang <JRLang@uwyo.edu> wrote:
Alison
Can you provide the output of the following commands:
• sinfo
• scontrol show node name=head
and the job command that you're trying to run?
From: Alison Peterson <apeterson5@sdsu.edu> Sent: Tuesday, April 9, 2024 3:03 PM To: Jeffrey R. Lang <JRLang@uwyo.edu> Cc: slurm-users@lists.schedmd.com Subject: Re: [EXT] RE: [slurm-users] Nodes required for job are down, drained or reserved
Hi Jeffrey, I'm sorry I did add the head node in the compute nodes configuration, this is the slurm.conf
# COMPUTE NODES
NodeName=head CPUs=24 RealMemory=184000 Sockets=2 CoresPerSocket=6 ThreadsPerCore=2 State=UNKNOWN
PartitionName=lab Nodes=ALL Default=YES MaxTime=INFINITE State=UP OverSubscribe=Force
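[Editor's aside, not part of the original exchange: one quick way to sanity-check a NodeName line like the one above is to run slurmd's built-in hardware probe on the node itself and compare its output against slurm.conf.]

```shell
# Print the node's detected hardware in slurm.conf NodeName format.
# Compare the CPUs/Sockets/CoresPerSocket/ThreadsPerCore/RealMemory
# values against the NodeName line in slurm.conf; a mismatch is a
# common reason a node ends up in the DRAIN state.
slurmd -C
```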
On Tue, Apr 9, 2024 at 12:57 PM Jeffrey R. Lang <JRLang@uwyo.edu> wrote:
Alison
The error message indicates that there are no resources to execute jobs. Since you haven’t defined any compute nodes you will get this error.
I would suggest that you create at least one compute node. Once you do that, this error should go away.
Jeff
From: Alison Peterson via slurm-users <slurm-users@lists.schedmd.com> Sent: Tuesday, April 9, 2024 2:52 PM To: slurm-users@lists.schedmd.com Subject: [slurm-users] Nodes required for job are down, drained or reserved
Hi everyone, I'm conducting some tests. I've just set up SLURM on the head node and haven't added any compute nodes yet. I'm trying to test it to ensure it's working, but I'm encountering an error: 'Nodes required for the job are DOWN, DRAINED, or reserved for jobs in higher priority partitions.'
Any guidance will be appreciated thank you!
--
Alison Peterson
IT Research Support Analyst, Information Technology
apeterson5@sdsu.edu
O: 619-594-3364
San Diego State University | SDSU.edu
5500 Campanile Drive | San Diego, CA 92182-8080