If you run "scontrol show jobid <jobid>" of your pending job with the "(Resources)" tag you may see more about what is unavailable to your job. Slurm default configs can cause an entire compute node of resources to be "allocated" to a running job regardless of whether it needs all of them or not so you may need to alter one or both of the following settings to allow more than one job to run on a single node at once. You'll find these in your slurm.conf. Don't forget to "scontrol reconf"…
[View More] and even potentially restart both "slurmctld" & "slurmd" on your nodes if you do end up making changes.
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
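As a rough sketch of the update sequence (assuming systemd-managed daemons), after editing slurm.conf on the controller:
scontrol reconfigure              # push the updated config out to the cluster
systemctl restart slurmctld       # on the controller, only if a restart turns out to be needed
systemctl restart slurmd          # on each compute node, only if a restart turns out to be needed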
I hope this helps.
Kind regards,
Jason
----
Jason Macklin
Manager Cyberinfrastructure, Research Cyberinfrastructure
860.837.2142 t | 860.202.7779 m
jason.macklin(a)jax.org
The Jackson Laboratory
Maine | Connecticut | California | Shanghai
www.jax.org
The Jackson Laboratory: Leading the search for tomorrow's cures
________________________________
From: slurm-users <slurm-users-bounces(a)lists.schedmd.com> on behalf of slurm-users-request(a)lists.schedmd.com <slurm-users-request(a)lists.schedmd.com>
Sent: Friday, January 19, 2024 9:24 AM
To: slurm-users(a)lists.schedmd.com <slurm-users(a)lists.schedmd.com>
Subject: [EXTERNAL] slurm-users Digest, Vol 75, Issue 31
Send slurm-users mailing list submissions to
slurm-users(a)lists.schedmd.com
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.schedmd.com/cgi-bin/mailman/listinfo/slurm-users
or, via email, send a message with subject or body 'help' to
slurm-users-request(a)lists.schedmd.com
You can reach the person managing the list at
slurm-users-owner(a)lists.schedmd.com
When replying, please edit your Subject line so it is more specific
than "Re: Contents of slurm-users digest..."
Today's Topics:
1. Re: Need help with running multiple instances/executions of a
batch script in parallel (with NVIDIA HGX A100 GPU as a Gres)
(Marko Markoc)
2. Re: Need help with running multiple instances/executions of a
batch script in parallel (with NVIDIA HGX A100 GPU as a Gres)
(Ümit Seren)
----------------------------------------------------------------------
Message: 1
Date: Fri, 19 Jan 2024 06:12:24 -0800
From: Marko Markoc <mmarkoc(a)pdx.edu>
To: Slurm User Community List <slurm-users(a)lists.schedmd.com>
Subject: Re: [slurm-users] Need help with running multiple
instances/executions of a batch script in parallel (with NVIDIA HGX
A100 GPU as a Gres)
Message-ID:
<CABnuMe4JTA0e6=VbO8D+To=8FGO+3Byv1dK_MC+OuRitzN5dXg(a)mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
+1 on checking the memory allocation.
Or add/check if you have any DefMemPerX set in your slurm.conf
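For example, something like this in slurm.conf gives every job a default per-CPU
memory limit unless it requests one explicitly (the value is only an illustration):
DefMemPerCPU=4096   # default MB of RAM per allocated CPU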
On Fri, Jan 19, 2024 at 12:33 AM mohammed shambakey <shambakey1(a)gmail.com>
wrote:
> Hi
>
> I'm not an expert, but is it possible that the currently running job is
> consuming the whole node because it is allocated the whole memory of the
> node (so the other 2 jobs had to wait until it finishes)?
> Maybe try restricting the required memory for each job?
>
> Regards
>
> On Thu, Jan 18, 2024 at 4:46 PM Ümit Seren <uemit.seren(a)gmail.com> wrote:
>
>> This line also has to be changed:
>>
>>
>> #SBATCH --gpus-per-node=4 → #SBATCH --gpus-per-node=1
>>
>> --gpus-per-node seems to be the new parameter that is replacing the --gres=
>> one, so you can remove the --gres line completely.
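>>
>> As a sketch (adapted from your script, so treat it as an untested example),
>> the GPU-related directives would then be just:
>>
>> #SBATCH --nodes=1
>> #SBATCH --gpus-per-node=1
>> #SBATCH --tasks-per-node=1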
>>
>>
>>
>> Best
>>
>> Ümit
>>
>>
>>
>> *From: *slurm-users <slurm-users-bounces(a)lists.schedmd.com> on behalf of
>> Kherfani, Hafedh (Professional Services, TC) <hafedh.kherfani(a)hpe.com>
>> *Date: *Thursday, 18. January 2024 at 15:40
>> *To: *Slurm User Community List <slurm-users(a)lists.schedmd.com>
>> *Subject: *Re: [slurm-users] Need help with running multiple
>> instances/executions of a batch script in parallel (with NVIDIA HGX A100
>> GPU as a Gres)
>>
>> Hi Noam and Matthias,
>>
>>
>>
>> Thanks both for your answers.
>>
>>
>>
>> I changed the "#SBATCH --gres=gpu:4" directive (in the batch script) to
>> "#SBATCH --gres=gpu:1" as you suggested, but it didn't make a difference:
>> running this batch script 3 times still results in the first job running
>> while the second and third jobs remain pending:
>>
>>
>>
>> [slurmtest@c-a100-master test-batch-scripts]$ cat gpu-job.sh
>>
>> #!/bin/bash
>>
>> #SBATCH --job-name=gpu-job
>>
>> #SBATCH --partition=gpu
>>
>> #SBATCH --nodes=1
>>
>> #SBATCH --gpus-per-node=4
>>
>> #SBATCH --gres=gpu:1               # <<<< Changed from "4" to "1"
>>
>> #SBATCH --tasks-per-node=1
>>
>> #SBATCH --output=gpu_job_output.%j
>>
>> #SBATCH --error=gpu_job_error.%j
>>
>>
>>
>> hostname
>>
>> date
>>
>> sleep 40
>>
>> pwd
>>
>>
>>
>> [slurmtest@c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
>>
>> Submitted batch job *217*
>>
>> [slurmtest@c-a100-master test-batch-scripts]$ squeue
>>
>> JOBID PARTITION NAME USER ST TIME NODES
>> NODELIST(REASON)
>>
>> 217 gpu gpu-job slurmtes R 0:02 1
>> c-a100-cn01
>>
>> [slurmtest@c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
>>
>> Submitted batch job *218*
>>
>> [slurmtest@c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
>>
>> Submitted batch job *219*
>>
>> [slurmtest@c-a100-master test-batch-scripts]$ squeue
>>
>> JOBID PARTITION NAME USER ST TIME NODES
>> NODELIST(REASON)
>>
>> 219 gpu gpu-job slurmtes *PD* 0:00 1
>> (Priority)
>>
>> 218 gpu gpu-job slurmtes *PD* 0:00 1
>> (Resources)
>>
>> 217 gpu gpu-job slurmtes *R* 0:07 1
>> c-a100-cn01
>>
>>
>>
>> Basically I'm seeking some help/hints on how to tell Slurm, from the
>> batch script for example: "I want only 1 or 2 GPUs to be used/consumed by
>> the job", and then run the batch script/job a couple of times with the
>> sbatch command and confirm that we can indeed have multiple jobs, each
>> using a GPU, running in parallel at the same time.
>>
>>
>>
>> Makes sense?
>>
>>
>>
>>
>>
>> Best regards,
>>
>>
>>
>> *Hafedh *
>>
>>
>>
>> *From:* slurm-users <slurm-users-bounces(a)lists.schedmd.com> *On Behalf
>> Of *Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)
>> *Sent:* Thursday, 18 January 2024 2:30 PM
>> *To:* Slurm User Community List <slurm-users(a)lists.schedmd.com>
>> *Subject:* Re: [slurm-users] Need help with running multiple
>> instances/executions of a batch script in parallel (with NVIDIA HGX A100
>> GPU as a Gres)
>>
>>
>>
>> On Jan 18, 2024, at 7:31 AM, Matthias Loose <m.loose(a)mindcode.de> wrote:
>>
>>
>>
>> Hi Hafedh,
>>
>> I'm no expert in the GPU side of Slurm, but looking at your current
>> configuration, to me it is working as intended at the moment. You have
>> defined 4 GPUs and start multiple jobs, each consuming 4 GPUs. So the jobs
>> wait for the resource to be free again.
>>
>> I think what you need to look into is the MPS plugin, which seems to do
>> what you are trying to achieve:
>> https://slurm.schedmd.com/gres.html#MPS_Management
>>
>>
>>
>> I agree with the first paragraph. How many GPUs are you expecting each
>> job to use? I'd have assumed, based on the original text, that each job is
>> supposed to use 1 GPU, and the 4 jobs were supposed to be running
>> side-by-side on the one node you have (with 4 GPUs). If so, you need to
>> tell each job to request only 1 GPU, and currently each one is requesting 4.
>>
>>
>>
>> If your jobs are actually supposed to be using 4 GPUs each, I still don't
>> see any advantage to MPS (at least in what is my usual GPU usage pattern):
>> all the jobs will take longer to finish, because they are sharing the fixed
>> resource. If they take turns, at least the first ones finish as fast as
>> they can, and the last one will finish no later than it would have if they
>> were all time-sharing the GPUs. I guess NVIDIA had something in mind when
>> they developed MPS, so I guess our pattern may not be typical (or at least
>> not universal), and in that case the MPS plugin may well be what you need.
>>
>
>
> --
> Mohammed
>
Hi all,
I am having an issue with the new version of Slurm, 23.11.0-1.
I had already installed and configured slurm 23.02.3-1 on my cluster and
all the services were active and running properly.
After installing the new version of Slurm with the same procedure, the
slurmctld and slurmdbd daemons fail to start, all with the same error:
(code=exited, status=217/USER)
And investigating the problem with the command journalctl -xe I find:
slurmctld.service: Failed to determine user credentials: No such process
slurmctld.service: Failed at step USER spawning /usr/sbin/slurmctld: No
such process
I had a look at the slurmctld.service file for both Slurm versions and
I found the following differences in the [Service] section.
From the slurmctld.service file of slurm 23.02.3-1:
[Service]
Type=simple
EnvironmentFile=-/etc/sysconfig/slurmctld
EnvironmentFile=-/etc/default/slurmctld
ExecStart=/usr/sbin/slurmctld -D -s $SLURMCTLD_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
LimitNOFILE=65536
TasksMax=infinity
From the slurmctld.service file of slurm 23.11.0-1:
[Service]
Type=notify
EnvironmentFile=-/etc/sysconfig/slurmctld
EnvironmentFile=-/etc/default/slurmctld
User=slurm
Group=slurm
ExecStart=/usr/sbin/slurmctld --systemd $SLURMCTLD_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
LimitNOFILE=65536
TasksMax=infinity
I think the presence of the new lines regarding the slurm user might be
the problem,
but I am not sure and I have no idea how to solve it.
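If it is relevant, I assume the new unit expects a local slurm account; something
like this should show whether one exists on the controller:
getent passwd slurm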
Can anyone help me?
Thanks in advance,
Miriam
Recently, I built an HPC cluster with Slurm as the workload manager. Test
jobs with quantum chemistry codes have worked fine. However, production
jobs with LAMMPS have shown unexpected behavior: when the first job
completes, normally or not, it causes the termination of the other jobs
on the same compute node. Initially, I thought this was due to an MPI
malfunction, but the behavior is also observed with the serial LAMMPS
code. The LAMMPS group told me this behavior could be generated by Slurm.
My question to you is: what parameter in slurm.conf could be responsible
for the termination of the other jobs? I am using an epilogue script that
works normally on another cluster.
Thanks.
Hi,
What are potential bad side effects of using a large/larger MessageTimeout?
And is there a value at which this setting is too large (long)?
Thanks,
Herc
Hello
I started a new AMD node, and the error is as follows:
"CPU frequency setting not configured for this node"
The extended log output looks like this:
[2024-01-18T18:28:06.682] CPU frequency setting not configured for this node
[2024-01-18T18:28:06.691] slurmd started on Thu, 18 Jan 2024 18:28:06 +0200
[2024-01-18T18:28:06.691] CPUs=128 Boards=1 Sockets=1 Cores=64 Threads=2
Memory=256786 TmpDisk=875797 Uptime=4569 CPUSpecList=(null)
FeaturesAvail=(null) FeaturesActive=(null)
In the configuration file I have the following:
NodeName=awn-1[04] NodeAddr=192.168.4.[111] CPUs=128 RealMemory=256000
Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 Feature=HyperThread
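If it helps, I believe the node's own view of its hardware can be printed with:
slurmd -C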
Could you please help me?
Thank you
Felix
--
Dr. Eng. Farcas Felix
National Institute of Research and Development of Isotopic and Molecular Technology,
IT - Department - Cluj-Napoca, Romania
Mobile: +40742195323
If you run "scontrol show jobid <jobid>" of your pending job with the "(Resources)" tag you may see more about what is unavailable to your job. Slurm default configs can cause an entire compute node of resources to be "allocated" to a running job regardless of whether it needs all of them or not so you may need to alter one or both of the following settings to allow more than one job to run on a single node at once. You'll find these in your slurm.conf. Don't forget to "scontrol reconf"…
[View More] and even potentially restart both "slurmctld" & "slurmd" on your nodes if you do end up making changes.
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
I hope this helps.
Kind regards,
Jason
----
Jason Macklin
Manager Cyberinfrastructure, Research Cyberinfrastructure
860.837.2142 t | 860.202.7779 m
jason.macklin(a)jax.org
The Jackson Laboratory
Maine | Connecticut | California | Shanghai
www.jax.org
The Jackson Laboratory: Leading the search for tomorrow's cures
________________________________
From: slurm-users <slurm-users-bounces(a)lists.schedmd.com> on behalf of slurm-users-request(a)lists.schedmd.com <slurm-users-request(a)lists.schedmd.com>
Sent: Thursday, January 18, 2024 9:46 AM
To: slurm-users(a)lists.schedmd.com <slurm-users(a)lists.schedmd.com>
Subject: [BULK] slurm-users Digest, Vol 75, Issue 26
Send slurm-users mailing list submissions to
slurm-users(a)lists.schedmd.com
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.schedmd.com/cgi-bin/mailman/listinfo/slurm-users
or, via email, send a message with subject or body 'help' to
slurm-users-request(a)lists.schedmd.com
You can reach the person managing the list at
slurm-users-owner(a)lists.schedmd.com
When replying, please edit your Subject line so it is more specific
than "Re: Contents of slurm-users digest..."
Today's Topics:
1. Re: Need help with running multiple instances/executions of a
batch script in parallel (with NVIDIA HGX A100 GPU as a Gres)
(Baer, Troy)
----------------------------------------------------------------------
Message: 1
Date: Thu, 18 Jan 2024 14:46:48 +0000
From: "Baer, Troy" <troy(a)osc.edu>
To: Slurm User Community List <slurm-users(a)lists.schedmd.com>
Subject: Re: [slurm-users] Need help with running multiple
instances/executions of a batch script in parallel (with NVIDIA HGX
A100 GPU as a Gres)
Message-ID:
<CH0PR01MB6924127AF471DED69151805BCF712(a)CH0PR01MB6924.prod.exchangelabs.com>
Content-Type: text/plain; charset="utf-8"
Hi Hafedh,
Your job script has the sbatch directive "--gpus-per-node=4" set. I suspect that if you look at what's allocated to the running job by doing "scontrol show job <jobid>" and looking at the TRES field, it's been allocated 4 GPUs instead of one.
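For example (using the job ID from your earlier output, purely as an illustration):
scontrol show job 217 | grep -i tres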
Regards,
--Troy
From: slurm-users <slurm-users-bounces(a)lists.schedmd.com> On Behalf Of Kherfani, Hafedh (Professional Services, TC)
Sent: Thursday, January 18, 2024 9:38 AM
To: Slurm User Community List <slurm-users(a)lists.schedmd.com>
Subject: Re: [slurm-users] Need help with running multiple instances/executions of a batch script in parallel (with NVIDIA HGX A100 GPU as a Gres)
Hi Noam and Matthias,
Thanks both for your answers.
I changed the "#SBATCH --gres=gpu:4" directive (in the batch script) to "#SBATCH --gres=gpu:1" as you suggested, but it didn't make a difference: running this batch script 3 times still results in the first job running while the second and third jobs remain pending:
[slurmtest@c-a100-master test-batch-scripts]$ cat gpu-job.sh
#!/bin/bash
#SBATCH --job-name=gpu-job
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --gpus-per-node=4
#SBATCH --gres=gpu:1               # <<<< Changed from "4" to "1"
#SBATCH --tasks-per-node=1
#SBATCH --output=gpu_job_output.%j
#SBATCH --error=gpu_job_error.%j
hostname
date
sleep 40
pwd
[slurmtest@c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
Submitted batch job 217
[slurmtest@c-a100-master test-batch-scripts]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
217 gpu gpu-job slurmtes R 0:02 1 c-a100-cn01
[slurmtest@c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
Submitted batch job 218
[slurmtest@c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
Submitted batch job 219
[slurmtest@c-a100-master test-batch-scripts]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
219 gpu gpu-job slurmtes PD 0:00 1 (Priority)
218 gpu gpu-job slurmtes PD 0:00 1 (Resources)
217 gpu gpu-job slurmtes R 0:07 1 c-a100-cn01
Basically I'm seeking some help/hints on how to tell Slurm, from the batch script for example: "I want only 1 or 2 GPUs to be used/consumed by the job", and then run the batch script/job a couple of times with the sbatch command and confirm that we can indeed have multiple jobs, each using a GPU, running in parallel at the same time.
Makes sense?
Best regards,
Hafedh
From: slurm-users <slurm-users-bounces(a)lists.schedmd.com> On Behalf Of Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)
Sent: Thursday, 18 January 2024 2:30 PM
To: Slurm User Community List <slurm-users(a)lists.schedmd.com>
Subject: Re: [slurm-users] Need help with running multiple instances/executions of a batch script in parallel (with NVIDIA HGX A100 GPU as a Gres)
On Jan 18, 2024, at 7:31 AM, Matthias Loose <m.loose(a)mindcode.de> wrote:
Hi Hafedh,
I'm no expert in the GPU side of Slurm, but looking at your current configuration, to me it is working as intended at the moment. You have defined 4 GPUs and start multiple jobs, each consuming 4 GPUs. So the jobs wait for the resource to be free again.
I think what you need to look into is the MPS plugin, which seems to do what you are trying to achieve:
https://slurm.schedmd.com/gres.html#MPS_Management
I agree with the first paragraph. How many GPUs are you expecting each job to use? I'd have assumed, based on the original text, that each job is supposed to use 1 GPU, and the 4 jobs were supposed to be running side-by-side on the one node you have (with 4 GPUs). If so, you need to tell each job to request only 1 GPU, and currently each one is requesting 4.
If your jobs are actually supposed to be using 4 GPUs each, I still don't see any advantage to MPS (at least in what is my usual GPU usage pattern): all the jobs will take longer to finish, because they are sharing the fixed resource. If they take turns, at least the first ones finish as fast as they can, and the last one will finish no later than it would have if they were all time-sharing the GPUs. I guess NVIDIA had something in mind when they developed MPS, so I guess our pattern may not be typical (or at least not universal), and in that case the MPS plugin may well be what you need.
Hello all,
Is there an env variable in SLURM to tell where the slurm.conf is?
We would like to have, on the same client node, two possible types of submission, addressing two different clusters.
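For example, we are hoping to be able to do something along these lines (the
variable name and paths here are only a guess):
SLURM_CONF=/etc/slurm-clusterA/slurm.conf sbatch job.sh
SLURM_CONF=/etc/slurm-clusterB/slurm.conf sbatch job.sh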
Thanks in advance,
Christine