If you run "scontrol show jobid <jobid>" of your pending job with the "(Resources)" tag you may see more about what is unavailable to your job. Slurm default configs can cause an entire compute node of resources to be "allocated" to a running job regardless of whether it needs all of them or not so you may need to alter one or both of the following settings to allow more than one job to run on a single node at once. You'll find these in your slurm.conf. Don't forget to "scontrol reconf"…
[View More] and even potentially restart both "slurmctld" & "slurmd" on your nodes if you do end up making changes.
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
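For instance (a minimal sketch; the grep pattern and the <jobid> placeholder are only illustrative), after editing slurm.conf you could push the change out and confirm the active values with:

scontrol reconfigure
scontrol show config | grep -E 'SelectType|SelectTypeParameters'
scontrol show job <jobid> | grep -iE 'Reason|TRES'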
I hope this helps.
Kind regards,
Jason
----
Jason Macklin
Manager Cyberinfrastructure, Research Cyberinfrastructure
860.837.2142 t | 860.202.7779 m
jason.macklin(a)jax.org
The Jackson Laboratory
Maine | Connecticut | California | Shanghai
www.jax.org
The Jackson Laboratory: Leading the search for tomorrow's cures
________________________________
From: slurm-users <slurm-users-bounces(a)lists.schedmd.com> on behalf of slurm-users-request(a)lists.schedmd.com <slurm-users-request(a)lists.schedmd.com>
Sent: Friday, January 19, 2024 9:24 AM
To: slurm-users(a)lists.schedmd.com <slurm-users(a)lists.schedmd.com>
Subject: [EXTERNAL]slurm-users Digest, Vol 75, Issue 31
Send slurm-users mailing list submissions to
slurm-users(a)lists.schedmd.com
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.schedmd.com/cgi-bin/mailman/listinfo/slurm-users
or, via email, send a message with subject or body 'help' to
slurm-users-request(a)lists.schedmd.com
You can reach the person managing the list at
slurm-users-owner(a)lists.schedmd.com
When replying, please edit your Subject line so it is more specific
than "Re: Contents of slurm-users digest..."
Today's Topics:
1. Re: Need help with running multiple instances/executions of a
batch script in parallel (with NVIDIA HGX A100 GPU as a Gres)
(Marko Markoc)
2. Re: Need help with running multiple instances/executions of a
batch script in parallel (with NVIDIA HGX A100 GPU as a Gres)
(Ümit Seren)
----------------------------------------------------------------------
Message: 1
Date: Fri, 19 Jan 2024 06:12:24 -0800
From: Marko Markoc <mmarkoc(a)pdx.edu>
To: Slurm User Community List <slurm-users(a)lists.schedmd.com>
Subject: Re: [slurm-users] Need help with running multiple
instances/executions of a batch script in parallel (with NVIDIA HGX
A100 GPU as a Gres)
Message-ID:
<CABnuMe4JTA0e6=VbO8D+To=8FGO+3Byv1dK_MC+OuRitzN5dXg(a)mail.gmail.com>
Content-Type: text/plain; charset="utf-8"
+1 on checking the memory allocation.
Or add/check if you have any DefMemPerX set in your slurm.conf
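For example (the values below are placeholders, not recommendations), the relevant slurm.conf knobs would look something like:

DefMemPerCPU=4000        # default MB of RAM per allocated CPU
DefMemPerGPU=16000       # default MB of RAM per allocated GPU (use one or the other)

Without some default, a job that does not request memory may be granted all of a node's memory, which blocks other jobs from being scheduled there.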
On Fri, Jan 19, 2024 at 12:33 AM mohammed shambakey <shambakey1(a)gmail.com>
wrote:
> Hi
>
> I'm not an expert, but is it possible that the currently running job is
> consuming the whole node because it has been allocated the whole memory of the
> node (so the other 2 jobs had to wait until it finishes)?
> Maybe if you try to restrict the required memory for each job?
>
> Regards
>
> On Thu, Jan 18, 2024 at 4:46 PM Ümit Seren <uemit.seren(a)gmail.com> wrote:
>
>> This line also has to be changed:
>>
>>
>> #SBATCH --gpus-per-node=4  ->  #SBATCH --gpus-per-node=1
>>
>> --gpus-per-node seems to be the new parameter that is replacing the --gres=
>> one, so you can remove the --gres line completely.
>>
>>
>>
>> Best
>>
>> Ümit
>>
>>
>>
>> *From: *slurm-users <slurm-users-bounces(a)lists.schedmd.com> on behalf of
>> Kherfani, Hafedh (Professional Services, TC) <hafedh.kherfani(a)hpe.com>
>> *Date: *Thursday, 18. January 2024 at 15:40
>> *To: *Slurm User Community List <slurm-users(a)lists.schedmd.com>
>> *Subject: *Re: [slurm-users] Need help with running multiple
>> instances/executions of a batch script in parallel (with NVIDIA HGX A100
>> GPU as a Gres)
>>
>> Hi Noam and Matthias,
>>
>>
>>
>> Thanks both for your answers.
>>
>>
>>
>> I changed the "#SBATCH --gres=gpu:4" directive (in the batch script) to
>> "#SBATCH --gres=gpu:1" as you suggested, but it didn't make a difference:
>> running this batch script 3 times still results in the first job being in
>> a running state, while the second and third jobs remain in a pending
>> state.
>>
>>
>>
>> [slurmtest@c-a100-master test-batch-scripts]$ cat gpu-job.sh
>>
>> #!/bin/bash
>>
>> #SBATCH --job-name=gpu-job
>>
>> #SBATCH --partition=gpu
>>
>> #SBATCH --nodes=1
>>
>> #SBATCH --gpus-per-node=4
>>
>> #SBATCH --gres=gpu:1              # <<<< Changed from "4" to "1"
>>
>> #SBATCH --tasks-per-node=1
>>
>> #SBATCH --output=gpu_job_output.%j
>>
>> #SBATCH --error=gpu_job_error.%j
>>
>>
>>
>> hostname
>>
>> date
>>
>> sleep 40
>>
>> pwd
>>
>>
>>
>> [slurmtest@c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
>>
>> Submitted batch job *217*
>>
>> [slurmtest@c-a100-master test-batch-scripts]$ squeue
>>
>> JOBID PARTITION NAME USER ST TIME NODES
>> NODELIST(REASON)
>>
>> 217 gpu gpu-job slurmtes R 0:02 1
>> c-a100-cn01
>>
>> [slurmtest@c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
>>
>> Submitted batch job *218*
>>
>> [slurmtest@c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
>>
>> Submitted batch job *219*
>>
>> [slurmtest@c-a100-master test-batch-scripts]$ squeue
>>
>> JOBID PARTITION NAME USER ST TIME NODES
>> NODELIST(REASON)
>>
>> 219 gpu gpu-job slurmtes *PD* 0:00 1
>> (Priority)
>>
>> 218 gpu gpu-job slurmtes *PD* 0:00 1
>> (Resources)
>>
>> 217 gpu gpu-job slurmtes *R* 0:07 1
>> c-a100-cn01
>>
>>
>>
>> Basically I'm seeking some help/hints on how to tell Slurm, from the
>> batch script for example, "I want only 1 or 2 GPUs to be used/consumed by
>> the job", so that I can run the batch script/job a couple of times with the
>> sbatch command and confirm that we can indeed have multiple jobs each using
>> a GPU and running in parallel, at the same time.
>>
>>
>>
>> Makes sense ?
>>
>>
>>
>>
>>
>> Best regards,
>>
>>
>>
>> *Hafedh *
>>
>>
>>
>> *From:* slurm-users <slurm-users-bounces(a)lists.schedmd.com> *On Behalf
>> Of *Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)
>> *Sent:* Thursday, 18 January 2024 2:30 PM
>> *To:* Slurm User Community List <slurm-users(a)lists.schedmd.com>
>> *Subject:* Re: [slurm-users] Need help with running multiple
>> instances/executions of a batch script in parallel (with NVIDIA HGX A100
>> GPU as a Gres)
>>
>>
>>
>> On Jan 18, 2024, at 7:31 AM, Matthias Loose <m.loose(a)mindcode.de> wrote:
>>
>>
>>
>> Hi Hafedh,
>>
>> I'm no expert on the GPU side of Slurm, but looking at your current
>> configuration, to me it is working as intended at the moment. You have defined
>> 4 GPUs and start multiple jobs, each consuming 4 GPUs. So the jobs wait
>> for the resource to be free again.
>>
>> I think what you need to look into is the MPS plugin, which seems to do
>> what you are trying to achieve:
>> https://slurm.schedmd.com/gres.html#MPS_Management
>>
>>
>>
>> I agree with the first paragraph. How many GPUs are you expecting each
>> job to use? I'd have assumed, based on the original text, that each job is
>> supposed to use 1 GPU, and the 4 jobs were supposed to be running
>> side-by-side on the one node you have (with 4 GPUs). If so, you need to
>> tell each job to request only 1 GPU, and currently each one is requesting 4.
>>
>>
>>
>> If your jobs are actually supposed to be using 4 GPUs each, I still don't
>> see any advantage to MPS (at least in what is my usual GPU usage pattern):
>> all the jobs will take longer to finish, because they are sharing the fixed
>> resource. If they take turns, at least the first ones finish as fast as
>> they can, and the last one will finish no later than it would have if they
>> were all time-sharing the GPUs. I guess NVIDIA had something in mind when
>> they developed MPS, so I guess our pattern may not be typical (or at least
>> not universal), and in that case the MPS plugin may well be what you need.
>>
>
>
> --
> Mohammed
>
Hi all,
I am having an issue with the new version of Slurm, 23.11.0-1.
I had already installed and configured slurm 23.02.3-1 on my cluster and
all the services were active and running properly.
After installing the new version of Slurm with the same procedure, the
slurmctld and slurmdbd daemons fail to start, all with the same error:
(code=exited, status=217/USER)
Investigating the problem with the command journalctl -xe, I find:
slurmctld.service: Failed to determine user credentials: No such process
slurmctld.service: Failed at step USER spawning /usr/sbin/slurmctld: No
such process
I had a look at the slurmctld.service file for both the slurm versions and
I found the following differences in the [Service] section.
From the slurmctld.service file of slurm 23.02.3-1:
[Service]
Type=simple
EnvironmentFile=-/etc/sysconfig/slurmctld
EnvironmentFile=-/etc/default/slurmctld
ExecStart=/usr/sbin/slurmctld -D -s $SLURMCTLD_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
LimitNOFILE=65536
TasksMax=infinity
From the slurmctld.service file of slurm 23.11.0-1:
[Service]
Type=notify
EnvironmentFile=-/etc/sysconfig/slurmctld
EnvironmentFile=-/etc/default/slurmctld
User=slurm
Group=slurm
ExecStart=/usr/sbin/slurmctld --systemd $SLURMCTLD_OPTIONS
ExecReload=/bin/kill -HUP $MAINPID
LimitNOFILE=65536
TasksMax=infinity
I think the presence of the new lines regarding the slurm user might be
the problem, but I am not sure and I have no idea how to solve it.
Can anyone help me?
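(One thing worth checking, offered only as a guess based on the new User=slurm/Group=slurm lines: status=217/USER usually means systemd could not resolve that account on the controller host. A rough sketch:)

id slurm                                                        # does the account resolve on this host?
useradd --system --no-create-home --shell /sbin/nologin slurm   # if missing; pick a uid consistent across the cluster
systemctl daemon-reload
systemctl restart slurmdbd slurmctld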
Thanks in advance,
Miriam
Recently, I have built an HPC cluster with Slurm as the workload manager. Test
jobs with quantum chemistry codes have worked fine. However, production
jobs with lammps have shown unexpected behavior: when the first job
completes, normally or not, it causes the termination of the others on the
same compute node. Initially, I thought that was due to an MPI malfunction,
but this behavior is also observed for the serial lammps code. The lammps group
told me that this behavior could be generated by Slurm. My question to you is
which parameter in slurm.conf could be responsible for the termination
of the other jobs. I am using an epilogue script that works normally on
another cluster.
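(A guess rather than a confirmed cause: a common source of this symptom is an epilog that cleans up all of the submitting user's processes on the node, which also kills their other jobs there. A hedged sketch of a guard, assuming the usual SLURM_JOB_USER and SLURMD_NODENAME epilog environment variables are available:)

#!/bin/bash
# Only clean up the user's leftover processes if they have no other
# running jobs on this node.
if [ -z "$(squeue -h -u "$SLURM_JOB_USER" -w "$SLURMD_NODENAME" -t running)" ]; then
    pkill -u "$SLURM_JOB_USER" || true
fi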
Thanks.
Hi,
What are potential bad side effects of using a large/larger MessageTimeout?
And is there a value at which this setting is too large (long)?
Thanks,
Herc
Hello
I started a new AMD node, and the error is as follows:
"CPU frequency setting not configured for this node"
The extended log looks like this:
[2024-01-18T18:28:06.682] CPU frequency setting not configured for this node
[2024-01-18T18:28:06.691] slurmd started on Thu, 18 Jan 2024 18:28:06 +0200
[2024-01-18T18:28:06.691] CPUs=128 Boards=1 Sockets=1 Cores=64 Threads=2
Memory=256786 TmpDisk=875797 Uptime=4569 CPUSpecList=(null)
FeaturesAvail=(null) FeaturesActive=(null)
In the configuration file I have the following:
NodeName=awn-1[04] NodeAddr=192.168.4.[111] CPUs=128 RealMemory=256000
Sockets=1 CoresPerSocket=64 ThreadsPerCore=2 Feature=HyperThread
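(A hedged aside, not a confirmed fix: this message generally just means slurmd could not find a usable cpufreq interface for the node, so one sanity check is whether the kernel exposes one at all:)

ls /sys/devices/system/cpu/cpu0/cpufreq/
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver    # e.g. acpi-cpufreq or amd-pstate

(If frequency control is actually wanted, the slurm.conf parameters CpuFreqDef and CpuFreqGovernors are the related knobs; if not, the message can usually be ignored.)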
Could you please help me?
Thank you
Felix
--
Dr. Eng. Farcas Felix
National Institute of Research and Development of Isotopic and Molecular Technology,
IT - Department - Cluj-Napoca, Romania
Mobile: +40742195323
If you run "scontrol show jobid <jobid>" of your pending job with the "(Resources)" tag you may see more about what is unavailable to your job. Slurm default configs can cause an entire compute node of resources to be "allocated" to a running job regardless of whether it needs all of them or not so you may need to alter one or both of the following settings to allow more than one job to run on a single node at once. You'll find these in your slurm.conf. Don't forget to "scontrol reconf"…
[View More] and even potentially restart both "slurmctld" & "slurmd" on your nodes if you do end up making changes.
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory
I hope this helps.
Kind regards,
Jason
----
Jason Macklin
Manager Cyberinfrastructure, Research Cyberinfrastructure
860.837.2142 t | 860.202.7779 m
jason.macklin(a)jax.org
The Jackson Laboratory
Maine | Connecticut | California | Shanghai
www.jax.org
The Jackson Laboratory: Leading the search for tomorrow's cures
________________________________
From: slurm-users <slurm-users-bounces(a)lists.schedmd.com> on behalf of slurm-users-request(a)lists.schedmd.com <slurm-users-request(a)lists.schedmd.com>
Sent: Thursday, January 18, 2024 9:46 AM
To: slurm-users(a)lists.schedmd.com <slurm-users(a)lists.schedmd.com>
Subject: [BULK] slurm-users Digest, Vol 75, Issue 26
Send slurm-users mailing list submissions to
slurm-users(a)lists.schedmd.com
To subscribe or unsubscribe via the World Wide Web, visit
https://lists.schedmd.com/cgi-bin/mailman/listinfo/slurm-users
or, via email, send a message with subject or body 'help' to
slurm-users-request(a)lists.schedmd.com
You can reach the person managing the list at
slurm-users-owner(a)lists.schedmd.com
When replying, please edit your Subject line so it is more specific
than "Re: Contents of slurm-users digest..."
Today's Topics:
1. Re: Need help with running multiple instances/executions of a
batch script in parallel (with NVIDIA HGX A100 GPU as a Gres)
(Baer, Troy)
----------------------------------------------------------------------
Message: 1
Date: Thu, 18 Jan 2024 14:46:48 +0000
From: "Baer, Troy" <troy(a)osc.edu>
To: Slurm User Community List <slurm-users(a)lists.schedmd.com>
Subject: Re: [slurm-users] Need help with running multiple
instances/executions of a batch script in parallel (with NVIDIA HGX
A100 GPU as a Gres)
Message-ID:
<CH0PR01MB6924127AF471DED69151805BCF712(a)CH0PR01MB6924.prod.exchangelabs.com>
Content-Type: text/plain; charset="utf-8"
Hi Hafedh,
Your job script has the sbatch directive "--gpus-per-node=4" set. I suspect that if you look at what's allocated to the running job by doing "scontrol show job <jobid>" and looking at the TRES field, it's been allocated 4 GPUs instead of one.
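For example (using job 217 from the thread; the exact field names can vary a little between Slurm versions), something like the following should show the GPU count actually granted:

scontrol show job 217 | grep -i tres
# expect to see gres/gpu=4 here if the job really was given four GPUs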
Regards,
--Troy
From: slurm-users <slurm-users-bounces(a)lists.schedmd.com> On Behalf Of Kherfani, Hafedh (Professional Services, TC)
Sent: Thursday, January 18, 2024 9:38 AM
To: Slurm User Community List <slurm-users(a)lists.schedmd.com>
Subject: Re: [slurm-users] Need help with running multiple instances/executions of a batch script in parallel (with NVIDIA HGX A100 GPU as a Gres)
Hi Noam and Matthias,
Thanks both for your answers.
I changed the "#SBATCH --gres=gpu:4" directive (in the batch script) to "#SBATCH --gres=gpu:1" as you suggested, but it didn't make a difference: running this batch script 3 times still results in the first job being in a running state, while the second and third jobs remain in a pending state.
[slurmtest@c-a100-master test-batch-scripts]$ cat gpu-job.sh
#!/bin/bash
#SBATCH --job-name=gpu-job
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --gpus-per-node=4
#SBATCH --gres=gpu:1              # <<<< Changed from "4" to "1"
#SBATCH --tasks-per-node=1
#SBATCH --output=gpu_job_output.%j
#SBATCH --error=gpu_job_error.%j
hostname
date
sleep 40
pwd
[slurmtest@c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
Submitted batch job 217
[slurmtest@c-a100-master test-batch-scripts]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
217 gpu gpu-job slurmtes R 0:02 1 c-a100-cn01
[slurmtest@c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
Submitted batch job 218
[slurmtest@c-a100-master test-batch-scripts]$ sbatch gpu-job.sh
Submitted batch job 219
[slurmtest@c-a100-master test-batch-scripts]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
219 gpu gpu-job slurmtes PD 0:00 1 (Priority)
218 gpu gpu-job slurmtes PD 0:00 1 (Resources)
217 gpu gpu-job slurmtes R 0:07 1 c-a100-cn01
Basically I'm seeking some help/hints on how to tell Slurm, from the batch script for example, "I want only 1 or 2 GPUs to be used/consumed by the job", so that I can run the batch script/job a couple of times with the sbatch command and confirm that we can indeed have multiple jobs each using a GPU and running in parallel, at the same time.
Makes sense ?
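For reference, a minimal version of the script that follows the advice earlier in the thread (keep --gpus-per-node at 1 and drop the conflicting --gres line) might look like this; untested sketch:

#!/bin/bash
#SBATCH --job-name=gpu-job
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --gpus-per-node=1        # request a single GPU per node
#SBATCH --tasks-per-node=1
#SBATCH --output=gpu_job_output.%j
#SBATCH --error=gpu_job_error.%j
hostname
date
sleep 40
pwd

With this, up to four such jobs should be able to run side by side on the 4-GPU node, memory permitting.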
Best regards,
Hafedh
From: slurm-users <slurm-users-bounces(a)lists.schedmd.com<mailto:slurm-users-bounces@lists.schedmd.com>> On Behalf Of Bernstein, Noam CIV USN NRL (6393) Washington DC (USA)
Sent: Thursday, 18 January 2024 2:30 PM
To: Slurm User Community List <slurm-users(a)lists.schedmd.com<mailto:slurm-users@lists.schedmd.com>>
Subject: Re: [slurm-users] Need help with running multiple instances/executions of a batch script in parallel (with NVIDIA HGX A100 GPU as a Gres)
On Jan 18, 2024, at 7:31 AM, Matthias Loose <m.loose(a)mindcode.de<mailto:m.loose@mindcode.de>> wrote:
Hi Hafedh,
I'm no expert on the GPU side of Slurm, but looking at your current configuration, to me it is working as intended at the moment. You have defined 4 GPUs and start multiple jobs, each consuming 4 GPUs. So the jobs wait for the resource to be free again.
I think what you need to look into is the MPS plugin, which seems to do what you are trying to achieve:
https://slurm.schedmd.com/gres.html#MPS_Management
I agree with the first paragraph. How many GPUs are you expecting each job to use? I'd have assumed, based on the original text, that each job is supposed to use 1 GPU, and the 4 jobs were supposed to be running side-by-side on the one node you have (with 4 GPUs). If so, you need to tell each job to request only 1 GPU, and currently each one is requesting 4.
If your jobs are actually supposed to be using 4 GPUs each, I still don't see any advantage to MPS (at least in what is my usual GPU usage pattern): all the jobs will take longer to finish, because they are sharing the fixed resource. If they take turns, at least the first ones finish as fast as they can, and the last one will finish no later than it would have if they were all time-sharing the GPUs. I guess NVIDIA had something in mind when they developed MPS, so I guess our pattern may not be typical (or at least not universal), and in that case the MPS plugin may well be what you need.
Hello all,
Is there an env variable in SLURM to tell where the slurm.conf is?
We would like to have, on the same client node, two possible types of submission addressing two different clusters.
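(The environment variable is SLURM_CONF; a minimal sketch, with the path below being only an example:)

export SLURM_CONF=/etc/slurm-clusterB/slurm.conf
sinfo          # now talks to the controller named in that file

A small wrapper script or shell alias per cluster is one common way to switch between the two configurations.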
Thanks in advance,
Christine
Hi,
In my HPC center, I found a Slurm job that was submitted with --gres=gpu:6, whereas the cluster has only four GPUs per node. It is a parallel job. Here is a printout of some relevant fields:
AllocCPUS 30
AllocGRES gpu:6
AllocTRES billing=30,cpu=30,gres/gpu=6,node=3
CPUTime 1-01:23:00
CPUTimeRAW 91380
Elapsed 00:50:46
JobID 20073
JobIDRaw 20073
JobName simple_cuda
NCPUS 30
NGPUS 6.0
What happened in this case? This job was asking for 3 nodes, 10 cores per node. When the user specified "--gres=gpu:6", does this mean six GPUs for the entire job, or six GPUs per node? Per the description in https://slurm.schedmd.com/gres.html#Running_Jobs, gres is "Generic resources required per node", so it would be illogical to request six GPUs per node. So what happened? Did Slurm quietly ignore the request and grant just one, or grant the max number (4)? Because apparently the job ran without error.
Wirawan Purwanto
Computational Scientist, HPC Group
Information Technology Services
Old Dominion University
Norfolk, VA 23529
Dear All,
I tried to implement a strict limit on the GrpTRESMins for
each user. The effect I'm trying to achieve is that after the
limit of GPU minutes is reached, no new jobs can be run.
No decay, no automatic resource replenishment. After the
limit on GPU minutes is reached, each user should ask for
more minutes.
But despite exceeding the limits users *can* run new jobs.
* When I'm adding a user to the cluster I set:
sacctmgr --immediate add user name=...
...
QOS=2gpu2d
GrpTRESMins=gres/gpu=20000
* In the "slurm.conf" ("safe" means limits and associations
are automatically set). Storage is MariaDB with SlurmDBD:
GresTypes=gpu
AccountingStorageTRES=gres/gpu
AccountingStorageEnforce=qos,safe
# This disables GPU minutes replenishing.
PriorityDecayHalfLife=0
PriorityUsageResetPeriod=NONE
But when I look at a user's account info and usage, you can
see that the limits are not enforced.
Account User Partition QOS GrpTRESMins
---------- ---------------- ------------ ------------ --------------------
redacted redacted a6000 2gpu2d
gres/gpu=10000
--------------------------------------------------------------------------------
Top 1 Users 2024-01-05T00:00:00 - 2024-01-17T19:59:59 (1108800 secs)
Usage reported in TRES Minutes
--------------------------------------------------------------------------------
Login Used TRES Name
------------ -------- ----------------
redacted 184311 gres/gpu
redacted 1558558 cpu
Could someone explain where the problem could be? Am I missing
something? Apparently yes :)
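(As a first debugging step, and only a suggestion: it is worth confirming that the limit is actually stored where the jobs are accounted, e.g. with something like)

sacctmgr show assoc where user=<login> format=Cluster,Account,User,QOS,GrpTRESMins
sacctmgr show qos where name=2gpu2d format=Name,GrpTRESMins,MaxTRESMins

(Limits can live on either the QOS or the user's association, and the two places are easy to mix up.)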
Kind regards
--
Kamil Wilczek [https://keys.openpgp.org/]
[D415917E84B8DA5A60E853B6E676ED061316B69B]
I would like to add a preemptable queue to our cluster. Actually, I already
have. We simply want jobs submitted to that queue to be preempted if there are
no resources available for jobs in the other (high-priority) queues.
Conceptually very simple: no conditionals, no choices, just what I wrote.
However, it does not work as desired.
This is the relevant part:
grep -i Preemp /opt/slurm/slurm.conf
#PreemptType = preempt/partition_prio
PartitionName=regular DefMemPerCPU=4580 Default=True Nodes=node[01-12] State=UP
PreemptMode=off PriorityTier=200
PartitionName=All DefMemPerCPU=4580 Nodes=node[01-36] State=UP
PreemptMode=off PriorityTier=500
PartitionName=lowpriority DefMemPerCPU=4580 Nodes=node[01-36] State=UP
PreemptMode=cancel PriorityTier=100
That PreemptType setting (now commented) fully breaks slurm, everything
refuses to run with errors like
$ squeue
squeue: error: PreemptType and PreemptMode values incompatible
squeue: fatal: Unable to process configuration file
If I understand correctly the documentation at
https://slurm.schedmd.com/preempt.html that is because preemption cannot
cancel jobs based on partition priority, which (if true) is really
unfortunate. I understand that allowing cross-partition time-slicing could
be tricky and so I understand why that isn't allowed, but cancelling?
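(A hedged aside on the error itself, assuming it comes from the cluster-wide default: with PreemptType=preempt/partition_prio the global PreemptMode cannot stay at its default of OFF, so a configuration along these lines is the usual shape; per-partition PreemptMode then overrides it for partitions that must never be preempted. The "..." stands for the rest of the existing partition options:)

PreemptType=preempt/partition_prio
PreemptMode=CANCEL
PartitionName=regular ... PriorityTier=200 PreemptMode=off
PartitionName=All ... PriorityTier=500 PreemptMode=off
PartitionName=lowpriority ... PriorityTier=100 PreemptMode=cancel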
Anyway, I have a few questions:
1) is that correct and so should I avoid using either partition priority or
cancelling?
2) is there an easy way to trick slurm into requeing and then have those
jobs cancelled instead?
3) I guess the cleanest option would be to implement QoS, but I've never
done it and we don't really need it for anything else other than this. The
documentation looks complicated, but is it? The great Ole's website is
unavailable at the moment...
Thanks!!
Yes, that makes sense. Thank you!
What am I misunderstanding about how sacct filtering works here? I would have expected the second command to show the exact same results as the first.
[root@mickey ddrucker]# sacct --starttime $(date -d "7 days ago" +"%Y-%m-%d") -X --format JobID,JobName,State,Elapsed --name zsh
JobID JobName State Elapsed
------------ ---------- ---------- ----------
257713 zsh COMPLETED 00:01:02
257714 zsh COMPLETED 00:04:01
257715 zsh COMPLETED 00:03:01
257716 zsh COMPLETED 00:03:01
[root@mickey ddrucker]# sacct --starttime $(date -d "7 days ago" +"%Y-%m-%d") -X --format JobID,JobName,State,Elapsed --name zsh --state COMPLETED
JobID JobName State Elapsed
------------ ---------- ---------- ----------
[root@mickey ddrucker]# sinfo --version
slurm 21.08.8-2
--
Daniel M. Drucker, Ph.D.
Director of IT, MGB Imaging at Belmont
McLean Hospital, a Harvard Medical School Affiliate
> All I can say is that this has to do with --starttime and that you have to read the manual really carefully about how they interact, including when you have --endtime set. It’s a bit fiddly and annoying, IMO, and I can never quite remember how it works.
Oh, I think I understand. --starttime actually behaves differently when --state is present:
If states are given with the '-s' option then only jobs in this state at this time will be returned.
So is there a way to do what I want? I want to see jobs which
- started later than 7 days ago
- whose state is COMPLETED
Surely that's possible without resorting to grep?
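(For what it's worth, a hedged suggestion: when --state is given, sacct also wants an explicit end time, so adding -E usually restores the expected window, e.g.)

sacct -S $(date -d "7 days ago" +%F) -E now -X --name zsh --state COMPLETED --format JobID,JobName,State,Elapsed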
Daniel
We have shuttered two clusters and need to remove them from the database. To do this, do we remove the table spaces associated with the cluster names from the Slurm database?
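(A hedged note rather than confirmed advice: rather than touching tables by hand, sacctmgr has a supported way to do this, along the lines of)

sacctmgr show cluster
sacctmgr delete cluster old_cluster_name

(where old_cluster_name is a placeholder for each shuttered cluster's name; this removes the cluster and its associations from the accounting database.)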
Thanks,
Jeff
All good ideas Mick –
- I've restarted slurmd on all nodes – no effect
- Ran this on all nodes:
#!/bin/bash
uname -n
id slurm
id 59999
scontrol show config | grep SlurmUser
All show slurm being that 59999 user.
- The firewalld already has the internal network interface being used set to the trusted zone
I get a bit more info out of setting the slurmctld to debug level, but I'm not sure what to make of it TBH. I'm not sure what "_handle_mult_rc_ret: PERSIST_RC is 2002 from DBD_SEND_MULT_MSG(1474)" is trying to tell me.
Jan 10 10:48:26 kirby slurmctld[461138]: slurmctld: error: DBD_SEND_MULT_JOB_START message from invalid uid
Jan 10 10:48:31 kirby slurmctld[461138]: slurmctld: error: DBD_SEND_MULT_JOB_START message from invalid uid
Jan 10 10:48:35 kirby slurmctld[461138]: slurmctld: debug: accounting_storage/slurmdbd: _handle_mult_rc_ret: PERSIST_RC is 2002 from DBD_SEND_MULT_MSG(1474): DBD_SEND_MULT_MSG message from invalid uid
Jan 10 10:48:36 kirby slurmctld[461138]: slurmctld: error: DBD_SEND_MULT_JOB_START message from invalid uid
Jan 10 10:48:41 kirby slurmctld[461138]: slurmctld: error: DBD_SEND_MULT_JOB_START message from invalid uid
Jan 10 10:48:46 kirby slurmctld[461138]: slurmctld: error: DBD_SEND_MULT_JOB_START message from invalid uid
Jan 10 10:48:46 kirby slurmctld[461138]: slurmctld: debug: sched/backfill: _attempt_backfill: beginning
Jan 10 10:48:46 kirby slurmctld[461138]: slurmctld: debug: sched/backfill: _attempt_backfill: no jobs to backfill
Jan 10 10:48:51 kirby slurmctld[461138]: slurmctld: error: DBD_SEND_MULT_JOB_START message from invalid uid
Jan 10 10:48:53 kirby slurmctld[461138]: slurmctld: debug: accounting_storage/slurmdbd: _handle_mult_rc_ret: PERSIST_RC is 2002 from DBD_SEND_MULT_MSG(1474): DBD_SEND_MULT_MSG message from invalid uid
Jan 10 10:48:56 kirby slurmctld[461138]: slurmctld: error: DBD_SEND_MULT_JOB_START message from invalid uid
Jan 10 10:49:01 kirby slurmctld[461138]: slurmctld: error: DBD_SEND_MULT_JOB_START message from invalid uid
Jan 10 10:49:06 kirby slurmctld[461138]: slurmctld: error: DBD_SEND_MULT_JOB_START message from invalid uid
Jan 10 10:49:11 kirby slurmctld[461138]: slurmctld: debug: accounting_storage/slurmdbd: _handle_mult_rc_ret: PERSIST_RC is 2002 from DBD_SEND_MULT_MSG(1474): DBD_SEND_MULT_MSG message from invalid uid
Jan 10 10:49:11 kirby slurmctld[461138]: slurmctld: error: DBD_SEND_MULT_JOB_START message from invalid uid
Jan 10 10:49:16 kirby slurmctld[461138]: slurmctld: error: DBD_SEND_MULT_JOB_START message from invalid uid
Jan 10 10:49:17 kirby slurmctld[461138]: slurmctld: debug: sched: Running job scheduler for full queue.
Jan 10 10:49:21 kirby slurmctld[461138]: slurmctld: error: DBD_SEND_MULT_JOB_START message from invalid uid
A bit more info / another possible clue. While "sacctmgr list Account" or "sacctmgr list user" shows expected account groups and users, "sreport user top start=12/1/23" and "sreport cluster utilization start=12/1/23" both report empty tables.
Craig Stark, Ph.D.
Professor, Department of Neurobiology and Behavior
Director, Facility for Imaging and Brain Research (FIBRE)
Director, Campus Center for Neuroimaging (CCNI)
School of Biological Sciences, University of California, Irvine
cestark(a)uci.edu<mailto:cestark@uci.edu>
I'm just learning about Slurm. I understand that different
partitions can be prioritized separately, and can have different max time
limits. I was wondering whether or not there was a way to have a
finer-grained prioritization based on the time limit specified by a job,
within a single partition. Or perhaps this is already happening by default?
Would the backfill scheduler be best for this?
This ticket with SchedMD implies it's a munged issue:
https://bugs.schedmd.com/show_bug.cgi?id=1293
Is the munge daemon running on all systems? If it is, are all servers running a network time daemon such as chronyd or ntpd, and is the time in sync on all hosts?
Thanks Mick,
munge is seemingly running on all systems (systemctl status munge). I do get a warning about the munge file changing on disk, but I'm pretty sure that's from warewulf sync'ing files every minute. A sha256sum on the munge.key file on the compute nodes and host node says they're the same, so I think I can put that aside.
The management node runs chrony and the compute nodes sync to the management node.
[root@kirby uber]# chronyc tracking
Reference ID : 4A06A849 (t2.time.gq1.yahoo.com)
Stratum : 3
Ref time (UTC) : Mon Jan 08 22:26:44 2024
System time : 0.000032525 seconds slow of NTP time
Last offset : -0.000021390 seconds
RMS offset : 0.000055729 seconds
Frequency : 38.797 ppm slow
Residual freq : +0.001 ppm
Skew : 0.018 ppm
Root delay : 0.033342984 seconds
Root dispersion : 0.000524800 seconds
Update interval : 256.8 seconds
Leap status : Normal
vs
[root@sonic01 ~]# chronyc tracking
Reference ID : C0A80102 (warewulf)
Stratum : 4
Ref time (UTC) : Mon Jan 08 22:31:02 2024
System time : 0.000000120 seconds slow of NTP time
Last offset : -0.000000092 seconds
RMS offset : 0.000014737 seconds
Frequency : 47.495 ppm slow
Residual freq : +0.000 ppm
Skew : 0.066 ppm
Root delay : 0.033458963 seconds
Root dispersion : 0.000283949 seconds
Update interval : 64.2 seconds
Leap status : Normal
So, the compute node is talking to the host and the host is talking to generic NTP sources. "date" shows the same time on the compute nodes
3rd time trying to get this to come through to the list - hopefully this time works.
I've been running Slurm for several years now, but in setting it up on a new cluster, I'm hitting a recurring issue. I'm using MariaDB and configured it just as I had in my several-year-old setup and as in the docs. There's a "slurm" user (59999) on the OS (Rocky 9) that's on all the nodes, and I've added the slurm@localhost user as instructed (grant all on slurm_acct_db.* TO 'slurm'@'localhost' identified by 'PASSWORD'). But I keep getting things like this:
```
Dec 22 14:22:07 kirby slurmdbd[14518]: slurmdbd: error: DBD_SEND_MULT_MSG message from invalid uid 59999
Dec 22 14:22:07 kirby slurmdbd[14518]: slurmdbd: error: Processing last message from connection 7(192.168.1.2) uid(59999)
Dec 22 14:22:07 kirby slurmdbd[14518]: slurmdbd: error: CONN:7 DBD_REGISTER_CTLD message from invalid uid 59999
Dec 22 14:22:07 kirby slurmdbd[14518]: slurmdbd: error: CONN:7 Security violation, DBD_REGISTER_CTLD
Dec 22 14:22:07 kirby slurmdbd[14518]: slurmdbd: error: Processing last message from connection 7(192.168.1.2) uid(59999)
```
I'm a total SQL noob, but can at least verify that the user is in there:
MariaDB [(none)]> SELECT User, Host, Password FROM mysql.user;
+-------------+-----------+-------------------------------------------+
| User | Host | Password |
+-------------+-----------+-------------------------------------------+
| mariadb.sys | localhost | |
| root | localhost | invalid |
| mysql | localhost | invalid |
| slurm | localhost | *D6665ECF4F3CB12BCA836117F7727B6D0B78D644 |
+-------------+-----------+-------------------------------------------+
4 rows in set (0.002 sec)
Any thoughts as to where I might look to fix this?
Craig
Dear All,
I have a question regarding the fair-share factor of the multifactor
priority algorithm. My current understanding is that the fair-share
makes sure that different *accounts* have a fair share of the
computational power.
But what if my organisation's structure is flat and I have only one
account where all my users reside? Does the fair-share algorithm still work
in this situation -- does it take into account users (associations)
from this single account and try to assign a fair-share factor to each
user? Or does each user from this account have the same fair-share factor at
each iteration?
And what if I have, say, 3 accounts, but I do not want to calculate
fair-share between accounts, but between all associations from all
3 accounts? In other words, is there a fair-share factor for
users/associations instead of accounts?
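(For checking what is happening in practice, a small suggestion: the per-association picture, with one line per user inside each account, can be inspected with something like)

sshare -a -l

(which lists every association with its normalized shares, effective usage and resulting FairShare factor.)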
Kind regards
--
Kamil Wilczek [https://keys.openpgp.org/]
[D415917E84B8DA5A60E853B6E676ED061316B69B]
Hi all,
Happy new year everyone!
I've been looking for a simple tool that reports how many resources are
actually consumed by a job, to help my colleagues and me adjust job
requirements. I could not find such a tool, or the ones mentioned on
this ML were not easy to install and use, so I have written a new one:
https://github.com/CEA-LIST/sprofile
It's a simple Python script which parses cgroup data and NVML data from the
NVIDIA driver. It reports duration, CPU load, peak RAM, GPU load and peak
GPU memory like so:
-- sprofile report (node03) --
Time:          0:00:03 / 1:00:00
CPU load:      2.0 / 4.0
RAM peak mem:  7G / 8G
GPU load:      0.2 / 2.0
GPU peak mem:  7G / 40G
The requirements are to use the slurm cgroup plugin and to enable
accounting on the GPU (nvidia-smi --accounting-mode=1).
I hope you find this useful; let me know if you find bugs or want to
contribute.
Regards,
Nicolas Granger
Hello,
We are soon to install new Slurm cluster at our site. That means that we will have a total of three clusters running Slurm. Only two, that is the new clusters, will share a common file system. The original cluster has its own file system is independent of the new arrivals. If possible, we would like to try to prevent users from making significant user of all the clusters and get a 'triple whammy'. In other words, is there any way to share the fairshare information between the clusters …
[View More]so that a user's use of one of the clusters impacts their usage on the other clusters – if that makes sense. Does anyone have any thoughts on this question, please?
Am I correct in thinking that federating clusters is related to my question? Do I gather correctly, however, that federation only works if there is a common database on a shared file system?
Best regards,
David
I get an HTTP 404 when I try to GET /slurmdb/v0.0.39/clusters or any other /slurmdb endpoint. I get this against multiple versions of Slurm, including 23.11.1. Using GET against /slurm/v0.0.39/ping works just fine. Is there something I need to do to turn slurmdb endpoints on?
--
Gary
In my case I would like to use a Slurm cluster as a CI/CD-like solution for building software images.
My current scripted full-system build takes 3-5 hours and is done serially. We could easily find places where we can choose to build things in parallel, hence the idea is to spawn parallel builds on the Linux Slurm cluster.
Example: we have a list of images we iterate over in a for loop to build each thing; the steps are: cd somedir, then type make or run a shell script in that directory.
The last step after the for loop would be to wait for all of the child builds to complete.
Once all child jobs are done we have a single job that combines or packages all the intermediate images.
We really want to use Slurm because our FPGA team will have a giant Slurm Linux cluster for Xilinx FPGA builds, and those nodes can do what we need for software purposes (reusing the existing cluster is a huge win for us).
My question is this:
Can somebody point me to some software build examples using Slurm? All I can seem to find is how to install it.
I see the srun and sbatch command man pages but no good examples.
Bonus would be something that integrates into a gitlab runner example or Jenkins in some way
All I can seem to find is how to install and administer Slurm, not how to use it.
Sent from my iPhone
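(A hedged sketch of one way to do this, with all directory, script and job names being hypothetical; it submits one build job per image with sbatch --parsable, then a packaging job that only starts when every build has succeeded:)

#!/bin/bash
# Submit each image build as its own Slurm job and collect the job IDs.
deps=""
for dir in image_a image_b image_c; do
    jid=$(sbatch --parsable --job-name="build-$dir" --wrap "make -C $dir")
    deps="$deps:$jid"
done

# Final packaging job runs only after all builds completed successfully.
sbatch --job-name=package --dependency=afterok${deps} --wrap "./package_images.sh"

(sbatch --wait, srun jobs backgrounded with a shell wait, or a job array are other reasonable shapes; a GitLab runner or Jenkins agent would typically just invoke a script like this on a login node.)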
Hi all,
A problem on slurm-23.02.4-1, 10.6.16-MariaDB; Maria and Slurmctld in
active/active, SlurmDB in active/off, shared IP. Shared spool via Gluster.
DB is an upgraded version of Slurm from somewhere 2017 (upgraded various
times). The question is whether we should give up and start from scratch or
if there's an easy fix.
Problem: whenever we add a new user and add it to sacctmgr, the user shows
up properly in sacct/sacctmgr – but never shows up in the sshare commands
after running some jobs. After restarting Slurm a couple of times it shows
up. The problem seems to have been there in the previous version as well.
Only error we can see in slurmdb log:
[2023-12-21T09:43:30.586] error: slurm_persist_conn_open: Something
happened with the receiving/processing of the persistent connection init
message to 10.141.255.253:6817
: (null)
[2023-12-21T09:43:30.586] error: slurmdb_send_accounting_update_persist:
Unable to open connection to registered cluster cluster.
[2023-12-21T09:43:30.586] error: slurm_receive_msg: No response to
persist_init
[2023-12-21T09:43:30.586] error: update cluster: No error to cluster at
10.141.255.253(6817)
[2023-12-21T09:43:30.586] debug2: DBD_FINI: CLOSE:1 COMMIT:0
[2023-12-21T09:43:30.586] debug4: accounting_storage/as_mysql:
acct_storage_p_commit: got 0 commits
AccountingStorageType=accounting_storage/slurmdbd
# jobaccounting
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
SlurmctldTimeout=60
SlurmdTimeout=60
TCPTimeout=60
MessageTimeout=60
Best regards,
Alex
Hi all,
I had a slurm partition gpu_gmx with the following configuration (Slurm version: 20.11.9):
> NodeName=node[09-11] Gres=gpu:rtx4080:1 Sockets=1 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=62000 State=UNKNOWN
> NodeName=node[12-14] Gres=gpu:rtx4070ti:1 Sockets=1 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=62000 State=UNKNOWN
> PartitionName=gpu_gmx Nodes=node[09-14] Default=NO MaxTime=UNLIMITED State=UP
A job running on node11 had a problem, which then triggered the reboot of all nodes (node[09-14]) within the same partition (cat /var/log/slurmctld.log):
> [2023-12-26T23:04:23.200] Batch JobId=25061 missing from batch node node11 (not found BatchStartTime after startup), Requeuing job
> [2023-12-26T23:04:23.200] _job_complete: JobId=25061 WTERMSIG 126
> [2023-12-26T23:04:23.200] _job_complete: JobId=25061 cancelled by node failure
> [2023-12-26T23:04:23.200] _job_complete: requeue JobId=25061 due to node failure
> [2023-12-26T23:04:23.200] _job_complete: JobId=25061 done
> [2023-12-26T23:04:23.200] validate_node_specs: Node node11 unexpectedly rebooted boot_time=1703603052 last response=1703602983
> [2023-12-26T23:04:23.222] validate_node_specs: Node node09 unexpectedly rebooted boot_time=1703603052 last response=1703602983
> [2023-12-26T23:04:23.579] Batch JobId=25060 missing from batch node node10 (not found BatchStartTime after startup), Requeuing job
> [2023-12-26T23:04:23.579] _job_complete: JobId=25060 WTERMSIG 126
> [2023-12-26T23:04:23.579] _job_complete: JobId=25060 cancelled by node failure
> [2023-12-26T23:04:23.579] _job_complete: requeue JobId=25060 due to node failure
> [2023-12-26T23:04:23.579] _job_complete: JobId=25060 done
> [2023-12-26T23:04:23.579] validate_node_specs: Node node10 unexpectedly rebooted boot_time=1703603052 last response=1703602983
> [2023-12-26T23:04:23.581] validate_node_specs: Node node14 unexpectedly rebooted boot_time=1703603051 last response=1703602983
> [2023-12-26T23:04:23.654] validate_node_specs: Node node13 unexpectedly rebooted boot_time=1703603052 last response=1703602983
> [2023-12-26T23:04:24.681] validate_node_specs: Node node12 unexpectedly rebooted boot_time=1703603053 last response=1703602983
> [2023-12-27T04:46:42.461] _slurm_rpc_kill_job: REQUEST_KILL_JOB JobId=25060 uid 0
> [2023-12-27T04:46:43.822] _slurm_rpc_kill_job: REQUEST_KILL_JOB JobId=25061 uid 0
The operating systems are CentOS 7.9.2009 on the master node, and CentOS 8.5.2111 on node[09-14]. Does anyone have a similar experience or a clue how to resolve this?
Thanks in advance.
Best,
Jinglei