Manual compilation of 24.05.4; slurmctld and slurmd run on the same server. Everything works OK, but all test jobs end up pending with an InvalidAccount message. I do not use the Slurm database and have not enabled accounting. I cannot find an answer for this behavior or spot a misconfiguration. The slurm.conf file was generated using the easy config tool. Any ideas how to fix this? Thx,
-Henk
## looks like all users have access to test queue
[hmeij@sharptail2 slurm]$ sinfo -o "%g %.10R %.20l"
GROUPS PARTITION TIMELIMIT
all test infinite
[hmeij@sharptail2 slurm]$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
test* up infinite 1 idle sharptail2
## simple sleep job
[hmeij@sharptail2 slurm]$ sbatch sleep
Submitted batch job 8
[hmeij@sharptail2 slurm]$ squeue
JOBID PARTITION NAME USER ST TIME NODES CPUS MIN_MEMORY NODELIST(REASON)
8 test sleep hmeij PD 0:00 1 1 1G (InvalidAccount)
[hmeij@sharptail2 slurm]$ scontrol show job 8
JobId=8 JobName=sleep
UserId=hmeij(8216) GroupId=its(623) MCS_label=N/A
Priority=1 Nice=0 Account=(null) QOS=(null)
JobState=PENDING Reason=InvalidAccount Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=00:00:00 TimeLimit=UNLIMITED TimeMin=N/A
SubmitTime=2024-11-11T13:27:14 EligibleTime=2024-11-11T13:27:14
AccrueTime=2024-11-11T13:27:14
StartTime=Unknown EndTime=Unknown Deadline=N/A
SuspendTime=None SecsPreSuspend=0 LastSchedEval=2024-11-11T13:27:14 Scheduler=Main
Partition=test AllocNode:Sid=sharptail2:644662
ReqNodeList=(null) ExcNodeList=(null)
NodeList=
NumNodes=1-1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:1:1
ReqTRES=cpu=1,mem=1G,node=1,billing=1
AllocTRES=(null)
Socks/Node=1 NtasksPerN:B:S:C=1:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=1G MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/zfshomes/hmeij/slurm/sleep
WorkDir=/zfshomes/hmeij/slurm
StdErr=/zfshomes/hmeij/slurm/err
StdIn=/dev/null
StdOut=/zfshomes/hmeij/slurm/out
TresPerTask=cpu=1
## within a minute or so the InvalidAccount reason changes to None
## (but the job remains pending; jobs 1-7 were stuck over the weekend)
[hmeij@sharptail2 slurm]$ squeue
JOBID PARTITION NAME USER ST TIME NODES CPUS MIN_MEMORY NODELIST(REASON)
8 test sleep hmeij PD 0:00 1 1 1G (None)
## in the slurmctld.log
slurmctld: sched: JobId=8 has invalid account
slurmctld: debug: set_job_failed_assoc_qos_ptr: Filling in assoc for JobId=8 Assoc=0
slurmctld: debug: sched: Running job scheduler for full queue.
slurmctld: error: _refresh_assoc_mgr_qos_list: no new list given back keeping cached one.
## and the slurm.conf accounting section (both AccountingStorageType lines yield the same behavior)
#AccountingStorageType=
AccountingStorageType=accounting_storage/none
#JobAcctGatherFrequency=30
#JobAcctGatherType=
## using
SchedulerType = sched/builtin
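## a hedged sanity check (not a fix): confirm which accounting-related settings the running slurmctld actually loaded, in case the live config differs from the file on disk
scontrol show config | grep -iE 'AccountingStorage|Enforce|JobAcctGather'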
Hello,
is there any way to listen to job state changes in Slurm 23.x or newer?
I'd like to subscribe to job state changes instead of polling
for job states.
Adding this feature to slurm accounting DB seems to be last option right
now, although I’d like to avoid it.
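A possible per-job workaround, short of a true event stream, might be Slurm's trigger mechanism; the notify script path below is hypothetical, and site permissions for strigger apply:
# run a (hypothetical) script when job 12345 finishes
strigger --set --jobid=12345 --fini --program=/usr/local/bin/notify_job_done.sh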
Thanks&Best
Egonle
Dear Slurm User list,
I would like to startup all ~idle (idle and powered down) nodes and
check programmatically if all came up as expected. For context: this is
for a program that sets up slurm clusters with on demand cloud scheduling.
In the simplest fashion this could be executing a command like *srun
FORALL hostname* which would return the names of the nodes if it
succeeds and an error message otherwise. However, there's no such input
value like FORALL as far as I am aware. One could use -N{total node
number} as all nodes are ~idle when this executes, but I don't know an
easy way to get the total number of nodes.
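A rough sketch of one way to do that with standard commands, assuming a single partition spans all nodes and powered-down nodes are resumed on demand:
# count unique node names across the cluster, then ask for one task per node
total=$(sinfo -h -N -o "%N" | sort -u | wc -l)
srun -N "$total" -t 5 hostname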
Best regards,
Xaver
Hi,
I'm using slurm on a small 8 nodes cluster. I've recently added one GPU
node with two Nvidia A100, one with 40Gb of RAM and one with 80Gb.
As usage of this GPU resource increases, I would like to manage it
with GRES to avoid usage conflicts. But at this time my setup does not
work, as I can reach a GPU without reserving it:
srun -n 1 -p tenibre-gpu ./a.out
can use a GPU even if the reservation does not specify this resource
(checked by running nvidia-smi on the node). "tenibre-gpu" is a slurm
partition with only this gpu node.
Following the documentation I've created a gres.conf file; it has been
propagated to all the nodes (9 compute nodes, 1 login node and the
management node) and slurmd has been restarted.
gres.conf is:
## GPU setup on tenibre-gpu-0
NodeName=tenibre-gpu-0 Name=gpu Type=A100-40 File=/dev/nvidia0 Flags=nvidia_gpu_env
NodeName=tenibre-gpu-0 Name=gpu Type=A100-80 File=/dev/nvidia1 Flags=nvidia_gpu_env
In slurm.conf I have checked these flags:
## Basic scheduling
SelectTypeParameters=CR_Core_Memory
SchedulerType=sched/backfill
SelectType=select/cons_tres
## Generic resources
GresTypes=gpu
## Nodes list
....
Nodename=tenibre-gpu-0 RealMemory=257270 Sockets=2 CoresPerSocket=16
ThreadsPerCore=1 State=UNKNOWN
....
#partitions
PartitionName=tenibre-gpu MaxTime=48:00:00 DefaultTime=12:00:00
DefMemPerCPU=4096 MaxMemPerCPU=8192 Shared=YES State=UP
Nodes=tenibre-gpu-0
...
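For comparison, a minimal sketch of the pieces that usually accompany gres.conf, assuming they are not already set elsewhere: a Gres= declaration on the node line, and device constraint via cgroups (with TaskPlugin=task/cgroup) so that unreserved GPUs are actually hidden from jobs:
# slurm.conf (node line; counts assumed to be one GPU of each type)
NodeName=tenibre-gpu-0 Gres=gpu:A100-40:1,gpu:A100-80:1 RealMemory=257270 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN
# cgroup.conf
ConstrainDevices=yes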
Maybe I've missed something? I'm running Slurm 20.11.7-1.
Thanks for your advice.
Patrick
Hi,
I'm trying to compile Slurm with NVIDIA NVML support, but the result is
unexpected. I get /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so, but when
I do "ldd /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so" there is no
reference to /lib/x86_64-linux-gnu/libnvidia-ml.so.1 (which I would
expect).
~$ ldd /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so
linux-vdso.so.1 (0x00007ffd9a3f4000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0bc2c06000)
/lib64/ld-linux-x86-64.so.2 (0x00007f0bc2e47000)
/lib/x86_64-linux-gnu/libnvidia-ml.so.1 is present during compilation.
Also I can see that NVML headers were found in config.status (else I
wouldn't get gpu_nvml.so at all, to my understanding).
Our old cluster was deployed with NVIDIA deepops (which compiles Slurm
on every node) and also has NVML support. There ldd gives the expected
result:
~$ ldd /usr/local/lib/slurm/gpu_nvml.so
...
libnvidia-ml.so.1 => /lib/x86_64-linux-gnu/libnvidia-ml.so.1
(0x00007f3b10120000)
...
I can't test actual functionality with my new binaries because I don't
have a node with GPUs yet.
Am I missing something?
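One hedged check, in case the plugin resolves libnvidia-ml at runtime (e.g. via dlopen) rather than linking it, which would keep it out of the ldd output:
# does the plugin reference NVML symbols or the library name at all?
strings /usr/lib/x86_64-linux-gnu/slurm/gpu_nvml.so | grep -iE 'nvml|libnvidia-ml' | sort -u | head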
thank you
Matthias
We are pleased to announce the availability of Slurm release candidate
24.11.0rc1.
To highlight some new features coming in 24.11:
- New gpu/nvidia plugin. This does not rely on any NVIDIA libraries, and
will build by default on all systems. It supports basic GPU detection
and management, but cannot currently identify GPU-to-GPU links, or
provide usage data as these are not exposed by the kernel driver.
- Add autodetected GPUs to the output from "slurmd -C".
- Added new QOS-based reports to "sreport".
- Revamped network I/O with the "conmgr" thread-pool model.
- Added new "hostlist function" syntax for management commands and
configuration files.
- switch/hpe_slingshot - Added support for hardware collectives setup
through the fabric manager. (Requires SlurmctldParameters=enable_stepmgr)
- Added SchedulerParameters=bf_allow_magnetic_slot configuration option
to allow backfill planning for magnetic reservations.
- Added new "scontrol listjobs" and "liststeps" commands to complement
"listpids", and provide --json/--yaml output for all three subcommands.
- Allow jobs to be submitted against multiple QOSes.
- Added new experimental "oracle" backfill scheduling support, which
permits jobs to be delayed if the oracle function determines the reduced
fragmentation of the network topology is sufficiently advantageous.
- Improved responsiveness of the controller when jobs are requeued by
replacing the "db_index" identifier with a slurmctld-generated unique
identifier. ("SLUID")
- New options to job_container/tmpfs to permit site-specific scripts to
modify the namespace before user steps are launched, and to ensure all
steps are completely captured within that namespace.
This is the first release candidate of the upcoming 24.11 release
series, and represents the end of development for this release, and a
finalization of the RPC and state file formats.
If any issues are identified with this release candidate, please report
them through https://bugs.schedmd.com against the 24.11.x version and we
will address them before the first production 24.11.0 release is made.
Please note that the release candidates are not intended for production use.
A preview of the updated documentation can be found at
https://slurm.schedmd.com/archive/slurm-master/ .
Slurm can be downloaded from https://www.schedmd.com/downloads.php .
--
Marshall Garey
Release Management, Support, and Development
SchedMD LLC - Commercial Slurm Development and Support
Hi,
I'm running on Ubuntu 20.04. I've got a clean configuration of slurmctld
and slurmd on one node.
1) I've configured oci.conf to the defaults defined by "OCI.CONF EXAMPLE
FOR RUNC USING RUN (RECOMMENDED OVER USING CREATE/START):".
I have a container that I can run by hand:
runc --rootless true run -b
/opt/pilot_results/results.20241108-184831/step1 test
sh-4.2# exit
exit
and it returns.
However, when I
srun --container=/opt/pilot_results/results.20241108-184831/step1 ls
it hangs after completing the ls, and I have to double ctrl-c out of it.
2) I tried using the configuration for RUNC with Create/Start and it hangs
on start.
3) I tried using the configuration for CRUN using RUN, I can run the
container by hand with crun, but srun fails with:
srun --container=/opt/pilot_results/results.20241108-194914/step1 bash
bind socket to `/run/user/1008//pd-builds-bench-1.jrp.34.0.0/notify`:
Address already in use
sync socket closed
srun: error: pd-builds-bench-1: task 0: Exited with exit code 1
4) I tried using the configuration for CRUN with Create/Start and it errors
repeatedly with:
slurmstepd: error: _get_container_state: RunTimeQuery failed rc:256
output:error opening file
`/run/user/1008//pd-builds-bench-1.jrp.51.0.0/status`: No such file or
directory
I went through the (open and closed) support tickets and couldn't find
anything that reflects any of these errors, and I'm pretty stuck at this
point.
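One hypothetical thing worth trying between attempts is clearing stale container state, since both the "Address already in use" and the missing status file point at leftovers under the runtime's state root (paths here mirror the error messages; adjust --root to whatever your oci.conf uses):
crun --root=/run/user/$(id -u)/ list
crun --root=/run/user/$(id -u)/ delete --force pd-builds-bench-1.jrp.34.0.0   # ID taken from the error above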
Any help would be welcome.
Thanks,
JRP
We are running into a problem where slurmctld is segfaulting a few
times a day. We had this problem with SLURM 23.11.8 and now with 23.11.10
as well, though the problem only appears on one of the several SLURM
clusters we have, and all of them use one of those versions of SLURM. I was
wondering if anyone has encountered a similar issue and has any thoughts on
how to prevent this.
Obviously we use "SchedulerType=sched/backfill" but strangely when
I switched to sched/builtin for a while there were still slurmctld
segfaults. We also set
"SchedulerParameters=enable_user_top,bf_max_job_test=2000". I have tried
turning those off but it did not help. I have also tried tweaking several
other settings to no avail. Most of the cluster runs Rocky Linux 8.10
(including the slurmctld system) though we still have some Scientific Linux
7.9 compute nodes (we compile SLURM separately for those).
Here is the crash-time error from journalctl:
Oct 02 06:31:20 our.host.name kernel: sched_agent[2048355]: segfault at 8
ip 00007fec755d7ea8 sp 00007fec6bffe7e8 error 4 in
libslurmfull.so[7fec7555a000+1f4000]
Oct 02 06:31:20 our.host.name kernel: Code: 48 39 c1 7e 19 48 c1 f8 06 ba
01 00 00 00 48 d3 e2 48 f7 da 48 0b 54 c6 10 48 21 54 c7 10 c3 b8 00 00 00
00 eb da 48 8b 4f 08 <48> 39 4e 08 48 0f 4e 4e 08 49 89 c9 48 83 f9 3f 76
4e ba 40 00 00
Oct 02 06:31:20 our.host.name systemd[1]: Started Process Core Dump (PID
2169426/UID 0).
Oct 02 06:31:20 our.host.name systemd-coredump[2169427]: Process 2048344 (
slurmctld) of user 991 dumped core.
This is followed by a list of each of the dozen or so related threads. The
one which is dumping core is first and looks like this:
Stack trace of thread 2048355:
#0 0x00007fec755d7ea8 bit_and_not (libslurmfull.so)
#1 0x000000000044531f _job_alloc (slurmctld)
#2 0x000000000044576b _job_alloc_whole_node_internal (slurmctld)
#3 0x0000000000446e6d gres_ctld_job_alloc_whole_node (slurmctld)
#4 0x00007fec722e29b8 job_res_add_job (select_cons_tres.so)
#5 0x00007fec722f7c32 select_p_select_nodeinfo_set (select_cons_tres.so)
#6 0x00007fec756e7dc7 select_g_select_nodeinfo_set (libslurmfull.so)
#7 0x0000000000496eb3 select_nodes (slurmctld)
#8 0x0000000000480826 _schedule (slurmctld)
#9 0x00007fec753421ca start_thread (libpthread.so.0)
#10 0x00007fec745f78d3 __clone (libc.so.6)
I have run slurmctld with "debug5" level logging and it appears that
the error occurs right after backfill considers a large number of jobs.
Slurmctld could be failing at the end of backfill or when doing something
which happens just after backfill runs. Usually this is the last message
before the crash:
[2024-09-25T18:39:42.076] slurmscriptd: debug: _slurmscriptd_mainloop:
finished
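Since systemd-coredump is already capturing the crash, a fuller backtrace from the saved core may help narrow down which structure bit_and_not is walking (assuming Slurm debug symbols are installed); a rough sketch:
coredumpctl list slurmctld
coredumpctl gdb slurmctld      # then inside gdb: thread apply all bt full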
If anyone has any thoughts or advice on this that would be
appreciated. Thank you.
--
Marcus Lauer
Systems Administrator
CETS Group, Research Support
I am trying to find the GPU hour utilization for a user during a specific time period using the sacct and sreport commands. However, I am noticing a significant difference between the outputs of these two commands.
Could you explain the reasons for this discrepancy? Are there specific factors or configurations in SLURM that could lead to variations in the reported GPU hours?
There is a significant discrepancy in the results produced for GPU hour utilization by the sacct and sreport commands in SLURM.
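For reference, a sketch of the two kinds of queries being compared (user name and dates are placeholders); sacct reports per-job data while sreport works from aggregated usage rollups, which is often where the difference comes from:
sacct -u someuser -S 2024-10-01 -E 2024-11-01 -X --format=JobID,Elapsed,AllocTRES%60
sreport -t Hours -T gres/gpu cluster AccountUtilizationByUser Users=someuser Start=2024-10-01 End=2024-11-01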
Thanks,
Manisha
------------------------------------------------------------------------------------------------------------
[ C-DAC is on Social-Media too. Kindly follow us at:
Facebook: https://www.facebook.com/CDACINDIA & Twitter: @cdacindia ]
This e-mail is for the sole use of the intended recipient(s) and may
contain confidential and privileged information. If you are not the
intended recipient, please contact the sender by reply e-mail and destroy
all copies and the original message. Any unauthorized review, use,
disclosure, dissemination, forwarding, printing or copying of this email
is strictly prohibited and appropriate legal action will be taken.
------------------------------------------------------------------------------------------------------------
I have a cluster which uses Slurm 23.11.6
When I submit a multi-node job and run something like
clush -b -w $SLURM_JOB_NODELIST "date"
very often the ssh command fails with:
Access denied by pam_slurm_adopt: you have no active jobs on this node
This will happen maybe on 50% of the nodes
There is the same behaviour if I salloc a number of nodes and then try
to ssh to a node.
I have traced this to slurmstepd spawning a long sleep, which I believe
allows proctrackd to 'see' if a job is active.
On nodes that I can ssh into:
root 3211 1 0 Nov08 ? 00:00:00 /usr/sbin/slurmd
--systemd
root 3227 1 0 Nov08 ? 00:00:00 /usr/sbin/slurmstepd
infinity
root 24322 1 0 15:40 ? 00:00:00 slurmstepd:
[15709.extern]
root 24326 24322 0 15:40 ? 00:00:00 \_ sleep 100000000
On nodes where I cannot ssh:
root 3226 1 0 Nov08 ? 00:00:00 /usr/sbin/slurmd
--systemd
root 3258 1 0 Nov08 ? 00:00:00 /usr/sbin/slurmstepd
infinity
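A couple of hedged checks, since pam_slurm_adopt relies on the job's extern step existing on every allocated node (which, as I understand it, requires PrologFlags=contain):
scontrol show config | grep -i PrologFlags
squeue -s -j <jobid>          # do the job's steps include a .extern entry on the failing nodes?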
Maybe I am not understanding something here?
PS. I have tried running the pam_slurm_adopt module with options to debug,
and have not found anything useful.
John H
Hi,
Is there a way to change/control the primary node (i.e. where the initial task starts) as part of a job's allocation.
For eg, if a job requires 6 CPUs & its allocation is distributed over 3 hosts h1, h2 & h3, I find that it always starts the task on 1 particular node (say h1) irrespective of how many slots were available in the hosts.
Can we somehow let slurm have the primary node as h2?
Is there any C-API inside the select plugin which can do this trick if we were to control it through the configured select plugin?
Thanks. -Bhaskar.
Hello,
Is there any data structure in slurmctld which portrays the dynamic relative priority of pending jobs?
We are trying to use slurm for developing a scheduling solution, and one of the problems we face at the outset is how to determine the order of scheduling for pending jobs.
One option is to find scheduling iteration window begin & close pointers & cache the job ids as seen in order & then make them the priority order at that point of time.
( This means for 500 pending jobs, say, if we can find which slurmctld calls mark the beginning & end of a sched iteration, then we can use the scheduling order of jobs as the relative priority order for that period of time; of course it may change depending on fairshare, user-initiated priority modification etc. )
A concrete existing data structure showing the dynamic priority itself from slurmctld would be handy.
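Short of an internal structure, the externally visible form of that data is each pending job's priority value, e.g.:
sprio -l                                       # per-factor priority breakdown for pending jobs
squeue --state=PD -o "%.18i %.10Q %.9P %.8u"   # %Q prints the job's current priority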
Help appreciated.
Thanks!
Bhaskar.
Hello,
Just to add some context here. We plan to use slurm for developing a sched solution which interacts with a backend system.
Now, the backend system has pieces of h/w which require a specific host in the allocation to be the primary/master host where the initial task would be launched; this in turn is driven by the job's placement orientation on the h/w itself.
So, our primary task should launch on the requested primary host while secondary / remote tasks would subsequently get started on other hosts.
Hope this brings some context to the problem as to why a specific host is necessary to be the starting host.
Regards, Bhaskar.
On Thursday 31 October, 2024 at 12:04:37 am IST, Laura Hild <lsh(a)jlab.org> wrote:
I think if you tell the list why you care which of the Nodes is BatchHost, they may be able to provide you with a better solution.
________________________________________
From: Bhaskar Chakraborty via slurm-users <slurm-users(a)lists.schedmd.com>
Sent: Wednesday, 30 October 2024 12:35
To: slurm-users(a)schedmd.com
Subject: [slurm-users] Change primary alloc node
Hi,
Is there a way to change/control the primary node (i.e. where the initial task starts) as part of a job's allocation.
For eg, if a job requires 6 CPUs & its allocation is distributed over 3 hosts h1, h2 & h3 I find that it always starts the task in 1 particular
node (say h1) irrespective of how many slots were available in the hosts.
Can we somehow let slurm have the primary node as h2?
Is there any C-API inside select plugin which can do this trick if we were to control it through the configured select plugin?
Thanks.
-Bhaskar.
I have set AllowAccounts=sunlabc5hpc,root, but it doesn’t seem to work. User c010637 is not part of the sunlabc5hpc account but is still able to use the sunlabc5hpc partition. I have tried setting EnforcePartLimits to ALL, ANY, and NO, but none of these options resolved the issue.
[c010637@sl-login ~]$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
cpu* up infinite 3 mix sl-c[0035,0042-0043]
cpu* up infinite 1 idle sl-c0036
gpu up infinite 3 idle sl-c[0045-0047]
sunlabc5hpc up infinite 1 idle sl-c0048
[c010637@sl-login ~]$ scontrol show partition sunlabc5hpc
PartitionName=sunlabc5hpc
AllowGroups=ALL AllowAccounts=sunlabc5hpc,root AllowQos=ALL
AllocNodes=ALL Default=NO QoS=N/A
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED MaxCPUsPerSocket=UNLIMITED
Nodes=sl-c0048
PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
OverTimeLimit=NONE PreemptMode=OFF
State=UP TotalCPUs=256 TotalNodes=1 SelectTypeParameters=NONE
JobDefaults=(null)
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
TRES=cpu=256,mem=515000M,node=1,billing=256,gres/gpu=8
[c010637@sl-login ~]$ sacctmgr list assoc format=cluster,user,account%20,qos user=$USER
Cluster User Account QOS
---------- ---------- -------------------- --------------------
snowhpc c010637 c010637_bank normal
[c010637@sl-login ~]$ sacctmgr list account sunlabc5hpc
Account Descr Org
---------- -------------------- --------------------
sunlabc5h+ sunlabc5hpc sunlabc5hpc
[c010637@sl-login ~]$ sacctmgr show assoc where Account=sunlabc5hpc format=User,Account
User Account
---------- ----------
sunlabc5h+
c010751 sunlabc5h+
snowdai sunlabc5h+
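A hedged check worth adding: what enforcement settings the controller is actually running with (and which cluster name it registers as), since partition AllowAccounts filtering depends, as far as I know, on how job accounts are resolved and enforced:
scontrol show config | grep -iE 'AccountingStorageEnforce|EnforcePartLimits|ClusterName'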
Thanks for all your help. So it seems we can skip the trouble of compiling SLURM over different mariadb versions.
Tianyang Zhang
SJTU Network Information Center
From: Sid Young <sid.young(a)gmail.com>
Sent: 30 October 2024 7:19
To: Andrej Sec <andrej.sec(a)savba.sk>
Cc: taleintervenor(a)sjtu.edu.cn; slurm-users(a)lists.schedmd.com
Subject: Re: [slurm-users] Re: Fwd: What is the safe upgrade path when upgrade from slurm21.08 and mariadb5.5?
I recently upgraded from 20.11 to 24.05.2, before moving the cluster from CentOS 7.9 to Oracle Linux 8.10.
The DB upgrade should be pretty simple, do a mysqldump first, then uninstall the old DB, change the repo's and install the new DB version. It should recognise the DB files on disk and access them. Do another DB backup on the new DB version. then roll through the Slurm upgrades.
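A rough sketch of that backup step (the database name is assumed to be the default slurm_acct_db; ideally stop slurmdbd while dumping so nothing is writing to it):
mysqldump -u root -p --single-transaction --databases slurm_acct_db > slurm_acct_db.$(date +%F).sql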
I picked the first and last version of each release, and systematically went through each node till it was done. First the slurm controller node, then the compute nodes. To avoid Job loss, drain the nodes or you end up with a situation where the slurmd can't talk to the running slurmstepd and the job(s) gets lost. (Shows as a "Protocol Error").
Ole sent me a link to this guide which mostly worked.
https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/#upgrade-slurm…
Sid Young
W: https://off-grid-engineering.com
On Tue, Oct 29, 2024 at 6:33 PM Andrej Sec via slurm-users <slurm-users(a)lists.schedmd.com> wrote:
Hi,
we are facing a similar task. We have a Slurm 22.05 / MariaDB 5.5.68 environment and want to upgrade to a newer version. According to the documentation, it’s recommended to upgrade from 22.05 to a maximum of 23.11 in one step. With the MariaDB upgrade, there’s a challenge between 10.1 and 10.2+ due to incompatible changes (https://mariadb.com/kb/en/changes-improvements-in-mariadb-10-2). This upgrade, as I understand from the documentation, requires at least slurm 22.05, where it is automatically handled by the slurmdbd service.
In the test lab, we performed the following tests:
a. Incremental upgrade - according to MariaDB recommendations:
1. Upgrade MariaDB 5.5.68 -> 10.1.48 -> 10.2.44
2. Start the Slurm suite 22.05, checking content after each MariaDB upgrade step. During the 10.1 -> 10.2 upgrade, the slurmdbd service automatically converted the database to the required format. We had enabled general.log in MariaDB, allowing detailed inspection of database changes during conversion.
3. Upgrade slurmdbd to version 23.11
4. Upgrade slurmctld to version 23.11
5. Upgrade slurmd to version 23.11
6. Check the database content and compare tests before and after the upgrade (we used various reports with scontrol, sreport, sacct, sacctmgr for verification).
b. Direct MariaDB upgrade from 5.5.68 to 10.2.44 using the same approach. According to the tests, this resulted in the same state as the incremental approach.
PS: If you proceed with the upgrade, I would appreciate it if you could let us know about any potential challenges you encountered.
Andrej Sec
nscc, Bratislava, Slovakia
_____
Od: "hermes via slurm-users" <slurm-users(a)lists.schedmd.com <mailto:slurm-users@lists.schedmd.com> >
Komu: slurm-users(a)lists.schedmd.com <mailto:slurm-users@lists.schedmd.com>
Odoslané: pondelok, 28. október 2024 8:48:19
Predmet: [slurm-users] =?eucgb2312_cn?q?=D7=AA=B7=A2=3A_What_is_the_safe_upgrade_path_when_upgrade_from_slurm21=2E08_and_mariadb5=2E5=3F?=
Hi everyone:
We are currently running business on SLURM21.08 and mariadb5.5.
When talking about the upgrade, we need to keep all the users and jobs history data. And we see the official document wrote:
“When upgrading an existing accounting database to MariaDB 10.2.1 or later from an older version of MariaDB or any version of MySQL, ensure you are running slurmdbd 22.05.7 or later. These versions will gracefully handle changes to MariaDB default values that can cause problems for slurmdbd.”
So does this mean we have to first build SLURM > 22.05 over mariadb 5.5, do the SLURM upgrade, then upgrade mariadb to a newer version and rebuild the same version of SLURM over the new mariadb-devel?
And is it safe to jump directly from mariadb 5.5 to the latest version? How can we check whether Slurm has correctly inherited the historical data?
Thanks,
Tianyang Zhang
SJTU Network Information Center
Hi everyone:
We are currently running business on SLURM21.08 and mariadb5.5.
When talking about the upgrade, we need to keep all the users and jobs
history data. And we see the official document wrote:
“When upgrading an existing accounting database to MariaDB 10.2.1 or later
from an older version of MariaDB or any version of MySQL, ensure you are
running slurmdbd 22.05.7 or later. These versions will gracefully handle
changes to MariaDB default values that can cause problems for slurmdbd.”
So does this mean we have to first build SLURM > 22.05 over mariadb 5.5 and do
the SLURM upgrade, then upgrade mariadb to a newer version and rebuild
the same version of SLURM over the new mariadb-devel?
And is it safe to jump directly from mariadb 5.5 to the latest version? How can
we check whether Slurm has correctly inherited the historical data?
Thanks,
Tianyang Zhang
SJTU Network Information Center
Hi,
Is there an option in slurm to launch a custom script at the time of job submission through sbatch or salloc? The script should run with the submitting user's permissions in the submission area.
The idea is that we need to query something which characterises our job's requirements, like CPU slots, memory etc., from a central server, and we need read access to the user area prior to that.
In our use case the user doesn't necessarily know beforehand what kind of resources his job needs. (Hence the need for such a script, which will contact the server with user area info.)
Based on it we can modify the job a little later. A post-submit script, if available, would inform us of the Slurm job id as well; it would get called just after the job has entered the system and prior to its scheduling.
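Absent a built-in pre/post-submit hook that runs as the user in the submit directory, one workaround is a thin wrapper around sbatch; the service URL below is made up, and Slurm's own job_submit and cli_filter plugin hooks (which run in different contexts) may also be worth reviewing:
#!/bin/bash
# hypothetical wrapper: runs as the submitting user in the submission directory,
# asks a (made-up) central service for a CPU estimate, then forwards it to sbatch
cpus=$(curl -sf "https://resource-oracle.example.org/estimate?dir=$PWD") || cpus=1
exec sbatch --cpus-per-task="$cpus" "$@"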
Thanks, Bhaskar.
Sent from Yahoo Mail for iPad
We have a 'gpu' partition with 30 or so nodes, some with A100s, some with
H100s, and a few others.
It appears that when (for example) all of the A100 GPUs are in use, if
there are additional jobs requesting A100 GPUs pending, and those jobs have
the highest priority in the partition, then jobs submitted for H100s won't
run even if there are idle H100s. This is a small subset of our present
pending queue - the four bottom jobs should be running, but aren't. The top
pending job shows reason 'Resources' while the rest all show 'Priority'.
Any thoughts on why this might be happening?
JOBID PRIORITY TRES_ALLOC
8317749 501490 cpu=4,mem=80000M,node=1,billing=48,gres/gpu=1,gres/gpu:a100=1
8317750 501490 cpu=4,mem=80000M,node=1,billing=48,gres/gpu=1,gres/gpu:a100=1
8317745 501490 cpu=4,mem=80000M,node=1,billing=48,gres/gpu=1,gres/gpu:a100=1
8317746 501490 cpu=4,mem=80000M,node=1,billing=48,gres/gpu=1,gres/gpu:a100=1
8338679 500060 cpu=4,mem=64G,node=1,billing=144,gres/gpu=1,gres/gpu:h100=1
8338678 500060 cpu=4,mem=64G,node=1,billing=144,gres/gpu=1,gres/gpu:h100=1
8338677 500060 cpu=4,mem=64G,node=1,billing=144,gres/gpu=1,gres/gpu:h100=1
8338676 500060 cpu=4,mem=64G,node=1,billing=144,gres/gpu=1,gres/gpu:h100=1
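A couple of hedged diagnostics that may narrow this down, using one of the H100 job IDs from the table above:
scontrol show job 8338679 | grep -Ei 'Priority|Reason|LastSchedEval'
sdiag | grep -iA6 backfill        # how far each backfill cycle gets through the queue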
Thanks,
Kevin
--
Kevin Hildebrand
University of Maryland
Division of IT
I am unable to limit the number of jobs per user per partition. I
have searched the internet, the forums and the slurm documentation.
I created a partition with a QOS having MaxJobsPU=1 and MaxJobsPA=1,
and created a user stephen with account=stephen and MaxJobs=1.
However, if I sbatch a test job (sleep 180) multiple times they all
run concurrently. I am at a loss of what else to do. Help would be much
appreciated.
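A few hedged checks that usually narrow this down (QOS limits are only applied when accounting enforcement includes "limits", and the QOS has to actually be attached to the partition or association):
scontrol show partition | grep -iE 'PartitionName|QoS'
sacctmgr show qos format=Name,MaxJobsPU,MaxJobsPA
scontrol show config | grep -i AccountingStorageEnforce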
Thank you
--
Stephen Connolly
JSI Data Systems Ltd
613-727-9353
stephen(a)jsidata.ca
Hello everyone,
I’ve recently encountered an issue where some nodes in our cluster enter
a drain state randomly, typically after completing long-running jobs.
Below is the output from the sinfo command showing the reason "Prolog
error":
root@controller-node:~# sinfo -R
REASON          USER    TIMESTAMP            NODELIST
Prolog error    slurm   2024-09-24T21:18:05  node[24,31]
When checking the slurmd.log files on the nodes, I noticed the
following errors:
[2024-09-24T17:18:22.386] [217703.extern] error: _handle_add_extern_pid_internal: Job 217703 can't add pid 3311892 to jobacct_gather plugin in the extern_step. (repeated 90 times)
[2024-09-24T17:18:22.917] [217703.extern] error: _handle_add_extern_pid_internal: Job 217703 can't add pid 3313158 to jobacct_gather plugin in the extern_step.
...
[2024-09-24T21:17:45.162] launch task StepId=217703.0 request from UID:54059 GID:1600 HOST:<SLURMCTLD_IP> PORT:53514
[2024-09-24T21:18:05.166] error: Waiting for JobId=217703 REQUEST_LAUNCH_PROLOG notification failed, giving up after 20 sec
[2024-09-24T21:18:05.166] error: slurm_send_node_msg: [(null)] slurm_bufs_sendto(msg_type=RESPONSE_SLURM_RC_MSG) failed: Unexpected missing socket error
[2024-09-24T21:18:05.166] error: _rpc_launch_tasks: unable to send return code to address:port=<SLURMCTLD_IP>:53514 msg_type=6001: No such file or directory
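Once the underlying prolog failure is addressed, the drained nodes can be returned to service with something like:
scontrol update NodeName=node[24,31] State=RESUME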
If you know how to solve these errors, please let me know. I would
greatly appreciate any guidance or suggestions for further troubleshooting.
Thank you in advance for your assistance.
Best regards,
--
Télécom Paris <https://www.telecom-paris.fr>
Nacereddine LADDAOUI
Ingénieur de Recherche et de Développement
19 place Marguerite Perey
CS 20031
91123 Palaiseau Cedex
Has anyone else noticed, somewhere between versions 22.05.11 and 23.11.9, losing fixed Features defined for a node in slurm.conf, and instead now just having those controlled by a NodeFeaturesPlugin like node_features/knl_generic?
Slurm version 24.05.4 is now available and includes a fix for a recently
discovered security issue with the new stepmgr subsystem.
SchedMD customers were informed on October 9th and provided a patch on
request; this process is documented in our security policy. [1]
A mistake in authentication handling in stepmgr could permit an attacker
to execute processes under other users' jobs. This is limited to jobs
explicitly running with --stepmgr, or on systems that have globally
enabled stepmgr through "SlurmctldParameters=enable_stepmgr" in their
configuration. CVE-2024-48936.
Downloads are available at https://www.schedmd.com/downloads.php .
Release notes follow below.
- Tim
[1] https://www.schedmd.com/security-policy/
--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support
> * Changes in Slurm 24.05.4
> ==========================
> -- Fix generic int sort functions.
> -- Fix user look up using possible unrealized uid in the dbd.
> -- Fix FreeBSD compile issue with tls/none plugin.
> -- slurmrestd - Fix regressions that allowed slurmrestd to be run as SlurmUser
> when SlurmUser was not root.
> -- mpi/pmix fix race conditions with het jobs at step start/end which could
> make srun to hang.
> -- Fix not showing some SelectTypeParameters in scontrol show config.
> -- Avoid assert when dumping removed certain fields in JSON/YAML.
> -- Improve how shards are scheduled with affinity in mind.
> -- Fix MaxJobsAccruePU not being respected when MaxJobsAccruePA is set
> in the same QOS.
> -- Prevent backfill from planning jobs that use overlapping resources for the
> same time slot if the job's time limit is less than bf_resolution.
> -- Fix memory leak when requesting typed gres and --[cpus|mem]-per-gpu.
> -- Prevent backfill from breaking out due to "system state changed" every 30
> seconds if reservations use REPLACE or REPLACE_DOWN flags.
> -- slurmrestd - Make sure that scheduler_unset parameter defaults to true even
> when the following flags are also set: show_duplicates, skip_steps,
> disable_truncate_usage_time, run_away_jobs, whole_hetjob,
> disable_whole_hetjob, disable_wait_for_result, usage_time_as_submit_time,
> show_batch_script, and or show_job_environment. Additionaly, always make
> sure show_duplicates and disable_truncate_usage_time default to true when
> the following flags are also set: scheduler_unset, scheduled_on_submit,
> scheduled_by_main, scheduled_by_backfill, and or job_started. This effects
> the following endpoints:
> 'GET /slurmdb/v0.0.40/jobs'
> 'GET /slurmdb/v0.0.41/jobs'
> -- Ignore --json and --yaml options for scontrol show config to prevent mixing
> output types.
> -- Fix not considering nodes in reservations with Maintenance or Overlap flags
> when creating new reservations with nodecnt or when they replace down nodes.
> -- Fix suspending/resuming steps running under a 23.02 slurmstepd process.
> -- Fix options like sprio --me and squeue --me for users with a uid greater
> than 2147483647.
> -- fatal() if BlockSizes=0. This value is invalid and would otherwise cause the
> slurmctld to crash.
> -- sacctmgr - Fix issue where clearing out a preemption list using
> preempt='' would cause the given qos to no longer be preempt-able until set
> again.
> -- Fix stepmgr creating job steps concurrently.
> -- data_parser/v0.0.40 - Avoid dumping "Infinity" for NO_VAL tagged "number"
> fields.
> -- data_parser/v0.0.41 - Avoid dumping "Infinity" for NO_VAL tagged "number"
> fields.
> -- slurmctld - Fix a potential leak while updating a reservation.
> -- slurmctld - Fix state save with reservation flags when a update fails.
> -- Fix reservation update issues with parameters Accounts and Users, when
> using +/- signs.
> -- slurmrestd - Don't dump warning on empty wckeys in:
> 'GET /slurmdb/v0.0.40/config'
> 'GET /slurmdb/v0.0.41/config'
> -- Fix slurmd possibly leaving zombie processes on start up in configless when
> the initial attempt to fetch the config fails.
> -- Fix crash when trying to drain a non-existing node (possibly deleted
> before).
> -- slurmctld - fix segfault when calculating limit decay for jobs with an
> invalid association.
> -- Fix IPMI energy gathering with multiple sensors.
> -- data_parser/v0.0.39 - Remove xassert requiring errors and warnings to have a
> source string.
> -- slurmrestd - Prevent potential segfault when there is an error parsing an
> array field which could lead to a double xfree. This applies to several
> endpoints in data_parser v0.0.39, v0.0.40 and v0.0.41.
> -- scancel - Fix a regression from 23.11.6 where using both the --ctld and
> --sibling options would cancel the federated job on all clusters instead of
> only the cluster(s) specified by --sibling.
> -- accounting_storage/mysql - Fix bug when removing an association
> specified with an empty partition.
> -- Fix setting multiple partition state restore on a job correctly.
> -- Fix difference in behavior when swapping partition order in job submission.
> -- Fix security issue in stepmgr that could permit an attacker to execute
> processes under other users' jobs. CVE-2024-48936.
I have a SLURM configuration of 2 hosts with 6 + 4 CPUs.
I am submitting jobs with sbatch -n <CPU slots> <job script>.
However, I see that even when I have exhausted all 10 CPU slots with running jobs, it's still allowing subsequent jobs to run!
The CPU slot availability is also shown as full for the 2 hosts. No job is found pending.
What could be the problem?
My Slurm.conf looks like (host names are changed to generic):
ClusterName=MyCluster
ControlMachine=host1
ControlAddr=<some address>
SlurmUser=slurmsa
#AuthType=auth/munge
StateSaveLocation=/var/spool/slurmd
SlurmdSpoolDir=/var/spool/slurmd
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=3
SlurmctldDebug=6
SlurmdLogFile=/var/log/slurm/slurmd.log
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=host1
#AccountingStoragePass=medslurmpass
#AccountingStoragePass=/var/run/munge/munge.socket.2
AccountingStorageUser=slurmsa
#TaskPlugin=task/cgroup
NodeName=host1 CPUs=6 SocketsPerBoard=3 CoresPerSocket=2 ThreadsPerCore=1 State=UNKNOWN
NodeName=host2 CPUs=4 ThreadsPerCore=1 State=UNKNOWN
PartitionName=debug Nodes=host1,host2 Default=YES MaxTime=INFINITE State=UP
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=30
SelectType=select/cons_tres
SelectTypeParameters=CR_CPU
TaskPlugin=task/affinity
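A hedged way to see what the controller believes is allocated versus configured while the extra jobs are running:
scontrol show node host1,host2 | grep -E 'CPUAlloc|CPUTot|State'
squeue -t R -o "%.10i %.8u %.5C %R"     # %C = CPUs allocated to each running job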
Thanks in advance for any help!
Regards, Bhaskar.