October 2024 - slurm-users - lists.schedmd.com

slurmctld hourly: Unexpected missing socket error
by Jason Ellul 13 Jan '26

Hi all,

I am hoping someone can help with our problem. Every hour after restarting slurmctld the controller becomes unresponsive to commands for 1 sec, reporting errors such as:

[2024-07-15T11:45:48.509] error: slurm_send_node_msg: [socket:[934767]] slurm_bufs_sendto(msg_type=RESPONSE_JOB_INFO) failed: Unexpected missing socket error
[2024-07-15T11:45:48.509] error: slurm_send_node_msg: [socket:[934760]] slurm_bufs_sendto(msg_type=RESPONSE_SLURM_RC) failed: Unexpected missing socket error
[2024-07-15T11:45:48.509] error: slurm_send_node_msg: [socket:[934875]] slurm_bufs_sendto(msg_type=RESPONSE_JOB_INFO) failed: Unexpected missing socket error
[2024-07-15T11:45:48.509] error: slurm_send_node_msg: [socket:[934906]] slurm_bufs_sendto(msg_type=RESPONSE_JOB_INFO) failed: Unexpected missing socket error
[2024-07-15T11:45:48.509] error: slurm_send_node_msg: [socket:[939016]] slurm_bufs_sendto(msg_type=RESPONSE_JOB_INFO) failed: Unexpected missing socket error

It occurs consistently at around the hour mark, but generally not at other times, unless we run a reconfigure or restart the controller. We don’t see any issues in the slurmdbd.log and the errors are also always msg type RESPONSE.

We have tried building a new server on different infrastructure, but the problem has persisted. Yesterday we even tried updating slurm to v24.05.1 in the hope that may provide a fix.

During our troubleshooting we have:

Set:
* SchedulerParameters = max_rpc_cnt=400,sched_min_interval=50000,sched_max_job_start=300,batch_sched_delay=20,bf_resolution=600,bf_min_prio_reserve=2000,bf_min_age_reserve=600
* SlurmctldPort = 6808-6817

But although the stats in sdiag have improved we still see the errors. On our monitoring software we also see a drop in network and disk activity during this 1 second, always at approx. 1 hour after restarting the controller.

Many Thanks in advance
Jason

Jason Ellul
Head - Research Computing Facility
Office of Cancer Research
Peter MacCallum Cancer Centre
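To confirm that the errors really cluster around one minute per hour, the log can be bucketed by minute. The following is only a minimal sketch: the function name `count_socket_errors` is made up for illustration, and it assumes the timestamp format shown in the log excerpt above.

```shell
# Hypothetical helper (not part of Slurm): read a slurmctld log on stdin and
# print "<date>T<hour>:<minute> <count>" for every minute that contains at
# least one "Unexpected missing socket error" line.
count_socket_errors() {
    grep -F 'Unexpected missing socket error' |
        # Keep only the timestamp, truncated to the minute.
        sed -n 's/^\[\([0-9-]*T[0-9]*:[0-9]*\):.*/\1/p' |
        sort | uniq -c |
        # uniq -c prints "count value"; swap to "value count".
        awk '{ print $2, $1 }'
}
```

It would be run as `count_socket_errors < /path/to/slurmctld.log`, where the log path is site-specific. If the minutes that show up are always the same offset from the last restart or reconfigure, that supports the hourly pattern described above.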

job_container/tmpfs and srun.
by Phill Harvey-Smith 17 Jan '25

Hi all,

On our setup we are using job_container/tmpfs to give each job its own temp space. Since our compute nodes have reasonably sized disks, for tasks that do a lot of disk I/O on users' data we have asked users to copy their data to the local disk at the beginning of the task and (if needed) copy it back at the end. This saves lots of NFS thrashing slowing down both the task and the NFS servers.

However some of our users are having problems with this: their initial sbatch script will create a temp directory in their private /tmp, copy their data to it and then try to srun a program. The srun will fall over as it doesn't seem to have access to the copied data. I suspect this is because the srun task is getting its own private /tmp.

So my question is, is there a way to have the srun task inherit the /tmp of the initial sbatch?

I'll include a sample of the script our user is using below. If any further information is required please feel free to ask.

Cheers.

Phill.

#!/usr/bin/bash
#SBATCH --nodes 1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:00:10
#SBATCH --mem-per-cpu=3999
#SBATCH --output=script_out.log
#SBATCH --error=script_error.log

# The above options puts the STDOUT and STDERR of sbatch in
# log files prefixed with 'script_'.

# Create a randomly-named directory under /tmp
jobtmpdir=$(mktemp -d)

# Register a function to try and cleanup in case of job failure
cleanup_handler()
{
    echo "Cleaning up ${jobtmpdir}"
    rm -rf ${jobtmpdir}
}
trap 'cleanup_handler' SIGTERM EXIT

# Change working directory to this directory
cd ${jobtmpdir}

# Copy the executable and input files from
# where the job was submitted to the temporary directory.
cp ${SLURM_SUBMIT_DIR}/a.out .
cp ${SLURM_SUBMIT_DIR}/input.txt .

# Run the executable, handling the collection of stdout
# and stderr ourselves by redirecting to file
srun ./a.out 2> task_error.log > task_out.log

# Copy output data back to the submit directory.
cp output.txt ${SLURM_SUBMIT_DIR}
cp task_out.log ${SLURM_SUBMIT_DIR}
cp task_error.log ${SLURM_SUBMIT_DIR}

# Cleanup
cd ${SLURM_SUBMIT_DIR}
cleanup_handler
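One way to test the suspicion that the srun step gets a different private /tmp is to compare the mount namespace of the batch shell with that of the step. This is a diagnostic sketch only, assuming Linux compute nodes with /proc mounted; the helper names `mnt_ns` and `compare_ns` are invented for illustration and are not Slurm commands.

```shell
# Print the mount-namespace identifier of the calling process (Linux only),
# e.g. "mnt:[4026531840]".
mnt_ns() {
    readlink /proc/self/ns/mnt
}

# Report whether two namespace identifiers are identical.
compare_ns() {
    if [ "$1" = "$2" ]; then
        echo "same mount namespace"
    else
        echo "different mount namespaces"
    fi
}

# Inside the batch script one might then run (requires Slurm):
#   compare_ns "$(mnt_ns)" "$(srun --ntasks=1 readlink /proc/self/ns/mnt)"
```

If the two identifiers differ, the batch script and the step are not looking at the same /tmp mount; if they match, the missing files would have to be explained some other way.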

slurmctld keeps segfaulting, possibly during or just after backfill
by Marcus Lauer 12 Nov '24

We are running into a problem where slurmctld is segfaulting a few times a day. We had this problem with SLURM 23.11.8 and now with 23.11.10 as well, though the problem only appears on one of the several SLURM clusters we have, and all of them use one of those versions of SLURM. I was wondering if anyone has encountered a similar issue and has any thoughts on how to prevent this.

Obviously we use "SchedulerType=sched/backfill" but strangely when I switched to sched/builtin for a while there were still slurmctld segfaults. We also set "SchedulerParameters=enable_user_top,bf_max_job_test=2000". I have tried turning those off but it did not help. I have also tried tweaking several other settings to no avail. Most of the cluster runs Rocky Linux 8.10 (including the slurmctld system) though we still have some Scientific Linux 7.9 compute nodes (we compile SLURM separately for those).

Here is the crash-time error from journalctl:

Oct 02 06:31:20 our.host.name kernel: sched_agent[2048355]: segfault at 8 ip 00007fec755d7ea8 sp 00007fec6bffe7e8 error 4 in libslurmfull.so[7fec7555a000+1f4000]
Oct 02 06:31:20 our.host.name kernel: Code: 48 39 c1 7e 19 48 c1 f8 06 ba 01 00 00 00 48 d3 e2 48 f7 da 48 0b 54 c6 10 48 21 54 c7 10 c3 b8 00 00 00 00 eb da 48 8b 4f 08 <48> 39 4e 08 48 0f 4e 4e 08 49 89 c9 48 83 f9 3f 76 4e ba 40 00 00
Oct 02 06:31:20 our.host.name systemd[1]: Started Process Core Dump (PID 2169426/UID 0).
Oct 02 06:31:20 our.host.name systemd-coredump[2169427]: Process 2048344 ( slurmctld) of user 991 dumped core.

This is followed by a list of each of the dozen or so related threads. The one which is dumping core is first and looks like this:

Stack trace of thread 2048355:
#0  0x00007fec755d7ea8 bit_and_not (libslurmfull.so)
#1  0x000000000044531f _job_alloc (slurmctld)
#2  0x000000000044576b _job_alloc_whole_node_internal (slurmctld)
#3  0x0000000000446e6d gres_ctld_job_alloc_whole_node (slurmctld)
#4  0x00007fec722e29b8 job_res_add_job (select_cons_tres.so)
#5  0x00007fec722f7c32 select_p_select_nodeinfo_set (select_cons_tres.so)
#6  0x00007fec756e7dc7 select_g_select_nodeinfo_set (libslurmfull.so)
#7  0x0000000000496eb3 select_nodes (slurmctld)
#8  0x0000000000480826 _schedule (slurmctld)
#9  0x00007fec753421ca start_thread (libpthread.so.0)
#10 0x00007fec745f78d3 __clone (libc.so.6)

I have run slurmctld with "debug5" level logging and it appears that the error occurs right after backfill considers a large number of jobs. Slurmctld could be failing at the end of backfill or when doing something which happens just after backfill runs. Usually this is the last message before the crash:

[2024-09-25T18:39:42.076] slurmscriptd: debug: _slurmscriptd_mainloop: finished

If anyone has any thoughts or advice on this that would be appreciated. Thank you.

--
Marcus Lauer
Systems Administrator
CETS Group, Research Support
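The kernel line above carries enough information to locate the faulting instruction inside libslurmfull.so: the instruction pointer minus the library's load address gives the offset within the file. The sketch below does that arithmetic; `segfault_offset` is a hypothetical helper name, and the parsing assumes the exact "segfault at ... ip ... in lib[base+size]" layout shown in the journal excerpt.

```shell
# Hypothetical helper: given one kernel "segfault at ..." line, print the
# offset of the faulting instruction inside the mapped object (ip - base).
# The body runs in a subshell so its variables do not leak to the caller.
segfault_offset() (
    line=$1
    # Instruction pointer: the hex value following " ip ".
    ip=$(printf '%s\n' "$line" | sed -n 's/.* ip \([0-9a-f]*\) .*/\1/p')
    # Load address: the hex value before "+" inside the trailing [base+size].
    base=$(printf '%s\n' "$line" | sed -n 's/.*\[\([0-9a-f]*\)+[0-9a-f]*\].*/\1/p')
    [ -n "$ip" ] && [ -n "$base" ] || exit 1
    printf '0x%x\n' $(( 0x$ip - 0x$base ))
)
```

For the line quoted in the post this gives 0x7fec755d7ea8 - 0x7fec7555a000 = 0x7dea8. On a build with debug symbols, that offset could then be passed to a tool such as `addr2line -f -e <path to libslurmfull.so> 0x7dea8` to see which source line of `bit_and_not` is involved; the library path depends on the installation.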

Change primary alloc node
by Bhaskar Chakraborty 04 Nov '24

Hi,

Is there a way to change/control the primary node (i.e. where the initial task starts) as part of a job's allocation?

For example, if a job requires 6 CPUs and its allocation is distributed over 3 hosts h1, h2 and h3, I find that it always starts the task on one particular node (say h1) irrespective of how many slots were available on the hosts.

Can we somehow let slurm have the primary node as h2?

Is there any C API inside the select plugin which can do this trick, if we were to control it through the configured select plugin?

Thanks.
-Bhaskar.

Slurm Job Sched Priority
by Bhaskar Chakraborty 03 Nov '24

Hello,

Is there any data structure in slurmctld which portrays the dynamic relative priority of pending jobs?

We are trying to use slurm for developing a scheduling solution, and one of the problems we face at the outset is how to determine the order of scheduling for pending jobs.

One option is to find the begin and close points of the scheduling iteration window, cache the job ids in the order they are seen, and then treat that as the priority order at that point in time. (This means that for, say, 500 pending jobs, if we can find which slurmctld calls mark the beginning and end of a sched iteration, then we can use the scheduling order of jobs as the relative priority order for that period of time. Of course it may change depending on fairshare, user-initiated priority modification etc.)

A concrete existing data structure showing the dynamic priority itself from slurmctld would be handy.

Help appreciated.

Thanks!
Bhaskar.
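The question is about slurmctld internals, but a user-space approximation of the relative order can be had from the command line. The sketch below is an assumption-laden illustration, not the scheduler's own ordering: `rank_by_priority` is a made-up name, it only sorts "jobid priority" pairs, and it ignores anything else that influences the real scheduling order (partition priority tiers, dependencies, holds and so on).

```shell
# Hypothetical helper: read "jobid priority" pairs on stdin and print them
# with the highest priority first; ties are broken by ascending job id.
rank_by_priority() {
    sort -k2,2nr -k1,1n
}

# On a cluster the input could come from squeue (requires Slurm):
#   squeue -h -t PD -o '%i %Q' | rank_by_priority
# where %i is the job id and %Q the integer job priority.
```

The `sprio` command shows how each pending job's priority value is composed from its weighted factors, which may also help when comparing jobs at a given moment.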

Re: Change primary alloc node
by Bhaskar Chakraborty 31 Oct '24

Hello,

Just to add some context here.

We plan to use slurm for developing a sched solution which interacts with a backend system.

Now, the backend system has pieces of h/w which require a specific host in the allocation to be the primary/master host wherein the initial task would be launched; this in turn is driven by the job's placement orientation on the h/w itself.

So, our primary task should launch on the requested primary host, while secondary / remote tasks would subsequently get started on other hosts.

Hope this brings some context to the problem as to why a specific host is necessary to be the starting host.

Regards,
Bhaskar.

On Thursday 31 October, 2024 at 12:04:37 am IST, Laura Hild <lsh(a)jlab.org> wrote:

I think if you tell the list why you care which of the Nodes is BatchHost, they may be able to provide you with a better solution.

________________________________________
From: Bhaskar Chakraborty via slurm-users <slurm-users(a)lists.schedmd.com>
Sent: Wednesday, 30 October 2024 12:35
To: slurm-users(a)schedmd.com
Subject: [slurm-users] Change primary alloc node

Hi,

Is there a way to change/control the primary node (i.e. where the initial task starts) as part of a job's allocation.

For eg, if a job requires 6 CPUs & its allocation is distributed over 3 hosts h1, h2 & h3 I find that it always starts the task in 1 particular node (say h1) irrespective of how many slots were available in the hosts.

Can we somehow let slurm have the primary node as h2?

Is there any C-API inside select plugin which can do this trick if we were to control it through the configured select plugin?

Thanks.
-Bhaskar.

Why AllowAccounts not work in slurm-23.11.6
by daijiangkuicgo＠gmail.com 30 Oct '24

I have set AllowAccounts=sunlabc5hpc,root, but it doesn’t seem to work. User c010637 is not part of the sunlabc5hpc account but is still able to use the sunlabc5hpc partition. I have tried setting EnforcePartLimits to ALL, ANY, and NO, but none of these options resolved the issue.

[c010637@sl-login ~]$ sinfo
PARTITION   AVAIL  TIMELIMIT  NODES  STATE NODELIST
cpu*           up   infinite      3    mix sl-c[0035,0042-0043]
cpu*           up   infinite      1   idle sl-c0036
gpu            up   infinite      3   idle sl-c[0045-0047]
sunlabc5hpc    up   infinite      1   idle sl-c0048

[c010637@sl-login ~]$ scontrol show partition sunlabc5hpc
PartitionName=sunlabc5hpc
   AllowGroups=ALL AllowAccounts=sunlabc5hpc,root AllowQos=ALL
   AllocNodes=ALL Default=NO QoS=N/A
   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED MaxCPUsPerSocket=UNLIMITED
   Nodes=sl-c0048
   PriorityJobFactor=1 PriorityTier=1 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=256 TotalNodes=1 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
   TRES=cpu=256,mem=515000M,node=1,billing=256,gres/gpu=8

[c010637@sl-login ~]$ sacctmgr list assoc format=cluster,user,account%20,qos user=$USER
   Cluster       User              Account                  QOS
---------- ---------- -------------------- --------------------
   snowhpc    c010637         c010637_bank               normal

[c010637@sl-login ~]$ sacctmgr list account sunlabc5hpc
   Account                Descr                  Org
---------- -------------------- --------------------
sunlabc5h+          sunlabc5hpc          sunlabc5hpc

[c010637@sl-login ~]$ sacctmgr show assoc where Account=sunlabc5hpc format=User,Account
      User    Account
---------- ----------
           sunlabc5h+
   c010751 sunlabc5h+
   snowdai sunlabc5h+
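One thing that may be worth ruling out (an assumption on my part, not something the post confirms) is whether the controller enforces associations at all. If AccountingStorageEnforce does not include "associations", a job's account name is not validated against the accounting database, so a user may be able to name an account they do not belong to. The sketch below only parses configuration output; `enforces_associations` is a hypothetical helper name.

```shell
# Hypothetical helper: read `scontrol show config` output on stdin and
# succeed (exit 0) only if AccountingStorageEnforce includes "associations".
enforces_associations() {
    awk -F'= *' '
        /^AccountingStorageEnforce/ { value = $2 }
        END { if (value ~ /associations/) exit 0; exit 1 }
    '
}

# On a cluster (requires Slurm):
#   if scontrol show config | enforces_associations; then
#       echo "associations are enforced"
#   else
#       echo "associations are NOT enforced"
#   fi
```

If associations are already enforced, the cause lies elsewhere and this check simply removes one candidate.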

Re: Fwd: What is the safe upgrade path when upgrade from slurm21.08 and mariadb5.5?
by taleintervenor＠sjtu.edu.cn 29 Oct '24

Thanks for all your help. So it seems we can skip the trouble of compiling SLURM over different mariadb versions.

Tianyang Zhang
SJTU Network Information Center

From: Sid Young <sid.young(a)gmail.com>
Sent: 30 October 2024 7:19
To: Andrej Sec <andrej.sec(a)savba.sk>
Cc: taleintervenor(a)sjtu.edu.cn; slurm-users(a)lists.schedmd.com
Subject: Re: [slurm-users] Re: Fwd: What is the safe upgrade path when upgrade from slurm21.08 and mariadb5.5?

I recently upgraded from 20.11 to 24.05.2, before moving the cluster from CentOS 7.9 to Oracle Linux 8.10.

The DB upgrade should be pretty simple: do a mysqldump first, then uninstall the old DB, change the repos and install the new DB version. It should recognise the DB files on disk and access them. Do another DB backup on the new DB version, then roll through the Slurm upgrades. I picked the first and last version of each release, and systematically went through each node till it was done: first the slurm controller node, then the compute nodes.

To avoid job loss, drain the nodes, or you end up with a situation where the slurmd can't talk to the running slurmstepd and the job(s) get lost (shows as a "Protocol Error").

Ole sent me a link to this guide which mostly worked: https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_installation/#upgrade-slurm…

Sid Young
W: https://off-grid-engineering.com

On Tue, Oct 29, 2024 at 6:33 PM Andrej Sec via slurm-users <slurm-users(a)lists.schedmd.com> wrote:

Hi, we are facing a similar task. We have a Slurm 22.05 / MariaDB 5.5.68 environment and want to upgrade to a newer version. According to the documentation, it’s recommended to upgrade from 22.05 to a maximum of 23.11 in one step. With the MariaDB upgrade, there’s a challenge between 10.1 and 10.2+ due to incompatible changes (https://mariadb.com/kb/en/changes-improvements-in-mariadb-10-2). This upgrade, as I understand from the documentation, requires at least slurm 22.05, where it is automatically handled by the slurmdbd service.

In the test lab, we performed the following tests:

a. Incremental upgrade - according to MariaDB recommendations:
1. Upgrade MariaDB 5.5.68 -> 10.1.48 -> 10.2.44
2. Start the Slurm suite 22.05, checking content after each MariaDB upgrade step. During the 10.1 -> 10.2 upgrade, the slurmdbd service automatically converted the database to the required format. We had enabled general.log in MariaDB, allowing detailed inspection of database changes during conversion.
3. Upgrade slurmdbd to version 23.11
4. Upgrade slurmctld to version 23.11
5. Upgrade slurmd to version 23.11
6. Check the database content and compare tests before and after the upgrade (we used various reports with scontrol, sreport, sacct, sacctmgr for verification).

b. Direct MariaDB upgrade from 5.5.68 to 10.2.44 using the same approach. According to the tests, this resulted in the same state as the incremental approach.

PS: If you proceed with the upgrade, I would appreciate it if you could let us know about any potential challenges you encountered.

Andrej Sec
nscc, Bratislava, Slovakia

_____
From: "hermes via slurm-users" <slurm-users(a)lists.schedmd.com>
To: slurm-users(a)lists.schedmd.com
Sent: Monday, 28 October 2024 8:48:19
Subject: [slurm-users] Fwd: What is the safe upgrade path when upgrade from slurm21.08 and mariadb5.5?

Hi everyone:

We are currently running production on SLURM 21.08 and mariadb5.5. For the upgrade, we need to keep all the user and job history data. The official documentation says:

“When upgrading an existing accounting database to MariaDB 10.2.1 or later from an older version of MariaDB or any version of MySQL, ensure you are running slurmdbd 22.05.7 or later. These versions will gracefully handle changes to MariaDB default values that can cause problems for slurmdbd.”

So does this mean we have to first build SLURM >= 22.05 over mariadb5.5 and do the SLURM upgrade, then upgrade mariadb to a newer version, and rebuild the same version of SLURM over the new mariadb-devel?

And is it safe to jump directly from mariadb5.5 to the latest version? How can we check whether slurm has correctly inherited the historical data?

Thanks,
Tianyang Zhang
SJTU Network Information Center

--
slurm-users mailing list -- slurm-users(a)lists.schedmd.com
To unsubscribe send an email to slurm-users-leave(a)lists.schedmd.com
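The before/after comparison described in step 6 can be sketched as two snapshots of accounting reports plus a diff. This is a minimal illustration under assumptions: the helper names, file names and the choice of reports are invented here, and the job query should cover a fixed past time window so that still-running jobs do not produce spurious differences.

```shell
# Hypothetical helper: dump a few accounting reports into directory $1.
# $2 and $3 are the start and end of a fixed, already-finished time window.
# Requires Slurm client commands and a reachable slurmdbd.
snapshot_accounting() {
    mkdir -p "$1" &&
        sacctmgr -n -P show assoc > "$1/assoc.txt" &&
        sacct -a -n -P -S "$2" -E "$3" \
            --format=JobID,User,Account,State,Elapsed > "$1/jobs.txt"
}

# Hypothetical helper: print "identical" if two snapshot directories have the
# same content, otherwise "differs".
compare_snapshots() {
    if diff -r "$1" "$2" > /dev/null; then
        echo "identical"
    else
        echo "differs"
    fi
}
```

A run might look like `snapshot_accounting before 2024-01-01 2024-06-30` prior to the upgrade, the same call with `after` once slurmdbd is back, and then `compare_snapshots before after`; running `diff -r before after` directly shows what changed.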

Fwd: What is the safe upgrade path when upgrade from slurm21.08 and mariadb5.5?
by taleintervenor＠sjtu.edu.cn 29 Oct '24

Hi everyone:

We are currently running production on SLURM 21.08 and mariadb5.5. For the upgrade, we need to keep all the user and job history data. The official documentation says:

“When upgrading an existing accounting database to MariaDB 10.2.1 or later from an older version of MariaDB or any version of MySQL, ensure you are running slurmdbd 22.05.7 or later. These versions will gracefully handle changes to MariaDB default values that can cause problems for slurmdbd.”

So does this mean we have to first build SLURM >= 22.05 over mariadb5.5 and do the SLURM upgrade, then upgrade mariadb to a newer version, and rebuild the same version of SLURM over the new mariadb-devel?

And is it safe to jump directly from mariadb5.5 to the latest version? How can we check whether slurm has correctly inherited the historical data?

Thanks,
Tianyang Zhang
SJTU Network Information Center
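The documentation quote boils down to a version precondition: slurmdbd must be at 22.05.7 or later before the database moves to MariaDB 10.2.1 or later. A small guard for that check could look like the sketch below; `version_ge` is a hypothetical helper and relies on `sort -V` (version sort, available in GNU coreutils).

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to
# version $2, comparing dotted version strings with sort -V.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# On the database host one might check the installed daemon first
# (requires Slurm; `slurmdbd -V` prints a line such as "slurm 22.05.7"):
#   current=$(slurmdbd -V | awk '{ print $2 }')
#   version_ge "$current" 22.05.7 || echo "upgrade slurmdbd before MariaDB"
```

With the versions from the question, `version_ge 21.08.8 22.05.7` fails, which matches the documentation's advice that slurmdbd has to be upgraded before MariaDB crosses 10.2.1.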

Job pre / post submit scripts
by Bhaskar Chakraborty 29 Oct '24

Hi,

Is there an option in slurm to launch a custom script at the time of job submission through sbatch or salloc? The script should run with the submitting user's permissions in the submit area.

The idea is that we need to query a central server for something which characterises our job’s requirements, like CPU slots, memory etc., and we need read access to the user area prior to that. In our use case the user doesn’t necessarily know beforehand what kind of resources his job needs. (Hence the need for such a script, which will contact the server with user area info.)

Based on it we can modify the job a little later. A post-submit script, if available, would inform us of the slurm job id as well; it would get called just after the job has entered the system and prior to its scheduling.

Thanks,
Bhaskar.
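One user-side way to approximate pre- and post-submit hooks is a thin wrapper around sbatch that runs as the submitting user. The sketch below is an illustration under assumptions, not a built-in Slurm feature: `pre_submit_hook`, `post_submit_hook`, `parse_jobid` and `submit_with_hooks` are all hypothetical names, and the stub hooks stand in for whatever would contact the central server. The job is submitted held so that it cannot start before the post-submit hook has run. (Slurm also has a client-side cli_filter plugin interface, which is not shown here.)

```shell
# Stub hooks; a real site would replace these with calls to its own server.
pre_submit_hook()  { :; }
post_submit_hook() { echo "submitted job $1"; }

# `sbatch --parsable` prints "jobid" or "jobid;cluster"; keep only the id.
parse_jobid() {
    printf '%s\n' "${1%%;*}"
}

# Run the pre hook, submit the job in held state, run the post hook with the
# job id, then release the job for scheduling. Requires Slurm.
submit_with_hooks() {
    pre_submit_hook "$@" || return 1
    out=$(sbatch --parsable --hold "$@") || return 1
    jobid=$(parse_jobid "$out")
    post_submit_hook "$jobid"
    scontrol release "$jobid"
}
```

Usage would be `submit_with_hooks job.sh`. Between the hold and the release, the post hook is free to adjust the job (for example with `scontrol update`) based on what the server reported.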
