Hello,
My name is Mihai and I have an issue with a small GPU cluster managed
with Slurm 22.05.11. I get two different outputs when I try to find
out the names of the nodes (one correct and one wrong). The script is:
#!/bin/bash
#SBATCH --job-name=test
#SBATCH --output=/data/mihai/res.txt
#SBATCH --partition=eli
#SBATCH --nodes=2
srun echo Running on host: $(hostname)
srun hostname
srun sleep 15
And the output looks like this:
cat res.txt
Running on host: mihai-x8640
Running on host: mihai-x8640
mihaigpu2
mihai-x8640
As you can see, the output of the command 'srun echo Running on host:
$(hostname)' is the same, as if the job was running twice on the same
node, while the command 'srun hostname' gives me the correct output.
Do you have any idea why the outputs of the two commands are different?
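A minimal sketch of the expansion-timing difference that may be at play
(the single-quoted variant is illustrative and not part of the original script):
#!/bin/bash
#SBATCH --nodes=2
# $(hostname) is expanded by the batch shell on the first allocated node
# before srun ever runs, so every task prints that one hostname.
srun echo "Running on host: $(hostname)"
# Single quotes defer the command substitution to each task's own shell,
# so each task reports the node it actually runs on.
srun bash -c 'echo "Running on host: $(hostname)"'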
Thank you,
Mihai
Slurm User Group (SLUG) 2024 is set for September 12-13 at the
University of Oslo in Oslo, Norway.
Registration information and a high-level schedule can be found
here: https://slug24.splashthat.com/ The last day to register at
early bird pricing is this Friday, May 31st.
Friday is also the deadline to submit a presentation abstract. We do
not intend to extend this deadline.
If you are interested in presenting your own usage, developments, site
report, tutorial, etc. about Slurm, please fill out the following
form: https://forms.gle/N7bFo5EzwuTuKkBN7
Notifications of final presentations accepted will go out by Friday, June 14th.
--
Victoria Hobson
SchedMD LLC
Vice President of Marketing
My organization needs to access historic job information records for metric reporting and resource forecasting. slurmdbd is archiving only the job information, which should be sufficient for our numbers, but is using the default archive script. In retrospect, this data should have been migrated to a secondary MariaDB instance, but that train has passed.
The format of the archive files is not well documented. Does anyone have a program (python/C/whatever) that will read a job_table_archive file and decode it into a parsable structure?
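One possible workaround, offered only as a sketch and assuming a scratch slurmdbd pointed at a throwaway MariaDB instance is acceptable, is to re-import the archive with the supported tooling instead of decoding the packed binary format by hand (the file path and date range below are illustrative):
# Re-load an archive file into the scratch accounting database.
sacctmgr archive load file=/path/to/job_table_archive_file
# Then pull the historic records with sacct as usual, e.g.:
sacct --allusers --starttime=2023-01-01 --endtime=2023-12-31 \
      --format=JobID,User,Account,Partition,Elapsed,MaxRSS,State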
Douglas O'Neal, Ph.D. (contractor)
Manager, HPC Systems Administration Group, ITOG
Frederick National Laboratory for Cancer Research
Leidos Biomedical Research, Inc.
Phone: 301-228-4656
Email: Douglas.O'Neal(a)nih.gov
---------- Forwarded message ---------
From: Hermann Schwärzler <hermann.schwaerzler(a)uibk.ac.at>
Date: Tue, May 28, 2024 at 4:10 PM
Subject: Re: [slurm-users] Re: Performance Discrepancy between Slurm
and Direct mpirun for VASP Jobs.
To: Hongyi Zhao <hongyi.zhao(a)gmail.com>
Hi Zhao,
On 5/28/24 03:08, Hongyi Zhao wrote:
[...]
>
> What's the complete content of cli_filter.lua and where should I put this file?
[...]
Below you will find the complete content of our cli_filter.lua.
It has to be put into the same directory as "slurm.conf".
--------------------------------- 8< ---------------------------------
-- see
https://github.com/SchedMD/slurm/blob/master/etc/cli_filter.lua.example
function slurm_cli_pre_submit(options, pack_offset)
    return slurm.SUCCESS
end

function slurm_cli_setup_defaults(options, early_pass)
    -- Make --hint=nomultithread the default behavior;
    -- if users specify another --hint=XX option then
    -- it will override the setting done here
    options['hint'] = 'nomultithread'
    return slurm.SUCCESS
end

function slurm_cli_post_submit(offset, job_id, step_id)
    return slurm.SUCCESS
end
--------------------------------- >8 ---------------------------------
Hopefully this helps...
Regards,
Hermann
--
Assoc. Prof. Hongsheng Zhao <hongyi.zhao(a)gmail.com>
Theory and Simulation of Materials
Hebei Vocational University of Technology and Engineering
No. 473, Quannan West Street, Xindu District, Xingtai, Hebei province
Dear Slurm Users,
I am experiencing a significant performance discrepancy when running
the same VASP job through the Slurm scheduler compared to running it
directly with mpirun. I am hoping for some insights or advice on how
to resolve this issue.
System Information:
Slurm Version: 21.08.5
OS: Ubuntu 22.04.4 LTS (Jammy)
Job Submission Script:
#!/usr/bin/env bash
#SBATCH -N 1
#SBATCH -D .
#SBATCH --output=%j.out
#SBATCH --error=%j.err
##SBATCH --time=2-00:00:00
#SBATCH --ntasks=36
#SBATCH --mem=64G
echo '#######################################################'
echo "date = $(date)"
echo "hostname = $(hostname -s)"
echo "pwd = $(pwd)"
echo "sbatch = $(which sbatch | xargs realpath -e)"
echo ""
echo "WORK_DIR = $WORK_DIR"
echo "SLURM_SUBMIT_DIR = $SLURM_SUBMIT_DIR"
echo "SLURM_JOB_NUM_NODES = $SLURM_JOB_NUM_NODES"
echo "SLURM_NTASKS = $SLURM_NTASKS"
echo "SLURM_NTASKS_PER_NODE = $SLURM_NTASKS_PER_NODE"
echo "SLURM_CPUS_PER_TASK = $SLURM_CPUS_PER_TASK"
echo "SLURM_JOBID = $SLURM_JOBID"
echo "SLURM_JOB_NODELIST = $SLURM_JOB_NODELIST"
echo "SLURM_NNODES = $SLURM_NNODES"
echo "SLURMTMPDIR = $SLURMTMPDIR"
echo '#######################################################'
echo ""
module purge > /dev/null 2>&1
module load vasp
ulimit -s unlimited
mpirun vasp_std
Performance Observation:
When running the job through Slurm:
werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$
grep LOOP OUTCAR
LOOP: cpu time 14.4893: real time 14.5049
LOOP: cpu time 14.3538: real time 14.3621
LOOP: cpu time 14.3870: real time 14.3568
LOOP: cpu time 15.9722: real time 15.9018
LOOP: cpu time 16.4527: real time 16.4370
LOOP: cpu time 16.7918: real time 16.7781
LOOP: cpu time 16.9797: real time 16.9961
LOOP: cpu time 15.9762: real time 16.0124
LOOP: cpu time 16.8835: real time 16.9008
LOOP: cpu time 15.2828: real time 15.2921
LOOP+: cpu time 176.0917: real time 176.0755
When running the job directly with mpirun:
werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$
mpirun -n 36 vasp_std
werner@x13dai-t:~/Public/hpc/servers/benchmark/Cr72_3x3x3K_350eV_10DAV$
grep LOOP OUTCAR
LOOP: cpu time 9.0072: real time 9.0074
LOOP: cpu time 9.0515: real time 9.0524
LOOP: cpu time 9.1896: real time 9.1907
LOOP: cpu time 10.1467: real time 10.1479
LOOP: cpu time 10.2691: real time 10.2705
LOOP: cpu time 10.4330: real time 10.4340
LOOP: cpu time 10.9049: real time 10.9055
LOOP: cpu time 9.9718: real time 9.9714
LOOP: cpu time 10.4511: real time 10.4470
LOOP: cpu time 9.4621: real time 9.4584
LOOP+: cpu time 110.0790: real time 110.0739
Could you provide any insights or suggestions on what might be causing
this performance issue? Are there any specific configurations or
settings in Slurm that I should check or adjust to align the
performance more closely with the direct mpirun execution?
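A minimal sketch of the kind of binding check that might help narrow this
down, assuming hyperthreading and task/CPU binding are involved (job.sh
stands in for the submission script above):
# 1) Show exactly which CPUs Slurm binds the 36 tasks to.
srun -N 1 -n 36 --cpu-bind=verbose true
# 2) Re-submit asking Slurm to place tasks on physical cores only,
#    which is often closer to what a bare "mpirun -n 36" run gets.
sbatch --hint=nomultithread job.sh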
Thank you for your time and assistance.
Best regards,
Zhao
--
Assoc. Prof. Hongsheng Zhao <hongyi.zhao(a)gmail.com>
Theory and Simulation of Materials
Hebei Vocational University of Technology and Engineering
No. 473, Quannan West Street, Xindu District, Xingtai, Hebei province
We have several nodes, most of which have different Linux distributions
(distro for short). The controller has a different distro as well. The only
thing the controller and all the nodes have in common is that all of them
are x86_64.
I can install Slurm using the package manager on all the machines, but this
will not work because the controller would have a different version of Slurm
than the nodes (21.08 vs. 23.11).
If I build from source then I see two solutions:
- build a deb package
- build a custom package (./configure, make, make install)
Building a Debian package on the controller and then distributing the
binaries to the nodes won't work either, because those binaries will look
for the shared libraries they were built against, and those don't exist on
the nodes.
So the only solution I see is to build a static binary as a custom
package. Am I correct, or is there another solution here?
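For concreteness, a minimal sketch of the "custom package" route as I
understand it, repeated once per distro (the version, prefix and sysconfdir
below are illustrative):
# Build and install Slurm from the release tarball on each distro.
tar xjf slurm-23.11.7.tar.bz2 && cd slurm-23.11.7
./configure --prefix=/opt/slurm-23.11 --sysconfdir=/etc/slurm
make -j"$(nproc)"
sudo make install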
Hi,
We are trying out Slurm, having been running Grid Engine for a long while.
In Grid Engine, the cgroup peak memory and max_rss are recorded at the end of a job: it logs the information from the cgroup hierarchy and also does a getrusage call right at the end on the parent PID of the whole job "container" before cleaning up.
With Slurm it seems that the only way memory is recorded is by the acct gather polling. I am trying to add something in an epilog script to get memory.peak, but it looks like the cgroup hierarchy has been destroyed by the time the epilog is run.
Where in the code is the cgroup hierarchy cleaned up? Is there no way to add a hook so that the accounting is updated during the job cleanup process and peak memory usage can be accurately logged?
I can reduce the polling interval from 30s to 5s, but I don't know whether that causes a lot of overhead, and in any case it does not seem a sensible way to obtain values that should be determined by an event right at the end rather than by polling.
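For reference, a minimal sketch of that polling-interval change as a
slurm.conf excerpt (the 5-second value comes from the question; the gather
plugin line is an assumption about the setup):
# Sample accounting data every 5s instead of the 30s default; smaller
# intervals catch short-lived peaks better at the cost of more overhead.
JobAcctGatherType=jobacct_gather/cgroup
JobAcctGatherFrequency=task=5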
Many thanks,
Emyr
We are pleased to announce the availability of Slurm version 23.11.7.
The 23.11.7 release fixes a few potential crashes in slurmctld when
using less common options on job submission, slurmrestd compatibility
with auth/slurm, and some additional minor and moderate severity bugs.
Slurm can be downloaded from https://www.schedmd.com/downloads.php .
-Marshall
> -- slurmrestd - Correct OpenAPI specification for
> 'GET /slurm/v0.0.40/jobs/state' having response as null.
> -- Allow running jobs on overlapping partitions if jobs don't specify -s.
> -- Fix segfault when requesting a shared gres along with an exclusive
> allocation.
> -- Fix regression in 23.02 where afternotok and afterok dependencies were
> rejected for federated jobs not running on the origin cluster of the
> submitting job.
> -- slurmctld - Disable job table locking while job state cache is active when
> replying to `squeue --only-job-state` or `GET /slurm/v0.0.40/jobs/state`.
> -- Fix sanity check when setting tres-per-task on the job allocation as well as
> the step.
> -- slurmrestd - Fix compatibility with auth/slurm.
> -- Fix issue where TRESRunMins gets off correct value if using
> QOS UsageFactor != 1.
> -- slurmrestd - Require `user` and `association_condition` fields to be
> populated for requests to 'POST /slurmdb/v0.0.40/users_association'.
> -- Avoid a slurmctld crash with extra_constraints enabled when a job requests
> certain invalid --extra values.
> -- `scancel --ctld` and `DELETE /slurm/v0.0.40/jobs` - Fix support for job
> array expressions (e.g. 1_[3-5]). Also fix signaling a single pending array
> task (e.g. 1_10), which previously signaled the whole array job instead.
> -- Fix a possible slurmctld segfault when at some point we failed to create an
> external launcher step.
> -- Allow the slurmctld to open a connection to the slurmdbd if the first
> attempt fails due to a protocol error.
> -- mpi/cray_shasta - Fix launch for non-het-steps within a hetjob.
> -- sacct - Fix "gpuutil" TRES usage output being incorrect when using --units.
> -- Fix a rare deadlock on slurmctld shutdown or reconfigure.
> -- Fix issue that only left one thread on each core available when "CPUs=" is
> configured to total thread count on multi-threaded hardware and no other
> topology info ("Sockets=", "CoresPerSocket", etc.) is configured.
> -- Fix the external launcher step not being allocated a VNI when requested.
> -- jobcomp/kafka - Fix payload length when producing and sending a message.
> -- scrun - Avoid a crash if RunTimeDelete is called before the container
> finishes.
> -- Save the slurmd's cred_state while reconfiguring to prevent the loss of job
> credentials.