March 2025 - slurm-users - lists.schedmd.com

Minimum cpu cores per node partition level configuration
by Jeherul Islam 10 Apr '25

Dear All,

I need to configure Slurm so that users must request at least a certain minimum number of CPU cores in one particular partition (not system-wide). A job that requests fewer cores must not run. Any suggestions will be highly appreciated.

With thanks and regards,
-- Jeherul Islam

cpus and gpus partitions and how to optimize the resource usage
by Massimo Sgaravatto 31 Mar '25

Dear all,

We have just installed a small SLURM cluster composed of 12 nodes:

- 6 CPU-only nodes: 2 Sockets=2, 96 CoresPerSocket 2, ThreadsPerCore=2, 1.5 TB of RAM
- 6 nodes that also have GPUs: same configuration as the CPU-only nodes + 4 H100 per node

We started with a setup with 2 partitions:

- an 'onlycpus' partition which sees all the CPU-only nodes
- a 'gpus' partition which sees the nodes with GPUs

and asked users to use the 'gpus' partition only for jobs that need GPUs (for the time being we are not technically enforcing that).

The problem is that a job requiring a GPU usually needs only a few cores and only a few GB of RAM, which means wasting a lot of CPU cores. And having all nodes in the same partition would carry the risk that a job requiring a GPU can't start because all CPU cores and/or all memory are used by CPU-only jobs.

I went through the mailing list archive and I think that "splitting" a GPU node into two logical nodes (one to be used in the 'gpus' partition and one to be used in the 'onlycpus' partition), as discussed in [*], would help.

Since that proposed solution is considered by its author a "bit of a kludge", and since I read that splitting a node into multiple logical nodes is in general a bad idea, I'd like to understand if you could suggest other/better options.

I also found this [**] thread, but I don't much like that approach (i.e. relying on MaxCPUsPerNode) because it would mean having 3 partitions (if I have got it right): two partitions for CPU-only jobs and 1 partition for GPU jobs.

Many thanks,
Massimo

[*] https://groups.google.com/g/slurm-users/c/IUd7jLKME3M
[**] https://groups.google.com/g/slurm-users/c/o7AiYAQ1YJ0

Preemption question
by Kamil Wilczek 31 Mar '25

Dear All,

I would like to be able to preempt (SUSPEND) a single QoS of a user that has been blocking the queue for several days. Currently I have about 100 users on the cluster, and setting the "Preempt" option on each QoS (we have personal QoSes) does not seem optimal.

https://slurm.schedmd.com/sacctmgr.html#OPT_Preempt

Is there a way to set an option on this single problematic QoS saying that it can be preempted by any other QoS? That would be a much more administrator-friendly solution ;)

Kind regards
-- Kamil Wilczek [https://keys.openpgp.org/] [D415917E84B8DA5A60E853B6E676ED061316B69B]

Slurm 24.05 and OpenMPI
by Matthias Leopold 28 Mar '25

Hi,

I built a small Slurm 21.08 cluster with NVIDIA GPU hardware and the NVIDIA deepops framework a couple of years ago. It is based on Ubuntu 20.04 and makes use of the NVIDIA pyxis/enroot container solution. For operational validation I used the nccl-tests application in a container. nccl-tests is compiled with MPI support (OpenMPI 4.1.6 or 4.1.7) and I also used it for validation of MPI jobs. Slurm jobs use "pmix" and tasks are launched via srun (not mpirun). Some of the GPUs can talk to each other via Infiniband, but MPI is rarely used at our site and I'm fully aware that my MPI knowledge is very limited. Still, it worked with Slurm 21.08.

Now I have built a Slurm 24.05 cluster based on Ubuntu 24.04 and started to move hardware there. When I run my nccl-tests container (also with newer software) I see error messages like this:

    [node1:21437] OPAL ERROR: Unreachable in file ext3x_client.c at line 111
    --------------------------------------------------------------------------
    The application appears to have been direct launched using "srun",
    but OMPI was not built with SLURM's PMI support and therefore cannot
    execute. There are several options for building PMI support under
    SLURM, depending upon the SLURM version you are using:

      version 16.05 or later: you can use SLURM's PMIx support. This
      requires that you configure and build SLURM --with-pmix.

      Versions earlier than 16.05: you must use either SLURM's PMI-1 or
      PMI-2 support. SLURM builds PMI-1 by default, or you can manually
      install PMI-2. You must then build Open MPI using --with-pmi pointing
      to the SLURM PMI library location.

    Please configure as appropriate and try again.
    --------------------------------------------------------------------------
    *** An error occurred in MPI_Init
    *** on a NULL communicator
    *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
    *** and potentially your MPI job)
    [node1:21437] Local abort before MPI_INIT completed completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!

One simple question: is this related to https://github.com/open-mpi/ompi/issues/12471? If so, is there some workaround?

I'm very grateful for any comments. I know that a lot of detail is missing, but maybe someone can already give me a hint about where to look.

Thanks a lot
Matthias

Doubt with SelectTypeParameters in slurm.conf
by Gestió Servidors 28 Mar '25

Hello,

I'm running some tests in a very small testing environment (before applying the changes in the real scenario). My environment is a single computer with an old Intel i7 and this "lscpu" configuration:

    Architecture:        x86_64
    CPU(s):              8
    On-line CPU(s) list: 0-7
    Thread(s) per core:  2
    Core(s) per socket:  4
    Socket(s):           1
    NUMA node(s):        1
    Vendor ID:           GenuineIntel
    BIOS Vendor ID:      Intel(R) Corporation
    CPU family:          6
    Model:               26
    Model name:          Intel(R) Core(TM) i7 CPU 950 @ 3.07GHz

"slurmd -C" returns: "CPUs=8 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=2".

My slurm.conf is configured with:

    SelectType=select/cons_tres
    SelectTypeParameters=CR_Core
    AccountingStorageTRES=cpu,mem,node,billing,fs/disk,vmem,pages

After running a simple "helloworld" test, I noticed that with SelectTypeParameters=CR_Core the system always reserves an even number of CPUs for me: while the job is "pending" I can see the number I actually requested, but once the job is "running" that number is increased to the next even number. However, if I reconfigure Slurm with SelectTypeParameters=CR_CPU, the system reserves the number of CPUs I requested, both while "pending" and while "running".

I suppose that my system must be configured with CR_CPU, according to https://slurm.schedmd.com/cons_tres.html, but could someone confirm that?

Thanks.
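The even-number behaviour described above is consistent with CR_Core handing out whole cores on a node with ThreadsPerCore=2, so a request is rounded up to a multiple of the thread count. The following is only a minimal Python sketch of that arithmetic; the function name `allocated_cpus` and its `whole_cores` flag are made up for illustration, and this is a model of the observation in this post, not Slurm's actual allocation code.

```python
import math


def allocated_cpus(requested_cpus, threads_per_core, whole_cores=True):
    """Model how many CPUs (hardware threads) a job ends up holding.

    whole_cores=True  models the CR_Core observation: the job receives
                      entire cores, so the count is rounded up to a
                      multiple of threads_per_core.
    whole_cores=False models the CR_CPU observation: the job receives
                      exactly the number of CPUs it asked for.
    This is an illustrative assumption, not Slurm source code.
    """
    if requested_cpus < 1 or threads_per_core < 1:
        raise ValueError("both arguments must be positive integers")
    if not whole_cores:
        return requested_cpus
    cores_needed = math.ceil(requested_cpus / threads_per_core)
    return cores_needed * threads_per_core
```

With the node from this post (ThreadsPerCore=2), `allocated_cpus(3, 2)` models a 3-CPU request that is shown as 4 CPUs once running, while `allocated_cpus(3, 2, whole_cores=False)` stays at 3.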

bit_cache_init failure on the second time backup controller tries to take control
by Safdar Iqbal 28 Mar '25

Hi,

We're running into an issue where slurmctld core-dumps with the following error. This happens on the backup controller if it needs to take over from the primary _for a second time_.

    slurmctld: fatal: bit_cache_init: cannot change size once set

Has anyone seen this error before? Also, if there are any existing discussions and/or tickets related to this, please let me know. Our Slurm version is 24.11.1.

________________

Steps to reproduce:

1. On a healthy cluster, we make the primary controller unavailable. Since we're running our cluster on cloud VMs, we do this by stopping the primary controller VM.
2. From the logs we can see the backup controller take over and log the message "Running as primary controller".
3. We then start the primary again, making sure the IP addresses and hostnames stay consistent. Once slurmctld on the primary has started and taken back control, we can see the log "Running as primary controller" on that VM.
4. We then stop the primary controller VM again, causing the backup to try to take control a second time. This time, however, slurmctld on the backup core-dumps, with the following log entries from journalctl -u slurmctld:

    slurmctld: fatal: bit_cache_init: cannot change size once set
    slurmctld.service: Main process exited, code=dumped, status=6/ABRT
    slurmctld.service: Failed with result 'core-dump'.

Thanks!
- Safdar

How limit CPUs per node in a partition
by Gestió Servidors 27 Mar '25

Hello,

I have a testing partition with only one node. That server has 12 CPUs (it's a very old server: 2 sockets, 6 cores per socket, 1 thread per core). The partition, called "test.q", contains only that node, so by default it has 12 CPUs (all from the testing node). However, I would now like to reduce that number of CPUs from 12 to 4, for example.

"slurmd -C" returns "CPUs=12 Boards=1 SocketsPerBoard=2 CoresPerSocket=6 ThreadsPerCore=1", as mentioned above. If I configure the partition in slurm.conf like this:

    PartitionName=test.q Nodes=test-node MaxTime=10:00 State=UP AllocNodes=clus-login MaxCPUsPerNode=4

the partition continues to accept jobs with 12 CPUs. Is there anything wrong in my configuration? How can I achieve this?

Thanks.
-- Daniel Ruiz Molina

Re: Using more cores/CPUs that requested with
by Shunran Zhang 26 Mar '25

Ugh, I think I had not caught up with the docs.

I started with a system that defaulted to cgroup v1, but the Slurm documentation for that plugin was NOT available at the time, so I converted everything to cgroup v2. It appears that both are supported and that the documentation issue is more on the dev side than on the admin side.

Thanks for pointing that out. I misread the "coming soon" note on the cgroup v1 plugin and its "legacy" naming as "do not use". It should be fine.

On Thu, 27 Mar 2025 at 0:48, Williams, Jenny Avis <jenny_williams(a)unc.edu> wrote:

> " … As cgroup is likely not supposed to be used in newer deployments of Slurm."
>
> I am curious about this statement. Would someone expand on this, to either support or counter it?
>
> Jenny Williams
> UNC Chapel Hill
>
> From: Shunran Zhang via slurm-users <slurm-users(a)lists.schedmd.com>
> Sent: Wednesday, March 26, 2025 10:52 AM
> To: Gestió Servidors <sysadmin.caos(a)uab.cat>
> Cc: Slurm User Community List <slurm-users(a)lists.schedmd.com>
> Subject: [slurm-users] Re: Using more cores/CPUs that requested with
>
> If you are letting systemd taking most things over, you got systemd-cgtop that work better than top for your case. There is also systemd-cgls for non-interactive listing.
>
> Also mind to check if you are using cgroup2? A mount to check your cgroup would suffice. As cgroup is likely not supposed to be used in newer deployments of Slurm.
>
> On Wed, 26 Mar 2025 at 17:14, Gestió Servidors via slurm-users <slurm-users(a)lists.schedmd.com> wrote:
>
> Hello,
>
> Thanks for your answers. I will try now!! One more question: is there any way to check if Cgroups restrictions is working fine during a "running" job or during SLURM scheduling process?
>
> Thanks again!

Re: Using more cores/CPUs that requested with
by Gestió Servidors 26 Mar '25

Hello,

Thanks for your answers. I will try now!! One more question: is there any way to check whether the cgroup restrictions are working correctly while a job is "running" or during the SLURM scheduling process?

Thanks again!
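One possible way to look at this from inside a running job is to ask the kernel which CPUs the job's processes may actually use and compare that with what Slurm reports as allocated. The Python sketch below is only an illustration under assumptions: it relies on Linux (`os.sched_getaffinity`, `/proc/self/cgroup`), it assumes the site confines jobs with something like TaskPlugin=task/cgroup and ConstrainCores=yes, and the helper names `allowed_cpus`, `cgroup_lines` and `report` are invented here. It reads the `SLURM_CPUS_ON_NODE` variable that Slurm sets in the job environment. Note that the affinity mask alone cannot tell whether the limit comes from a cgroup or from plain CPU binding.

```python
import os


def allowed_cpus():
    """Return the sorted list of CPU ids this process may run on (Linux only)."""
    return sorted(os.sched_getaffinity(0))


def cgroup_lines(path="/proc/self/cgroup"):
    """Return the cgroup membership lines of this process, or [] if unreadable."""
    try:
        with open(path) as handle:
            return [line.strip() for line in handle if line.strip()]
    except OSError:
        return []


def report():
    """Build a short text report comparing allowed CPUs with the Slurm allocation."""
    cpus = allowed_cpus()
    # SLURM_CPUS_ON_NODE is only present inside a Slurm job; outside one
    # the report simply says "not set".
    allocated = os.environ.get("SLURM_CPUS_ON_NODE", "not set")
    lines = [
        "allowed CPUs: %d %s" % (len(cpus), cpus),
        "SLURM_CPUS_ON_NODE: %s" % allocated,
    ]
    lines.extend("cgroup: " + entry for entry in cgroup_lines())
    return "\n".join(lines)
```

Saved to a file with a final `print(report())` and launched through srun inside an allocation, a correctly confined job would be expected to show as many allowed CPUs as were allocated, rather than every CPU of the node.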

Using more cores/CPUs that requested with sbatch
by Gestió Servidors 25 Mar '25

Hello,

I would like to know if there is any mechanism to prevent a user from "cheating" when submitting a job. For example, a user submits an OpenMP job requesting 1 node (-N 1) and 2 tasks (-n 2). However, inside the script (or compiled into the binary), the user runs more than 2 threads, for example one on every free core. As far as SLURM is concerned the job is only using 2 CPUs, but in reality the node is 100% occupied by that job.

Can SLURM control that "abuse" in some way? Cgroups?

Thanks.

