We have a 'gpu' partition with 30 or so nodes, some with A100s, some with
H100s, and a few others.
It appears that when (for example) all of the A100 GPUs are in use, if
there are additional jobs requesting A100 GPUs pending, and those jobs have
the highest priority in the partition, then jobs submitted for H100s won't
run even if there are idle H100s. This is a small subset of our present
pending queue - the four bottom jobs should be running, but aren't. The top
pending job shows reason 'Resources' while the rest all show 'Priority'.
Any thoughts on why this might be happening?
JOBID    PRIORITY  TRES_ALLOC
8317749  501490    cpu=4,mem=80000M,node=1,billing=48,gres/gpu=1,gres/gpu:a100=1
8317750  501490    cpu=4,mem=80000M,node=1,billing=48,gres/gpu=1,gres/gpu:a100=1
8317745  501490    cpu=4,mem=80000M,node=1,billing=48,gres/gpu=1,gres/gpu:a100=1
8317746  501490    cpu=4,mem=80000M,node=1,billing=48,gres/gpu=1,gres/gpu:a100=1
8338679  500060    cpu=4,mem=64G,node=1,billing=144,gres/gpu=1,gres/gpu:h100=1
8338678  500060    cpu=4,mem=64G,node=1,billing=144,gres/gpu=1,gres/gpu:h100=1
8338677  500060    cpu=4,mem=64G,node=1,billing=144,gres/gpu=1,gres/gpu:h100=1
8338676  500060    cpu=4,mem=64G,node=1,billing=144,gres/gpu=1,gres/gpu:h100=1
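For context, the scheduler settings that seem relevant, and the pending jobs'
priorities and reasons, can be inspected with commands along these lines
(illustrative; I haven't included our SchedulerParameters output here):

    # Scheduler type and backfill parameters currently in effect
    scontrol show config | grep -E 'SchedulerType|SchedulerParameters'

    # Pending jobs in the gpu partition with priority, reason, and requested TRES
    squeue -p gpu -t PD -O JobID,Priority,Reason,tres-alloc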
Thanks,
Kevin
--
Kevin Hildebrand
University of Maryland
Division of IT
I am unable to limit the number of jobs per user per partition. I
have searched the internet, the forums, and the Slurm documentation.
I created a partition with a QOS having MaxJobsPU=1 and MaxJobsPA=1,
and created a user stephen with account=stephen and MaxJobs=1.
However, if I sbatch a test job (sleep 180) multiple times, they all
run concurrently. I am at a loss as to what else to do. Any help would be
greatly appreciated.
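For reference, a rough sketch of the setup I am describing (the QOS, account,
and partition names are illustrative; note that QOS and association limits are
only enforced when AccountingStorageEnforce includes "limits" in slurm.conf):

    # Create a QOS that limits each user (and the account) to one running job
    sacctmgr add qos limit1
    sacctmgr modify qos limit1 set MaxJobsPU=1 MaxJobsPA=1

    # Create the account and user, limited to one running job
    sacctmgr add account stephen
    sacctmgr add user stephen account=stephen
    sacctmgr modify user stephen set MaxJobs=1

    # slurm.conf: attach the QOS to the partition and enable limit enforcement
    #   PartitionName=test Nodes=... QOS=limit1 State=UP
    #   AccountingStorageEnforce=limits,qos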
Thank you
--
Stephen Connolly
JSI Data Systems Ltd
613-727-9353
stephen(a)jsidata.ca
Hello everyone,
I’ve recently encountered an issue where some nodes in our cluster enter
a drain state randomly, typically after completing long-running jobs.
Below is the output from the sinfo command showing the reason “Prolog error”:

    root@controller-node:~# sinfo -R
    REASON        USER   TIMESTAMP            NODELIST
    Prolog error  slurm  2024-09-24T21:18:05  node[24,31]
When checking the slurmd.log files on the nodes, I noticed the following errors:

    [2024-09-24T17:18:22.386] [217703.extern] error: _handle_add_extern_pid_internal: Job 217703 can't add pid 3311892 to jobacct_gather plugin in the extern_step.  (repeated 90 times)
    [2024-09-24T17:18:22.917] [217703.extern] error: _handle_add_extern_pid_internal: Job 217703 can't add pid 3313158 to jobacct_gather plugin in the extern_step.
    ...
    [2024-09-24T21:17:45.162] launch task StepId=217703.0 request from UID:54059 GID:1600 HOST:<SLURMCTLD_IP> PORT:53514
    [2024-09-24T21:18:05.166] error: Waiting for JobId=217703 REQUEST_LAUNCH_PROLOG notification failed, giving up after 20 sec
    [2024-09-24T21:18:05.166] error: slurm_send_node_msg: [(null)] slurm_bufs_sendto(msg_type=RESPONSE_SLURM_RC_MSG) failed: Unexpected missing socket error
    [2024-09-24T21:18:05.166] error: _rpc_launch_tasks: unable to send return code to address:port=<SLURMCTLD_IP>:53514 msg_type=6001: No such file or directory
If you know how to solve these errors, please let me know. I would
greatly appreciate any guidance or suggestions for further troubleshooting.
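(For completeness, a sketch of how the drained nodes can be returned to service
once the cause is understood, using standard scontrol usage:)

    # Clear the "Prolog error" drain state on the affected nodes
    scontrol update NodeName=node[24,31] State=RESUME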
Thank you in advance for your assistance.
Best regards,
--
Télécom Paris <https://www.telecom-paris.fr>
Nacereddine LADDAOUI
Research and Development Engineer
19 place Marguerite Perey
CS 20031
91123 Palaiseau Cedex
Has anyone else noticed that somewhere between versions 22.05.11 and 23.11.9, fixed Features defined for a node in slurm.conf are lost, and those features are instead controlled only by a NodeFeaturesPlugin such as node_features/knl_generic?
Slurm version 24.05.4 is now available and includes a fix for a recently
discovered security issue with the new stepmgr subsystem.
SchedMD customers were informed on October 9th and provided a patch on
request; this process is documented in our security policy. [1]
A mistake in authentication handling in stepmgr could permit an attacker
to execute processes under other users' jobs. This is limited to jobs
explicitly running with --stepmgr, or on systems that have globally
enabled stepmgr through "SlurmctldParameters=enable_stepmgr" in their
configuration. CVE-2024-48936.
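Sites unsure whether the global option is set can check with something like the
following (illustrative):

    # Look for enable_stepmgr in the controller's SlurmctldParameters
    scontrol show config | grep -i SlurmctldParameters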
Downloads are available at https://www.schedmd.com/downloads.php .
Release notes follow below.
- Tim
[1] https://www.schedmd.com/security-policy/
--
Tim Wickberg
Chief Technology Officer, SchedMD LLC
Commercial Slurm Development and Support
> * Changes in Slurm 24.05.4
> ==========================
> -- Fix generic int sort functions.
> -- Fix user look up using possible unrealized uid in the dbd.
> -- Fix FreeBSD compile issue with tls/none plugin.
> -- slurmrestd - Fix regressions that allowed slurmrestd to be run as SlurmUser
> when SlurmUser was not root.
> -- mpi/pmix fix race conditions with het jobs at step start/end which could
> make srun hang.
> -- Fix not showing some SelectTypeParameters in scontrol show config.
> -- Avoid assert when dumping certain removed fields in JSON/YAML.
> -- Improve how shards are scheduled with affinity in mind.
> -- Fix MaxJobsAccruePU not being respected when MaxJobsAccruePA is set
> in the same QOS.
> -- Prevent backfill from planning jobs that use overlapping resources for the
> same time slot if the job's time limit is less than bf_resolution.
> -- Fix memory leak when requesting typed gres and --[cpus|mem]-per-gpu.
> -- Prevent backfill from breaking out due to "system state changed" every 30
> seconds if reservations use REPLACE or REPLACE_DOWN flags.
> -- slurmrestd - Make sure that scheduler_unset parameter defaults to true even
> when the following flags are also set: show_duplicates, skip_steps,
> disable_truncate_usage_time, run_away_jobs, whole_hetjob,
> disable_whole_hetjob, disable_wait_for_result, usage_time_as_submit_time,
> show_batch_script, and/or show_job_environment. Additionally, always make
> sure show_duplicates and disable_truncate_usage_time default to true when
> the following flags are also set: scheduler_unset, scheduled_on_submit,
> scheduled_by_main, scheduled_by_backfill, and/or job_started. This affects
> the following endpoints:
> 'GET /slurmdb/v0.0.40/jobs'
> 'GET /slurmdb/v0.0.41/jobs'
> -- Ignore --json and --yaml options for scontrol show config to prevent mixing
> output types.
> -- Fix not considering nodes in reservations with Maintenance or Overlap flags
> when creating new reservations with nodecnt or when they replace down nodes.
> -- Fix suspending/resuming steps running under a 23.02 slurmstepd process.
> -- Fix options like sprio --me and squeue --me for users with a uid greater
> than 2147483647.
> -- fatal() if BlockSizes=0. This value is invalid and would otherwise cause the
> slurmctld to crash.
> -- sacctmgr - Fix issue where clearing out a preemption list using
> preempt='' would cause the given qos to no longer be preempt-able until set
> again.
> -- Fix stepmgr creating job steps concurrently.
> -- data_parser/v0.0.40 - Avoid dumping "Infinity" for NO_VAL tagged "number"
> fields.
> -- data_parser/v0.0.41 - Avoid dumping "Infinity" for NO_VAL tagged "number"
> fields.
> -- slurmctld - Fix a potential leak while updating a reservation.
> -- slurmctld - Fix state save with reservation flags when an update fails.
> -- Fix reservation update issues with parameters Accounts and Users, when
> using +/- signs.
> -- slurmrestd - Don't dump warning on empty wckeys in:
> 'GET /slurmdb/v0.0.40/config'
> 'GET /slurmdb/v0.0.41/config'
> -- Fix slurmd possibly leaving zombie processes on start up in configless when
> the initial attempt to fetch the config fails.
> -- Fix crash when trying to drain a non-existing node (possibly deleted
> before).
> -- slurmctld - fix segfault when calculating limit decay for jobs with an
> invalid association.
> -- Fix IPMI energy gathering with multiple sensors.
> -- data_parser/v0.0.39 - Remove xassert requiring errors and warnings to have a
> source string.
> -- slurmrestd - Prevent potential segfault when there is an error parsing an
> array field which could lead to a double xfree. This applies to several
> endpoints in data_parser v0.0.39, v0.0.40 and v0.0.41.
> -- scancel - Fix a regression from 23.11.6 where using both the --ctld and
> --sibling options would cancel the federated job on all clusters instead of
> only the cluster(s) specified by --sibling.
> -- accounting_storage/mysql - Fix bug when removing an association
> specified with an empty partition.
> -- Fix setting multiple partition state restore on a job correctly.
> -- Fix difference in behavior when swapping partition order in job submission.
> -- Fix security issue in stepmgr that could permit an attacker to execute
> processes under other users' jobs. CVE-2024-48936.
I have a SLURM configuration of 2 hosts with 6 + 4 CPUs.
I am submitting jobs with sbatch -n <CPU slots> <job script>.
However, I see that even when all 10 CPU slots are exhausted by running jobs, subsequent jobs are still allowed to run!
CPU slot availability is also shown as full for the 2 hosts, and no job is pending.
What could be the problem?
My Slurm.conf looks like (host names are changed to generic):
ClusterName=MyCluster
ControlMachine=host1
ControlAddr=<some address>
SlurmUser=slurmsa
#AuthType=auth/munge
StateSaveLocation=/var/spool/slurmd
SlurmdSpoolDir=/var/spool/slurmd
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=3
SlurmctldDebug=6
SlurmdLogFile=/var/log/slurm/slurmd.log
AccountingStorageType=accounting_storage/slurmdbd
AccountingStorageHost=host1
#AccountingStoragePass=medslurmpass
#AccountingStoragePass=/var/run/munge/munge.socket.2
AccountingStorageUser=slurmsa
#TaskPlugin=task/cgroup
NodeName=host1 CPUs=6 SocketsPerBoard=3 CoresPerSocket=2 ThreadsPerCore=1 State=UNKNOWN
NodeName=host2 CPUs=4 ThreadsPerCore=1 State=UNKNOWN
PartitionName=debug Nodes=host1,host2 Default=YES MaxTime=INFINITE State=UP
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=30
SelectType=select/cons_tres
SelectTypeParameters=CR_CPU
TaskPlugin=task/affinity
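For reference, the controller's view of the allocation can be cross-checked with
standard commands like these (illustrative):

    # Per-node CPU totals and allocations as slurmctld sees them
    scontrol show node host1 | grep -i cpu
    scontrol show node host2 | grep -i cpu

    # Running jobs and the number of CPUs actually allocated to each
    squeue -t R -o "%.10i %.9P %.8u %.4C %R"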
Thanks in advance for any help!
Regards,
Bhaskar.
Dear SLURM Users and Administrators,
I am interested in a way to customize the job submission exit statuses (mainly error codes) after the job has already been queued by the SLURM controller. We aim to provide more user-friendly messages and reminders in case of any errors or obstacles (also adjusted to our QoS/account system).
For example, in the case of exceeding the CPU minutes of a given QoS (or account), and after the (successful) job submission, we would like to notify the user that his job has been queued (as it should be) but won't start until the CPU-minute limits are increased (and that he should contact the administrators to apply for more resources). Similarly, if the user queued a job that cannot be launched immediately because of exceeding the MaxJobs limit (per user), we would like to also give him some additional message after the srun/sbatch submission. We want to provide such information immediately after the job submission, without the need for the user to check the status using `squeue`.
In the Job Launch Guide (https://slurm.schedmd.com/job_launch.html) the following steps are distinguished:
1. Call job_submit plugins to modify the request as appropriate
2. Validate that the options are valid for this user (e.g. valid partition name, valid limits, etc.)
3. Determine if this job is the highest priority runnable job, if so then really try to allocate resources for it now, otherwise only validate that it could run if no other jobs existed
4. Determine which nodes could be used for the job. If the feature specification uses an exclusive OR option, then multiple iterations of the selection process below will be required with disjoint sets of nodes
5. Call the select plugin to select the best resources for the request
6. The select plugin will consider network topology and the topology within a node (e.g. sockets, cores, and threads) to select the best resources for the job
7. If the job can not be initiated using available resources and preemption support is configured, the select plugin will also determine if the job can be initiated after preempting lower priority jobs. If so then initiate preemption as needed to start the job.
From my understanding, to achieve our goal one would need access to the source code or a plugin related to point 2 (and some part of point 3). Unfortunately, the job_submit (lua) plugin from point 1 (and the cli_filter plugin as well) cannot be used, because it only has access to information on the parameters of the submitted job and the SLURM partitions (but not the QoS/account usage and their limits).
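For illustration, the kind of immediate feedback we have in mind is what slurm.log_user() provides in a job_submit.lua plugin; the sketch below only shows the messaging mechanism, since the limit/usage check itself (the hypothetical over_limit flag) is exactly the part that is not available to the plugin:

    -- job_submit.lua (sketch): relay a custom message back to sbatch/srun at submission time
    function slurm_job_submit(job_desc, part_list, submit_uid)
        -- Hypothetical: this information is not exposed to the plugin today
        local over_limit = false

        if over_limit then
            slurm.log_user("Your job has been queued, but it will not start until " ..
                           "your CPU-minute limit is raised; please contact the admins.")
        end
        return slurm.SUCCESS
    end

    function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
        return slurm.SUCCESS
    end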
Is there any way to extend the customization of job submission to include such features?
Best regards,
Sebastian
--
dr inż. Sebastian Sitkiewicz
Politechnika Wrocławska
Wrocławskie Centrum Sieciowo-Superkomputerowe
Dział Usług Obliczeniowych
Wyb. Wyspiańskiego 27
50-370 Wrocław
www.wcss.pl
We are trying to design the charging and accounting system for our new institutional HPC facility and I'm having difficulty understanding exactly how we can use sacctmgr to achieve what we need.
Until now, our previous HPC facilities have all operated as free delivery and we have not needed to track costs by user/group/project. Account codes have been purely optional.
However, our new facility will be split into various resource types, with free
partitions and paid/priority/reserved partitions across those resource types.
All jobs will need to be submitted with an account code.
For users submitting to 'free' partitions we don't need to track resource units against a balance, but the submitted account code would still be used for reporting purposes (i.e. "free resources accounted for % of all use by this project in August-September").
When submitting to a 'paid' partition, the account code needs to be checked to ensure it has a positive balance (or a balance that will not go past some negative threshold).
Each of the 'paid' partitions may (will) have a different resource unit cost. A simple example:
- Submit to a generic CPU paid partition
-- 1 resource unit/token/credit/£/$ per allocated cpu, per hour of compute
- Submit to a high-speed, non-blocking CPU paid partition
-- 2 resource unit/token/credit/£/$ per allocated cpu, per hour of compute
- Submit to a GPU paid partition
-- 4 resource unit/token/credit/£/$ per allocated GPU card, per hour of compute
We need to have *one* pool of resource units/tokens/credits per account - let's say 1000 credits, and a group of users may well decide to spend all of their credits on the generic CPU partition, all on the GPU partition, or some mixture of the two.
So in the above examples, assuming one user (or group of users sharing the same account code) submit a 2 hour job to all three partitions, their one, single account code should be charged:
- 2 units for the generic CPU partition
- 4 units for the job on the low latency partition
- 8 units for the gpu partition.
- A total of 14 credits removed from their single account code
Is this feasible to achieve without having to allocate credits to each of the partitions for an account, or creating a QOS variant for each and every combination of account and partition?
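To make that concrete, the sort of configuration we have been imagining is a per-partition TRESBillingWeights plus a single billing budget on each account; this is only a sketch (partition names and weights are illustrative, and as far as I understand a hard budget also needs priority usage decay disabled, e.g. PriorityDecayHalfLife=0 or a NoDecay QOS):

    # slurm.conf (sketch): different billing weight per paid partition
    PartitionName=cpu_paid  Nodes=...  TRESBillingWeights="CPU=1.0"
    PartitionName=cpu_fast  Nodes=...  TRESBillingWeights="CPU=2.0"
    PartitionName=gpu_paid  Nodes=...  TRESBillingWeights="GRES/gpu=4.0"
    PartitionName=cpu_free  Nodes=...  TRESBillingWeights="CPU=0.0"
    AccountingStorageEnforce=limits,safe

    # sacctmgr (sketch): one shared pool of billing units per account code.
    # GrpTRESMins is in TRES-minutes, so 1000 "credit-hours" = 60000 billing-minutes.
    sacctmgr modify account proj001 set GrpTRESMins=billing=60000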
John Snowdon
Senior Research Infrastructure Engineer (HPC)
Research Software Engineering
Catalyst Building, Room 2.01
Newcastle University
3 Science Square
Newcastle Helix
Newcastle upon Tyne
NE4 5TG
https://rse.ncldata.dev/
Hey guys!
I'm looking to improve GPU monitoring on our cluster. I want to install
this https://github.com/NVIDIA/dcgm-exporter and saw in the README that
it can support tracking of job IDs:
https://github.com/NVIDIA/dcgm-exporter?tab=readme-ov-file#enabling-hpc-job…
However, I haven't been able to find any examples of how to do it, nor does
Slurm seem to expose this information by default.
Does anyone here do this? If so, do you have any examples I could try to
follow? If you have advice on best practices for monitoring GPUs, I'd be
happy to hear it!
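For what it's worth, the approach I am imagining is a Slurm Prolog script that
writes the job ID into per-GPU files in the directory dcgm-exporter watches for
its HPC job mapping; this is only a sketch based on my reading of the README,
and the directory path and file layout below are assumptions:

    #!/bin/bash
    # prolog.sh (sketch): record the Slurm job ID for each GPU allocated to this job.
    # JOB_MAP_DIR must match the directory dcgm-exporter is configured to watch
    # for HPC job mapping (path here is illustrative).
    JOB_MAP_DIR=/var/lib/dcgm-exporter/job-mapping

    # SLURM_JOB_GPUS lists the GPU IDs allocated to the job (set in the Prolog
    # environment when GRES GPUs are allocated)
    if [ -n "$SLURM_JOB_GPUS" ]; then
        for gpu in ${SLURM_JOB_GPUS//,/ }; do
            echo "$SLURM_JOB_ID" > "$JOB_MAP_DIR/$gpu"
        done
    fi

    # A matching Epilog would remove the same files when the job ends.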
Regards,
Sylvain Maret