slurm-users - lists.schedmd.com

srun hostname - Socket timed out on send/recv operation
by Arnuld 11 Jun '24

I have two machines. When I run "srun hostname" on one machine (it is both the controller and a node), I get the hostname fine, but I get a socket timed out error in these two situations:
1) "srun hostname" on the second machine (it is a node)
2) "srun -N 2 hostname" on the controller

"scontrol show node" shows both mach2 and mach4, and "sinfo" shows both nodes too. Also, the job gets stuck forever in the CG state after the error. Here is the output:

$ srun -N 2 hostname
mach2
srun: error: slurm_receive_msgs: [[mach4]:6818] failed: Socket timed out on send/recv operation
srun: error: Task launch for StepId=2222.0 failed on node hpc4: Socket timed out on send/recv operation
srun: error: Application launch failed: Socket timed out on send/recv operation
srun: Job step aborted

Output from "squeue", 3 seconds apart:

Tue Jun 11 05:09:56 2024
JOBID PARTITION NAME     USER   ST TIME NODES NODELIST(REASON)
2222  poxo      hostname arnuld R  0:19 2     mach4,mach2

Tue Jun 11 05:09:59 2024
JOBID PARTITION NAME     USER   ST TIME NODES NODELIST(REASON)
2222  poxo      hostname arnuld CG 0:20 1     mach4
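The first error line shows srun timing out while talking to slurmd on mach4 at port 6818 ([[mach4]:6818]). A basic first check is whether that TCP port can be reached at all from each machine. Below is a minimal Python sketch, assuming slurmd listens on the port shown in the error (the SlurmdPort setting in slurm.conf may differ on other clusters); the script name check_slurmd.py and the helper can_connect are hypothetical. A successful connection only rules out basic network or firewall blocking; it says nothing about Slurm authentication or about connections in the opposite direction.

```python
import socket
import sys

SLURMD_PORT = 6818  # port taken from the error message; SlurmdPort in slurm.conf may differ


def can_connect(host: str, port: int = SLURMD_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS lookup failures and timeouts,
        # all of which are OSError subclasses in Python 3.
        return False


if __name__ == "__main__":
    # Hypothetical usage: python3 check_slurmd.py mach2 mach4
    for node in sys.argv[1:]:
        status = "reachable" if can_connect(node) else "NOT reachable"
        print(f"{node}:{SLURMD_PORT} {status}")
```

Running it on both machines with both node names gives a quick picture of which direction, if any, is blocked at the TCP level.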

sbatch: Node count specification invalid - when only specifying --ntasks
by George Leaver 11 Jun '24

Hello,

Previously we were running 22.05.10 and could submit a "multinode" job using only the total number of cores to run, not the number of nodes. For example, in a cluster containing only 40-core nodes (no hyperthreading), Slurm would determine that two nodes were needed with only:

sbatch -p multinode -n 80 --wrap="...."

Now, in 23.02.1, this is no longer the case; we get:

sbatch: error: Batch job submission failed: Node count specification invalid

At least -N 2 must be used (-n 80 can be added):

sbatch -p multinode -N 2 -n 80 --wrap="...."

The partition config was, and is, as follows (MinNodes=2 to reject small jobs submitted to this partition; we want at least two nodes requested):

PartitionName=multinode State=UP Nodes=node[081-245] DefaultTime=168:00:00 MaxTime=168:00:00 PreemptMode=OFF PriorityTier=1 DefMemPerCPU=4096 MinNodes=2 QOS=multinode Oversubscribe=EXCLUSIVE Default=NO

All nodes are of the form:

NodeName=node245 NodeAddr=node245 State=UNKNOWN Procs=40 Sockets=2 CoresPerSocket=20 ThreadsPerCore=1 RealMemory=187000

slurm.conf has:

EnforcePartLimits = ANY
SelectType = select/cons_tres
TaskPlugin = task/cgroup,task/affinity

A few fields from "sacctmgr show qos multinode":

Name|Flags|MaxTRES
multinode|DenyOnLimit|node=5

The sbatch/srun man page states:

-n, --ntasks .... If -N is not specified, the default behavior is to allocate enough nodes to satisfy the requested resources as expressed by per-job specification options, e.g. -n, -c and --gpus.

I've had a look through the release notes back to 22.05.10 but can't see anything obvious (to me). Has this behaviour changed? Or, more likely, what have I missed ;-) ?

Many thanks,
George
--
George Leaver
Research Infrastructure, IT Services, University of Manchester
http://ri.itservices.manchester.ac.uk | @UoM_eResearch
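The expectation in the post is that Slurm derives the node count from -n and the 40 cores per node, as the quoted man page text describes. Since the post also shows that an explicit -N 2 is accepted, a submission wrapper could do that arithmetic itself. Below is a minimal Python sketch, assuming one task per core and no hyperthreading as on the cluster described; min_nodes and sbatch_args are hypothetical helpers, this is not Slurm's own node-selection logic, and the caller still appends the elided job script or --wrap.

```python
def min_nodes(ntasks, cores_per_node, cpus_per_task=1):
    """Smallest node count whose cores cover ntasks * cpus_per_task CPUs."""
    total_cpus = ntasks * cpus_per_task
    return -(-total_cpus // cores_per_node)  # integer ceiling division


def sbatch_args(partition, ntasks, cores_per_node, cpus_per_task=1, part_min_nodes=1):
    """Build an sbatch command line with an explicit -N derived from -n (and -c).

    part_min_nodes mirrors the partition's MinNodes so small requests still pass it.
    """
    nodes = max(min_nodes(ntasks, cores_per_node, cpus_per_task), part_min_nodes)
    args = ["sbatch", "-p", partition, "-N", str(nodes), "-n", str(ntasks)]
    if cpus_per_task > 1:
        args += ["-c", str(cpus_per_task)]
    return args


# 80 tasks on 40-core nodes in a partition with MinNodes=2:
print(" ".join(sbatch_args("multinode", 80, 40, part_min_nodes=2)))
# prints: sbatch -p multinode -N 2 -n 80
```

With 81 tasks the same arithmetic gives 3 nodes, which still sits inside the QOS limit of node=5.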

Issue about selecting cpus for optimization
by Purvesh Parmar 10 Jun '24

Hi,

We have a 16-node cluster of DGX-A100 (80 GB) systems. On each node, 128 cores are separated into their own partition for CPU-only jobs, and the 8 GPUs and 128 cores are in other partitions for CPU+GPU jobs. We want to ensure that only a selected set of 128 cores is part of the CPU partition (NUMA / symmetry) for optimization. How can we achieve this? Would the Cores parameter in gres.conf help?

Regards,
Purvesh

scontrol create partition fails
by Long, Daniel S. 10 Jun '24

Hi,

I need to temporarily dedicate one of our compute nodes to a single account. To do this, I was going to create a new partition, but I'm running into an error: scontrol create partition outputs "scontrol: error: Invalid input: partition Request aborted" regardless of what parameters I give it. As far as I can tell, this should be allowed; the man page for scontrol has a whole section titled "Partitions - Specifications for Create, Update, and Delete Commands". What am I missing?

Also, is there a better way to approach this? This is really just a one- or two-day thing, and I'm a little surprised there's no easy way to cordon off a node for a user or a project without spinning up an entire partition. Am I missing something obvious?

Thanks for any help you can provide.

Software builds using slurm
by Duane Ellis 10 Jun '24

I have been lurking here for a while hoping to see some examples that would help, but have not seen any for several months.

We have a Slurm system set up for Xilinx FPGA builds (HDL), and I want to use it for software builds too. What I seem to see is that Slurm talks about CPUs, GPUs, memory, etc. I am looking for "run my makefile (or shell script) on any available node".

In our case we have 3 top-level jobs, A, B and C. These can all run in parallel and are independent (i.e. the bootloader, the Linux kernel, and the Linux root file system via buildroot). Job A (boot) is actually about 7 small builds that are independent.

I am looking for a means to fork n jobs (i.e. jobs A, B and C above) across the cluster, wait for them, and collect the stdout and exit status of those n jobs. Job A would then fork and build 7 to 8 sub-jobs; when they are done, it would assemble the result into what Xilinx calls boot.bin. Job B is a Linux kernel build. Job C is buildroot, so there are several (n=50) smaller builds, i.e. bash, busybox, and other tools like Python for the target; again, each of these can be executed in parallel.

I really do not (cannot) re-architect my build to be a Slurm-only build, because it also needs to be able to run without Slurm, i.e. build everything on my laptop without Slurm present. In that case the jobs would run serially and take an hour or so. The hope is that by parallelizing the software build jobs, our overall cycle time will improve.

It would also be nice if the Slurm cluster adapted to the available nodes automatically. Our hope is that we can run our lab PCs as dual boot: they normally boot Windows, but we can boot them into Linux, where they become compile nodes and automatically join the cluster, and the cluster sees them go offline when somebody reboots the machine back into Windows.

Sent from my iPhone
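The fork/wait/collect pattern described above can stay independent of Slurm if each job is launched through srun when srun is on PATH, and run as the same plain command otherwise. Below is a minimal Python sketch, assuming each build step is an ordinary command such as a make target or shell script; run_jobs and run_one are hypothetical names, and `srun --ntasks=1` simply asks Slurm for a one-task step (or allocation) per build. Without srun it runs the jobs one at a time, matching the laptop case in the post.

```python
import shutil
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_one(cmd, use_srun):
    """Run one build command; return (exit status, combined stdout/stderr text)."""
    argv = ["srun", "--ntasks=1", *cmd] if use_srun else list(cmd)
    proc = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)
    return proc.returncode, proc.stdout


def run_jobs(jobs, max_parallel=None):
    """Run {name: argv} jobs and wait for all of them.

    Uses srun and runs in parallel when srun is available; otherwise runs serially.
    Returns {name: (exit_status, output)}.
    """
    use_srun = shutil.which("srun") is not None
    workers = (max_parallel or len(jobs)) if use_srun else 1
    with ThreadPoolExecutor(max_workers=max(workers, 1)) as pool:
        futures = {name: pool.submit(run_one, cmd, use_srun) for name, cmd in jobs.items()}
        return {name: fut.result() for name, fut in futures.items()}


if __name__ == "__main__":
    # Hypothetical top-level jobs; replace with the real build scripts.
    results = run_jobs({
        "A-boot": ["sh", "-c", "echo building boot"],
        "B-kernel": ["sh", "-c", "echo building kernel"],
        "C-rootfs": ["sh", "-c", "echo building rootfs"],
    })
    for name, (status, output) in results.items():
        print(f"{name}: exit {status}\n{output}", end="")
```

Job A's own script could call run_jobs again for its 7 to 8 sub-builds before assembling boot.bin, so the same code handles both levels of fan-out.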

Re: sbatch: Node count specification invalid - when only specifying --ntasks
by George Leaver 10 Jun '24

Noam,

Thanks for the suggestion, but no luck:

sbatch -p multinode -n 80 --ntasks-per-core=1 --wrap="..."
sbatch: error: Batch job submission failed: Node count specification invalid

sbatch -p multinode -n 2 -c 40 --ntasks-per-core=1 --wrap="..."
sbatch: error: Batch job submission failed: Node count specification invalid

sbatch -p multinode -N 2 -n 80 --ntasks-per-core=1 --wrap="..."
Submitted batch job

I guess that the MinNodes=2 in the partition definition is now being enforced more strictly, or earlier in the submission process, before it can be determined that the request will satisfy the constraint.

Regards,
George
--
George Leaver
Research Infrastructure, IT Services, University of Manchester
http://ri.itservices.manchester.ac.uk | @UoM_eResearch

________________________________________
From: Bernstein, Noam CIV USN NRL WASHINGTON DC (USA) <noam.bernstein.civ(a)us.navy.mil>
Sent: 09 June 2024 19:33
To: George Leaver; slurm-users(a)lists.schedmd.com
Subject: Re: sbatch: Node count specification invalid - when only specifying --ntasks

It would be a shame to lose this capability. Have you tried adding `--ntasks-per-core` explicitly (but not the number of nodes)?

Noam

cpu distribution question
by Alan Stange 08 Jun '24

All,

I have a very simple Slurm cluster. It's just a single system with 2 sockets and 16 cores in each socket. I would like to be able to submit a simple task into this cluster and have the CPUs assigned to that task allocated round robin across the two sockets. Everything I try puts all the CPUs for this single task on the same socket. I have not specified any CpuBind options in the slurm.conf file.

For example, if I try

$ srun -c 4 --pty bash

I get a shell prompt on the system, and can run

$ taskset -cp $$
pid 12345 current affinity list: 0,2,4,6

and I get this same set of CPUs no matter what options I try (the cluster is idle, with no tasks consuming slots). I've tried various srun command-line options like:

--hint=compute_bound
--hint=memory_bound
various --cpubind options
-B 2:2
-m block:cyclic and block:fcyclic

Note that if I try to allocate 17 CPUs, then I do get the 17th CPU allocated on the 2nd socket. What magic incantation is needed to get an allocation where the CPUs are chosen round robin across the sockets?

Thank you!
Alan
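CPU numbers do not necessarily map to sockets in contiguous blocks; on some dual-socket machines the kernel interleaves them, so a list such as 0,2,4,6 says little about socket placement on its own. Checking the allocation against the kernel's topology confirms which socket each CPU is on. Below is a minimal Linux-only Python sketch meant to be run inside the `srun -c 4 --pty bash` shell; it reads the standard sysfs file /sys/devices/system/cpu/cpuN/topology/physical_package_id, and cpus_by_socket is a hypothetical helper name.

```python
import os
from collections import defaultdict

SYSFS_CPU = "/sys/devices/system/cpu"


def cpus_by_socket(cpus=None, sysfs_root=SYSFS_CPU):
    """Map socket (physical package) id -> sorted list of CPU ids.

    Defaults to the CPUs this process is allowed to run on (Linux only).
    """
    if cpus is None:
        cpus = os.sched_getaffinity(0)
    groups = defaultdict(list)
    for cpu in sorted(cpus):
        path = os.path.join(sysfs_root, f"cpu{cpu}", "topology", "physical_package_id")
        try:
            with open(path) as fh:
                socket_id = int(fh.read().strip())
        except (OSError, ValueError):
            socket_id = -1  # topology not exposed, e.g. in some containers
        groups[socket_id].append(cpu)
    return dict(groups)


if __name__ == "__main__":
    for socket_id, cpu_list in sorted(cpus_by_socket().items()):
        print(f"socket {socket_id}: cpus {cpu_list}")
```

If all allocated CPUs fall under a single socket id, the allocation really is confined to one socket rather than just looking that way from the CPU numbers.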

maxrss reported by sacct is wrong
by Feng Zhang 07 Jun '24

Hi All,

I am having trouble calculating the real RSS memory usage of certain kinds of user jobs, for which sacct returns wrong numbers.

Rocky Linux release 8.5, Slurm 21.08
(slurm.conf)
ProctrackType=proctrack/cgroup
JobAcctGatherType=jobacct_gather/linux

The troubling jobs look like this:
1. Python uses multithreading to spawn 96 threads;
2. each thread uses sklearn, which in turn spawns 96 threads using OpenMP.

This obviously oversubscribes the node, and I want to address that. The node has 300GB of RAM, but sacct (and seff) report 1.2TB MaxRSS (and AveRSS). This does not look correct. I suspect that Slurm with jobacct_gather/linux repeatedly sums the memory used by all these threads, counting the same memory many times. For the OpenMP part, maybe this is fine for Slurm, while Python multithreading may not work well with Slurm's memory accounting. If this is the case, would the real MaxRSS be about 1.2TB/96 ≈ 12.5GB? I want to get the right MaxRSS to report to users.

Thanks!
Best,
Feng
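One way to test the double-counting theory on a running job is to compare the summed RSS of the job's processes with their summed PSS (proportional set size), which splits each shared page among the processes that map it. Threads of one process share a single address space and report the same RSS, so any tool that summed it once per thread, or once per process for pages shared between processes, would inflate the total; a summed RSS far above the summed PSS would point that way. Below is a minimal Linux-only Python sketch; read_kb and memory_totals are hypothetical helpers, the values come from /proc/<pid>/status (VmRSS) and /proc/<pid>/smaps_rollup (Pss, present on newer kernels), and collecting the job's PIDs is left to the caller.

```python
import os


def read_kb(path, key):
    """Return the integer kB value of the first 'key:' line in a /proc file, or None."""
    try:
        with open(path) as fh:
            for line in fh:
                if line.startswith(key + ":"):
                    return int(line.split()[1])
    except OSError:
        return None
    return None


def memory_totals(pids):
    """Return (summed VmRSS, summed Pss) in kB over pids; Pss is None if unavailable."""
    rss_total, pss_total, pss_ok = 0, 0, True
    for pid in pids:
        rss_total += read_kb(f"/proc/{pid}/status", "VmRSS") or 0
        pss = read_kb(f"/proc/{pid}/smaps_rollup", "Pss")
        if pss is None:
            pss_ok = False  # old kernel or no permission: the PSS total would be incomplete
        else:
            pss_total += pss
    return rss_total, (pss_total if pss_ok else None)


if __name__ == "__main__":
    rss, pss = memory_totals([os.getpid()])
    print(f"RSS sum: {rss} kB, PSS sum: {pss} kB")
```

For a single process the two figures are close; the comparison only becomes informative when the job's processes share large amounts of memory.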

need to set From: address for slurm
by Vanhorn, Mike 07 Jun '24

All,

When the slurm daemon is sending out emails, they are coming from “slurm(a)servername.subdomain.domain.edu”. This has worked okay in the past, but due to a recent mail server change (over which I have no control whatsoever) this will no longer work. Now, the From: address is going to have to be something like “slurm-servername(a)domain.edu”, or at least something that ends in “(a)domain.edu” (the subdomain being present will cause it to get rejected by the mail server).

I am not seeing in the documentation how to change the “From:” address that slurm uses. Is there a way to do this and I’m just missing it?

---
Mike VanHorn
Senior Computer Systems Administrator
College of Engineering and Computer Science
Wright State University
265 Russ Engineering Center
937-775-5157
michael.vanhorn(a)wright.edu
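One possible workaround, sketched under assumptions: Slurm sends mail by running the program named by MailProg in slurm.conf, so MailProg could point at a small wrapper that sets the From: header itself. The sketch assumes Slurm calls the program mail(1)-style (`-s <subject> <recipient>`) and that a sendmail-compatible binary exists at /usr/sbin/sendmail; FROM_ADDR and the helper names are hypothetical, and the body is whatever arrives on stdin (possibly nothing). The installed MailProg script would end with a call to main().

```python
import argparse
import subprocess
import sys
from email.message import EmailMessage

# Hypothetical sender; use whatever address the new mail server accepts.
FROM_ADDR = "slurm-servername@domain.edu"
# Assumed sendmail location; -t takes recipients from the headers, -i ignores lone dots.
SENDMAIL = ["/usr/sbin/sendmail", "-t", "-i"]


def parse_args(argv):
    """Parse a mail(1)-style command line: -s SUBJECT RECIPIENT [RECIPIENT ...]."""
    parser = argparse.ArgumentParser(description="MailProg wrapper that sets From:")
    parser.add_argument("-s", dest="subject", default="")
    parser.add_argument("recipients", nargs="+")
    return parser.parse_args(argv)


def build_message(subject, recipients, body="", sender=FROM_ADDR):
    """Build an email with an explicit From: header."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def main(argv=None):
    """Entry point for the MailProg script: parse args, build the message, hand it to sendmail."""
    args = parse_args(sys.argv[1:] if argv is None else argv)
    body = sys.stdin.read() if sys.stdin and not sys.stdin.isatty() else ""
    msg = build_message(args.subject, args.recipients, body)
    subprocess.run(SENDMAIL, input=msg.as_bytes(), check=True)
```

Keeping the From: address in one constant means a future mail-server policy change only touches this script, not slurm.conf.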

Re: Not being able to ssh to node with running job
by Ratnasamy, Fritz 06 Jun '24

So the squeue issue was resolved; it was due to the partition being hidden, and unhiding it solves the problem. However, the ssh issue remains (it looks like they were separate issues). pam_slurm_adopt is working on all the other nodes but not on the new ones. Any idea how to solve this?

Best,
Fritz Ratnasamy
Data Scientist
Information Technology

On Thu, Jun 6, 2024 at 2:11 PM Ratnasamy, Fritz via slurm-users <slurm-users(a)lists.schedmd.com> wrote:
> As admin on the cluster, we do not observe any issue on our newly added gpu nodes.
> However, regular users are not seeing their jobs running on these gpu nodes when running squeue -u <username> (the jobs do show as running with sacct), and they are not able to ssh to these newly added nodes when they have a running job on them.
> I am not sure if these 2 are related (not being able to ssh to the mgpu node with a running job on it, and not listing a job with squeue for a user on the same node). There are no issues reported on the other nodes. Anyone know what is happening?
> Best,
> Fritz Ratnasamy
> Data Scientist
> Information Technology
