Following up on this in case anyone can provide some insight, please.
On Thu, May 16, 2024 at 8:32 AM Dan Healy <daniel.t.healy@gmail.com> wrote:
Hi there, SLURM community,
I swear I've done this before, but now it's failing on a new cluster I'm deploying. We have 6 compute nodes with 64 CPUs each (384 CPUs total). When I run `srun -n 500 hostname`, the job just sits in the queue since there aren't 500 CPUs available.
Wasn't there an option that lets this run anyway, so the first 384 tasks execute immediately and the remaining tasks execute as resources free up?
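(To illustrate the behavior I mean, purely as a hypothetical sketch and not what I'm actually running: a job array gets something similar, since the %-limit caps how many of the 500 pieces run at once and the rest start as CPUs free up. I'm after the same thing for a single srun launch.)

# Hypothetical illustration only: 500 single-CPU array jobs, at most 384
# running at a time; the remainder start automatically as CPUs free up.
sbatch --array=0-499%384 --ntasks=1 --wrap="hostname"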
Here's my conf:
# Slurm cgroup configs used on controllers and workers
slurm_cgroup_config:
  CgroupAutomount: yes
  ConstrainCores: yes
  ConstrainRAMSpace: yes
  ConstrainSwapSpace: yes
  ConstrainDevices: yes

# Slurm conf file settings
slurm_config:
  AccountingStorageType: "accounting_storage/slurmdbd"
  AccountingStorageEnforce: "limits"
  AuthAltTypes: "auth/jwt"
  ClusterName: "cluster"
  AccountingStorageHost: "{{ hostvars[groups['controller'][0]].ansible_hostname }}"
  DefMemPerCPU: 1024
  InactiveLimit: 120
  JobAcctGatherType: "jobacct_gather/cgroup"
  JobCompType: "jobcomp/none"
  MailProg: "/usr/bin/mail"
  MaxArraySize: 40000
  MaxJobCount: 100000
  MinJobAge: 3600
  ProctrackType: "proctrack/cgroup"
  ReturnToService: 2
  SelectType: "select/cons_tres"
  SelectTypeParameters: "CR_Core_Memory"
  SlurmctldTimeout: 30
  SlurmctldLogFile: "/var/log/slurm/slurmctld.log"
  SlurmdLogFile: "/var/log/slurm/slurmd.log"
  SlurmdSpoolDir: "/var/spool/slurm/d"
  SlurmUser: "{{ slurm_user.name }}"
  SrunPortRange: "60000-61000"
  StateSaveLocation: "/var/spool/slurm/ctld"
  TaskPlugin: "task/affinity,task/cgroup"
  UnkillableStepTimeout: 120
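(Side note, in case it's useful: I'm assuming each node reports 64 CPUs; the per-node counts and the loaded scheduler settings can be double-checked on the controller with something like the commands below.)

# Hostname and CPU count per node, as slurmctld sees them
sinfo -N -o "%N %c"
# Confirm the select plugin and memory defaults actually loaded
scontrol show config | grep -E "SelectType|DefMemPerCPU"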
--
Thanks,
Daniel Healy