[slurm-users] Slurm 18.08.8 --mem-per-cpu + --exclusive = strange behavior
Beatrice Charton
beatrice.charton at criann.fr
Wed Dec 11 16:26:05 UTC 2019
Hi,
We are seeing strange behaviour from Slurm after updating from 18.08.7 to 18.08.8, for jobs using --exclusive and --mem-per-cpu.
Our nodes have 128GB of memory and 28 cores.
$ srun --mem-per-cpu=30000 -n 1 --exclusive hostname
=> works in 18.08.7
=> doesn’t work in 18.08.8
In 18.08.8:
- If mem-per-cpu is lower than (full_memory_size_of_node/nb_core_per_node), it works fine (so lower than 4681MB).
- If mem-per-cpu is higher, the job stays pending although its estimated start time is now. In the slurmctld logs, we see the error "backfill: Failed to start JobId=xxxx with reserve avail: Requested nodes are busy" every 30s, so slurmctld tries to start it again and again.
- If I use --exclusive=user, it works.
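For reference, the 4681MB threshold above is plain integer arithmetic. A minimal sketch, assuming 128GB is counted as 128 * 1024 MB (the memory actually configured for the nodes in slurm.conf may differ):

```shell
#!/bin/sh
# Assumption: node memory taken as 128 * 1024 MB; the configured value may be lower.
node_mem_mb=$((128 * 1024))
cores_per_node=28

# Memory per core if the node's memory is split evenly (integer division).
threshold_mb=$((node_mem_mb / cores_per_node))

# The value used in the failing srun command above.
requested_mb=30000

if [ "$requested_mb" -gt "$threshold_mb" ]; then
    verdict="above"
else
    verdict="below"
fi

echo "threshold: ${threshold_mb} MB per core; request of ${requested_mb} MB is ${verdict} it"
```

So --mem-per-cpu=30000 is well above the per-core share, which is the case that stays pending.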
On another cluster, I also tried version 19.05.2: I see the same behaviour.
In version 19.05.3, the job is refused with the error: “srun: error: Unable to allocate resources: Requested node configuration is not available”
I can’t upgrade my production cluster to version 19… Will there be a patch for version 18?
We have a workaround: using --exclusive, --ntasks-per-node and (--ntasks or --nodes).
But sometimes, when deliberately under-populating nodes, asking only for ntasks and mem-per-cpu together with exclusive makes it easy to change a job by increasing the memory per task without knowing the memory size of the node: Slurm calculates how many tasks fit on each node and distributes them over the right number of nodes...
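To illustrate what the workaround forces us to compute by hand, here is a rough sketch of the arithmetic. It is only my understanding of the calculation, not Slurm's actual algorithm, and it assumes one CPU per task, 128 * 1024 MB per node, and a hypothetical job of 10 tasks:

```shell
#!/bin/sh
# Assumptions: 128 * 1024 MB per node, 28 cores, one CPU per task.
node_mem_mb=$((128 * 1024))
cores_per_node=28
mem_per_cpu_mb=30000
ntasks=10   # hypothetical job size, for illustration only

# How many tasks fit in one node's memory, capped by the core count.
tasks_per_node=$((node_mem_mb / mem_per_cpu_mb))
if [ "$tasks_per_node" -gt "$cores_per_node" ]; then
    tasks_per_node=$cores_per_node
fi
if [ "$tasks_per_node" -lt 1 ]; then
    echo "a single task does not fit on a node" >&2
    exit 1
fi

# Number of nodes needed, rounded up.
nodes=$(( (ntasks + tasks_per_node - 1) / tasks_per_node ))

# Values to pass explicitly with the workaround flags.
echo "--exclusive --ntasks=${ntasks} --ntasks-per-node=${tasks_per_node} (needs ${nodes} nodes)"
```

With these numbers, 4 tasks fit per node and 3 nodes are needed. The point is that the user has to know the node memory size to get there, whereas --ntasks plus --mem-per-cpu used to let Slurm work it out.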
Was this new behaviour intentional? I can’t find anything about it in the release notes (except the patch for 19.05.3).
We have academic and non-academic users on the same cluster, so the non-academic users ask for --exclusive.
Thank you in advance for your help,
Sincerely,
Béatrice
--
Béatrice CHARTON | CRIANN
Beatrice.Charton at criann.fr | 745, avenue de l'Université
Tel : +33 (0)2 32 91 42 91 | 76800 Saint Etienne du Rouvray
--- Support : support at criann.fr ---