[slurm-users] Slurm 18.08.8 --mem-per-cpu + --exclusive = strange behavior

Beatrice Charton beatrice.charton at criann.fr
Wed Dec 11 16:26:05 UTC 2019


Hi,

We are seeing strange behaviour in Slurm after updating from 18.08.7 to 18.08.8, for jobs using --exclusive and --mem-per-cpu.

Our nodes have 128GB of memory and 28 cores.
	$ srun  --mem-per-cpu=30000 -n 1  --exclusive  hostname
=> works in 18.08.7 
=> doesn’t work in 18.08.8

In 18.08.8:
- If mem-per-cpu is lower than (full_memory_size_of_node/nb_core_per_node), it works fine (so lower than 4681MB).
- If mem-per-cpu is higher, the job stays pending even though its start time is set to now. In the slurmctld logs, we see the error "backfill: Failed to start JobId=xxxx with reserve avail: Requested nodes are busy" every 30s, so slurmctld tries to start it again and again.
- If I use --exclusive=user, it works.
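
As a rough check of where the 4681MB threshold comes from, the arithmetic can be sketched as below. It assumes 1GB = 1024MB and uses the node size given above; the variable names are illustrative only. The observed threshold is consistent with the request being multiplied by every core of the node, but that is an inference from the numbers, not something confirmed here.

```shell
# Node size taken from the message: 128GB of memory, 28 cores.
node_mem_mb=$((128 * 1024))      # 131072MB, assuming 1GB = 1024MB
cores_per_node=28

# Largest --mem-per-cpu that still fits if it is counted for every core.
threshold_mb=$((node_mem_mb / cores_per_node))   # integer division
echo "threshold: ${threshold_mb}MB"              # prints "threshold: 4681MB"

# The failing request from the srun example, counted for all 28 cores.
requested_mb=$((30000 * cores_per_node))
echo "30000MB x ${cores_per_node} cores = ${requested_mb}MB (node has ${node_mem_mb}MB)"
```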

On another cluster, I also tried with version 19.05.2: I see the same behaviour.
With slurm-19.05.3, the job is refused with the error: “srun: error: Unable to allocate resources: Requested node configuration is not available”

I can’t upgrade my production cluster to version 19… Will there be a patch for version 18?

We have a workaround using --exclusive, --ntasks-per-node and (--ntasks or --nodes).
But sometimes, when running nodes depopulated, asking only for ntasks and mem-per-cpu with exclusive makes it easy to change a job by increasing the memory per task without knowing the memory size of the node: Slurm calculates how to distribute the tasks over the right number of nodes.
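
To illustrate what the workaround requires the user to work out by hand, here is a minimal sketch. The task count (8) is a hypothetical value, the node memory is the nominal 128GB (the memory Slurm actually makes available on a node is usually somewhat lower), and the command is only printed, not run. It uses only the options named in the workaround.

```shell
# Assumed inputs: nominal node memory and the memory wanted per task.
node_mem_mb=$((128 * 1024))   # assuming 1GB = 1024MB
mem_per_task_mb=30000
ntasks=8                      # hypothetical number of tasks

# Tasks that fit on one node (floor) and nodes needed (ceiling).
tasks_per_node=$((node_mem_mb / mem_per_task_mb))
nodes=$(( (ntasks + tasks_per_node - 1) / tasks_per_node ))

# Build the workaround command; echo it rather than submit it.
cmd="srun --exclusive --ntasks-per-node=${tasks_per_node} --ntasks=${ntasks} hostname"
echo "nodes needed: ${nodes}"
echo "$cmd"
```

With --mem-per-cpu and --exclusive alone, this calculation would not be needed, which is why the older behaviour was convenient.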

Was this new behaviour intentional? I can’t find anything about it in the release notes (except the patch for 19.05.3).

We have academic and non-academic users on the same cluster, so the non-academic users ask for --exclusive.

Thank you in advance for your help,
Sincerely,

	Béatrice

-- 
Béatrice CHARTON		|              CRIANN
Beatrice.Charton at criann.fr	|  745, avenue de l'Université
Tel : +33 (0)2 32 91 42 91 	| 76800 Saint Etienne du Rouvray
       ---   Support : support at criann.fr   ---


