Different slurm.conf for master and nodes

Aurélien Vallée vallee.aurelien at gmail.com
Sun Feb 24 05:22:17 UTC 2019


I am in a situation where estimating the precise memory consumption of jobs beforehand is quite challenging. So I would like to create a “trust” system: the memory requested for a job is taken into account for scheduling, but no action is taken if the job actually exceeds that limit once it is running on the node.
I tried NoOverMemoryKill, but it seems to work only for sbatch, not srun.
So I ended up declaring memory as a non-consumable resource in the slurm.conf on the nodes, but not on the master. This seems to work, but it looks rather hackish (and Slurm complains about the configuration discrepancy).
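As a rough sketch of the split described above: the select plugin and the CR_* values shown here are assumptions chosen for illustration, not copied from the actual files.

```
# slurm.conf on the master (assumed values): memory is a consumable
# resource, so the memory a job requests is counted when scheduling.
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory

# slurm.conf on the compute nodes (assumed values): memory is not a
# consumable resource, so only cores are tracked there.
SelectType=select/cons_res
SelectTypeParameters=CR_Core
```

The rest of each file would be identical; only the SelectTypeParameters line differs, which is the discrepancy Slurm warns about.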
Is this a supported practice? Can it bite me later on? Is there a cleaner solution?

