[slurm-users] Slurm Upgrade Philosophy?

Chris Samuel chris at csamuel.org
Thu Dec 24 06:57:28 UTC 2020


On Friday, 18 December 2020 10:10:19 AM PST Jason Simms wrote:

> Thanks to several helpful members on this list, I think I have a much better
> handle on how to upgrade Slurm. Now my question is, do most of you upgrade
> with each major release?

We do, though not immediately and not without a degree of testing on our test 
systems.  One of the big reasons we upgrade is that we've usually paid for 
features in Slurm for our needs (for example, 20.11 includes scrontab, so 
users won't be tied to favourite login nodes, as well as the experimental RPC 
queue code, which we need for the large numbers of RPCs our systems have to 
cope with).
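For anyone who hasn't used it yet, scrontab entries look like crontab lines preceded by `#SCRON` directives carrying sbatch-style options. A minimal sketch (the script path, partition name, and schedule here are just placeholders, not anything from our site):

```
# Edit with: scrontab -e
# #SCRON lines set sbatch options for the job that follows.
#SCRON --time=00:10:00
#SCRON --partition=debug
# Standard crontab schedule: run every 30 minutes.
*/30 * * * * /home/user/scripts/periodic_check.sh
```

The point is that the job is submitted by slurmctld on the schedule, so it no longer matters which login node the user happens to be on.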

I also keep an eye out for discussions of what other sites find with new 
releases, so I'm following the current concerns about 20.11 and the change in 
behaviour for job steps that do (expanding NVIDIA's example slightly):

#SBATCH --exclusive
#SBATCH -N2
srun --ntasks-per-node=1 python multi_node_launch.py

which (if I'm reading the bugs correctly) fails in 20.11, as that srun no 
longer gets all the allocated resources and instead just gets the default of
--cpus-per-task=1.  This also affects things like mpirun in OpenMPI built 
with Slurm support (it effectively calls "srun orted", and that "orted" 
launches the MPI ranks, so in 20.11 it only has access to a single core for 
them all to fight over).  Again - if I'm interpreting the bugs correctly!
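If that reading is right, one hedged workaround (untested on my side, for the reason below) would be to size the step explicitly rather than relying on the old "inherit everything" default, e.g. using the SLURM_CPUS_ON_NODE environment variable Slurm sets inside the allocation:

```
#SBATCH --exclusive
#SBATCH -N2
# Ask for the step's CPUs explicitly instead of relying on the
# pre-20.11 behaviour of the step inheriting the whole allocation.
# SLURM_CPUS_ON_NODE is set by Slurm within the allocation.
srun --ntasks-per-node=1 --cpus-per-task="$SLURM_CPUS_ON_NODE" \
     python multi_node_launch.py
```

That obviously doesn't help the mpirun/orted case, where the srun is hidden inside OpenMPI, so sites relying on that path will want to follow the bugs below.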

I don't currently have a test system that's free to try 20.11 on, but 
hopefully early in the new year I'll be able to test this out to see how much 
of an impact this is going to have and how we will manage it.

https://bugs.schedmd.com/show_bug.cgi?id=10383
https://bugs.schedmd.com/show_bug.cgi?id=10489

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
