[slurm-users] Slurm Upgrade Philosophy?
Chris Samuel
chris at csamuel.org
Thu Dec 24 06:57:28 UTC 2020
On Friday, 18 December 2020 10:10:19 AM PST Jason Simms wrote:
> Thanks to several helpful members on this list, I think I have a much better
> handle on how to upgrade Slurm. Now my question is, do most of you upgrade
> with each major release?
We do, though not immediately and not without a degree of testing on our test
systems. One of the big reasons we upgrade is that we've usually paid for
features in Slurm that we need (for example, in 20.11 that includes scrontab,
so users won't be tied to favourite login nodes, as well as the experimental
RPC queue code, added for the large numbers of RPCs our systems need to cope
with).
I also keep an eye out for discussions of what other sites find with new
releases, so I'm following the current concerns about 20.11 and its change in
behaviour for job steps like this (expanding NVIDIA's example slightly):
#SBATCH --exclusive
#SBATCH -N2
srun --ntasks-per-node=1 python multi_node_launch.py
which (if I'm reading the bugs correctly) fails in 20.11 because that srun no
longer inherits all the allocated resources; it just gets the default of
--cpus-per-task=1. This also affects things like mpirun in OpenMPI built with
Slurm support (it effectively calls "srun orted", and that "orted" launches
the MPI ranks, so in 20.11 they only have access to a single core to fight
over). Again - that's only if I'm interpreting the bugs correctly!
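If that reading is right, one possible workaround (an assumption on my part,
not something I've verified against 20.11 yet) would be to pass the step its
resources explicitly rather than relying on inheritance:

```shell
#!/bin/bash
#SBATCH --exclusive
#SBATCH -N2

# Hypothetical workaround for the 20.11 behaviour change: explicitly request
# all CPUs on the node for the step, instead of relying on the pre-20.11
# default of the step inheriting the whole allocation.
# SLURM_CPUS_ON_NODE is set by Slurm within the allocation.
srun --ntasks-per-node=1 --cpus-per-task="${SLURM_CPUS_ON_NODE}" \
    python multi_node_launch.py
```

That only helps for our own scripts, of course - it doesn't fix the mpirun
case, where the srun call is buried inside OpenMPI.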
I don't currently have a test system that's free to try 20.11 on, but
hopefully early in the new year I'll be able to test this out to see how much
of an impact this is going to have and how we will manage it.
https://bugs.schedmd.com/show_bug.cgi?id=10383
https://bugs.schedmd.com/show_bug.cgi?id=10489
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA