[slurm-users] Testing/evaluating new versions of slurm (19.05 in this case)

Thu May 16 15:25:22 UTC 2019

Hello,

Following the various postings regarding slurm 19.05 I thought it was an opportune time to send this question to the forum.

Like others, I'm awaiting 19.05 primarily for the addition of the XFACTOR priority setting, but also for other new and improved features. I'm interested to hear how other admins/groups test (and stress) new versions of slurm. That is, how do admins test a new version with (a) a realistic workload and (b) sufficient hardware resources, without taking too many hardware resources from their production cluster and/or annoying too many users? I understand that it is possible to emulate a large cluster on SMP nodes by firing up many slurm processes on those nodes, for example.
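
To make the emulation idea concrete, here is a rough sketch of how I understand it would be set up, not something I have validated on 19.05. It assumes slurm has been built with multiple-slurmd support (the --enable-multiple-slurmd configure option, as I understand it) and that each emulated node is given its own NodeName and Port while sharing one NodeHostname. The helper name emulated_node_lines, the vnode prefix, the host name smp01 and the base port 17001 are all made up for illustration.

```python
def emulated_node_lines(host, count, base_port=17001, cpus=1):
    """Build slurm.conf node lines that map `count` virtual nodes onto one host.

    Assumption: every virtual node runs its own slurmd on a distinct port
    of the same physical machine. All names and ports are illustrative.
    """
    lines = []
    for i in range(count):
        lines.append(
            f"NodeName=vnode{i + 1} NodeHostname={host} "
            f"Port={base_port + i} CPUs={cpus}"
        )
    return lines


if __name__ == "__main__":
    # Print four emulated nodes hosted on the (hypothetical) machine smp01.
    for line in emulated_node_lines("smp01", 4):
        print(line)
```

Each emulated node would then need its own slurmd started with the matching node name (slurmd -N vnode1, and so on), which is where I would expect resource contention on the SMP host to start distorting any stress test.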

I have been experimenting with a slurm simulator (https://github.com/ubccr-slurm-simulator/slurm_sim_tools/blob/master/doc/slurm_sim_manual.Rmd) using historical job data. However, that simulator is based on an old version of slurm and (to be honest) it's slightly unreliable for serious study. It's useful only for broad-brush analysis, at most.

Please let me have your thoughts -- they would be appreciated.

Best regards,
David
