[slurm-users] SLURM heterogeneous jobs, a little help needed plz

Prentice Bisbal pbisbal at pppl.gov
Thu Mar 21 15:26:59 UTC 2019


On 3/20/19 1:58 PM, Christopher Samuel wrote:
> On 3/20/19 4:20 AM, Frava wrote:
>
>> Hi Chris, thank you for the reply.
>> The team that manages that cluster is not very fond of upgrading 
>> SLURM, which I understand.

As a system admin who manages clusters myself, I don't understand this. 
Our job is to provide and maintain resources for our users. Part of that 
maintenance is to provide updates for security, performance, and 
functionality (new features) reasons. HPC has always been a leading-edge 
kind of field, so I feel this is even more important for HPC admins.

Yes, there can be issues caused by updates, but those can be mitigated with proper 
planning: Have a plan to do the actual upgrade, have a plan to test for 
issues, and have a plan to revert to an earlier version if issues are 
discovered. This is work, but it's really not all that much work, and 
this is exactly the work we are being paid to do as cluster admins.

From my own experience, I find *not* updating in a timely manner is 
actually more problematic and more work than keeping on top of updates. For 
example, where I work now, we still haven't upgraded to CentOS 7, and as 
a result, many basic libraries are older than the versions required by 
the open-source apps my users need. Consequently, I don't just have 
to install application X, I often have to install up-to-date versions of 
basic libraries like libreadline, libcurses, zlib, etc. And then there 
are the security concerns...

Okay, rant over. I'm sorry. It just bothers me when I hear fellow system 
admins aren't "very fond" of things that I think are a core 
responsibility of our jobs. I take a lot of pride in my job.

--
Prentice



