[slurm-users] SLURM heterogeneous jobs, a little help needed plz
Goetz, Patrick G
pgoetz at math.utexas.edu
Thu Mar 21 20:13:29 UTC 2019
There are 2 kinds of system admins: can do and can't do. You're a can
do; his are can't do.
On 3/21/19 10:26 AM, Prentice Bisbal wrote:
>
> On 3/20/19 1:58 PM, Christopher Samuel wrote:
>> On 3/20/19 4:20 AM, Frava wrote:
>>
>>> Hi Chris, thank you for the reply.
>>> The team that manages that cluster is not very fond of upgrading
>>> SLURM, which I understand.
>
> As a system admin who manages clusters myself, I don't understand this.
> Our job is to provide and maintain resources for our users. Part of that
> maintenance is to provide updates for security, performance, and
> functionality (new features) reasons. HPC has always been a leading-edge
> kind of field, so I feel this is even more important for HPC admins.
>
> Yes, there can be issues caused by updates, but those can be mitigated with proper
> planning: Have a plan to do the actual upgrade, have a plan to test for
> issues, and have a plan to revert to an earlier version if issues are
> discovered. This is work, but it's really not all that much work, and
> this is exactly the work we are being paid to do as cluster admins.
>
> From my own experience, I find *not* updating in a timely manner is
> actually more problematic and more work than keeping on top of updates. For
> example, where I work now, we still haven't upgraded to CentOS 7, and as
> a result, many basic libraries are older than what many of the
> open-source apps my users need require. As a result, I don't just have
> to install application X, I often have to install up-to-date versions of
> basic libraries like libreadline, libcurses, zlib, etc. And then there
> are the security concerns...
>
> Okay, rant over. I'm sorry. It just bothers me when I hear fellow system
> admins aren't "very fond" of things that I think are a core
> responsibility of our jobs. I take a lot of pride in my job.
>
> --
> Prentice
>
>