[slurm-users] Quick hold on all partitions, all jobs

John Hearns hearnsj at gmail.com
Thu Nov 9 00:09:47 MST 2017


 "complete network wide network outage tomorrow night from 10pm across the
whole institute".

                     ^^^^^^

Lachlan, I advise running the following script on all login nodes:

#!/bin/bash
# Replace the login message of the day (overwrites /etc/motd)
cat << EOF > /etc/motd
HPC Managers are in the pub.
At this hour of the day you should also be.

In case of HPC actually on fire, Lachlan can be contacted at:
In Front of the Bar
The Dog and Duck
EOF

On 9 November 2017 at 04:57, Jonathon A Anderson <
jonathon.anderson at colorado.edu> wrote:

> In your situation, where you're blocking user access to the login node, it
> probably doesn't matter. We use DOWN in most events, as INACTIVE would
> prevent new jobs from being queued against the partition at all. DOWN
> allows the jobs to be queued, and just doesn't permit them to run. (In
> either case, HOLDing PENDING jobs is redundant.)
>
> ~jonathon
>
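Jonathon's DOWN-versus-INACTIVE choice comes down to one `scontrol update PartitionName=<name> State=<state>` call per partition. Below is a minimal sketch that assumes bash 4 (for `mapfile`) and assumes `sinfo -h -o '%R'` lists the partition names. The function name `set_partition_state` and the `DRY_RUN` switch are hypothetical and not part of Slurm.

```shell
#!/bin/bash
# Hypothetical helper: set Slurm partitions to one state, e.g. DOWN before
# the outage (jobs can still be queued but none start) and UP afterwards.
# Needs root or a Slurm operator account for scontrol to accept it.
# DRY_RUN=1 (hypothetical switch) prints the commands instead of running them.

set_partition_state() {
    local state=$1
    shift
    case $state in
        UP|DOWN|DRAIN|INACTIVE) ;;
        *) echo "unknown partition state: $state" >&2; return 1 ;;
    esac

    local parts=("$@")
    # No names given: ask Slurm for every partition. %R is the bare name
    # (no '*' on the default partition); sort -u drops repeated rows.
    if [ ${#parts[@]} -eq 0 ]; then
        mapfile -t parts < <(sinfo -h -o '%R' | sort -u)
    fi

    local p
    for p in "${parts[@]}"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "scontrol update PartitionName=$p State=$state"
        else
            scontrol update PartitionName="$p" State="$state"
        fi
    done
}
```

Run `set_partition_state DOWN` before the outage and `set_partition_state UP` once the network is back. Passing names, as in `set_partition_state DOWN batch`, limits the change to those partitions.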
> ________________________________________
> From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of
> Lachlan Musicman <datakid at gmail.com>
> Sent: Wednesday, November 8, 2017 5:00:12 PM
> To: Slurm User Community List
> Subject: [slurm-users] Quick hold on all partitions, all jobs
>
> The IT team sent an email saying "complete network wide network outage
> tomorrow night from 10pm across the whole institute".
>
> Our plan is to put all queued jobs on hold, suspend all running jobs, and
> turn off the login node.
>
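Lachlan's plan maps onto `scontrol hold` for pending jobs and `scontrol suspend` for running ones, and it can be undone afterwards with `scontrol resume` and `scontrol release`. Below is a minimal sketch that assumes job IDs come from `squeue -h -t <state> -o '%i'` and that the script runs as root or a Slurm operator. The function names and the `DRY_RUN` switch are hypothetical. As Jonathon points out in his reply, the hold step is redundant once the partitions are DOWN.

```shell
#!/bin/bash
# Hypothetical helpers: hold every pending job and suspend every running job
# before an outage, then undo both afterwards.
# DRY_RUN=1 (hypothetical switch) prints the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Job IDs in one state, one per line (-h: no header, -o '%i': ID only).
job_ids() {
    squeue -h -t "$1" -o '%i'
}

pause_jobs() {
    local id
    for id in $(job_ids PENDING); do run scontrol hold "$id"; done
    for id in $(job_ids RUNNING); do run scontrol suspend "$id"; done
}

resume_jobs() {
    local id
    for id in $(job_ids SUSPENDED); do run scontrol resume "$id"; done
    for id in $(job_ids PENDING); do run scontrol release "$id"; done
}
```

Note that `resume_jobs` tries to release every pending job, including any that a user had held deliberately before the outage. If that matters, save the output of `job_ids PENDING` before calling `pause_jobs` and release only those IDs.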
> I've just discovered that the partitions have a state, and it can be set
> to UP, DOWN, DRAIN or INACTIVE.
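With several partitions, it is worth confirming that every one of them reached the intended state before the network goes down, and that they are all back to UP afterwards. Below is a minimal sketch that assumes the one-line-per-partition `KEY=VALUE` output of `scontrol show partition -o`. The function name `partitions_not_in_state` is hypothetical.

```shell
#!/bin/bash
# Hypothetical check: read `scontrol show partition -o` output on stdin,
# print "name state" for every partition NOT in the wanted state,
# and exit non-zero if any such partition was found.

partitions_not_in_state() {
    awk -v want="$1" '
        BEGIN { bad = 0 }
        {
            name = ""; state = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^PartitionName=/) name = substr($i, 15)
                else if ($i ~ /^State=/)     state = substr($i, 7)
            }
            if (name != "" && state != want) { print name, state; bad = 1 }
        }
        END { exit bad }'
}
```

For example, `scontrol show partition -o | partitions_not_in_state DOWN && echo "all partitions DOWN"` prints nothing but the confirmation when every partition is DOWN.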
>
> In this situation - most likely a 4-hour outage with nothing else affected
> - would you mark your partitions DOWN or INACTIVE?
>
> Ostensibly all users should be off the systems (because no network), but
> there's always one that sets an at or cron job or finds that corner case.
>
> Cheers
> L.
>
>
> ------
> "The antidote to apocalypticism is apocalyptic civics. Apocalyptic civics
> is the insistence that we cannot ignore the truth, nor should we panic
> about it. It is a shared consciousness that our institutions have failed
> and our ecosystem is collapsing, yet we are still here — and we are
> creative agents who can shape our destinies. Apocalyptic civics is the
> conviction that the only way out is through, and the only way through is
> together."
>
> Greg Bloom @greggish https://twitter.com/greggish/status/873177525903609857
>
>