[slurm-users] Quick hold on all partitions, all jobs
datakid at gmail.com
Wed Nov 8 17:00:12 MST 2017
The IT team sent an email saying "complete network wide network outage
tomorrow night from 10pm across the whole institute".
Our plan is to put all queued jobs on hold, suspend all running jobs, and
turn off the login node.
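
For concreteness, here is a rough dry-run sketch of that plan. It only builds the command lines and does not run anything. It assumes the job IDs have already been collected (for example with something like `squeue -h -t PENDING -o %i` and `squeue -h -t RUNNING -o %i`), and the function name and the job IDs shown are made up for illustration. I have not tested this against array jobs.

```python
def outage_commands(pending_ids, running_ids):
    """Build (but do not run) scontrol commands for the outage plan.

    pending_ids / running_ids are assumed to be job IDs gathered
    beforehand, e.g. from squeue; this helper is hypothetical.
    """
    cmds = []
    # Queued jobs: hold them so the scheduler will not start them.
    for job_id in pending_ids:
        cmds.append(["scontrol", "hold", str(job_id)])
    # Running jobs: suspend them for the duration of the outage.
    for job_id in running_ids:
        cmds.append(["scontrol", "suspend", str(job_id)])
    return cmds


if __name__ == "__main__":
    # Hypothetical job IDs, for illustration only.
    for cmd in outage_commands([101, 102], [57]):
        print(" ".join(cmd))
```

After the outage the same loop would presumably be run in reverse, with `scontrol release` for the held jobs and `scontrol resume` for the suspended ones.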
I've just discovered that the partitions have a state, and it can be set to
UP, DOWN, DRAIN or INACTIVE.
In this situation - most likely a 4 hour outage with nothing else affected
- would you mark your partitions DOWN or INACTIVE?
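
To make the question concrete, this is how I currently read the partition states in the scontrol man page, written down as a small sketch. The table is my own summary, not an authoritative statement, and the partition name "batch" is a placeholder. The command string follows the `scontrol update PartitionName=... State=...` form.

```python
# (accepts new submissions, starts queued jobs), per my reading of scontrol(1).
PARTITION_STATES = {
    "UP": (True, True),
    "DOWN": (True, False),       # jobs can be queued but will not start
    "DRAIN": (False, True),      # no new submissions; queued jobs can still start
    "INACTIVE": (False, False),  # no new submissions and nothing starts
}


def states_blocking_starts():
    """Return the states in which queued jobs are not started."""
    return sorted(
        name for name, (_, starts) in PARTITION_STATES.items() if not starts
    )


def set_state_command(partition, state):
    """Build the scontrol command that would set a partition's state."""
    if state not in PARTITION_STATES:
        raise ValueError(f"unknown partition state: {state}")
    return f"scontrol update PartitionName={partition} State={state}"


if __name__ == "__main__":
    print(states_blocking_starts())
    # "batch" is a placeholder partition name.
    print(set_state_command("batch", "DOWN"))
```

If that reading is right, the only difference between the two candidates is whether a stray submission during the outage gets queued (DOWN) or rejected (INACTIVE).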
Ostensibly all users should be off the systems (because there is no network),
but there's always one who has set an at or cron job, or who finds some corner
case.