[slurm-users] fast way for a node to determine its own state?
Chris Samuel
chris at csamuel.org
Sat Mar 24 18:15:10 MDT 2018
On Thursday, 22 March 2018 2:01:02 AM AEDT Michael Jennings wrote:
> As you can see from
> https://github.com/mej/nhc/blob/master/helpers/node-mark-offline#L55
> starting at line #61, NHC uses "sinfo -o '%t %E' -hn $HOSTNAME" to get
> the current node's status.
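The quoted sinfo call can be wrapped in a small bash sketch that splits its "<state> <reason>" output. Like the NHC command, it assumes the Slurm node name matches $HOSTNAME. It also assumes that only the first output line matters. The helper names get_node_state and split_state_line are hypothetical and are not part of NHC or Slurm.

```shell
#!/bin/bash
# Sketch only: split_state_line and get_node_state are hypothetical helpers,
# not part of NHC or Slurm.

# split_state_line LINE
# LINE is one line of `sinfo -o '%t %E'` output: "<compact state> <reason>".
# Prints the state and the reason separated by a tab.
split_state_line() {
    local line=$1
    local state=${line%% *}      # first field: compact state (%t)
    local reason=""
    if [[ $line == *" "* ]]; then
        reason=${line#* }        # remainder: reason text (%E)
    fi
    # Drop trailing flag characters such as "*" (node not responding),
    # so "idle*" becomes "idle".
    state=${state%%[![:alpha:]]*}
    printf '%s\t%s\n' "$state" "$reason"
}

# get_node_state [NODE]
# Assumes the Slurm node name equals $HOSTNAME unless NODE is given,
# and keeps only the first line of sinfo output (an assumption).
get_node_state() {
    local node=${1:-$HOSTNAME}
    local line
    line=$(sinfo -o '%t %E' -hn "$node" | head -n 1)
    [[ -n $line ]] || return 1
    split_state_line "$line"
}
```

For example, `split_state_line 'idle* none'` prints `idle`, a tab, then `none`.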
At ${JOB-1} our health check scripts were decoupled from Slurm and run from
cron. They wrote their status into a file in /dev/shm on successful completion
so Slurm could just poll that, the idea being to reduce the chance that
the check would hang due to a system issue and stop slurmd responding.
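The decoupled arrangement described above can be sketched in bash. A cron job runs the checks and, only on success, atomically replaces a status file. A separate reader, cheap enough to call from Slurm, treats a missing or stale file as unhealthy. The file name /dev/shm/node-health.status, the "OK <epoch>" format, the 600-second staleness limit and the function names are assumptions, not details of the original setup.

```shell
#!/bin/bash
# Sketch only: the path, file format and age limit below are assumptions.
STATUS_FILE=${STATUS_FILE:-/dev/shm/node-health.status}
MAX_AGE=${MAX_AGE:-600}   # seconds before a status is considered stale

# Writer, run from cron: write_status CMD [ARGS...]
# Runs the health check command and replaces the status file only if it
# succeeds. Writing to a temp file in the same directory and then renaming
# means a reader never sees a half-written file.
write_status() {
    local tmp
    tmp=$(mktemp "${STATUS_FILE}.XXXXXX") || return 1
    if "$@"; then
        printf 'OK %s\n' "$(date +%s)" > "$tmp"
        chmod 0644 "$tmp"
        mv -f "$tmp" "$STATUS_FILE"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Reader, cheap enough to call from Slurm: status_is_fresh [NOW_EPOCH]
# Succeeds only if the file exists, says OK, and is at most MAX_AGE old.
# A hung or failing check stops refreshing the file, so it goes stale.
status_is_fresh() {
    local now=${1:-$(date +%s)}
    local tag stamp
    [[ -r $STATUS_FILE ]] || return 1
    read -r tag stamp < "$STATUS_FILE" || return 1
    [[ $tag == OK && $stamp =~ ^[0-9]+$ ]] || return 1
    (( now - stamp <= MAX_AGE ))
}
```

One plausible wiring, assumed rather than taken from the post, is for cron to call `write_status /path/to/checks` every few minutes. The script named by HealthCheckProgram in slurm.conf would then call `status_is_fresh` and drain the node (e.g. with `scontrol update`) if it fails. Because the reader never runs the checks itself, a hung check only lets the file go stale instead of blocking slurmd.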
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC