[slurm-users] fast way for a node to determine its own state?

Chris Samuel chris at csamuel.org
Sat Mar 24 18:15:10 MDT 2018

On Thursday, 22 March 2018 2:01:02 AM AEDT Michael Jennings wrote:

> As you can see from
> https://github.com/mej/nhc/blob/master/helpers/node-mark-offline#L55
> starting at line #61, NHC uses "sinfo -o '%t %E' -hn $HOSTNAME" to get
> the current node's status.

At ${JOB-1} our health check scripts were decoupled from Slurm and run from
cron.  They wrote their status into a file in /dev/shm on successful completion
so Slurm could just poll that - the idea being to reduce the chance that
the check would hang due to a system issue and stop slurmd responding.
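
A minimal sketch of that kind of arrangement might look like the following.
It is not the original scripts: the file name, the 600 second freshness limit
and the function names write_status and read_status are all made up for
illustration, and it assumes a date command that supports +%s.

```shell
#!/bin/sh
# Hypothetical location; the post only says "a file in /dev/shm".
STATUS_FILE="${STATUS_FILE:-/dev/shm/node_health_status}"
# Hypothetical limit in seconds; an older status is treated as stale.
MAX_AGE="${MAX_AGE:-600}"

# Cron side: call this only after every health test has passed.
write_status() {
    tmp="${STATUS_FILE}.$$"
    # Write to a temp file and rename it, so a reader never sees a partial file.
    printf '%s %s\n' "$(date +%s)" "$1" > "$tmp" && mv "$tmp" "$STATUS_FILE"
}

# Slurm side: only reads a small file, so it cannot block on a slow test.
# Returns 0 if a fresh status exists, non-zero if it is missing, bad or stale.
read_status() {
    [ -r "$STATUS_FILE" ] || return 1
    read -r stamp msg < "$STATUS_FILE" || return 1
    # Reject anything that is not a plain integer timestamp.
    case "$stamp" in
        ''|*[!0-9]*) return 1 ;;
    esac
    now=$(date +%s)
    [ $((now - stamp)) -le "$MAX_AGE" ]
}
```

The cron job would call write_status at the end of a successful run, and
whatever script Slurm runs to poll the node would exit non-zero when
read_status fails.  A hung or dead checker then shows up as a stale file
rather than as a stuck slurmd.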

All the best,
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
