[slurm-users] fast way for a node to determine its own state?

Michael Jennings mej at lanl.gov
Wed Mar 21 09:01:02 MDT 2018


On Wednesday, 21 March 2018, at 12:05:49 (+0100),
Alexis Huxley wrote:

> > >Depending on the load on the scheduler, this can be slow. Is there
> > >a faster way? Perhaps one that doesn't involve communicating with
> > >the scheduler node? Thanks!
> 
> Thanks for the suggestion Ole, but we have something in place that
> we don't want to change at this time. We just need a faster way
> for a node to get its own status.

As you can see from
https://github.com/mej/nhc/blob/master/helpers/node-mark-offline#L55
starting at line #61, NHC uses "sinfo -o '%t %E' -hn $HOSTNAME" to get
the current node's status.  I've confirmed with Moe that this is the
Right Way(tm) to do this with SLURM and that any resulting hangs,
loops, or deadlocks would be considered bugs by SchedMD/Moe and fixed
accordingly. :-)
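
For illustration, here is a minimal sketch of wrapping that sinfo call
in a script.  The function names node_state and split_state are made up
for this example and are not part of NHC or Slurm, and it assumes the
Slurm node name matches $HOSTNAME (or the output of hostname).

```shell
#!/bin/sh
# Sketch only: node_state and split_state are hypothetical helper names,
# not functions from NHC or Slurm.

# Print "<state> <reason>" for one node, using the sinfo call quoted above.
# -h suppresses the header, -n restricts output to the named node,
# %t is the compact node state and %E is the reason string.
node_state() {
    node=${1:-${HOSTNAME:-$(hostname)}}
    # A node in several partitions may yield several lines; keep the first.
    sinfo -o '%t %E' -hn "$node" | head -n 1
}

# Split a "<state> <reason>" line into the variables STATE and REASON.
split_state() {
    STATE=${1%% *}    # everything before the first space
    REASON=${1#* }    # everything after the first space
}
```

A caller would run something like: split_state "$(node_state)" and then
test $STATE.  This still goes through slurmctld, so it is no faster than
running sinfo by hand.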

I have not spoken to him specifically about querying scontrol for
information -- NHC only uses scontrol to alter node state -- but I
would imagine the same would apply there.  Both sinfo and scontrol get
their data via RPC to slurmctld anyway, so neither avoids contacting
the scheduler node.
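
If you did want to query scontrol instead, a sketch along these lines
should work.  This is not something NHC does; node_state_scontrol is a
hypothetical name, and the parsing assumes the usual "State=..." field
in the one-line output of "scontrol -o show node".

```shell
#!/bin/sh
# Sketch only: node_state_scontrol is a hypothetical helper, not part of
# NHC or Slurm.

# Print the State= value for one node from scontrol's one-line output.
node_state_scontrol() {
    node=${1:-${HOSTNAME:-$(hostname)}}
    # -o prints each record on a single line.  Matching " State=" with
    # the leading space avoids picking up fields such as NextState=.
    scontrol -o show node "$node" |
        sed -n 's/.* State=\([^ ]*\).*/\1/p'
}
```

Note that scontrol reports the long form of the state (for example
IDLE+DRAIN), whereas sinfo's %t gives the compact form.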

Michael

-- 
Michael E. Jennings <mej at lanl.gov>
HPC Systems Team, Los Alamos National Laboratory
Bldg. 03-2327, Rm. 2341     W: +1 (505) 606-0605


