[slurm-users] fast way for a node to determine its own state?

Wed Mar 21 05:23:10 MDT 2018

Alexis Huxley <alexis.huxley at mpcdf.mpg.de> writes:

>> >Depending on the load on the scheduler, this can be slow. Is there
>> >a faster way? Perhaps one that doesn't involve communicating with
>> >the scheduler node? Thanks!
>
> Thanks for the suggestion Ole, but we have something in place that
> we don't want to change at this time. We just need a faster way
> for a node to get its own status.

How about running sinfo or scontrol show node in a cron job on the
controller node, say once every minute, and saving the output to a file?
Then the nodes can simply grep in the file.  We use this in our crontab:

*/2 * * * * scontrol --oneliner show node > /cluster/var/node-info.new 2>/dev/null && mv -f /cluster/var/node-info.new /cluster/var/node-info 2>/dev/null

So every two minutes, /cluster/var/node-info is updated (if the
scontrol command succeeds), and the nodes simply grep in that file.
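
The lookup on the node side could look roughly like the sketch below.
The function name node_state and the NODE_INFO variable are made up for
illustration, and it assumes the Slurm node name equals the short host
name, which may not hold on every cluster.  It relies on each line of
the --oneliner output starting with NodeName=<name> and containing a
State=<state> field.

```shell
#!/bin/sh
# Hypothetical helper, not part of Slurm: look up a node's state in the
# cached "scontrol --oneliner show node" output instead of asking slurmctld.
# NODE_INFO defaults to the path written by the crontab entry above.
NODE_INFO=${NODE_INFO:-/cluster/var/node-info}

node_state() {
    # $1 = node name; defaults to this host's short name (an assumption:
    # the Slurm NodeName may differ from the host name on some clusters).
    node=${1:-$(hostname -s)}
    awk -v n="$node" '
        # Exact match on the first field, so "c1-1" does not match "c1-10".
        $1 == "NodeName=" n {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^State=/) {
                    sub(/^State=/, "", $i)
                    print $i
                    exit
                }
        }' "$NODE_INFO"
}
```

Calling node_state with no argument prints the state of the local node,
and node_state c1-1 prints the state of another node.  It prints nothing
if the node is not in the file.  Matching the whole NodeName field is a
bit safer than a plain grep, which would also hit node names that share
a prefix.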

Naturally, /cluster/var must be available on all nodes for this to work,
but we usually notice when the cluster file system goes down anyway. :)

-- 
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo