[slurm-users] fast way for a node to determine its own state?
b.h.mevik at usit.uio.no
Wed Mar 21 05:23:10 MDT 2018
Alexis Huxley <alexis.huxley at mpcdf.mpg.de> writes:
>> >Depending on the load on the scheduler, this can be slow. Is there
>> >a faster way? Perhaps one that doesn't involve communicating with
>> >the scheduler node? Thanks!
> Thanks for the suggestion Ole, but we have something in place that
> we don't want to change at this time. We just need a faster way
> for a node to get its own status.
How about running sinfo or scontrol show node in a cron job on the
controller node, say once every minute, and saving the output to a file?
Then the nodes can simply grep in the file. We use this in our crontab:
*/2 * * * * scontrol --oneliner show node > /cluster/var/node-info.new 2>/dev/null && mv -f /cluster/var/node-info.new /cluster/var/node-info 2>/dev/null
So, every two minutes, /cluster/var/node-info is updated (if the
scontrol command succeeds), and the nodes simply grep in that file.
Naturally, /cluster/var must be available on all nodes for this to work,
but we usually notice when the cluster file system goes down anyway. :)
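
For illustration, the node-side lookup could be sketched roughly like
this. The function name node_state is hypothetical, and the sketch
assumes that each line of the cached file starts with NodeName=<name>
and contains a State=<state> field, as scontrol --oneliner show node
normally prints:

```shell
# Hypothetical helper: print the State= value for one node from the cached file.
# $1 = cached "scontrol --oneliner show node" output
# $2 = node name (optional; defaults to the short hostname)
node_state() {
    file=$1
    node=${2:-$(hostname -s)}
    awk -v node="$node" '
        # Match only the line whose first field names this node exactly.
        $1 == ("NodeName=" node) {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^State=/) {
                    print substr($i, 7)   # strip the leading "State="
                    exit
                }
            }
        }' "$file"
}
```

On a node, node_state /cluster/var/node-info would then print that
node's state as of the last successful cron run, so the answer can be
up to two minutes old. It prints nothing if the node is not listed in
the file.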
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo