[slurm-users] Set a random offset when starting node health check in SLURM

Micheal Krombopulous MichealKrombopulous at outlook.com
Fri Nov 27 03:32:10 UTC 2020


Call the health check through a bash wrapper script that starts with
sleep $(( RANDOM % 10 + 1 )), or similar, so each node waits a random 1 to 10 seconds before running NHC.
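A slightly fuller sketch of such a wrapper in bash follows. The function names `random_delay` and `run_with_jitter` are hypothetical, and the 10-second maximum simply mirrors the one-liner; the delay should stay well below the 600-second interval mentioned in the original question.

```shell
#!/bin/bash
# Sketch of a jitter helper for a HealthCheckProgram wrapper (bash assumed,
# since $RANDOM is not POSIX sh). Function names are hypothetical.

# Print a random whole number of seconds in the range 1..$1.
# $RANDOM is a bash built-in (0..32767); $(( )) is the current form of the
# deprecated $[ ] arithmetic syntax.
random_delay() {
  local max=$1
  echo $(( RANDOM % max + 1 ))
}

# Sleep a random 1..$1 seconds, then run the remaining arguments as a command.
# The command's exit status becomes the function's exit status.
run_with_jitter() {
  local max=$1
  shift
  sleep "$(random_delay "$max")"
  "$@"
}
```

A wrapper file, for example /usr/local/sbin/nhc-jitter (a hypothetical path), would contain these functions followed by the line `run_with_jitter 10 /usr/sbin/nhc`, and HealthCheckProgram would then point at the wrapper instead of at /usr/sbin/nhc directly.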

M.K.
________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of SJTU <weijianwen at sjtu.edu.cn>
Sent: Thursday, November 26, 2020 8:24 PM
To: slurm-users at lists.schedmd.com <slurm-users at lists.schedmd.com>
Subject: [slurm-users] Set a random offset when starting node health check in SLURM

Hi,

   We use HealthCheckProgram = /usr/sbin/nhc in Slurm to check node health every 600 seconds. However, some NHC checks point to the same central resource, so starting them simultaneously on all nodes can trigger false alarms of service degradation.
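For reference, the setup described here corresponds roughly to the slurm.conf fragment below (values taken from the message). The HealthCheckNodeState line is an optional addition: the slurm.conf documentation describes a CYCLE value that spreads health-check runs across compute nodes over the course of HealthCheckInterval rather than starting them all at once; whether it is available should be checked against the man page for the Slurm version in use.

```
# slurm.conf fragment matching the setup described in the message.
HealthCheckProgram=/usr/sbin/nhc
HealthCheckInterval=600
# Optional: ANY runs the check on nodes in any state; CYCLE staggers the
# runs across compute nodes over the HealthCheckInterval.
HealthCheckNodeState=ANY,CYCLE
```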

   Is it possible to set a random offset for when HealthCheckProgram starts?


Thank you!

Jianwen