[slurm-users] Set a random offset when starting node health check in SLURM
Micheal Krombopulous
MichealKrombopulous at outlook.com
Fri Nov 27 03:32:10 UTC 2020
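Call the health check through a wrapper shell script that starts with:
    sleep $(( (RANDOM % 10) + 1 ))
or similar. ($[ ... ] also works in bash, but it is the older, deprecated form of arithmetic expansion.)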
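A minimal sketch of such a wrapper, assuming bash is available on the compute nodes and that NHC lives at /usr/sbin/nhc as in the original question. The names MAX_DELAY, NHC_BIN and random_delay are illustrative and not part of Slurm or NHC; the wrapper's own install path is up to you, and HealthCheckProgram in slurm.conf would then point at the wrapper instead of at nhc directly.

```shell
#!/bin/bash
# Hypothetical wrapper around NHC that adds a random start-up delay.
# MAX_DELAY and NHC_BIN are illustrative names, overridable from the environment.
MAX_DELAY="${MAX_DELAY:-10}"          # upper bound of the delay, in seconds
NHC_BIN="${NHC_BIN:-/usr/sbin/nhc}"   # path taken from the original question

# Print a whole number of seconds between 1 and $1 (inclusive).
# RANDOM is a bash built-in variable, so this needs bash rather than plain sh.
random_delay() {
    echo $(( (RANDOM % $1) + 1 ))
}

# Keep the delay short: slurmd kills a health check program that runs too long.
sleep "$(random_delay "$MAX_DELAY")"

# Run the real health check; its exit status becomes the wrapper's exit status.
"$NHC_BIN" "$@"
```

With the defaults, each node waits between 1 and 10 seconds before running NHC, so the checks no longer hit the central resource at the same instant. A larger MAX_DELAY spreads the load further, at the cost of a longer-running health check on every node.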
M.K.
________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of SJTU <weijianwen at sjtu.edu.cn>
Sent: Thursday, November 26, 2020 8:24 PM
To: slurm-users at lists.schedmd.com <slurm-users at lists.schedmd.com>
Subject: [slurm-users] Set a random offset when starting node health check in SLURM
Hi,
We use HealthCheckProgram = /usr/sbin/nhc in Slurm to check node health every 600 seconds. However, some NHC checks point to the same central resource, so starting these checks simultaneously may lead to false alarms of service degradation.
Is it possible to add a random offset to the time at which HealthCheckProgram starts?
Thank you!
Jianwen