[slurm-users] sacct thinks slurmctld is not up
Brian Andrus
toomuchit at gmail.com
Thu Jul 18 15:00:50 UTC 2019
All,
I have slurmdbd running and everything is (mostly) happy. It has been working
well for months, but fairly regularly, when I run 'sacctmgr show
runawayjobs', I get:
*sacctmgr: error: Slurmctld running on cluster orion is not up, can't check
running jobs*
If I run 'sacctmgr show cluster', it lists the cluster, but the ControlHost
field has no IP.
slurmctld is most definitely running (on the same system, even), but the
only fix I have found is to restart slurmctld. After the restart, the
ControlHost field shows an IP and I can check for runaway jobs again.
Is this a known issue? Is there a better fix than restarting slurmctld?
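To spot the bad state before the runaway-jobs check fails, something like the following could be run periodically. It is only a rough sketch: it assumes 'sacctmgr -n -P show cluster format=Cluster,ControlHost,ControlPort' prints one pipe-delimited line per cluster and that the ControlHost column is empty when the problem occurs. The function names are made up for illustration.

```python
import subprocess


def clusters_missing_control_host(parsable_output):
    """Return the names of clusters whose ControlHost field is empty.

    Assumes each non-blank line looks like 'Cluster|ControlHost|ControlPort'.
    """
    missing = []
    for line in parsable_output.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("|")
        name = fields[0]
        host = fields[1].strip() if len(fields) > 1 else ""
        if not host:
            missing.append(name)
    return missing


def query_sacctmgr():
    """Run sacctmgr and return its raw output (hypothetical helper).

    -n suppresses the header line, -P selects pipe-delimited output.
    """
    result = subprocess.run(
        ["sacctmgr", "-n", "-P", "show", "cluster",
         "format=Cluster,ControlHost,ControlPort"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling clusters_missing_control_host(query_sacctmgr()) on a host where sacctmgr works should return the clusters that currently show no ControlHost. It only detects the condition; it does not explain or fix it.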
Brian Andrus