Hi,
Am I correct in thinking that the history of a *node* as shown by sinfo isn't stored anywhere by Slurm?
Interested to know if slurm can tell me historically when a node was draining, drained, etc.
Regards, Steve
Correct. What we do is run Prometheus collectors that pull node state, so we can graph it over time.
https://github.com/fasrc/prometheus-slurm-exporter
-Paul Edmon-
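The exporter linked above is the real tool for this. As a much smaller illustration of the same polling idea, the sketch below runs `sinfo` periodically and appends a timestamped line whenever a node's state changes. It assumes `sinfo -N -h -o "%N %T"` (one line per node and partition, with extended state names such as `draining` or `drained`). The function names, log file path and poll interval are hypothetical choices for the example.

```python
import subprocess
import time
from datetime import datetime, timezone


def parse_sinfo(output: str) -> dict[str, str]:
    """Map node name -> extended state from `sinfo -N -h -o "%N %T"` output."""
    states = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            # A node in several partitions is listed once per partition;
            # the state is the same on each line, so the last one wins.
            states[parts[0]] = parts[1]
    return states


def transitions(old: dict[str, str], new: dict[str, str], stamp: str) -> list[str]:
    """Return one log line per node whose state differs from the previous poll.

    Nodes that disappear from the new output are ignored.
    """
    lines = []
    for node, state in sorted(new.items()):
        before = old.get(node)
        if before != state:
            lines.append(f"{stamp} {node} {before or 'unknown'} -> {state}")
    return lines


def poll(interval_s: int = 60, logfile: str = "node_states.log") -> None:
    """Poll sinfo until interrupted, appending state changes to a (hypothetical) log file."""
    previous: dict[str, str] = {}
    while True:
        out = subprocess.run(
            ["sinfo", "-N", "-h", "-o", "%N %T"],
            capture_output=True, text=True, check=True,
        ).stdout
        current = parse_sinfo(out)
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        with open(logfile, "a") as f:
            for line in transitions(previous, current, stamp):
                f.write(line + "\n")
        previous = current
        time.sleep(interval_s)
```

Calling `poll()` on a host with the Slurm client commands runs until interrupted. Because it samples the live state, it also sees the `draining` phase, provided the interval is shorter than a typical drain.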
On 7/28/25 12:48 PM, Steve Kirk via slurm-users wrote:
Hi
I think the events you're looking for would be tracked in the events tables in the accounting database:
sacctmgr show event where node=<nodename>
-- Michael
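To script the command above, a minimal sketch might request pipe-separated output and parse it into one dict per event. It assumes sacctmgr's `-n` (no header) and `-P` (parsable, `|`-separated) options and the event format fields `NodeName`, `Start`, `End`, `State`, `User` and `Reason`. It also passes an explicit `start=`, on the assumption that sacctmgr otherwise limits events to a recent default window (check `man sacctmgr` for your version). The helper names are hypothetical.

```python
import subprocess

# Columns requested from sacctmgr. Reason is last so that a "|" inside
# a free-text reason cannot shift the other columns.
FIELDS = ["NodeName", "Start", "End", "State", "User", "Reason"]


def event_cmd(node: str, start: str) -> list[str]:
    """Build the sacctmgr command line for one node's events since `start`."""
    return [
        "sacctmgr", "-n", "-P",  # no header, "|"-separated output
        "show", "event", "where", f"node={node}", f"start={start}",
        "format=" + ",".join(FIELDS),
    ]


def parse_events(output: str) -> list[dict[str, str]]:
    """Turn parsable sacctmgr output into one dict per event row."""
    rows = []
    for line in output.splitlines():
        if line.strip():
            values = line.split("|", len(FIELDS) - 1)
            rows.append(dict(zip(FIELDS, values)))
    return rows


def node_events(node: str, start: str) -> list[dict[str, str]]:
    """Run sacctmgr and return the parsed event history for `node`."""
    out = subprocess.run(event_cmd(node, start),
                         capture_output=True, text=True, check=True).stdout
    return parse_events(out)
```

For example, `node_events("node01", "2025-07-01")` would list the recorded events for a hypothetical node `node01` since 1 July 2025.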
On Mon, Jul 28, 2025 at 9:55 AM Steve Kirk via slurm-users <slurm-users@lists.schedmd.com> wrote:
On 7/28/25 9:58 am, Michael Gutteridge via slurm-users wrote:
I think the events you're looking for would be tracked in the events tables in the accounting database:
Be aware that down and drainED nodes are there, but not drainING.
So (unless something has changed in 25.05) until a draining node is empty of jobs it doesn't get recorded in slurmdbd's events table.
All the best, Chris
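Since a node that is still draining has no row in the events table yet, one possible workaround is to ask slurmctld directly for nodes that are currently draining, together with the time and user recorded with the drain reason. This sketch assumes sinfo's `-t draining` state filter and the `%N` (node), `%H` (timestamp of the reason), `%u` (user who set the reason) and `%E` (reason) format fields, with a literal `|` as separator. The function names are hypothetical.

```python
import subprocess


def draining_cmd() -> list[str]:
    # -N: one line per node, -h: no header, -t draining: only draining nodes.
    # The reason (%E) goes last so any "|" inside it stays in that field.
    return ["sinfo", "-N", "-h", "-t", "draining", "-o", "%N|%H|%u|%E"]


def parse_draining(output: str) -> dict[str, dict[str, str]]:
    """Map node -> {since, user, reason}.

    Duplicate lines (one per partition) collapse into one entry, and
    lines without the expected four fields are skipped.
    """
    nodes = {}
    for line in output.splitlines():
        parts = [f.strip() for f in line.split("|", 3)]
        if len(parts) != 4:
            continue
        node, since, user, reason = parts
        nodes[node] = {"since": since, "user": user, "reason": reason}
    return nodes


def draining_nodes() -> dict[str, dict[str, str]]:
    """Query sinfo and return the currently draining nodes."""
    out = subprocess.run(draining_cmd(),
                         capture_output=True, text=True, check=True).stdout
    return parse_draining(out)
```

This only reflects the current moment, so it complements the events table rather than replacing it. Keeping a history of the draining phase still requires polling on a schedule.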
On 7/29/25 02:17, Christopher Samuel via slurm-users wrote:
On 7/28/25 9:58 am, Michael Gutteridge via slurm-users wrote:
I think the events you're looking for would be tracked in the events tables in the accounting database:
Thanks, "sacctmgr show event where node=<nodename>" is extremely useful for monitoring nodes, and I wasn't aware of this command. I've added some further examples to my Wiki page now at https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_operations/#listing-node-events
Be aware that down and drainED nodes are there, but not drainING.
So (unless something has changed in 25.05) until a draining node is empty of jobs it doesn't get recorded in slurmdbd's events table.
So the sacctmgr manual page is not quite correct when it states "event: Events like downed or draining nodes on clusters." I've opened a ticket https://support.schedmd.com/show_bug.cgi?id=23337 suggesting a documentation update.
Best regards, Ole
On 7/29/25 08:58, Ole Holm Nielsen wrote:
If you're interested in general node status, I've added the "sacctmgr show event" command to my shownode script: https://github.com/OleHolmNielsen/Slurm_tools/blob/master/nodes/shownode
/Ole