[slurm-users] nodes lingering in completion
brent.henderson at hpe.com
Mon Mar 27 15:17:54 UTC 2023
Sorry, William, for taking so long to reply (almost exactly a year!). Your note was sent to my spam folder, and I lost access to that cluster, so it became less of a concern.
I recently got access to another system and had the same issue even with a local epilog with just /bin/true in it. This time I found a big clue in the slurmd.log on one of the nodes:
[2023-03-24T18:43:11.525] debug: Finished wait for job 134161's prolog to complete
[2023-03-24T18:43:56.573] Warning: Note very large processing time from slurm_getpwuid_r: usec=45048016 began=18:43:11.525
[2023-03-24T18:43:56.573] debug: [job 134161] attempting to run epilog [/tmp/epilog.sh]
[2023-03-24T18:43:56.581] Warning: Note very large processing time from prep_g_epilog: usec=45055597 began=18:43:11.525
[2023-03-24T18:43:56.581] epilog for job 134161 ran for 45 seconds
Note that almost the entire time is in that slurm_getpwuid_r call. Both the last cluster and this one use a single NIS server to serve the user accounts. Anyway, the resolution for my system is to make the account info local to each system. For ‘real’ systems, they will probably want to spread the load across multiple NIS servers, but I’m fine on my system with local account information.
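For anyone wanting to check whether account lookups are slow on their own nodes, here is a minimal sketch in Python. It assumes that Python's pwd.getpwuid goes through the same system name-service lookup (files, NIS, etc.) that slurmd's slurm_getpwuid_r wrapper ends up using; the helper name time_passwd_lookup is made up for illustration and is not part of Slurm.

```python
import os
import pwd
import time


def time_passwd_lookup(uid, repeats=3):
    """Return wall-clock durations (seconds) for repeated pwd.getpwuid(uid) calls.

    Hypothetical helper for illustration only; not part of Slurm.
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        try:
            pwd.getpwuid(uid)
        except KeyError:
            # Unknown uid: the lookup was still attempted, so the timing is still useful.
            pass
        durations.append(time.perf_counter() - start)
    return durations


if __name__ == "__main__":
    uid = os.getuid()
    for i, seconds in enumerate(time_passwd_lookup(uid), start=1):
        print(f"lookup {i} for uid {uid}: {seconds:.3f} s")
```

Running this on many nodes at once (for example, as the job itself) is probably closer to what happens at job completion than a single run on one node. Note that a local cache such as nscd, if present, could hide the latency on repeated lookups.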
Can anyone shed some light on why slurm is parsing the passwd file for the invoking user if the system epilog is going to be run as root anyway? Maybe that is in there if the user has their own epilog?
PS: Kudos to whoever put in the wrapper to check the duration of the slurm_getpwuid_r call!
From: slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] On Behalf Of William Brown
Sent: Friday, April 1, 2022 12:33 PM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] nodes lingering in completion
To process the epilog, a Bash process must be created, so perhaps look at .bashrc.
Try timing a run of the epilog yourself on a compute node. I presume it is owned by an account local to the compute nodes, not a directory service account?
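One way to time the epilog by hand is sketched below in Python. The function name time_command is hypothetical, and a manual run will not reproduce the environment slurmd sets up for the epilog, so this only measures the cost of the script itself.

```python
import subprocess
import time


def time_command(argv, timeout=300):
    """Run argv and return (exit_code, elapsed_seconds).

    Hypothetical helper for illustration; output from the command is discarded.
    """
    start = time.perf_counter()
    completed = subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
        check=False,  # report a non-zero exit code instead of raising
    )
    return completed.returncode, time.perf_counter() - start
```

On a compute node this could be called as time_command(["/tmp/epilog.sh"]), using the epilog path from the thread. If the script finishes almost instantly while slurmd still reports a long epilog, the delay is probably outside the script.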
On Fri, 1 Apr 2022, 17:25 Henderson, Brent <brent.henderson at hpe.com> wrote:
Hi slurm experts -
I’ve gotten temporary access to a cluster with 1k nodes - so of course I set up slurm on it (v20.11.8). ☺ Small jobs are fine and go back to idle rather quickly. Jobs that use all the nodes will have some nodes ‘linger’ in the completing state for over a minute, while others take less time - but the delay is still noticeable.
Reading some older posts, I see that the epilog is a typical cause for this so I removed it from the config file and indeed, nodes very quickly go back to the idle state after the job completes. I then created an epilog on each node in /tmp that just contained the bash header and exit 0 and changed my run to be just: ‘salloc -N 1024 sleep 10’. Even with this very simple command and epilog, the nodes exhibit the ‘lingering’ behavior before returning to idle.
Looking in the slurmd log for one of the nodes that took >60s to go back to idle, I see this:
[2022-03-31T20:57:44.158] Warning: Note very large processing time from prep_epilog: usec=75087286 began=20:56:29.070
[2022-03-31T20:57:44.158] epilog for job 43226 ran for 75 seconds
I tried upping the debug level on the slurmd side but didn’t see anything useful.
So, I guess I have a couple questions:
- anyone seen this behavior before and know a fix? :)
- might this issue be resolved in 21.08? (Didn’t see anything in the release notes that talked about the epilog.)
- thoughts on how to collect some additional information on what might be happening on the system to slow down the epilog?
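On collecting more information: the "very large processing time" warnings in slurmd.log already name the slow call, so one option is to pull them out of the logs across nodes. Below is a rough Python sketch. The regular expression is based only on the warning lines quoted in this thread, so the format is an assumption for other Slurm versions, and find_slow_calls is a made-up name.

```python
import re

# Pattern inferred from the slurmd.log warning lines quoted in this thread.
SLOW_CALL = re.compile(
    r"^\[(?P<logged>[^\]]+)\] Warning: Note very large processing time "
    r"from (?P<func>[^:\s]+): usec=(?P<usec>\d+) began=(?P<began>\S+)"
)


def find_slow_calls(lines):
    """Return (function_name, seconds, began) for each slow-call warning found."""
    results = []
    for line in lines:
        match = SLOW_CALL.match(line.strip())
        if match:
            seconds = int(match["usec"]) / 1_000_000  # usec -> seconds
            results.append((match["func"], seconds, match["began"]))
    return results


if __name__ == "__main__":
    sample = [
        "[2023-03-24T18:43:56.573] Warning: Note very large processing time "
        "from slurm_getpwuid_r: usec=45048016 began=18:43:11.525",
        "[2023-03-24T18:43:56.581] epilog for job 134161 ran for 45 seconds",
    ]
    for func, seconds, began in find_slow_calls(sample):
        print(f"{func}: {seconds:.1f} s (began {began})")
```

Feeding it the lines of each node's slurmd.log shows which call the time is attributed to, which is how the slurm_getpwuid_r delay showed up in the log excerpt above.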