[slurm-users] nodes lingering in completion

Mon Mar 27 15:17:54 UTC 2023

Sorry, William, for taking so long to reply (almost exactly a year!).  Your note was sent to my spam folder, and I lost access to that cluster, so it became less of a concern.

I recently got access to another system and saw the same issue, even with a local epilog containing just /bin/true.  This time I found a big clue in slurmd.log on one of the nodes:

[2023-03-24T18:43:11.525] debug:  Finished wait for job 134161's prolog to complete
[2023-03-24T18:43:56.573] Warning: Note very large processing time from slurm_getpwuid_r: usec=45048016 began=18:43:11.525
[2023-03-24T18:43:56.573] debug:  [job 134161] attempting to run epilog [/tmp/epilog.sh]
[2023-03-24T18:43:56.581] Warning: Note very large processing time from prep_g_epilog: usec=45055597 began=18:43:11.525
[2023-03-24T18:43:56.581] epilog for job 134161 ran for 45 seconds

Note that almost the entire 45 seconds is spent in that slurm_getpwuid_r call (usec=45048016 of the 45055597 reported for prep_g_epilog).  Both the last cluster and this one use a single NIS server to serve the user accounts.  The resolution for my system is to make the account info local to each node.  Admins of ‘real’ systems will probably want to spread the load across multiple NIS servers, but I’m fine with local account information on mine.
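For anyone who wants to check whether the name-service lookup is slow on their own nodes, here is a minimal Python sketch.  It is not Slurm's wrapper; it only times a passwd lookup through the same system name-service configuration, on the assumption that the delay comes from the lookup itself.  The function name time_passwd_lookup is made up for illustration.  A caching daemon (nscd, sssd) may answer repeat lookups quickly, so the first attempt is the interesting one.

```python
import os
import pwd
import time


def time_passwd_lookup(uid: int) -> float:
    """Return the wall-clock seconds taken by one passwd lookup for uid."""
    start = time.perf_counter()
    try:
        pwd.getpwuid(uid)
    except KeyError:
        # Unknown uid: the lookup still went through the configured sources.
        pass
    return time.perf_counter() - start


if __name__ == "__main__":
    uid = os.getuid()
    # Repeat a few times; a large gap between the first and later attempts
    # suggests that a cache is hiding a slow directory server.
    for attempt in range(3):
        elapsed = time_passwd_lookup(uid)
        print(f"lookup {attempt + 1} for uid {uid}: {elapsed:.6f} s")
```

Running this on many nodes at the same moment (for example from inside a full-machine job) is closer to the situation at job completion than a single run on one node.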

Can anyone shed some light on why Slurm looks up the passwd entry for the invoking user if the system epilog is going to be run as root anyway?  Maybe the lookup is there in case the user has their own epilog?

Thanks,

Brent

PS: Kudos to whoever put in the wrapper that checks the duration of the slurm_getpwuid_r call!

From: slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] On Behalf Of William Brown
Sent: Friday, April 1, 2022 12:33 PM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] nodes lingering in completion

To process the epilog, a Bash process must be created, so perhaps look at .bashrc.

Try timing a run of the epilog yourself on a compute node.  I presume it is owned by an account local to the compute nodes, not a directory service account?
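One way to do that timing is sketched below in Python.  The helper name time_command is made up for illustration, and a script run by hand does not get the environment slurmd would give it, so treat the number only as a lower bound on what the script itself costs.

```python
import subprocess
import time
from typing import Sequence, Tuple


def time_command(argv: Sequence[str]) -> Tuple[int, float]:
    """Run argv and return (exit status, wall-clock seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(argv, check=False)
    return completed.returncode, time.perf_counter() - start
```

For example, time_command(["/tmp/epilog.sh"]) returns the exit status and elapsed seconds for the epilog path used in this thread.  If that comes back in milliseconds while slurmd reports tens of seconds, the delay is outside the script.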

William

On Fri, 1 Apr 2022, 17:25 Henderson, Brent <brent.henderson at hpe.com> wrote:
Hi slurm experts -

I’ve gotten temporary access to a cluster with 1k nodes - so of course I set up Slurm on it (v20.11.8).  ☺  Small jobs are fine and the nodes go back to idle rather quickly.  Jobs that use all the nodes will have some nodes ‘linger’ in the completing state for over a minute, while others may take less time - but still a noticeable amount.

Reading some older posts, I see that the epilog is a typical cause for this so I removed it from the config file and indeed, nodes very quickly go back to the idle state after the job completes.  I then created an epilog on each node in /tmp that just contained the bash header and exit 0 and changed my run to be just: ‘salloc -N 1024  sleep 10’.  Even with this very simple command and epilog, the nodes exhibit the ‘lingering’ behavior before returning to idle.

Looking in the slurmd log for one of the nodes that took >60s to go back to idle, I see this:

[2022-03-31T20:57:44.158] Warning: Note very large processing time from prep_epilog: usec=75087286 began=20:56:29.070
[2022-03-31T20:57:44.158] epilog for job 43226 ran for 75 seconds

I tried upping the debug level on the slurmd side but didn’t see anything useful.

So, I guess I have a couple of questions:
- has anyone seen this behavior before and know of a fix?  :)
- might this issue be resolved in 21.08?  (I didn’t see anything in the release notes that talked about the epilog.)
- thoughts on how to collect some additional information on what might be happening on the system to slow down the epilog?
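
On collecting more information: the "very large processing time" warnings that slurmd already writes name the slow call, so pulling them out of the log on a few nodes is a cheap first step.  Below is a minimal Python sketch that assumes the line format shown in the log excerpts in this thread; the names SLOW_CALL and slow_calls are made up for illustration, and other Slurm versions may format the warning differently.

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumed format, taken from the excerpts in this thread:
# [<timestamp>] Warning: Note very large processing time from <name>: usec=<n> began=<time>
SLOW_CALL = re.compile(
    r"^\[(?P<stamp>[^\]]+)\] Warning: Note very large processing time "
    r"from (?P<name>\S+): usec=(?P<usec>\d+)"
)


def slow_calls(lines: Iterable[str]) -> Iterator[Tuple[str, str, float]]:
    """Yield (timestamp, call name, seconds) for each slow-call warning."""
    for line in lines:
        match = SLOW_CALL.match(line)
        if match:
            yield match["stamp"], match["name"], int(match["usec"]) / 1_000_000


if __name__ == "__main__":
    # Sample lines copied from this thread; in practice, pass an open log file.
    sample = [
        "[2022-03-31T20:57:44.158] Warning: Note very large processing time "
        "from prep_epilog: usec=75087286 began=20:56:29.070",
        "[2022-03-31T20:57:44.158] epilog for job 43226 ran for 75 seconds",
    ]
    for stamp, name, seconds in slow_calls(sample):
        print(f"{stamp}  {name}  {seconds:.1f} s")
```

Since a file object is an iterable of lines, slow_calls(open(path)) works directly on a slurmd log; the path depends on the site's SlurmdLogFile setting.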

Thanks,

Brent
