[slurm-users] Slurmctld caching extended gid?
lhuang at NYGENOME.ORG
Tue Mar 3 17:47:48 UTC 2020
We recently encountered an odd issue where some users were getting sporadic "permission denied" errors on certain directories for their stderr/stdout. We realized that this was caused by a change to their nested group permissions in AD several days earlier.
At first we thought the problem was on the compute nodes themselves, so we cleared the sssd cache, restarted slurmd, and even rebooted the node. This did not resolve the issue. The user was able to ssh directly onto the nodes and access the directories; the issue only manifested itself when jobs went through Slurm.
We later read in the slurm.conf documentation:
By default the slurmctld will lookup and send the user_name and extended gids for a job, rather than individual on each node as part of each task launch. Which avoids issues around name service scalability when launching jobs involving many nodes. Using this option will reverse this functionality.
We checked sssd and getent on the slurmctld host for the affected users, and they were resolving correctly. The fix was to clear the sssd cache and restart slurmctld.
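For anyone wanting to confirm this kind of mismatch from inside a job, here is a minimal sketch, assuming Python 3 is available on the compute nodes. It compares the supplementary gids attached to the running process (inside a job, whatever Slurm set at launch) against what the node's name service resolves right now. The function names are hypothetical helpers for illustration, not part of Slurm.

```python
import os
import pwd


def stale_group_report(process_gids, nss_gids):
    """Compare the gids a process holds with the gids NSS resolves now."""
    process_gids, nss_gids = set(process_gids), set(nss_gids)
    return {
        # Resolved by the name service, but not held by the process.
        "missing_from_process": sorted(nss_gids - process_gids),
        # Held by the process, but no longer resolved by the name service.
        "extra_in_process": sorted(process_gids - nss_gids),
    }


def current_report():
    """Build the report for the user running this script."""
    entry = pwd.getpwuid(os.getuid())
    # Groups attached to this process, plus its primary gid.
    process_gids = set(os.getgroups()) | {os.getgid()}
    # Groups the local name service (e.g. sssd) resolves at this moment.
    nss_gids = os.getgrouplist(entry.pw_name, entry.pw_gid)
    return stale_group_report(process_gids, nss_gids)


if __name__ == "__main__":
    try:
        print(current_report())
    except KeyError:
        print("no passwd entry for uid", os.getuid())
```

Running this once in an ssh session on a node and once through srun or sbatch as the same user should give two empty lists in both cases if everything agrees. A non-empty "missing_from_process" list only under Slurm would suggest the job was launched with an older set of gids than the node itself resolves.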
I’m wondering whether slurmctld does some kind of caching of the extended gids, and whether there is a better way of handling this.
Luis Huang | Systems Administrator II, Research Computing
New York Genome Center
101 Avenue of the Americas
New York, NY 10013
O: (646) 977-7291
lhuang at nygenome.org