Managed to narrow it down a little. Our group file is quite large, and a handful of individual groups are also large, as shown below:
```
[root@batch1 ~]# wc /etc/group
6075 6075 349457 /etc/group
[root@batch1 ~]# grep 8xxx2 /etc/group | wc -c
56959
```
It looks like one of the recent changes (https://github.com/SchedMD/slurm/commit/e1b4cdba70f7f1b5ac5335c572d9c4c79e6e...) moved the old uid check into the dedicated `gid_from_uid` function. However, that migration dropped this part of the old loop:
```c
if (errno == ERANGE) {
	buflen *= 2;
	xrealloc(buf, buflen);
	continue;
}
```
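Without that retry, I think we're hitting a fixed buffer limit: a group entry as large as ours (~57 KB) no longer fits, so the lookup fails with ERANGE instead of growing the buffer. Trimming our groups enough restores normal operation, but unfortunately that isn't a tenable solution.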
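For reference, here is a minimal sketch of the grow-and-retry pattern that the dropped code implemented, written against the POSIX `getgrgid_r()` call. This is not Slurm code: `group_lookup_retry()` and the 64 MiB `GROUP_BUF_MAX` safety cap are hypothetical names and values chosen for illustration. POSIX specifies that `getgrgid_r()` reports ERANGE through its return value, so the sketch checks that rather than `errno`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <grp.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical safety cap (assumption): stop doubling past 64 MiB. */
#define GROUP_BUF_MAX ((size_t)1 << 26)

/*
 * Hypothetical helper (not Slurm's API): look up a group by gid, doubling
 * the scratch buffer every time getgrgid_r() says the entry does not fit.
 * Returns 0 on success and fills name_out / nmembers_out; otherwise returns
 * an errno value (ENOENT if no such gid, ERANGE if the cap was reached).
 */
static int group_lookup_retry(gid_t gid, size_t initial_buflen,
			      char *name_out, size_t name_len,
			      size_t *nmembers_out)
{
	struct group grp;
	struct group *result = NULL;
	size_t buflen = initial_buflen ? initial_buflen : 1024;
	char *buf;
	int rc;

	assert(name_out != NULL && name_len > 0 && nmembers_out != NULL);

	buf = malloc(buflen);
	if (!buf)
		return ENOMEM;

	for (;;) {
		rc = getgrgid_r(gid, &grp, buf, buflen, &result);
		if (rc != ERANGE)
			break;
		/* Entry too large (e.g. a group with thousands of members):
		 * double the buffer and retry, as the old loop did. */
		if (buflen >= GROUP_BUF_MAX)
			break;	/* give up and report ERANGE */
		char *tmp = realloc(buf, buflen * 2);
		if (!tmp) {
			rc = ENOMEM;
			break;
		}
		buf = tmp;
		buflen *= 2;
	}

	/* rc == 0 with a NULL result means the gid simply does not exist. */
	if (rc == 0 && result == NULL)
		rc = ENOENT;

	if (rc == 0) {
		size_t n = 0;

		while (grp.gr_mem[n])
			n++;
		*nmembers_out = n;
		strncpy(name_out, grp.gr_name, name_len - 1);
		name_out[name_len - 1] = '\0';
	}

	free(buf);	/* grp points into buf, so copy out before freeing */
	return rc;
}
```

Passing `initial_buflen = 1` forces several ERANGE/double rounds, which is a cheap way to exercise the retry path on a system that has no huge groups.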