Hi there,

We've updated to 23.11.6 and replaced MUNGE with SACK. Performance and
stability have both been pretty good, but we're occasionally seeing this
in slurmctld.log:

  [2024-05-07T03:50:16.638] error: decode_jwt: token expired at 1715053769
  [2024-05-07T03:50:16.638] error: cred_p_unpack: decode_jwt() failed
  [2024-05-07T03:50:16.638] error: Malformed RPC of type REQUEST_BATCH_JOB_LAUNCH(4005) received
  [2024-05-07T03:50:16.641] error: slurm_receive_msg_and_forward: [[headnode.internal]:58286] failed: Header lengths are longer than data received
  [2024-05-07T03:50:16.648] error: service_connection: slurm_receive_msg: Header lengths are longer than data received

It seems to impact a subset of nodes: jobs get killed and no new ones are
allocated. Full functionality can be restored by simply restarting
slurmctld first and then slurmd (a sketch of the recovery steps is at the
end of this message).

Is the token actually expected to expire? I didn't see this possibility
mentioned in the docs.

The problem occurs on an R&D cloud cluster based on EL9, with a pretty
"flat" setup:

  headnode: configless slurmctld, slurmdbd, mariadb, nfsd
  elastic compute nodes: autofs, slurmd

/etc/slurm/slurm.conf:

  AuthType=auth/slurm
  AuthInfo=use_client_ids
  CredType=cred/slurm

/etc/slurm/slurmdbd.conf:

  AuthType=auth/slurm
  AuthInfo=use_client_ids

Has anyone else encountered the same error?

Thanks,
Fabio

--
Fabio Ranalli | Principal Systems Administrator
Schrödinger, Inc. <https://schrodinger.com>
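
P.S. For completeness, this is roughly what we run to recover. It's only a
sketch: it assumes the stock systemd unit names (slurmctld on the head node,
slurmd on the compute nodes), and the pdsh invocation and node range are
purely illustrative; any remote-execution method works.

  # On the head node: restart the controller first
  systemctl restart slurmctld

  # Then restart slurmd on the affected compute nodes
  # (pdsh and the node range are illustrative)
  pdsh -w compute[01-08] 'systemctl restart slurmd'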