4. Since upgrading to the new version we also see messages like:
[2024-01-17T09:58:48.589] error: Failed to kill program loading user environment
[2024-01-17T09:58:48.590] error: Failed to load current user environment variables
[2024-01-17T09:58:48.590] error: _get_user_env: Unable to get user's local environment, running only with passed environment
The effect is that users' jobs run with the wrong environment and can’t load the modules for the software they need, which leads to many job failures.
The issue appears to be somewhat similar to the one described at:
https://bugs.schedmd.com/show_bug.cgi?id=18561
In that case the site downgraded the slurmd clients to 22.05, which got rid of the problems.
We’ve now downgraded slurmd on the compute nodes to 23.02.7, which also seems to work around the issue.
Does anyone know of a better solution?
Kind regards,
Fokke Dijkstra
-- Center for Information Technology, University of Groningen
Postbus 11044, 9700 CA Groningen, The Netherlands