[slurm-users] Slurm unlink error messages -- what do they mean?
D.J.Baker at soton.ac.uk
Thu Apr 23 08:29:00 UTC 2020
We have, rather belatedly, just upgraded to Slurm v19.05.5. On the whole, so far so good -- no major problems. One user has complained that his job now crashes and reports an unlink error, as follows:
slurmstepd: error: get_exit_code task 0 died by signal: 9
slurmstepd: error: unlink(/tmp/slurmd/job392987/slurm_script): No such file or directory
I suspect that this message has something to do with the completion of one of the steps in his job. Apparently his job is quite complex with a number of inter-related tasks.
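For what it's worth, the text after the colon in the unlink line is the standard ENOENT error string: the file had already been removed by the time something tried to delete it. The sketch below is not Slurm code; it uses a hypothetical scratch file as a stand-in for the job script and only shows that removing the same file twice produces that error.

```python
import errno
import os
import tempfile


def unlink_twice():
    """Remove a scratch file twice and return the error from the second attempt."""
    # Hypothetical stand-in for a per-job file such as slurm_script.
    fd, path = tempfile.mkstemp(prefix="slurm_script_demo_")
    os.close(fd)
    os.unlink(path)  # first removal succeeds
    try:
        os.unlink(path)  # second removal: the file no longer exists
    except FileNotFoundError as exc:
        return exc
    return None


err = unlink_twice()
print(err.errno == errno.ENOENT)  # error number of the failed second unlink
print(os.strerror(err.errno))     # system error text for ENOENT
```

The message on its own does not say which component removed the file first, so this only narrows down what the error means, not why it happened.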
Significantly, we decided to switch from an RPM-based to a 'build from source' installation. In other words, we previously had RPMs installed on each node in the cluster, but now have Slurm installed on a global file system. Does anyone have any thoughts regarding the above issue, please? I have yet to see the user's script, so there might be a good logical explanation for the message on inspection.