[slurm-users] Slurm unlink error messages -- what do they mean?

David Baker D.J.Baker at soton.ac.uk
Thu Apr 23 08:29:00 UTC 2020


Hello,

We have, rather belatedly, just upgraded to Slurm v19.05.5. On the whole, so far so good -- no major problems. However, one user has complained that his job now crashes and reports an unlink error. The messages are:


slurmstepd: error: get_exit_code task 0 died by signal: 9
slurmstepd: error: unlink(/tmp/slurmd/job392987/slurm_script): No such file or directory

I suspect that this message has something to do with the completion of one of the steps in his job. Apparently his job is quite complex with a number of inter-related tasks.
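
For reference, the two messages can be decoded independently of Slurm: signal 9 is SIGKILL, and "No such file or directory" is the ENOENT error that unlink() returns when the file has already gone. The following is only a minimal Python sketch of those two facts. The helper names and the temporary path are hypothetical stand-ins, and it says nothing about why the job script was missing or what killed the task.

```python
import errno
import os
import signal
import tempfile


def describe_signal(signum):
    """Return the symbolic name of a signal number (9 is SIGKILL on Linux)."""
    return signal.Signals(signum).name


def unlink_errno(path):
    """Try to unlink path; return None on success, else the errno name."""
    try:
        os.unlink(path)
    except OSError as exc:
        return errno.errorcode[exc.errno]
    return None


# Hypothetical stand-in for the job script; not Slurm's real spool directory.
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "slurm_script")
    open(script, "w").close()
    first = unlink_errno(script)   # file exists, so the unlink succeeds
    second = unlink_errno(script)  # file is already gone, so this fails

print(describe_signal(9), first, second)
```

The second unlink fails in the same way as the slurmstepd message, which would be consistent with the script having been removed before the cleanup ran, although that is an assumption rather than something the log confirms.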

Significantly, we decided to switch from an RPM-based to a 'build from source' installation. In other words, we previously had RPMs installed on each node in the cluster, but now have Slurm installed on a global file system. Does anyone have any thoughts regarding the above issue, please? I have yet to see the user's script, so there may be a good logical explanation for the message once I inspect it.

Best regards,
David