[slurm-users] Bug: incorrect output directory fails silently

Marcus Boden mboden at gwdg.de
Thu Jul 8 15:51:42 UTC 2021


I have already answered tons of tickets about this from users who are 
confused because their job fails silently.
The problem is that you cannot solve this with a job_submit or 
cli_filter plugin, because at submission time you do not know what the 
file system will look like at job runtime, or even what it will look 
like on the node the job ends up on.
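
To illustrate the limitation: the most a submit-side check can do is 
look at the directory as seen from the submit host. Below is a minimal 
sketch of such a best-effort check. The function name and its behaviour 
are my own assumptions for illustration, not part of Slurm, and a clean 
result says nothing about the compute node at runtime.

```python
import os


def output_dir_problem(output_path, cwd="."):
    """Best-effort check of an --output style path from the submit host.

    Hypothetical helper, not a Slurm API. Returns a short description of
    the problem, or None if the parent directory looks usable from here.
    """
    directory = os.path.dirname(output_path)
    if "%" in directory:
        # Slurm filename patterns (e.g. %j, %u) are only expanded later,
        # so a directory containing one cannot be checked here.
        return None
    # Relative paths are resolved against the job's working directory.
    parent = os.path.abspath(os.path.join(cwd, directory))
    if not os.path.isdir(parent):
        return f"directory does not exist: {parent}"
    if not os.access(parent, os.W_OK | os.X_OK):
        return f"directory is not writable: {parent}"
    return None
```

A wrapper around sbatch could call this and print a warning, but it can 
only ever be a hint: the directory may be deleted, or simply not 
mounted on the node, by the time the job starts.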

At least slurmd logs an error, so you could scan the logs for this 
error and maybe use that to automate something.
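
A rough sketch of what such a scan could look like. The message wording 
in DEFAULT_PATTERN is an assumption on my part, so check what your 
slurmd logs actually contain and adjust the pattern. The function name 
is made up for illustration as well.

```python
import re

# ASSUMPTION: wording of the slurmd/slurmstepd error message. Verify it
# against your own logs and change the pattern if it differs.
DEFAULT_PATTERN = r"Could not open std(out|err|in) file"


def find_output_errors(lines, pattern=DEFAULT_PATTERN):
    """Return the log lines that match the given error pattern.

    `lines` is any iterable of strings, e.g. an open log file.
    """
    rx = re.compile(pattern)
    return [line.rstrip("\n") for line in lines if rx.search(line)]
```

You would call it with an open log file, for example 
`find_output_errors(open(path_to_slurmd_log))`, and then feed the hits 
into whatever notification you already use.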

Best,
Marcus

On 08.07.21 16:58, Jeffrey T Frey wrote:
>> I understand that there is no output file to write an error message to, but it might be good to check the `--output` path during the scheduling, just like `--account` is checked.
>>
>> Does anybody know a workaround to be warned about the error?
> 
> I would make a feature request of SchedMD to fix the issue, then I would write a cli_filter plugin to validate the --output/--error/--input paths as desired until Slurm itself handles it.
> 
> 

-- 
Marcus Vincent Boden, M.Sc.
Arbeitsgruppe eScience, HPC-Team
Tel.:   +49 (0)551 201-2191, E-Mail: mboden at gwdg.de
-------------------------------------------------------------------------
Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG)
Am Faßberg 11, 37077 Göttingen, URL: https://www.gwdg.de

Support: Tel.: +49 551 201-1523, URL: https://www.gwdg.de/support
Sekretariat: Tel.: +49 551 201-1510, Fax: -2150, E-Mail: gwdg at gwdg.de

Geschäftsführer: Prof. Dr. Ramin Yahyapour
Aufsichtsratsvorsitzender: Prof. Dr. Norbert Lossau
Sitz der Gesellschaft: Göttingen
Registergericht: Göttingen, Handelsregister-Nr. B 598

Zertifiziert nach ISO 9001
-------------------------------------------------------------------------
