[slurm-users] %x in job names
Bill Barth
bbarth at tacc.utexas.edu
Fri May 28 20:03:32 UTC 2021
We noticed today that a %x anywhere in a job name like
#SBATCH -J abcdefghijklmnopqrstuvw%xyz
will send scontrol (and maybe other %x-respecting programs) into an infinite loop. A user's cron job was running 'scontrol show job ######' regularly on one of our systems, and each invocation went off the rails and ate resources until we killed it. The Slurm version 18.08.4 release email says that
-- Expand %x in job name in 'scontrol show job'.
...so I wonder if that is armored to look for self-referential calls. I haven't looked at the code myself, but I thought I'd give a heads up. I don't think our user was being malicious, and their actual -J was
#SBATCH -J sd-PBEpvw9040%x
Probably a hash and probably machine-generated/unlucky.
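For anyone who wants to catch such names before submission, here is a minimal sketch of a check, assuming the job name is set through an #SBATCH -J or --job-name directive in the batch script. The function name find_risky_job_names is hypothetical and not part of Slurm; it only inspects the script text and does not cover names passed on the sbatch command line.

```python
import re

# Matches "#SBATCH -J name", "#SBATCH --job-name name" and "#SBATCH --job-name=name".
# Assumption: the name is a single whitespace-free token, as in the examples above.
_JOB_NAME_RE = re.compile(r"^\s*#SBATCH\s+(?:-J\s+|--job-name(?:=|\s+))(\S+)")


def find_risky_job_names(script_text):
    """Return job names from #SBATCH directives that contain a literal '%x'."""
    risky = []
    for line in script_text.splitlines():
        match = _JOB_NAME_RE.match(line)
        if match and "%x" in match.group(1):
            risky.append(match.group(1))
    return risky


if __name__ == "__main__":
    sample = "#!/bin/bash\n#SBATCH -J sd-PBEpvw9040%x\n#SBATCH -N 1\n"
    for name in find_risky_job_names(sample):
        print("job name contains %x:", name)
```

A check like this could run in a submit wrapper or a site-side filter, but where to hook it in is a local decision.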
I hope this helps and counts as a genuine problem report. We're on 18.08.5, so I hope we don't have to go backwards to stop this error.
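In the meantime, a cron-style poller can be kept from running away by putting a hard time limit on each call. The following is only a sketch under stated assumptions: the function name show_job, the 10-second default, and the overridable command parameter are all hypothetical choices for illustration. It relies on Python's subprocess.run, which kills the child process when the timeout expires.

```python
import subprocess


def show_job(job_id, timeout_s=10.0, command=("scontrol", "show", "job")):
    """Run 'scontrol show job <job_id>'; return its stdout, or None on timeout.

    The command prefix is a parameter only so the wrapper can be exercised
    without a Slurm installation.
    """
    try:
        result = subprocess.run(
            [*command, str(job_id)],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before raising,
        # so a looping scontrol is not left consuming resources.
        return None
    return result.stdout
```

This does nothing about the underlying loop; it only bounds how long each invocation can spin.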
Best regards,
Bill.
--
Bill Barth, Ph.D., Director, Future Technologies
bbarth at tacc.utexas.edu | Phone: (512) 232-7069
Office: ROC 1.435 | Fax: (512) 475-9445