[slurm-users] 18.08.4 - batch scripts named "batch" getting rejected.
Prentice Bisbal
pbisbal at pppl.gov
Thu Dec 20 14:37:04 MST 2018
Thanks for confirming the issue.
I found the source of the problem with the help of SchedMD support.
18.08.4 has this bugfix to prevent commands in the cwd from taking
precedence over commands in your PATH:
https://github.com/SchedMD/slurm/commit/ccafaf7b60090155639edcbdbf4a3ab5e36967c6
There is a command /usr/bin/batch which is part of the at package:
$ which batch
/usr/bin/batch
$ rpm -qf /usr/bin/batch
at-3.1.10-49.el6.x86_64
I'm sure just about every Linux system has at installed. As a result,
sbatch batch
becomes
sbatch /usr/bin/batch
The fix is to give sbatch an explicit relative or absolute path to your
batch file (one that contains a slash), like this:
sbatch ./batch
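
To illustrate why the bare name behaves differently, here is a minimal shell sketch of the lookup order described above. It is not sbatch's actual code. The function name resolve_script is hypothetical, and the fallback to the current directory when nothing in PATH matches is an assumption based on the description of the bugfix.

```shell
# Hypothetical helper, NOT sbatch's real implementation: it only mimics the
# lookup order described above (a command in PATH beats a file in the cwd).
resolve_script() {
    name=$1
    case $name in
        */*)
            # The name contains a slash, so it is used as given; no PATH search.
            printf '%s\n' "$name"
            return
            ;;
    esac
    # Bare name: walk the PATH entries in order and take the first executable.
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        dir=${dir:-.}    # an empty PATH entry means the current directory
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir/$name"
            return
        fi
    done
    IFS=$old_ifs
    # Assumed fallback: nothing in PATH matched, so use the file in the cwd.
    printf '%s\n' "./$name"
}
```

On a system with the at package installed, resolve_script batch would print /usr/bin/batch, while resolve_script ./batch prints ./batch unchanged.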
SchedMD support told me to send them the output of sbatch -v batch. When
I ran that command, I saw this in the output:
sbatch: remote command : `/usr/bin/batch'
Once I saw that, I understood what was going on, and SchedMD support
confirmed that it was caused by a bugfix in 18.08.4.
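
For anyone who wants to run the same check, a small sketch along these lines pulls the resolved path out of the verbose output. The function name remote_command is hypothetical, and the pattern is based only on the single "remote command" line quoted above, so the exact spacing in other versions is an assumption.

```shell
# Hypothetical helper: reads `sbatch -v` output on stdin and prints the path
# from the "remote command" line, with the surrounding quote characters removed.
remote_command() {
    # Each "." matches one quote character (` before the path, ' after it).
    sed -n 's/^sbatch: remote command *: *.\(.*\).$/\1/p'
}
```

It could be used as sbatch -v batch 2>&1 | remote_command (the 2>&1 is there on the assumption that the verbose messages go to stderr). Note that this still submits the job if sbatch accepts it.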
Prentice
On 12/19/18 2:43 PM, mercan wrote:
> Hi;
>
> We upgraded from 18.08.3 to 18.08.4, and we also have a job_submit.lua
> script. We see nearly the same issue on our cluster:
>
> $ sbatch batch
> sbatch: error: Batch job submission failed: Unspecified error
> $ mv batch nobatchy
> $ sbatch nobatchy
> Submitted batch job 172174
>
> I hope this helps.
>
> Ahmet M.
>
>
> On 19.12.2018 at 21:54, Prentice Bisbal wrote:
>> Yesterday I upgraded from 18.08.3 to 18.08.4. After the upgrade, I
>> found that batch scripts named "batch" are being rejected. Simply
>> changing the script name fixes the problem. For example:
>>
>> $ sbatch batch
>> sbatch: error: ERROR: A time limit must be specified
>> sbatch: error: Batch job submission failed: Time limit specification
>> required, but not provided
>>
>> $ mv batch different_name
>>
>> $ sbatch different_name
>> Submitted batch job 398507
>>
>> Not sure if this is a bug in sbatch, my job_submit.lua file, or the
>> lua plugin. My job_submit.lua script hasn't been modified since
>> 10/16/2018. I was using 18.08.3 since 11/20, and the user that
>> reported this has used the same batch script to submit jobs prior to
>> the update w/o any issues.
>>
>> Has anyone else upgraded to 18.08.4? If so, can you replicate this
>> issue? I have already reported this to SchedMD. The bug ID is 6271.
>>
>>