[slurm-users] 18.08.4 - batch scripts named "batch" getting rejected.

Prentice Bisbal pbisbal at pppl.gov
Thu Dec 20 14:37:04 MST 2018


Thanks for confirming the issue.

I found the source of the problem with the help of SchedMD support.
18.08.4 includes a bugfix that prevents commands in the cwd from taking
precedence over commands in your PATH:

https://github.com/SchedMD/slurm/commit/ccafaf7b60090155639edcbdbf4a3ab5e36967c6

There is a command /usr/bin/batch which is part of the at package:

$ which batch
/usr/bin/batch

$ rpm -qf /usr/bin/batch
at-3.1.10-49.el6.x86_64

I'm sure just about every Linux system has at installed. As a result,

sbatch batch

becomes

sbatch /usr/bin/batch

The fix is to use a relative or absolute path to your batch file, like 
this:

sbatch ./batch
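
For wrapper scripts that submit jobs on behalf of users, one way to avoid
the PATH lookup is to qualify any bare script name before handing it to
sbatch. This is only a minimal sketch, not something provided by SchedMD;
the function name qualify_script is hypothetical:

```shell
# Hypothetical helper (not part of Slurm): make sure a script name contains
# a slash so it cannot be mistaken for a command found in PATH.
qualify_script() {
    case "$1" in
        */*) printf '%s\n' "$1" ;;     # already has a path component; leave as-is
        *)   printf './%s\n' "$1" ;;   # bare name: anchor it to the current directory
    esac
}

# Example usage (needs a Slurm installation, so it is left commented out):
# sbatch "$(qualify_script batch)"
```

Names that already contain a slash pass through unchanged, so the helper
can be applied to every submission without special-casing.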

SchedMD support told me to send them the output of sbatch -v batch. When
I ran that command, I saw this in the output:

sbatch: remote command    : `/usr/bin/batch'

Once I saw that, I understood what was going on, and SchedMD support
confirmed that it was caused by a bugfix in 18.08.4.
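
To spot other script names that would be affected the same way, a quick
check is whether the bare name also resolves to an executable in PATH. The
following is a rough sketch assuming a POSIX shell; the function name
path_collision is hypothetical, and it only approximates the lookup that
sbatch performs internally:

```shell
# Hypothetical helper (not part of Slurm): print the PATH executable that a
# bare name resolves to, or return non-zero if there is none.
path_collision() {
    # command -v also reports builtins and functions, which print without
    # a leading slash; only an absolute path counts as a PATH collision here.
    resolved=$(command -v "$1" 2>/dev/null) || return 1
    case "$resolved" in
        /*) printf '%s\n' "$resolved" ;;
        *)  return 1 ;;
    esac
}

# Example usage:
# path_collision batch && echo "submit it as ./batch instead"
```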

Prentice

On 12/19/18 2:43 PM, mercan wrote:
> Hi;
>
> We also upgraded from 18.08.3 to 18.08.4, and we have a job_submit.lua
> script as well. We see nearly the same issue on our cluster:
>
> $ sbatch batch
> sbatch: error: Batch job submission failed: Unspecified error
> $ mv batch nobatchy
> $ sbatch nobatchy
> Submitted batch job 172174
>
> I hope this helps.
>
> Ahmet M.
>
>
> On 19.12.2018 at 21:54, Prentice Bisbal wrote:
>> Yesterday I upgraded from 18.08.3 to 18.08.4. After the upgrade, I 
>> found that batch scripts named "batch" are being rejected. Simply 
>> changing the script name fixes the problem. For example:
>>
>> $ sbatch batch
>> sbatch: error: ERROR: A time limit must be specified
>> sbatch: error: Batch job submission failed: Time limit specification 
>> required, but not provided
>>
>> $ mv batch different_name
>>
>> $ sbatch different_name
>> Submitted batch job 398507
>>
>> Not sure if this is a bug in sbatch, my job_submit.lua file, or the 
>> lua plugin. My job_submit.lua script hasn't been modified since 
>> 10/16/2018. I was using 18.08.3 since 11/20, and the user that 
>> reported this has used the same batch script to submit jobs prior to 
>> the update w/o any issues.
>>
>> Has anyone else upgraded to 18.08.4? If so, can you replicate this 
>> issue? I have already reported this to SchedMD. This BugID is 6271.
>>
>>