[slurm-users] "--batch" option of the sbatch command

Uemoto, Tomoki fj2770fj at aa.jp.fujitsu.com
Wed Oct 2 09:01:25 UTC 2019


Thanks for your comment.


My understanding is now as follows.
  On a typical Linux cluster, the 'BatchHost' is compute node zero of the allocation
  (from the BatchHost description at https://slurm.schedmd.com/squeue.html).
  Therefore, it is not necessary to specify the '--batch' option.
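
A minimal sketch for checking which node is the BatchHost of a job: `scontrol show job` prints a `BatchHost=` field, which can be extracted with `sed`. The helper name `batch_host` is hypothetical, and job ID 28 is simply the one from the squeue output quoted further down.

```shell
# Hypothetical helper: print the value of the BatchHost= field
# from `scontrol show job` output supplied on stdin.
batch_host() {
    sed -n 's/.*BatchHost=\([^[:space:]]*\).*/\1/p'
}

# Usage on a cluster (assumes job 28 is still known to the controller):
#   scontrol show job 28 | batch_host
# squeue can also report the batch host through the %B format field:
#   squeue -j 28 -o "%i %B"
```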


Regards,
Tomo

-----Original Message-----
From: slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] On Behalf Of Marcus Wagner
Sent: Wednesday, October 02, 2019 3:47 PM
To: slurm-users at lists.schedmd.com
Subject: Re: [slurm-users] "--batch" option of the sbatch command

Hi Loris,

As far as I can tell from the man page, the BatchHost (the node where the batch script is executed) will be a node with one of the features requested via --batch.
Moreover, if the allocation does not contain a node with any of the requested --batch features, the job will start as usual on the first allocated node.
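
As a sketch of a case where --batch can actually take effect: the allocation has to contain a node with the requested feature, so the job below asks for two nodes, one per feature, using the bracketed count syntax of --constraint. This is untested on the cluster in question and assumes the feature names haswell and broadwell from the original post. The `batch_node` variable is only illustrative.

```shell
#!/bin/bash
#SBATCH -J batch_host_test
#SBATCH -o job.%j.out
#SBATCH -N 2
#SBATCH --constraint=[haswell*1&broadwell*1]   # one node of each type
#SBATCH --batch=broadwell                      # run this script on the broadwell node

# SLURMD_NODENAME is set by Slurm on the node running the batch script;
# fall back to `uname -n` so the script also runs outside a job.
batch_node=${SLURMD_NODENAME:-$(uname -n)}
echo "batch script on ${batch_node}, allocation: ${SLURM_JOB_NODELIST:-not inside a Slurm job}"
```

With a single-node request and --constraint="haswell|broadwell", the allocation may consist of the haswell node only, in which case --batch=broadwell cannot be satisfied and the script runs on the first allocated node.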

@Tomo

So, if you want to use the broadwell node, just use

-C broadwell
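
Written as a directive inside the job script, this might look like the sketch below (-C is the short form of --constraint). The echo line and the `node` variable are only there for illustration and are not part of the original script.

```shell
#!/bin/bash
#SBATCH -J sleep_60
#SBATCH -o job.%j.out
#SBATCH --constraint=broadwell   # same as -C broadwell on the command line

# Report where the batch script is running.
node=$(uname -n)
echo "batch script running on ${node}"
```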


best
Marcus

On 10/2/19 8:23 AM, Loris Bennett wrote:
> Hi Tomo,
>
> "Uemoto, Tomoki" <fj2770fj at aa.jp.fujitsu.com> writes:
>
>> Hi all,
>> I'm working with Slurm 18.08.6 on RHEL 7.6
>>    manager : 1node
>>    computes: 2nodes (c001:haswell,c002:broadwell)
>>
>> I am checking the --batch option of the sbatch command.
>> The following Features were set for testing.
>>
>>    # scontrol update nodename=c001 Features=haswell
>>    # scontrol update nodename=c002 Features=broadwell
>>
>> And submitted a sleep job.
>>
>> $ cat sleep_60.sh
>> #!/bin/bash
>>
>> #SBATCH -J sleep_60           # Job name
>> #SBATCH -o job.%j.out         # Name of stdout output file (%j expands to jobId)
>>
>> prun sleep 60
>> $
>>
>> $ sbatch --batch=broadwell --constraint="haswell|broadwell" sleep_60.sh
>> $ squeue -l
>> Wed Oct  2 13:50:40 2019
>>               JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON)
>>                  28    normal sleep_60     test  RUNNING       0:03 1-00:00:00      1 c001
>> $
>>
>> I expected the job to run on "c002(broadwell)" because I specified "--batch=broadwell".
>> However, the job ran on "c001(haswell)".
>> Why isn't it running on "c002(broadwell)"?
>>
>> Regards,
>> Tomo
> I haven't come across this before, but the documentation seems to 
> indicate that the option '--batch' applies only to the batch step.  
> The actual job step, in your case the 'sleep' can then run on any node 
> satisfying the condition given via the option '--constraint', so in 
> your case either your Broadwell or your Haswell node.
>
> However, I would have thought that, all things being equal, the batch 
> step and the job step would run on the same node.  However, I'm not 
> sure how a constraint with 'or' is resolved if multiple solutions are 
> available.
>
> What happens if you write the constraint as
>
>    --constraint="broadwell|haswell"
>
> ?
>
> Cheers,
>
> Loris
>

--
Marcus Wagner, Dipl.-Inf.

IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner at itc.rwth-aachen.de
www.itc.rwth-aachen.de



