[slurm-users] Question about sbatch options: -n, and --cpus-per-task
david.henkemeyer at gmail.com
Thu Mar 24 23:54:31 UTC 2022
Thank you! We recently converted from PBS, and I was converting “ppn=X” to
“-n X”. Does it make more sense to convert “ppn=X” to “--cpus-per-task=X”?
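For reference, a minimal sketch of that mapping (values are illustrative):
the PBS request

    #PBS -l nodes=1:ppn=8

corresponds, for one multithreaded process per node, to

    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=8

whereas for 8 separate (e.g. MPI) processes per node, "--ntasks-per-node=8"
is the closer equivalent.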
On Thu, Mar 24, 2022 at 3:54 PM Thomas M. Payerle <payerle at umd.edu> wrote:
> Although all three cases ("-N 1 --cpus-per-task 64 -n 1", "-N 1
> --cpus-per-task 1 -n 64", and "-N 1 --cpus-per-task 32 -n 2") will cause
> Slurm to allocate 64 cores to the job, there can (and will) be differences
> in other respects.
> The variable SLURM_NTASKS will be set to the value of the -n (aka
> --ntasks) flag, and other Slurm variables will differ as well.
> More importantly, as others noted, srun will launch $SLURM_NTASKS
> processes. The mpirun/mpiexec/etc. binaries of most MPI libraries will (if
> compiled with Slurm support) act similarly (and indeed, I believe most
> use srun under the hood).
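> A minimal sketch of that behavior (hostname is just a stand-in for a real
> program):
>
>     #!/bin/bash
>     #SBATCH -N 2
>     #SBATCH -n 4
>     #SBATCH --cpus-per-task=1
>
>     srun hostname   # runs $SLURM_NTASKS (= 4) copies of hostname,
>                     # spread across the 2 allocated nodes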
> If you are just using sbatch and launching a single process with 64
> threads, then the different options are probably equivalent for most
> intents and purposes. The same goes for a loop that starts 64
> single-threaded processes. But those are simplistic cases, and they just
> happen to "work" even though you are "abusing" the scheduler options. And
> even the cases where it "works" are subject to unexpected failures (e.g.
> if one substitutes srun for sbatch).
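> To make those two simplistic cases concrete (binary names are
> placeholders):
>
>     # a single 64-thread process
>     #SBATCH -N 1 -n 1 --cpus-per-task=64
>     OMP_NUM_THREADS=64 ./threaded_app
>
>     # a shell loop starting 64 single-threaded processes
>     #SBATCH -N 1 -n 64 --cpus-per-task=1
>     for i in $(seq 64); do ./serial_app "$i" & done
>     wait
>
> With -N 1, any of the three flag combinations allocates the same 64 cores,
> which is why either script runs correctly even when the flags do not match
> its actual task structure.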
> The differences are most clear when the -N 1 flag is not given.
> Generally, SLURM_NTASKS should be the number of MPI or similar tasks you
> intend to start. By default, the tasks are assumed to support
> distributed-memory parallelism, so the scheduler assumes it can launch
> tasks on different nodes (the -N 1 flag you mentioned would override
> that). Each such task is assumed to need --cpus-per-task cores, which the
> scheduler assumes require shared-memory parallelism (i.e. must be on the
> same node).
> So without the -N 1, "--cpus-per-task 64 -n 1" will require 64 cores on a
> single node, whereas "-n 64 --cpus-per-task 1" can result in the job being
> assigned anything from 64 cores on a single node to a single core on each
> of 64 nodes, or any combination in between totaling 64 cores. The
> "--cpus-per-task 32 -n 2" case will be assigned either one node with 64
> cores or 2 nodes with 32 cores each.
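> Concretely (job.sh is a placeholder script):
>
>     sbatch --cpus-per-task=64 -n 1 job.sh   # 64 cores, all on one node
>     sbatch --cpus-per-task=1 -n 64 job.sh   # 64 cores spread over 1 to 64 nodes
>     sbatch --cpus-per-task=32 -n 2 job.sh   # 2 x 32 cores: one node or two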
> As I said, although there are some simple cases where the different
> options are mostly functionally equivalent, I would recommend trying to
> use the proper arguments --- "abusing" the arguments might work for a
> while but will likely bite you in the end. E.g., the 64-thread case should
> use "--cpus-per-task 64", and the loop launching processes should
> _probably_ use "-n 64" (assuming it can handle the tasks being assigned to
> different nodes).
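> The threaded case was sketched above; for the loop case, the more robust,
> srun-based form would look like (binary name is a placeholder):
>
>     #SBATCH -n 64 --cpus-per-task=1
>     srun ./serial_app   # srun starts the 64 tasks, wherever Slurm placed them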
> On Thu, Mar 24, 2022 at 3:35 PM David Henkemeyer <
> david.henkemeyer at gmail.com> wrote:
>> Assuming -N is 1 (meaning the job needs only one node), is there a
>> difference between any of these 3 flag combinations:
>> -n 64 (leaving cpus-per-task to be the default of 1)
>> --cpus-per-task 64 (leaving -n to be the default of 1)
>> --cpus-per-task 32 -n 2
>> As far as I can tell, there is no functional difference. But if there is
>> even a subtle difference, I would love to know what it is!
>> Sent from Gmail Mobile
> Tom Payerle
> DIT-ACIGS/Mid-Atlantic Crossroads payerle at umd.edu
> 5825 University Research Park (301) 405-6135
> University of Maryland
> College Park, MD 20740-3831
Sent from Gmail Mobile