[slurm-users] Question about sbatch options: -n, and --cpus-per-task
Thomas M. Payerle
payerle at umd.edu
Thu Mar 24 20:52:47 UTC 2022
Although all three cases ("-N 1 --cpus-per-task 64 -n 1", "-N 1
--cpus-per-task 1 -n 64", and "-N 1 --cpus-per-task 32 -n 2") will cause
Slurm to allocate 64 cores to the job, there can (and will) be differences
in other respects.
The variable SLURM_NTASKS will be set to the value given to the -n (aka
--ntasks) option, and other Slurm variables will differ as well.
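One way to see this for yourself is to drop something like the following into a batch script and submit it with each of the three flag combinations. This is only a sketch: SLURM_NTASKS and SLURM_CPUS_PER_TASK are real Slurm output variables, but the function name report_layout and the fallback value of 1 are illustrative assumptions, there so the snippet also runs outside a job (SLURM_CPUS_PER_TASK is only set when --cpus-per-task is given explicitly, and SLURM_NTASKS may likewise be unset if -n is omitted).

```shell
#!/bin/bash
# Print the task layout Slurm handed to this job.
# The ":-1" fallbacks are an assumption for illustration: they stand in for
# Slurm's defaults when a variable is unset (e.g. outside a job).
report_layout() {
    local ntasks="${SLURM_NTASKS:-1}"
    local cpus="${SLURM_CPUS_PER_TASK:-1}"
    echo "tasks=${ntasks} cpus_per_task=${cpus} total_cores=$(( ntasks * cpus ))"
}

report_layout
```

All three combinations should report total_cores=64, but with different values for tasks and cpus_per_task.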
More importantly, as others noted, srun will launch $SLURM_NTASKS
processes. The mpirun/mpiexec/etc binaries of most MPI libraries will (if
compiled with support for Slurm) act similarly (and indeed, I believe most
use srun under the hood).
If you are just using sbatch and launching a single process using 64
threads, then the different options are probably equivalent for most intents
and purposes. Similarly if you are doing a loop to start 64 single-threaded
processes. But those are simplistic cases, and just happen to "work" even
though you are "abusing" the scheduler options. And even the cases where
it "works" are subject to unexpected failures (e.g. if one substitutes srun
for sbatch).
The differences are clearest when the -N 1 flag is not given. Generally,
SLURM_NTASKS should be the number of MPI or similar tasks you intend to
start. By default, it is assumed the tasks can support distributed-memory
parallelism, so the scheduler assumes that it can launch tasks
on different nodes (the -N 1 flag you mentioned would override that). Each
such task is assumed to need --cpus-per-task cores, which the scheduler
assumes need shared-memory parallelism (i.e. must all be on the same node).
So without the -N 1, "--cpus-per-task 64 -n 1" will require 64 cores on a
single node, whereas "-n 64 --cpus-per-task 1" can result in the job being
assigned anything from 64 cores on a single node to a single core on each of
64 nodes, or any combination in between totalling 64 cores. The
"--cpus-per-task 32 -n 2" case will either assign one node with 64 cores or
2 nodes with 32 cores each.
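The arithmetic behind those three outcomes can be sketched in a few lines of bash. This is a back-of-the-envelope model, not what the scheduler actually runs: the function name node_range and the cores-per-node argument are hypothetical, and it ignores memory, partition limits, and whatever else is running on the cluster. It only encodes the rule that all cores of one task must sit on a single node.

```shell
#!/bin/bash
# node_range NTASKS CPUS_PER_TASK CORES_PER_NODE
# Prints "min max": the fewest and the most nodes the tasks could be spread
# over, given that each task's CPUs must all be on one node.
node_range() {
    local ntasks=$1 cpt=$2 cores=$3
    local per_node=$(( cores / cpt ))      # how many tasks fit on one node
    if (( per_node < 1 )); then
        echo "a task needs more cores than a node has" >&2
        return 1
    fi
    local min=$(( (ntasks + per_node - 1) / per_node ))  # ceiling division
    echo "${min} ${ntasks}"                # at most one node per task
}

# The three cases from this thread, assuming 64-core nodes:
node_range 1 64 64    # prints "1 1"
node_range 64 1 64    # prints "1 64"
node_range 2 32 64    # prints "1 2"
```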
As I said, although there are some simple cases where the different options
are mostly functionally equivalent, I would recommend trying to use the
proper arguments; "abusing" the arguments might work for a while but
will likely bite you in the end. E.g., the 64-thread case should use
"--cpus-per-task 64", and the case of launching processes in a loop should
_probably_ use "-n 64" (assuming it can handle the tasks being assigned to
different nodes).
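For the 64-thread case, a batch script along these lines would match the request to what the program actually does. Treat it as a sketch under assumptions: it supposes the program is OpenMP-style and reads its thread count from OMP_NUM_THREADS, and my_threaded_app is a made-up name, left commented out.

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=64

# Assumption: the program takes its thread count from OMP_NUM_THREADS.
# Fall back to 1 thread if the variable is unset (e.g. outside a Slurm job).
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "running with ${OMP_NUM_THREADS} threads"

# ./my_threaded_app    # hypothetical program name
```

Because the thread count comes from the job's own request, changing --cpus-per-task later does not require touching the rest of the script.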
On Thu, Mar 24, 2022 at 3:35 PM David Henkemeyer <david.henkemeyer at gmail.com>
wrote:
> Assuming -N is 1 (meaning, this job needs only one node), then is there a
> difference between any of these 3 flag combinations:
>
> -n 64 (leaving cpus-per-task to be the default of 1)
> --cpus-per-task 64 (leaving -n to be the default of 1)
> --cpus-per-task 32 -n 2
>
> As far as I can tell, there is no functional difference. But if there is
> even a subtle difference, I would love to know what it is!
>
> Thanks
> David
> --
> Sent from Gmail Mobile
>
--
Tom Payerle
DIT-ACIGS/Mid-Atlantic Crossroads payerle at umd.edu
5825 University Research Park (301) 405-6135
University of Maryland
College Park, MD 20740-3831