[slurm-users] cgroup limits not created for jobs
Christopher Samuel
chris at csamuel.org
Mon Jul 27 05:35:08 UTC 2020
On 7/26/20 12:21 pm, Paul Raines wrote:
> Thank you so much. This also explains my GPU CUDA_VISIBLE_DEVICES missing
> problem in my previous post.
I'd missed that, but yes, that would do it.
> As a new SLURM admin, I am a bit surprised at this default behavior.
> Seems like a way for users to game the system by never running srun.
This is because, by default, salloc only requests a job allocation; it
expects you to use srun to run an application on a compute node. But
yes, it is non-obvious (as evidenced by the number of "sinteractive" and
other scripts out there that folks have written without realising the
SallocDefaultCommand config option exists - I wrote one back in 2013!).
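(For reference, a bare-bones form of that option in slurm.conf looks
something like the below - a sketch only, not our production setting,
and your site will want different srun options:)

# slurm.conf - minimal illustrative sketch
# Give salloc users an interactive shell on the first allocated node.
SallocDefaultCommand="srun -n1 -N1 --mem-per-cpu=0 --pty --preserve-env --mpi=none $SHELL"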
> The only limit I suppose that is being really enforced at that point
> is walltime?
Well, the user isn't on the compute node, so there's really nothing else
to enforce.
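(If you want to see that in practice: once you do land on the node via
srun, the job's cgroup limits are there. A rough sketch, assuming the
task/cgroup plugin and cgroup v1 - the paths differ under cgroup v2:)

srun --pty bash
# path below is an assumption for cgroup v1 + task/cgroup; adjust for your setup
cat /sys/fs/cgroup/memory/slurm/uid_$(id -u)/job_${SLURM_JOB_ID}/memory.limit_in_bytes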
> I guess I need to research srun and SallocDefaultCommand more, but is
> there some way to set some kind of separate walltime limit on a
> job for the time a salloc has to run srun? It is not clear if one
> can make a SallocDefaultCommand that does "srun ..." that really
> covers all possibilities.
An srun inside of a salloc (just like an sbatch) should not be able to
exceed the time limit for the job allocation.
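For example (times purely illustrative):

salloc -N1 -t 00:30:00     # 30 minute allocation
srun -n1 hostname          # steps launched inside it cannot outlive those 30 minutes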
If it helps, this is the SallocDefaultCommand we use for our GPU nodes:
srun -n1 -N1 --mem-per-cpu=0 --gres=gpu:0 -G 0 --gpus-per-task=0 \
     --gpus-per-node=0 --gpus-per-socket=0 --pty --preserve-env \
     --mpi=none -m block $SHELL
We have to give all those permutations of "no GPUs" because otherwise
this srun would consume the GPU GRES the salloc asked for, and then when
the user tries to "srun" their application across the nodes it would
block, as there would be no GPUs left available on this first node.
Of course, the fact that the user then can't see the GPUs without an
srun can confuse some people, but it's unavoidable for this use case.
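So a session on one of those nodes looks roughly like this (a
hypothetical sketch; exact behaviour depends on your Slurm version and
config):

salloc -N1 --gres=gpu:1
nvidia-smi                      # in the wrapper shell: no GPUs visible
srun --gres=gpu:1 nvidia-smi    # inside a step: the GPU (and CUDA_VISIBLE_DEVICES) appears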
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA