[slurm-users] cgroup limits not created for jobs
raines at nmr.mgh.harvard.edu
Sun Jul 26 19:21:09 UTC 2020
On Sat, 25 Jul 2020 2:00am, Chris Samuel wrote:
> On Friday, 24 July 2020 9:48:35 AM PDT Paul Raines wrote:
>> But when I run a job on the node it runs I can find no
>> evidence in cgroups of any limits being set
>> Example job:
>> mlscgpu1:~$ salloc -n1 -c3 -p batch --gres=gpu:quadro_rtx_6000:1 --mem=1G
>> salloc: Granted job allocation 17
>> mlscgpu1:~$ echo $$
> You're not actually running inside a job at that point unless you've defined
> "SallocDefaultCommand" in your slurm.conf, and I'm guessing that's not the
> case there. You can make salloc fire up an srun for you in the allocation
> using that option, see the docs here:
Thank you so much. This also explains the missing CUDA_VISIBLE_DEVICES
problem from my previous post.
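For the archives, the SallocDefaultCommand example from the slurm.conf
man page looks like the following; the exact srun flags are site-specific
and you would adjust them for your setup:

```
SallocDefaultCommand="srun -n1 -N1 --mem-per-cpu=0 --pty --preserve-env --mpi=none $SHELL"
```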
As a new SLURM admin, I am a bit surprised by this default behavior.
It seems like a way for users to game the system by never running srun.
The only limit I suppose is really being enforced at that point is the
allocation's overall walltime.
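Once a step is actually launched with srun, the limits do show up. A
quick way to check (a sketch assuming cgroup v1 and Slurm's default
hierarchy; exact paths vary by Slurm version and site configuration):

```
# Inside a step started with e.g. `srun --pty bash`:
# memory limit set by the task/cgroup plugin (cgroup v1 layout assumed)
cat /sys/fs/cgroup/memory/slurm/uid_${UID}/job_${SLURM_JOB_ID}/memory.limit_in_bytes

# CPU cores the step is confined to
cat /sys/fs/cgroup/cpuset/slurm/uid_${UID}/job_${SLURM_JOB_ID}/cpuset.cpus
```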
I guess I need to research srun and SallocDefaultCommand more, but is
there some way to set a separate walltime limit on the time a salloc'd
job has to run srun? It is not clear whether one can write a
SallocDefaultCommand that does "srun ..." in a way that really covers
all possibilities.
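One partial workaround (an untested sketch, not a full answer): since
the --time limit on salloc covers the whole allocation, including any
idle time before srun is launched, a short session limit at least bounds
how long an srun-less allocation can hold resources:

```
# Hypothetical: cap the entire interactive session at 30 minutes;
# the clock starts when the allocation is granted, so time spent
# before (or without) running srun still counts against it.
salloc -n1 -c3 -p batch --gres=gpu:quadro_rtx_6000:1 --mem=1G --time=00:30:00
```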