[slurm-users] GRES Restrictions
novosirj at rutgers.edu
Tue Aug 25 15:00:06 UTC 2020
Sorry about that. “NJT” should have read “but;” apparently my phone decided I was talking about our local transit authority. 😓
On Aug 25, 2020, at 10:30, Ryan Novosielski <novosirj at rutgers.edu> wrote:
I believe that’s done via a QoS on the partition. Have a look at the docs there; I think “require” is a good keyword to look for.
Cgroups should also help with this, NJT I’ve been troubleshooting a problem where that seems not to be working correctly.
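For reference, a minimal sketch of what the two suggestions above might look like in practice. It assumes accounting through slurmdbd is already set up; the QoS name "gpu_required" and the node names are hypothetical placeholders, and the lines should be checked against the sacctmgr, slurm.conf, and cgroup.conf man pages for the installed Slurm version before use.

```
# Run as a Slurm administrator. "gpu_required" is a hypothetical QoS name.
# MinTRESPerJob sets the minimum TRES each job under this QoS must request;
# DenyOnLimit rejects non-conforming jobs at submit time instead of leaving
# them pending.
sacctmgr add qos gpu_required
sacctmgr modify qos gpu_required set MinTRESPerJob=gres/gpu=1 Flags=DenyOnLimit

# slurm.conf (assumed entries; node names are placeholders)
AccountingStorageTRES=gres/gpu
AccountingStorageEnforce=limits,qos
TaskPlugin=task/cgroup
PartitionName=gpu Nodes=gpunode[01-02] QOS=gpu_required

# cgroup.conf: restrict each job to the devices it was allocated
ConstrainDevices=yes
```

With something along these lines in place, a job submitted to the gpu partition without --gres=gpu:# should be refused by the partition QoS, and the device cgroup should keep a job from opening GPUs outside its allocation.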
|| \\UTGERS, |---------------------------*O*---------------------------
||_// the State | Ryan Novosielski - novosirj at rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
|| \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark
On Aug 25, 2020, at 10:13, Willy Markuske <wmarkuske at sdsc.edu> wrote:
I'm trying to restrict access to GPU resources on a cluster I maintain for a research group. There are two nodes in a partition with gres gpu resources defined. Users can access these resources by submitting their jobs to the gpu partition and specifying --gres=gpu.
When a user includes the flag --gres=gpu:#, Slurm properly allocates the requested number of GPUs. If a user requests only 1 GPU, they only see CUDA_VISIBLE_DEVICES=1. However, if a user does not include the --gres=gpu:# flag, they can still submit a job to the partition and are then able to see all the GPUs. This has led to some bad actors running jobs on all GPUs, including ones allocated to other users, and causing OOM errors on the GPUs.
Is it possible to require users to specify --gres=gpu:# in order to submit to a partition, and if so, where would I find the documentation on doing so? So far, reading the gres documentation hasn't turned up anything on this issue specifically.
HPC Systems Engineer
Research Data Services
P: (858) 246-5593