<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Hello,</p>
<p>I'm trying to restrict access to GPU resources on a cluster I
maintain for a research group. Two nodes are placed in a
partition with GPU gres resources defined. Users can access these
resources by submitting their jobs to the gpu partition and
requesting a GPU with --gres=gpu.
</p>
<p>When a user includes the flag --gres=gpu:#, they are allocated the
requested number of GPUs and Slurm correctly restricts the job to
them. If a user requests only 1 GPU, they see only that one device
(CUDA_VISIBLE_DEVICES=1). However, if a user omits the
--gres=gpu:# flag, they can still submit a job to the partition and
can then see all of the GPUs. This has allowed some bad actors to
run jobs on every GPU, including GPUs allocated to other users,
causing out-of-memory (OOM) errors on those GPUs.</p>
<p>Is it possible to require users to specify --gres=gpu:# in order
to submit to a partition, and if so, where would I find the
documentation on doing so? So far, reading the GRES documentation
hasn't turned up anything on this issue specifically.</p>
<p>Regards,<br>
</p>
<div class="moz-signature">-- <br>
<table cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr>
<td width="150" valign="top" height="30" align="left">
<p style="font-size:14px;">Willy Markuske</p>
</td>
</tr>
<tr>
<td style="border-right: 1px solid #000;" align="left">
<p style="font-size:12px;">HPC Systems Engineer</p>
</td>
</tr>
<tr>
<td style="border-right: 1px solid #000;" align="left">
<p style="font-size:12px;">Research Data Services</p>
</td>
</tr>
<tr>
<td style="border-right: 1px solid #000;" align="left">
<p style="font-size:12px;">P: (858) 246-5593</p>
</td>
</tr>
</tbody>
</table>
</div>
</body>
</html>