<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri",sans-serif;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal">Hi everyone,<o:p></o:p></p>
<p class="MsoNormal">First message, I am trying find a good way or multiple ways to limit the usage of jobs per node or use of gpus per node, without blocking a user from submitting them.
<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Example. We have 10 nodes each with 4 gpus in a partition. We allow a team of 6 people to submit jobs to any or all of the nodes. One job per gpu; thus we can hold a total of 40 jobs concurrently in the partition.<o:p></o:p></p>
<p class="MsoNormal">At the moment: each user usually submit 50- 100 jobs at once. Taking up all gpus, and all other users have to wait in pending..<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">What I am trying to setup is allow all users to submit as many jobs as they wish but only run on 1 out of the 4 gpus per node, or some number out of the total 40 gpus across the entire partition. Using slurm 18.08.3..<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">This is roughly our slurm scripts.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">#SBATCH --job-name=Name # Job name<o:p></o:p></p>
<p class="MsoNormal">#SBATCH --mem=5gb # Job memory request<o:p></o:p></p>
<p class="MsoNormal">#SBATCH --ntasks=1<o:p></o:p></p>
<p class="MsoNormal">#SBATCH --gres=gpu:1<o:p></o:p></p>
<p class="MsoNormal">#SBATCH --partition=PART1<o:p></o:p></p>
<p class="MsoNormal">#SBATCH --time=200:00:00 # Time limit hrs:min:sec<o:p></o:p></p>
<p class="MsoNormal">#SBATCH --output=job _%j.log # Standard output and error log<o:p></o:p></p>
<p class="MsoNormal">#SBATCH --nodes=1<o:p></o:p></p>
<p class="MsoNormal">#SBATCH --qos=high<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">srun -n1 --gres=gpu:1 --exclusive --export=ALL bash -c "NV_GPU=$SLURM_JOB_GPUS nvidia-docker run --rm -e SLURM_JOB_ID=$SLURM_JOB_ID -e SLURM_OUTPUT=$SLURM_OUTPUT --name $SLURM_JOB_ID do_job.sh"<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><b><span style="font-size:14.0pt;color:#1F497D">Thomas Theis</span></b><span style="color:#2E74B5"><o:p></o:p></span></p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</body>
</html>