<html style="direction: ltr;">

  <head>


    <meta http-equiv="content-type" content="text/html; charset=UTF-8">

    <style type="text/css">body p { margin-bottom: 0cm; margin-top: 0pt; } </style>

  </head>

  <body bidimailui-charset-is-forced="true" style="direction: ltr;"

    text="#000000" bgcolor="#FFFFFF">

    <p>Hi,</p>

    <p><br>

    </p>

    <p>I'd like to allow job suspension in my cluster, without the

      "penalty" of RAM utilization. The jobs are sometimes very big and

      can require ~100GB mem on each node. Suspending such a job would

      usually mean almost nothing else can run on the same node, except

      for very small memory jobs.</p>

    <p>Currently the solution is requeue preemption, with or without

      checkpointing.</p>

    <p>I don't want to use swap for running jobs, ever - I'd rather get

      OOM killed than use swap while the job is running.<br>

    </p>

    <p><br>

    </p>

    <p>Is there a way to tell Slurm to allocate swap and use it only for

      suspending, to allow preemption without terminating the jobs?</p>

    <p><br>

    </p>

    <p>The nodes have ~TB of disk space each, and most jobs never

      utilize any of that (relying on shared storage instead), so local

      disk space is usually not a concern.</p>

    <p><br>

    </p>

    <p>Using swap to store suspended jobs, while slow to freeze and

      thaw, seems to me to be a better localized solution than

      checkpointing and requeuing, allowing the job to resume

      "immediately" (sans disk io times) after the high priority job

      finishes, but if I'm mistaken, please enlighten me.</p>

    <p><br>

    </p>

    <p>I was wondering whether simply setting up a large swap area in Linux, while

      setting AllowedSwapSpace=0 in cgroup.conf would work, but I

      suspect the following:</p>

    <p>1. Even when suspended, the job still remains within its cgroup limits,

      and</p>

    <p>2. Which process gets swapped is non-deterministic from my point

      of view - I'm not sure the kernel will swap out the suspended job

      rather than the new job, at least in its early stages.<br>

    </p>
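
    <p>For reference, the setup I have in mind would look roughly like
      the sketch below. It is untested; the slurm.conf preemption lines
      are only my assumption of how suspension would be enabled, and
      the values are illustrative rather than taken from a working
      cluster.</p>

    <pre># cgroup.conf (sketch, untested)
ConstrainRAMSpace=yes     # enforce each job's memory allocation
ConstrainSwapSpace=yes    # also enforce a RAM+swap limit per job
AllowedSwapSpace=0        # percent of allocated memory allowed as swap: none

# slurm.conf (assumed preemption setup, illustrative only)
PreemptType=preempt/partition_prio
PreemptMode=SUSPEND,GANG  # suspend low-priority jobs instead of requeuing
</pre>

    <p>The large swap area itself would be configured at the OS level
      on each node, outside of Slurm.</p>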

    <pre class="moz-signature" cols="72">Thanks in advance,

--Dani_L.

</pre>

  </body>

</html>