[slurm-users] Suspend/Resume, CGROUP and SIGTSTP

Michael Smith msmith at tenstorrent.com
Fri Jan 29 15:58:44 UTC 2021


I’ve set up SLURM with preemption enabled so that high-priority jobs can take over resources from lower-priority jobs.  We use a lot of expensive EDA software and want to get the best use out of its licenses.  The software all uses the FlexLM license manager: when a job is suspended with SIGTSTP it releases its license, allowing another job to use it, and when it is later resumed with SIGCONT it checks the license out again.

I wrote a simple BASH script to test this behavior with SLURM:

#!/bin/bash

function suspendJob () {
  echo "INFO: Job Suspended"
}

function resumeJob () {
  echo "INFO: Job Resumed"
}

function terminateJob () {
  echo "INFO: Job Terminating..."
}

trap suspendJob   SIGTSTP
trap resumeJob    SIGCONT
trap terminateJob SIGTERM

echo "Burning some compute now...."
yes > /dev/null
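
The handlers above can also be checked outside SLURM. What follows is only a minimal sketch under a few assumptions, not part of the original test: the names log, child and work are made up for illustration, "sleep 30" stands in for the "yes" workload, and the signals are sent with plain kill to the shell alone rather than by slurmstepd to the whole job. Unlike the script above, the workload runs in the background and the shell sits in "wait", because bash defers a trap handler until the foreground command finishes but runs it promptly when interrupted in "wait".

```shell
#!/bin/bash
# Local sanity check of the trap handlers, run without SLURM.
log=$(mktemp)

# Child shell with the same three handlers as the test script.
# The workload is backgrounded so the shell waits in the `wait` builtin.
bash -c '
  trap "echo INFO: Job Suspended" SIGTSTP
  trap "echo INFO: Job Resumed" SIGCONT
  trap "echo INFO: Job Terminating...; kill \$work; exit 0" SIGTERM
  sleep 30 &
  work=$!
  # `wait` returns early whenever a trapped signal arrives, so loop
  # until the workload has really exited.
  while kill -0 "$work" 2>/dev/null; do wait "$work"; done
' > "$log" &
child=$!

# Send the suspend / resume / terminate sequence by hand.
sleep 1
kill -TSTP "$child"; sleep 1
kill -CONT "$child"; sleep 1
kill -TERM "$child"
wait "$child"

result=$(cat "$log")
rm -f "$log"
echo "$result"
```

If the three INFO lines show up here but not in the job's StdOut file, the handlers themselves are fine and the difference lies in how the signals are delivered to the job.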

When I configure SLURM to use:

     ProctrackType=proctrack/pgid

This works as expected: when I manually suspend, resume, or cancel a job, the corresponding message appears in the job's SLURM StdOut file.
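
For reference, the manual cycle can be scripted. This is a sketch that assumes the manual test used "scontrol suspend", "scontrol resume" and "scancel" (the post does not say which commands were used); the helper name cycle_job and its pause argument are hypothetical. Suspending a job normally requires operator or administrator rights.

```shell
#!/bin/bash
# Hypothetical helper: suspend, resume, then cancel one job, pausing
# between steps so each message has time to reach the StdOut file.
cycle_job() {
  local jobid=$1
  local pause=${2:-10}   # seconds between steps (assumed default)

  scontrol suspend "$jobid"
  sleep "$pause"
  scontrol resume "$jobid"
  sleep "$pause"
  scancel "$jobid"
}
```

Usage would be along the lines of "cycle_job <jobid> 15", run once under each ProctrackType setting, comparing the StdOut files afterwards.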

When I change SLURM to use cgroups:

     ProctrackType=proctrack/cgroup

No messages appear at all in the SLURM StdOut file, which indicates that the cgroup was put into the freezer without any signals being sent.  Is this expected behavior, and is there a way to “fix” it so that it behaves the same way as with process groups?

Maybe this is a moot point, since SLURM still shows the license as Used under “scontrol show license” even while a job is suspended, but I figure that problem might be solvable…

Thanks,
Michael


