[slurm-users] swap size

Raymond Wan rwan.work at gmail.com
Sun Sep 23 20:53:38 MDT 2018


Hi Chris,


On Mon, Sep 24, 2018 at 7:36 AM Christopher Samuel <chris at csamuel.org> wrote:
> On 24/09/18 00:46, Raymond Wan wrote:
>
> > Hmmmmmm, I'm way out of my comfort zone but I am curious about what
> > happens.  Unfortunately, I don't think I'm able to read kernel code, but
> > someone here
> > (https://stackoverflow.com/questions/31946854/how-does-sigstop-work-in-linux-kernel)
> > seems to suggest that SIGSTOP and SIGCONT move a process
> > between the runnable and waiting queues.
>
> SIGSTOP is a non-catchable signal that immediately stops a process from
> running, and so it will sit there until either resumed, killed or the
> system is rebooted. :-)
>
> It's like doing ^Z in the shell (which generates SIGTSTP) but isn't
> catchable via signal handlers, so you can't do anything about it (same
> as SIGKILL).
>
> Regarding memory, yes its memory is still used until the process
> either resumes and releases it or is killed.  This is why if you want
> to do preemption in this mode you'll want swap so that the kernel has
> somewhere to page out the memory it's using for the incoming
> process(es).


Ah!!!  Yes, this clears things up for me -- thank you!  Somehow, I
thought you meant that SLURM suspends a job and "immediately" saves
its state.  Then I guessed that if SLURM could do that, the saved state
ought to live outside of the main memory + swap space managed by the OS.

But now I see what you mean.  SLURM is just using the signal
mechanism provided by the OS.
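
To see what SIGSTOP and SIGCONT do to an ordinary process, here is a
minimal Python sketch for Linux.  It is only an illustration and has
nothing to do with how SLURM itself is implemented: the helper names,
the 50 MiB allocation and the use of /proc/<pid>/status are all
assumptions made for the demo.  It starts a child that allocates some
memory, stops it, and reads the State and VmRSS fields while the child
is stopped and again after it is resumed.

```python
import os
import signal
import subprocess
import sys
import time

ALLOC_MIB = 50  # arbitrary size chosen for the demo

# The child allocates memory, reports that it is ready, then sleeps.
CHILD_CODE = (
    "import sys, time\n"
    f"buf = b'x' * ({ALLOC_MIB} * 1024 * 1024)\n"
    "print('ready', flush=True)\n"
    "time.sleep(60)\n"
)


def proc_status(pid):
    """Return the State and VmRSS fields of /proc/<pid>/status (Linux only)."""
    fields = {}
    with open(f"/proc/{pid}/status") as fh:
        for line in fh:
            key, _, value = line.partition(":")
            if key in ("State", "VmRSS"):
                fields[key] = value.strip()
    return fields


def rss_kb(status):
    """Resident set size in kB, parsed from a value such as '51234 kB'."""
    return int(status["VmRSS"].split()[0])


def wait_for(pid, want_stopped, timeout=5.0):
    """Poll until the process is (or is no longer) in state 'T (stopped)'."""
    deadline = time.monotonic() + timeout
    status = proc_status(pid)
    while (status["State"].startswith("T") != want_stopped
           and time.monotonic() < deadline):
        time.sleep(0.05)
        status = proc_status(pid)
    return status


def demo():
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        child.stdout.readline()              # wait until memory is allocated
        before = proc_status(child.pid)
        os.kill(child.pid, signal.SIGSTOP)   # cannot be caught or ignored
        stopped = wait_for(child.pid, True)
        os.kill(child.pid, signal.SIGCONT)   # let the child run again
        resumed = wait_for(child.pid, False)
    finally:
        child.kill()
        child.wait()
        child.stdout.close()
    return before, stopped, resumed


if __name__ == "__main__":
    for label, status in zip(("before", "stopped", "resumed"), demo()):
        print(label, status)
```

On a machine that is not short of memory, the stopped child should
show a state beginning with "T" while its VmRSS stays roughly where it
was before the SIGSTOP.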

The job gets stopped but it remains in main memory.  That is, it
doesn't "immediately" shift to swap space.  But having more swap space
gives the kernel somewhere to page the stopped job's memory out to, so
that a job that is actually using CPU cycles has room to run.  Of
course, if an HPC system has enough main memory to hold all suspended
jobs plus whatever needs to run while they are suspended, then I also
see why swap space isn't necessary.
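
The reasoning above can be put as simple arithmetic.  The following is
a rough sketch only: the function name and the example figures are
made up, and it ignores the memory used by the kernel, the page cache
and system daemons.  It estimates how much memory would have to be
paged out for the running jobs to fit in RAM alongside the suspended
ones.

```python
def swap_needed_gib(ram_gib, suspended_gib, running_gib):
    """Estimate the GiB that must be paged out so everything fits in RAM.

    suspended_gib and running_gib are lists of per-job memory footprints.
    A result of 0 means RAM alone is enough and swap is not needed.
    """
    demand = sum(suspended_gib) + sum(running_gib)
    return max(0, demand - ram_gib)


# Hypothetical node with 128 GiB of RAM.
print(swap_needed_gib(128, [96], [64]))  # suspended job does not fit: 32
print(swap_needed_gib(128, [32], [64]))  # everything fits in RAM: 0
```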

Thank you for taking the time to clarify things!

Ray


