[slurm-users] swap size

John Hearns hearnsj at googlemail.com
Sat Sep 22 08:04:05 MDT 2018


I would say that, yes, you have a good workflow here with Slurm.
As another aside - is anyone working with suspending and resuming containers?
I see on the Singularity site that suspend/resume is on the roadmap (I
am not talking about checkpointing here).

Also it is worth saying that these days one would be swapping to SSDs,
or even better NVRAM devices, so the penalties for swapping will be
less.
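
As a rough sketch, putting swap on a spare NVMe partition and checking
it afterwards looks something like this (the device name below is only
a placeholder for whatever you actually have):

    # create and enable swap on a spare NVMe partition (placeholder device)
    mkswap /dev/nvme0n1p3
    swapon /dev/nvme0n1p3
    # confirm which swap devices are active and how large they are
    swapon --show
    free -h
    # optionally make the kernel less eager to push pages out to swap
    sysctl vm.swappiness=10
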
Warming to my theme, what we should be looking at for large-memory
machines is tiered memory: fast DRAM for the data that is actively
being written to, then slower tiers of cheaper memory. Diablo
Technologies had implemented this, though I believe they are no longer
active. There is also Intel Optane, which seems to have gone a bit
quiet. But having read up on Diablo, the drivers for tiered memory are
in the Linux kernel.

Enough of my ramblings!
Maybe one day you will have a system with terabytes of memory, and
only 256 GB of real, fast DRAM.

On Sat, 22 Sep 2018 at 07:20, Raymond Wan <rwan.work at gmail.com> wrote:
>
> Hi Ashton,
>
> On Sat, Sep 22, 2018 at 5:34 AM A <andrealphus at gmail.com> wrote:
> > So I'm wondering if 20% is enough, or whether it should scale with the number of jobs I might be running at any one time. E.g. if I'm running 10 jobs that each use 20 GB of RAM, and I suspend them, would I need 200 GB of swap?
>
>
> Perhaps I'm a bit clueless here, but maybe someone can correct me if I'm wrong.
>
> I don't think swap space or a swap file is used like that.  If you
> have 256 GB of memory and a 256 GB swap file (I don't suggest this
> size...it just makes my math easier :-) ), then from the point of view
> of the OS, it will appear there is 512 GB of memory.  So, this is
> memory that is used while it is running...for reading in data, etc.
>
> When Slurm suspends a job, it must be storing the job's state in a
> location outside of this 512 GB.  So, you're not helping that by
> allocating more swap.
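>
> For reference, a minimal sketch of the suspend/resume commands
> themselves (the job ID 12345 is just a placeholder):
>
>     # pause a running job by its job ID
>     scontrol suspend 12345
>     # let it continue later
>     scontrol resume 12345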
>
> What you are doing is perhaps allowing more jobs to run concurrently,
> but I would caution against allocating more swap space.  After all,
> disk read/write is much slower than memory.  If you can run 10 jobs
> within 256 GB of memory but 20 jobs within 512 GB of (memory + swap
> space), I think you should do some kind of test to see if it would be
> faster to just let 10 jobs run at a time.  Since disk I/O is so much
> slower, I doubt running 20 will give you double the throughput.
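>
> One way to run that test, as a sketch: submit the same work as a
> 20-task job array twice, once throttled to 10 concurrent tasks and
> once with no throttle, then compare the elapsed times afterwards (the
> script name is just a placeholder):
>
>     # at most 10 of the 20 array tasks run at once
>     sbatch --array=0-19%10 my_job.sh
>     # all 20 run at once, relying on swap
>     sbatch --array=0-19 my_job.sh
>     # compare wall-clock time and peak memory per task afterwards
>     sacct -j <jobid> --format=JobID,Elapsed,MaxRSS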
>
> Personally, I still create swap space, but I agree with John that a
> server with 256 GB of memory shouldn't need any swap at all.  With
> what I run, if it uses more than the amount of memory that I have, I
> tend to stop it and find another computer to run it on.  If there
> isn't one, I have to admit I can't do it, because once a job exceeds
> the amount of main memory, it will start thrashing and, thus, take far
> longer to run, i.e., a day versus a week or more...
>
> On the other hand, we do have servers that double as desktops during
> the day.  An alternative for you to consider is to only allocate 200
> GB of memory to Slurm, for example, leaving 56 GB for your own use.
> Yes, this means that, at night, 56 GB of RAM is wasted, but during the
> day, jobs can continue running alongside your own work.  Of course,
> you should set aside an amount that is enough for you...56 GB was
> chosen to make my math easier as well.  :-)
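>
> As a sketch of how that might look in slurm.conf, assuming memory is
> tracked as a consumable resource (the node name, core count and exact
> limit are placeholders):
>
>     # advertise only ~200 GB (value is in MB) of this node's RAM to Slurm
>     NodeName=desktop01 CPUs=16 RealMemory=204800
>     # schedule cores and memory as consumable resources so the limit is respected
>     SelectType=select/cons_res
>     SelectTypeParameters=CR_Core_Memory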
>
> If something I said here isn't quite correct, I'm happy to have
> someone correct me...
>
> Ray
>


