It depends on a number of factors.


How do your workloads behave?  Do they do a lot of fork()?  I’ve had cases in the past where users submitted scripts which initially used quite a lot of memory and then used fork() or system() to execute subprocesses.  This of course means that temporarily (between the fork() and the exec() system calls) the job uses twice as much virtual memory, although this does not become real because the pages are copy-on-write.  Something similar happens if the code performs mmap() on large files.
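
To make that concrete, here is a minimal C sketch of the pattern (the 8 GiB buffer is purely illustrative):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    /* Grab and touch a large buffer -- 8 GiB purely as an illustration. */
    size_t big = (size_t)8 * 1024 * 1024 * 1024;
    char *buf = malloc(big);
    if (buf == NULL) {
        perror("malloc");   /* under vm.overcommit_memory=2 this can fail up front */
        return 1;
    }
    memset(buf, 1, big);    /* touch the pages so they count as real memory */

    /* Between fork() and exec() the job's virtual size is briefly doubled,
       but the pages are copy-on-write, so no extra physical memory is used.
       Under strict overcommit (mode 2) with no swap, this fork() can fail
       with ENOMEM even though nothing would actually have run out. */
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return 1;
    }
    if (pid == 0) {
        execl("/bin/true", "true", (char *)NULL);
        _exit(127);         /* only reached if exec fails */
    }
    waitpid(pid, NULL, 0);
    free(buf);
    return 0;
}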


Whether this affects how much swap you need comes down to your sysctl settings for vm.overcommit_memory and vm.overcommit_ratio.
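
For reference, you can see where a node currently stands with sysctl and /proc/meminfo (the values below are the usual kernel defaults; check your own nodes):

$ sysctl vm.overcommit_memory vm.overcommit_ratio
vm.overcommit_memory = 0
vm.overcommit_ratio = 50
$ grep Commit /proc/meminfo
CommitLimit:     ... kB
Committed_AS:    ... kB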


If you set vm.overcommit_memory to 2, then the OOM killer will never hit you (because malloc() will fail rather than allocate virtual memory that isn’t available), but cases like the above will tend to fail memory allocations unnecessarily, especially if you don’t have any swap allocated.
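
For what it’s worth, in that strict mode the kernel’s commit limit works out roughly as:

    CommitLimit = SwapTotal + (RAM * vm.overcommit_ratio / 100)

so with no swap and the default ratio of 50, jobs can only ever commit half the node’s RAM between them; you would normally raise vm.overcommit_ratio (or add swap) to get the rest back.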


If you set vm.overcommit_memory to 0 or 1, then you need less swap allocated (possibly even zero), but you run the risk of running out of memory and the OOM killer blowing things up left, right and centre.


If you provide swap, it only causes a performance impact if the node actually runs out of physical memory and actively starts swapping.


So the bottom line, I think, is that it depends on what you want the failure mode to be:


  1. If you want everything to always run in a very deterministic way at full speed, with failures at the precise moment memory is exhausted, but with the risk that jobs fail if they’re relying on overcommit (e.g. through fork()/exec()), then vm.overcommit_memory=2 and no swap (see the sysctl sketch just after this list).
  2. If you want high-throughput, single-threaded stuff to run more smoothly (think: horrible genomics Perl and Python scripts, etc.), then overcommit_memory=0 and add some swap. You’ll probably get higher throughput, but things may blow up slightly unpredictably from time to time when nodes run out of memory.
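
For option 1, the settings would look something like this (an illustrative sketch; the file name is arbitrary, and vm.overcommit_ratio=100 makes the commit limit equal to physical RAM when there is no swap):

# /etc/sysctl.d/90-overcommit.conf (illustrative)
vm.overcommit_memory = 2
vm.overcommit_ratio = 100

Option 2 is just the kernel defaults (vm.overcommit_memory=0) plus however much swap you decide to add.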


I now call on someone who understands cgroups properly to explain how this changes when cgroups are in play, because I’m not sure I understand that!


Tim


-- 

Tim Cutts

Scientific Computing Platform Lead

AstraZeneca


Find out more about R&D IT Data, Analytics & AI and how we can support you by visiting our Service Catalogue.


From: John Joseph via slurm-users <slurm-users@lists.schedmd.com>
Date: Monday, 4 March 2024 at 07:06
To: slurm-users@lists.schedmd.com <slurm-users@lists.schedmd.com>
Subject: [slurm-users] Is SWAP memory mandatory for SLURM

Dear All,

Good morning

I have a 4-node SLURM cluster up and running.

I would like to know: if I disable swap memory, will it affect SLURM performance?

Is swap a mandatory requirement? Each of my nodes has plenty of RAM; if physical RAM is ample, is there any need for swap?

thanks

Joseph John


