[slurm-users] Ideal NFS exported StateSaveLocation size.

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Mon Oct 24 08:14:10 UTC 2022


On 10/24/22 09:57, Diego Zuccato wrote:
> On 24/10/2022 09:32, Ole Holm Nielsen wrote:
> 
>  > It is definitely a BAD idea to store Slurm StateSaveLocation on a slow
>  > NFS directory!  SchedMD recommends using local NVMe or SSD disks
>  > because there will be many IOPS to this file system!
> 
> IIUC it does have to be shared between controllers, right?
> 
> Possibly use an NVMe-backed (or even better NVDIMM-backed) NFS share. Or
> a replica-3 Gluster volume with NVDIMMs for the bricks, for the paranoid :)

IOPS is the key parameter!  Local NVMe or SSD storage should beat any
networked storage.  AFAICT, the original question was about putting
StateSaveLocation on a standard (slow) NFS share.
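
As a rough way to compare candidate directories on IOPS, here is a minimal
Python sketch that times small synced writes.  It is only an illustration:
the function name measure_sync_write_rate and its parameters are made up
here, and the write pattern (many small files, each fsync'ed) is an
assumption, not how slurmctld actually saves its state.

```python
import os
import tempfile
import time


def measure_sync_write_rate(directory, num_files=200, payload_size=4096):
    """Return the number of small synced file writes per second in `directory`.

    Hypothetical helper for illustration; not part of Slurm.
    """
    payload = b"x" * payload_size
    # Work inside a scratch subdirectory so no existing files are touched;
    # it is removed automatically when the block exits.
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        start = time.perf_counter()
        for i in range(num_files):
            path = os.path.join(scratch, f"probe_{i}")
            with open(path, "wb") as f:
                f.write(payload)
                f.flush()
                # Ask the OS to push the data to stable storage before
                # moving on, so the page cache does not hide slow storage.
                os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    # Guard against a zero interval on very coarse clocks.
    return num_files / max(elapsed, 1e-9)
```

Run it once against a local NVMe/SSD directory and once against the NFS
mount you are considering, then compare the two numbers.  Treat the result
as a relative indication only, since caching and NFS mount options can
change it a lot.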

I don't know how many sites run two slurmctld hosts (primary and backup).
I certainly don't.  Slurm does have a configurable SlurmctldTimeout
parameter, so you can reboot the server quickly when needed.

It would be nice if people with experience in HA storage for slurmctld 
could comment.

/Ole


