Hi,

This may help. Below is a job_container.conf that sets a different BasePath for different groups of nodes.

 

job_container.conf

--------

# Default: all nodes have /localscratch (on some_nodes2 it is an NVMe mount).

AutoBasePath=true

BasePath=/localscratch

Shared=true

# Some nodes use /localscratch1 instead, because /localscratch is already taken by an existing local device setup.

NodeName=some_nodes[9995-9999] AutoBasePath=true BasePath=/localscratch1 Shared=true

# some_nodes2: use the local NVMe mounted at /localscratch. On NVIDIA kit we may not want a private /dev/shm, so Dirs lists only /tmp explicitly.

NodeName=some_nodes2[7770-7777] Dirs="/tmp" AutoBasePath=true BasePath=/localscratch Shared=true
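
For context, job_container.conf only takes effect when the job_container/tmpfs plugin is enabled in slurm.conf. The untested sketch below shows how the two files might fit together for the /largertmp case from the original question. The node name bignode[01-04] is made up for illustration, and the default BasePath is simply carried over from the example above, so adjust both to the real cluster.

```
# slurm.conf (sketch): enable the tmpfs job container plugin
JobContainerType=job_container/tmpfs
PrologFlags=Contain

# job_container.conf (sketch)
# Default for most nodes, as in the example above
AutoBasePath=true
BasePath=/localscratch

# Hypothetical big-disk nodes: per-job directories are created under /largertmp
NodeName=bignode[01-04] AutoBasePath=true BasePath=/largertmp
```

With Dirs left at its default, jobs on those nodes should get a private /tmp backed by /largertmp, so user scripts can keep writing to /tmp on every node.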





David

 

----------

David Simpson - Senior Systems Engineer

ARCCA, Redwood Building,

King Edward VII Avenue,

Cardiff, CF10 3NB

 


From: Jake Longo via slurm-users <slurm-users@lists.schedmd.com>
Date: Wednesday, 4 September 2024 at 16:19
To: slurm-users@schedmd.com <slurm-users@schedmd.com>
Subject: [slurm-users] Configuration for nodes with different TmpFs locations and TmpDisk sizes


 

Hi,

 

We have a number of machines in our compute cluster that have larger disks available for local data. I would like to add them to the same partition as the rest of the nodes but assign them a larger TmpDisk value, so that users can request more tmp space and land on those machines.

 

The main hurdle is that (for reasons beyond my control) the larger local disks are on a special mount point, /largertmp, whereas the rest of the compute cluster uses the vanilla /tmp. I can't see an obvious way to make this work, as the TmpFs value appears to be global only, and attempting to set TmpDisk to a value larger than TmpFs for those nodes will put the machine into an invalid state.

 

I couldn't find any similar support tickets or anything in the mail archive, but I wouldn't have thought this would be that unusual.

 

Thanks in advance!

Jake