[slurm-users] How can jobs request a minimum available (free) TmpFS disk space?

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Tue Sep 3 09:14:21 UTC 2019


Hi Bjørn-Helge,

I figured that other sites need the free disk space feature as well :-)

How do you dynamically update your gres=localtmp resource according to 
the current disk free space?  I mean, there is already a TmpFS disk 
space size defined in slurm.conf, so how does your gres=localtmp differ 
from TmpFS?
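
For reference, I imagine your static setup looks something like this 
(hypothetical node names and sizes, with localtmp counted in GB as you 
describe):

   # gres.conf
   NodeName=node[001-100] Name=localtmp Count=900

   # slurm.conf
   GresTypes=localtmp
   NodeName=node[001-100] Gres=localtmp:900 TmpDisk=900000

Jobs would then request e.g. "sbatch --gres=localtmp:50".  That books 
disk space out of a fixed pool, but the pool size never tracks what df 
actually reports, which is why I'm asking.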

Quotas on the TmpFS file system could cap how much each user can fill 
up the disk, but that doesn't cover my need, which is for Slurm to take 
the node's current free disk space into account when scheduling.

With "scontrol show node xxx" we get the node memory values such as 
"RealMemory=256000 AllocMem=240000 FreeMem=160056".  Similarly it would 
be great to augment the TmpDisk with a FreeDisk parameter, for example 
"TmpDisk=140000 FreeDisk=90000".

Would a Slurm modification be required to include a FreeDisk parameter, 
and then change the meaning of "sbatch --tmp=xxx" to refer to the 
FreeDisk instead of the TmpDisk size?
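
In the meantime, a cruder stop-gap occurred to me: a periodic node 
health check that drains a node whenever the Available tmp space drops 
below a threshold.  A minimal sketch (untested; the path, the 
threshold, and running it as HealthCheckProgram are assumptions):

   #!/usr/bin/env python3
   # Hypothetical health-check sketch: drain this node when the TmpFS
   # file system has less free space than a site-defined threshold.
   import shutil, socket, subprocess

   TMP_PATH = "/tmp"    # assumed TmpFS mount point
   MIN_FREE_GB = 20     # assumed site policy

   free_gb = shutil.disk_usage(TMP_PATH).free // 2**30
   if free_gb < MIN_FREE_GB:
       node = socket.gethostname().split(".")[0]
       subprocess.run(
           ["scontrol", "update", "NodeName=" + node, "State=DRAIN",
            "Reason=low tmp space (%d GB free)" % free_gb],
           check=True)

This keeps new jobs off a nearly full node, but it cannot express a 
per-job minimum the way a FreeDisk parameter could.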

Thanks,
Ole

On 9/3/19 9:19 AM, Bjørn-Helge Mevik wrote:
> We are facing more or less the same problem.  We have historically
> defined a Gres "localtmp" with the number of GB initially available
> on local disk, and then jobs ask for --gres=localtmp:50 or similar.
> 
> That prevents slurm from allocating jobs on the cluster if they ask for
> more disk than is currently "free" -- in the sense of "not handed out to
> a job".  But it doesn't prevent jobs from using more than they have
> asked for, so the disk might have less (real) free space than slurm
> thinks.
> 
> As far as I can see, cgroups does not support limiting used disk space,
> only amount of IO/s and similar.
> 
> We are currently considering using file system quotas for enforcing
> this.  Our localtmp disk is a separate xfs partition, and the idea is to
> make the prolog set up a "project" disk quota for the job on the
> localtmp file system, and the epilog to remove it again.
> 
> I'm not 100% sure we will make it work, but I'm hopeful.  Fingers
> crossed! :)
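
The project-quota idea sounds promising.  In case it helps others, here 
is roughly how I picture the prolog half (untested; the mount point, 
the GRES name, the default size, and parsing "scontrol show job" for 
the request are all assumptions; the epilog would remove the limit and 
the directory again):

   #!/usr/bin/env python3
   # Hypothetical Slurm Prolog sketch: give each job an XFS project
   # quota on a dedicated localtmp partition (mounted with prjquota).
   import os, re, subprocess

   MOUNT = "/localtmp"                  # assumed dedicated XFS partition
   job_id = os.environ["SLURM_JOB_ID"]  # set by slurmd for the Prolog

   # Recover the job's localtmp request (GB); fall back to a default
   # if the job did not request the GRES at all.
   out = subprocess.run(["scontrol", "show", "job", job_id],
                        capture_output=True, text=True, check=True).stdout
   m = re.search(r"localtmp:(\d+)", out)
   gb = int(m.group(1)) if m else 10    # assumed 10 GB default

   # One directory per job, used as an XFS project with a hard limit.
   jobdir = os.path.join(MOUNT, "job_" + job_id)
   os.makedirs(jobdir, exist_ok=True)
   subprocess.run(["xfs_quota", "-x",
                   "-c", "project -s -p %s %s" % (jobdir, job_id),
                   MOUNT], check=True)
   subprocess.run(["xfs_quota", "-x",
                   "-c", "limit -p bhard=%dg %s" % (gb, job_id),
                   MOUNT], check=True)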


On 9/2/19 8:02 PM, Ole Holm Nielsen wrote:
> We have some users requesting that a certain minimum size of the
> *Available* (i.e., free) TmpFS disk space should be present on nodes
> before a job should be considered by the scheduler for a set of nodes.
> 
> I believe that the "sbatch --tmp=size" option merely refers to the TmpFS
> file system *Size* as configured in slurm.conf, and this is *not* what
> users need.
> 
> For example, a job might require 50 GB of *Available disk space* on the
> TmpFS file system, which may however have only 20 GB out of 100 GB
> *Available* as shown by the df command, the rest having been consumed by
> other jobs (present or past).
> 
> However, when we do "scontrol show node <nodename>", only the TmpFS file
> system *Size* is displayed as a "TmpDisk" number, but not the
> *Available* number.
> 
> Question: How can we get slurmd to report back to the scheduler the
> amount of *Available* disk space?  And how can users specify the minimum
> *Available* disk space required by their jobs submitted by "sbatch"?
> 
> If this is not feasible, are there other techniques that achieve the
> same goal?  We're currently still at Slurm 18.08.


