[slurm-users] How can jobs request a minimum available (free) TmpFS disk space?

Michael Jennings mej at lanl.gov
Tue Sep 10 18:41:05 UTC 2019


On Monday, 02 September 2019, at 20:02:57 (+0200),
Ole Holm Nielsen wrote:

> We have some users requesting that a certain minimum size of the
> *Available* (i.e., free) TmpFS disk space should be present on nodes
> before a job should be considered by the scheduler for a set of
> nodes.
> 
> I believe that the "sbatch --tmp=size" option merely refers to the
> TmpFS file system *Size* as configured in slurm.conf, and this is
> *not* what users need.
> 
> For example, a job might require 50 GB of *Available disk space* on
> the TmpFS file system, which may however have only 20 GB out of 100
> GB *Available* as shown by the df command, the rest having been
> consumed by other jobs (present or past).
> 
> However, when we do "scontrol show node <nodename>", only the TmpFS
> file system *Size* is displayed as a "TmpDisk" number, but not the
> *Available* number.
> 
> Question: How can we get slurmd to report back to the scheduler the
> amount of *Available* disk space?  And how can users specify the
> minimum *Available* disk space required by their jobs submitted by
> "sbatch"?
> 
> If this is not feasible, are there other techniques that achieve the
> same goal?  We're currently still at Slurm 18.08.

Hi, Ole!

I'm assuming you want per-job resolution on this rather than
per-node?  If per-node is good enough, you can of course use NHC to
check this, e.g.:
  * || check_fs_free /tmp 50GB

That doesn't work per-job, though, obviously.  As a temporary
work-around, however, it might work to have the user run a single
NHC command, like this:
  srun --prolog='nhc -e "check_fs_free /tmp 50GB"'

There might be some tweaks/caveats to this since NHC normally runs as
root, but just an idea....  :-)  An even crazier idea would be to set
NHC_LOAD_ONLY=1 in the environment, source /usr/sbin/nhc, and then
execute the shell function `check_fs_free` directly!  :-D
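
For a job-side check that does not depend on NHC at all, something along
these lines might serve as a starting point.  This is only a sketch:
`require_free_gb` is a hypothetical helper name (not part of Slurm or
NHC), and it relies on nothing beyond POSIX `df -Pk` and `awk`.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm or NHC): succeed only if the
# filesystem holding DIR has at least MIN_GB gigabytes available.
# Usage: require_free_gb DIR MIN_GB
require_free_gb() {
    dir=$1
    min_gb=$2

    # "df -Pk" prints one POSIX-format line per filesystem in 1024-byte
    # blocks; field 4 of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')

    # No output means df could not stat the directory.
    [ -n "$avail_kb" ] || return 2

    min_kb=$((min_gb * 1024 * 1024))
    if [ "$avail_kb" -lt "$min_kb" ]; then
        echo "only $((avail_kb / 1024 / 1024)) GB free on $dir, need $min_gb GB" >&2
        return 1
    fi
    return 0
}
```

A batch script could then start with `require_free_gb /tmp 50 || exit 1`
before doing any real work.  Note that this only makes the job fail early
on a node that is short on space; it does not influence which nodes the
scheduler picks.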

HTH,
Michael

-- 
Michael E. Jennings <mej at lanl.gov>
HPC Systems Team, Los Alamos National Laboratory
Bldg. 03-2327, Rm. 2341     W: +1 (505) 606-0605


