[slurm-users] TmpFS/tmpDisk/TMPDIR

Ransom, Geoffrey M. Geoffrey.Ransom at jhuapl.edu
Wed Jun 24 20:18:11 UTC 2020


Hello
I defined "TmpDisk=930000" for some machines in Slurm 20.02.3 (TmpFS is set to a local volume slightly bigger than that), and then ran:

  sbatch --tmp=100000 -w node01 --array=1-100 --wrap="sleep 300"

I ended up with 36 jobs on the machine at a time, one per CPU core. I expected the --tmp option to limit it to 9 jobs at a time, since the node is defined as having 930000 MB of TmpDisk.
If I raise the option to "--tmp=1000000", sbatch rejects the job with "Temporary disk specification cannot be satisfied", so this does not look like a typo in the config or a unit conversion issue.
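
To spell out the arithmetic behind that expectation, here is a quick sketch using only the numbers above. The variable names are just for illustration, and it assumes --tmp would be consumed per job, which is exactly the behavior I am asking about:

```shell
# Values taken from this post; both sizes are in MB.
tmpdisk_mb=930000   # TmpDisk defined for the node
per_job_mb=100000   # --tmp requested by each array task
observed_jobs=36    # jobs actually running at once (one per core)

# Integer division: how many jobs would fit if --tmp were a consumed resource.
expected_jobs=$(( tmpdisk_mb / per_job_mb ))
echo "expected concurrent jobs: ${expected_jobs}"

# Total temporary space the observed jobs request between them.
observed_claim_mb=$(( observed_jobs * per_job_mb ))
echo "requested by ${observed_jobs} jobs: ${observed_claim_mb} MB (node has ${tmpdisk_mb} MB)"
```

So the 36 running jobs together request roughly four times the TmpDisk the node advertises.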

I would expect TmpDisk to be treated as a managed resource that helps limit how many jobs land on a machine.

Am I misunderstanding how the "--tmp" option is supposed to work?


And a general question regarding TMPDIR and TmpFS...

I understand that Slurm sets TMPDIR to /tmp regardless of what TmpFS is set to, and that local sites are expected to define TMPDIR in a prolog or plugin if they feel it is necessary. I would have expected TmpFS to affect the value of TMPDIR by default.

What is the reasoning behind the decision not to set TMPDIR to something like ${TmpFS}/${SLURM_JOB_ID}?
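
For reference, the kind of site-local workaround I mean could look roughly like the following TaskProlog sketch. This is only an illustration under assumptions: TMPFS_BASE is a hypothetical variable standing in for the site's TmpFS path (it defaults to /tmp here so the script runs anywhere), and the "nojob" fallback exists only so it can be tried outside of a job. It relies on Slurm applying "export NAME=value" lines that a TaskProlog prints to stdout to the task's environment.

```shell
#!/bin/bash
# Hypothetical TaskProlog sketch, not a tested site configuration.
# TMPFS_BASE is an assumed stand-in for the TmpFS path from slurm.conf.
TMPFS_BASE="${TMPFS_BASE:-/tmp}"

# One directory per job; SLURM_JOB_ID is set by Slurm inside a job.
job_tmp="${TMPFS_BASE}/${SLURM_JOB_ID:-nojob}"

# Create the directory, then ask Slurm to set TMPDIR for the task:
# lines of the form "export NAME=value" on stdout are applied to
# the task's environment.
if mkdir -p "${job_tmp}"; then
    echo "export TMPDIR=${job_tmp}"
fi
```

A matching Epilog would still be needed to remove the directory when the job ends, which is part of why I would have expected Slurm to handle this itself.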

Is there any documented discussion of Slurm's expected use of TmpFS/TMPDIR, or of the philosophy behind it, that I can read?

Thanks

