[slurm-users] TmpFS/tmpDisk/TMPDIR
Ransom, Geoffrey M.
Geoffrey.Ransom at jhuapl.edu
Wed Jun 24 20:18:11 UTC 2020
Hello
I defined "TmpDisk=930000" for some machines in Slurm 20.02.3 (and TmpFS is set to a local volume slightly bigger than that), and when I run...
sbatch --tmp=100000 -w node01 --array=1-100 --wrap="sleep 300"
I ended up with 36 jobs running on the machine at a time, one per CPU core. I expected the --tmp option to limit this to 9 jobs at a time, since the node is defined as having 930000 MB of TmpDisk and each job requests 100000 MB.
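To spell out the arithmetic behind the 9-job figure, here is a minimal sketch. The function name expected_jobs is hypothetical and only illustrates the calculation I had in mind; it is not anything Slurm provides, and it assumes --tmp would be counted as a consumable per-node resource.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): how many jobs would fit on a node
# IF --tmp were treated as a consumable resource.
expected_jobs() {
    tmp_disk_mb=$1   # TmpDisk value from slurm.conf, in MB
    per_job_mb=$2    # value passed to sbatch --tmp, in MB
    # Integer division: a partial slot cannot hold another job.
    echo $(( tmp_disk_mb / per_job_mb ))
}

# Values from this post: TmpDisk=930000 and --tmp=100000.
expected_jobs 930000 100000   # prints 9
```

That is where the 9 comes from, versus the 36 jobs (one per core) that actually started.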
If I raise the option to "--tmp=1000000", sbatch rejects the job with "Temporary disk specification cannot be satisfied", so this should not be a typo in the config or a unit conversion issue.
I would expect this to be treated as a managed resource that could help limit how many jobs land on a machine.
Am I misunderstanding how the "--tmp" option is supposed to work?
And a general question regarding TMPDIR and TmpFS...
I understand that Slurm sets TMPDIR to /tmp regardless of what TmpFS is set to, and that local sites are expected to define TMPDIR in a prolog or plugin if they feel it is necessary. I would have expected TmpFS to affect the value of TMPDIR by default.
What is the reasoning behind the decision not to set TMPDIR to something like ${TmpFS}/${SLURM_JOB_ID}?
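For reference, the kind of site-local workaround I mean could look roughly like the TaskProlog sketch below. This is only an illustration under stated assumptions: TMPFS_BASE and its default path /local/scratch are hypothetical and would need to match the TmpFS setting, the helper name tmpdir_export_line is made up, and the per-job directory is assumed to be created by a separate Prolog and removed by an Epilog. It relies on Slurm applying "export NAME=value" lines written to TaskProlog standard output to the task's environment.

```shell
#!/bin/sh
# Sketch of a TaskProlog script (hypothetical, not shipped with Slurm).
# TMPFS_BASE is an assumed path; it should match TmpFS in slurm.conf.
TMPFS_BASE="${TMPFS_BASE:-/local/scratch}"

# Build the "export NAME=value" line that Slurm reads from TaskProlog stdout.
tmpdir_export_line() {
    base=$1
    job_id=$2
    printf 'export TMPDIR=%s/%s\n' "$base" "$job_id"
}

# Only emit the line when running inside a job.
# Assumption: the directory itself is created elsewhere (e.g. in a Prolog).
if [ -n "${SLURM_JOB_ID:-}" ]; then
    tmpdir_export_line "$TMPFS_BASE" "$SLURM_JOB_ID"
fi
```

My question is why something along these lines is left to each site instead of being the default behavior.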
Is there any documented discussion on slurm's expected use of TmpFS/TMPDIR or the philosophy behind it that I can read?
Thanks