Hey Jeffrey,
thanks for this suggestion! This is probably the way to go, provided one
can find a way to access the job's GRES request in the prolog. I read
somewhere that people were calling scontrol to get this information, but
that seems a bit unclean. Anyway, if I find some time, I will try it out.
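Something along these lines is what I have in mind for the prolog
(untested, and I am guessing at both the GRES name and the exact format
of the scontrol output):

    # e.g. the job requested --gres=tmpdisk:10000 (MB);
    # "tmpdisk" is a made-up GRES name
    SIZE_MB=$(scontrol show job "$SLURM_JOB_ID" \
              | grep -oP 'tmpdisk:\K[0-9]+' | head -n1)
    echo "job $SLURM_JOB_ID asked for ${SIZE_MB:-0} MB of local scratch"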
Best,
Tim
On 2/6/24 16:30, Jeffrey T Frey wrote:
> Most of my ideas have revolved around creating file systems on-the-fly
> as part of the job prolog and destroying them in the epilog. The issue
> with that mechanism is that formatting a file system (e.g. mkfs.<type>)
> can be time-consuming. E.g. if you format your local scratch SSD as an
> LVM PV+VG and allocate per-job logical volumes, you still need to run
> mkfs.xfs and mount the new file system for every job.
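>
> Roughly, the prolog would have to do something like this (sketch only,
> assuming a pre-built VG on the SSD called "scratch" and a size taken
> from the job request):
>
>    lvcreate -L 100G -n slurm-${SLURM_JOB_ID} scratch
>    mkfs.xfs /dev/scratch/slurm-${SLURM_JOB_ID}
>    mkdir -p /tmp-alloc/slurm-${SLURM_JOB_ID}
>    mount /dev/scratch/slurm-${SLURM_JOB_ID} /tmp-alloc/slurm-${SLURM_JOB_ID}
>
> with the epilog doing the umount/lvremove -- and the mkfs/mount is
> where the time goes.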
>
>
> ZFS file system creation is much quicker (it basically combines the
> LVM + mkfs steps above), but I don't know of any clusters using ZFS to
> manage local file systems on the compute nodes :-)
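>
> (If one did use ZFS, the prolog/epilog pair would boil down to
> something like this -- sketch only, assuming a pool named "scratch":
>
>    # prolog
>    zfs create -o quota=100G \
>        -o mountpoint=/tmp-alloc/slurm-${SLURM_JOB_ID} \
>        scratch/slurm-${SLURM_JOB_ID}
>    # epilog
>    zfs destroy scratch/slurm-${SLURM_JOB_ID}
>
> and the quota property gives you the per-job limit for free.)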
>
>
> One /could/ leverage XFS project quotas. E.g. for Slurm job 2147483647:
>
>
> [root@r00n00 /]# mkdir /tmp-alloc/slurm-2147483647
> [root@r00n00 /]# xfs_quota -x -c 'project -s -p /tmp-alloc/slurm-2147483647 2147483647' /tmp-alloc
> Setting up project 2147483647 (path /tmp-alloc/slurm-2147483647)...
> Processed 1 (/etc/projects and cmdline) paths for project 2147483647 with recursion depth infinite (-1).
> [root@r00n00 /]# xfs_quota -x -c 'limit -p bhard=1g 2147483647' /tmp-alloc
> [root@r00n00 /]# cd /tmp-alloc/slurm-2147483647
> [root@r00n00 slurm-2147483647]# dd if=/dev/zero of=zeroes bs=5M count=1000
> dd: error writing ‘zeroes’: No space left on device
> 205+0 records in
> 204+0 records out
> 1073741824 bytes (1.1 GB) copied, 2.92232 s, 367 MB/s
>
> :
>
> [root@r00n00 /]# rm -rf /tmp-alloc/slurm-2147483647
> [root@r00n00 /]# xfs_quota -x -c 'limit -p bhard=0 2147483647' /tmp-alloc
>
>
> Since Slurm jobids max out at 0x03FFFFFF (and 2147483647 = 0x7FFFFFFF),
> we have an easy on-demand project id to use on the file system. Slurm
> tmpfs plugins already have to do a mkdir to create the per-job
> directory, so adding two xfs_quota commands (which run in more or less
> O(1) time) won't extend the prolog by much. Likewise, Slurm tmpfs
> plugins have to scrub the directory at job cleanup, so adding another
> xfs_quota command will not change their epilog execution times much.
> The main question is "where does the tmpfs plugin find the quota limit
> for the job?"
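>
> As a rough illustration (untested, and assuming the limit reaches the
> prolog somehow -- via an environment variable, a GRES lookup, whatever
> answers the question above), the prolog/epilog additions would be:
>
>    # prolog fragment; TMP_LIMIT is a placeholder for "however the
>    # limit gets here"
>    DIR=/tmp-alloc/slurm-${SLURM_JOB_ID}
>    mkdir -p "$DIR"
>    xfs_quota -x -c "project -s -p $DIR $SLURM_JOB_ID" /tmp-alloc
>    xfs_quota -x -c "limit -p bhard=${TMP_LIMIT:-1g} $SLURM_JOB_ID" /tmp-alloc
>
>    # epilog fragment
>    rm -rf "/tmp-alloc/slurm-${SLURM_JOB_ID}"
>    xfs_quota -x -c "limit -p bhard=0 $SLURM_JOB_ID" /tmp-alloc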
>
>
>
>
>
>> On Feb 6, 2024, at 08:39, Tim Schneider via slurm-users
>> <slurm-users(a)lists.schedmd.com> wrote:
>>
>> Hi,
>>
>> In our SLURM cluster, we are using the job_container/tmpfs plugin to
>> ensure that each user can use /tmp and that it gets cleaned up after
>> them. Currently, we are mapping /tmp into the node's RAM, which means
>> that the cgroups make sure that users can only use a certain amount
>> of storage inside /tmp.
>>
>> Now we would like to use the node's local SSD instead of its RAM to
>> hold the files in /tmp. I have seen people define local storage as a
>> GRES, but I am wondering how to make sure that users do not exceed
>> the storage space they requested in a job. Does anyone have an idea
>> how to configure local storage as a properly tracked resource?
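>>
>> (For reference, the GRES setup I have seen looks roughly like the
>> following -- I am not sure about the exact syntax, and the names and
>> units are just an example:
>>
>>    # slurm.conf
>>    GresTypes=tmpdisk
>>    NodeName=node[01-10] Gres=tmpdisk:400000   # local SSD size in MB
>>
>>    # gres.conf on each node
>>    Name=tmpdisk Count=400000
>>
>> so that jobs can request e.g. --gres=tmpdisk:10000. What I am missing
>> is the enforcement part.)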
>>
>> Thanks a lot in advance!
>>
>> Best,
>>
>> Tim
>>
>>
>> --
>> slurm-users mailing list -- slurm-users(a)lists.schedmd.com
>> To unsubscribe send an email to slurm-users-leave(a)lists.schedmd.com
>