[slurm-users] [EXTERNAL] Re: Managing shared memory (/dev/shm) usage per job?

John Hanks griznog at gmail.com
Wed Apr 6 11:49:40 UTC 2022


Thanks, Greg! This looks like the right way to do this. I will have to stop
putting off learning to use spank plugins :)

griznog

On Wed, Apr 6, 2022 at 1:40 AM Greg Wickham <greg.wickham at kaust.edu.sa>
wrote:

> Hi John, Mark,
>
>
>
> We use a spank plugin,
> https://gitlab.com/greg.wickham/slurm-spank-private-tmpdir (this was
> derived from work by other authors, but modified for the functionality
> required on site).
>
>
>
> It can bind tmpfs mount points to the user's cgroup allocation;
> additionally, bind options can be provided (i.e. limit memory by size, or
> limit memory by percentage, as supported by tmpfs(5)).
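>
> For illustration only, those size limits map onto ordinary tmpfs(5) mount
> options such as the following (the plugin's own option syntax is
> documented in its README):
>
>     # cap a tmpfs by absolute size, or by a percentage of physical RAM
>     mount -t tmpfs -o size=8G  tmpfs /tmp
>     mount -t tmpfs -o size=25% tmpfs /dev/shm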
>
>
>
> More information is in the README.md
>
>
>
>   -Greg
>
>
>
> On 05/04/2022, 23:17, "slurm-users" <slurm-users-bounces at lists.schedmd.com>
> wrote:
>
>
>
> I've thought-experimented with this in the past, wanting to do the same
> thing, but haven't found any way to get a /dev/shm or a tmpfs into a job's
> cgroups so that it is accounted against the job's allocation. The best I
> have come up with is creating a per-job tmpfs in a prolog, removing it in
> an epilog, and setting its size to some amount of memory that at least
> puts some restriction on how much damage the job can do. Another
> alternative is to only allow access to a memory filesystem if the job
> request is exclusive and takes the whole node. Crude, but effective at
> least to the point of preventing one job from killing others. If you
> happen to find a real solution, please post it :)
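>
> A rough sketch of that prolog/epilog approach (paths and the 8G cap are
> placeholders; a real version would derive the size from the job's memory
> request):
>
>     #!/bin/bash
>     # prolog (runs as root on the node): create a size-capped per-job tmpfs
>     SHM_DIR="/run/job_${SLURM_JOB_ID}_shm"
>     mkdir -p "$SHM_DIR"
>     mount -t tmpfs -o size=8G tmpfs "$SHM_DIR"
>
>     #!/bin/bash
>     # epilog: tear the per-job tmpfs down again
>     SHM_DIR="/run/job_${SLURM_JOB_ID}_shm"
>     umount "$SHM_DIR" && rmdir "$SHM_DIR"
>
> Jobs then have to be pointed at that directory (e.g. via TMPDIR) instead
> of the global /dev/shm.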
>
>
>
> griznog
>
>
>
> On Mon, Apr 4, 2022 at 10:19 AM Mark Coatsworth <
> mark.coatsworth at vectorinstitute.ai> wrote:
>
> Hi all,
>
>
>
> We have a GPU cluster (Slurm 19.05.3) that typically runs large PyTorch
> jobs dependent on shared memory (/dev/shm). When our machines get busy, we
> often run into a problem where one job exhausts all the shared memory on a
> system, causing any other jobs landing there to fail immediately.
>
>
>
> We're trying to figure out a good way to manage this resource. I know that
> Slurm counts shared memory as part of a job's total memory allocation, so
> we could use cgroups to OOM-kill jobs that exceed their allocation. But
> that doesn't prevent a user from simply making a large request and
> exhausting it all anyway.
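>
> (For context, the cgroup enforcement referred to here is the usual
> task/cgroup setup, roughly:)
>
>     # slurm.conf
>     ProctrackType=proctrack/cgroup
>     TaskPlugin=task/cgroup
>
>     # cgroup.conf
>     CgroupAutomount=yes
>     ConstrainRAMSpace=yes
>     ConstrainSwapSpace=yes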
>
>
>
> Does anybody have any thoughts or experience with setting real limits on
> shared memory, and either swapping it out or killing the job when the
> limit is exceeded? One thought we had was to use a new generic resource
> (GRES). That is pretty easy to add in the configuration, but it seems like
> it would be a huge task to write a plugin that actually enforces it.
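>
> (The configuration part would be something like the following, with
> "shm_gb" as a made-up GRES name counted in GB; the rest of the NodeName
> line is omitted:)
>
>     # slurm.conf
>     GresTypes=shm_gb
>     NodeName=node[01-16] ... Gres=shm_gb:64
>
>     # gres.conf on each node (no device file, just a count)
>     Name=shm_gb Count=64
>
> Jobs would then request --gres=shm_gb:N, but nothing in that actually ties
> the count to real /dev/shm usage, hence the need for enforcement.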
>
>
>
> Is this something where the Job Container plugin might be useful?
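>
> (From the docs, the job_container/tmpfs plugin added in Slurm 21.08, which
> is newer than our 19.05, gives each job a private /tmp and /dev/shm that
> are cleaned up when the job ends; the configuration would be something
> like the following, with BasePath as a placeholder:)
>
>     # slurm.conf
>     JobContainerType=job_container/tmpfs
>
>     # job_container.conf
>     AutoBasePath=true
>     BasePath=/var/lib/slurm/jobtmp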
>
>
>
> Any thoughts or suggestions would be appreciated,
>
>
>
> Mark
>
>

