[slurm-users] [EXTERNAL] Re: Managing shared memory (/dev/shm) usage per job?

Mark Coatsworth mark.coatsworth at vectorinstitute.ai
Thu Apr 7 19:56:15 UTC 2022


Thanks so much Greg! That looks like the solution we want, but like John
I'm also unfamiliar with spank plugins. I guess that will have to change.

Mark

On Wed, Apr 6, 2022 at 7:54 AM John Hanks <griznog at gmail.com> wrote:

> Thanks, Greg! This looks like the right way to do this. I will have to
> stop putting off learning to use spank plugins :)
>
> griznog
>
> On Wed, Apr 6, 2022 at 1:40 AM Greg Wickham <greg.wickham at kaust.edu.sa>
> wrote:
>
>> Hi John, Mark,
>>
>>
>>
>> We use a spank plugin,
>> https://gitlab.com/greg.wickham/slurm-spank-private-tmpdir (this was
>> derived from other authors' work, but modified for the functionality
>> required on our site).
>>
>>
>>
>> It can bind tmpfs mount points into the user's cgroup allocation;
>> additionally, bind options can be provided (e.g. limiting the memory by
>> size or by percentage, as supported by tmpfs(5)).
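>>
>> As a rough illustration only (the install path below is an assumption, and
>> the options shown are generic rather than this plugin's own), loading a
>> SPANK plugin is a single line in plugstack.conf, and the size limits are
>> the standard tmpfs(5) mount options:
>>
>>   # /etc/slurm/plugstack.conf -- plugin-specific arguments follow the path
>>   required /usr/lib64/slurm/private-tmpdir.so
>>
>>   # tmpfs(5) size limits: an absolute cap, or a percentage of physical RAM
>>   mount -t tmpfs -o size=16G tmpfs /run/job_tmp
>>   mount -t tmpfs -o size=25% tmpfs /run/job_tmp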
>>
>>
>>
>> More information is in the README.md
>>
>>
>>
>>   -Greg
>>
>>
>>
>> On 05/04/2022, 23:17, "slurm-users" <
>> slurm-users-bounces at lists.schedmd.com> wrote:
>>
>>
>>
>> I've thought-experimented with this in the past, wanting to do the same
>> thing, but haven't found any way to get a /dev/shm or a tmpfs into a job's
>> cgroups so that it is accounted against the job's allocation. The best I
>> have come up with is creating a per-job tmpfs from a prolog, removing it in
>> an epilog, and setting its size to some amount of memory that at least puts
>> some restriction on how much damage the job can do. Another alternative is
>> to only allow access to a memory filesystem if the job request is exclusive
>> and takes the whole node. Crude, but effective, at least to the point of
>> preventing one job from killing others. If you happen to find a real
>> solution, please post it :)
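>>
>> A minimal sketch of that prolog/epilog pair, assuming they run as root and
>> using a made-up mount point and 16G cap (not a tested recipe):
>>
>>   #!/bin/bash
>>   # Prolog: create a per-job tmpfs, capped in size and owned by the job user
>>   d=/mnt/job_${SLURM_JOB_ID}
>>   mkdir -p "$d"
>>   mount -t tmpfs -o size=16G,mode=700,uid=${SLURM_JOB_UID} tmpfs "$d"
>>
>>   #!/bin/bash
>>   # Epilog: tear it down again when the job ends
>>   d=/mnt/job_${SLURM_JOB_ID}
>>   umount "$d" && rmdir "$d"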
>>
>>
>>
>> griznog
>>
>>
>>
>> On Mon, Apr 4, 2022 at 10:19 AM Mark Coatsworth <
>> mark.coatsworth at vectorinstitute.ai> wrote:
>>
>> Hi all,
>>
>>
>>
>> We have a GPU cluster (Slurm 19.05.3) that typically runs large PyTorch
>> jobs dependent on shared memory (/dev/shm). When our machines get busy, we
>> often run into a problem where one job exhausts all the shared memory on a
>> system, causing any other jobs landing there to fail immediately.
>>
>>
>>
>> We're trying to figure out a good way to manage this resource. I know
>> that Slurm counts shared memory as part of a job's total memory allocation,
>> so we could use cgroups to OOM kill jobs that exceed this. But that doesn't
>> prevent a user from just making a large request and exhausting it all
>> anyway.
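>>
>> (For context, the cgroup enforcement I mean is the usual task/cgroup setup,
>> roughly the following; the values are generic, not necessarily our exact
>> configuration:)
>>
>>   # slurm.conf
>>   ProctrackType=proctrack/cgroup
>>   TaskPlugin=task/cgroup
>>
>>   # cgroup.conf
>>   ConstrainRAMSpace=yes
>>   ConstrainSwapSpace=yes
>>   AllowedSwapSpace=0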
>>
>>
>>
>> Does anybody have any thoughts or experience with setting real limits on
>> shared memory, and either swapping it out or killing the job if this gets
>> exceeded? One thought we had was to use a new generic resource (GRES). This
>> is pretty easy to add in the configuration, but seems like it would be a
>> huge task to write a plugin that actually enforces it.
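>>
>> For example, advertising shared memory as a consumable GRES would look
>> roughly like this (node names and counts invented); it tracks what jobs
>> request, but nothing here stops a job from writing more into /dev/shm than
>> it asked for:
>>
>>   # slurm.conf -- add shm to GresTypes and to each node's Gres= list
>>   GresTypes=gpu,shm
>>   NodeName=gpu01 Gres=gpu:4,shm:64G CPUs=32 RealMemory=192000
>>
>>   # gres.conf -- a count-only resource with no device files
>>   NodeName=gpu01 Name=shm Count=64G
>>
>>   # jobs would then have to request it explicitly, e.g.
>>   sbatch --gres=shm:8G train.sh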
>>
>>
>>
>> Is this something where the Job Container plugin might be useful?
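>>
>> From a quick look at the docs, the job_container/tmpfs plugin (added in
>> Slurm releases newer than our 19.05) gives each job a private /tmp and
>> /dev/shm. I imagine the configuration would be roughly the following,
>> though I haven't tried it and the base path is just a guess:
>>
>>   # slurm.conf
>>   JobContainerType=job_container/tmpfs
>>
>>   # job_container.conf
>>   AutoBasePath=true
>>   BasePath=/var/nvme/job_containers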
>>
>>
>>
>> Any thoughts or suggestions would be appreciated,
>>
>>
>>
>> Mark
>>
>>

