[slurm-users] srun --mem issue
Moshe Mergy
moshe.mergy at weizmann.ac.il
Thu Dec 8 09:36:28 UTC 2022
Hi Loris
Indeed, https://slurm.schedmd.com/resource_limits.html describes the available limits.
At present I do not limit memory for specific users, just a global limit in slurm.conf:
MaxMemPerNode=65536 (for a 64 GB limit)
But even so, on my Slurm 20.02 installation any user can obtain MORE than 64 GB of memory by using the "--mem=0" option!
So I had to filter this in job_submit.lua.
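Roughly, the check in slurm_job_submit() looks like this (a simplified sketch of the idea, not my exact filter; it only looks at the min_mem_per_node / min_mem_per_cpu fields of job_desc):

function slurm_job_submit(job_desc, part_list, submit_uid)
    -- --mem=0 / --mem-per-cpu=0 mean "all the memory of the node", so refuse them
    if job_desc.min_mem_per_node == 0 or job_desc.min_mem_per_cpu == 0 then
        slurm.log_user("Please request an explicit amount of memory; --mem=0 is not allowed.")
        return slurm.ERROR
    end
    return slurm.SUCCESS
end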
________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Loris Bennett <loris.bennett at fu-berlin.de>
Sent: Thursday, December 8, 2022 10:57:56 AM
To: Slurm User Community List
Subject: Re: [slurm-users] srun --mem issue
Loris Bennett <loris.bennett at fu-berlin.de> writes:
> Moshe Mergy <moshe.mergy at weizmann.ac.il> writes:
>
>> Hi Sandor
>>
>> I personally block "--mem=0" requests in job_submit.lua (Slurm 20.02):
>>
>> if (job_desc.min_mem_per_node == 0 or job_desc.min_mem_per_cpu == 0) then
>>     slurm.log_info("%s: ERROR: unlimited memory requested", log_prefix)
>>     slurm.log_info("%s: ERROR: job %s from user %s rejected because of an invalid (unlimited) memory request.", log_prefix, job_desc.name, job_desc.user_name)
>>     slurm.log_user("Job rejected because of an invalid memory request.")
>>     return slurm.ERROR
>> end
>
> What happens if somebody explicitly requests all the memory, so in
> Sandor's case --mem=500G?
>
>> Maybe there is a better or nicer solution...
Can't you just use account and QOS limits:
https://slurm.schedmd.com/resource_limits.html
?
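Off the top of my head (untested, so please check the exact option names in the sacctmgr and slurm.conf man pages), something like this should cap per-job memory without touching job_submit.lua, assuming accounting is already set up:

    # slurm.conf: make the limits actually enforced
    AccountingStorageEnforce=associations,limits,qos

    # cap the memory a single job may request via the QOS the partition uses
    # (value in MB, i.e. the same 64 GB as MaxMemPerNode=65536)
    sacctmgr modify qos normal set MaxTRESPerJob=mem=65536

With such a limit in place, --mem=0 still means "all memory on the node", but the resulting request should then exceed the QOS limit and be held or rejected rather than getting the full 500 GB.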
And anyway, what is the use case for preventing someone from using all the
memory? In our case, if someone really needs all the memory, they should be able
to have it.
However, I do have a chronic problem with users requesting too much
memory. My approach has been to try to get people to use 'seff' to see
what resources their jobs in fact need. In addition, each month we
generate a graphical summary of 'seff' data for each user, like the one
shown here
https://www.fu-berlin.de/en/sites/high-performance-computing/Dokumentation/Statistik
and automatically send an email to those with a large percentage of
resource-inefficient jobs, telling them to look at their graphs and
correct their resource requirements for future jobs.
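(In case 'seff' is new to anyone: it ships in Slurm's contribs and just takes the ID of a completed job, e.g.

    seff 1234567

and prints the CPU and memory efficiency of that job, i.e. how much of the requested memory was actually used.)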
Cheers,
Loris
>> All the best
>> Moshe
>>
>> ________________________________
>> From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Felho, Sandor <Sandor.Felho at transunion.com>
>> Sent: Wednesday, December 7, 2022 7:03 PM
>> To: slurm-users at lists.schedmd.com
>> Subject: [slurm-users] srun --mem issue
>>
>> TransUnion is running a ten-node site using Slurm with multiple queues. We have an issue with the --mem parameter. There is one user who has read the Slurm manual and found
>> --mem=0. This gives the maximum memory on the node (500 GiB) to that single job. How can I block a --mem=0 request?
>>
>> We are running:
>>
>> * OS: RHEL 7
>> * cgroups version 1
>> * Slurm: 19.05
>>
>> Thank you,
>>
>> Sandor Felho
>>
>> Sr Consultant, Data Science & Analytics
>>
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin