[slurm-users] srun --mem issue
Loris Bennett
loris.bennett at fu-berlin.de
Thu Dec 8 10:28:13 UTC 2022
Hi Moshe,
Moshe Mergy <moshe.mergy at weizmann.ac.il> writes:
> Hi Loris
>
> indeed https://slurm.schedmd.com/resource_limits.html explains the possibilities of limitations
>
> At present, I do not limit memory for specific users, just a global limit in slurm.conf:
>
> MaxMemPerNode=65536 (for a 64 GB limit)
>
> But... anyway, with my Slurm version (20.02), any user can obtain MORE than 64 GB of memory by using the "--mem=0" option!
>
> So I had to filter this in job_submit.lua
We don't use MaxMemPerNode but define RealMemory for groups of nodes
which have the same amount of RAM. We share the nodes and use
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
So a job can't start on a node if it requests more memory than is
available, i.e. more than RealMemory minus the memory already committed
to other jobs, even if --mem=0 is specified (I guess).
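A minimal sketch of the relevant slurm.conf lines (the node names, CPU
counts and memory sizes here are illustrative, not our actual values;
RealMemory is in MB):

    # One line per group of nodes with identical hardware
    NodeName=node[001-010] CPUs=32 RealMemory=64000
    NodeName=node[011-012] CPUs=64 RealMemory=512000
    # Share nodes; treat cores and memory as consumable resources
    SelectType=select/cons_res
    SelectTypeParameters=CR_Core_Memory

With CR_Core_Memory the scheduler books memory per job, so even a
--mem=0 job can at most be allocated the node's RealMemory, less
whatever is already committed to other jobs.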
Cheers,
Loris
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Loris Bennett <loris.bennett at fu-berlin.de>
> Sent: Thursday, December 8, 2022 10:57:56 AM
> To: Slurm User Community List
> Subject: Re: [slurm-users] srun --mem issue
>
> Loris Bennett <loris.bennett at fu-berlin.de> writes:
>
>> Moshe Mergy <moshe.mergy at weizmann.ac.il> writes:
>>
>>> Hi Sandor
>>>
>>> I personally block "--mem=0" requests in job_submit.lua (Slurm 20.02):
>>>
>>> -- reject requests for unlimited memory (--mem=0 or --mem-per-cpu=0)
>>> if (job_desc.min_mem_per_node == 0 or job_desc.min_mem_per_cpu == 0) then
>>>     slurm.log_info("%s: ERROR: unlimited memory requested", log_prefix)
>>>     slurm.log_info("%s: ERROR: job %s from user %s rejected because of an invalid (unlimited) memory request.", log_prefix, job_desc.name, job_desc.user_name)
>>>     slurm.log_user("Job rejected because of an invalid memory request.")
>>>     return slurm.ERROR
>>> end
>>
>> What happens if somebody explicitly requests all the memory, so in
>> Sandor's case --mem=500G?
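>>
>> (A hypothetical extra check, reusing the same job_desc fields; the
>> 64000 MB cap is an assumed value for a 64 GB node, not from Moshe's
>> actual filter:)
>>
>> local max_mem_mb = 64000  -- assumed per-node cap, in MB
>> -- also reject explicit requests above the cap (--mem is in MB)
>> if (job_desc.min_mem_per_node ~= nil and
>>     job_desc.min_mem_per_node > max_mem_mb) then
>>     slurm.log_user("Job rejected: more than %d MB per node requested.", max_mem_mb)
>>     return slurm.ERROR
>> end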
>>
>>> Maybe there is a better or nicer solution...
>
> Can't you just use account and QOS limits:
>
> https://slurm.schedmd.com/resource_limits.html
>
> ?
>
> And anyway, what is the use-case for preventing someone from using all
> the memory? In our case, if someone really needs all the memory, they
> should be able to have it.
>
> However, I do have a chronic problem with users requesting too much
> memory. My approach has been to try to get people to use 'seff' to see
> what resources their jobs in fact need. In addition, each month we
> generate a graphical summary of 'seff' data for each user, like the one
> shown here
>
> https://www.fu-berlin.de/en/sites/high-performance-computing/Dokumentation/Statistik
>
> and automatically send an email to those with a large percentage of
> resource-inefficient jobs telling them to look at their graphs and
> correct their resource requirements for future jobs.
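>
> For example, after a job has finished (the job ID here is made up):
>
>     seff 1234567
>
> prints a short report which includes the job's CPU efficiency and
> memory efficiency relative to the resources requested.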
>
> Cheers,
>
> Loris
>
>>> All the best
>>> Moshe
>>>
>>>
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Felho, Sandor <Sandor.Felho at transunion.com>
>>> Sent: Wednesday, December 7, 2022 7:03 PM
>>> To: slurm-users at lists.schedmd.com
>>> Subject: [slurm-users] srun --mem issue
>>>
>>> TransUnion is running a ten-node site using Slurm with multiple queues. We have an issue with the --mem parameter. There is one user who has read the Slurm manual and found
>>> --mem=0, which gives a single job the maximum memory on the node (500 GiB). How can I block a --mem=0 request?
>>>
>>> We are running:
>>>
>>> * OS: RHEL 7
>>> * cgroups: version 1
>>> * Slurm: 19.05
>>>
>>> Thank you,
>>>
>>> Sandor Felho
>>>
>>> Sr Consultant, Data Science & Analytics
>>>
--
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin