[slurm-users] srun --mem issue

Loris Bennett loris.bennett at fu-berlin.de
Thu Dec 8 10:28:13 UTC 2022


Hi Moshe,

Moshe Mergy <moshe.mergy at weizmann.ac.il> writes:

> Hi Loris
>
> Indeed, https://slurm.schedmd.com/resource_limits.html explains the available limits.
>
> At present, I do not limit memory for specific users; I only set a global limit in slurm.conf:
>
>   MaxMemPerNode=65536 (for a 64 GB limit)
>
> But anyway, with my Slurm version (20.02), any user can obtain MORE than 64 GB of memory by using the "--mem=0" option!
>
> So I had to filter this in job_submit.lua.

We don't use MaxMemPerNode but define RealMemory for groups of nodes
which have the same amount of RAM.  We share the nodes and use

  SelectType=select/cons_res
  SelectTypeParameters=CR_Core_Memory

So a job can't start on a node if it requests more memory than
available, i.e. more than RealMemory minus memory already committed to
other jobs, even if --mem=0 is specified (I guess).
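
For illustration, a minimal sketch of such a setup might look like the
following.  The node names, CPU counts and RealMemory values are
hypothetical and are not our actual configuration; only the two Select*
lines are taken from above.

```
# Hypothetical slurm.conf fragment: node names, CPU counts and memory
# sizes are made up for illustration.  RealMemory is given in megabytes.
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory

# One NodeName line per group of nodes with the same amount of RAM
NodeName=node[001-008] CPUs=32 RealMemory=192000
NodeName=node[009-010] CPUs=32 RealMemory=512000
```

With CR_Core_Memory, both cores and memory are treated as consumable
resources, so the memory allocated to each job on a node is tracked
against that node's RealMemory.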

Cheers,

Loris

> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Loris Bennett <loris.bennett at fu-berlin.de>
> Sent: Thursday, December 8, 2022 10:57:56 AM
> To: Slurm User Community List
> Subject: Re: [slurm-users] srun --mem issue 
>  
> Loris Bennett <loris.bennett at fu-berlin.de> writes:
>
>> Moshe Mergy <moshe.mergy at weizmann.ac.il> writes:
>>
>>> Hi Sandor
>>>
>>> I personally block "--mem=0" requests in file job_submit.lua (slurm 20.02):
>>>
>>>   -- Reject jobs that request "unlimited" memory (--mem=0 or --mem-per-cpu=0).
>>>   -- log_prefix is assumed to be defined earlier in job_submit.lua.
>>>   if (job_desc.min_mem_per_node == 0 or job_desc.min_mem_per_cpu == 0) then
>>>       slurm.log_info("%s: ERROR: unlimited memory requested", log_prefix)
>>>       slurm.log_info("%s: ERROR: job %s from user %s rejected because of an invalid (unlimited) memory request.", log_prefix, job_desc.name, job_desc.user_name)
>>>       slurm.log_user("Job rejected because of an invalid memory request.")
>>>       return slurm.ERROR
>>>   end
>>
>> What happens if somebody explicitly requests all the memory, so in
>> Sandor's case --mem=500G ?
>>
>>> Maybe there is a better or nicer solution...
>
> Can't you just use account and QOS limits:
>
>   https://slurm.schedmd.com/resource_limits.html
>
> ?
>
> And anyway, what is the use-case for preventing someone using all the
> memory? In our case, if someone really needs all the memory, they should be able
> to have it. 
>
> However, I do have a chronic problem with users requesting too much
> memory. My approach has been to try to get people to use 'seff' to see
> what resources their jobs in fact need.  In addition each month we
> generate a graphical summary of 'seff' data for each user, like the one
> shown here
>
>   https://www.fu-berlin.de/en/sites/high-performance-computing/Dokumentation/Statistik
>
> and automatically send an email to those with a large percentage of
> resource-inefficient jobs telling them to look at their graphs and
> correct their resource requirements for future jobs.
>
> Cheers,
>
> Loris
>
>>> All the best
>>> Moshe
>>>
>>>
>>> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Felho, Sandor <Sandor.Felho at transunion.com>
>>> Sent: Wednesday, December 7, 2022 7:03 PM
>>> To: slurm-users at lists.schedmd.com
>>> Subject: [slurm-users] srun --mem issue 
>>>  
>>> TransUnion is running a ten-node site using Slurm with multiple queues. We have an issue with the --mem parameter. There is one user who has read the Slurm manual and found the
>>> --mem=0 option. This gives a single job the maximum memory on the node (500 GiB). How can I block a --mem=0 request?
>>>
>>> We are running:
>>>
>>> * OS: RHEL 7
>>> * cgroups version 1
>>> * slurm: 19.05
>>>
>>> Thank you,
>>>
>>> Sandor Felho 
>>>
>>> Sr Consultant, Data Science & Analytics 
>>>
-- 
Dr. Loris Bennett (Herr/Mr)
ZEDAT, Freie Universität Berlin


