[slurm-users] Checking memory requirements in job_submit.lua

Hendryk Bockelmann bockelmann at dkrz.de
Fri Jun 15 00:07:25 MDT 2018


Hi,

Based on information given in job_submit_lua.c, we decided not to use 
pn_min_memory any more. The comment in the source says:

/*
 * FIXME: Remove this in the future, lua can't handle 64bit
 * numbers!!!.  Use min_mem_per_node|cpu instead.
 */

Instead we check in job_submit.lua for something like

  if (job_desc.min_mem_per_node ~= nil) and
     (job_desc.min_mem_per_node == 0) then
    slurm.log_user("minimum real mem per node specified as %u",
                   job_desc.min_mem_per_node)
  end

For mem-per-cpu, things are more confusing. Somehow min_mem_per_cpu = 
2^63 = 0x8000000000000000 if sbatch/salloc does not set --mem-per-cpu, 
instead of being nil as expected! But one can still check for something 
like

  if (job_desc.min_mem_per_cpu == 0) then
    slurm.log_user("minimum real mem per CPU specified as %u",
                   job_desc.min_mem_per_cpu)
  end
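
To find out whether --mem-per-cpu was given at all, one can compare 
against that truncated sentinel instead of testing for nil. A rough 
sketch only; the 2^63 constant is just the value lua reports for an 
unset field, as described above, so please verify it against your Slurm 
version:

  -- anything at or above 2^63 is treated as "--mem-per-cpu was not set",
  -- since that is the truncated sentinel lua reports for unset values
  local UNSET = 2^63   -- 0x8000000000000000

  local function mem_per_cpu_requested(job_desc)
    return job_desc.min_mem_per_cpu ~= nil
       and job_desc.min_mem_per_cpu > 0
       and job_desc.min_mem_per_cpu < UNSET
  end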

Maybe this helps a bit.

CU,
Hendryk

On 14.06.2018 19:38, Prentice Bisbal wrote:
> 
> On 06/13/2018 01:59 PM, Prentice Bisbal wrote:
>> In my environment, we have several partitions that are 'general 
>> access', with each partition providing different hardware resources 
>> (IB, large mem, etc). Then there are other partitions that are for 
>> specific departments/projects. Most of this configuration is 
>> historical, and I can't just rearrange the partition layout, etc, 
>> which would allow Slurm to apply its own logic to redirect jobs to 
>> the appropriate nodes.
>>
>> For the general access partitions, I've decided to apply some of this 
>> logic in my job_submit.lua script. This logic would look at some of 
>> the job specifications and change the QOS/Partition for the job as 
>> appropriate. One thing I'm trying to do is have large memory jobs be 
>> assigned to my large memory partition, which is named mque for 
>> historical reasons.
>>
>> To do this, I have added the following logic to my job_submit.lua script:
>>
>> if job_desc.pn_min_memory > 65536 then
>>     slurm.user_msg("NOTICE: Partition switched to mque due to memory requirements.")
>>     job_desc.partition = 'mque'
>>     job_desc.qos = 'mque'
>>     return slurm.SUCCESS
>> end
>>
>> This works when --mem is specified, but doesn't seem to work when 
>> --mem-per-cpu is specified. What is the best way to check this when 
>> --mem-per-cpu is specified instead? Logically, one would have to 
>> calculate
>>
>> mem per node = ntasks_per_node * ( ntasks_per_core / min_mem_per_cpu )
>>
>> Is that correct? If so, are there any flaws in the logic/variable names 
>> above? Also, is this quantity automatically calculated in Slurm by a 
>> variable that is accessible by job_submit.lua at this point, or do I 
>> need to calculate this myself?
>>
>>
> 
> I've given up on calculating mem per node when --mem-per-cpu is 
> specified. I was hoping to do this to protect my users from themselves, 
> but the more I think about this, the more this looks like a fool's errand.
> 
> Prentice
> 
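
In case someone still wants to attempt the conversion quoted above, 
here is a rough sketch of the arithmetic. It assumes the job explicitly 
sets --ntasks-per-node (and possibly --cpus-per-task); the field names 
cpus_per_task and ntasks_per_node, and the sentinel range checks, are 
assumptions about what job_desc exposes in lua, so verify them against 
your Slurm version. Unset fields show up as large sentinel values 
rather than nil, hence the bounds tests.

  -- rough sketch: estimate requested memory per node (in MB) from
  -- --mem-per-cpu; returns nil when no estimate can be made
  local function estimated_mem_per_node(job_desc)
    local per_cpu = job_desc.min_mem_per_cpu
    local cpus    = job_desc.cpus_per_task
    local tasks   = job_desc.ntasks_per_node
    if per_cpu == nil or per_cpu <= 0 or per_cpu >= 2^63 then
      return nil                  -- --mem-per-cpu was not given
    end
    if cpus == nil or cpus <= 0 or cpus >= 0xfffe then
      cpus = 1                    -- assume one CPU per task if unspecified
    end
    if tasks == nil or tasks <= 0 or tasks >= 0xfffe then
      return nil                  -- cannot estimate without tasks per node
    end
    return per_cpu * cpus * tasks
  end

  -- possible use, mirroring the check quoted above:
  --   local est = estimated_mem_per_node(job_desc)
  --   if est ~= nil and est > 65536 then
  --     job_desc.partition = 'mque'
  --     job_desc.qos = 'mque'
  --   end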
