[slurm-users] EXTERNAL: Re: Memory per CPU

Luecht, Jeff A jeff.luecht at pnc.com
Tue Sep 29 16:13:49 UTC 2020


Here are the particulars you asked for.

The following is the pertinent information for our cluster and the job run. Note: server names, IP addresses, and user IDs have been anonymized.

slurm.conf
======================================================
TaskPlugin=task/affinity

# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory

# Memory Management
DefMemPerCPU=16384
MaxMemPerCPU=16384


NodeName=linuxnode1 NodeAddr=99.999.999.999 CPUs=4 RealMemory=49152 State=UNKNOWN
NodeName=linuxnode2 NodeAddr=99.999.999.999 CPUs=4 RealMemory=49152 State=UNKNOWN
PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
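
One thing worth checking, since CR_Core_Memory allocates whole cores: if CPUs=4 above is really 2 cores with 2 hardware threads each, a single-core allocation will be reported as 2 CPUs. A quick way to verify the detected topology (standard Slurm commands; the node name is ours):

# Print the hardware configuration exactly as slurmd detects it (run on the node)
slurmd -C

# Ask the controller what it currently records for the node
scontrol show node linuxnode1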


Job SBATCH file
========================================================
#!/bin/bash
#SBATCH --job-name=HadoopTest               # Job name
#SBATCH --mail-type=ALL                     # Mail events (NONE, BEGIN, END, FAIL, ALL)
#SBATCH --mail-user=**********              # Where to send mail
#SBATCH --mem=16gb                          # Job memory request
#SBATCH --time=08:00:00                     # Time limit hrs:min:sec
#SBATCH --output=logs/slurm_test_%j.log     # Standard output and error log
pwd; hostname; date


echo "Running sbatch-HadoopTest script"

kinit ************************
cd /projects
python HiveValidation.py
python ImpalaValidation.py
python SparkTest.py

date
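
If the goal is strict 1 CPU/16GB chunks, an explicit request (untested here, offered as a sketch) may be more predictable than asking for memory alone:

#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1        # request exactly one CPU
#SBATCH --mem-per-cpu=16384      # 16 GB per CPU, matching DefMemPerCPU (MB)

Whether this still rounds up to a full core (2 CPUs) depends on the node's ThreadsPerCore.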


scontrol show job output
=======================================================================================
JobId=334 JobName=HadoopTest
   UserId=********** GroupId=********** MCS_label=N/A
   Priority=4294901604 Nice=0 Account=(null) QOS=(null)
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=00:00:06 TimeLimit=08:00:00 TimeMin=N/A
   SubmitTime=2020-09-29T10:40:09 EligibleTime=2020-09-29T10:40:09
   AccrueTime=2020-09-29T10:40:09
   StartTime=2020-09-29T10:40:10 EndTime=2020-09-29T18:40:10 Deadline=N/A
   SuspendTime=None SecsPreSuspend=0 LastSchedEval=2020-09-29T10:40:10
   Partition=debug AllocNode:Sid=lpae138a:41279
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=linuxnode2
   BatchHost=linuxnode2
   NumNodes=1 NumCPUs=2 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=2,mem=16G,node=1,billing=2
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=2 MinMemoryNode=16G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/projects/sbatch-HadoopTest.sh
   WorkDir=/projects
   StdErr=/projects/logs/slurm_test_334.log
   StdIn=/dev/null
   StdOut=/projects/logs/slurm_test_334.log
   Power=
   MailUser=********** MailType=BEGIN,END,FAIL,REQUEUE,STAGE_OUT
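
The NumCPUs=2 and TRES=cpu=2 fields above show the allocation in question. If accounting is enabled on the cluster, the same numbers can be cross-checked after the fact with:

sacct -j 334 --format=JobID,AllocCPUS,ReqMem,NNodes,State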

-----Original Message-----
From: slurm-users [mailto:slurm-users-bounces at lists.schedmd.com] On Behalf Of Michael Di Domenico
Sent: Tuesday, September 29, 2020 10:20 AM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: EXTERNAL: Re: [slurm-users] Memory per CPU

What leads you to believe that you're getting 2 CPUs instead of 1?
'scontrol show job <id>' would be a helpful first step.

On Tue, Sep 29, 2020 at 9:56 AM Luecht, Jeff A <jeff.luecht at pnc.com> wrote:
>
> I am working on my first-ever Slurm cluster build, for use as a resource manager in a JupyterHub development environment. I have configured the cluster with SelectType=select/cons_res and DefMemPerCPU and MaxMemPerCPU of 16GB. The idea is to essentially provide for jobs that run in 1 CPU/16GB chunks. This is a starting point for us.
>
> What I am seeing is that when users submit jobs and ask for memory only (in this case, 16GB), Slurm actually allocates 2 CPUs, not the 1 I would expect. Is my understanding of how this particular configuration works incorrect?
>



