[slurm-users] require info on merging diff core count nodes under single queue or partition

Loris Bennett loris.bennett at fu-berlin.de
Tue May 19 05:49:16 UTC 2020


Sudeep Narayan Banerjee <snbanerjee at iitgn.ac.in> writes:

> Dear Loris: Many thanks for your response. 
>
> I did change the IDLE state to UNKNOWN in the NodeName
> configuration, then reloaded slurmctld, after which 2 GPU nodes (gpu3
> and gpu4) went into drain mode. I have since manually updated their
> state back to IDLE.

That shouldn't be necessary.  At some point the slurmds on the nodes
should contact the slurmctld and inform it about their actual status.
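
If a node nevertheless stays stuck in a drained state, the usual way to
return it to service is scontrol rather than editing the configuration.
A minimal sketch, using the node names from your mail:

  scontrol update NodeName=gpu[3-4] State=RESUME

RESUME clears the drain flag and lets the nodes go back to IDLE once
their slurmds have registered.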

> But how do I change the CoresPerSocket and ThreadsPerCore in the
> NodeName parameter?

Why do you need to change them if they are correct?  What is the problem
you are seeing?

Whatever the problem is, what is probably also incorrect is that you are
overspecifying the number of cores/procs:

  NodeName=node[11-22] Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 Procs=32 State=IDLE
  NodeName=node[23-24] Sockets=2 CoresPerSocket=20 ThreadsPerCore=1 Procs=40 State=IDLE

If you look at

  man slurm.conf

you will find the following for 'Procs', or rather 'CPUs':

  CPUs   Number of logical processors on the node (e.g. "2").  CPUs and
         Boards are mutually exclusive.  It can be set to the total number
         of sockets, cores or threads.  This can be useful when you want to
         schedule only the cores on a hyper-threaded node.  If CPUs is
         omitted, it will be set equal to the product of Sockets,
         CoresPerSocket, and ThreadsPerCore.  The default value is 1.

So you should probably omit the 'Procs' specification.
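
Together with the State=UNKNOWN advice from my previous mail, a sketch
of how those lines might then look (hardware values taken from your
slurm.conf):

  NodeName=node[11-22] Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 State=UNKNOWN
  NodeName=node[23-24] Sockets=2 CoresPerSocket=20 ThreadsPerCore=1 State=UNKNOWN

Slurm will then derive CPUs as 2 x 16 x 1 = 32 and 2 x 20 x 1 = 40
itself.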

Cheers,

Loris

> Thanks & Regards,
> Sudeep Narayan Banerjee
> On 18/05/20 7:29 pm, Loris Bennett wrote:
>
>  Hi Sudeep,
>
> I am not sure if this is the cause of the problem, but in your slurm.conf
> you have:
>
>   # COMPUTE NODES
>
>   NodeName=node[1-10] Sockets=2 CoresPerSocket=8 ThreadsPerCore=1 Procs=16  RealMemory=60000  State=IDLE
>   NodeName=gpu[1-2] CPUs=16 Gres=gpu:2 State=IDLE
>
>   NodeName=node[11-22] Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 Procs=32 State=IDLE
>   NodeName=node[23-24] Sockets=2 CoresPerSocket=20 ThreadsPerCore=1 Procs=40 State=IDLE
>   NodeName=gpu[3-4] CPUs=32 Gres=gpu:1 State=IDLE
>
> But if you read
>
>   man slurm.conf
>
> you will find the following under the description of the parameter
> "State" for nodes:
>
>   "IDLE" should not be specified in the node configuration, but set the
>   node state to "UNKNOWN" instead.
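>
> Applied to the first of your node lines, that would look something
> like:
>
>   NodeName=node[1-10] Sockets=2 CoresPerSocket=8 ThreadsPerCore=1 Procs=16 RealMemory=60000 State=UNKNOWN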
>
> Cheers,
>
> Loris
>
>
> Sudeep Narayan Banerjee <snbanerjee at iitgn.ac.in> writes:
>
>  Dear Loris: I am very sorry to have addressed you as Support; it has
> become a bad habit of mine, which I will change. Sincere apologies!
>
> Yes, I checked this when adding the hybrid hardware architecture, but
> when slurmctld runs it reports a mismatch in the core count; the
> existing 32-core nodes go into Down/Drng state, while the new 40-core
> nodes are set to IDLE.
>
> Any help/guide to some link will be highly appreciated!
>
> Thanks & Regards,
> Sudeep Narayan Banerjee
> System Analyst | Scientist B
> Information System Technology Facility
> Academic Block 5 | Room 110
> Indian Institute of Technology Gandhinagar
> Palaj, Gujarat 382355 INDIA
> On 18/05/20 6:30 pm, Loris Bennett wrote:
>
>  Dear Sudeep,
>
> Sudeep Narayan Banerjee <snbanerjee at iitgn.ac.in> writes:
>
>  Dear Support,
>
>
> This mailing list is not really the Slurm support list.  It is just the
> Slurm User Community List, so basically a bunch of people just like you.
>
>  node11-22 have 2 sockets of 16 cores each, and node23-24 have 2
> sockets of 20 cores each. In the slurm.conf file (attached), can we
> merge all the nodes 11-24 (which have different core counts) under a
> single queue or partition name?
>
>
> Yes, you can have a partition consisting of heterogeneous nodes.  Have
> you tried this?  Was there a problem?
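>
> A minimal sketch of such a partition line (the partition name here is
> just an example):
>
>   PartitionName=batch Nodes=node[11-24] MaxTime=INFINITE State=UP
>
> Jobs will then land on whichever of the 32- and 40-core nodes have
> free resources.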
>
> Cheers,
>
> Loris


