[slurm-users] require info on merging diff core count nodes under single queue or partition
Sudeep Narayan Banerjee
snbanerjee at iitgn.ac.in
Mon May 18 14:17:57 UTC 2020
Dear Loris: Many thanks for your response.
I changed the State from IDLE to UNKNOWN in the NodeName configuration and
reloaded *slurmctld*; after that, two GPU nodes (gpu3 and gpu4) went into the
drain state, which I have since manually set back to IDLE.
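For reference, this is roughly how I brought the drained nodes back (node
names as in our slurm.conf; the grep is only to look up the drain reason):

    scontrol show node gpu3 | grep -i reason        # check why the node was drained
    scontrol update NodeName=gpu[3-4] State=RESUME  # return the drained nodes to service
    sinfo -N -l                                     # confirm the node states afterwards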
But how do I change CoresPerSocket and ThreadsPerCore in the NodeName
parameter?
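Roughly what I have in mind, using the values already present in our
slurm.conf (the partition name "batch" and the single-partition layout are
only an illustration of what I am trying to achieve):

    NodeName=node[11-22] Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 Procs=32 State=UNKNOWN
    NodeName=node[23-24] Sockets=2 CoresPerSocket=20 ThreadsPerCore=1 Procs=40 State=UNKNOWN
    PartitionName=batch Nodes=node[11-24] Default=YES MaxTime=INFINITE State=UP

My understanding is that changed node definitions need slurmctld and the
slurmd on the affected nodes to be restarted, rather than just
"scontrol reconfigure", but please correct me if that is wrong.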
Thanks & Regards,
Sudeep Narayan Banerjee
On 18/05/20 7:29 pm, Loris Bennett wrote:
> Hi Sudeep,
>
> I am not sure if this is the cause of the problem but in your slurm.conf
> you have
>
> # COMPUTE NODES
>
> NodeName=node[1-10] Sockets=2 CoresPerSocket=8 ThreadsPerCore=1 Procs=16 RealMemory=60000 State=IDLE
> NodeName=gpu[1-2] CPUs=16 Gres=gpu:2 State=IDLE
>
> NodeName=node[11-22] Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 Procs=32 State=IDLE
> NodeName=node[23-24] Sockets=2 CoresPerSocket=20 ThreadsPerCore=1 Procs=40 State=IDLE
> NodeName=gpu[3-4] CPUs=32 Gres=gpu:1 State=IDLE
>
> But if you read
>
> man slurm.conf
>
> you will find the following under the description of the parameter
> "State" for nodes:
>
> "IDLE" should not be specified in the node configuration, but set the
> node state to "UNKNOWN" instead.
>
> Cheers,
>
> Loris
>
>
> Sudeep Narayan Banerjee <snbanerjee at iitgn.ac.in> writes:
>
>> Dear Loris: I am very sorry for addressing you as Support; it has become a
>> bad habit of mine, which I will change. Sincere apologies!
>>
>> Yes, I have tried adding the mixed hardware, but when slurmctld starts it
>> reports a core-count mismatch; the existing 32-core nodes go into Down/Drng
>> state and the new 40-core nodes are set to IDLE.
>>
>> Any help/guide to some link will be highly appreciated!
>>
>> Thanks & Regards,
>> Sudeep Narayan Banerjee
>> System Analyst | Scientist B
>> Information System Technology Facility
>> Academic Block 5 | Room 110
>> Indian Institute of Technology Gandhinagar
>> Palaj, Gujarat 382355 INDIA
>> On 18/05/20 6:30 pm, Loris Bennett wrote:
>>
>> Dear Sudeep,
>>
>> Sudeep Narayan Banerjee <snbanerjee at iitgn.ac.in> writes:
>>
>> Dear Support,
>>
>>
>> This mailing list is not really the Slurm support list. It is just the
>> Slurm User Community List, so basically a bunch of people just like you.
>>
>> node11-22 have 2 sockets x 16 cores each and node23-24 have 2 sockets x 20
>> cores each. In the slurm.conf file (attached), can we merge all the nodes
>> 11-24 (which have different core counts) under a single queue or partition
>> name?
>>
>>
>> Yes, you can have a partition consisting of heterogeneous nodes. Have
>> you tried this? Was there a problem?
>>
>> Cheers,
>>
>> Loris
>>