[slurm-users] [External] Re: Granular or dynamic control of partitions?

Tina Friedrich tina.friedrich at it.ox.ac.uk
Mon Aug 7 13:56:57 UTC 2023


Hi Mike,

I moved from Grid Engine to SLURM a couple of years ago & it took me a 
while to get my head around this :)

Yes - and you could also just edit slurm.conf and restart the 
controller. That will not affect running jobs. It's - both in my 
experience and from everything I've read - absolutely safe to restart 
any of the daemons (slurmd on the nodes, slurmctld, ...) while jobs 
are running; it shouldn't affect them.
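As a rough sketch of that edit-and-restart workflow - assuming the usual systemd unit names and the common default config path, which may differ on your install:

```shell
# Edit the partition definitions on the controller
# (path is the common default; yours may differ):
sudo vi /etc/slurm/slurm.conf

# Restart the controller; running jobs are not affected:
sudo systemctl restart slurmctld

# For many slurm.conf changes a full restart isn't even needed -
# a reconfigure makes the daemons re-read the file:
sudo scontrol reconfigure

# If the change matters on the compute nodes too, restart slurmd
# there as well (again, safe with jobs running):
sudo systemctl restart slurmd
```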

(These days I think of a quick change to slurm.conf & a restart of the 
controller daemon as the equivalent of a quick qconf command.)
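Feng's live-update approach quoted below can be sketched roughly like this, using the hypothetical node/partition names from Mike's example:

```shell
# Hypothetical setup: partition 'y' currently spans node[1-10] and we
# want to retire it, starting by dropping node1 from it on the fly.
scontrol update PartitionName=y Nodes=node[2-10]

# Jobs already running on node1 in partition 'y' keep running; jobs in
# partition 'x' can still start on node1 the whole time.
# Watch the remaining 'y' jobs on node1 drain away:
squeue -p y -w node1

# Once the live change is in place, mirror it in slurm.conf so a later
# controller restart doesn't resurrect the old definition, e.g.:
#   PartitionName=y Nodes=node[2-10] ...
```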

Tina

On 07/08/2023 10:29, Pacey, Mike wrote:
> 
> Hi Feng,
> 
> Thanks - that's what I was looking for, though for my version of SLURM (23.02.0) it looks like the syntax is "scontrol update partition=mypart". Good to know that SLURM can cope with on-the-fly changes without affecting jobs.
> 
> With the "live" config now differing from the static one, I guess best practice is to make the matching edit to slurm.conf's partition definitions as well?
> 
> Regards,
> Mike
> 
> -----Original Message-----
> From: slurm-users <slurm-users-bounces at lists.schedmd.com> On Behalf Of Feng Zhang
> Sent: Friday, August 4, 2023 7:36 PM
> To: Slurm User Community List <slurm-users at lists.schedmd.com>
> Subject: [External] Re: [slurm-users] Granular or dynamic control of partitions?
> 
> This email originated outside the University. Check before clicking links or attachments.
> 
> You can try command as:
> 
> scontrol update partition mypart  Nodes=node[1-90],ab,ac  #exclude the one you want to remove
> 
> "Changing the Nodes in a partition has no effect upon jobs that have already begun execution."
> 
> 
> Best,
> 
> Feng
> 
> On Fri, Aug 4, 2023 at 10:47 AM Pacey, Mike <m.pacey at lancaster.ac.uk> wrote:
>>
>> Hi folks,
>>
>>
>>
>> We’re currently moving our cluster from Grid Engine to SLURM, and I’m having trouble finding the best way to perform a specific bit of partition maintenance. I’m not sure if I’m simply missing something in the manual or if I need to be thinking in a more SLURM-centric way. My basic question: is it possible to ‘disable’ specific partition/node combinations rather than whole nodes or whole partitions? Here’s an example of the sort of thing I’m looking to do:
>>
>>
>>
>> I have node ‘node1’ with two partitions ‘x’ and ‘y’. I’d like to remove partition ‘y’, but there are currently user jobs in that partition on that node. With Grid Engine, I could disable specific queue instances (ie, I could just run “qmod -d y@node1” to disable queue/partition y on node1), wait for the jobs to complete, and then remove the partition. That would be the least disruptive option because:
>>
>> - Queue/partition ‘y’ on other nodes would be unaffected
>> - User jobs for queue/partition ‘x’ would still be able to launch on
>> node1 the whole time
>>
>>
>>
>> I can’t seem to find a functional equivalent of this in SLURM:
>>
>> - I can set the whole node to Drain
>> - I can set the whole partition to Inactive
>>
>>
>>
>> Is there some way to ‘disable’ partition y just on node1?
>>
>>
>>
>> Regards,
>>
>> Mike
> 


