[slurm-users] changes in slurm.

Brian Andrus toomuchit at gmail.com
Thu Jul 9 14:26:12 UTC 2020


Navin,

1. You will need to restart slurmctld when you make changes to the 
physical definition of a node. This can be done without affecting 
running jobs.
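
For illustration, here is a sketch of your node line with RealMemory 
added. RealMemory is given in megabytes. The value 64000 is only a 
placeholder, not your nodes' actual memory; running 'slurmd -C' on a 
compute node prints the value that slurmd detects.

```
# slurm.conf (sketch): 64000 is a placeholder, replace with the real value in MB
NodeName=Node[1-12] NodeHostname=deda1x[1450-1461] NodeAddr=Node[1-12] Sockets=2 CoresPerSocket=10 RealMemory=64000 State=UNKNOWN
```

Keep slurm.conf identical on the controller and all compute nodes. How 
you restart slurmctld depends on your setup; on a systemd host it would 
typically be 'systemctl restart slurmctld', but that is an assumption 
about your environment.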

2. You can have a node in more than one partition. That will not hurt 
anything. Jobs are allocated to nodes, not partitions; the partition is 
used to determine which node(s) a job may use and to filter/order jobs. 
You should add the node to the new partition, but also leave it in the 
'test' partition. If you are looking to remove the 'test' partition, set 
it to down and, once all the running jobs in it have finished, remove it.
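
As a rough sketch of that layout, both partitions can simply list the 
same nodes. The partition names are the ones from your message; the 
rest of each line is assumed for illustration, and your real partition 
definitions will likely carry more options.

```
# slurm.conf (sketch): the same nodes appear in both partitions
PartitionName=test   Nodes=Node[1-12] State=UP
PartitionName=normal Nodes=Node[1-12] State=UP
```

To retire 'test' later, something like 
'scontrol update PartitionName=test State=DOWN' keeps new jobs from 
starting in it while running jobs continue. Once 'squeue -p test' shows 
no running jobs, the PartitionName=test line can be removed.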

Brian Andrus

On 7/8/2020 10:57 PM, navin srivastava wrote:
> Hi Team,
>
> I have two small queries. Because I lack a testing environment, I am 
> unable to test these scenarios; I am working on setting one up.
>
> 1. In my environment I am unable to pass the #SBATCH --mem=2GB option.
> I found the reason is that there is no RealMemory entry in the node 
> definition in the Slurm configuration.
>
> NodeName=Node[1-12] NodeHostname=deda1x[1450-1461] NodeAddr=Node[1-12] 
> Sockets=2 CoresPerSocket=10 State=UNKNOWN
>
> If I add RealMemory, it should be picked up. So my question is: is it 
> possible to add RealMemory to the definition at any time while jobs 
> are in progress, then run scontrol reconfigure and reload the daemon 
> on the client nodes? Or do we need to take a downtime (which I don't 
> think we do)?
>
> 2. Also, I would like to know what will happen if some jobs are running 
> in a partition (say test) and I move the associated node to some other 
> partition (say normal) without draining the node, or if I suspend the 
> job, change the node's partition, and then resume the job. I am not 
> deleting the partition here.
>
> Regards
> Navin.


