[slurm-users] changes in slurm.

Stephan Roth stephan.roth at ee.ethz.ch
Fri Jul 10 15:00:47 UTC 2020


It's recommended to round RealMemory down to the next lower gigabyte 
value to prevent nodes from entering a drain state after rebooting with 
a BIOS or kernel update.

Source: https://slurm.schedmd.com/SLUG17/FieldNotes.pdf, "Node 
configuration"

Stephan

On 10.07.20 13:46, Sarlo, Jeffrey S wrote:
> If you run  slurmd -C  on the compute node, it should tell you what 
> Slurm thinks the RealMemory value is.
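> 
> For example (the memory and uptime values here are illustrative, not 
> taken from your node), the output looks something like:
> 
> # slurmd -C
> NodeName=node01 CPUs=20 Boards=1 SocketsPerBoard=2 CoresPerSocket=10 ThreadsPerCore=1 RealMemory=257336
> UpTime=12-03:45:10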
> 
> Jeff
> 
> ------------------------------------------------------------------------
> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of 
> navin srivastava <navin.altair at gmail.com>
> *Sent:* Friday, July 10, 2020 6:24 AM
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* Re: [slurm-users] changes in slurm.
> Thank you for the answers.
> 
> Should RealMemory be set based on the total memory value or the total 
> usable memory value?
> 
> I mean, if a node has 256 GB RAM, free -g reports only 251 GB:
> deda1x1591:~ # free -g
>               total       used       free     shared    buffers     cached
> Mem:           251         67        184          6          0         47
> 
> So should we add the value as 251*1024 MB or 256*1024 MB? Or is there 
> any Slurm command which will provide the value to add?
> 
> Regards
> Navin.
> 
> 
> 
> On Thu, Jul 9, 2020 at 8:01 PM Brian Andrus <toomuchit at gmail.com> wrote:
> 
>     Navin,
> 
>     1. You will need to restart slurmctld when you make changes to the
>     physical definition of a node. This can be done without affecting
>     running jobs.
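> 
>     For example, assuming the daemons are managed by systemd and the
>     updated slurm.conf is already in place on all nodes, something like:
> 
>     # on the controller
>     systemctl restart slurmctld
>     # on each compute node, so slurmd picks up the new node definition
>     systemctl restart slurmd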
> 
>     2. You can have a node in more than one partition; that will not hurt
>     anything. Jobs are allocated to nodes, not partitions; the partition is
>     only used to determine which node(s) to use and to filter/order jobs.
>     You should add the node to the new partition, but also leave it in the
>     'test' partition. If you are looking to remove the 'test' partition,
>     set it to down and, once all the running jobs in it have finished,
>     remove it.
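> 
>     For example, something like:
> 
>     # stop new jobs from being scheduled in the 'test' partition
>     scontrol update PartitionName=test State=DOWN
>     # check whether any jobs are still running in it
>     squeue -p test
>     # once the partition is empty, remove it from slurm.conf and
>     # restart slurmctld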
> 
>     Brian Andrus
> 
>     On 7/8/2020 10:57 PM, navin srivastava wrote:
>      > Hi Team,
>      >
>      > I have two small queries. Because of the lack of a testing
>      > environment I am unable to test these scenarios; I am working on
>      > setting up a test environment.
>      >
>      > 1. In my environment I am unable to pass the #SBATCH --mem=2GB
>      > option. I found the reason is that there is no RealMemory entry in
>      > the node definition in slurm.conf.
>      >
>      > NodeName=Node[1-12] NodeHostname=deda1x[1450-1461]
>      > NodeAddr=Node[1-12] Sockets=2 CoresPerSocket=10 State=UNKNOWN
>      >
>      > If I add RealMemory it should be able to pick it up. So my query
>      > here is: is it possible to add RealMemory to the definition at any
>      > time while jobs are in progress, then execute scontrol reconfigure
>      > and reload the daemon on the client nodes? Or do we need to take a
>      > downtime? (Which I don't think so.)
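>      >
>      > For example (the value here is only illustrative), the line would
>      > then read something like:
>      >
>      > NodeName=Node[1-12] NodeHostname=deda1x[1450-1461] NodeAddr=Node[1-12] Sockets=2 CoresPerSocket=10 RealMemory=257024 State=UNKNOWN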
>      >
>      > 2. Also, I would like to know what will happen if some jobs are
>      > running in a partition (say test) and I move the associated node to
>      > some other partition (say normal) without draining the node. Or if I
>      > suspend the job, change the node's partition and then resume the
>      > job. I am not deleting the partition here.
>      >
>      > Regards
>      > Navin.
>      >
> 


-------------------------------------------------------------------
Stephan Roth | ISG.EE D-ITET ETH Zurich | http://www.isg.ee.ethz.ch
+4144 632 30 59  |  ETF D 104  |  Sternwartstrasse 7  | 8092 Zurich
-------------------------------------------------------------------


