[slurm-users] changes in slurm.
Stephan Roth
stephan.roth at ee.ethz.ch
Fri Jul 10 15:00:47 UTC 2020
It's recommended to round RealMemory down to the next lower gigabyte
value to prevent nodes from entering a drain state after rebooting
following a BIOS or kernel update.
Source: https://slurm.schedmd.com/SLUG17/FieldNotes.pdf, "Node
configuration"
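For example, on a 256 GB node where slurmd -C might report
RealMemory=257630 (an illustrative number), rounding down to whole
gigabytes gives:

   257630 MB / 1024 = 251 GB (rounded down)  =>  RealMemory = 251*1024 = 257024

The headroom absorbs small changes in usable memory after BIOS or
kernel updates.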
Stephan
On 10.07.20 13:46, Sarlo, Jeffrey S wrote:
> If you run slurmd -C on the compute node, it should tell you what
> Slurm thinks the RealMemory number is.
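> For example (this is the output format of slurmd -C; the numbers shown
> here are illustrative, run it on your own node):
>
>    deda1x1591:~ # slurmd -C
>    NodeName=deda1x1591 CPUs=20 Boards=1 SocketsPerBoard=2 CoresPerSocket=10 ThreadsPerCore=1 RealMemory=257630
>    UpTime=5-04:32:10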
>
> Jeff
>
> ------------------------------------------------------------------------
> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of
> navin srivastava <navin.altair at gmail.com>
> *Sent:* Friday, July 10, 2020 6:24 AM
> *To:* Slurm User Community List <slurm-users at lists.schedmd.com>
> *Subject:* Re: [slurm-users] changes in slurm.
> Thank you for the answers.
>
> Should RealMemory be based on the total memory value or on the total
> usable memory value?
>
> I mean, if a node has 256 GB of RAM, free -g reports only 251 GB:
> deda1x1591:~ # free -g
>              total       used       free     shared    buffers     cached
> Mem:           251         67        184          6          0         47
>
> So should we set the value to 251*1024 MB or to 256*1024 MB? Or is
> there a Slurm command that will give me the value to add?
>
> Regards
> Navin.
>
>
>
> On Thu, Jul 9, 2020 at 8:01 PM Brian Andrus <toomuchit at gmail.com
> <mailto:toomuchit at gmail.com>> wrote:
>
> Navin,
>
> 1. You will need to restart slurmctld when you make changes to the
> physical definition of a node. This can be done without affecting
> running jobs.
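> Something along these lines (assuming systemd units, which may be
> named differently on your installation):
>
>    # on the controller, after editing slurm.conf:
>    systemctl restart slurmctld
>    # tell all daemons to re-read the configuration:
>    scontrol reconfigure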
>
> 2. You can have a node in more than one partition. That will not hurt
> anything. Jobs are allocated to nodes, not partitions; the partition is
> used to determine which node(s) to use and to filter/order jobs. You
> should add the node to the new partition, but also leave it in the
> 'test' partition. If you are looking to remove the 'test' partition,
> set it to down and, once all the running jobs in it finish, then
> remove it.
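> In slurm.conf that could look like this (partition and node names are
> taken from your mails, other options omitted):
>
>    PartitionName=test Nodes=Node[1-12] State=UP
>    PartitionName=normal Nodes=Node[1-12] State=UP
>
> To stop new jobs from starting in 'test' before removing it:
>
>    scontrol update PartitionName=test State=DOWN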
>
> Brian Andrus
>
> On 7/8/2020 10:57 PM, navin srivastava wrote:
> > Hi Team,
> >
> > I have two small queries. Because I lack a testing environment I am
> > unable to test these scenarios myself; I am working on setting one up.
> >
> > 1. In my environment I am unable to pass the #SBATCH --mem=2GB option.
> > I found the reason is that there is no RealMemory entry in the node
> > definition in slurm.conf:
> >
> > NodeName=Node[1-12] NodeHostname=deda1x[1450-1461] NodeAddr=Node[1-12] Sockets=2 CoresPerSocket=10 State=UNKNOWN
> >
> > If I add RealMemory, it should be picked up. So my query here is: is
> > it possible to add RealMemory to the definition while jobs are in
> > progress, then run scontrol reconfigure and reload the daemon on the
> > client nodes? Or do we need to take downtime (which I don't think
> > we do)?
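> > For illustration, the changed line might look like this (the
> > RealMemory value is a placeholder; check slurmd -C on the node first):
> >
> >    NodeName=Node[1-12] NodeHostname=deda1x[1450-1461] NodeAddr=Node[1-12] Sockets=2 CoresPerSocket=10 RealMemory=257024 State=UNKNOWN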
> >
> > 2. Also, I would like to know what will happen if some jobs are
> > running in a partition (say 'test') and I move the associated node to
> > some other partition (say 'normal') without draining the node. Or
> > what if I suspend the jobs, change the node's partition, and then
> > resume the jobs? I am not deleting the partition here.
> >
> > Regards
> > Navin.
>
-------------------------------------------------------------------
Stephan Roth | ISG.EE D-ITET ETH Zurich | http://www.isg.ee.ethz.ch
+4144 632 30 59 | ETF D 104 | Sternwartstrasse 7 | 8092 Zurich
-------------------------------------------------------------------