[slurm-users] Seff error with Slurm-18.08.1

Miguel A. Sánchez miguelangel.sanchez at upf.edu
Thu Nov 8 03:06:27 MST 2018


Hi, and thanks for all your answers; sorry for the delay in replying.
Yesterday I installed Slurm-18.08.3 on the controller machine to check
whether the seff command works correctly with this latest release. The
behavior has improved, but I still receive an error message:


# /usr/local/slurm-18.08.3/bin/seff 1694112
*Use of uninitialized value $lmem in numeric lt (<) at
/usr/local/slurm-18.08.3/bin/seff line 130, <DATA> line 624.*
Job ID: 1694112
Cluster: XXXXX
User/Group: XXXXX
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 2
CPU Utilized: 01:39:33
CPU Efficiency: 4266.43% of 00:02:20 core-walltime
Job Wall-clock time: 00:01:10
Memory Utilized: 0.00 MB (estimated maximum)
Memory Efficiency: 0.00% of 3.91 GB (3.91 GB/node)
[root@hydra ~]#


Because of this problem, seff reports 0.00 MB as the memory utilized
for every job.


With slurm-17.11.1 it works fine:


# /usr/local/slurm-17.11.0/bin/seff 1694112
Job ID: 1694112
Cluster: XXXXX
User/Group: XXXXX
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 2
CPU Utilized: 01:39:33
CPU Efficiency: 4266.43% of 00:02:20 core-walltime
Job Wall-clock time: 00:01:10
Memory Utilized: 2.44 GB
Memory Efficiency: 62.57% of 3.91 GB
[root@hydra bin]#
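
Since the 17.11 seff still reports a non-zero figure for the same job, the memory data appears to be present in the accounting database, so one possible cross-check while seff is broken is to read MaxRSS from sacct directly. The following is only a minimal Python sketch of that idea, not how seff itself computes its estimate. The helper names (parse_rss, max_rss_bytes, query_sacct) are hypothetical, and it assumes sacct is on the PATH, that MaxRSS uses K/M/G/T suffixes as powers of 1024, and that a value without a suffix is in bytes.

```python
import subprocess

# Assumed multipliers for the unit suffixes on sacct memory fields (powers of 1024).
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_rss(field):
    """Convert a MaxRSS field such as '2048K' to bytes; an empty field counts as 0."""
    field = field.strip()
    if not field:
        return 0
    suffix = field[-1].upper()
    if suffix in _UNITS:
        return int(float(field[:-1]) * _UNITS[suffix])
    # Assumption: a value with no suffix is already expressed in bytes.
    return int(float(field))


def max_rss_bytes(sacct_text):
    """Return the largest MaxRSS over all lines of 'JobID|MaxRSS' output."""
    peak = 0
    for line in sacct_text.splitlines():
        parts = line.split("|")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        peak = max(peak, parse_rss(parts[1]))
    return peak


def query_sacct(jobid):
    """Run sacct for one job; needs a working Slurm client and accounting setup."""
    cmd = ["sacct", "-j", str(jobid), "--format=JobID,MaxRSS",
           "--parsable2", "--noheader"]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

On a machine with Slurm accounting configured, max_rss_bytes(query_sacct(1694112)) would give the peak per-step MaxRSS in bytes; the result may not match seff's figure exactly, since seff applies its own aggregation across tasks and nodes.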




Miguel A. Sánchez Gómez
System Administrator
Research Programme on Biomedical Informatics - GRIB (IMIM-UPF)

Barcelona Biomedical Research Park (office 4.80)
Doctor Aiguader 88 | 08003 Barcelona (Spain)
Phone: +34/ 93 316 0522 | Fax: +34/ 93 3160 550
e-mail: miguelangel.sanchez at upf.edu

On 11/06/2018 06:30 PM, Mike Cammilleri wrote:
>
> Thanks for this. We'll try the workaround script. It is not
> mission-critical but our users have gotten accustomed to seeing these
> metrics at the end of each run and it's nice to have. We are currently
> doing this in a test VM environment, so by the time we actually do the
> upgrade to the cluster perhaps the fix will be available then.
>
>
> Mike Cammilleri
>
> Systems Administrator
>
> Department of Statistics | UW-Madison
>
> 1300 University Ave | Room 1280
> 608-263-6673 | mikec at stat.wisc.edu
>
>
>
> ------------------------------------------------------------------------
> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf
> of Chris Samuel <chris at csamuel.org>
> *Sent:* Tuesday, November 6, 2018 5:03 AM
> *To:* slurm-users at lists.schedmd.com
> *Subject:* Re: [slurm-users] Seff error with Slurm-18.08.1
>  
> On 6/11/18 7:49 pm, Baker D.J. wrote:
>
> > The good news is that I am assured by SchedMD that the bug has been
> > fixed in v18.08.3.
>
> Looks like it's fixed in this commit.
>
> commit 3d85c8f9240542d9e6dfb727244e75e449430aac
> Author: Danny Auble <da at schedmd.com>
> Date:   Wed Oct 24 14:10:12 2018 -0600
>
>      Handle symbol resolution errors in the 18.08 slurmdbd.
>
>      Caused by b1ff43429f6426c when moving the slurmdbd agent internals.
>
>      Bug 5882.
>
>
> > Having said that we will probably live with this issue
> > rather than disrupt users with another upgrade so soon.
>
> An upgrade to 18.08.3 from 18.08.1 shouldn't be disruptive though,
> should it?  We just flip a symlink and the users see the new binaries,
> libraries, etc immediately, we can then restart daemons as and when we
> need to (in the right order of course, slurmdbd, slurmctld and then
> slurmd's).
>
> All the best,
> Chris
> -- 
>   Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
>
