[slurm-users] slurm 17.11.2: Socket timed out on send/recv operation

Alessandro Federico a.federico at cineca.it
Wed Jan 17 05:20:49 MST 2018


Hi Matthieu & John

This is the backtrace of slurmctld during the slowdown:

(gdb) bt
#0  0x00007fb0e8b1e69d in poll () from /lib64/libc.so.6
#1  0x00007fb0e8617bfa in sss_cli_make_request_nochecks () from /lib64/libnss_sss.so.2
#2  0x00007fb0e86185a3 in sss_nss_make_request () from /lib64/libnss_sss.so.2
#3  0x00007fb0e8619104 in _nss_sss_getpwnam_r () from /lib64/libnss_sss.so.2
#4  0x00007fb0e8aef07d in getpwnam_r@@GLIBC_2.2.5 () from /lib64/libc.so.6
#5  0x00007fb0e9360986 in _getpwnam_r (result=<optimized out>, bufsiz=<optimized out>, buf=<optimized out>, pwd=<optimized out>, name=<optimized out>) at uid.c:73
#6  uid_from_string (name=0x1820e41 "g2bottin", uidp=uidp at entry=0x7fff07f03a6c) at uid.c:111
#7  0x000000000043587d in get_group_members (group_name=0x10ac500 "g2") at groups.c:139
#8  0x000000000047525a in _get_groups_members (group_names=<optimized out>) at partition_mgr.c:2006
#9  0x0000000000475505 in _update_part_uid_access_list (x=0x7fb03401e650, arg=0x7fff07f13bf4) at partition_mgr.c:1930
#10 0x00007fb0e92ab675 in list_for_each (l=0x1763e50, f=f at entry=0x4754d8 <_update_part_uid_access_list>, arg=arg at entry=0x7fff07f13bf4) at list.c:420
#11 0x000000000047911a in load_part_uid_allow_list (force=1) at partition_mgr.c:1971
#12 0x0000000000428e5c in _slurmctld_background (no_data=0x0) at controller.c:1911
#13 main (argc=<optimized out>, argv=<optimized out>) at controller.c:601
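
(For reference, the backtrace was taken with gdb attached to the running slurmctld; non-interactively something like the following should give the same information, example invocation only.)

gdb -p $(pidof slurmctld) -batch -ex bt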

As Matthieu said, it seems to be something related to the SSS daemon (sssd).
However, we don't notice any slowdown due to sssd elsewhere in our environment.
As I told you before, we are just testing SLURM on a small 100-node cluster before going into production with about 6000 nodes next Wednesday.
At present the other nodes are managed by PBSPro, and the two PBS servers run on the same nodes as the SLURM controllers.
The PBS queues are also configured with user/group ACLs, and we have never noticed any similar slowdown there.
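
A quick way to double-check the sssd side would be to time the same NSS lookups outside SLURM with a cold and then a warm cache (just a sketch; it assumes getent goes through the same nss_sss path shown in the backtrace and that sss_cache is available on the controller):

# invalidate all sssd cached entries, then time a cold and a warm lookup
sss_cache -E
time getent passwd g2bottin   # cold: should go out to LDAP/AD via sssd
time getent passwd g2bottin   # warm: should be served from the sssd cache
time getent group g2          # group resolution used by AllowGroups

If the cold lookup takes seconds while the warm one is instant, the time is being spent on the sssd/LDAP side rather than in slurmctld itself.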

Moreover, only 3 SLURM partitions have an AllowGroups ACL:

[root at mgmt01 slurm]# grep AllowGroups slurm.conf 
PartitionName=bdw_fua_gwdbg Nodes=r040c03s0[1,2] Default=NO DefMemPerCPU=3000 DefaultTime=00:30:00 MaxTime=00:30:00  State=UP QOS=bdw_fua_gwdbg DenyQos=bdw_qos_special AllowGroups=g2
PartitionName=bdw_fua_gw    Nodes=r040c03s0[1,2] Default=NO DefMemPerCPU=3000 DefaultTime=00:30:00 MaxTime=48:00:00  State=UP QOS=bdw_fua_gw    DenyQos=bdw_qos_special AllowGroups=g2
PartitionName=bdw_fua_gwg2  Nodes=r040c03s0[1,2] Default=NO DefMemPerCPU=3000 DefaultTime=00:30:00 MaxTime=168:00:00 State=UP QOS=bdw_fua_gwg2  DenyQos=bdw_qos_special AllowGroups=g2
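
The path in the backtrace (get_group_members() calling uid_from_string(), i.e. getpwnam_r(), for every member of g2) can be roughly emulated outside SLURM like this (a sketch; it assumes getent resolves through the same sssd path and that g2's members are all listed in the group entry):

time for u in $(getent group g2 | cut -d: -f4 | tr ',' ' '); do
    getent passwd "$u" > /dev/null   # one getpwnam per member, as in uid_from_string()
done

If that loop takes tens of seconds on a cold cache, it would account for the ~45 s spent in load_part_uid_allow_list.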

So why does the UID-GID mapping take so long?

@John: we defined many partitions on the same nodes, but in the production cluster they will be more or less split across the 6K nodes.

Thank you very much,
ale

----- Original Message -----
> From: "John DeSantis" <desantis at usf.edu>
> To: "Matthieu Hautreux" <matthieu.hautreux at gmail.com>
> Cc: hpc-sysmgt-info at cineca.it, "Slurm User Community List" <slurm-users at lists.schedmd.com>, "Isabella Baccarelli"
> <i.baccarelli at cineca.it>
> Sent: Tuesday, January 16, 2018 8:20:20 PM
> Subject: Re: [slurm-users] slurm 17.11.2: Socket timed out on send/recv operation
> 
> Matthieu,
> 
> > I would bet on something like LDAP requests taking too much time
> > because of a missing sssd cache.
> 
> Good point!  It's easy to forget to check something as "simple" as
> user look-up when something is taking "too long".
> 
> John DeSantis
> 
> On Tue, 16 Jan 2018 19:13:06 +0100
> Matthieu Hautreux <matthieu.hautreux at gmail.com> wrote:
> 
> > Hi,
> > 
> > In this kind of issue, one good thing to do is to get a backtrace of
> > slurmctld during the slowdown. You should then easily identify the
> > subcomponent responsible for the issue.
> > 
> > I would bet on something like LDAP requests taking too much time
> > because of a missing sssd cache.
> > 
> > Regards
> > Matthieu
> > 
> > On 16 Jan 2018 at 18:59, "John DeSantis" <desantis at usf.edu> wrote:
> > 
> > >
> > > Ciao Alessandro,
> > >
> > > > setting MessageTimeout to 20 didn't solve it :(
> > > >
> > > > Looking at the slurmctld logs I noticed many warnings like these:
> > > >
> > > > Jan 16 05:11:00 r000u17l01 slurmctld[22307]: Warning: Note very large processing time from _slurm_rpc_dump_partitions: usec=42850604 began=05:10:17.289
> > > > Jan 16 05:20:58 r000u17l01 slurmctld[22307]: Warning: Note very large processing time from load_part_uid_allow_list: usec=44861325 began=05:20:13.257
> > > > Jan 16 05:20:58 r000u17l01 slurmctld[22307]: Warning: Note very large processing time from _slurmctld_background: usec=44861653 began=05:20:13.257
> > >
> > > And:
> > >
> > > >   271 Note very large processing time from _slurm_rpc_dump_partitions:
> > > >    67 Note very large processing time from load_part_uid_allow_list:
> > >
> > > I believe these values are in microseconds, so an average of 44
> > > seconds per call, mostly related to partition information.  Given
> > > that our configuration uses the maximum value of 90 seconds, I'd
> > > again recommend another adjustment, perhaps to 60 seconds.
> > >
> > > I'm not sure if redefining your partitions will help, but you do
> > > have several partitions which contain the same set of nodes and
> > > could be condensed, decreasing the number of partitions.  For
> > > example, the partitions bdw_all_serial & bdw_all_rcm could be
> > > consolidated into a single partition by:
> > >
> > > 1.)  Using AllowQOS=bdw_all_serial,bdw_all_rcm;
> > > 2.)  Setting MaxTime to 04:00:00 and defining a MaxWall via each QOS
> > >      (since one partition has 04:00:00 and the other 03:00:00).
> > >
> > > The same could be done for the partitions
> > > skl_fua_{prod,bprod,lprod} as well.
> > >
> > > HTH,
> > > John DeSantis
> > >
> > >
> > > On Tue, 16 Jan 2018 11:22:44 +0100
> > > Alessandro Federico <a.federico at cineca.it> wrote:
> > >
> > > > Hi,
> > > >
> > > > setting MessageTimeout to 20 didn't solve it :(
> > > >
> > > > Looking at the slurmctld logs I noticed many warnings like these:
> > > >
> > > > Jan 16 05:11:00 r000u17l01 slurmctld[22307]: Warning: Note very large processing time from _slurm_rpc_dump_partitions: usec=42850604 began=05:10:17.289
> > > > Jan 16 05:20:58 r000u17l01 slurmctld[22307]: Warning: Note very large processing time from load_part_uid_allow_list: usec=44861325 began=05:20:13.257
> > > > Jan 16 05:20:58 r000u17l01 slurmctld[22307]: Warning: Note very large processing time from _slurmctld_background: usec=44861653 began=05:20:13.257
> > > >
> > > > they are generated in many functions:
> > > >
> > > > [root at r000u17l01 ~]# journalctl -u slurmctld --since='2018-01-16 00:00:00' | grep -oP 'Note very large processing time from \w+:' | sort | uniq -c
> > > >     4 Note very large processing time from dump_all_job_state:
> > > >    67 Note very large processing time from load_part_uid_allow_list:
> > > >    67 Note very large processing time from _slurmctld_background:
> > > >     7 Note very large processing time from _slurm_rpc_complete_batch_script:
> > > >     4 Note very large processing time from _slurm_rpc_dump_jobs:
> > > >     3 Note very large processing time from _slurm_rpc_dump_job_user:
> > > >   271 Note very large processing time from _slurm_rpc_dump_partitions:
> > > >     5 Note very large processing time from _slurm_rpc_epilog_complete:
> > > >     1 Note very large processing time from _slurm_rpc_job_pack_alloc_info:
> > > >     3 Note very large processing time from _slurm_rpc_step_complete:
> > > >
> > > > Processing times are always in the tens of seconds.
> > > >
> > > > I'm attaching sdiag output and slurm.conf.
> > > >
> > > > thanks
> > > > ale
> > > >
> > > > ----- Original Message -----
> > > > > From: "Trevor Cooper" <tcooper at sdsc.edu>
> > > > > To: "Slurm User Community List"
> > > > > <slurm-users at lists.schedmd.com>
> > > > > Sent: Tuesday, January 16, 2018 12:10:21 AM
> > > > > Subject: Re: [slurm-users] slurm 17.11.2: Socket timed out on
> > > > > send/recv operation
> > > > >
> > > > > Alessandro,
> > > > >
> > > > > You might want to consider tracking your Slurm scheduler
> > > > > diagnostics output with some type of time-series monitoring
> > > > > system. The time-based history has proven more helpful at
> > > > > times
> > > > > than log contents by themselves.
> > > > >
> > > > > See Giovanni Torres' post on setting this up...
> > > > >
> > > > >     http://giovannitorres.me/graphing-sdiag-with-graphite.html
> > > > >
> > > > > -- Trevor
> > > > >
> > > > > > On Jan 15, 2018, at 4:33 AM, Alessandro Federico
> > > > > > <a.federico at cineca.it> wrote:
> > > > > >
> > > > > > Hi John
> > > > > >
> > > > > > thanks for the info.
> > > > > > slurmctld doesn't report anything about the server thread
> > > > > > count in the logs
> > > > > > and sdiag show only 3 server threads.
> > > > > >
> > > > > > We changed the MessageTimeout value to 20.
> > > > > >
> > > > > > I'll let you know if it solves the problem.
> > > > > >
> > > > > > Thanks
> > > > > > ale
> > > > > >
> > > > > > ----- Original Message -----
> > > > > >> From: "John DeSantis" <desantis at usf.edu>
> > > > > >> To: "Alessandro Federico" <a.federico at cineca.it>
> > > > > >> Cc: slurm-users at lists.schedmd.com, "Isabella Baccarelli"
> > > > > >> <i.baccarelli at cineca.it>, hpc-sysmgt-info at cineca.it
> > > > > >> Sent: Friday, January 12, 2018 7:58:38 PM
> > > > > >> Subject: Re: [slurm-users] slurm 17.11.2: Socket timed out
> > > > > >> on
> > > > > >> send/recv operation
> > > > > >>
> > > > > >> Ciao Alessandro,
> > > > > >>
> > > > > >>> Do we have to apply any particular setting to avoid
> > > > > >>> incurring the problem?
> > > > > >>
> > > > > >> What is your "MessageTimeout" value in slurm.conf?  If
> > > > > >> it's
> > > > > >> at the default of 10, try changing it to 20.
> > > > > >>
> > > > > >> I'd also check and see if the slurmctld log is reporting
> > > > > >> anything pertaining to the server thread count being over
> > > > > >> its limit.
> > > > > >>
> > > > > >> HTH,
> > > > > >> John DeSantis
> > > > > >>
> > > > > >> On Fri, 12 Jan 2018 11:32:57 +0100
> > > > > >> Alessandro Federico <a.federico at cineca.it> wrote:
> > > > > >>
> > > > > >>> Hi all,
> > > > > >>>
> > > > > >>>
> > > > > >>> we are setting up SLURM 17.11.2 on a small test cluster
> > > > > >>> of
> > > > > >>> about 100
> > > > > >>> nodes. Sometimes we get the error in the subject when
> > > > > >>> running any SLURM command (e.g. sinfo, squeue, scontrol
> > > > > >>> reconf, etc...)
> > > > > >>>
> > > > > >>>
> > > > > >>> Do we have to apply any particular setting to avoid
> > > > > >>> incurring the problem?
> > > > > >>>
> > > > > >>>
> > > > > >>> We found this bug report
> > > > > >>> https://bugs.schedmd.com/show_bug.cgi?id=4002 but it
> > > > > >>> regards the previous SLURM version and we do not set
> > > > > >>> debug3
> > > > > >>> on slurmctld.
> > > > > >>>
> > > > > >>>
> > > > > >>> thanks in advance
> > > > > >>> ale
> > > > > >>>
> > > > > >>
> > > > > >>
> > > > > >
> > > > > > --
> > > > > > Alessandro Federico
> > > > > > HPC System Management Group
> > > > > > System & Technology Department
> > > > > > CINECA www.cineca.it
> > > > > > Via dei Tizii 6, 00185 Rome - Italy
> > > > > > phone: +39 06 44486708
> > > > > >
> > > > > > All work and no play makes Jack a dull boy.
> > > > > > All work and no play makes Jack a dull boy.
> > > > > > All work and no play makes Jack...
> > > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> 
> 
> 

-- 
Alessandro Federico 
HPC System Management Group 
System & Technology Department 
CINECA www.cineca.it 
Via dei Tizii 6, 00185 Rome - Italy 
phone: +39 06 44486708 

All work and no play makes Jack a dull boy. 
All work and no play makes Jack a dull boy. 
All work and no play makes Jack...


