[slurm-users] Slurm on POWER9

Keith Ball bipcuds at gmail.com
Fri Sep 14 10:45:58 MDT 2018


So we figured out the problem with "slurmd -C": we had run rpmbuild on the
POWER9 node without the hwloc package installed. The build process looks for
it and, if it is not found, will apparently not use hwloc/lstopo even if
hwloc is installed post-build.
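
In case it helps anyone else, the rebuild steps are roughly the following
(package names assume a RHEL-style system, and the tarball name is just a
placeholder):

    yum install hwloc hwloc-devel      # hwloc headers must be present at build time
    rpmbuild -ta slurm-*.tar.bz2       # rebuild the RPMs so configure picks up hwloc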

Now Slurm reports the expected topology for SMT4:

NodeName=enki13 CPUs=160 Boards=1 SocketsPerBoard=2 CoresPerSocket=20
ThreadsPerCore=4 RealMemory=583992
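
For reference, the matching slurm.conf node definition would presumably look
something like this (just a sketch, not copied verbatim from our config):

    NodeName=enki13 Sockets=2 CoresPerSocket=20 ThreadsPerCore=4 RealMemory=583992 State=UNKNOWN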

Best,
  Keith

> > 1.) Slurm seems to be incapable of recognizing sockets/cores/threads on
> > these systems.
> [...]
> > Anyone know if there is a way to get Slurm to recognize the true
> > topology for POWER nodes?
>
> IIRC Slurm uses hwloc for discovering topology, so "lstopo-no-graphics" might
> give you some insights into whether it's showing you the right config.
>
> I'd be curious to see what "lscpu" and "slurmd -C" say as well.

The biggest problem, as I see it, is that with 2 20-core sockets and SMT2
set, the node looks to Slurm like 80 single-core, single-thread sockets
(see the slurmd -C output below). With SMT4 set, it thinks there are
160 sockets.

NodeName=enki13 CPUs=80 Boards=1 SocketsPerBoard=80 CoresPerSocket=1
ThreadsPerCore=1 RealMemory=583992 UpTime=0-23:20:16
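
(For anyone trying to reproduce this: the SMT mode on these boxes can be
checked and switched with ppc64_cpu from powerpc-utils, e.g.

    ppc64_cpu --smt        # show the current SMT setting
    ppc64_cpu --smt=4      # switch the node to SMT4

and "slurmd -C" re-run after each change.)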