[slurm-users] Wrong hwloc detected?

Diego Zuccato diego.zuccato at unibo.it
Mon Nov 8 08:06:43 UTC 2021


Hi Ole.

I'm using the packages from Debian stable (slurm 20.11.4, hwloc 2.4.1).
And I checked: hwloc is installed on all the nodes. Quite obvious since 
it's a dep for slurmd:
https://packages.debian.org/bullseye/slurmd
Being a dep, I suspect slurmd is built with hwloc support.
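
A rough way to check the linkage directly rather than infer it from the package dependency is sketched below. The helper name hwloc_libs and the path /usr/sbin/slurmd are assumptions for illustration (override SLURMD if the daemon lives elsewhere), and the check only covers the binary it is pointed at, not any plugins loaded at runtime.

```shell
#!/bin/sh
# Print the distinct libhwloc sonames found in ldd-style output on stdin.
# (hwloc_libs is a hypothetical helper name, not part of Slurm or hwloc.)
hwloc_libs() {
    grep -o 'libhwloc\.so[.0-9]*' | sort -u
}

# Assumption: the distribution package installs the daemon at this path.
SLURMD="${SLURMD:-/usr/sbin/slurmd}"

if [ -x "$SLURMD" ]; then
    # Empty output means this binary is not dynamically linked against hwloc.
    ldd "$SLURMD" | hwloc_libs
fi
```

Running the same check with SLURMD pointing at slurmstepd may also be worthwhile, since that is the process reporting the error.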

Diego

On 07/11/2021 20:22, Ole Holm Nielsen wrote:
> Hi Diego,
> 
> Are you sure that the Slurm software installed on all compute nodes was 
> actually built on a system which had the hwloc packages installed?  They 
> should also be installed on the compute nodes.  The prerequisite 
> packages are listed here:
> https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#install-prerequisites
> 
> /Ole
> 
> 
> On 05-11-2021 15:38, Diego Zuccato wrote:
>> They aren't using modules so it must be something system-wide :(
>> But not all jobs are impacted, and it seems a bit random (it doesn't 
>> always happen).
>> I'm out of ideas, currently :(
>>
>> On 05/11/2021 13:10, Ole Holm Nielsen wrote:
>>> On 11/5/21 12:47, Diego Zuccato wrote:
>>>> Some users are reporting this error:
>>>>
>>>> slurmstepd-str957-mtx-01: error: hwloc_get_obj_below_by_type() 
>>>> failing, task/affinity plugin may be required to address bug fixed 
>>>> in HWLOC version 1.11.5
>>>> slurmstepd-str957-mtx-01: error: task[0] unable to set taskset '0x0'
>>>>
>>>> I checked on that node and hwloc is newer:
>>>> diego.zuccato@str957-mtx-01:~$ hwloc-info --version
>>>> hwloc-info 2.4.1
>>>>
>>>> How can Slurm detect such an old HWLOC version?
>>>
>>> Maybe the user loads a software module which also loads an old hwloc 
>>> module?   The user should do "module list" in the job to verify this.
> 

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786


