[slurm-users] Wrong hwloc detected?

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Sun Nov 7 19:22:12 UTC 2021


Hi Diego,

Are you sure that the Slurm software installed on all compute nodes was 
actually built on a system which had the hwloc packages installed?  The 
hwloc packages should also be installed on the compute nodes themselves.  
The prerequisite packages are listed here:
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#install-prerequisites
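
One way to check this on a compute node is to look at which hwloc library, 
if any, the Slurm binaries and plugins are dynamically linked against.  The 
following is only a rough sketch: check_hwloc_link is a made-up helper name, 
not something shipped with Slurm, and it assumes that ldd is available on 
the node.

```shell
# Hypothetical helper (not part of Slurm): show which libhwloc, if any,
# a binary or plugin is dynamically linked against.
check_hwloc_link() {
    target="$1"
    if [ ! -e "$target" ]; then
        echo "missing: $target"
        return 2
    fi
    # ldd lists the shared libraries the dynamic loader would resolve.
    libs=$(ldd "$target" 2>/dev/null | grep 'libhwloc' || true)
    if [ -n "$libs" ]; then
        echo "$libs"
        return 0
    fi
    echo "no libhwloc linkage: $target"
    return 1
}
```

It could be called as, for example, check_hwloc_link /usr/sbin/slurmstepd or 
check_hwloc_link /usr/lib64/slurm/task_cgroup.so.  Both paths are 
assumptions and depend on how Slurm was packaged on your system.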

/Ole


On 05-11-2021 15:38, Diego Zuccato wrote:
> They aren't using modules so it must be something system-wide :(
> But not all jobs are impacted, and it seems to be a bit random (it doesn't 
> always happen).
> I'm out of ideas, currently :(
> 
> On 05/11/2021 13:10, Ole Holm Nielsen wrote:
>> On 11/5/21 12:47, Diego Zuccato wrote:
>>> Some users are reporting this error:
>>>
>>> slurmstepd-str957-mtx-01: error: hwloc_get_obj_below_by_type() failing, task/affinity plugin may be required to address bug fixed in HWLOC version 1.11.5
>>> slurmstepd-str957-mtx-01: error: task[0] unable to set taskset '0x0'
>>>
>>> I checked on that node and hwloc is newer:
>>> diego.zuccato@str957-mtx-01:~$ hwloc-info --version
>>> hwloc-info 2.4.1
>>>
>>> How can Slurm detect such an old HWLOC version?
>>
>> Maybe the user loads a software module which also loads an old hwloc 
>> module?  The user should run "module list" in the job to verify this.


