Which hardware platform is this? We've had the same issue on Dell with H100, even without a MIG setup. We had to restart the slurmd daemon after boot to make sure that everything was fine.
Patryk.
On 25/06/12 01:46, Richard Lefebvre via slurm-users wrote:
I'm having problems with Autodetect=nvml in gres.conf.
I get the following in the controller log:
error: _check_core_range_matches_sock: gres/gpu GRES autodetected core affinity 16-31 on node node001 doesn't match socket boundaries. (Socket 0 is cores 0-31). Consider setting SlurmdParameters=l3cache_as_socket (recommended) or override this by manually specifying core affinity in gres.conf.
I did set l3cache_as_socket in the slurm.conf of the node, but I still get the error on the Slurm controller.
I'm running 24.11.5 on AlmaLinux 9.5
Richard
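
For reference, the two remedies named in the error message could look roughly like the fragment below. This is only a sketch under stated assumptions: the device path /dev/nvidia0 and the Cores range are illustrative and not taken from node001. slurm.conf is normally expected to be the same on the controller and on the compute nodes, and the daemons will likely need a restart after such a change.

```
# slurm.conf (assumed to be kept identical on the controller and the nodes)
SlurmdParameters=l3cache_as_socket

# gres.conf, alternative: turn autodetection off for this node and
# set the core affinity by hand. File= and Cores= are hypothetical
# values; adjust them to the real hardware.
NodeName=node001 AutoDetect=off Name=gpu File=/dev/nvidia0 Cores=0-31
```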
-- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users-leave@lists.schedmd.com