<div dir="ltr">In the gres.conf on one of my nodes I have just the line<br><br>    Autodetect=nvml<br><br>as in the last example in <a href="https://slurm.schedmd.com/gres.conf.html">https://slurm.schedmd.com/gres.conf.html</a>.<br><br>In the slurm.conf on all nodes, the entry for the node with Autodetect=nvml is<br><br>    NodeName=slurmnode1 CPUs=16 Boards=1 SocketsPerBoard=1 CoresPerSocket=8 ThreadsPerCore=2 RealMemory=47671 Gres=gpu:gp100:4<br><br>since that node can have up to 4 GPUs dynamically assigned. Without Gres=gpu:gp100:4 I can't run any job that requires a GPU, even if I dynamically assign GPUs to that node. Apparently Autodetect=nvml alone isn't enough to let the controller know that GPUs are available on that node.<br><br>With this configuration I get this message every second in my slurmctld.log file:<br><br>    error: _slurm_rpc_node_registration node=slurmnode1: Invalid argument<br><br>I've restarted both slurmd and slurmctld and still get the error. The node also stays in the drain state no matter what I do with it. Apparently Slurm doesn't like this configuration.<br><br>What is the right way to configure a node with Autodetect=nvml?<br><div><br></div></div>