[slurm-users] squeue reports ReqNodeNotAvail but node is available

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Sun Jul 12 13:36:46 UTC 2020


In case your ARP cache is the problem, there is some advice on this Wiki 
page:
https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#configure-arp-cache-for-large-networks
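
A minimal Python sketch for checking whether a Linux head node is close to its ARP (neighbour) table limit. The Linux kernel's default garbage-collection thresholds, net.ipv4.neigh.default.gc_thresh1/2/3, are 128, 512 and 1024 entries, so a host that talks to roughly 1000 or more devices can run into gc_thresh3. The values to raise them to should come from the Wiki page above. The function names and the 80% warning level are illustrative assumptions, not part of any Slurm or kernel tool.

```python
"""Rough check of ARP (neighbour) table pressure on a Linux host.

Assumption: Linux exposes the IPv4 neighbour table in /proc/net/arp and the
garbage-collection thresholds under /proc/sys/net/ipv4/neigh/default/.
"""
from pathlib import Path

ARP_TABLE = Path("/proc/net/arp")
THRESH_DIR = Path("/proc/sys/net/ipv4/neigh/default")
THRESH_NAMES = ("gc_thresh1", "gc_thresh2", "gc_thresh3")


def count_arp_entries(arp_text: str) -> int:
    """Count entries in /proc/net/arp text, skipping the header line."""
    lines = [ln for ln in arp_text.splitlines() if ln.strip()]
    return max(len(lines) - 1, 0)


def pressure_report(entries: int, thresholds: dict) -> str:
    """Compare the entry count with gc_thresh3 (80% warning level is arbitrary)."""
    t3 = thresholds["gc_thresh3"]
    if entries >= t3:
        return "over gc_thresh3: the kernel may fail to add new neighbour entries"
    if entries >= 0.8 * t3:
        return "close to gc_thresh3: consider raising the thresholds"
    return "ok"


def read_thresholds() -> dict:
    """Read the current gc_thresh1..3 values (files contain e.g. '1024\\n')."""
    return {name: int((THRESH_DIR / name).read_text()) for name in THRESH_NAMES}


if __name__ == "__main__":
    if ARP_TABLE.exists() and THRESH_DIR.exists():
        n = count_arp_entries(ARP_TABLE.read_text())
        th = read_thresholds()
        print(f"{n} ARP entries, thresholds {th}: {pressure_report(n, th)}")
    else:
        print("no /proc/net/arp here; this check is Linux-only")
```

Run it on the head node (and on servers that see as many devices) with python3; it only reads /proc and changes nothing.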

I think there are other causes for ReqNodeNotAvail, for example, the 
node being allocated to other jobs. The commands "scontrol show node 
<nodename>" and "scontrol show job <jobid>" should reveal more details.
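
A small Python sketch of pulling the relevant fields out of scontrol output for a pending job and, optionally, one of the nodes it is waiting for. It assumes scontrol is on PATH and prints whitespace-separated Key=Value pairs; the regex is a simplification, so values containing spaces (such as some node Reason strings) are cut off at the first space. The script name and its arguments are hypothetical.

```python
"""Pull the fields relevant to ReqNodeNotAvail out of scontrol output.

Assumption: `scontrol show job <id>` and `scontrol show node <name>` print
whitespace-separated Key=Value pairs. Values containing spaces are truncated
at the first space by this simple regex.
"""
import re
import subprocess
import sys

# Key is a word; value is everything up to the next whitespace
# (so "TRES=cpu=4,mem=16G" keeps "cpu=4,mem=16G" as the value).
PAIR = re.compile(r"(\w+)=(\S*)")


def parse_scontrol(text: str) -> dict:
    """Return a dict of Key -> Value from scontrol 'show' output."""
    return dict(PAIR.findall(text))


def scontrol_show(kind: str, name: str) -> dict:
    """Run `scontrol show <kind> <name>` and parse the result."""
    out = subprocess.run(
        ["scontrol", "show", kind, name],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_scontrol(out)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 check_pending.py <jobid> [nodename]
    job = scontrol_show("job", sys.argv[1])
    for field in ("JobState", "Reason", "Partition", "ReqNodeList", "ExcNodeList"):
        print(f"job  {field}: {job.get(field)}")
    if len(sys.argv) > 2:
        node = scontrol_show("node", sys.argv[2])
        for field in ("State", "Reason", "Partitions"):
            print(f"node {field}: {node.get(field)}")
```

For example, "python3 check_pending.py 12345 gpu01" would show side by side why job 12345 is pending and what state Slurm records for gpu01.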

/Ole


On 11-07-2020 06:00, mercan wrote:
> Hi Janna;
> 
> It sounds like an ARP cache table problem to me. If your Slurm head node 
> can reach ~1000 or more network devices (all connected network cards, 
> switches, etc., even if they are reached through different ports of the 
> server), you need to increase some network settings on the head node and 
> on any servers that can reach the same number of network devices:
> 
> http://docs.adaptivecomputing.com/torque/5-0-3/Content/topics/torque/12-appendices/otherConsiderations.htm 
> 
> 
> There is also some advice for large clusters in the Slurm documentation:
> 
> https://slurm.schedmd.com/big_sys.html
> 
> Regards,
> 
> Ahmet M.
> 
> 
> On 11.07.2020 01:34, Janna Ore Nugent wrote:
>>
>> Hi All,
>>
>> I’ve got an intermittent situation with GPU nodes that sinfo reports as 
>> available and idle, but squeue reports as “ReqNodeNotAvail”.  We’ve 
>> cycled the nodes to restart services, but it hasn’t helped.  Any 
>> suggestions for resolving this or digging into it more deeply?
