[slurm-users] squeue reports ReqNodeNotAvail but node is available

mercan ahmet.mercan at uhem.itu.edu.tr
Sat Jul 11 04:00:43 UTC 2020


Hi Janna;

It sounds like an ARP cache table problem to me. If your Slurm head node 
can reach ~1000 or more network devices (all connected network 
cards, switches, etc., even if they are reachable through different ports 
of the server), you need to increase some network settings on the head 
node and on any servers that can reach a similar number of network devices:

http://docs.adaptivecomputing.com/torque/5-0-3/Content/topics/torque/12-appendices/otherConsiderations.htm
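
A quick way to check whether a host is near its limit is to compare the 
number of ARP entries with the kernel's neighbour-table thresholds. The 
sketch below is only an illustration and rests on assumptions: that the 
host runs Linux, that the relevant settings are the standard 
net.ipv4.neigh.default.gc_thresh1/2/3 sysctls, and that /proc/net/arp 
(IPv4 only) reflects the table you care about. The function names are 
made up for this example.

```python
from pathlib import Path

# Standard Linux locations; assumed, not taken from the original post.
NEIGH_DIR = Path("/proc/sys/net/ipv4/neigh/default")
ARP_TABLE = Path("/proc/net/arp")


def count_arp_entries(arp_text: str) -> int:
    """Count entries in the text of /proc/net/arp (first line is a header)."""
    lines = [ln for ln in arp_text.splitlines() if ln.strip()]
    return max(len(lines) - 1, 0)


def read_thresholds(neigh_dir: Path = NEIGH_DIR) -> tuple:
    """Return (gc_thresh1, gc_thresh2, gc_thresh3) as integers."""
    return tuple(
        int((neigh_dir / f"gc_thresh{i}").read_text()) for i in (1, 2, 3)
    )


def assess(entries: int, soft_max: int, hard_max: int) -> str:
    """Classify table usage against gc_thresh2 (soft) and gc_thresh3 (hard)."""
    if entries >= hard_max:
        return "full"      # new neighbours may be dropped
    if entries >= soft_max:
        return "pressure"  # garbage collection becomes aggressive
    return "ok"


if __name__ == "__main__":
    entries = count_arp_entries(ARP_TABLE.read_text())
    t1, t2, t3 = read_thresholds()
    print(f"entries={entries} gc_thresh1={t1} gc_thresh2={t2} gc_thresh3={t3}")
    print("status:", assess(entries, t2, t3))
```

If the status is "pressure" or "full" on the head node, raising the three 
thresholds with sysctl is the kind of change the page above describes. 
Suitable values depend on how many devices the host can actually reach.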

There is also some advice for big clusters in the Slurm documentation:

https://slurm.schedmd.com/big_sys.html

Regards,

Ahmet M.


On 11.07.2020 at 01:34, Janna Ore Nugent wrote:
>
> Hi All,
>
> I’ve got an intermittent situation with gpu nodes that sinfo says are 
> available and idle, but squeue reports as “ReqNodeNotAvail”.  We’ve 
> cycled the nodes to restart services but it hasn’t helped.  Any 
> suggestions for resolving this or digging into it more deeply?
>
> Thanks,
>
> Janna
>
> *Janna Nugent, MS*
>
> Sr. Computational Genomics Specialist
>
> Research Computing Services
>
> Northwestern University
>
> www.it.northwestern.edu/research/ 
> <http://www.it.northwestern.edu/research/>
>
> janna.nugent at northwestern.edu
>


