[slurm-users] A strange situation of different network cards on the same network

James Lam unison2004 at gmail.com
Wed Oct 11 02:29:44 UTC 2023

We have a cluster of 176 nodes connected by an InfiniBand switch and 
10GbE, and we use the 10GbE network for SSH. The nodes installed at 
launch have the older Marvell 10GbE cards.

The current batch of nodes has QLogic 10GbE cards.

We are running Slurm 20.11.4 on the server, and the node health check 
daemon is also deployed, both using the OpenHPC method. The nodes with 
the Marvell 10GbE cards have no issue: they do not flip between the 
Slurm node states down <--> idle. The nodes with the QLogic cards, 
however, do flip-flop between down <--> idle.

We tried increasing the ARP cache settings and changing the client 
version to 20.11.9, but neither helped.
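
For reference, here is a minimal sketch of how the ARP cache limits on a 
node could be compared against the cluster size. It assumes the settings 
in question are the standard Linux neighbour-table sysctls 
(gc_thresh1, gc_thresh2, gc_thresh3); the function names and the choice 
of two entries per node (one per network) are illustrative assumptions, 
not something taken from our configuration.

```python
from pathlib import Path

# Standard Linux location of the IPv4 neighbour (ARP) table limits.
NEIGH_DIR = Path("/proc/sys/net/ipv4/neigh/default")


def read_gc_thresholds(neigh_dir=NEIGH_DIR):
    """Return {name: value} for gc_thresh1..3, skipping files that are missing."""
    thresholds = {}
    for name in ("gc_thresh1", "gc_thresh2", "gc_thresh3"):
        path = Path(neigh_dir) / name
        if path.is_file():
            thresholds[name] = int(path.read_text().strip())
    return thresholds


def thresholds_below(thresholds, expected_neighbours):
    """Return the sorted names of thresholds smaller than expected_neighbours."""
    return sorted(n for n, v in thresholds.items() if v < expected_neighbours)


if __name__ == "__main__":
    current = read_gc_thresholds()
    print("current limits:", current)
    # Assumption: 176 nodes, each possibly seen once per network (10GbE and IB).
    print("below expected neighbour count:", thresholds_below(current, 2 * 176))
```

Any threshold reported by the second line is smaller than the assumed 
number of neighbours and would be a candidate for raising.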

Has anyone faced a similar situation?
