29 Jul
2024
9:23 a.m.
Hi all,

We have a Dell server with 2 x NVIDIA H100 GPUs running Slurm. After the server reboots, Slurm fails unless we first run the nvidia-smi command. Once we run

nvidia-smi && systemctl restart slurmd && systemctl restart slurmctld

the Slurm queue starts running. Do you have any idea what causes this error, and what we can do to fix it?

-- 
Best regards,
Aziz Öğütlü
Eduline Bilişim Sanayi ve Ticaret Ltd. Şti.
www.eduline.com.tr
Merkez Mah. Ayazma Cad. No:37 Papirus Plaza Kat:6 Ofis No:118
Kağıthane - İstanbul - Türkiye 34406
Tel: +90 212 324 60 61
Mobile: +90 541 350 40 72