[slurm-users] sbatch script won't accept --gres that requires more than 1 gpu

Marcus Wagner wagner at itc.rwth-aachen.de
Tue Feb 4 09:31:02 UTC 2020


Hi Dean,

Could you please try restarting slurmctld?

That usually helps at our site.
I have never seen this happen with GRES, but I have seen it many times in other situations.
This is why we restart slurmctld once a day via a cron job.
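
A daily restart of this kind could be sketched as a cron entry like the one below. The file name, the 04:00 schedule, and the assumption that slurmctld runs as a systemd unit called "slurmctld" are illustrative only and are not details taken from this thread.

```shell
# Hypothetical file: /etc/cron.d/slurmctld-restart (name chosen for illustration)
# Fields: minute hour day-of-month month day-of-week user command
# Assumption: slurmctld is managed by systemd under the unit name "slurmctld".
# Assumption: 04:00 is an arbitrary quiet hour; pick one that suits the site.
0 4 * * * root systemctl restart slurmctld
```

Restarting the controller should not affect running jobs, but it is worth confirming that on a test system before scheduling it.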


Best
Marcus

On 2/4/20 12:59 AM, Dean Schulze wrote:
> When I run an sbatch script with the line
>
> #SBATCH --gres=gpu:gp100:1
>
> it runs.  When I change it to
>
> #SBATCH --gres=gpu:gp100:3
>
> it fails with "Requested node configuration is not available".  But I 
> have a node with 4 gp100s available.  Here's my slurm.conf:
>
> NodeName=liqidos-dean-node1 CPUs=2 Boards=1 SocketsPerBoard=2 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=3770 Gres=gpu:gp100:4
>
> That node has a gres.conf with these lines:
>
> Name=gpu Type=gp100  File=/dev/nvidia0
> Name=gpu Type=gp100  File=/dev/nvidia1
> Name=gpu Type=gp100  File=/dev/nvidia2
> Name=gpu Type=gp100  File=/dev/nvidia3
>
> The character devices all exist in /dev.
>
> What's the controller complaining about?

-- 
Marcus Wagner, Dipl.-Inf.

IT Center
Abteilung: Systeme und Betrieb
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner at itc.rwth-aachen.de
www.itc.rwth-aachen.de



