[slurm-users] Spoofing a GPU on a slurm node virtual machine

Dean Schulze dean.w.schulze at gmail.com
Wed Jan 22 23:03:42 UTC 2020


I'm trying to spoof a GPU on a CentOS 7.7 virtual machine that is a Slurm
node.  I just want Slurm to see that this node has a GPU.  I'm not going to
execute any code that actually uses a GPU.

I created a character device with:
mknod nvidia0 c 1 1

Here's what it looks like:
[root@liqidos-dean-node1 dev]# ls -l nvidia0
crw-------. 1 root root 1, 1 Jan 22 15:43 nvidia0
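In case the device numbers matter: as far as I know a real /dev/nvidia0 uses
character major 195, minor 0, and is world read/writable, so a closer fake
could be created like this (the major/minor values and the chmod are my
assumption, not something I've confirmed slurmd actually checks):

mknod /dev/nvidia0 c 195 0
chmod 666 /dev/nvidia0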

Here's my gres.conf:
Name=gpu Type=gp100  File=/dev/nvidia0 Cores=0,1
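If the File= reference is what trips things up, maybe a count-only entry would
work instead -- I'm assuming here that gres/gpu accepts Count= without a
File=, which I haven't verified:

Name=gpu Type=gp100 Count=1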

The relevant lines from my slurm.conf are (my full slurm.conf is below):
...
GresTypes=gpu
...
SelectType=select/cons_tres
...
NodeName=liqidos-dean-node1 Gres=gpu:gp100:1 CPUs=2 RealMemory=3770 Sockets=2 CoresPerSocket=1 ThreadsPerCore=1 State=UNKNOWN
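For completeness, my understanding is that slurm.conf has to be identical on
the controller and the node, and that the controller has to re-read it as
well, so the sequence after editing would be roughly (assuming the usual
systemd unit names):

# on the compute node
systemctl restart slurmd
# on the controller, either restart or reconfigure
systemctl restart slurmctld    # or: scontrol reconfigure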

After restarting slurmd, Slurm still doesn't recognize my spoofed GPU:

[liqid@liqidos-dean-node1 ~]$ slurmd -C
NodeName=liqidos-dean-node1 CPUs=2 Boards=1 SocketsPerBoard=2 CoresPerSocket=1 ThreadsPerCore=1 RealMemory=3770
UpTime=0-06:47:11
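(As far as I know slurmd -C only reports the detected CPU/memory/board layout
and never prints GRES, so its output may not mean much here. Checking the
slurmd log for gres-related lines might be more useful -- the path is from my
slurm.conf below:)

grep -i gres /var/log/slurm/slurmd.log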

[liqid@liqidos-dean-node1 ~]$ scontrol show node
NodeName=liqidos-dean-node1 Arch=x86_64 CoresPerSocket=1
   CPUAlloc=0 CPUTot=2 CPULoad=0.01
   AvailableFeatures=(null)
   ActiveFeatures=(null)
   Gres=(null)
   NodeAddr=liqidos-dean-node1 NodeHostName=liqidos-dean-node1 Version=19.05.4
   OS=Linux 3.10.0-1062.el7.x86_64 #1 SMP Wed Aug 7 18:08:02 UTC 2019
   RealMemory=3770 AllocMem=0 FreeMem=177 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=1 Owner=N/A MCS_label=N/A
   Partitions=debug
   BootTime=2020-01-22T09:12:57 SlurmdStartTime=2020-01-22T15:55:16
   CfgTRES=cpu=2,mem=3770M,billing=2
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 AveWatts=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
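If the GRES were being picked up, I'd expect to see it in the Gres= field
above and to be able to schedule against it with something like the following
(just a scheduling test, no GPU code involved):

scontrol show node liqidos-dean-node1 | grep -i Gres
srun --gres=gpu:gp100:1 hostname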


Have I missed something, or is Slurm smart enough to recognize that I don't
have a real GPU?

Thanks.



Full slurm.conf:
SlurmctldHost=slurmctld-dean
GresTypes=gpu
MpiDefault=none
PluginDir=/usr/local/lib/slurm
ProctrackType=proctrack/cgroup
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
StateSaveLocation=/var/spool/slurm/state
SwitchType=switch/none
TaskPlugin=task/affinity
TaskPluginParam=Sched
InactiveLimit=0
KillWait=30
MinJobAge=300
SlurmctldTimeout=120
SlurmdTimeout=300
Waittime=0
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
AccountingStorageType=accounting_storage/slurmdbd
AccountingStoreJobComment=YES
ClusterName=cluster
JobCompType=jobcomp/none
JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
SlurmctldDebug=info
SlurmctldLogFile=/var/log/slurm/slurmctld.log
SlurmdDebug=info
SlurmdLogFile=/var/log/slurm/slurmd.log
NodeName=liqidos-dean-node1 Gres=gpu:gp100:1 CPUs=2 RealMemory=3770 Sockets=2 CoresPerSocket=1 ThreadsPerCore=1 State=UNKNOWN
PartitionName=debug Nodes=liqidos-dean-node1 Default=YES MaxTime=INFINITE State=UP