[slurm-users] Query about Compute + GPUs

Tue Nov 21 02:52:44 MST 2017

On Friday, 3 November 2017 10:12:32 CET Merlin Hartley wrote:
> They would need to have different NodeNames - but the same NodeAddr for
> example:
> 
> NodeName=fisesta-21-3 NodeAddr=10.1.21.3 CPUs=6 Weight=20485797 Feature=rack-21,6CPUs
> NodeName=fisesta-21-3-gpu NodeAddr=10.1.21.3 CPUs=2 Weight=20485797 Feature=rack-21,2CPUs Gres=gpu:1
> 
> Hope this is useful!

This does not work for me.

I have the following lines in slurm.conf:

NodeName=gpu1 NodeAddr=10.1.2.3 RealMemory=229376 Weight=998002 Sockets=2 CoresPerSocket=3 ThreadsPerCore=2 Gres=gpu:TeslaK40c:6

NodeName=gpu1-cpu NodeAddr=10.1.2.3 RealMemory=229376 Weight=998002 Sockets=2 CoresPerSocket=11 ThreadsPerCore=2

PartitionName=gpu Nodes=gpu1
PartitionName=cpu Nodes=gpu1-cpu
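Both the quoted example and my configuration give two NodeName entries the same NodeAddr. As a quick sanity check, here is a small Python sketch that parses NodeName lines like the ones above and lists every address shared by more than one NodeName. It assumes each node definition sits on a single line of whitespace-separated Key=Value pairs; `parse_node_line` and `shared_addresses` are hypothetical helper names, not part of any Slurm tool.

```python
from collections import defaultdict


def parse_node_line(line):
    """Split one slurm.conf line into a dict of its Key=Value pairs."""
    pairs = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens without an '='
            pairs[key] = value
    return pairs


def shared_addresses(lines):
    """Map each NodeAddr used by more than one NodeName to those names."""
    by_addr = defaultdict(list)
    for line in lines:
        fields = parse_node_line(line)
        if "NodeName" not in fields:
            continue  # skip PartitionName and other non-node lines
        # NodeAddr defaults to NodeHostname, which defaults to NodeName.
        addr = fields.get("NodeAddr", fields["NodeName"])
        by_addr[addr].append(fields["NodeName"])
    return {addr: names for addr, names in by_addr.items() if len(names) > 1}


conf = [
    "NodeName=gpu1 NodeAddr=10.1.2.3 RealMemory=229376 Weight=998002 "
    "Sockets=2 CoresPerSocket=3 ThreadsPerCore=2 Gres=gpu:TeslaK40c:6",
    "NodeName=gpu1-cpu NodeAddr=10.1.2.3 RealMemory=229376 Weight=998002 "
    "Sockets=2 CoresPerSocket=11 ThreadsPerCore=2",
    "PartitionName=gpu Nodes=gpu1",
    "PartitionName=cpu Nodes=gpu1-cpu",
]

print(shared_addresses(conf))  # {'10.1.2.3': ['gpu1', 'gpu1-cpu']}
```

A non-empty result means one host (here 10.1.2.3) is expected to answer for several node names.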

But if I submit to node gpu1-cpu, I get the following error:

[2017-11-21T09:06:55.840] launch task 999708.0 request from 1044.1000 at 10.1.2.3 (port 45252)
[2017-11-21T09:06:55.840] error: Invalid job 999708.0 credential for user 1044: host gpu1 not in hostset gpu1-cpu
[2017-11-21T09:06:55.840] error: Invalid job credential from 1044 at 10.1.2.3: Invalid job credential
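Going only by the log message, the slurmd receiving the launch request at 10.1.2.3 compares the name it runs under (gpu1) with the hostset in the job credential (gpu1-cpu) and rejects the task because they differ. The Python sketch below is a simplified model of that check as the message describes it, not Slurm's actual code; `credential_accepted` and `check` are hypothetical names, and real Slurm hostsets use compressed notation such as `gpu[1-2]` rather than a plain set.

```python
def credential_accepted(slurmd_node_name, job_hostset):
    """Simplified model: accept a launch only if the receiving slurmd's
    own node name is one of the hosts the job credential was issued for."""
    return slurmd_node_name in job_hostset


def check(slurmd_node_name, job_hostset):
    """Return 'ok' or an error string shaped like the slurmd log line."""
    if credential_accepted(slurmd_node_name, job_hostset):
        return "ok"
    hosts = ",".join(sorted(job_hostset))
    return f"host {slurmd_node_name} not in hostset {hosts}"


# Job in partition "cpu" is allocated gpu1-cpu, but the request reaches
# 10.1.2.3, where the slurmd answering it identifies itself as gpu1.
print(check("gpu1", {"gpu1-cpu"}))  # host gpu1 not in hostset gpu1-cpu
print(check("gpu1", {"gpu1"}))      # ok
```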

It seems I am missing something. Any ideas what that could be?
I am using Slurm 16.05.9 on Debian stretch.

Regards
Markus Köberl
-- 
Markus Koeberl
Graz University of Technology
Signal Processing and Speech Communication Laboratory
E-mail: markus.koeberl at tugraz.at