[slurm-users] Query about Compute + GPUs
Markus Köberl
markus.koeberl at tugraz.at
Tue Nov 21 02:52:44 MST 2017
On Friday, 3 November 2017 10:12:32 CET Merlin Hartley wrote:
> They would need to have different NodeNames - but the same NodeAddr for
> example:
>
> NodeName=fisesta-21-3 NodeAddr=10.1.21.3 CPUs=6 Weight=20485797 Feature=rack-21,6CPUs
> NodeName=fisesta-21-3-gpu NodeAddr=10.1.21.3 CPUs=2 Weight=20485797 Feature=rack-21,2CPUs Gres=gpu:1
>
> Hope this is useful!
This does not work for me.
I have the following lines in slurm.conf:
NodeName=gpu1 NodeAddr=10.1.2.3 RealMemory=229376 Weight=998002 Sockets=2 CoresPerSocket=3 ThreadsPerCore=2 Gres=gpu:TeslaK40c:6
NodeName=gpu1-cpu NodeAddr=10.1.2.3 RealMemory=229376 Weight=998002 Sockets=2 CoresPerSocket=11 ThreadsPerCore=2
PartitionName=gpu Nodes=gpu1
PartitionName=cpu Nodes=gpu1-cpu
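To make the setup explicit, here is a minimal Python sketch that lists which NodeNames in the excerpt above share one NodeAddr. It is not part of Slurm: the helper name `nodes_by_addr` and the inlined `SLURM_CONF` string are only for illustration, and the parsing assumes one definition per line with plain Key=Value tokens (no hostlist expressions such as gpu[1-4]).

```python
from collections import defaultdict

# Node and partition definitions copied from the slurm.conf excerpt above,
# written with one definition per line.
SLURM_CONF = """\
NodeName=gpu1 NodeAddr=10.1.2.3 RealMemory=229376 Weight=998002 Sockets=2 CoresPerSocket=3 ThreadsPerCore=2 Gres=gpu:TeslaK40c:6
NodeName=gpu1-cpu NodeAddr=10.1.2.3 RealMemory=229376 Weight=998002 Sockets=2 CoresPerSocket=11 ThreadsPerCore=2
PartitionName=gpu Nodes=gpu1
PartitionName=cpu Nodes=gpu1-cpu
"""


def nodes_by_addr(conf_text):
    """Group NodeName entries by NodeAddr (hypothetical helper, simplified parsing)."""
    groups = defaultdict(list)
    for line in conf_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line.startswith("NodeName="):
            continue
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        # Simplification: if NodeAddr is absent, fall back to the NodeName.
        addr = fields.get("NodeAddr", fields["NodeName"])
        groups[addr].append(fields["NodeName"])
    return dict(groups)


# Keep only addresses that are claimed by more than one node definition.
shared = {addr: names for addr, names in nodes_by_addr(SLURM_CONF).items() if len(names) > 1}
print(shared)  # {'10.1.2.3': ['gpu1', 'gpu1-cpu']}
```

Both node definitions resolve to 10.1.2.3, so every job for either NodeName is sent to the same address.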
But if I submit a job to node gpu1-cpu, I get the following error:
[2017-11-21T09:06:55.840] launch task 999708.0 request from 1044.1000 at 10.1.2.3 (port 45252)
[2017-11-21T09:06:55.840] error: Invalid job 999708.0 credential for user 1044: host gpu1 not in hostset gpu1-cpu
[2017-11-21T09:06:55.840] error: Invalid job credential from 1044 at 10.1.2.3: Invalid job credential
It seems I am missing something. Any ideas what that could be?
I am using Slurm 16.05.9 on Debian stretch.
regards
Markus Köberl
--
Markus Koeberl
Graz University of Technology
Signal Processing and Speech Communication Laboratory
E-mail: markus.koeberl at tugraz.at