[slurm-users] How to apply for multiple GPU cards from different worker nodes?
Ran Du
bella.ran.du at gmail.com
Tue Apr 16 08:15:54 UTC 2019
Dear Antony,
It worked!
I checked the allocation, and here is the record:
Nodes=gpu012 CPU_IDs=0-2 Mem=3072 GRES_IDX=gpu:v100(IDX:0-7)
Nodes=gpu013 CPU_IDs=0 Mem=1024 GRES_IDX=gpu:v100(IDX:0-7)
The job has got what it applied for.
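For reference, the request that produced this allocation should look roughly like the following, per Antony's advice that GRES counts are per node (a sketch; the partition, account, and QOS names are taken from the original script earlier in the thread):

```shell
#!/bin/bash
# Request 2 nodes with 8 V100 GPUs each = 16 GPUs total.
# Note: --gres is applied PER NODE, so ask for 8 here, not 16.
#SBATCH --partition=gpu
#SBATCH --qos=normal
#SBATCH --account=u07
#SBATCH --nodes=2
#SBATCH --mem-per-cpu=1024
#SBATCH --gres=gpu:v100:8
```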
Another question: how can one request a number of cards that is not
evenly divisible by 8? For example, 10 GPU cards: 8 cards on one node and
2 cards on another node?
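(For what it's worth, one way to get such an uneven split is a heterogeneous job, where each job component carries its own per-node GRES request. A sketch, assuming a Slurm release with heterogeneous job support; depending on the version, the component separator directive is `#SBATCH packjob` on older releases or `#SBATCH hetjob` on newer ones, and `./my_gpu_program` is a placeholder:)

```shell
#!/bin/bash
# Heterogeneous job: component 0 gets one node with 8 GPUs,
# component 1 gets another node with 2 GPUs -- 10 GPUs in total.
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --gres=gpu:v100:8
#SBATCH packjob
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --gres=gpu:v100:2

# Launch across both components (flag name varies by Slurm version:
# --pack-group on older releases, --het-group on newer ones).
srun --pack-group=0,1 ./my_gpu_program
```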
Thanks a lot again for your kind help.
Best regards,
Ran
On Mon, Apr 15, 2019 at 8:25 PM Ran Du <bella.ran.du at gmail.com> wrote:
> Dear Antony,
>
> Thanks a lot for your reply. I submitted a job following your
> advice, and there were no more sbatch errors.
>
> But because our cluster is under maintenance, I have to wait until
> tomorrow to see whether the GPU cards are allocated correctly. I will
> let you know as soon as the job runs successfully.
>
> Thanks a lot for your kind help.
>
> Best regards,
> Ran
>
> On Mon, Apr 15, 2019 at 4:40 PM Antony Cleave <antony.cleave at gmail.com>
> wrote:
>
>> Ask for 8 GPUs per node on 2 nodes instead.
>>
>> In your script just change the 16 to 8 and it should do what you want.
>>
>> You are currently asking for 2 nodes with 16 GPUs each, because GRES
>> resources are requested per node.
>>
>> Antony
>>
>> On Mon, 15 Apr 2019, 09:08 Ran Du, <bella.ran.du at gmail.com> wrote:
>>
>>> Dear all,
>>>
>>> Does anyone know how to set #SBATCH options to get multiple GPU
>>> cards from different worker nodes?
>>>
>>> One of our users would like to request 16 NVIDIA V100 cards for
>>> his job, and there are 8 GPU cards on each worker node. I have tried the
>>> following #SBATCH options:
>>>
>>> #SBATCH --partition=gpu
>>> #SBATCH --qos=normal
>>> #SBATCH --account=u07
>>> #SBATCH --job-name=cross
>>> #SBATCH --nodes=2
>>> #SBATCH --mem-per-cpu=1024
>>> #SBATCH --output=test.32^4.16gpu.log
>>> #SBATCH --gres=gpu:v100:16
>>>
>>> but got the following sbatch error message:
>>> sbatch: error: Batch job submission failed: Requested node
>>> configuration is not available
>>>
>>> I found a similar question on Stack Overflow:
>>>
>>> https://stackoverflow.com/questions/45200926/how-to-access-to-gpus-on-different-nodes-in-a-cluster-with-slurm
>>>
>>> It says that allocating GPU cards across different worker nodes
>>> is not possible. The post is from 2017; is that still the case at
>>> present?
>>>
>>> Thanks a lot for your help.
>>>
>>> Best regards,
>>> Ran
>>>
>>