[slurm-users] gres with docker problem

허웅 hoewoonggood at naver.com
Sun Jan 6 21:26:14 MST 2019


I agree with Chris's opinion.
 
I was able to find out the reason.
 
As Chris said, the problem is cgroup.
 
When I submit a job to Slurm requesting 1 gres:gpu, Slurm assigns the job to a node that has enough resources.
 
When Slurm assigns the job to a node, it first creates a cgroup environment and then gives the resource information to the node.
 
But the problem is that Docker uses its own cgroup configuration.
 
That's why I could get the right information on the Slurm side but not on the Docker side.
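 
For reference, here is a quick way to see the difference. This is only a sketch: the exact path assumes cgroup v1 with ConstrainDevices enabled in cgroup.conf, so your layout may differ.

# On the host, the job's Slurm device cgroup only allows the granted GPU:
cat /sys/fs/cgroup/devices/slurm/uid_$(id -u)/job_$SLURM_JOB_ID/devices.list

# Inside the container, nvidia-smi sees every GPU, because the container runs
# under Docker's own cgroup rather than the one Slurm created
# (assumes the nvidia-docker2 runtime and the public nvidia/cuda image):
docker run --rm --runtime=nvidia nvidia/cuda nvidia-smi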
 
Here is my workaround for getting the right information on the Docker side.
 


scontrol show job=$SLURM_JOBID --details | grep GRES_IDX | awk -F "IDX:" '{print $2}' | awk -F ")" '{print $1}'


scontrol show job with the --details option exposes the GRES_IDX field.
So I've used this information in my application.
Please refer to this command if you are running into the same problem.
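 
For example, the index could be captured and handed to the container roughly like this. This is only a sketch: it assumes the nvidia-docker2 runtime and that the GRES index matches the index NVIDIA_VISIBLE_DEVICES expects.

# Capture the GPU index granted by Slurm for this job:
GPU_IDX=$(scontrol show job=$SLURM_JOBID --details | grep GRES_IDX | awk -F "IDX:" '{print $2}' | awk -F ")" '{print $1}')
# Expose only that GPU inside the container:
docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=$GPU_IDX nvidia/cuda nvidia-smi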
 
-----Original Message-----
From: "Chris Samuel"<chris at csamuel.org>
To: <slurm-users at lists.schedmd.com>;
Cc:
Sent: 2019-01-07 (Mon) 11:59:09
Subject: Re: [slurm-users] gres with docker problem
 
On 4/1/19 5:48 am, Marcin Stolarek wrote:

> I think that the main reason is the lack of access to some /dev "files"
> in your docker container. For singularity nvidia plugin is required,
> maybe there is something similar for docker...

That's unlikely, the problem isn't that nvidia-smi isn't working in
Docker because of a lack of device files, the problem is that it's
seeing all 4 GPUs and thus is no longer being controlled by the device
cgroup that Slurm is creating.

--
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC
