[slurm-users] Two jobs ends up on one GPU?

Chris Samuel chris at csamuel.org
Wed Jan 16 16:21:04 UTC 2019


Hi Magnus,

On 15/1/19 4:15 am, Magnus Jonsson wrote:

> We have a user who has somehow managed to get both jobs to end up 
> on the same GPU (verified via nvidia-smi).

So this was with nvidia-smi run by root from outside the jobs, showing 
both processes on the same GPU and none on the other? That would be 
really strange, and I've not seen it before.
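To double-check what the driver reports, something like the following run as root on the node should list each compute process with the GPU it is actually using (the query fields are standard nvidia-smi options, but verify against your driver version; the guard just keeps the snippet harmless on a node without the NVIDIA tools):

```shell
# List compute processes per GPU, from outside any job's cgroup.
if command -v nvidia-smi >/dev/null 2>&1; then
    out=$(nvidia-smi --query-compute-apps=gpu_uuid,pid,process_name \
                     --format=csv)
else
    out="nvidia-smi not available on this host"
fi
echo "$out"
```

If both PIDs show the same gpu_uuid there, the driver really is placing them on one device rather than nvidia-smi inside the jobs just renumbering devices.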

All I can think of would be to check the /proc/$pid/cgroup file for each 
of the processes to see which cgroups they are in, and then go poking 
around in the cgroup filesystem to see what restrictions are set for them.
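A rough sketch of that check (here $$ stands in for the suspect job's PID, and the cgroup mount point is the usual default; on cgroup v1 the "devices" controller is what Slurm's task/cgroup plugin uses to restrict GPU device nodes):

```shell
# Show every cgroup the process belongs to.
pid=$$   # substitute the real job PID here
cgroups=$(cat /proc/$pid/cgroup)
echo "$cgroups"

# Pull out the devices-controller path and dump its allow list,
# which should contain only the GPU device nodes granted to the job.
devpath=$(printf '%s\n' "$cgroups" | awk -F: '$2 == "devices" {print $3}')
if [ -n "$devpath" ]; then
    cat "/sys/fs/cgroup/devices${devpath}/devices.list" 2>/dev/null ||
        echo "devices.list not readable (try as root)"
else
    echo "no devices controller found (cgroup v2 host?)"
fi
```

If both jobs' devices.list files allow the same /dev/nvidia* node, the constraint itself is wrong; if they differ, the processes are escaping the cgroup some other way.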

You don't by any chance have Docker installed? It sets up its own 
cgroups, which could let users escape the ones Slurm applies.

All the best,
Chris
-- 
  Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA
