[slurm-users] Query about Compute + GPUs

Ing. Gonzalo E. Arroyo garroyo at ifimar-conicet.gob.ar
Tue Nov 21 09:38:48 MST 2017

I have a problem with RAM and Arch (and maybe more fields) not being detected. Check this:

NodeName=fisesta-21-3 Arch=x86_64 CoresPerSocket=1
   CPUAlloc=0 CPUErr=0 CPUTot=2 CPULoad=0.01
   NodeAddr= NodeHostName=fisesta-21-3 Version=16.05
   OS=Linux RealMemory=3950 AllocMem=0 FreeMem=0 Sockets=2 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=259967 Weight=20479797 Owner=N/A
   BootTime=2017-10-30T16:39:22 SlurmdStartTime=2017-11-06T16:46:54
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s

NodeName=fisesta-21-3-cpus CoresPerSocket=1
   CPUAlloc=0 CPUErr=0 CPUTot=6 CPULoad=0.01
   NodeAddr= NodeHostName=fisesta-21-3-cpus Version=(null)
   RealMemory=1 AllocMem=0 FreeMem=0 Sockets=6 Boards=1
   State=IDLE ThreadsPerCore=1 TmpDisk=0 Weight=20483797 Owner=N/A
   BootTime=None SlurmdStartTime=None
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
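To compare the two entries field by field, a minimal Python sketch like the one below parses `scontrol show node` text into one dict per node and flags values that look undetected. The helper names (`parse_nodes`, `suspicious_fields`) and the heuristics (missing `Arch`/`OS`, `RealMemory=1`, `Version=(null)`, `BootTime=None`) are my own assumptions read off the output above, not Slurm rules. `SAMPLE` is an excerpt of that output.

```python
import re
from typing import Dict, List

# One Key=Value token; values in this output contain no spaces ("n/s", "(null)").
TOKEN = re.compile(r"(\S+?)=(\S*)")


def parse_nodes(text: str) -> List[Dict[str, str]]:
    """Split `scontrol show node` text into one dict per NodeName= record."""
    nodes: List[Dict[str, str]] = []
    for key, value in TOKEN.findall(text):
        if key == "NodeName":  # every record starts with NodeName=
            nodes.append({})
        if nodes:
            nodes[-1][key] = value
    return nodes


def suspicious_fields(node: Dict[str, str]) -> List[str]:
    """Heuristics (assumptions, not Slurm rules) for values slurmd never reported."""
    issues: List[str] = []
    for key in ("Arch", "OS"):
        if key not in node:
            issues.append(f"{key} missing")
    if node.get("RealMemory") == "1":
        # 1 MB appears to be the slurm.conf default when RealMemory= is not set.
        issues.append("RealMemory=1 (likely the slurm.conf default)")
    if node.get("Version") == "(null)":
        # Assumption: no slurmd seems to have registered under this NodeName.
        issues.append("Version=(null)")
    if node.get("BootTime") == "None":
        issues.append("BootTime=None")
    return issues


SAMPLE = """\
NodeName=fisesta-21-3 Arch=x86_64 CoresPerSocket=1
   NodeAddr= NodeHostName=fisesta-21-3 Version=16.05
   OS=Linux RealMemory=3950 AllocMem=0 FreeMem=0 Sockets=2 Boards=1
   BootTime=2017-10-30T16:39:22 SlurmdStartTime=2017-11-06T16:46:54
NodeName=fisesta-21-3-cpus CoresPerSocket=1
   NodeAddr= NodeHostName=fisesta-21-3-cpus Version=(null)
   RealMemory=1 AllocMem=0 FreeMem=0 Sockets=6 Boards=1
   BootTime=None SlurmdStartTime=None
"""

if __name__ == "__main__":
    for node in parse_nodes(SAMPLE):
        print(node["NodeName"], suspicious_fields(node) or "ok")
```

If I read the slurm.conf documentation right, RealMemory defaults to 1 when a NodeName line does not set it, which would match what the second entry shows.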

For your problem, please share the relevant node and partition lines of your
slurm.conf. You should also check that your users have permission to run in
every partition / node created by splitting the nodes in this new
configuration.
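The check I mean can be sketched roughly like this, assuming you have already expanded each partition's Nodes= list (for example with `scontrol show hostnames`) and looked up the user's Unix groups. The `Partition` class and `check` function are hypothetical helpers, and the partition layout is made up to resemble the gpu1 / gpu1-cpu split in the quoted message. Only node membership and AllowGroups are modelled, not AllowAccounts, AllowQos or the association limits enforced through AccountingStorageEnforce.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Partition:
    """Hypothetical, simplified view of one PartitionName= line."""
    nodes: Set[str]  # node names, already expanded from hostlist syntax
    # AllowGroups defaults to ALL in slurm.conf.
    allow_groups: Set[str] = field(default_factory=lambda: {"ALL"})


def check(partitions: Dict[str, Partition], part: str, node: str,
          user_groups: Set[str]) -> str:
    """Return "ok", or why `srun -p <part> -w <node>` fails in this simplified model."""
    p = partitions.get(part)
    if p is None:
        return f"no partition {part!r}"
    if node not in p.nodes:
        return f"{node} is not in partition {part}"
    if "ALL" not in p.allow_groups and not (p.allow_groups & user_groups):
        return f"groups {sorted(user_groups)} not allowed in {part}"
    return "ok"


if __name__ == "__main__":
    # Hypothetical layout: the GPU host appears once per partition under two names.
    parts = {"gpu": Partition({"gpu1"}), "cpu": Partition({"gpu1-cpu", "node1"})}
    for part, node in [("gpu", "gpu1"), ("cpu", "gpu1-cpu"), ("cpu", "gpu1")]:
        print(part, node, check(parts, part, node, {"users"}))
```

With a layout like that, `-p cpu -w gpu1` fails the membership test, consistent with the "not in partition" error quoted below; the "Invalid job credential" error for gpu1-cpu is outside what this sketch can explain.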

This message is confidential. It may also contain information that is
privileged or not authorized to be disclosed. If you have received it by
mistake, delete it from your system. You should not copy the message or
disclose its contents to anyone. Thanks.

On Tue, 21 Nov 2017 at 11:05, Markus Köberl <markus.koeberl at tugraz.at>
wrote:

> On Tuesday, 21 November 2017 10:26:53 CET Merlin Hartley wrote:
> > Could you give us your submission command?
> > It may be that you are requesting the wrong partition, i.e. relying on the
> > default partition selection… try with “--partition cpu”
> I ran the following commands:
> srun --gres=gpu --mem-per-cpu="5G" -w gpu1 --pty /bin/bash
> -> works, partition gpu
> srun --mem-per-cpu="5G" -p cpu --pty /bin/bash
> -> works, I get a slot on another node which has only one NodeName entry.
> srun --mem-per-cpu="5G" -p cpu -w gpu1-cpu --pty /bin/bash
> -> error: Invalid job credential...
> srun --mem-per-cpu="5G" -p cpu -w gpu1 --pty /bin/bash
> -> error not in partition...
> I am using the following options:
> EnforcePartLimits=ANY
> GresTypes=gpu
> JobSubmitPlugins=all_partitions
> ProctrackType=proctrack/cgroup
> ReturnToService=2
> TaskPlugin=task/cgroup
> TrackWCKey=yes
> InactiveLimit=3600
> KillWait=1800
> MinJobAge=600
> OverTimeLimit=600
> SlurmctldTimeout=120
> SlurmdTimeout=300
> Waittime=0
> DefMemPerCPU=1000
> FastSchedule=1
> SchedulerType=sched/backfill
> SelectType=select/cons_res
> SelectTypeParameters=CR_Core_Memory
> PriorityType=priority/multifactor
> PriorityDecayHalfLife=7-0
> PriorityFavorSmall=YES
> PriorityWeightAge=50
> PriorityWeightFairshare=25
> PriorityWeightJobSize=50
> PriorityWeightPartition=100
> PriorityWeightTRES=CPU=1000,Mem=2000,Gres/gpu=3000
> AccountingStorageEnforce=associations,limits,qos,WCKey
> AccountingStorageType=accounting_storage/slurmdbd
> AccountingStoreJobComment=YES
> AccountingStorageTRES=CPU,Mem,Gres/gpu
> JobAcctGatherFrequency=30
> JobAcctGatherType=jobacct_gather/cgroup
> regards
> Markus Köberl
> --
> Markus Koeberl
> Graz University of Technology
> Signal Processing and Speech Communication Laboratory
> E-mail: markus.koeberl at tugraz.at
> --
Ing. Gonzalo Arroyo