[slurm-users] Having Issue in Slurm cluster setup

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Tue Apr 9 10:14:42 UTC 2019


Your slurm.conf node definition doesn't specify the node's physical memory
(RealMemory), so Slurm falls back to the default value of 1 MB:

NodeName=ozd2485u Gres=gpu:2 Sockets=2 CoresPerSocket=14 ThreadsPerCore=2 State=UNKNOWN

See "man slurm.conf":

    RealMemory
               Size of real memory on the node in megabytes (e.g. "2048").
               The default value is 1.

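As a rough sketch of the fix, the node definition could be extended as shown
below. The value 128000 is only a placeholder (an assumption, not the real
memory size of your node). Running "slurmd -C" on the compute node prints the
hardware that slurmd detects, including a RealMemory value in MB that you can
copy into slurm.conf.

```
# slurm.conf (sketch): the same node definition with RealMemory added.
# 128000 is a placeholder in MB; replace it with the value for your node.
NodeName=ozd2485u Gres=gpu:2 Sockets=2 CoresPerSocket=14 ThreadsPerCore=2 RealMemory=128000 State=UNKNOWN
```

After editing slurm.conf, the daemons need to pick up the new configuration
(typically by restarting slurmctld and slurmd). "scontrol show node" should
then report the new RealMemory instead of 1.
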
On 4/9/19 8:47 AM, sudhagar s wrote:
> I am attaching my slurm.conf file. Can you please help me find the issue?
> 
> On Tue, Apr 9, 2019 at 12:08 PM Ole Holm Nielsen
> <Ole.H.Nielsen at fysik.dtu.dk> wrote:
> 
>     On 09-04-2019 08:33, sudhagar s wrote:
>      > Thanks Ole,
>      >
>      > When I run "scontrol show node" it lists the details, where I can
>      > see RealMemory=1. Will this be a problem?
> 
>     In your "scontrol show node" image I read RealMemory=1 (units of MB)
>     and mem=1M.  I think you configured slurm.conf incorrectly.
> 
>      > On Tue, Apr 9, 2019 at 11:53 AM Ole Holm Nielsen
>      > <Ole.H.Nielsen at fysik.dtu.dk> wrote:
>      >
>      >     On 09-04-2019 07:37, sudhagar s wrote:
>      >      > Hi, I am a newbie in Slurm, trying to set up a cluster for ML
>      >      > training purposes. I created a control node and a compute
>      >      > node. Both are up and running.
>      >      >
>      >      > When I enter "srun -N 1 hostname" it says
>      >      > "srun error memory specification can not be satisfied"
>      >      > "unable to allocate resources: requested node configuration
>      >      > is not available"
>      >      >
>      >      > How can I fix this?
>      >
>      >     Probably you made some errors in configuring slurm.conf.
>      >     Look at your NodeName and PartitionName definitions to figure
>      >     out why the resources are incorrect.
>      >
>      >     /Ole
