[slurm-users] [External] sbatch: error: memory allocation failure

Prentice Bisbal pbisbal at pppl.gov
Thu Jun 17 19:45:08 UTC 2021


Mike,

You didn't include your entire sbatch script, so it's really hard to say 
what's going wrong when we only have a single line to work with. Based 
on what you have told us, I'm guessing you are specifying a per-node 
memory requirement greater than 128000 MB, the RealMemory of your 
smaller nodes. When you specify a nodelist, Slurm will assign your job 
to all of those nodes, not to a subset that happens to match the other 
job specifications (--mem, --mem-per-cpu, --ntasks, etc.):

> -w, --nodelist=<node name list>
>     Request a specific list of hosts. The job will contain all of
>     these hosts and possibly additional hosts as needed to satisfy
>     resource requirements.
>
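
For example (illustrative numbers only, since we can't see the rest of 
your script), a combination like this forces the job onto every listed 
node, so the per-node memory request has to fit the smallest node in 
the list:

    #SBATCH --nodelist=compute[001-014]   # job must include ALL of these nodes
    #SBATCH --mem=200000                  # example value only, in MB; exceeds the
                                          # 128000 MB on compute[001-006]

Either lower --mem so it fits the smallest node in the list, or drop 
--nodelist and let Slurm choose nodes that can satisfy the memory request.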

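You can also double-check how much memory Slurm has configured for each 
node (and compare it to whatever --mem your script requests) with 
something like:

    sinfo -N -o "%N %m"                              # node name and memory in MB
    scontrol show node compute001 | grep RealMemory
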
Prentice

On 6/7/21 7:46 PM, Yap, Mike wrote:
>
> Hi All
>
> Can anyone advise on the possible causes of the error message below
> when submitting a job?
>
> sbatch: error: memory allocation failure
>
> The same script used to work perfectly fine until I included #SBATCH
> --nodelist=(compute[015-046]) (once removed, it works as it should).
>
> The issues:
>
>  1. For the current setup, I have specific resources available for
>     each compute node:
>      a. NodeName=compute[007-014] Procs=36 CoresPerSocket=18
>         RealMemory=384000 ThreadsPerCore=1 Boards=1 SocketsPerBoard=2
>         – newer model
>      b. NodeName=compute[001-006] Procs=16 CoresPerSocket=18
>         RealMemory=128000 ThreadsPerCore=1 Boards=1 SocketsPerBoard=2
>  2. I have the same resources shared between multiple queues (working
>     fine)
>  3. When running a parallel job, the exact same job runs when assigned
>     to a single node category (i.e. exclusively on 1a or 1b)
>  4. When running the exact same job but assigned across both 1a and 1b,
>     the job runs on the 1b nodes but shows no activity on 1a
>
> Any suggestions?
>
> Thanks
>
> Mike
>

