[slurm-users] Memory allocation error

Mahmood Naderan mahmood.nt at gmail.com
Wed Mar 14 02:37:19 MDT 2018


Hi again,
I tried --mem=2000M in the Slurm script and ran g09 under strace.
Here are the last lines of the strace output:


fstat(3, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
fstat(0, {st_mode=S_IFREG|0664, st_size=6542, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc647a3f000
read(0, "%nprocshared=2\r\n%mem=1GB\r\n# mp2/"..., 8192) = 6542
lseek(3, 0, SEEK_CUR)                   = 0
fstat(3, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc647a3d000
fstat(3, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
lseek(3, 0, SEEK_SET)                   = 0
read(0, "", 8192)                       = 0
write(3, "%nprocshared=2\n%mem=1GB\n# mp2/ge"..., 5668) = 5668
close(3)                                = 0
munmap(0x7fc647a3d000, 8192)            = 0
geteuid()                               = 1000
stat("/usr/local/chem/g09-64-D01/l1.exe", {st_mode=S_IFREG|0751, st_size=1673376, ...}) = 0
write(1, " Entering Gaussian System, Link "..., 212) = 212
rt_sigaction(SIGINT, {SIG_IGN, [], SA_RESTORER, 0x7fc646f69270}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_IGN, [], SA_RESTORER, 0x7fc646f69270}, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7fffe75ed3b0) = 2818
wait4(2818, galloc:  could not allocate memory.: Cannot allocate memory
[{WIFSIGNALED(s) && WTERMSIG(s) == SIGSEGV}], 0, NULL) = 2818
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7fc646f69270}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x7fc646f69270}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=2818, si_uid=1000, si_status=SIGSEGV, si_utime=9, si_stime=4} ---
access("/home/mahmood/Gaussian/scratch/Gau-2817.inp", F_OK) = 0
unlink("/home/mahmood/Gaussian/scratch/Gau-2817.inp") = 0
exit_group(1)                           = ?
+++ exited with 1 +++



I think Slurm detects or sets the memory limit incorrectly. Perhaps it
imposes a limit that prevents g09 from allocating the memory it needs. I
say that because I can ssh directly to that node and run the program with
no error.
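
One way to test this guess would be to print the limits that the job
actually sees and compare them with an interactive ssh session on the same
node. The following is only a sketch, assuming bash on the compute node;
report_limits is a hypothetical helper name, and I have not established
which limit (if any) is responsible.

```shell
#!/bin/bash
# Hypothetical helper: print the memory-related limits of the current shell.
# Run it once inside the batch script (before g09) and once in an ssh
# session on the same node, then compare the two outputs.
report_limits() {
    echo "host: $(uname -n)"
    echo "virtual memory (ulimit -v): $(ulimit -v)"
    echo "max locked memory (ulimit -l): $(ulimit -l)"
    echo "stack size (ulimit -s): $(ulimit -s)"
    # Slurm exports SLURM_MEM_PER_NODE (in MB) when --mem was requested;
    # outside a job it is normally unset.
    echo "SLURM_MEM_PER_NODE: ${SLURM_MEM_PER_NODE:-unset}"
}

report_limits
```

If a value differs between the two runs (for example a finite "virtual
memory" limit in the job but "unlimited" over ssh), that limit would be the
first thing to look at.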

Any ideas?



Regards,
Mahmood



On Wed, Mar 14, 2018 at 1:44 AM, Mahmood Naderan <mahmood.nt at gmail.com>
wrote:

> Excuse me, but it doesn't work. I set --mem to 2GB and put the free
> command in the script. I don't know why it failed.
>
> [mahmood@rocks7 ~]$ sbatch sl.sh
> Submitted batch job 19
> [mahmood@rocks7 ~]$ squeue
>              JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
> [mahmood@rocks7 ~]$ cat test.out
> compute-0-1.local
>               total        used        free      shared  buff/cache   available
> Mem:           7.6G        127M        6.8G        8.5M        729M        7.3G
> Swap:          2.4G          0B        2.4G
> galloc:  could not allocate memory.: Cannot allocate memory
> [mahmood@rocks7 ~]$ head -n 4 test.gjf
> %nprocshared=2
> %mem=1GB
> # mp2/gen pseudo=read opt freq
>
> [mahmood@rocks7 ~]$ cat sl.sh
> #!/bin/bash
> #SBATCH --output=test.out
> #SBATCH --job-name=ga-test
> #SBATCH --nodelist=compute-0-1
> #SBATCH --ntasks=1
> #SBATCH --cpus-per-task=2
> #SBATCH --mem=2GB
> hostname
> free -mh
> g09 test.gjf
> [mahmood@rocks7 ~]$
> Regards,
> Mahmood
>
>
>