[slurm-users] Memory allocation error
Mahmood Naderan
mahmood.nt at gmail.com
Wed Mar 14 02:37:19 MDT 2018
Hi again,
I tried --mem=2000M in the Slurm script and ran g09 under strace. Here are
the last lines of the output:
fstat(3, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
fstat(0, {st_mode=S_IFREG|0664, st_size=6542, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc647a3f000
read(0, "%nprocshared=2\r\n%mem=1GB\r\n# mp2/"..., 8192) = 6542
lseek(3, 0, SEEK_CUR) = 0
fstat(3, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fc647a3d000
fstat(3, {st_mode=S_IFREG|0664, st_size=0, ...}) = 0
lseek(3, 0, SEEK_SET) = 0
read(0, "", 8192) = 0
write(3, "%nprocshared=2\n%mem=1GB\n# mp2/ge"..., 5668) = 5668
close(3) = 0
munmap(0x7fc647a3d000, 8192) = 0
geteuid() = 1000
stat("/usr/local/chem/g09-64-D01/l1.exe", {st_mode=S_IFREG|0751, st_size=1673376, ...}) = 0
write(1, " Entering Gaussian System, Link "..., 212) = 212
rt_sigaction(SIGINT, {SIG_IGN, [], SA_RESTORER, 0x7fc646f69270}, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_IGN, [], SA_RESTORER, 0x7fc646f69270}, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0x7fffe75ed3b0) = 2818
wait4(2818, galloc: could not allocate memory.: Cannot allocate memory
[{WIFSIGNALED(s) && WTERMSIG(s) == SIGSEGV}], 0, NULL) = 2818
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x7fc646f69270}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x7fc646f69270}, NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=2818, si_uid=1000, si_status=SIGSEGV, si_utime=9, si_stime=4} ---
access("/home/mahmood/Gaussian/scratch/Gau-2817.inp", F_OK) = 0
unlink("/home/mahmood/Gaussian/scratch/Gau-2817.inp") = 0
exit_group(1) = ?
+++ exited with 1 +++
I think Slurm is detecting or setting the memory requirements incorrectly.
Maybe it imposes a limit, so g09 is unable to allocate the space it needs.
I say that because I can ssh directly to that node and run the program
with no error.
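One way to test this would be to print the limits the process actually sees, once from inside the batch job and once from an ssh session on the same node. The sketch below is only an illustration: the function name report_limits is hypothetical and not part of sl.sh, and it assumes bash with the standard ulimit builtin. SLURM_MEM_PER_NODE is the variable Slurm sets inside a job when --mem is given, so it should show "not set" over ssh.

```shell
#!/bin/bash
# Hypothetical diagnostic helper (not part of the original sl.sh):
# print the resource limits visible to the current shell so that the
# batch environment and a direct ssh session can be compared.
report_limits() {
    echo "host: $(uname -n)"
    # Soft limits as reported by the shell; "unlimited" means no cap.
    echo "virtual memory (ulimit -v, KiB): $(ulimit -v)"
    echo "data segment (ulimit -d, KiB): $(ulimit -d)"
    echo "stack size (ulimit -s, KiB): $(ulimit -s)"
    # Set by Slurm inside a job when --mem is requested; unset elsewhere.
    echo "SLURM_MEM_PER_NODE (MB): ${SLURM_MEM_PER_NODE:-not set}"
}

report_limits
```

Calling report_limits in sl.sh just before g09, and again by hand after ssh to compute-0-1, would show whether the two environments differ. If "ulimit -v" is a finite number in the batch job but "unlimited" over ssh, a limit applied to the job (rather than the physical memory of the node) would be the likely reason the 1GB allocation fails. Slurm can propagate the resource limits of the submitting shell to the job, so the limits on the login node may be worth checking as well.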
Any idea?
Regards,
Mahmood
On Wed, Mar 14, 2018 at 1:44 AM, Mahmood Naderan <mahmood.nt at gmail.com>
wrote:
> Excuse me, but it doesn't work. I set --mem to 2GB and put the free
> command in the script. I don't know why it failed.
>
> [mahmood at rocks7 ~]$ sbatch sl.sh
> Submitted batch job 19
> [mahmood at rocks7 ~]$ squeue
> JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
> [mahmood at rocks7 ~]$ cat test.out
> compute-0-1.local
>          total    used    free    shared    buff/cache    available
> Mem:      7.6G    127M    6.8G      8.5M          729M         7.3G
> Swap:     2.4G      0B    2.4G
> galloc: could not allocate memory.: Cannot allocate memory
> [mahmood at rocks7 ~]$ head -n 4 test.gjf
> %nprocshared=2
> %mem=1GB
> # mp2/gen pseudo=read opt freq
>
> [mahmood at rocks7 ~]$ cat sl.sh
> #!/bin/bash
> #SBATCH --output=test.out
> #SBATCH --job-name=ga-test
> #SBATCH --nodelist=compute-0-1
> #SBATCH --ntasks=1
> #SBATCH --cpus-per-task=2
> #SBATCH --mem=2GB
> hostname
> free -mh
> g09 test.gjf
> [mahmood at rocks7 ~]$
> Regards,
> Mahmood
>
>
>