[slurm-users] Segfault with 32 processes, OK with 30 ???
Diego Zuccato
diego.zuccato at unibo.it
Mon Oct 5 11:05:15 UTC 2020
Hello all.
I'm seeing this weird issue again.
The same executable crashes immediately when launched with 32 processes,
but runs flawlessly with only 30 processes.
The reported error is:
[str957-bl0-03:05271] *** Process received signal ***
[str957-bl0-03:05271] Signal: Segmentation fault (11)
[str957-bl0-03:05271] Signal code: Address not mapped (1)
[str957-bl0-03:05271] Failing at address: 0x7f3826fb4008
[str957-bl0-03:05271] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730)[0x7f3825df6730]
[str957-bl0-03:05271] [ 1] /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so(+0x2936)[0x7f3824553936]
[str957-bl0-03:05271] [ 2] /usr/lib/x86_64-linux-gnu/libmca_common_dstore.so.1(pmix_common_dstor_init+0x9d3)[0x7f382452a733]
[str957-bl0-03:05271] [ 3] /usr/lib/x86_64-linux-gnu/pmix/lib/pmix/mca_gds_ds21.so(+0x25b4)[0x7f38245535b4]
[str957-bl0-03:05271] [ 4] /usr/lib/x86_64-linux-gnu/libpmix.so.2(pmix_gds_base_select+0x12e)[0x7f382467946e]
[str957-bl0-03:05271] [ 5] /usr/lib/x86_64-linux-gnu/libpmix.so.2(pmix_rte_init+0x8cd)[0x7f382463188d]
[str957-bl0-03:05271] [ 6] /usr/lib/x86_64-linux-gnu/libpmix.so.2(PMIx_Init+0xdc)[0x7f38245edd7c]
[str957-bl0-03:05271] [ 7] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pmix_ext2x.so(ext2x_client_init+0xc4)[0x7f38246e9fe4]
[str957-bl0-03:05271] [ 8] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_ess_pmi.so(+0x2656)[0x7f3826fb9656]
[str957-bl0-03:05271] [ 9] /usr/lib/x86_64-linux-gnu/libopen-rte.so.40(orte_init+0x29a)[0x7f3825b8011a]
[str957-bl0-03:05271] [10] /usr/lib/x86_64-linux-gnu/libmpi.so.40(ompi_mpi_init+0x252)[0x7f3825e50e62]
[str957-bl0-03:05271] [11] /usr/lib/x86_64-linux-gnu/libmpi.so.40(MPI_Init+0x6e)[0x7f3825e7f17e]
[str957-bl0-03:05271] [12] ./C-GenIC(+0x23b9)[0x55bf9fa8e3b9]
[str957-bl0-03:05271] [13] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7f3825c4709b]
[str957-bl0-03:05271] [14] ./C-GenIC(+0x251a)[0x55bf9fa8e51a]
[str957-bl0-03:05271] *** End of error message ***
In the past, merely installing gdb to try to debug it made the problem
disappear, which is obviously not a real solution.
Any hints?
TIA
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786