<div dir="auto"><div>This may be due to this commit:<div dir="auto"><a href="https://github.com/SchedMD/slurm/commit/ee2813870fed48827aa0ec99e1b4baeaca710755" rel="noreferrer noreferrer" target="_blank">https://github.com/SchedMD/slurm/commit/ee2813870fed48827aa0ec99e1b4baeaca710755</a><br></div><div dir="auto"><br></div>It seems the behavior was changed from a fatal error to something different when cgroup device constraining is turned on in cgroup.conf without the proper configuration file.</div><div dir="auto"><br></div><div dir="auto">If you do not really need to constrain devices, then remove ConstrainDevices=yes.</div><div dir="auto"><br></div><div dir="auto">Regards</div><div dir="auto"><div class="gmail_extra" dir="auto"><br><div class="gmail_quote">On 1 Nov 2018 at 6:51 PM, "Bas van der Vlies" <<a href="mailto:bas.vandervlies@surfsara.nl" rel="noreferrer noreferrer" target="_blank">bas.vandervlies@surfsara.nl</a>> wrote:<br type="attribution"><blockquote class="m_-3461361175065539951m_-5462461812281280260quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">OK, if we change:<br>
* TaskPlugin=task/affinity,task/cgroup<br>
<br>
to:<br>
* TaskPlugin=task/affinity<br>
<br>
then the pmi2 interface works. I am investigating this further.<div class="m_-3461361175065539951m_-5462461812281280260elided-text"><br>
<br>
On 31/10/2018 08:26, Bas van der Vlies wrote:<br>
> I am migrating from Torque/Moab to SLURM.<br>
> <br>
> I have installed slurm 18.03 and am trying to run an MPI program with the <br>
> pmi2 interface.<br>
> <br>
> {{{<br>
> ~/mpitest> srun --mpi=list<br>
> srun: MPI types are...<br>
> srun: none<br>
> srun: openmpi<br>
> srun: pmi2<br>
> }}}<br>
> <br>
> The none and openmpi interfaces work, but the pmi2 interface crashes the <br>
> slurmstepd. Have I missed a setting, or is this a bug?<br>
> <br>
> {{{<br>
> (gdb) thread apply all bt<br>
> <br>
> Thread 6 (Thread 0x2b9ce9b8b700 (LWP 21945)):<br>
> #0 pthread_cond_timedwait@@GLIBC_2.3.2 () at <br>
> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225<br>
> #1 0x00002b9ce5c7862b in ?? () from <br>
> /usr/lib/x86_64-linux-gnu/slurm/libslurmfull.so<br>
> #2 0x00002b9ce6c08494 in start_thread (arg=0x2b9ce9b8b700) at <br>
> pthread_create.c:333<br>
> #3 0x00002b9ce6f06acf in clone () at <br>
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:97<br>
> <br>
> Thread 5 (Thread 0x2b9ce9c8c700 (LWP 21946)):<br>
> #0 0x00002b9ce6efd67d in poll () at ../sysdeps/unix/syscall-template.S:84<br>
> #1 0x00002b9ce5d16cfb in slurm_eio_handle_mainloop () from <br>
> /usr/lib/x86_64-linux-gnu/slurm/libslurmfull.so<br>
> #2 0x00005631c29f69f6 in ?? ()<br>
> #3 0x00002b9ce6c08494 in start_thread (arg=0x2b9ce9c8c700) at <br>
> pthread_create.c:333<br>
> #4 0x00002b9ce6f06acf in clone () at <br>
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:97<br>
> <br>
> Thread 4 (Thread 0x2b9ceaedb700 (LWP 21948)):<br>
> #0 0x00002b9ce6efd67d in poll () at ../sysdeps/unix/syscall-template.S:84<br>
> #1 0x00002b9cea2a8f52 in ?? () from <br>
> /usr/lib/x86_64-linux-gnu/slurm//task_cgroup.so<br>
> #2 0x00002b9ce6c08494 in start_thread (arg=0x2b9ceaedb700) at <br>
> pthread_create.c:333<br>
> #3 0x00002b9ce6f06acf in clone () at <br>
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:97<br>
> <br>
> Thread 3 (Thread 0x2b9ceadda700 (LWP 21947)):<br>
> #0 0x00002b9ce6efd67d in poll () at ../sysdeps/unix/syscall-template.S:84<br>
> #1 0x00002b9ce5d16cfb in slurm_eio_handle_mainloop () from <br>
> /usr/lib/x86_64-linux-gnu/slurm/libslurmfull.so<br>
> #2 0x00002b9ceaac7355 in ?? () from <br>
> /usr/lib/x86_64-linux-gnu/slurm//mpi_pmi2.so<br>
> #3 0x00002b9ce6c08494 in start_thread (arg=0x2b9ceadda700) at <br>
> pthread_create.c:333<br>
> #4 0x00002b9ce6f06acf in clone () at <br>
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:97<br>
> <br>
> Thread 2 (Thread 0x2b9ce5ae0700 (LWP 21944)):<br>
> #0 pthread_cond_wait@@GLIBC_2.3.2 () at <br>
> ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185<br>
> #1 0x00002b9ce5c7e65d in ?? () from <br>
> /usr/lib/x86_64-linux-gnu/slurm/libslurmfull.so<br>
> #2 0x00002b9ce6c08494 in start_thread (arg=0x2b9ce5ae0700) at <br>
> pthread_create.c:333<br>
> #3 0x00002b9ce6f06acf in clone () at <br>
> ../sysdeps/unix/sysv/linux/x86_64/clone.S:97<br>
> <br>
> Thread 1 (Thread 0x2b9ce59dd080 (LWP 21943)):<br>
> #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51<br>
> #1 0x00002b9ce6e5242a in __GI_abort () at abort.c:89<br>
> #2 0x00002b9ce6e8ec00 in __libc_message (do_abort=do_abort@entry=2, <br>
> fmt=fmt@entry=0x2b9ce6f83d98 "*** Error in `%s': %s: 0x%s ***\n")<br>
> at ../sysdeps/posix/libc_fatal.c:175<br>
> #3 0x00002b9ce6e94fc6 in malloc_printerr (action=3, str=0x2b9ce6f8094a <br>
> "free(): invalid pointer", ptr=<optimized out>,<br>
> ar_ptr=<optimized out>) at malloc.c:5049<br>
> #4 0x00002b9ce6e9580e in _int_free (av=0x2b9ce71b7b00 <main_arena>, <br>
> p=0x2b9ce71bba60 <lock>, have_lock=0) at malloc.c:3905<br>
> #5 0x00002b9ce5d1084d in slurm_xfree () from <br>
> /usr/lib/x86_64-linux-gnu/slurm/libslurmfull.so<br>
> #6 0x00002b9cea2ab0b0 in task_cgroup_devices_create () from <br>
> /usr/lib/x86_64-linux-gnu/slurm//task_cgroup.so<br>
> #7 0x00002b9cea2a5977 in task_p_pre_setuid () from <br>
> /usr/lib/x86_64-linux-gnu/slurm//task_cgroup.so<br>
> #8 0x00005631c2a04216 in task_g_pre_setuid ()<br>
> #9 0x00005631c29e713d in ?? ()<br>
> #10 0x00005631c29ec3f4 in job_manager ()<br>
> #11 0x00005631c29e9374 in main ()<br>
> }}}<br>
> <br>
> <br>
> <br>
<br>
-- <br>
Bas van der Vlies<br>
| Operations, Support & Development | SURFsara | Science Park 140 | 1098 <br>
XG Amsterdam<br>
| T +31 (0) 20 800 1300 | <a href="mailto:bas.vandervlies@surfsara.nl" rel="noreferrer noreferrer noreferrer" target="_blank">bas.vandervlies@surfsara.nl</a> | <a href="http://www.surfsara.nl" rel="noreferrer noreferrer noreferrer noreferrer" target="_blank">www.surfsara.nl</a> |<br>
<br>
</div></blockquote></div><br></div></div></div>
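For reference, a minimal sketch of the cgroup.conf settings this thread is about, assuming Slurm 18.x-era semantics (the AllowedDevicesFile path shown is the common default on many installs, not necessarily yours):

```
# cgroup.conf -- sketch; only keep ConstrainDevices if you actually need it
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
# ConstrainDevices=yes requires an allowed-devices file to be present, e.g.:
# ConstrainDevices=yes
# AllowedDevicesFile=/etc/slurm/cgroup_allowed_devices_file.conf
```

Per the advice above, dropping ConstrainDevices=yes (or providing the allowed-devices file it depends on) should avoid the crash path in task_cgroup_devices_create() seen in the backtrace.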