[slurm-users] Job cannot start on slurm v18.08.0pre2

Artem Polyakov artpol84 at gmail.com
Tue Aug 21 16:02:28 MDT 2018


Hello,

I can comment from the PMIx/UCX perspective.
Do you have "MpiDefault=pmix" set in your slurm.conf, or did you pass
"--mpi=pmix" to your srun command? If not, you are not running PMIx and
therefore not UCX either (UCX support exists only in the PMIx plugin).
The log output you provided seems to confirm this: I don't see any traces
of the PMIx plugin.
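
A quick way to check this is sketched below. It is only an illustration: the
helper name mpi_default and the fallback path /etc/slurm/slurm.conf are
assumptions, so adjust them for your installation. The slurm.conf key that
selects the default MPI plugin is MpiDefault; the helper prints its value, or
an empty line if the key is not set.

```shell
#!/bin/sh
# Hypothetical helper: print the MpiDefault value from a slurm.conf-style file.
# Comments are stripped first; the key is matched case-insensitively.
mpi_default() {
    sed -e 's/#.*//' "$1" |
        awk -F= 'tolower($1) ~ /^[ \t]*mpidefault[ \t]*$/ {
                     gsub(/[ \t]/, "", $2); v = $2
                 }
                 END { print v }'
}

# Assumed location of slurm.conf; set SLURM_CONF to override.
conf="${SLURM_CONF:-/etc/slurm/slurm.conf}"
if [ -r "$conf" ]; then
    echo "MpiDefault in $conf: $(mpi_default "$conf")"
fi
```

If the value is empty or "none", the PMIx plugin is used only when requested
explicitly, for example with "srun --mpi=pmix sleep 30" inside the batch
script. Running "srun --mpi=list" shows which MPI plugin types your build
actually provides.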

On Fri, Aug 17, 2018 at 20:43, zhangtao102019 at 126.com <zhangtao102019 at 126.com> wrote:

> Hi,
> I have installed SLURM 18.08.0-0pre2 on my cluster based on RHEL7.4
> (x86_64).
> My configure parameters look like this:
> ./configure --prefix=/opt/slurm17 --with-munge=/opt/munge
> --with-pmix=/opt/pmix --with-ucx=/opt/openucx --with-hwloc=/usr
> (openucx version is 1.5.0, pmix version is 3.0.0, hwloc version is 1.11.8)
>
> After completing the installation and configuration, Slurm appeared to
> be working normally. But when I submitted a simple test job with "sbatch
> sleep.sh" (the script just calls "srun sleep 30" on a single compute node),
> the job (ID=1032) showed state R, yet it did not actually start on the
> compute node (no process was found).
>
> Attached are the output logs of the compute node and the management
> node.
> I can't tell whether the cause of this problem is related to the configure
> options I specified (such as pmix, ucx); I have never seen anything
> similar in earlier versions.
> Has anyone else run into a similar problem? How can I solve
> it?
>
> Best regards
>
> ------------------------------
> zhangtao102019 at 126.com
>


-- 
Best regards, Artem Y. Polyakov