[slurm-users] Jobs exiting together

Alexander Silva alex.msilva20 at gmail.com
Fri Jan 19 08:36:38 UTC 2024


I recently built an HPC cluster with Slurm as the workload manager. Test
jobs with quantum chemistry codes worked fine. However, production jobs
with LAMMPS show unexpected behavior: when the first job on a compute node
completes, whether normally or not, the other jobs running on the same
node are terminated. Initially, I thought this was due to an MPI
malfunction, but the same behavior is also observed with the serial LAMMPS
code. The LAMMPS group told me that this behavior could be caused by Slurm.
My question is: which parameter in slurm.conf could be responsible for the
termination of the other jobs? I am using an epilog script that works
normally on another cluster.

Thanks.