[slurm-users] Jobs exiting together
Alexander Silva
alex.msilva20 at gmail.com
Fri Jan 19 08:36:38 UTC 2024
Recently, I built an HPC cluster with Slurm as the workload manager. Test
jobs with quantum chemistry codes have worked fine. However, production
jobs with LAMMPS show unexpected behavior: when the first job completes,
normally or not, it causes the termination of the other jobs on the same
compute node. Initially, I thought this was due to an MPI malfunction,
but the behavior is also observed with the serial LAMMPS code. The LAMMPS
group told me that this behavior could be caused by Slurm. My question is
which parameter in slurm.conf could be responsible for the termination of
the other jobs. I am using an epilog script that works normally on
another cluster.
Thanks.