[slurm-users] srun: error: io_init_msg_unpack: unpack error
Ole Holm Nielsen
Ole.H.Nielsen at fysik.dtu.dk
Tue Aug 9 06:02:30 UTC 2022
On 09-08-2022 01:11, David Magda wrote:
> On Aug 6, 2022, at 15:13, Chris Samuel <chris at csamuel.org> wrote:
>> It's also safe to restart slurmd's with running jobs, though you may want to drain them before that so slurmctld won't try and send them a job in the middle.
>
> My testing has shown that this is not the case: any jobs that are running are killed with signal 15 if I do a 'systemctl restart slurmd' or 'service slurmd restart'. Is there some flag in slurm.conf that allows for uninterruption of jobs?
We have never had any issues with restarting slurmd while jobs are
running. AFAIK we don't have to configure anything to obtain this
behavior. We use an RPM installation of Slurm, so maybe your /opt/slurm
link is causing problems?
When your jobs get killed as you experienced, what's logged to the node's
slurmd.log and the controller's slurmctld.log?
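
If you are unsure where those two logs live on your system, here is a
minimal sketch for looking them up. It assumes that "scontrol show
config" prints the SlurmdLogFile and SlurmctldLogFile settings as
"Key = value" lines; the helper name log_path and the grep pattern are
only illustrative, and the configured value may not be a plain file
path on every site.

```shell
#!/bin/sh
# Sketch: extract a log file path from `scontrol show config` output.
# Expects lines on stdin of the form:
#   SlurmdLogFile           = /var/log/slurm/slurmd.log
# log_path is a hypothetical helper name, not part of Slurm.
log_path() {
    # Field 1 is the key, field 2 is "=", field 3 is the value.
    awk -v key="$1" '$1 == key { print $3 }'
}

# Usage on a compute node (requires Slurm, so left commented out here):
#   slurmd_log=$(scontrol show config | log_path SlurmdLogFile)
#   grep -i -E 'signal|kill|terminat' "$slurmd_log" | tail -n 50
#
# And on the controller:
#   ctld_log=$(scontrol show config | log_path SlurmctldLogFile)
#   grep -i -E 'signal|kill|terminat' "$ctld_log" | tail -n 50
```

The lines logged around the time of the slurmd restart are the
interesting ones to post here.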
/Ole