[slurm-users] srun: error: io_init_msg_unpack: unpack error

Ole Holm Nielsen Ole.H.Nielsen at fysik.dtu.dk
Tue Aug 9 06:02:30 UTC 2022


On 09-08-2022 01:11, David Magda wrote:
> On Aug 6, 2022, at 15:13, Chris Samuel <chris at csamuel.org> wrote:
>> It's also safe to restart slurmd's with running jobs, though you may want to drain them before that so slurmctld won't try and send them a job in the middle.
> 
> My testing has shown that this is not the case: any jobs that are running are killed with signal 15 if I do a 'systemctl restart slurmd' or 'service slurmd restart'. Is there some flag in slurm.conf that allows for uninterruption of jobs?

We have never had any issues with restarting slurmd while jobs are
running.  AFAIK we don't have to configure anything to obtain this
behavior.  We use an RPM installation of Slurm, so maybe your
/opt/slurm link is causing problems?

When your jobs get killed as you described, what is logged in the
node's slurmd.log and the controller's slurmctld.log?
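
If the log locations are not known, the following is a minimal sketch of
one way to find and read them.  It assumes that "scontrol show config"
prints "Key = Value" lines that include SlurmdLogFile and
SlurmctldLogFile; the helper name slurm_conf_value is hypothetical and
not part of Slurm.  If the daemons log to syslog instead of a file, the
reported value will not be a readable path and nothing is printed for it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): read "Key = Value" lines on stdin,
# as printed by `scontrol show config`, and print the value for the given key.
slurm_conf_value() {
    awk -v key="$1" '{
        i = index($0, "=")              # split on the first "=" only
        if (i == 0) next                # skip lines without a key/value pair
        k = substr($0, 1, i - 1)
        v = substr($0, i + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", k)  # trim surrounding whitespace
        gsub(/^[ \t]+|[ \t]+$/, "", v)
        if (k == key) print v
    }'
}

# Only attempt the lookup where the Slurm client tools are installed.
if command -v scontrol >/dev/null 2>&1; then
    for key in SlurmdLogFile SlurmctldLogFile; do
        log=$(scontrol show config | slurm_conf_value "$key")
        echo "$key: $log"
        # The file exists only on the host that runs the matching daemon.
        if [ -r "$log" ]; then
            tail -n 100 "$log"
        fi
    done
fi
```

Running this on the compute node right after the restart, and again on
the controller, should show the most recent lines from each log.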

/Ole


