This is scary news. I just updated to 23.11.1, but so far I haven't been able to confirm the problems described. I'll do some more extensive and intensive tests. In case of disaster: does anyone know how to roll back the DB, since 23.11.1 introduces some new attributes on DB 'objects'? I have never had to do this before :-0 As we have a support contract, I would open a ticket.
-----Original Message-----
From: slurm-users slurm-users-bounces@lists.schedmd.com On Behalf Of Ole Holm Nielsen
Sent: Tuesday, 30 January 2024 10:04
To: slurm-users@lists.schedmd.com
Subject: Re: [slurm-users] after upgrade to 23.11.1 nodes stuck in completion state
On 1/30/24 09:36, Fokke Dijkstra wrote:
We had similar issues with Slurm 23.11.1 (and 23.11.2). Jobs get stuck in a completing state and slurmd daemons can't be killed because they are left in a CLOSE-WAIT state. See my previous mail to the mailing list for the details. See also https://bugs.schedmd.com/show_bug.cgi?id=18561 for another site having issues.
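For anyone who wants to check whether their own nodes show the CLOSE-WAIT symptom, here is a minimal sketch, not something taken from the thread. It assumes Linux, where /proc/net/tcp lists sockets with state code 08 for CLOSE_WAIT, and it assumes slurmd listens on the default SlurmdPort 6818. The function name count_close_wait is hypothetical.

```python
# Sketch: count TCP sockets in CLOSE-WAIT on a given local port by parsing
# text in the format of Linux's /proc/net/tcp.
# Assumptions: Linux kernel state code "08" means CLOSE_WAIT, and slurmd
# uses the default SlurmdPort 6818 (adjust if slurm.conf says otherwise).
CLOSE_WAIT = "08"


def count_close_wait(proc_net_tcp_text, local_port=6818):
    """Return the number of CLOSE-WAIT sockets bound to local_port."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        local_addr, state = fields[1], fields[3]
        # local_addr looks like "0100007F:1AA2"; the port is hex after ':'
        port = int(local_addr.rsplit(":", 1)[1], 16)
        if state == CLOSE_WAIT and port == local_port:
            count += 1
    return count
```

On a compute node you could feed it the contents of /proc/net/tcp (and /proc/net/tcp6 for IPv6 sockets, which uses the same column layout). A count that stays above zero while jobs sit in the completing state would be consistent with the behaviour described above, though it does not prove the same bug is involved.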
Bug 18561 was submitted by a user with no support contract, so it's unlikely that SchedMD will look into it.
I guess many sites are considering the upgrade to 23.11, and if there is an issue as reported, a site with a valid support contract needs to open a support case. I'm very interested in hearing about any progress with 23.11!
Thanks, Ole