[slurm-users] Slurm statesave directory -- location and management

Wed Aug 28 15:00:46 UTC 2019

The save state location is where Slurm stores its current information 
about jobs.  That location is the live data of the cluster and is what 
allows it to survive restarts of the slurmctld.  The slurmdbd holds 
near-live accounting information and is not used by the slurmctld for 
current job state.  Thus if the data in the save state location is 
corrupt or old, the slurmctld will either purge the jobs that are not 
recorded there or fail to start.

Given the critical nature of the save state location, we have taken the 
step of rsyncing it to a different location periodically.  However, be 
advised that once you restart the slurmctld with a specific save state, 
it will reconcile all the running jobs against it.  So you have one shot 
to get it right.  If you suspect that something happened to your save 
state info, don't restart the slurmds or the slurmctld until you think 
you have a good copy.  Otherwise the slurmctld will look at the save 
state you gave it, then look at the jobs it sees on the slurmds, and 
reconcile the two (namely by dropping jobs it doesn't see in both 
places).
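
A minimal sketch of a periodic copy like the one described above, written 
in Python rather than rsync, might look like the following.  This is not 
the script we run; the function name snapshot_state_dir, the "statesave-" 
prefix and the keep parameter are all made up for illustration.  It also 
assumes the copy is taken at a moment when the directory is not being 
rewritten, since a copy of files that change mid-copy may be inconsistent.

```python
import shutil
import time
from pathlib import Path


def snapshot_state_dir(state_dir, backup_root, keep=5):
    """Copy state_dir into a new timestamped directory under backup_root.

    Returns the path of the new snapshot.  Only the newest `keep`
    snapshots (by name) are retained.  All names here are hypothetical.
    """
    state_dir = Path(state_dir)
    backup_root = Path(backup_root)
    backup_root.mkdir(parents=True, exist_ok=True)

    stamp = time.strftime("%Y%m%dT%H%M%S")
    target = backup_root / f"statesave-{stamp}"
    # Avoid a name collision if two snapshots land in the same second.
    n = 1
    while target.exists():
        target = backup_root / f"statesave-{stamp}-{n}"
        n += 1

    # copytree uses copy2 by default, so file modification times survive.
    shutil.copytree(state_dir, target)

    # Timestamped names sort chronologically, so drop from the front.
    snapshots = sorted(
        p for p in backup_root.iterdir() if p.name.startswith("statesave-")
    )
    if keep > 0:
        for old in snapshots[:-keep]:
            shutil.rmtree(old)
    return target
```

Something like this could be called from cron with the directory named by 
StateSaveLocation in slurm.conf as state_dir.  Keeping several snapshots 
rather than one means a corrupt copy does not overwrite the last good one.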

In general this is one of the reasons we have opted not to do HA with 
Slurm but rather to rely on a single box with backups of the Slurm 
save state.  Our save state is on a local drive, which also helps with 
speed.  Our backup is rsynced to our Isilon and snapshotted.  That said, 
if I had to go back to one of those snapshots, I know for a fact that I 
would be losing jobs.  There really isn't much for it at that point.
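
To get a rough idea of how much would be lost by falling back to a backup 
copy, one option is to check how old its newest file is: anything 
submitted after that time cannot be in the copy.  The sketch below 
assumes the backup preserved modification times (as rsync -a does); the 
function names newest_mtime and staleness_seconds are hypothetical.

```python
import time
from pathlib import Path


def newest_mtime(directory):
    """Return the latest modification time (epoch seconds) of any file
    under directory, or None if it contains no files."""
    times = [
        p.stat().st_mtime for p in Path(directory).rglob("*") if p.is_file()
    ]
    return max(times) if times else None


def staleness_seconds(directory, now=None):
    """Return how many seconds old the newest file under directory is,
    or None if there are no files to measure."""
    newest = newest_mtime(directory)
    if newest is None:
        return None
    if now is None:
        now = time.time()
    return now - newest
```

A large value here is only a warning sign, not proof that the copy is 
unusable; it says nothing about whether the files themselves are intact.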

-Paul Edmon-

On 8/28/19 10:49 AM, David Baker wrote:
> Hello,
>
> I apologise that this email is a bit vague, however we are keen to 
> understand the role of the Slurm "StateSave" location. I can see the 
> value of the information in this location when, for example, we are 
> upgrading Slurm and the database is temporarily down, however as I 
> note above we are keen to gain a much better understanding of this 
> directory.
>
> We have two Slurm controller nodes (one of them is a backup 
> controller), and currently we have put the "StateSave" directory on 
> one of the global GPFS file stores. In other respects Slurm operates 
> independently of the GPFS file stores -- apart from the fact that if 
> GPFS fails jobs will subsequently fail. There was a GPFS failure when 
> I was away from the university. Once GPFS had been restored they 
> attempted to start Slurm, however the StateSave data was out of date. 
> They eventually restarted Slurm, however lost all the queued jobs and 
> the job sequence counter restarted at one.
>
> Am I correct in thinking that the information in the StateSave location 
> relates to the state of (a) jobs currently running on the cluster and 
> (b) jobs queued? Am I also correct in thinking that this information 
> is not stored in the slurm database? In other words if you lose the 
> statesave data or it gets corrupted then you will lose all 
> running/queued jobs?
>
> Any advice on the management and location of the statesave directory 
> in a dual controller system would be appreciated, please.
>
> Best regards,
> David