[slurm-users] Repost: Odd sacct behavior?

Thu May 3 06:59:50 MDT 2018

Hello all,

I apologize in advance if this message was already seen by users
outside of the slurm Google Groups page; I didn't see an actual
delivery (probably because the message was marked as spam), so
I'm reposting.

Anyways...

I'm just wondering if anyone is able to reproduce the behavior I'm
seeing with `sacct`, or if anyone has experienced it previously.

In a nutshell, I can usually query jobs that ran on specific nodes
with something like the following:

`sacct -o $OPTIONLIST -N nodename -S START -E END -s r`

Up until today, this has never failed and the results were what I
expected.  However, when I attempted to query a job from ~10 days ago
using the same form of command, I got no output at all and the
following message appeared in the slurmdbd log:

error: Problem getting jobs for cluster $CLUSTERNAME

There is no issue querying the job ID directly.

Looking through my history, I've been able to query jobs in
the same exact fashion described above, but usually the jobs I'm looking
at are less than a week old, and output is returned.

If I exclude a nodename or nodelist, and keep the start and end times,
I get results, and no error is returned:

`sacct -o $OPTIONLIST -S START -E END -s r`
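
For anyone trying to reproduce this, here is a rough sketch of the two
query forms with concrete values filled in.  The field list, node name
and time window are made-up placeholders (not the values I actually
used), and the helper `sacct_cmd` is just a hypothetical convenience
that prints the command lines instead of running them:

```shell
#!/bin/sh
# All values below are hypothetical placeholders; substitute your own.
OPTIONLIST="jobid,user,nodelist,start,end,state"   # assumed field list
NODE="node001"                                     # made-up node name
START="2018-04-22T00:00:00"                        # made-up window start
END="2018-04-24T00:00:00"                          # made-up window end

# Print the sacct command line for the window above.
# $1 = node name; pass an empty string to omit the -N filter.
sacct_cmd() {
    if [ -n "$1" ]; then
        printf 'sacct -o %s -N %s -S %s -E %s -s r\n' \
            "$OPTIONLIST" "$1" "$START" "$END"
    else
        printf 'sacct -o %s -S %s -E %s -s r\n' \
            "$OPTIONLIST" "$START" "$END"
    fi
}

sacct_cmd "$NODE"   # node-filtered form (the one that fails for me)
sacct_cmd ""        # unfiltered form (the one that returns results)
```

Piping the output to `sh` runs both queries back to back, which makes
it easy to compare them over the same window.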

I also queried the DB directly and was able to retrieve the
information, so it doesn't appear to be an issue with purged records.
Restarting MySQL and slurmdbd didn't bring any improvement.

So, has anyone else run into a similar issue?

I'm using slurm 16.05.10-2 and slurmdbd 16.05.10-2.

Thanks,
John DeSantis