Dear all,

Heads-up for anyone running 25.11.2 with cgroup/v2 + jobacct_gather/cgroup: there's a regression in the recent series that effectively disables memory.peak accounting for every task after the first one per slurmstepd.

The commit is f2d05c08e6 ("cgroup/v2 - Remove static qualifier from variable", ticket 23646, cherry-picked from e1835a1602). It dropped static from memory_peak_interface in cgroup_p_task_get_acct_data while leaving interfaces_checked static. The function now looks like:

bool memory_peak_interface = false;
static bool interfaces_checked = false;

...
if (!interfaces_checked) {
      memory_peak_interface = cgroup_p_has_feature(CG_MEMCG_PEAK);
      interfaces_checked = true;
}
...
if (memory_peak_interface) {
      common_cgroup_get_param(..., "memory.peak", ...);
}

interfaces_checked persists across calls; memory_peak_interface no longer does. So the feature-probe block runs exactly once. From the second call onward memory_peak_interface is a fresh stack-local set to false, and the memory.peak read in the second if is skipped. stats->memory_peak ends up INFINITE64 for every task except the first one ever sampled.
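
For anyone who wants to see the mechanism in isolation, here is a minimal standalone sketch of the same pattern. It is not the Slurm code: probe_memory_peak(), sample_peak_buggy(), count_peak_reads_buggy(), NO_PEAK and the value 4096 are all hypothetical stand-ins I made up for illustration, and the probe is hard-wired to succeed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the "no value" marker (INFINITE64 in the post). */
#define NO_PEAK UINT64_MAX

/* Hypothetical stand-in for cgroup_p_has_feature(CG_MEMCG_PEAK). */
static bool probe_memory_peak(void)
{
	return true; /* assume the kernel exposes memory.peak */
}

/* Hypothetical stand-in for the accounting function after the regression. */
static uint64_t sample_peak_buggy(void)
{
	bool memory_peak_interface = false;     /* re-initialised on every call */
	static bool interfaces_checked = false; /* persists across calls */
	uint64_t memory_peak = NO_PEAK;

	if (!interfaces_checked) {
		memory_peak_interface = probe_memory_peak();
		interfaces_checked = true;
	}

	/* Only taken on the very first call; afterwards the flag is false. */
	if (memory_peak_interface)
		memory_peak = 4096; /* made-up value "read" from memory.peak */

	return memory_peak;
}

/* Count how many of n_samples consecutive calls return a peak value. */
static int count_peak_reads_buggy(int n_samples)
{
	int hits = 0;

	assert(n_samples >= 0);
	for (int i = 0; i < n_samples; i++) {
		if (sample_peak_buggy() != NO_PEAK)
			hits++;
	}
	return hits;
}
```

Calling count_peak_reads_buggy(5) once in a fresh process should count a single hit: only the first sample sees the probe result, and every later sample returns NO_PEAK.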

User-visible effect: MaxRSS in sacct stops reflecting the kernel-tracked peak and falls back to polling memory.current, which at typical jobacct_gather intervals routinely misses short-lived peaks. That is exactly the problem memory.peak was added to solve (see the 2024 thread that prompted the original implementation: https://groups.google.com/g/slurm-users/c/BX1_4bztrJA).

The one-line fix is to put static back on memory_peak_interface. Alternatively, and arguably more robustly, drop the interfaces_checked cache entirely and run the feature probe on every call; the cost is one access()-equivalent per accounting poll, which is basically free.
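
A rough sketch of the two options, again as a standalone toy rather than a patch against the tree. probe_peak_feature(), sample_peak_static(), sample_peak_uncached(), count_peak_reads(), PEAK_UNSET and the probe_calls counter are hypothetical names of mine; the counter is only there to show how often the probe runs in each variant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the "no value" marker. */
#define PEAK_UNSET UINT64_MAX

/* Counts probe invocations so the two options can be compared. */
static int probe_calls = 0;

/* Hypothetical stand-in for cgroup_p_has_feature(CG_MEMCG_PEAK). */
static bool probe_peak_feature(void)
{
	probe_calls++;
	return true; /* assume the kernel exposes memory.peak */
}

/* Option 1: both variables static, so the cached result survives. */
static uint64_t sample_peak_static(void)
{
	static bool memory_peak_interface = false;
	static bool interfaces_checked = false;

	if (!interfaces_checked) {
		memory_peak_interface = probe_peak_feature();
		interfaces_checked = true;
	}
	return memory_peak_interface ? 4096 : PEAK_UNSET; /* made-up value */
}

/* Option 2: no cache at all, probe on every call. */
static uint64_t sample_peak_uncached(void)
{
	bool memory_peak_interface = probe_peak_feature();

	return memory_peak_interface ? 4096 : PEAK_UNSET; /* made-up value */
}

/* Count how many of n_samples consecutive calls return a peak value. */
static int count_peak_reads(uint64_t (*sample)(void), int n_samples)
{
	int hits = 0;

	assert(n_samples >= 0);
	for (int i = 0; i < n_samples; i++) {
		if (sample() != PEAK_UNSET)
			hits++;
	}
	return hits;
}
```

In this toy both variants return a peak on every sample. The only difference is the probe count: once in total for the static version, once per sample for the uncached one.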

The original v2 implementation in ba34862a40 was great, thanks for that. This looks like a small static-qualifier slip on the way through a cleanup.

Cheers,

Emyr James
Head of Scientific IT
CRG - Centre for Genomic Regulation