Hi,
With Slurm 24.11.5, I am seeing differences for some jobs between the memory usage reported by 'seff' and the value shown by Prometheus as 'cgroup_memory_rss_bytes' (which is ultimately what 'jobstats' [1] reports). At Delft University of Technology, at least, they seem to consider the memory usage reported by 'seff' unreliable [2].
Is that indeed the case?
Cheers,
Loris
Footnotes:
[1] https://github.com/PrincetonUniversity/jobstats
[2] https://doc.dhpc.tudelft.nl/delftblue/Slurm-trouble-shooting/