[slurm-users] CPU & memory usage summary for a job

Renfro, Michael Renfro at tntech.edu
Sun Dec 9 09:21:22 MST 2018


For the simpler questions (for the overall job step, not real-time), you can run 'sacct --format=all' to get data on completed jobs, and then:

- compare the MaxRSS column to the ReqMem column to see how far off their memory request was
- compare the TotalCPU column to the product of NCPUS and ElapsedRaw to see how far off their core request was
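
As a rough illustration of those two comparisons, here is a minimal Python sketch. It assumes you have already pulled the fields for one job step with something like 'sacct -j <jobid> -P --format=JobID,MaxRSS,ReqMem,TotalCPU,NCPUS,ElapsedRaw'. The function names are my own, not part of Slurm, and the parsing assumes the usual sacct output shapes: memory as a number with a K/M/G/T suffix (plus an optional trailing c or n on ReqMem for per-CPU or per-node requests, as printed by older releases), and TotalCPU as [DD-[HH:]]MM:SS[.mmm]. A bare number with no suffix is treated as bytes, which is an assumption on my part.

```python
import re

# Binary multipliers for the unit suffixes sacct prints on memory fields.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_mem(value):
    """Return (bytes, scope) for strings like '1234K', '4Gn' or '512Mc'.

    scope is 'c' (per CPU), 'n' (per node) or '' when no suffix is present.
    Assumption: a number without a unit suffix is in bytes.
    """
    match = re.fullmatch(r"([\d.]+)([KMGT]?)([cn]?)", value.strip())
    if not match:
        raise ValueError(f"unrecognised memory value: {value!r}")
    number, unit, scope = match.groups()
    return float(number) * _UNITS[unit], scope


def parse_cpu_time(value):
    """Convert a [DD-[HH:]]MM:SS[.mmm] string to seconds."""
    days = 0
    if "-" in value:
        day_part, value = value.split("-", 1)
        days = int(day_part)
    parts = [float(p) for p in value.split(":")]
    while len(parts) < 3:  # pad missing hours (and minutes) with zero
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return days * 86400 + hours * 3600 + minutes * 60 + seconds


def cpu_efficiency(total_cpu, ncpus, elapsed_raw):
    """Fraction of the allocated core-seconds that was actually used."""
    available = ncpus * elapsed_raw
    if available <= 0:
        raise ValueError("NCPUS * ElapsedRaw must be positive")
    return parse_cpu_time(total_cpu) / available


def mem_efficiency(max_rss, req_mem, ncpus):
    """Fraction of the requested memory that MaxRSS reached.

    A per-CPU request (suffix 'c') is scaled by ncpus; a per-node request
    is used as-is, which is only a fair comparison for single-node jobs.
    """
    used, _ = parse_mem(max_rss)
    requested, scope = parse_mem(req_mem)
    if scope == "c":
        requested *= ncpus
    return used / requested


if __name__ == "__main__":
    # A job that asked for 8 cores and 4G but ran single-threaded for an hour.
    print(cpu_efficiency("01:00:00", 8, 3600))
    print(mem_efficiency("1048576K", "4Gn", 8))
```

A CPU figure near 1/NCPUS is the "asked for 8 cores, used 1" case from the original question. The memory figure is only a ceiling check, since MaxRSS is a peak and says nothing about the pattern over time.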

> On Dec 9, 2018, at 7:39 AM, Aravindh Sampathkumar <aravindh at fastmail.com> wrote:
> 
> Hi All.
> 
> I was wondering if anybody has thought of or hacked around a way to record CPU and memory consumption of a job during its entire duration and give a summary of the usage pattern within that job? 
> Not the MaxRSS and CPU Time that already gets reported for every job. 
> 
> I'm thinking more like a chart of CPU utilisation, memory usage, and disk usage on a per second basis or something like that. 
> 
> Asking because some of my users have no clue about the resource consumption of their jobs, and just blindly ask for way more resources as a "safe" option. It would be a nice way for users to learn simple things, for example that they asked for 8 cores but their job ran on just 1 core the entire time because a library they used is limited to a single core.
> We use cgroups for process accounting and for limiting each job's CPU and memory usage. We also use QoS for limiting resource reservations at the user level.
> 
> --
>   Aravindh Sampathkumar
>   aravindh at fastmail.com


