[slurm-users] CPU & memory usage summary for a job

Paul Edmon pedmon at cfa.harvard.edu
Sun Dec 9 08:22:59 MST 2018


This is the idea behind XDMod's SUPReMM.  It generates a ton of data, 
though, so it does not scale to very active systems (i.e. ones churning 
through tens of thousands of jobs).

https://github.com/ubccr/xdmod-supremm

-Paul Edmon-


On 12/9/2018 8:39 AM, Aravindh Sampathkumar wrote:
> Hi All.
>
> I was wondering if anybody has thought of or hacked together a way to 
> record the CPU and memory consumption of a job during its entire 
> duration and give a summary of the usage pattern within that job?
> Not the MaxRSS and CPU Time that already get reported for every job.
>
> I'm thinking more like a chart of CPU utilisation, memory usage, and 
> disk usage on a per-second basis or something like that.
>
> Asking because some of my users have no clue about the resource 
> consumption of their jobs, and just blindly ask for way more resources 
> as a "safe" option. It would be a nice way for users to learn simple 
> things like: they asked for 8 cores, but their job ran on just 1 core 
> the entire time because a library they used is limited to a single core.
> We use cgroups for process accounting and for limiting a job's CPU and 
> memory usage. We also use QoS for limiting resource reservations at 
> the user level.
>
> --
>   Aravindh Sampathkumar
>   aravindh at fastmail.com
>
>
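
The per-second record asked about in the quoted message can be pictured 
with a minimal sketch. This is not how SUPReMM collects its data; it only 
illustrates the idea of sampling a job's cgroup. The file names 
cpuacct.usage (cumulative CPU time in nanoseconds) and 
memory.usage_in_bytes are cgroup v1 kernel interfaces. The directories 
passed in, the function names, and the sampling interval are assumptions 
made for illustration, and where a job's cgroup lives depends on the site 
configuration.

```python
import time
from pathlib import Path


def read_counter(path):
    """Read a single integer from a cgroup pseudo-file."""
    return int(Path(path).read_text().strip())


def cores_in_use(samples):
    """Turn (timestamp_s, cumulative_cpu_ns) samples into the average
    number of cores kept busy during each interval between samples."""
    usage = []
    for (t0, cpu0), (t1, cpu1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt > 0:  # skip duplicate or out-of-order timestamps
            usage.append((cpu1 - cpu0) / 1e9 / dt)
    return usage


def sample_job(cpu_dir, mem_dir, count, interval=1.0):
    """Collect `count` rows of (timestamp, cpu_ns, mem_bytes).

    cpu_dir and mem_dir are hypothetical paths to the job's cpuacct and
    memory cgroup directories; they are not discovered automatically.
    """
    rows = []
    for _ in range(count):
        rows.append((
            time.time(),
            read_counter(Path(cpu_dir) / "cpuacct.usage"),
            read_counter(Path(mem_dir) / "memory.usage_in_bytes"),
        ))
        time.sleep(interval)
    return rows
```

Feeding the (timestamp, cpu_ns) pairs from sample_job into cores_in_use 
gives one value per interval. A job that requested 8 cores but stays near 
1.0 throughout would be the single-core case described in the quoted 
message.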
