[slurm-users] CPU & memory usage summary for a job

Jacob Jenson jacob at schedmd.com
Mon Dec 10 09:50:02 MST 2018


Would job profiling with HDF5 work as well?
https://slurm.schedmd.com/hdf5_profile_user_guide.html
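The samples can come from HDF5 profiling, once the per-node files are merged and exported with `sh5util`, or from a home-grown sampler. Either way, turning them into the summary asked for in the quoted message is mostly arithmetic. The average number of cores a job kept busy is the CPU-seconds it consumed divided by the wall-clock seconds that elapsed. The sketch below assumes each sample carries a timestamp, cumulative CPU time, and resident memory. The `Sample` layout and the `summarize` function are hypothetical and do not reflect Slurm's HDF5 schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """One profiling sample (hypothetical layout, not Slurm's HDF5 schema)."""
    time_s: float       # wall-clock timestamp in seconds
    cpu_time_s: float   # cumulative CPU seconds used by the job so far
    rss_bytes: int      # resident memory at this moment


def summarize(samples: List[Sample], alloc_cores: int, alloc_mem_bytes: int) -> dict:
    """Reduce a time series of samples to a per-job usage summary."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to measure a rate")
    samples = sorted(samples, key=lambda s: s.time_s)
    elapsed = samples[-1].time_s - samples[0].time_s
    if elapsed <= 0:
        raise ValueError("samples must span a positive time interval")
    cpu_used = samples[-1].cpu_time_s - samples[0].cpu_time_s
    avg_cores = cpu_used / elapsed
    # Cores busy in each interval show whether the job was ever parallel.
    interval_cores = [
        (b.cpu_time_s - a.cpu_time_s) / (b.time_s - a.time_s)
        for a, b in zip(samples, samples[1:])
        if b.time_s > a.time_s
    ]
    peak_rss = max(s.rss_bytes for s in samples)
    return {
        "avg_cores": avg_cores,
        "peak_cores": max(interval_cores),
        "cpu_efficiency": avg_cores / alloc_cores,
        "peak_rss_bytes": peak_rss,
        "mem_efficiency": peak_rss / alloc_mem_bytes,
    }
```

For example, three samples 30 s apart from a job that asked for 8 cores but accumulated exactly one CPU-second per wall-clock second give an `avg_cores` of 1.0 and a `cpu_efficiency` of 0.125. That is the single-core case described in the quoted message.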

Jacob


On Sun, Dec 9, 2018 at 4:17 PM Sam Hawarden <sam.hawarden at otago.ac.nz>
wrote:

> Hi Aravindh
>
> For our small 3-node cluster I've hacked together a per-node Python script
> that collects current and peak CPU, memory and scratch disk usage data for
> all jobs running on the cluster and builds a fairly simple web page based
> on it. It shouldn't be hard to make it store those data points over time,
> then shove them through an R script to plot the usage:
>
> https://github.com/shawarden/simple-web
>
> Cheers,
>   Sam
>
> ------------------------------
> Sam Hawarden
> Assistant Research Fellow
> Pathology Department
> Dunedin School of Medicine
> ------------------------------
> *From:* slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of
> Aravindh Sampathkumar <aravindh at fastmail.com>
> *Sent:* Monday, 10 December 2018 02:39
> *To:* slurm-users at lists.schedmd.com
> *Subject:* [slurm-users] CPU & memory usage summary for a job
>
> Hi All.
>
> I was wondering if anybody has thought of or hacked together a way to record
> the CPU and memory consumption of a job over its entire duration and give a
> summary of the usage pattern within that job?
> Not the MaxRSS and CPU time that already get reported for every job.
>
> I'm thinking more like a chart of CPU utilisation, memory usage, and disk
> usage on a per-second basis or something like that.
>
> Asking because some of my users have no clue about the resource
> consumption of their jobs, and just blindly ask for way more resources as
> the "safe" option. It would be a nice way for users to learn simple things
> like:
> - they asked for 8 cores, but their job ran on just 1 core the entire time
> because a library they used is limited to a single core.
> We use cgroups for process accounting and for limiting jobs' CPU and memory
> usage. We also use QoS to limit resource reservations at the user level.
>
> --
>   Aravindh Sampathkumar
>   aravindh at fastmail.com
>
>
>
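The cluster already runs jobs under cgroups, so a per-node sampler along the lines Sam describes could read each job's counters directly instead of walking its processes. Below is a minimal sketch. It assumes the cgroup v1 layout that Slurm's task/cgroup plugin typically creates (`<controller>/slurm/uid_<uid>/job_<jobid>` under `/sys/fs/cgroup`). The mount point, path layout, and the function names `job_cgroup`, `sample_job` and `poll` are assumptions to check on your own nodes. cgroup v2 systems expose different files (for example `cpu.stat`).

```python
import os
import time
from typing import Dict, List

# Assumed cgroup v1 mount point; adjust for the node.
CGROUP_ROOT = "/sys/fs/cgroup"


def job_cgroup(controller: str, uid: int, job_id: int, root: str = CGROUP_ROOT) -> str:
    """Path of a job's cgroup under the assumed Slurm v1 hierarchy."""
    return os.path.join(root, controller, "slurm", f"uid_{uid}", f"job_{job_id}")


def read_int(path: str) -> int:
    """Read a cgroup file that holds a single integer."""
    with open(path) as f:
        return int(f.read().strip())


def sample_job(uid: int, job_id: int, root: str = CGROUP_ROOT) -> Dict[str, float]:
    """Take one snapshot of a job's CPU and memory counters."""
    cpu_dir = job_cgroup("cpuacct", uid, job_id, root)
    mem_dir = job_cgroup("memory", uid, job_id, root)
    return {
        "time_s": time.time(),
        # cpuacct.usage is cumulative CPU time in nanoseconds.
        "cpu_time_s": read_int(os.path.join(cpu_dir, "cpuacct.usage")) / 1e9,
        "mem_bytes": read_int(os.path.join(mem_dir, "memory.usage_in_bytes")),
        "peak_mem_bytes": read_int(os.path.join(mem_dir, "memory.max_usage_in_bytes")),
    }


def poll(uid: int, job_id: int, interval_s: float = 1.0, count: int = 10,
         root: str = CGROUP_ROOT) -> List[Dict[str, float]]:
    """Collect `count` snapshots, sleeping `interval_s` seconds after each one."""
    samples = []
    for _ in range(count):
        samples.append(sample_job(uid, job_id, root))
        time.sleep(interval_s)
    return samples
```

Appending the dictionaries returned by `poll` to a file gives per-interval data that can be plotted (Sam suggests R) or reduced to a summary. Note that in cgroup v1 `memory.usage_in_bytes` includes page cache, so it can read higher than the job's RSS.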

