[slurm-users] practical tips to budget cluster expansion for a research center with heterogeneous workloads?

Thu Mar 21 16:38:28 UTC 2019

Hey Graziano,

To make your decision more "data-driven", you can feed your Slurm
accounting logs into a tool like Open XDMoD, which will produce pie charts
of usage by user, group, job, GRES, etc.

https://open.xdmod.org/8.0/index.html
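
As a rough illustration of the kind of per-user summary such a tool gives
you, here is a minimal Python sketch that totals CPU-hours, GPU-hours and
memory (GiB-hours) from sacct output. It assumes the records were exported
with something like
`sacct -a -X -n -P -S <start-date> -o User,ElapsedRaw,AllocTRES`
(one pipe-separated line per job allocation). The function names and the
sample records are made up for illustration, and this is not how XDMoD
works internally.

```python
from collections import defaultdict

# Slurm memory suffixes converted to GiB.
MEM_TO_GIB = {"K": 1 / 1024**2, "M": 1 / 1024, "G": 1.0, "T": 1024.0}


def parse_tres(tres):
    """Extract CPU count, GPU count and memory (GiB) from an AllocTRES string."""
    out = {"cpu": 0.0, "gpu": 0.0, "mem_gib": 0.0}
    for item in tres.split(","):
        key, sep, value = item.partition("=")
        if not sep or not value:
            continue
        if key == "cpu":
            out["cpu"] = float(value)
        elif key == "gres/gpu":
            # Only the untyped total is counted; typed entries such as
            # gres/gpu:tesla are skipped to avoid double counting.
            out["gpu"] = float(value)
        elif key == "mem":
            suffix = value[-1].upper()
            if suffix in MEM_TO_GIB:
                out["mem_gib"] = float(value[:-1]) * MEM_TO_GIB[suffix]
            else:
                # Assumption: a bare number is in megabytes.
                out["mem_gib"] = float(value) / 1024
    return out


def usage_by_user(lines):
    """Sum resource-hours per user from 'User|ElapsedRaw|AllocTRES' lines."""
    totals = defaultdict(
        lambda: {"cpu_hours": 0.0, "gpu_hours": 0.0, "mem_gib_hours": 0.0}
    )
    for line in lines:
        fields = line.strip().split("|")
        if len(fields) != 3 or not fields[0] or not fields[1].isdigit():
            continue  # skip blank, malformed or user-less records
        user, elapsed, tres = fields
        hours = int(elapsed) / 3600  # ElapsedRaw is in seconds
        alloc = parse_tres(tres)
        totals[user]["cpu_hours"] += alloc["cpu"] * hours
        totals[user]["gpu_hours"] += alloc["gpu"] * hours
        totals[user]["mem_gib_hours"] += alloc["mem_gib"] * hours
    return dict(totals)


# Hypothetical sample records, not real accounting data.
SAMPLE = """\
alice|3600|billing=4,cpu=4,gres/gpu=1,mem=16G,node=1
alice|1800|billing=8,cpu=8,mem=32G,node=1
bob|7200|billing=2,cpu=2,gres/gpu=2,mem=8192M,node=1
"""

usage = usage_by_user(SAMPLE.splitlines())
for user, totals in sorted(usage.items()):
    print(user, totals)
```

These are allocated rather than actually consumed resources, so treat the
totals as an upper bound on what each user really needed.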

You may also consider assigning this task to one of your "machine learning"
researchers and asking them to "predict" the resources needed. :)

Regards,
Alex

On Thu, Mar 21, 2019 at 8:48 AM Graziano D'Innocenzo <
graziano.dinnocenzo at adaptcentre.ie> wrote:

> Dear Slurm users,
>
> my team is managing an HPC cluster (running Slurm) for a research
> centre. We are planning to expand the cluster in the next couple of
> years and we are facing a problem. We would like to put a figure on
> how many resources will be needed on average for each user (in terms
> of CPU cores, RAM, GPUs) but we have almost one hundred researchers
> using the cluster for all sorts of different use cases so there isn't
> a typical workload that we could take as a model. Most of the work is,
> however, in the field of machine learning and deep learning. Users
> range from first-year PhD students with limited skills to
> researchers and professors with many years of experience.
> In principle we could use a mix of: looking at current usage patterns,
> user surveys, etc.
>
> I was just wondering whether anyone here, working in a similar
> setting, had some sort of guidelines that they have been using for
> budgeting hardware purchases and that they would be willing to share?
>
> Many thanks and regards
>
>
>
> --
> Graziano D'Innocenzo (PGP key: 9213BE46)
> Systems Administrator - ADAPT Centre
>
>