[slurm-users] practical tips to budget cluster expansion for a research center with heterogeneous workloads?
alex at calicolabs.com
Thu Mar 21 16:38:28 UTC 2019
To make your decision more "data-driven", you can feed your Slurm
accounting logs into a tool like XDMoD, which will produce charts of
usage broken down by user, group, job, GRES, and so on.
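Even before standing up XDMoD, a quick first pass is easy to script yourself. As a minimal sketch (not XDMoD itself): the snippet below aggregates CPU-hours per user from `sacct --parsable2` output. The field names (User, AllocCPUS, ElapsedRaw) are standard sacct format fields; the sample data is made up for illustration.

```python
# Sketch: aggregate CPU-hours per user from `sacct --parsable2` output.
# You would capture real data with something like:
#   sacct -a --parsable2 --format=User,AllocCPUS,ElapsedRaw
from collections import defaultdict

def cpu_hours_by_user(sacct_lines):
    """Sum AllocCPUS * ElapsedRaw (seconds) per user, in CPU-hours."""
    header = sacct_lines[0].split("|")
    idx = {name: i for i, name in enumerate(header)}
    totals = defaultdict(float)
    for line in sacct_lines[1:]:
        fields = line.split("|")
        user = fields[idx["User"]]
        if not user:  # job steps carry no User field; skip them
            continue
        cpus = int(fields[idx["AllocCPUS"]])
        secs = int(fields[idx["ElapsedRaw"]])
        totals[user] += cpus * secs / 3600.0
    return dict(totals)

# Illustrative sample in the shape sacct --parsable2 emits:
sample = [
    "User|AllocCPUS|ElapsedRaw",
    "alice|8|7200",
    "bob|16|3600",
    "alice|4|1800",
]
print(cpu_hours_by_user(sample))  # {'alice': 18.0, 'bob': 16.0}
```

Grouping the same totals by Slurm account instead of user (add `Account` to the format string) gets you per-group figures for the budgeting exercise.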
You might also consider assigning this task to one of your "machine
learning" researchers and asking them to "predict" the resources needed. :)
On Thu, Mar 21, 2019 at 8:48 AM Graziano D'Innocenzo <
graziano.dinnocenzo at adaptcentre.ie> wrote:
> Dear Slurm users,
> my team is managing an HPC cluster (running Slurm) for a research
> centre. We are planning to expand the cluster in the next couple of
> years and we are facing a problem. We would like to put a figure on
> how many resources will be needed on average per user (in terms
> of CPU cores, RAM, GPUs), but we have almost one hundred researchers
> using the cluster for all sorts of different use cases, so there isn't
> a typical workload that we could take as a model. Most of the work is,
> however, in the field of machine learning and deep learning. Users
> range from first-year PhD students with limited skills to
> researchers and professors with many years of experience.
> In principle we could use a mix of: looking at current usage patterns,
> user surveys, etc.
> I was just wondering whether anyone here, working in a similar
> setting, had some sort of guidelines that they have been using for
> budgeting hardware purchases and that they would be willing to share?
> Many thanks and regards
> Graziano D'Innocenzo (PGP key: 9213BE46)
> Systems Administrator - ADAPT Centre