We have a node with 8 H100 GPUs that are split into MIG instances. We are using cgroups. This seems to work fine. Users can do something like
sbatch --gres="gpu:1g.10gb:1"...
and the job starts on the node with the GPUs; CUDA_VISIBLE_DEVICES and the PyTorch debug output show that the cgroup only gives them the GPU they asked for.
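(The kind of check I mean is, for example, running
srun --gres="gpu:1g.10gb:1" nvidia-smi -L
inside an allocation, which should only list the single MIG instance that was handed out.)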
In the accounting database, jobs in the job table always have an empty "gres_used" column. I'd expect to see "gpu:1g.10gb:1" there for the job above.
I have this set in slurm.conf:
AccountingStorageTRES=gres/gpu
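Do typed gres need to be listed there explicitly for the MIG profile to be recorded? I.e. would it have to be something like the following (the exact syntax is just my guess, by analogy with the gres/gpu:tesla example in the slurm.conf man page):
AccountingStorageTRES=gres/gpu,gres/gpu:1g.10gb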
How can I see what gres was requested with the job? At the moment I only see something like this in AllocTRES:
billing=1,cpu=1,gres/gpu=1,mem=8G,node=1
and I can't see any way to tell which specific MIG gpu was asked for. This is related to the email from Richard Lefebvre dated 7th June 2023 entitled "Billing/accounting for MIGs is not working"; as far as I can see it got no replies.
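For reference, that AllocTRES string comes from something along these lines (the job id is just a placeholder):
sacct -j 12345 --format=JobID,ReqTRES%40,AllocTRES%40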
We are running slurm version 23.11.6.
Regards,
Emyr James
Head of Scientific IT
CRG - Centre for Genomic Regulation
Available presentations from this year's SLUG event are now online.
They can be found at https://www.schedmd.com/publications/
We thank all those who presented and attended for a great event!
--
Victoria Hobson
SchedMD LLC
Vice President of Marketing
Dear all,
I am working on a script to take completed job accounting data from the Slurm accounting database and insert the equivalent data into a ClickHouse table for fast reporting.
I can see that all the information is included in the cluster_job_table and cluster_job_step_table, which seem to be joined on job_db_inx.
To get the cpu usage and peak memory usage etc. I can see that I need to parse the tres columns in the job steps. I couldn't find any column called MaxRSS in the database even though the sacct command prints this. I then found some data in tres_table and assume that sacct is using this. Please correct me if I'm wrong, or if sacct is getting its information from somewhere other than the accounting database.
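For reference, the kind of join I have in mind is roughly the following (column names are from memory and may not match the schema exactly; the crg_ prefix is just our cluster name):
select j.id_job, j.job_name, j.tres_alloc, s.id_step, s.tres_usage_in_max
from crg_job_table j
join crg_step_table s on s.job_db_inx = j.job_db_inx
limit 10;
The numeric ids inside the tres strings (e.g. "1=4,2=8192,...") appear to map to the id column of tres_table, which is presumably what sacct resolves into names like cpu, mem and so on.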
For the state column I get this:
select state, count(*) as num from crg_step_table group by state order by num desc limit 10;
+-------+--------+
| state | num |
+-------+--------+
| 3 | 590635 |
| 5 | 28345 |
| 4 | 4401 |
| 11 | 962 |
| 1 | 8 |
+-------+--------+
When I use sacct I see statuses such as COMPLETED, OUT_OF_MEMORY etc., so there must be a mapping somewhere between these state ids and that text. Can someone provide that mapping or point me to where it's defined in the database or in the code?
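My best guess so far is that the values follow the enum job_states in slurm/slurm.h, i.e. something like the query below, but I'd appreciate confirmation (and I don't know whether any flag bits ever end up OR'd into this column):
select state,
       case state
            when 0 then 'PENDING'
            when 1 then 'RUNNING'
            when 2 then 'SUSPENDED'
            when 3 then 'COMPLETED'
            when 4 then 'CANCELLED'
            when 5 then 'FAILED'
            when 6 then 'TIMEOUT'
            when 7 then 'NODE_FAIL'
            when 8 then 'PREEMPTED'
            when 9 then 'BOOT_FAIL'
            when 10 then 'DEADLINE'
            when 11 then 'OUT_OF_MEMORY'
            else 'UNKNOWN'
       end as state_name,
       count(*) as num
from crg_step_table
group by state
order by num desc;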
Many thanks,
Emyr James
Head of Scientific IT
CRG - Centre for Genomic Regulation
Hello,
I am in the process of setting up SLURM to be used in a profiling cluster.
The purpose of SLURM is to allow users to submit jobs to be profiled, and
latency is a very important aspect of profiling the applications correctly.
I was able to leverage cgroups v2 to isolate user.slice from the cores
that would be used by SLURM jobs. The issue is that slurmstepd shares the
resources with system.slice; I was digging through the code, and I saw that
the creation of the scope is here:
https://github.com/SchedMD/slurm/blob/master/src/plugins/cgroup/v2/cgroup_v…
And I noticed that the slice is hardcoded in the following line:
https://github.com/SchedMD/slurm/blob/master/src/plugins/cgroup/v2/cgroup_v…
So my question now is: why is the slice hardcoded? What was the
reason behind that decision? I would have thought the slice would be
configurable through cgroup.conf instead.
I would like to switch the slice for slurmstepd to a slice other than
system.slice; by doing so, I would be able to isolate cores better by
making sure that services' processes are isolated from the cores used for
SLURM jobs. I can definitely change the defined value in the code and
recompile. Is there anything to consider before doing so?
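For what it's worth, the way I have been confirming where slurmstepd lands is simply (with a job running on the node):
cat /proc/$(pgrep -o slurmstepd)/cgroup
which is what shows me the path under system.slice.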
Thanks,
Khalid
Awesome, thanks Victoria!
Cheers,
--
Kilian
On Thu, Sep 26, 2024 at 11:17 AM Victoria Hobson <victoria(a)schedmd.com>
wrote:
> Hi Kilian,
>
> We're getting these posted now and an email will go out when they are
> available!
>
> Thanks,
>
>
> Victoria Hobson
>
> *Vice President of Marketing *
>
> 909.609.8889
>
> www.schedmd.com
>
>
> On Mon, Sep 23, 2024 at 10:49 AM Kilian Cavalotti via slurm-users <
> slurm-users(a)lists.schedmd.com> wrote:
>
>> Hi SchedMD,
>>
>> I'm sure they will eventually, but do you know when the slides of the
>> SLUG'24 presentation will be available online at
>> https://slurm.schedmd.com/publications.html, like previous editions'?
>>
>> Thanks!
>> --
>> Kilian
>>
>> --
>> slurm-users mailing list -- slurm-users(a)lists.schedmd.com
>> To unsubscribe send an email to slurm-users-leave(a)lists.schedmd.com
>>
>
--
Kilian
Hi all,
We hit a snag when updating our clusters from Slurm 23.02 to 24.05. After updating the slurmdbd, our multi cluster setup was broken until everything was updated to 24.05. We had not anticipated this.
SchedMD says that fixing it would be a very complex operation.
Hence this warning to everybody planning to update: make sure to update everything quickly once you've upgraded the slurmdbd daemon.
Reference: https://support.schedmd.com/show_bug.cgi?id=20931
Ward
Hi,
On our cluster we have some jobs that are queued even though there are available nodes to run on. The listed reason is "priority" but that doesn't really make sense to me. Slurm isn't picking another job to run on those nodes; it's just not running anything at all. We do have a quite heterogeneous cluster, but as far as I can tell the queued jobs aren't requesting anything that would preclude them from running on the idle nodes. They are array jobs, if that makes a difference.
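(For what it's worth, the comparison I have been making is essentially
scontrol show job <jobid>
against
scontrol show node <idle node>
with placeholders for the ids, and nothing in the job requests stands out as incompatible with those nodes.)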
Thanks for any help you all can provide.
Hello,
We are looking for a method to limit the TRES used by each user on a per-node basis. For example, we would like to limit the total memory allocation of jobs from a user to 200G per node.
There is MaxTRESPerNode (https://slurm.schedmd.com/sacctmgr.html#OPT_MaxTRESPerNode), but unfortunately this is a per-job limit, not per-user.
Ideally, we would like to apply this limit on partitions and/or QoS. Does anyone know if this is possible and how to achieve it?
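For reference, the per-job version we looked at would be set with something like
sacctmgr modify qos normal set MaxTRESPerNode=mem=204800
(memory in MB if I remember right, i.e. 200G; the QoS name is just a placeholder and the syntax is from memory), but as far as we understand that caps each individual job, not the sum of a user's jobs on a node.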
Thank you,