Dear All, I need to configure Slurm so that users must request a certain minimum number of CPU cores for a particular partition (not system-wide). Otherwise, the job must not run.
Any suggestions will be highly appreciated.
With Thanks and Regards
Hi Jeherul Islam,
Such a policy can be implemented with a job_submit plugin, which you have to write yourself. You may find this Wiki page useful: https://wiki.fysik.dtu.dk/Niflheim_system/Slurm_configuration/#job-submit-pl...
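For illustration, a minimal job_submit/lua sketch of such a rule might look like the following (untested; the partition name "large-gpu" and the 32-core threshold are placeholders, and how unset fields appear, nil versus slurm.NO_VAL, can differ between Slurm versions). The script goes in job_submit.lua next to slurm.conf, with JobSubmitPlugins=lua set:

-- job_submit.lua -- minimal, untested sketch; adjust to your site and
-- Slurm version. Partition name and threshold are placeholders.
local RESTRICTED_PARTITION = "large-gpu"
local MIN_CPUS = 32

function slurm_job_submit(job_desc, part_list, submit_uid)
    -- job_desc.partition may be nil (no -p given) or a comma-separated list
    if job_desc.partition == RESTRICTED_PARTITION then
        -- min_cpus usually reflects the job's total CPU request; depending on
        -- your version you may also want to look at num_tasks and cpus_per_task
        local requested = job_desc.min_cpus
        if requested == nil or requested == slurm.NO_VAL then
            requested = 1
        end
        if requested < MIN_CPUS then
            slurm.log_user(string.format(
                "Jobs in partition %s must request at least %d CPU cores",
                RESTRICTED_PARTITION, MIN_CPUS))
            return slurm.ERROR
        end
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end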
On 3/28/25 05:36, Jeherul Islam via slurm-users wrote:
I need to configure Slurm so that users must request a certain minimum number of CPU cores for a particular partition (not system-wide). Otherwise, the job must not run.
Any suggestions will be highly appreciated.
IHTH, Ole
You can set a partition QOS which specifies a minimum. We have such a QOS on our large-gpu partition; we don’t want people scheduling small jobs on it, so we have this QOS:
$ sacctmgr show qos large-gpu --json | jq '.QOS[] | { name: .name, min_limits: .limits.min }'
{
  "name": "large-gpu",
  "min_limits": {
    "priority_threshold": {
      "set": false,
      "infinite": true,
      "number": 0
    },
    "tres": {
      "per": {
        "job": [
          {
            "type": "cpu",
            "name": "",
            "id": 1,
            "count": 32
          },
          {
            "type": "mem",
            "name": "",
            "id": 2,
            "count": 262144
          },
          {
            "type": "gres",
            "name": "gpu",
            "id": 1002,
            "count": 3
          }
        ]
      }
    }
  }
}
i.e. the user has to request at least 32 cores and at least 3 GPUs. If I try to allocate less, I get an error:
$ salloc -p large-gpu --gres=gpu:1 -c 32 --mem 256G
salloc: error: QOSMinGRES
salloc: error: Job submit/allocate failed: Job violates accounting/QOS policy (job submit limit, user's size and/or time limits)
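For reference, a setup like this might be created roughly as follows (untested sketch; the limits mirror the JSON above, the MinTRESPerJob option name should be checked against the sacctmgr man page for your Slurm version, and the DenyOnLimit flag, which rejects violating jobs at submission instead of leaving them pending, is an assumption based on the salloc error shown):

$ sacctmgr add qos large-gpu
$ sacctmgr modify qos large-gpu set MinTRESPerJob=cpu=32,mem=262144,gres/gpu=3 Flags=DenyOnLimit

Then attach it as the partition QOS in slurm.conf and re-read the configuration:

# slurm.conf (partition definition, other options elided)
#   PartitionName=large-gpu Nodes=... QOS=large-gpu
$ scontrol reconfigure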
--
Tim Cutts
Senior Director, R&D IT - Data, Analytics & AI, Scientific Computing Platform
AstraZeneca
From: Jeherul Islam via slurm-users <slurm-users@lists.schedmd.com>
Date: Friday, 28 March 2025 at 4:39 am
To: Slurm User Community List <slurm-users@lists.schedmd.com>
Subject: [slurm-users] Minimum cpu cores per node partition level configuration

Dear All, I need to configure Slurm so that users must request a certain minimum number of CPU cores for a particular partition (not system-wide). Otherwise, the job must not run.
Any suggestions will be highly appreciated.
With Thanks and Regards
--
Jeherul Islam
Hi Tim,
"Cutts, Tim via slurm-users"
slurm-users-rGrgPyRx505G7+FkpxDULAC/G2K4zDHf@public.gmane.org writes:
You can set a partition QOS which specifies a minimum. We have such a QOS on our large-gpu partition; we don’t want people scheduling small jobs on it, so we have this QOS:
How does this affect total throughput? Presumably, 'small' GPU jobs might have to wait for resources in other partitions, even if resources are free in 'large-gpu'. Do you have other policies which ameliorate this?
Cheers,
Loris
[snip (135 lines)]