[slurm-users] Drain a single user's jobs
Mark Dixon
mark.c.dixon at durham.ac.uk
Wed Apr 1 14:27:21 UTC 2020
Hi Ahmet,
Another way to do it! Many thanks - very useful :)
But does anyone know why a user association with my qos stopped jobs
running with InvalidQOS?
I can imagine using a user qos to override a partition qos being useful
for other things, so would be nice to know what I've done wrong.
Best,
Mark
On Wed, 1 Apr 2020, mercan wrote:
> Hi;
>
> If you have a working job_submit.lua script, you can block new jobs from
> the specific user:
>
> if job_desc.user_name == "baduser" then
>     return 2045
> end
>
> That's all!
>
> Regards;
>
> Ahmet M.
>
>
> On 1.04.2020 at 16:22, Mark Dixon wrote:
>> Hi David,
>>
>> Thanks for this, it sounds like I've not been trying crazy methods - but
>> they don't work for me:
>>
>> - "sacctmgr modify user foo set qos=drain" did set up the association
>> ("sacctmgr show associations" showed that QoS changed from "normal" to
>> "drain"), but this is when foo's jobs refused to start because of reason
>> "InvalidQOS".
>>
>> - "sacctmgr update user foo set maxsubmitjobs=0" was ignored because qos
>> were already set on the partitions.
>>
>> But... good news!
>>
>> We hadn't used GrpSubmitJobs in any of our qos, so "sacctmgr modify user
>> foo set GrpSubmitJobs=0" isn't overridden anywhere, and the effect is
>> exactly what I wanted - thanks!
>>
>> But if anyone knows why my attempt at using a "drain" qos stopped foo's
>> previously submitted jobs from running, I'd be very interested to hear
>> about it.
>>
>> Thanks again,
>>
>> Mark
>>
>> On Wed, 1 Apr 2020, David Rhey wrote:
>>
>>> Hi Mark,
>>>
>>> I *think* you might need to update the user account to have access to that
>>> QoS (as part of their association). Using sacctmgr modify user <foo> + some
>>> additional args (they escape me at the moment).
>>>
>>> Also, you *might* have been able to set the MaxSubmitJobs at their account
>>> level to 0 and have them run without having to do the QoS approach - but
>>> that's just a guess on my end based on how we've done some things here. We
>>> had a "free period" for our clusters and once it was over we set
>>> GrpSubmitJobs on an account to 0, which allowed in-flight jobs to continue
>>> but no new work to be submitted.
>>>
>>> HTH,
>>>
>>> David
>>>
>>> On Wed, Apr 1, 2020 at 5:57 AM Mark Dixon <mark.c.dixon at durham.ac.uk>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm a slurm newbie who has inherited a working slurm 16.05.10 cluster.
>>>>
>>>> I'd like to stop user foo from submitting new jobs but allow their
>>>> existing jobs to run.
>>>>
>>>> We have several partitions, each with its own qos and MaxSubmitJobs
>>>> typically set to some value. These qos are stopping a "sacctmgr update user
>>>> foo set maxsubmitjobs=0" from doing anything useful, as per the
>>>> documentation.
>>>>
>>>> I've tried setting up a competing qos:
>>>>
>>>> sacctmgr add qos drain
>>>> sacctmgr modify qos drain set MaxSubmitJobs=0
>>>> sacctmgr modify qos drain set flags=OverPartQOS
>>>> sacctmgr modify user foo set qos=drain
>>>>
>>>> This has successfully prevented the user from submitting new jobs, but
>>>> their existing jobs aren't running. I'm seeing the reason code
>>>> "InvalidQOS".
>>>>
>>>> Any ideas what I should be looking at, please?
>>>>
>>>> Thanks,
>>>>
>>>> Mark
>>>>
>>>>
>>>
>>> --
>>> David Rhey
>>> ---------------
>>> Advanced Research Computing - Technology Services
>>> University of Michigan
>>>
>>
>
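
For reference, the approach that ended up working in this thread (GrpSubmitJobs=0 on the user's association) could be wrapped up roughly as below. This is only a sketch: the function names drain_user and undrain_user and the SACCTMGR variable are made up for illustration, the -i (commit without prompting) option is an addition not used in the thread, and the undo step relies on the sacctmgr convention that a value of -1 clears a previously set limit. It prints the commands by default rather than running them.

```shell
#!/bin/sh
# Dry run by default: the sacctmgr command lines are printed, not executed.
# Set SACCTMGR=sacctmgr in the environment to run them for real
# (assumption: you have Slurm accounting admin rights on the cluster).
SACCTMGR="${SACCTMGR:-echo sacctmgr}"

# Stop a user from submitting new jobs. In this thread, jobs that were
# already submitted carried on as normal with this setting.
drain_user() {
    # $SACCTMGR is left unquoted on purpose so "echo sacctmgr" splits in two.
    $SACCTMGR -i modify user "$1" set GrpSubmitJobs=0
}

# Undo the above: -1 clears a previously set limit on the association.
undrain_user() {
    $SACCTMGR -i modify user "$1" set GrpSubmitJobs=-1
}
```

Calling "drain_user foo" in dry-run mode just echoes the command line, so it can be checked before being run against the real accounting database. As noted above, this only helps if GrpSubmitJobs is not already set in a qos that takes precedence.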
--
Mark Dixon <mark.c.dixon at durham.ac.uk> Tel: +44(0)191 33 41383
Advanced Research Computing (ARC), Durham University, UK