[slurm-users] Drain a single user's jobs
Mark Dixon
mark.c.dixon at durham.ac.uk
Wed Apr 1 19:10:25 UTC 2020
Ah-ha! Figured out what I did wrong:
"sacctmgr modify user foo set qos=drain"
This sets the list of qos available to the user. The user still inherited a
default qos of "normal" for their jobs, which was no longer in the allowed
list - hence the InvalidQOS.
I needed to override the default qos for foo's jobs:
"sacctmgr modify user foo set qos=drain defaultqos=drain"
And then update the qos on all of foo's waiting jobs.
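
For anyone finding this later, the two steps above can be sketched roughly
as below. This is a minimal sketch, not a tested recipe: "foo" and "drain"
are the names from this thread, the run/pending_jobs/drain_user_qos helpers
and the DRY_RUN and PENDING_JOBS variables are made up for illustration, and
the -i flag (skip the confirmation prompt) plus the squeue/scontrol calls are
my assumption about how to update the waiting jobs. By default it only prints
the commands it would run.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set.
# "foo" and "drain" are the user and qos names used in this thread.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

pending_jobs() {
    # In a dry run, job ids come from the (hypothetical) PENDING_JOBS
    # variable; otherwise ask squeue for the user's pending job ids.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "${PENDING_JOBS:-}"
    else
        squeue -h -u "$1" -t PENDING -o %i
    fi
}

drain_user_qos() {
    user=$1
    qos=$2
    # Allow only the drain qos and make it the default for the user.
    run sacctmgr -i modify user "$user" set qos="$qos" defaultqos="$qos"
    # Move each job that is already waiting onto the drain qos.
    for jobid in $(pending_jobs "$user"); do
        run scontrol update jobid="$jobid" qos="$qos"
    done
}

drain_user_qos foo drain
```

Setting DRY_RUN=0 would execute the commands for real; job arrays are not
handled specially here, so check the printed commands first.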
I'll be using David's GrpSubmitJobs=0 suggestion instead.
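
A rough sketch of that simpler approach, under the same caveats: the helper
names and DRY_RUN variable are hypothetical, -i is only there to skip the
confirmation prompt, and the "undo" step relies on sacctmgr's convention that
setting a limit to -1 clears it. It prints the commands by default rather
than running them.

```shell
#!/bin/sh
# Sketch only: prints the commands unless DRY_RUN=0 is set.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

block_submissions() {
    # Refuse new submissions from the user; as reported in this thread,
    # jobs that were already submitted are left alone.
    run sacctmgr -i modify user "$1" set GrpSubmitJobs=0
}

allow_submissions() {
    # A value of -1 clears the limit again.
    run sacctmgr -i modify user "$1" set GrpSubmitJobs=-1
}

block_submissions foo
```

This only works as described if GrpSubmitJobs is not also set in a qos that
takes precedence, which was the case on our cluster.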
Thanks for everyone's help,
Mark
On Wed, 1 Apr 2020, Mark Dixon wrote:
> Hi Ahmet,
>
> Another way to do it! Many thanks - very useful :)
>
> But does anyone know why a user association with my qos stopped jobs
> running with InvalidQOS?
>
> I can imagine using a user qos to override a partition qos being useful for
> other things, so would be nice to know what I've done wrong.
>
> Best,
>
> Mark
>
> On Wed, 1 Apr 2020, mercan wrote:
>
>> Hi;
>>
>> If you have a working job_submit.lua script, you can block new jobs from
>> a specific user:
>>
>> if job_desc.user_name == "baduser" then
>>     return 2045
>> end
>>
>> That's all!
>>
>> Regards;
>>
>> Ahmet M.
>>
>>
>> On 1.04.2020 at 16:22, Mark Dixon wrote:
>>> Hi David,
>>>
>>> Thanks for this, it sounds like I've not been trying crazy methods - but
>>> they don't work for me:
>>>
>>> - "sacctmgr modify user foo set qos=drain" did set up the association
>>> ("sacctmgr show associations" showed that QoS changed from "normal" to
>>> "drain"), but this is when foo's jobs refused to start because of
>>> reason "InvalidQOS".
>>>
>>> - "sacctmgr update user foo set maxsubmitjobs=0" was ignored because qos
>>> were already set on the partitions.
>>>
>>> But... good news!
>>>
>>> We hadn't used GrpSubmitJobs in any of our qos, so "sacctmgr modify user
>>> foo set GrpSubmitJobs=0" isn't overridden anywhere, and the effect is
>>> exactly what I wanted - thanks!
>>>
>>> But if anyone knows why my attempt at using a "drain" qos stopped foo's
>>> previously submitted jobs from running, I'd be very interested to hear
>>> about it.
>>>
>>> Thanks again,
>>>
>>> Mark
>>>
>>> On Wed, 1 Apr 2020, David Rhey wrote:
>>>
>>>> Hi Mark,
>>>>
>>>> I *think* you might need to update the user account to have access to
>>>> that QoS (as part of their association). Using sacctmgr modify user
>>>> <foo> + some additional args (they escape me at the moment).
>>>>
>>>> Also, you *might* have been able to set the MaxSubmitJobs at their
>>>> account level to 0 and have them run without having to do the QoS
>>>> approach - but that's just a guess on my end based on how we've done
>>>> some things here. We had a "free period" for our clusters and once it
>>>> was over we set the GrpSubmitJobs on an account to 0 which allowed
>>>> in-flight jobs to continue but no new work to be submitted.
>>>>
>>>> HTH,
>>>>
>>>> David
>>>>
>>>> On Wed, Apr 1, 2020 at 5:57 AM Mark Dixon <mark.c.dixon at durham.ac.uk>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm a slurm newbie who has inherited a working slurm 16.05.10 cluster.
>>>>>
>>>>> I'd like to stop user foo from submitting new jobs but allow their
>>>>> existing jobs to run.
>>>>>
>>>>> We have several partitions, each with its own qos and MaxSubmitJobs
>>>>> typically set to some value. These qos are stopping a "sacctmgr update
>>>>> user foo set maxsubmitjobs=0" from doing anything useful, as per the
>>>>> documentation.
>>>>>
>>>>> I've tried setting up a competing qos:
>>>>>
>>>>> sacctmgr add qos drain
>>>>> sacctmgr modify qos drain set MaxSubmitJobs=0
>>>>> sacctmgr modify qos drain set flags=OverPartQOS
>>>>> sacctmgr modify user foo set qos=drain
>>>>>
>>>>> This has successfully prevented the user from submitting new jobs, but
>>>>> their existing jobs aren't running. I'm seeing the reason code
>>>>> "InvalidQOS".
>>>>>
>>>>> Any ideas what I should be looking at, please?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Mark
>>>>>
>>>>>
>>>>
>>>> --
>>>> David Rhey
>>>> ---------------
>>>> Advanced Research Computing - Technology Services
>>>> University of Michigan
>>>>
>>>
>>