[slurm-users] Unexpected negative NICE values

Wed May 3 11:37:07 UTC 2023

Hi Jürgen,

This was it! Thank you so much for the hint! I did not know about the 
"top" command and was also not aware that this option was enabled in our 
slurm.conf.
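
For anyone who finds this thread later, here is a rough sketch of how the two checks could be scripted. The helper names has_user_top and negative_nice_jobs are made up for illustration. The sketch assumes that "scontrol show config" prints a "SchedulerParameters = ..." line and that squeue prints a header row followed by JobID and Nice columns, as in the output quoted below.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of Slurm).

# Succeeds if the configuration text on stdin has enable_user_top
# listed in its SchedulerParameters line.
has_user_top() {
    grep -i '^SchedulerParameters' | grep -qi 'enable_user_top'
}

# Reads "squeue -O JobID,Nice" output on stdin (header row included)
# and prints the IDs of jobs whose nice value is negative.
negative_nice_jobs() {
    awk 'NR > 1 && $2 < 0 { print $1 }'
}

# Only query the cluster if the Slurm client tools are installed.
if command -v scontrol >/dev/null 2>&1; then
    scontrol show config | has_user_top && echo "enable_user_top is set"
    squeue -O JobID,Nice | negative_nice_jobs
fi
```

Both helpers read from stdin, so they can also be run against saved output instead of the live cluster.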

Thanks for the help!

Sebastian

On 03.05.23 12:10, Juergen Salk wrote:
> Hi Sebastian,
>
> maybe it's a silly thought on my part, but do you have the
> `enable_user_top´ option included in your SchedulerParameters
> configuration?
>
> This would allow regular users to use `scontrol top <job_list>´ to
> push some of their jobs ahead of other jobs owned by them and this
> works internally by adjusting the nice values of the specified jobs.
>
> I may be totally wrong, but if I remember correctly, configuring
> SchedulerParameters=enable_user_top is generally not recommended,
> because letting regular users run `scontrol top´ is (or was?) known
> to have bad side effects in certain scenarios: users could push
> their pending jobs ahead of normal jobs in the queue (including
> other users' jobs) if just one of their jobs already has a
> negative nice value assigned, e.g. by an administrator.
>
> Best regards
> Jürgen
>
>
> * Sebastian Potthoff <s.potthoff at uni-muenster.de> [230503 10:36]:
>> Hello all,
>>
>> I am encountering some unexpected behavior where the jobs (queued & running)
>> of one specific user have negative NICE values and therefore an increased
>> priority. The user is not privileged in any way and cannot explicitly set
>> the nice value to a negative value by e.g. adding "--nice=-INT". There are
>> also no QoS which would allow this (is this even possible?). The cluster is
>> using the "priority/multifactor" plugin with weights set for Age, FairShare
>> and JobSize.
>>
>> This is the only user on the whole cluster where this occurs. From what I
>> can tell, he/she is not doing anything out of the ordinary. However, in the
>> job scripts the user does set a nice value of "0". The user also uses some
>> "strategy" where he/she submits the same job to multiple partitions and, as
>> soon as one of these jobs starts, all other jobs (with the same jobname)
>> will be set on "hold".
>>
>> Does anyone have an idea how this could happen? Does Slurm internally adjust
>> the NICE values in certain situations? (I searched the sources but couldn't
>> find anything that would suggest this).
>>
>> Slurm version is 23.02.1
>>
>> Example squeue output:
>>
>> [root@mgmt ~]# squeue -u USERID -O JobID,Nice
>> JOBID               NICE
>> 14846760            -5202
>> 14846766            -8988
>> 14913146            -13758
>> 14917361            -15103
>>
>>
>> Any hints are appreciated.
>>
>> Kind regards
>> Sebastian
>>
-- 
Westfälische Wilhelms-Universität (WWU) Münster
WWU IT
Sebastian Potthoff, M.Sc. (eScience/HPC)
Röntgenstraße 7-13, R.207/208
48149 Münster
Tel. +49 251 83-31640
E-Mail: s.potthoff at uni-muenster.de
Website: www.uni-muenster.de/it
