[slurm-users] Verifying preemption WON'T happen

Groner, Rob rug262 at psu.edu
Fri Sep 29 20:50:24 UTC 2023


Well again, I don't want to tweak things just to get the test to happen quicker.  I DO have to keep in mind the scheduler and backfill settings, though.  For instance, I think the default scheduler and backfill intervals are 60 and 30 seconds...or vice versa.  So, before I check the Scheduler value for the high priority job via scontrol, I wait 90 seconds and then some.  In a perfect world, that SHOULD have given both the main scheduler and the backfill scheduler time to get to it.  I THINK, however, that on a sufficiently busy system, there's no guarantee that the new high priority job has been evaluated even after that amount of time.
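
To make the "has the scheduler gotten to it yet" check concrete, here is a rough Python sketch of how a test could read the scontrol output instead of only waiting a fixed time.  It assumes the text comes from `scontrol show job <jobid>` and that the JobState, SubmitTime and LastSchedEval fields are present; the sample text and the function names are made up for illustration.  Treating a LastSchedEval later than SubmitTime as "the scheduler has looked at this job" is my assumption, and it does not by itself prove that preemption was considered.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def parse_scontrol_job(text):
    """Collect the Key=Value tokens of `scontrol show job` output into a dict."""
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        # Keep the first occurrence of a key; values containing spaces are cut short.
        if sep and key not in fields:
            fields[key] = value
    return fields


def pending_after_evaluation(fields):
    """True if the job is still PENDING and LastSchedEval is later than SubmitTime."""
    if fields.get("JobState") != "PENDING":
        return False
    try:
        submitted = datetime.strptime(fields["SubmitTime"], TIME_FORMAT)
        evaluated = datetime.strptime(fields["LastSchedEval"], TIME_FORMAT)
    except (KeyError, ValueError):
        return False  # field missing or not a timestamp: no evidence either way
    return evaluated > submitted


# Hypothetical sample, trimmed to the fields used here (not real cluster output).
SAMPLE = """JobId=1234 JobName=hiprio
   JobState=PENDING Reason=Resources Dependency=(null)
   SubmitTime=2023-09-29T16:00:00 EligibleTime=2023-09-29T16:00:00
   LastSchedEval=2023-09-29T16:01:35 Scheduler=Main"""

print(pending_after_evaluation(parse_scontrol_job(SAMPLE)))
```

A test could poll this until it returns True (with a timeout) rather than sleeping for a fixed 90 seconds, and only then check that the low priority job is still running.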

I'll take a look at sdiag and see if it can tell me where the job is at, thanks for the suggestion.

Rob

________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Ryan Novosielski <novosirj at rutgers.edu>
Sent: Friday, September 29, 2023 4:19 PM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Verifying preemption WON'T happen

You can get some information on that from sdiag, and there are tweaks you can make to backfill scheduling that affect how quickly it will get to a job.

That doesn’t really answer your real question, but might help you when you are looking into this.
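
As a rough illustration of the sdiag suggestion, the Python sketch below groups the "label: value" lines of sdiag output by section and compares the backfill "Last depth cycle" with "Last queue length".  The sample text is a hypothetical, heavily trimmed excerpt, the function names are made up, and reading "depth smaller than queue length" as "the last backfill cycle did not get through the whole queue" is an assumption to check against the sdiag man page for your Slurm version.

```python
def parse_sdiag_sections(text):
    """Group the indented 'label: value' lines of sdiag output by section."""
    sections = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if not raw[0].isspace():
            # Unindented line: a section header (or something we do not track).
            if raw.startswith("Main schedule statistics"):
                current = sections.setdefault("main", {})
            elif raw.startswith("Backfilling stats"):
                current = sections.setdefault("backfill", {})
            else:
                current = None
            continue
        if current is not None and ":" in raw:
            label, _, value = raw.partition(":")
            current[label.strip()] = value.strip()
    return sections


def backfill_reached_whole_queue(sections):
    """True/False from the last backfill cycle, or None if the counters are missing."""
    stats = sections.get("backfill", {})
    try:
        depth = int(stats["Last depth cycle"])
        queue = int(stats["Last queue length"])
    except (KeyError, ValueError):
        return None
    return depth >= queue


# Hypothetical, trimmed sample (not real cluster output).
SAMPLE = """\
Main schedule statistics (microseconds):
    Last cycle:   1200
    Last queue length: 45

Backfilling stats
    Total cycles: 50
    Last depth cycle: 30
    Last queue length: 45
"""

print(backfill_reached_whole_queue(parse_sdiag_sections(SAMPLE)))
```

This only describes the last backfill cycle as a whole; it says nothing about one particular job.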


On Sep 29, 2023, at 16:10, Groner, Rob <rug262 at psu.edu> wrote:


I'm not looking for a one-time answer.  We run these tests any time we change anything related to Slurm: version, configuration, etc.  We certainly run the test after the system comes back up after an outage, and an hour would be a long time to wait for that.  Waiting the full hour is certainly the brute-force approach, but I'm hoping there's a definitive way to show, through scontrol job output, that the job won't preempt.

I could set PreemptExemptTime to a smaller value, say 5 minutes instead of 1 hour, that is true, but there are a few issues with that.


  1.  I would then no longer be testing the system as it actually is.  I want to test the system in its actual production configuration.
  2.  If I did lower its value, what would be a safe value?  5 minutes?  Does running for 5 minutes guarantee that the higher priority job had a chance to preempt it but didn't?  Or did the scheduler ever even get to it?  On a test cluster with few jobs, you could be reasonably assured it did, but running tests on the production cluster...isn't it possible the scheduler hasn't yet had a chance to process it, even after 5 minutes?  Depends on the Slurm scheduler settings, I suppose....
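
For what it's worth, whatever value PreemptExemptTime has, a test can at least tell whether the low priority job is still inside the exempt window by comparing its RunTime (from scontrol) with the configured value.  Below is a rough Python sketch, assuming both values use the usual Slurm time formats ("minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes", "days-hours:minutes:seconds").  The function names are made up, the special value -1 is not handled, and whether Slurm treats the boundary as inclusive is not something this checks.

```python
def slurm_time_to_seconds(value):
    """Convert a Slurm time string such as '01:00:00' or '1-12:30' to seconds."""
    days = 0
    if "-" in value:
        # days-hours[:minutes[:seconds]]
        day_part, rest = value.split("-", 1)
        days = int(day_part)
        parts = [int(p) for p in rest.split(":")]
        parts += [0] * (3 - len(parts))  # pad missing minutes/seconds
    else:
        parts = [int(p) for p in value.split(":")]
        if len(parts) == 1:      # minutes
            parts = [0, parts[0], 0]
        elif len(parts) == 2:    # minutes:seconds
            parts = [0, parts[0], parts[1]]
    hours, minutes, seconds = parts  # raises ValueError on more than 3 fields
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def still_exempt(run_time, exempt_time):
    """True while the job has run for less than the exempt time."""
    return slurm_time_to_seconds(run_time) < slurm_time_to_seconds(exempt_time)


# A job that has run for 5m12s against a 1 hour PreemptExemptTime.
print(still_exempt("00:05:12", "01:00:00"))
```

If this returns True while the high priority job is pending, then the fact that nothing was preempted is what the configuration should produce; it still doesn't show that the scheduler actually evaluated the job.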

rob

________________________________
From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Bernstein, Noam CIV USN NRL (6393) Washington DC (USA) <noam.bernstein at nrl.navy.mil>
Sent: Friday, September 29, 2023 3:14 PM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Verifying preemption WON'T happen

On Sep 29, 2023, at 2:51 PM, Davide DelVento <davide.quantum at gmail.com> wrote:

I don't really have an answer for you other than a "hallway comment": it sounds like a good thing to test with a simulator, which I would do if I had one. I've been intrigued by (but have not really looked much into) https://slurm.schedmd.com/SLUG23/LANL-Batsim-SLUG23.pdf

On Fri, Sep 29, 2023 at 10:05 AM Groner, Rob <rug262 at psu.edu> wrote:

I could obviously let the test run for an hour to verify the lower priority job was never preempted...but that's not really feasible.

Why not? Isn't it going to take longer than an hour to wait for responses to this post? Also, you could set up the minimum time to a much smaller value, so it won't take as long to test.