[slurm-users] Verifying preemption WON'T happen

Ryan Novosielski novosirj at rutgers.edu
Fri Sep 29 20:19:33 UTC 2023

You can get some information on that from sdiag, and there are tweaks you can make to backfill scheduling that affect how quickly it will get to a job.

That doesn’t answer your actual question, but it might help while you are looking into this.


On Sep 29, 2023, at 16:10, Groner, Rob <rug262 at psu.edu> wrote:

I'm not looking for a one-time answer. We run these tests any time we change anything related to Slurm: version, configuration, etc. We certainly run the tests after the system comes back up from an outage, and an hour would be a long time to wait for that. Waiting is certainly the brute-force approach, but I'm hoping there's a definitive way to show, through scontrol job output, that preemption won't happen.

I could set PreemptExemptTime to a smaller value, say 5 minutes instead of 1 hour, that is true, but there are a few issues with that.

  1.  I would then no longer be testing the system as it actually is. I want to test the system in its actual production configuration.
  2.  If I did lower the value, what would be a safe value? 5 minutes? Does running for 5 minutes guarantee that the higher priority job had a chance to preempt it but didn't? Or did the scheduler never even get to it? On a test cluster with few jobs, you could be reasonably assured it did, but when running tests on the production cluster, isn't it possible the scheduler hasn't yet had a chance to process it, even after 5 minutes? That depends on the Slurm scheduler settings, I suppose.
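
As a rough illustration of what can be read from scontrol job output, here is a minimal Python sketch, not a definitive check. It parses the Key=Value text that `scontrol show job` prints and reports whether a running job is still inside an exempt window. The helper names, the abbreviated sample text, and the 3600-second window are assumptions made up for illustration; the real window would come from whatever PreemptExemptTime is in effect on the cluster. The naive whitespace split also mishandles values that contain spaces.

```python
def parse_scontrol_job(text):
    """Collect whitespace-separated Key=Value tokens into a dict.

    Hypothetical helper: values containing spaces are not handled.
    """
    fields = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def duration_to_seconds(value):
    """Convert a [days-]HH:MM:SS duration string to seconds."""
    days, _, clock = value.rpartition("-")
    parts = [int(p) for p in clock.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # tolerate a shorter MM:SS form
    hours, minutes, seconds = parts
    return (int(days) if days else 0) * 86400 + hours * 3600 + minutes * 60 + seconds


def inside_exempt_window(fields, exempt_seconds):
    """True if the job is RUNNING and has run for less than exempt_seconds."""
    if fields.get("JobState") != "RUNNING":
        return False
    return duration_to_seconds(fields.get("RunTime", "00:00:00")) < exempt_seconds


# Abbreviated, illustrative sample; real output has many more fields.
SAMPLE = """JobId=1234 JobName=lowprio
   JobState=RUNNING Reason=None
   RunTime=00:05:00 TimeLimit=02:00:00
"""

if __name__ == "__main__":
    job = parse_scontrol_job(SAMPLE)
    # 3600 is an assumed stand-in for a one-hour exempt time.
    print(inside_exempt_window(job, 3600))
```

This only says whether the job has outlived the window. It does not show that the scheduler ever evaluated the higher priority job against it, which is the open question above.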


From: slurm-users <slurm-users-bounces at lists.schedmd.com> on behalf of Bernstein, Noam CIV USN NRL (6393) Washington DC (USA) <noam.bernstein at nrl.navy.mil>
Sent: Friday, September 29, 2023 3:14 PM
To: Slurm User Community List <slurm-users at lists.schedmd.com>
Subject: Re: [slurm-users] Verifying preemption WON'T happen

On Sep 29, 2023, at 2:51 PM, Davide DelVento <davide.quantum at gmail.com> wrote:

I don't really have an answer for you other than a "hallway comment", that it sounds like a good thing which I would test with a simulator, if I had one. I've been intrigued by (but really not looked much into) https://slurm.schedmd.com/SLUG23/LANL-Batsim-SLUG23.pdf

On Fri, Sep 29, 2023 at 10:05 AM Groner, Rob <rug262 at psu.edu> wrote:

I could obviously let the test run for an hour to verify the lower priority job was never preempted...but that's not really feasible.

Why not? Isn't it going to take longer than an hour to wait for responses to this post? Also, you could set up the minimum time to a much smaller value, so it won't take as long to test.
