[slurm-users] Inaccurate Preemption Notification?

Jason Simms jsimms1 at swarthmore.edu
Mon Apr 24 23:29:00 UTC 2023

Hello all,

A user received an email from Slurm that one of his jobs was preempted.
Normally when a job is preempted, the logs will show something like this:

[2023-03-30T08:19:16.535] [25538.batch] error: *** JOB 25538 ON node07 CANCELLED AT 2023-03-30T08:19:16 DUE TO PREEMPTION ***
[2023-03-30T08:19:16.573] [25538.1] error: *** STEP 25538.1 ON node07 CANCELLED AT 2023-03-30T08:19:16 DUE TO PREEMPTION ***

There was no such entry for this job. Instead, the log for the job showed:

[2023-04-24T17:06:24.105] [26446.batch] error: *** JOB 26446 ON node07 CANCELLED AT 2023-04-24T17:06:24 ***
[2023-04-24T17:06:24.105] [26446.1] error: *** STEP 26446.1 ON node07 CANCELLED AT 2023-04-24T17:06:24 ***
[2023-04-24T17:06:24.155] [26446.extern] done with job
[2023-04-24T17:06:25.161] [26446.batch] sending
[2023-04-24T17:06:25.163] [26446.batch] done with job
[2023-04-24T17:06:27.462] [26446.1] error: Failed to send MESSAGE_TASK_EXIT: Connection refused
[2023-04-24T17:06:27.464] [26446.1] done with job

It's unclear to me whether this job was actually preempted, though perhaps
Slurm logs preemption differently for MPI jobs. I do not believe it was,
however: the account he was using is the only account permitted to use that
partition, and in any case that partition has the highest partition
priority. Moreover, the job restarted immediately (after a requeue, with a
new job id) on the same partition.
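
In case it helps anyone compare against their own logs, here is a minimal
sketch of the check I am describing, in Python. It assumes the cancellation
lines look exactly like the two excerpts above; the regular expression and
the function name classify_cancellations are my own illustration, not
anything shipped with Slurm, and other Slurm versions may word these lines
differently.

```python
import re

# Assumed format, taken only from the slurmd excerpts in this message:
#   *** JOB <id> ON <node> CANCELLED AT <timestamp> [DUE TO PREEMPTION] ***
CANCEL_RE = re.compile(
    r"\*\*\* (?P<kind>JOB|STEP) (?P<id>\S+) ON (?P<node>\S+) "
    r"CANCELLED AT (?P<when>\S+)(?P<preempt> DUE TO PREEMPTION)? \*\*\*"
)


def classify_cancellations(log_text):
    """Return (kind, id, preempted) for each cancellation entry found.

    Whitespace is collapsed first so that entries wrapped across lines
    (for example by a mail client) still match.
    """
    flat = " ".join(log_text.split())
    return [
        (m.group("kind"), m.group("id"), m.group("preempt") is not None)
        for m in CANCEL_RE.finditer(flat)
    ]


if __name__ == "__main__":
    sample = """
    [2023-03-30T08:19:16.535] [25538.batch] error: *** JOB 25538 ON node07
    CANCELLED AT 2023-03-30T08:19:16 DUE TO PREEMPTION ***
    [2023-04-24T17:06:24.105] [26446.batch] error: *** JOB 26446 ON node07
    CANCELLED AT 2023-04-24T17:06:24 ***
    """
    for kind, ident, preempted in classify_cancellations(sample):
        label = "preemption" if preempted else "no preemption reason logged"
        print(kind, ident, label)
```

A cancellation that comes back with preempted set to False only means the
log line did not carry the "DUE TO PREEMPTION" text; it does not by itself
say what did cancel the job.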

Any thoughts as to whether this job was actually preempted, and if not, why
the email notification would say it was?

Warmest regards,

*Jason L. Simms, Ph.D., M.P.H.*
Manager of Research Computing
Swarthmore College
Information Technology Services
(610) 328-8102
Schedule a meeting: https://calendly.com/jlsimms