[slurm-users] Slurm cancelling jobs even when dependencies are successful

Roger Moye rmoye at quantlab.com
Tue Oct 29 20:55:39 UTC 2019


We have a situation where Slurm is cancelling jobs with the error "job dependency can't be satisfied", even though all of the job dependencies are completing successfully. sacct shows that they were successful, and scontrol shows that they all have exit codes of 0.

After confirming that the dependencies had succeeded, we requeued the cancelled job. It was cancelled again:
JobState=CANCELLED Reason=Dependency Dependency=afterok:7438267_*
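
Since an afterok dependency is only satisfied by tasks that finish with an exit code of zero, one way to double-check an array is to look at every task in the accounting records rather than a sample. The following is a minimal sketch, not the check we actually ran: the helper names failed_tasks and query_sacct are made up for illustration, and it assumes sacct is on the PATH and that accounting is enabled.

```python
import subprocess


def failed_tasks(sacct_text):
    """Return (job_id, state, exit_code) for each job or array task that is
    not COMPLETED with exit code 0:0.

    Expects pipe-delimited text with the columns JobID|State|ExitCode.
    """
    bad = []
    for line in sacct_text.splitlines():
        line = line.strip()
        if not line:
            continue
        job_id, state, exit_code = line.split("|")[:3]
        if "." in job_id:
            # Skip job steps such as 7438267_1.batch; only tasks matter here.
            continue
        if state != "COMPLETED" or exit_code != "0:0":
            bad.append((job_id, state, exit_code))
    return bad


def query_sacct(job_id):
    """Hypothetical helper: fetch one line per array task from sacct."""
    cmd = [
        "sacct", "-j", str(job_id), "-X", "--noheader", "--parsable2",
        "--format=JobID,State,ExitCode",
    ]
    return subprocess.run(
        cmd, check=True, capture_output=True, text=True
    ).stdout


# Example use on a cluster (not run here):
#   for row in failed_tasks(query_sacct(7438267)):
#       print(*row)
```

An empty result would mean that every task sacct knows about finished cleanly, which is the situation described above.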

This appears to be intermittent: we submit jobs of this type often, and only some of them get cancelled. Shortly after this happened today, I tried to reproduce the problem by submitting a job array and then a second job that depended on it. The dependent job ran successfully.
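
For anyone who wants to try the same reproduction, here is a rough sketch of the two submissions. The array range and the --wrap payloads are placeholders, not the jobs we actually ran, and the helper names are made up for illustration. It relies on sbatch --parsable printing the job ID, optionally followed by a semicolon and the cluster name.

```python
import subprocess


def parse_job_id(parsable_output):
    """Extract the job ID from `sbatch --parsable` output.

    The output is either "jobid" or "jobid;cluster".
    """
    return parsable_output.strip().split(";")[0]


def array_cmd(tasks="0-3", payload="sleep 30"):
    """Build the sbatch command for a small job array (placeholder payload)."""
    return ["sbatch", "--parsable", f"--array={tasks}", f"--wrap={payload}"]


def dependent_cmd(array_job_id, payload="echo dependency satisfied"):
    """Build the sbatch command for a job that waits on the array with afterok."""
    return [
        "sbatch", "--parsable",
        f"--dependency=afterok:{array_job_id}",
        f"--wrap={payload}",
    ]


def submit(cmd):
    """Run sbatch and return the new job ID as a string."""
    out = subprocess.run(
        cmd, check=True, capture_output=True, text=True
    ).stdout
    return parse_job_id(out)


# Example use on a cluster (not run here):
#   array_id = submit(array_cmd())
#   dep_id = submit(dependent_cmd(array_id))
```

Because the failure is intermittent, a single clean run of this does not rule the problem out.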

Has anyone seen this behavior before or can anyone shed light on this?

We are using Slurm 18.08.4.

Thanks in advance!
-Roger


Roger Moye
HPC Engineer
713.425.6236 Office
713.898.0021 Mobile

QUANTLAB Financial, LLC
3 Greenway Plaza
Suite 200
Houston, Texas 77046
www.quantlab.com<https://www.quantlab.com/>
