<div dir="ltr"><div>Hi,</div><div><br></div><div>I'm having a hard time understanding how jobs are distributed between two clusters in a Slurm multi-cluster environment. The documentation says that each job is submitted to the cluster that offers the earliest start time, and that once a job has been submitted to a cluster, it cannot be redistributed to another one. The file "&lt;slurm_github+repository&gt;/src/common/slurmdb_defs.c" lists three comparison criteria for choosing a cluster: 1) the cluster with the earliest start time is preferred; 2) if the start times are equal, the cluster with the lower preempt_cnt is preferred; 3) if those are also equal, the local cluster is chosen.<br></div><ul><li style="margin-left:15px">How is the start time calculated? I tried to work it out from the source code, but I got lost. Is start_time + job_execution_time computed for each job, with the smallest of these values taken as the start time of the cluster?</li><li style="margin-left:15px">Can two or more jobs see the same cluster start time if they are submitted almost simultaneously (i.e., before an earlier submission has changed the start time)? It looks that way to me, because one cluster receives most of the jobs even though the other cluster is much less loaded and has faster processors. In addition, 'squeue' sometimes shows fewer jobs than have actually been submitted (off by about one job).</li></ul><br>Regards<font color="#888888"><br clear="all"><div><br></div></font><div><br></div><span class="gmail_signature_prefix">-- </span><br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">Mohammed<br></div></div>