<span style="color:rgb(34,34,34);font-size:14px">Hi All,</span><br style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">I want to preempt a job in a low-priority partition and restart it when resources become available again, but the restarted job sometimes fails immediately.</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">Are there any recommended practices or configuration settings for job preemption that avoid this?</span><br style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">I built a Slurm cluster for testing with slurm-docker-cluster. The same problem occurs on other clusters deployed with NVIDIA/DeepOps.</span><br style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">The relevant configuration is as follows:</span><br style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">SLURM_VERSION           = 19.05.1-2</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">SchedulerType           = sched/backfill</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">SelectType              = select/cons_res</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">SelectTypeParameters    = CR_CPU_MEMORY</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">PreemptMode             = REQUEUE</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">PreemptType             = preempt/partition_prio</span><br style="color:rgb(34,34,34);font-size:14px"><br 
style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">Low-priority partition info:</span><br style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">PartitionName=common.q</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">   AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">   AllocNodes=ALL Default=NO QoS=N/A</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">   Nodes=c[1-2]</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">   PriorityJobFactor=1 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=NO</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">   OverTimeLimit=NONE PreemptMode=REQUEUE</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">   State=UP TotalCPUs=2 TotalNodes=2 SelectTypeParameters=NONE</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">   JobDefaults=(null)</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">   DefMemPerCPU=UNLIMITED MaxMemPerNode=UNLIMITED</span><br style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">High-priority 
partition info:</span><br style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">PartitionName=sepang.q</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">   AllowGroups=ALL AllowAccounts=sepang AllowQos=ALL</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">   AllocNodes=ALL Default=NO QoS=sepang</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">   DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">   Nodes=c[1-2]</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">   PriorityJobFactor=1 PriorityTier=20 RootOnly=NO ReqResv=NO OverSubscribe=NO</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">   OverTimeLimit=NONE PreemptMode=OFF</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">   State=UP TotalCPUs=2 TotalNodes=2 SelectTypeParameters=NONE</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">   JobDefaults=(null)</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">   DefMemPerCPU=UNLIMITED MaxMemPerNode=UNLIMITED</span><br style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">I ran the following script.</span><br style="color:rgb(34,34,34);font-size:14px"><span 
style="color:rgb(34,34,34);font-size:14px">The first job was preempted by the third job and requeued.</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">The requeued job restarted but sometimes failed immediately.</span><br style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">```</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">#!/usr/bin/env bash</span><br style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">set -ex</span><br style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">time=130</span><br style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">sbatch -p common.q -n 1 --wrap="srun -n 1 sleep $time"</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">sleep 5</span><br style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">sbatch -p common.q -n 1 --wrap="srun -n 1 sleep $time"</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">sleep 5</span><br style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">sbatch -p sepang.q -n 1 --wrap="srun -n 1 sleep $time"</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">sleep 1</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">```</span><br 
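style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">To see which scheduler started each job, the "sched: Allocate" and "backfill: Started" messages in slurmctld.log (excerpted below) can be pulled out with a small helper. This is only a sketch: the function name job_starters is made up for illustration, and the log location depends on the installation.</span><br style="color:rgb(34,34,34);font-size:14px"><pre style="color:rgb(34,34,34);font-size:14px">
```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Slurm): for each job start recorded in
# slurmctld.log, print "JOBID SCHEDULER", where SCHEDULER is "sched"
# (main scheduler) or "backfill". Lines matching neither pattern are
# skipped. Reads the log files given as arguments, or stdin if none.
job_starters() {
  sed -n -E \
    -e 's/.*sched: Allocate JobId=([0-9]+) .*/\1 sched/p' \
    -e 's/.*backfill: Started JobId=([0-9]+) .*/\1 backfill/p' \
    "$@"
}
```
</pre><span style="color:rgb(34,34,34);font-size:14px">Applied to the two slurmctld.log excerpts below, it prints "120 sched" and "117 backfill", matching the sacct results: the job started by backfill completed and the one started by sched failed.</span><br 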
style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">It seems that requeued jobs restarted by the main scheduler ("sched"), whether from its periodic loop or triggered by an event, fail immediately,</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">while requeued jobs restarted by the backfill scheduler complete successfully.</span><br style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">What is the difference between "sched" and "backfill" in how they start a requeued job?</span><br style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">Jobs 117 and 120 below were both preempted and requeued.</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">JobId 117 was restarted by backfill, but JobId 120 was restarted by sched.</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">When the jobs sleep for 120 seconds, backfill tends to restart the requeued job; when they sleep for 130 seconds, sched tends to restart it.</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">JobIds 117-119 come from the 120-second run and JobIds 120-122 from the 130-second run.</span><br style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">```</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px"># sacct -X -o jobid,jobname,partition,alloccpus,state,exitcode</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">       JobID    JobName  Partition  AllocCPUS      State ExitCode</span><br style="color:rgb(34,34,34);font-size:14px"><span 
style="color:rgb(34,34,34);font-size:14px">------------ ---------- ---------- ---------- ---------- --------</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">118                wrap   common.q          1  COMPLETED      0:0</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">117                wrap   common.q          1  COMPLETED      0:0</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">119                wrap   sepang.q          1  COMPLETED      0:0</span><br style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">121                wrap   common.q          1  COMPLETED      0:0</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">122                wrap   sepang.q          1  COMPLETED      0:0</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">120                wrap   common.q          1     FAILED      1:0</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">```</span><br style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">slurmctld.log:</span><br style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">```</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px"># FAILED case</span><br style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.322] sched: Allocate JobId=120 NodeList=c2 #CPUs=1 Partition=common.q</span><br 
style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.322] debug2: _group_cache_lookup_internal: found valid entry for tanaka</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.323] debug2: Spawning RPC agent for msg_type REQUEST_BATCH_JOB_LAUNCH</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.323] debug2: Tree head got back 0 looking for 1</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.324] debug2: Tree head got back 1</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.328] debug2: node_did_resp c2</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.374] debug2: Processing RPC: REQUEST_JOB_PACK_ALLOC_INFO from uid=1016</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.374] debug:  _slurm_rpc_job_pack_alloc_info: JobId=120 NodeList=c2 usec=81</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.385] debug2: Processing RPC: REQUEST_COMPLETE_BATCH_SCRIPT from uid=0 JobId=120</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.386] _job_complete: JobId=120 WEXITSTATUS 1</span><br style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px"># COMPLETED case</span><br style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><span 
style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:57:39.243] backfill: Started JobId=117 in common.q on c1</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:57:39.243] debug2: _group_cache_lookup_internal: found valid entry for tanaka</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:57:39.243] debug2: altering JobId=117 QOS normal got 31536000 just removed 257698037700 and added 31536000</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:57:39.243] debug2: altering JobId=117 QOS normal got 31536000 just removed 257698037700 and added 31536000</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:57:39.244] debug2: altering JobId=117 QOS normal got 31536000 just removed 257698037700 and added 31536000</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:57:39.244] debug2: altering JobId=117 assoc 12(sepang/tanaka/(null)) got 31536000 just removed 257698037700 and added 31536000</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:57:39.244] debug2: altering JobId=117 assoc 12(sepang/tanaka/(null)) got 31536000 just removed 257698037700 and added 31536000</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:57:39.244] debug2: altering JobId=117 assoc 12(sepang/tanaka/(null)) got 31536000 just removed 257698037700 and added 31536000</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:57:39.244] debug2: altering JobId=117 assoc 5(sepang/(null)/(null)) got 31536000 just removed 257698037700 and added 31536000</span><br 
style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:57:39.244] debug2: altering JobId=117 assoc 5(sepang/(null)/(null)) got 31536000 just removed 257698037700 and added 31536000</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:57:39.244] debug2: altering JobId=117 assoc 5(sepang/(null)/(null)) got 31536000 just removed 257698037700 and added 31536000</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:57:39.244] debug2: altering JobId=117 assoc 1(root/(null)/(null)) got 31536000 just removed 257698037700 and added 31536000</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:57:39.244] debug2: altering JobId=117 assoc 1(root/(null)/(null)) got 31536000 just removed 257698037700 and added 31536000</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:57:39.244] debug2: altering JobId=117 assoc 1(root/(null)/(null)) got 31536000 just removed 257698037700 and added 31536000</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:57:39.244] debug2: Spawning RPC agent for msg_type REQUEST_BATCH_JOB_LAUNCH</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:57:39.245] debug2: Tree head got back 0 looking for 1</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:57:39.246] debug2: Tree head got back 1</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:57:39.250] debug2: node_did_resp c1</span><br style="color:rgb(34,34,34);font-size:14px"><span 
style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:57:39.277] debug2: Processing RPC: REQUEST_JOB_PACK_ALLOC_INFO from uid=1016</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:57:39.277] debug:  _slurm_rpc_job_pack_alloc_info: JobId=117 NodeList=c1 usec=78</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:57:39.278] debug:  laying out the 1 tasks on 1 hosts c1 dist 2</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:57:39.279] debug2: _group_cache_lookup_internal: found valid entry for tanaka</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:58:02.533] debug2: Testing job time limits and checkpoints</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:58:09.244] debug:  backfill: beginning</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:58:09.244] debug:  backfill: no jobs to backfill</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:58:25.559] debug2: Performing purge of old job records</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:58:25.559] debug2: purge_old_job: purged 1 old job records</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:58:25.559] debug2: _purge_files_thread: starting, 1 jobs to purge</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:58:25.559] debug2: _purge_files_thread: purging files from JobId=115</span><br 
style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:58:25.559] debug:  sched: Running job scheduler</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:58:32.567] debug2: Testing job time limits and checkpoints</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:58:39.245] debug:  backfill: beginning</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:58:39.245] debug:  backfill: no jobs to backfill</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:59:02.601] debug2: Testing job time limits and checkpoints</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:59:25.627] debug2: Performing purge of old job records</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:59:25.627] debug:  sched: Running job scheduler</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:59:32.635] debug2: Testing job time limits and checkpoints</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:59:39.324] debug2: full switch release for JobId=117 StepId=1, nodes c1</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:59:39.342] debug2: Processing RPC: REQUEST_COMPLETE_BATCH_SCRIPT from uid=0 JobId=117</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:59:39.342] _job_complete: JobId=117 WEXITSTATUS 0</span><br style="color:rgb(34,34,34);font-size:14px"><span 
style="color:rgb(34,34,34);font-size:14px">[2019-11-26T09:59:39.342] _job_complete: JobId=117 done</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">```</span><br style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">slurmd.log:</span><br style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">```</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px"># FAILED case</span><br style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.341] [120.batch] task 0 (1108) started 2019-11-26T10:07:35</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.342] [120.batch] debug:  task_p_pre_launch_priv: 120.4294967294</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.342] [120.batch] debug2: adding task 0 pid 1108 on node 0 to jobacct</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.344] [120.batch] debug2: _get_precs: energy = 0 watts = 0</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.356] [120.batch] debug2: xcgroup_load: unable to get cgroup '(null)/cpuset' entry '(null)/cpuset/system' properties: No such file or directory</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.356] [120.batch] debug2: xcgroup_load: unable to get cgroup '(null)/memory' entry '(null)/memory/system' properties: No such file or directory</span><br 
style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.358] [120.batch] debug:  task_p_pre_launch: 120.4294967294, task 0</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.358] [120.batch] debug2: _set_limit: conf setrlimit RLIMIT_CPU no change in value: 18446744073709551615</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.358] [120.batch] debug2: _set_limit: conf setrlimit RLIMIT_FSIZE no change in value: 18446744073709551615</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.358] [120.batch] debug2: _set_limit: conf setrlimit RLIMIT_DATA no change in value: 18446744073709551615</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.358] [120.batch] debug2: _set_limit: conf setrlimit RLIMIT_STACK no change in value: 8388608</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.358] [120.batch] debug2: _set_limit: conf setrlimit RLIMIT_CORE no change in value: 18446744073709551615</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.358] [120.batch] debug2: _set_limit: conf setrlimit RLIMIT_RSS no change in value: 18446744073709551615</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.358] [120.batch] debug2: _set_limit: RLIMIT_NPROC  : max:inf cur:inf req:4096</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.358] [120.batch] debug2: _set_limit: conf setrlimit RLIMIT_NPROC succeeded</span><br style="color:rgb(34,34,34);font-size:14px"><span 
style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.358] [120.batch] debug2: _set_limit: conf setrlimit RLIMIT_NOFILE no change in value: 1048576</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.358] [120.batch] debug2: _set_limit: conf setrlimit RLIMIT_MEMLOCK no change in value: 65536</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.358] [120.batch] debug2: _set_limit: conf setrlimit RLIMIT_AS no change in value: 18446744073709551615</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.379] [120.batch] debug2: _get_precs: energy = 0 watts = 0</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.379] [120.batch] debug2: removing task 0 pid 1108 from jobacct</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">[2019-11-26T10:07:35.380] [120.batch] task 0 (1108) exited with exit code 1.</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">```</span><br style="color:rgb(34,34,34);font-size:14px"><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">```</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px"># sacct -o jobid,jobname,partition,alloccpus,state,exitcode,submit,start,end,elapsed -j 117,118,119,120,121,122</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">       JobID    JobName  Partition  AllocCPUS      State ExitCode              Submit               
Start                 End    Elapsed</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">------------ ---------- ---------- ---------- ---------- -------- ------------------- ------------------- ------------------- ----------</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">118                wrap   common.q          1  COMPLETED      0:0 2019-11-26T09:55:29 2019-11-26T09:55:30 2019-11-26T09:57:30   00:02:00</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">118.batch         batch                     1  COMPLETED      0:0 2019-11-26T09:55:30 2019-11-26T09:55:30 2019-11-26T09:57:30   00:02:00</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">118.0             sleep                     1  COMPLETED      0:0 2019-11-26T09:55:30 2019-11-26T09:55:30 2019-11-26T09:57:30   00:02:00</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">117                wrap   common.q          1  COMPLETED      0:0 2019-11-26T09:55:34 2019-11-26T09:57:39 2019-11-26T09:59:39   00:02:00</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">117.batch         batch                     1  COMPLETED      0:0 2019-11-26T09:57:39 2019-11-26T09:57:39 2019-11-26T09:59:39   00:02:00</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">117.1             sleep                     1  COMPLETED      0:0 2019-11-26T09:57:39 2019-11-26T09:57:39 2019-11-26T09:59:39   00:02:00</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">119                wrap   sepang.q          1  COMPLETED      0:0 2019-11-26T09:55:34 2019-11-26T09:55:34 2019-11-26T09:57:34   00:02:00</span><br 
style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">119.batch         batch                     1  COMPLETED      0:0 2019-11-26T09:55:34 2019-11-26T09:55:34 2019-11-26T09:57:34   00:02:00</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">119.0             sleep                     1  COMPLETED      0:0 2019-11-26T09:55:34 2019-11-26T09:55:34 2019-11-26T09:57:34   00:02:00</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">121                wrap   common.q          1  COMPLETED      0:0 2019-11-26T10:05:24 2019-11-26T10:05:25 2019-11-26T10:07:35   00:02:10</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">121.batch         batch                     1  COMPLETED      0:0 2019-11-26T10:05:25 2019-11-26T10:05:25 2019-11-26T10:07:35   00:02:10</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">121.0             sleep                     1  COMPLETED      0:0 2019-11-26T10:05:25 2019-11-26T10:05:25 2019-11-26T10:07:35   00:02:10</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">122                wrap   sepang.q          1  COMPLETED      0:0 2019-11-26T10:05:29 2019-11-26T10:05:30 2019-11-26T10:07:40   00:02:10</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">122.batch         batch                     1  COMPLETED      0:0 2019-11-26T10:05:30 2019-11-26T10:05:30 2019-11-26T10:07:40   00:02:10</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">122.0             sleep                     1  COMPLETED      0:0 2019-11-26T10:05:30 2019-11-26T10:05:30 2019-11-26T10:07:40   00:02:10</span><br style="color:rgb(34,34,34);font-size:14px"><span 
style="color:rgb(34,34,34);font-size:14px">120                wrap   common.q          1     FAILED      1:0 2019-11-26T10:05:30 2019-11-26T10:07:35 2019-11-26T10:07:35   00:00:00</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">120.batch         batch                     1     FAILED      1:0 2019-11-26T10:07:35 2019-11-26T10:07:35 2019-11-26T10:07:35   00:00:00</span><br style="color:rgb(34,34,34);font-size:14px"><span style="color:rgb(34,34,34);font-size:14px">```</span><div style="color:rgb(34,34,34);font-size:14px"><br></div><div style="color:rgb(34,34,34);font-size:14px">Thanks,</div><div style="color:rgb(34,34,34);font-size:14px">Koso Kashima</div>