[slurm-users] fail job
Durai Arasan
arasan.durai at gmail.com
Tue Jun 30 08:47:29 UTC 2020
Hi,
Can you post the output of the following commands, run on your master node?
sacctmgr show cluster
scontrol show nodes
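
If the full "scontrol show nodes" output is too long to post, the State and Reason fields of the drained node are the interesting part. Below is a minimal Python sketch for pulling them out. It assumes the usual Key=Value layout with blank lines between node records. The sample text is illustrative only, built from the node name and reason string in your message, not real output from your cluster, and drained_nodes is just a name made up for this example.

```python
import re

# Illustrative sample only: shaped like `scontrol show nodes` output,
# reduced to the three fields used here. The reason string is the one
# quoted in this thread (the list archive renders '@' as ' at ').
SAMPLE = """\
NodeName=node0802
   State=IDLE+DRAIN
   Reason=Kill task failed [root@2020-06-27T02:25:29]

NodeName=node0803
   State=IDLE
"""


def drained_nodes(text):
    """Return {node: (state, reason)} for nodes whose State contains DRAIN."""
    result = {}
    # Assumption: node records are separated by blank lines.
    for record in re.split(r"\n\s*\n", text.strip()):
        name = re.search(r"NodeName=(\S+)", record)
        state = re.search(r"State=(\S+)", record)
        # The reason runs to the end of its line and may contain spaces.
        reason = re.search(r"Reason=(.*)", record)
        if name and state and "DRAIN" in state.group(1):
            result[name.group(1)] = (
                state.group(1),
                reason.group(1).strip() if reason else "",
            )
    return result


if __name__ == "__main__":
    for node, (state, reason) in drained_nodes(SAMPLE).items():
        print(node, state, reason)
```

To use it on real data, save the command output to a file and pass the file contents to drained_nodes() in place of SAMPLE.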
Best,
Durai Arasan
Zentrum für Datenverarbeitung
Tübingen
On Tue, Jun 30, 2020 at 10:33 AM Alberto Morillas, Angelines <
angelines.alberto at ciemat.es> wrote:
> Hi,
>
>
>
> We are running Slurm version 18.08.6.
>
> One of my nodes is in drain state with Reason=Kill task failed
> [root at 2020-06-27T02:25:29]
>
>
>
> On the node, slurmd.log shows:
>
>
>
> [2020-06-27T01:24:26.242] task_p_slurmd_batch_request: 963771
>
> [2020-06-27T01:24:26.242] task/affinity: job 963771 CPU input mask for
> node: 0x0FFFFFFFFF
>
> [2020-06-27T01:24:26.242] task/affinity: job 963771 CPU final HW mask for
> node: 0x55FFFFFFFF
>
> [2020-06-27T01:24:26.247] _run_prolog: run job script took usec=4537
>
> [2020-06-27T01:24:26.247] _run_prolog: prolog with lock for job 963771 ran
> for 0 seconds
>
> [2020-06-27T01:24:26.247] Launching batch job 963771 for UID 5200
>
> [2020-06-27T01:24:26.276] [963771.batch] task/cgroup:
> /slurm/uid_5200/job_963771: alloc=147456MB mem.limit=147456MB
> memsw.limit=147456MB
>
> [2020-06-27T01:24:26.284] [963771.batch] task/cgroup:
> /slurm/uid_5200/job_963771/step_batch: alloc=147456MB mem.limit=147456MB
> memsw.limit=147456MB
>
> [2020-06-27T01:24:26.310] [963771.batch] task_p_pre_launch: Using
> sched_affinity for tasks
>
> [2020-06-27T02:24:26.933] [963771.batch] error: *** JOB 963771 ON
> node0802 CANCELLED AT 2020-06-27T02:24:26 DUE TO TIME LIMIT ***
>
> [2020-06-27T02:25:27.009] [963771.batch] error: *** JOB 963771 STEPD
> TERMINATED ON node0802 AT 2020-06-27T02:25:27 DUE TO JOB NOT ENDING WITH
> SIGNALS ***
>
> [2020-06-27T02:25:27.009] [963771.batch] sending
> REQUEST_COMPLETE_BATCH_SCRIPT, error:4001 status 15
>
> [2020-06-27T02:25:27.011] [963771.batch] done with job
>
>
>
> If I try to get information about this job, I get nothing:
>
>
>
> sacct -j 963771
>
>        JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
> ------------ ---------- ---------- ---------- ---------- ---------- --------
>
>
>
> Why don't I get any information about this job?
>
>
>
> Thanks in advance
>
> Angelines
>
> ________________________________________________
>
>
>
> Angelines Alberto Morillas
>
>
>
> Unidad de Arquitectura Informática
>
> Despacho: 22.1.32
>
> Telf.: +34 91 346 6119
>
> Fax: +34 91 346 6537
>
>
>
> skype: angelines.alberto
>
>
>
> CIEMAT
>
> Avenida Complutense, 40
>
> 28040 MADRID
>
> ________________________________________________
>
>
>
>
>