[slurm-users] slurmstepd: error: _is_a_lwp

Luis Huang lhuang at NYGENOME.ORG
Tue Feb 4 20:50:11 UTC 2020


We have a user who keeps encountering this error with one type of her jobs. Sometimes her jobs are cancelled, and other times they run fine.

slurmstepd: error: _is_a_lwp: open() /proc/195420/status failed: No such file or directory
slurmstepd: error: *** JOB 17534 ON pe2dc5-0007 CANCELLED AT 2020-01-23T14:11:36 ***

[root@pe2dc5-0007 ~]# grep 17534 /var/log/slurmd.log
[2020-01-23T14:10:12.789] task_p_slurmd_batch_request: 17534
[2020-01-23T14:10:12.789] task/affinity: job 17534 CPU input mask for node: 0x03000000000000
[2020-01-23T14:10:12.789] task/affinity: job 17534 CPU final HW mask for node: 0x02000000200000
[2020-01-23T14:10:12.790] _run_prolog: prolog with lock for job 17534 ran for 0 seconds
[2020-01-23T14:10:12.875] Launching batch job 17534 for UID 50321
[2020-01-23T14:10:16.937] [17534.batch] task_p_pre_launch: Using sched_affinity for tasks
[2020-01-23T14:10:42.895] [17534.batch] error: _is_a_lwp: open() /proc/195420/status failed: No such file or directory
[2020-01-23T14:11:36.386] [17534.batch] error: *** JOB 17534 ON pe2dc5-0007 CANCELLED AT 2020-01-23T14:11:36 ***
[2020-01-23T14:11:37.394] [17534.batch] sending REQUEST_COMPLETE_BATCH_SCRIPT, error:0 status:15
[2020-01-23T14:11:37.396] [17534.batch] done with job
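
Judging by its name and messages, _is_a_lwp appears to read /proc/<pid>/status to decide whether a PID is a lightweight process (a thread), presumably by comparing the Tgid field with the PID. If so, "No such file or directory" would mean the process had already exited by the time its status file was opened. The Python sketch below illustrates that kind of check and the race; it is not Slurm's C code, and parse_tgid/is_lwp are hypothetical names.

```python
import os
from typing import Optional


def parse_tgid(status_text: str) -> Optional[int]:
    """Return the Tgid value from the text of a /proc/<pid>/status file."""
    for line in status_text.splitlines():
        if line.startswith("Tgid:"):
            return int(line.split()[1])
    return None


def is_lwp(pid: int) -> Optional[bool]:
    """Hypothetical Tgid-vs-Pid thread check (an assumption, not Slurm's code).

    Returns True if `pid` is a thread whose thread-group id differs from its
    own id, False if it leads its thread group, and None if the process
    vanished (or /proc is unavailable) before its status could be read.
    """
    path = f"/proc/{pid}/status"
    try:
        with open(path) as f:
            text = f.read()
    except (FileNotFoundError, ProcessLookupError):
        # The process exited after it was listed: open() fails with ENOENT,
        # or read() fails with ESRCH ("No such process").
        return None
    tgid = parse_tgid(text)
    if tgid is None:
        return None
    return tgid != pid
```

On Linux, is_lwp(os.getpid()) returns False because the interpreter's main thread leads its own thread group, while a PID that has already exited returns None instead of raising.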

I'm also seeing a lot of spam in slurmd.log on the compute nodes themselves whenever this user's jobs land on them.

[2020-02-04T15:29:11.073] [43816.batch] error: _is_a_lwp: 1 read() attempts on /proc/234796/status failed: No such process
[2020-02-04T15:37:24.238] [43682.batch] error: _is_a_lwp: open() /proc/74338/status failed: No such file or directory
[2020-02-04T15:40:42.064] [43916.batch] error: _is_a_lwp: open() /proc/87034/status failed: No such file or directory
[2020-02-04T15:41:11.304] [43840.batch] error: _is_a_lwp: open() /proc/151191/status failed: No such file or directory
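
To see how widespread the spam is, one option is to tally the _is_a_lwp errors per job from slurmd.log. The sketch below assumes the line format shown above ([timestamp] [jobid.step] error: _is_a_lwp: ...); count_lwp_errors is a hypothetical helper, not part of Slurm.

```python
import re
from collections import Counter

# Matches slurmd.log lines such as:
# [2020-02-04T15:37:24.238] [43682.batch] error: _is_a_lwp: open() /proc/74338/status failed: ...
LWP_ERROR = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<job>\d+)\.(?P<step>[^\]]+)\] error: _is_a_lwp: (?P<msg>.*)$"
)


def count_lwp_errors(lines):
    """Count _is_a_lwp error lines per job ID in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        m = LWP_ERROR.match(line.strip())
        if m:
            counts[m.group("job")] += 1
    return counts
```

For example, count_lwp_errors(open("/var/log/slurmd.log", errors="replace")).most_common() lists the affected jobs, which could then be matched against the "Launching batch job N for UID ..." lines to confirm they all belong to the same user.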

Has anyone seen this issue before?

Regards,


Luis Huang | Systems Administrator II, Research Computing
New York Genome Center
101 Avenue of the Americas
New York, NY 10013
O: (646) 977-7291
lhuang at nygenome.org





