[slurm-users] slurm error

Felix felix at itim-cj.ro
Fri Jul 8 07:59:13 UTC 2022


Hello,

I get a lot of errors like this one:

https://monit-grafana.cern.ch/d/siYq3DxZz/wlcg-sitemon-test-details?orgId=20&var-metric=org.sam.CONDOR-JobSubmit-/atlas/Role=lcgadmin&var-dst_hostname=arc7-node.itim-cj.ro&var-timestamp=1656600022000

Or see the attached file.

The text file contains this error:

Job: gsiftp://arc7-node.itim-cj.ro:2811/jobs/LPGNDmmHAR1npzuSSqjSXJOqK8uTEmABFKDmOANKDmABFKDmAvk8Xm
State: Failed
Specific state: FAILED

Job Error: LRMS error: (-1) Job missing from SLURM

What exactly failed in the system?

Where do I have to look for the error?

Thank you

Felix

-- 

Dr. Eng. Farcas Felix
National Institute of Research and Development of Isotopic and Molecular Technology,
IT - Department - Cluj-Napoca, Romania
Mobile: +40742195323
-------------- next part --------------
=== ETF job log:
Timeout limits configured were:
global -> 1410 minutes
Queuing -> 1380 minutes
Current time: 2022-06-30 14:40:22
Job started: 2022-06-30 14:25:25
Job finished: 2022-06-30 14:40:16
Job tracking times (entered):
Failed -> 2022-06-30 14:40:16
=== Credentials:
x509:
/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=ddmadmin/CN=531497/CN=Robot: ATLAS Data Management/CN=1144802943/CN=1862137640/CN=1506653970/CN=1677187488/CN=647349180
/atlas/Role=lcgadmin/Capability=NULL
/atlas/Role=NULL/Capability=NULL
/atlas/lcg1/Role=NULL/Capability=NULL
/atlas/usatlas/Role=NULL/Capability=NULL

=== Job description:
JDL([('executable', 'etf_run.sh'), ('join', 'yes'), ('stdout', 'arc.out'), ('outputFiles', ['wnlogs.tgz']), ('inputFiles', ['gridjob.tgz']), ('arguments', '-v atlas -c arc7-node.itim-cj.ro -p 2 -t 600 -T 550 -d'), ('queue', 'debug'), ('runtimeenvironment', 'ENV/PROXY'), ('cputime', '30'), ('walltime', '30')])
=== Job submission command:
arcsub --debug VERBOSE --submission-endpoint-type gridftp --computing-element arc7-node.itim-cj.ro --timeout 120 --joblist /var/lib/gridprobes/atlas.Role.lcgadmin/arc/arc7-node.itim-cj.ro/jobs.dat /var/lib/gridprobes/atlas.Role.lcgadmin/arc/arc7-node.itim-cj.ro/gridjob.jdl
VERBOSE: Running command: /bin/arcsub --debug VERBOSE --submission-endpoint-type gridftp --computing-element arc7-node.itim-cj.ro --timeout 120 --joblist /var/lib/gridprobes/atlas.Role.lcgadmin/arc/arc7-node.itim-cj.ro/jobs.dat /var/lib/gridprobes/atlas.Role.lcgadmin/arc/arc7-node.itim-cj.ro/gridjob.jdl
INFO: Configuration (/etc/arc/client.conf) loaded
INFO: Configuration (/omd/sites/etf/.arc/client.conf) loaded
INFO: Using proxy file: /opt/omd/sites/etf/etc/nagios/globus/userproxy.pem--atlas-Role_lcgadmin
INFO: Using CA certificate directory: /etc/grid-security/certificates
VERBOSE: String successfully parsed as nordugrid:xrsl.
VERBOSE: Automatically adding org.nordugrid.ldapng information endpoint type based on desired submission interface
VERBOSE: Automatically adding org.nordugrid.ldapglue2 information endpoint type based on desired submission interface
VERBOSE: LDAPQuery: Initializing connection to arc7-node.itim-cj.ro:2135
VERBOSE: LDAPQuery: Initializing connection to arc7-node.itim-cj.ro:2135
VERBOSE: LDAPQuery: Querying arc7-node.itim-cj.ro
VERBOSE: LDAPQuery: Getting results from arc7-node.itim-cj.ro
VERBOSE: LDAPQuery: Querying arc7-node.itim-cj.ro
VERBOSE: LDAPQuery: Getting results from arc7-node.itim-cj.ro
INFO: Broker Random loaded
VERBOSE: Certificate verification succeeded
VERBOSE: Skipping ComputingEndpoint 'https://arc7-node.itim-cj.ro:443/arex', because it has 'org.nordugrid.arcrest' interface instead of the requested 'org.nordugrid.gridftpjob'.
VERBOSE: Skipping ComputingEndpoint 'https://arc7-node.itim-cj.ro:443/arex', because it has 'org.ogf.glue.emies.activitycreation' interface instead of the requested 'org.nordugrid.gridftpjob'.
INFO: Computing endpoint gsiftp://arc7-node.itim-cj.ro:2811/jobs (type org.nordugrid.gridftpjob) added to the list for submission brokering
VERBOSE: Performing matchmaking against target (gsiftp://arc7-node.itim-cj.ro:2811/jobs).
VERBOSE: Certificate verification succeeded
VERBOSE: Requirement "== ENV/PROXY" satisfied by "ENV/PROXY".
VERBOSE: All requirements satisfied.
VERBOSE: Matchmaking, ExecutionTarget: gsiftp://arc7-node.itim-cj.ro:2811/jobs matches job description
VERBOSE: Performing matchmaking against target (gsiftp://arc7-node.itim-cj.ro:2811/jobs).
VERBOSE: Certificate verification succeeded
VERBOSE: Requirement "== ENV/PROXY" satisfied by "ENV/PROXY".
VERBOSE: All requirements satisfied.
VERBOSE: Matchmaking, ExecutionTarget: gsiftp://arc7-node.itim-cj.ro:2811/jobs matches job description
VERBOSE: Performing matchmaking against target (gsiftp://arc7-node.itim-cj.ro:2811/jobs).
VERBOSE: Certificate verification succeeded
VERBOSE: Requirement "== ENV/PROXY" satisfied by "ENV/PROXY".
VERBOSE: All requirements satisfied.
VERBOSE: Matchmaking, ExecutionTarget: gsiftp://arc7-node.itim-cj.ro:2811/jobs matches job description
VERBOSE: SendCommand: Response: 250 "jobs/LPGNDmmHAR1npzuSSqjSXJOqK8uTEmABFKDmOANKDmABFKDmAvk8Xm" is current directory
VERBOSE: Requirement "== ENV/PROXY" satisfied by "ENV/PROXY".
VERBOSE: All requirements satisfied.
VERBOSE: Generating nordugrid:xrsl job description output
VERBOSE: SendCommand: Response: 229 Entering Extended Passive Mode (|||9157|)
VERBOSE: FTP Job Control: Data channel: 193.231.25.228:9157
VERBOSE: Disconnect: Failed aborting - ignoring: Handle not in the proper state
INFO: Transfer from file:/var/lib/gridprobes/atlas.Role.lcgadmin/arc/arc7-node.itim-cj.ro/gridjob.tgz to gsiftp://arc7-node.itim-cj.ro:2811/jobs/LPGNDmmHAR1npzuSSqjSXJOqK8uTEmABFKDmOANKDmABFKDmAvk8Xm/gridjob.tgz
VERBOSE: DataMover: cycle
INFO: Real transfer from file:/var/lib/gridprobes/atlas.Role.lcgadmin/arc/arc7-node.itim-cj.ro/gridjob.tgz to gsiftp://arc7-node.itim-cj.ro:2811/jobs/LPGNDmmHAR1npzuSSqjSXJOqK8uTEmABFKDmOANKDmABFKDmAvk8Xm/gridjob.tgz
VERBOSE: Creating buffer: 1048576 x 2
VERBOSE: DataMove::Transfer: no checksum calculation for file:/var/lib/gridprobes/atlas.Role.lcgadmin/arc/arc7-node.itim-cj.ro/gridjob.tgz
VERBOSE: Failed to load plugin for URL (empty)
VERBOSE: Failed to load plugin for URL (empty)
INFO: Using buffered transfer method
VERBOSE: Waiting for buffer
INFO: write_thread: get and pass buffers
INFO: [external] Using proxy file: /opt/omd/sites/etf/etc/nagios/globus/userproxy.pem--atlas-Role_lcgadmin
INFO: [external] Using CA certificate directory: /etc/grid-security/certificates
VERBOSE: [external] Using insecure data transfer
VERBOSE: [external] start_writing_ftp: mkdir
VERBOSE: [external] mkdir_ftp: making gsiftp://arc7-node.itim-cj.ro:2811/jobs
VERBOSE: [external] mkdir_ftp: making gsiftp://arc7-node.itim-cj.ro:2811/jobs/LPGNDmmHAR1npzuSSqjSXJOqK8uTEmABFKDmOANKDmABFKDmAvk8Xm
VERBOSE: [external] start_writing_ftp: put
VERBOSE: [external] start_writing_ftp: waiting for data tag
VERBOSE: [external] start_writing_ftp: waiting for data chunk
VERBOSE: write_thread: for_write eof
VERBOSE: write_thread: exiting
VERBOSE: [external] start_writing_ftp: data chunk: 0 88918
VERBOSE: [external] start_writing_ftp: waiting for data tag
VERBOSE: [external] start_writing_ftp: waiting for data chunk
VERBOSE: [external] start_writing_ftp: data chunk: 88918 0
VERBOSE: [external] start_writing_ftp: waiting for some buffers sent
VERBOSE: [external] ftp_write_thread: waiting for transfer complete
VERBOSE: [external] ftp_write_thread: waiting for buffers released
VERBOSE: [external] ftp_write_thread: exiting
VERBOSE: buffer: read EOF : yes
VERBOSE: buffer: write EOF: yes
VERBOSE: buffer: error : no, read: no, write: no
VERBOSE: Closing read channel
VERBOSE: Closing write channel
VERBOSE: Checksum not computed
INFO: Transfer from file:/var/lib/gridprobes/atlas.Role.lcgadmin/arc/arc7-node.itim-cj.ro/etf_run.sh to gsiftp://arc7-node.itim-cj.ro:2811/jobs/LPGNDmmHAR1npzuSSqjSXJOqK8uTEmABFKDmOANKDmABFKDmAvk8Xm/etf_run.sh
VERBOSE: DataMover: cycle
INFO: Real transfer from file:/var/lib/gridprobes/atlas.Role.lcgadmin/arc/arc7-node.itim-cj.ro/etf_run.sh to gsiftp://arc7-node.itim-cj.ro:2811/jobs/LPGNDmmHAR1npzuSSqjSXJOqK8uTEmABFKDmOANKDmABFKDmAvk8Xm/etf_run.sh
VERBOSE: Creating buffer: 1048576 x 2
VERBOSE: DataMove::Transfer: no checksum calculation for file:/var/lib/gridprobes/atlas.Role.lcgadmin/arc/arc7-node.itim-cj.ro/etf_run.sh
VERBOSE: Failed to load plugin for URL (empty)
VERBOSE: Failed to load plugin for URL (empty)
INFO: Using buffered transfer method
VERBOSE: Waiting for buffer
INFO: write_thread: get and pass buffers
VERBOSE: write_thread: for_write eof
VERBOSE: write_thread: exiting
INFO: [external] Using proxy file: /opt/omd/sites/etf/etc/nagios/globus/userproxy.pem--atlas-Role_lcgadmin
INFO: [external] Using CA certificate directory: /etc/grid-security/certificates
VERBOSE: [external] Using insecure data transfer
VERBOSE: [external] start_writing_ftp: mkdir
VERBOSE: [external] mkdir_ftp: making gsiftp://arc7-node.itim-cj.ro:2811/jobs
VERBOSE: [external] mkdir_ftp: making gsiftp://arc7-node.itim-cj.ro:2811/jobs/LPGNDmmHAR1npzuSSqjSXJOqK8uTEmABFKDmOANKDmABFKDmAvk8Xm
VERBOSE: [external] start_writing_ftp: put
VERBOSE: [external] start_writing_ftp: waiting for data tag
VERBOSE: [external] start_writing_ftp: waiting for data chunk
VERBOSE: [external] start_writing_ftp: data chunk: 0 6898
VERBOSE: [external] start_writing_ftp: waiting for data tag
VERBOSE: [external] start_writing_ftp: waiting for data chunk
VERBOSE: [external] start_writing_ftp: data chunk: 6898 0
VERBOSE: [external] start_writing_ftp: waiting for some buffers sent
VERBOSE: [external] ftp_write_thread: waiting for transfer complete
VERBOSE: [external] ftp_write_thread: waiting for buffers released
VERBOSE: [external] ftp_write_thread: exiting
VERBOSE: buffer: read EOF : yes
VERBOSE: buffer: write EOF: yes
VERBOSE: buffer: error : no, read: no, write: no
VERBOSE: Closing read channel
VERBOSE: Closing write channel
VERBOSE: Checksum not computed
VERBOSE: Generating nordugrid:xrsl job description output
Job submitted with jobid: gsiftp://arc7-node.itim-cj.ro:2811/jobs/LPGNDmmHAR1npzuSSqjSXJOqK8uTEmABFKDmOANKDmABFKDmAvk8Xm
VERBOSE: Failed checking database (/var/lib/gridprobes/atlas.Role.lcgadmin/arc/arc7-node.itim-cj.ro/jobs.dat)
VERBOSE: Unable to create job database (/var/lib/gridprobes/atlas.Role.lcgadmin/arc/arc7-node.itim-cj.ro/jobs.dat)

=== Job log:
Job: gsiftp://arc7-node.itim-cj.ro:2811/jobs/LPGNDmmHAR1npzuSSqjSXJOqK8uTEmABFKDmOANKDmABFKDmAvk8Xm
State: Failed
Specific state: FAILED
Job Error: LRMS error: (-1) Job missing from SLURM
Owner: 2e074e436081cef9e8c774c660d78fc30a967bf27f726c6d24fbb51d031412e380d9ab0ecadc71fb003f6d37a6cbb343ad587237e04d55541acebb141c2fa1e9
Other Messages: SubmittedVia=org.nordugrid.gridftpjob
Queue: debug
Requested Slots: 1
Stdin: /dev/null
Stdout: arc.out
Stderr: arc.out
Submitted: 2022-06-30 14:25:18
End Time: 2022-06-30 14:32:17
Submitted from: 188.184.29.54:9928
Requested CPU Time: 30 minutes
Results must be retrieved before: 2022-07-07 14:32:17
Proxy valid until: 2022-07-01 13:54:43
Entry valid from: 2022-06-30 14:38:44
Entry valid for: 3 hours
ID on service: LPGNDmmHAR1npzuSSqjSXJOqK8uTEmABFKDmOANKDmABFKDmAvk8Xm
Service information URL: ldap://arc7-node.itim-cj.ro:2135/Mds-Vo-Name=local,o=grid??sub?(objectClass=*) (org.nordugrid.ldapng)
Job status URL: ldap://arc7-node.itim-cj.ro:2135/Mds-Vo-Name=local,o=grid??sub?(nordugrid-job-globalid=gsiftp:\2f\2farc7-node.itim-cj.ro:2811\2fjobs\2fLPGNDmmHAR1npzuSSqjSXJOqK8uTEmABFKDmOANKDmABFKDmAvk8Xm) (org.nordugrid.ldapng)
Job management URL: gsiftp://arc7-node.itim-cj.ro:2811/jobs (org.nordugrid.gridftpjob)
Stagein directory URL: gsiftp://arc7-node.itim-cj.ro:2811/jobs/LPGNDmmHAR1npzuSSqjSXJOqK8uTEmABFKDmOANKDmABFKDmAvk8Xm
Stageout directory URL: gsiftp://arc7-node.itim-cj.ro:2811/jobs/LPGNDmmHAR1npzuSSqjSXJOqK8uTEmABFKDmOANKDmABFKDmAvk8Xm
Session directory URL: gsiftp://arc7-node.itim-cj.ro:2811/jobs/LPGNDmmHAR1npzuSSqjSXJOqK8uTEmABFKDmOANKDmABFKDmAvk8Xm

Status of 1 jobs was queried, 1 jobs returned information

=== Last job status:
Job: gsiftp://arc7-node.itim-cj.ro:2811/jobs/LPGNDmmHAR1npzuSSqjSXJOqK8uTEmABFKDmOANKDmABFKDmAvk8Xm
State: Failed
Specific state: FAILED
Job Error: LRMS error: (-1) Job missing from SLURM
Owner: 2e074e436081cef9e8c774c660d78fc30a967bf27f726c6d24fbb51d031412e380d9ab0ecadc71fb003f6d37a6cbb343ad587237e04d55541acebb141c2fa1e9
Other Messages: SubmittedVia=org.nordugrid.gridftpjob
Queue: debug
Requested Slots: 1
Stdin: /dev/null
Stdout: arc.out
Stderr: arc.out
Submitted: 2022-06-30 14:25:18
End Time: 2022-06-30 14:32:17
Submitted from: 188.184.29.54:9928
Requested CPU Time: 30 minutes
Results must be retrieved before: 2022-07-07 14:32:17
Proxy valid until: 2022-07-01 13:54:43
Entry valid from: 2022-06-30 14:38:44
Entry valid for: 3 hours
ID on service: LPGNDmmHAR1npzuSSqjSXJOqK8uTEmABFKDmOANKDmABFKDmAvk8Xm
Service information URL: ldap://arc7-node.itim-cj.ro:2135/Mds-Vo-Name=local,o=grid??sub?(objectClass=*) (org.nordugrid.ldapng)
Job status URL: ldap://arc7-node.itim-cj.ro:2135/Mds-Vo-Name=local,o=grid??sub?(nordugrid-job-globalid=gsiftp:\2f\2farc7-node.itim-cj.ro:2811\2fjobs\2fLPGNDmmHAR1npzuSSqjSXJOqK8uTEmABFKDmOANKDmABFKDmAvk8Xm) (org.nordugrid.ldapng)
Job management URL: gsiftp://arc7-node.itim-cj.ro:2811/jobs (org.nordugrid.gridftpjob)
Stagein directory URL: gsiftp://arc7-node.itim-cj.ro:2811/jobs/LPGNDmmHAR1npzuSSqjSXJOqK8uTEmABFKDmOANKDmABFKDmAvk8Xm
Stageout directory URL: gsiftp://arc7-node.itim-cj.ro:2811/jobs/LPGNDmmHAR1npzuSSqjSXJOqK8uTEmABFKDmOANKDmABFKDmAvk8Xm
Session directory URL: gsiftp://arc7-node.itim-cj.ro:2811/jobs/LPGNDmmHAR1npzuSSqjSXJOqK8uTEmABFKDmOANKDmABFKDmAvk8Xm

Status of 1 jobs was queried, 1 jobs returned information

Failed
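
For reference, the key fields of the status block above can be pulled out with a short script. This is only a minimal Python sketch and not part of the original attachment: the helper names `parse_status` and `elapsed` are hypothetical, and the input is an excerpt copied from the "Job log" section. It shows the reported LRMS error and how long after submission the job was marked as ended, which can be compared with the 30 minutes requested.

```python
from datetime import datetime

# Excerpt copied from the "=== Job log:" section above.
STATUS_TEXT = """\
State: Failed
Specific state: FAILED
Job Error: LRMS error: (-1) Job missing from SLURM
Queue: debug
Submitted: 2022-06-30 14:25:18
End Time: 2022-06-30 14:32:17
Requested CPU Time: 30 minutes
"""


def parse_status(text):
    """Split 'Key: value' lines into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        # Split on the first ': ' only, so values may contain colons.
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def elapsed(fields):
    """Time between the 'Submitted' and 'End Time' fields."""
    fmt = "%Y-%m-%d %H:%M:%S"
    start = datetime.strptime(fields["Submitted"], fmt)
    end = datetime.strptime(fields["End Time"], fmt)
    return end - start


fields = parse_status(STATUS_TEXT)
print("Error:  ", fields["Job Error"])
print("Elapsed:", elapsed(fields))
```

With the values above, the job was marked as ended about seven minutes after submission, well inside the requested 30 minutes.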