[slurm-users] new user; ExitCode reporting

Chris Samuel chris at csamuel.org
Fri Nov 23 05:09:58 MST 2018


On Friday, 23 November 2018 10:21:09 PM AEDT Matthew Goulden wrote:

> I've spent some time reading through the (excellent, frankly) documentation
> for sbatch and job_exit_code and while learning a great deal nothing has
> explained this anomaly.

I suspect Slurm is trying to be helpful here. An exit code above 128 usually
means the process was terminated by signal N and the shell reported 128 + N,
so sacct subtracts 128 from exit values of 128 or more (note the >= test in
the code below). The bash manual page says:

       The return value of a simple command is its exit status, or 128+n if
       the command is terminated by signal n.
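
To see that convention outside Slurm, here is a minimal Python sketch. The
helper name is my own invention, and it assumes a POSIX sh (bash or dash, for
example) on PATH. An inner shell kills itself with a signal and the outer
shell exits with the status it saw in $?.

```python
import signal
import subprocess


def shell_status_after_signal(sig: int) -> int:
    """Return the status a POSIX shell reports for a child killed by `sig`."""
    # The inner sh sends the signal to itself ($$ is its own PID); the outer
    # sh then exits with whatever status it observed in $?.
    script = f"sh -c 'kill -{int(sig)} $$'; exit $?"
    return subprocess.run(["sh", "-c", script]).returncode


if __name__ == "__main__":
    status = shell_status_after_signal(signal.SIGPIPE)
    # Print the shell's status and the signal number recovered from it.
    print(status, status - 128)
```

With bash or dash as sh, the first number should be 128 plus the signal
number, which is the value sacct then subtracts 128 from.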

This is what sacct does when it prints the value (the unmodified value appears
to be what is stored in the DB):

                        if (exit_code != NO_VAL) {
                                if (WIFSIGNALED(exit_code))
                                        tmp_int2 = WTERMSIG(exit_code);
                                else if (WIFEXITED(exit_code))
                                        tmp_int = WEXITSTATUS(exit_code);
                                if (tmp_int >= 128)
                                        tmp_int -= 128;
                        }

In your case 141 = 128 + 13, so sacct displays 13 (signal 13 is SIGPIPE on
Linux).
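
If it helps to experiment, the same logic can be sketched in Python. This is
only my rendition of the C above, not Slurm code: the function name is made
up, and formatting the result as "code:signal" is my reading of the ExitCode
column in sacct output.

```python
import os


def sacct_style_exit_code(raw_status: int) -> str:
    """Format a raw wait() status as "code:signal", mimicking the C above."""
    code = 0
    sig = 0
    if os.WIFSIGNALED(raw_status):
        sig = os.WTERMSIG(raw_status)
    elif os.WIFEXITED(raw_status):
        code = os.WEXITSTATUS(raw_status)
    # The step behind the surprise: exit codes of 128 or more are shown
    # with 128 subtracted.
    if code >= 128:
        code -= 128
    return f"{code}:{sig}"


if __name__ == "__main__":
    # On Linux a normal exit with code N is encoded in a wait status as N << 8.
    print(sacct_style_exit_code(141 << 8))
    print(sacct_style_exit_code(1 << 8))
```

Feeding it the status for "exit 141" gives the same 13:0 that sacct reports
for the job further down.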

*If* your job uses srun you can ask Slurm for the DerivedExitCode instead.
That is the highest exit code from all the srun invocations in the job, but
it will be the number you expect because sacct does not convert it.

$ sbatch --wrap 'srun bash -c "exit 141"'
Submitted batch job 1795583

$ sacct -j 1795583
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
1795583            wrap    skylake   hpcadmin          1     FAILED     13:0
1795583.bat+      batch              hpcadmin          1     FAILED     13:0
1795583.ext+     extern              hpcadmin          1  COMPLETED      0:0
1795583.0          bash              hpcadmin          1     FAILED     13:0

$ sacct -j 1795583 -o jobid,jobname,state,derivedexitcode -X
       JobID    JobName      State DerivedExitCode
------------ ---------- ---------- ---------------
1795583            wrap     FAILED           141:0
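
If you want to check for this in a script, a small Python sketch along these
lines could compare the two fields. Both function names are hypothetical, and
the sample values are simply the ExitCode and DerivedExitCode fields copied
from the output above.

```python
from typing import Tuple


def parse_exit_field(field: str) -> Tuple[int, int]:
    """Split an sacct "code:signal" field such as "141:0" into two ints."""
    code, _, sig = field.strip().partition(":")
    return int(code), int(sig or 0)


def looks_shifted(exit_field: str, derived_field: str) -> bool:
    """True if the displayed code equals the derived code minus 128."""
    shown, _ = parse_exit_field(exit_field)
    derived, _ = parse_exit_field(derived_field)
    return derived >= 128 and derived - 128 == shown


if __name__ == "__main__":
    print(parse_exit_field("141:0"))
    print(looks_shifted("13:0", "141:0"))
```

This only compares the two strings; it does not call sacct itself, so you
would still need to feed it the fields from your own job.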


Hope that helps!

All the best,
Chris
-- 
 Chris Samuel  :  http://www.csamuel.org/  :  Melbourne, VIC





