[slurm-users] What is the 2^32-1 values in "stepd_connect to <jobid>.4294967295 failed" telling you
Christopher Samuel
chris at csamuel.org
Fri Mar 8 18:16:13 UTC 2019
On 3/8/19 12:25 AM, Kevin Buckley wrote:
> error: stepd_connect to <jobid>.1 failed: No such file or directory
> error: stepd_connect to <jobid>.4294967295 failed: No such file or
> directory
>
> We can imagine why a job that got killed in step 0 might still be looking
> for the <jobid>.1 step but the <jobid>.2^32-1 is beyond our imagination.
If I remember correctly, that's the internal representation of the extern step.
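
As a rough illustration of why the number looks so odd: 4294967295 is what you get when every bit of an unsigned 32-bit field is set, which is a common choice for a reserved "special" ID. The sketch below is not Slurm code. The names EXTERN_STEP_SENTINEL and describe_step are hypothetical, and the assumption that this value denotes the extern step comes only from the explanation above.

```python
# Hypothetical illustration only; these names are not taken from the Slurm source.
# Assumption: the all-ones 32-bit value marks the extern step, as suggested above.
EXTERN_STEP_SENTINEL = 0xFFFFFFFF  # all 32 bits set


def describe_step(step_id: int) -> str:
    """Return a readable label for a 32-bit step id."""
    if step_id == EXTERN_STEP_SENTINEL:
        return "extern"
    return str(step_id)


# The value in the log is exactly 2^32 - 1.
assert EXTERN_STEP_SENTINEL == 2**32 - 1 == 4294967295

# -1 stored in an unsigned 32-bit field reads back as the same number.
assert ((-1) & 0xFFFFFFFF) == 4294967295

print(describe_step(1))
print(describe_step(4294967295))
```

Read this way, the second error line is about <jobid>.extern rather than a step numbered in the billions.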
All the best,
Chris
--
Chris Samuel : http://www.csamuel.org/ : Berkeley, CA, USA