[slurm-users] Slurmstepd sleep processes

Christopher Benjamin Coffey Chris.Coffey at nau.edu
Fri Aug 3 15:42:26 MDT 2018


Hello,

Has anyone observed "sleep 100000000" processes on their compute nodes? Each one appears to be a child of the slurmstepd extern process in Slurm:

4 S root     136777      1  0  80   0 - 73218 do_wai 05:48 ?        00:00:01 slurmstepd: [13220317.extern]
0 S root     136782 136777  0  80   0 - 25229 hrtime 05:48 ?        00:00:00  \_ sleep 100000000
4 S root     136784      1  0  80   0 - 73280 do_wai 05:48 ?        00:00:02 slurmstepd: [13220317.batch]
4 S tes87    136789 136784  0  80   0 - 26520 do_wai 05:48 ?        00:00:00  \_ /bin/bash /var/spool/slurm/slurmd/job13220317/slurm_script
4 S root     136807      1  0  80   0 - 107157 do_wai 05:48 ?       00:00:01 slurmstepd: [13220317.1]
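
For anyone who wants to check their own nodes, the parent/child pairing above can be picked out of a process listing mechanically. Below is a minimal sketch, assuming input in the same `ps -elf`-style column layout as the listing above; the function name `find_extern_sleepers` and the `SAMPLE` variable are hypothetical, chosen only for illustration.

```python
# Sketch only: assumes 15 whitespace-separated columns per line
# (F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD).

def find_extern_sleepers(ps_text):
    """Return (pid, ppid, cmd) for sleep processes parented by a slurmstepd extern step."""
    rows = []
    for line in ps_text.splitlines():
        fields = line.rstrip().split(None, 14)  # keep CMD (with its spaces) as one field
        if len(fields) < 15:
            continue  # blank or truncated line
        try:
            pid, ppid = int(fields[3]), int(fields[4])
        except ValueError:
            continue  # header row or unexpected layout
        rows.append((pid, ppid, fields[14]))

    # PIDs of slurmstepd processes whose step name ends in ".extern"
    extern_pids = {
        pid for pid, _, cmd in rows
        if cmd.startswith("slurmstepd:") and ".extern]" in cmd
    }
    # Children of those PIDs whose command line contains the word "sleep"
    return [
        (pid, ppid, cmd) for pid, ppid, cmd in rows
        if ppid in extern_pids and "sleep" in cmd.split()
    ]


# Two lines copied from the listing above.
SAMPLE = r"""
4 S root     136777      1  0  80   0 - 73218 do_wai 05:48 ?        00:00:01 slurmstepd: [13220317.extern]
0 S root     136782 136777  0  80   0 - 25229 hrtime 05:48 ?        00:00:00  \_ sleep 100000000
"""

for pid, ppid, cmd in find_extern_sleepers(SAMPLE):
    print(pid, ppid, cmd)
```

The match is done on the PPID column rather than on the tree indentation, so it should not depend on whether the listing was produced with a forest view.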

I'm not exactly sure what the extern step is for. Does anyone know what this is about, and whether it is normal? We noticed it the other day while investigating some issues. A sleep of 100000000 seconds, roughly 3.17 years, seems strange. Any help would be appreciated, thanks!
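
As a sanity check on that figure, here is the arithmetic as a short sketch, assuming a 365.25-day year; the helper name `sleep_seconds_to_years` is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365.25  # assumption: average year length including leap days


def sleep_seconds_to_years(seconds):
    """Convert a sleep duration in seconds to years."""
    return seconds / (SECONDS_PER_DAY * DAYS_PER_YEAR)


years = sleep_seconds_to_years(100_000_000)
print(f"{years:.2f} years")  # prints "3.17 years"
```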

Best,
Chris

—
Christopher Coffey
High-Performance Computing
Northern Arizona University
928-523-1167
 


