[slurm-users] 20.11.1 on Cray: job_submit.lua: SO loaded on CtlD restart: script skipped when job submitted

Kevin Buckley Kevin.Buckley at pawsey.org.au
Thu Dec 17 02:21:52 UTC 2020


Probably not specific to 20.11.1, nor to a Cray, but has anyone out there seen anything like this?

As the slurmctld restarts, after upping the debug level, it all looks hunky dory,

[2020-12-17T09:23:46.204] debug3: Trying to load plugin /opt/slurm/20.11.1/lib64/slurm/job_submit_cray_aries.so
[2020-12-17T09:23:46.205] debug3: Success.
[2020-12-17T09:23:46.206] debug3: Trying to load plugin /opt/slurm/20.11.1/lib64/slurm/job_submit_lua.so
[2020-12-17T09:23:46.207] debug3: slurm_lua_loadscript: job_submit/lua: loading Lua script: /etc/opt/slurm/job_submit.lua
[2020-12-17T09:23:46.208] debug3: Success.
[2020-12-17T09:23:46.209] debug3: Trying to load plugin /opt/slurm/20.11.1/lib64/slurm/prep_script.so
[2020-12-17T09:23:46.210] debug3: Success.

but, at the point where a submitted job should pass through the job_submit script, I see

[2020-12-17T09:26:06.806] debug3: job_submit/lua: slurm_lua_loadscript: skipping loading Lua script: /etc/opt/slurm/job_submit.lua
[2020-12-17T09:26:06.807] debug3: assoc_mgr_fill_in_user: found correct user: someuser(12345)
[2020-12-17T09:26:06.808] debug5: assoc_mgr_fill_in_assoc: looking for assoc of user=someuser(12345), acct=accnts0001, cluster=clust, partition=acceptance
[2020-12-17T09:26:06.809] debug3: assoc_mgr_fill_in_assoc: found correct association of user=someuser(12345), acct=accnts0001, cluster=clust, partition=acceptance to assoc=67 acct=accnts0001


Reason I went looking is that the job_submit.lua should be telling
me, the job submitter, to "sling my hook" as I have, deliberately,
left something out.

FWIW, the debug level here goes all the way to 5, so I was hoping
for a little more info as to why it is skipping it.

The skip is occurring, in src/lua/slurm_lua.c, because of this trap

         if (st.st_mtime <= *load_time) {
                 debug3("%s: %s: skipping loading Lua script: %s", plugin,
                        __func__, script_path);
                 return SLURM_SUCCESS;
         }
         debug3("%s: %s: loading Lua script: %s", __func__, plugin, script_path);

where "st" is a stat struct, but I am currently none the wiser as to why
such a condition would be (maybe even, would need to be) triggered?

The job submit script is certainly "younger" than the time of the slurmctld
restart, and of the job submission, but then, why wouldn't it be?
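To make that condition concrete, here is a minimal standalone sketch of
how I read the check. It is not the Slurm code: needs_reload(),
maybe_load_script() and the enum values are hypothetical names of my own,
and I am assuming that load_time records when the script was last read in.
On that assumption the check would be a "reload only if the file changed"
cache, so a script that has not been touched since the restart would be
skipped at submit time and the copy already in memory would be reused.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical result codes for this sketch; these are not Slurm's. */
enum load_result { LOAD_ERROR = -1, LOAD_SKIPPED = 0, LOAD_DONE = 1 };

/*
 * Inverse of the quoted test (st.st_mtime <= *load_time means skip):
 * reload only when the file was modified after the last load.
 */
int needs_reload(time_t mtime, time_t load_time)
{
	return mtime > load_time;
}

/*
 * Stat the file and decide whether it has to be (re)read.
 * load_time is updated only when a load actually happens.
 */
enum load_result maybe_load_script(const char *path, time_t *load_time)
{
	struct stat st;

	assert(path != NULL && load_time != NULL);

	if (stat(path, &st) != 0) {
		perror(path);
		return LOAD_ERROR;
	}

	if (!needs_reload(st.st_mtime, *load_time))
		return LOAD_SKIPPED;	/* keep using what is already loaded */

	/* ... a real loader would read and parse the script here ... */
	*load_time = time(NULL);
	return LOAD_DONE;
}
```

With load_time starting at 0, the first call on an existing file returns
LOAD_DONE and a second call straight after returns LOAD_SKIPPED, which is
the same "loading" then "skipping" pattern as in the two log extracts
above. Whether the skipped case in the real plugin still runs the
previously loaded script is something the sketch cannot show.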

Kevin
-- 
Supercomputing Systems Administrator
Pawsey Supercomputing Centre


