[slurm-users] Can sinfo/scontrol be called from job_submit.lua?

Groner, Rob rug262 at psu.edu
Tue Oct 11 21:14:50 UTC 2022


I am testing an approach in which, when a submitted job requests specific features that no node currently offers, I take some action.

The job_submit.lua plugin already detects when a submitted job requests the specific features.  I'm now at the point of checking whether those features currently exist.  (The features are part of a nodeset and part of a partition, so jobs requesting them will simply go to pending if no nodes offer those features.)  I thought I would use "sinfo" to get a list of the features that exist on the system, but it fails to run.  The same happens when I try scontrol.
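
To make the intended check concrete, here is a minimal sketch in Python, meant to run as a standalone script rather than from inside job_submit.lua. It assumes that `sinfo -h -o "%f"` prints one comma-separated feature list per line; the function names and the sample feature names are hypothetical, and only the `&` and `|` constraint operators are handled.

```python
from __future__ import annotations

import re
import subprocess


def parse_features(sinfo_output: str) -> set[str]:
    """Collect every feature name from `sinfo -h -o "%f"` style output."""
    features = set()
    for line in sinfo_output.splitlines():
        for name in line.strip().split(","):
            # Nodes without features are assumed to show up as "(null)".
            if name and name != "(null)":
                features.add(name)
    return features


def missing_features(requested: str, available: set[str]) -> set[str]:
    """Return the requested feature names that no node currently offers.

    Only splits on the & and | operators; node counts and bracketed
    expressions are not handled in this sketch.
    """
    names = re.split(r"[&|]", requested)
    return {n.strip() for n in names if n.strip()} - available


def query_available_features() -> set[str]:
    """Run sinfo (outside of slurmctld) and parse its feature column."""
    result = subprocess.run(
        ["sinfo", "-h", "-o", "%f"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_features(result.stdout)
```

For example, `missing_features("gpu&fastio", query_available_features())` would return the subset of `gpu` and `fastio` that no node advertises.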

When I submit a job that requests the features, which triggers the sinfo call, the submission hangs for about 10 seconds and then reports:

[me@testsch (RC) slurm] sbatch ./gctest_account_test.sh
sbatch: error: Batch job submission failed: Socket timed out on send/recv operation

In the slurmctld.log, I see:
[2022-10-10T17:12:13.933] error: slurm_msg_sendto: address:port=10.6.88.99:40100 msg_type=4004: Unexpected missing socket error


I'll note that "sinfo -V" works, but I suspect that is because it only prints the version and never tries to contact slurmctld.

Any suggestions on what to try?  Or is there a better slurm-ic way to do what I'm trying to do?

Rob

