Hi Ryan,
My apologies for letting this reply languish. Thank you for your reply - we have a working plugin now.
I believe the issue was that using the plugin without restarting slurmctld first was causing slurmctld to crash (for some reason I still haven't figured out), and I had attributed the crash to a problem with the plugin itself.
I found that restarting slurmctld was required. Without restarting, even if I ran scontrol reconfigure, I was getting:

    salloc: error: Job submit/allocate failed: Unexpected message received

It's consistent. I just tested it again to double check before sending this reply, and the smallest change to the plugin will cause slurmctld to crash if I don't restart it first. Maybe that was mentioned somewhere in the job_submit_plugins documentation, but if so I missed it, and that's pretty much all that we needed.
Thanks again!
Kind Regards, Glen
========================================== Glen MacLachlan, PhD *Cyberinfrastructure Specialist*
Research Technology Services The George Washington University 44983 Knoll Square Enterprise Hall, 328L Ashburn, VA 20147
==========================================
On Tue, Apr 9, 2024 at 4:47 PM Ryan Cox via slurm-users <slurm-users@lists.schedmd.com> wrote:
Glen,
I don't think I see it in your message, but are you pointing to the plugin in slurm.conf with JobSubmitPlugins=? I assume you are but it's worth checking.
Ryan
On 4/9/24 10:19, Glen MacLachlan via slurm-users wrote:
Hi,
We have a plugin in Lua that mostly does what we want, but there are features available in the C extension that are not available to Lua. For that reason, we are attempting to convert to C using the guidance found here: https://slurm.schedmd.com/job_submit_plugins.html#building. We arrived here because the Lua plugins don't seem to stretch far enough to cover the use case we were looking at, i.e., branching off of the value of alloc_id or, for that matter, get_sid().
The goal is to disallow interactive allocations (i.e., salloc) on specific partitions while allowing it on others. However, we've run into an issue with our C plugin right out of the gate and I've included a minimal reproducer as an example which is basically a "Hello World" type of test (job_submit_disallow_salloc.c, see attached).
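To make the goal concrete, here is a minimal sketch of the decision we have in mind, written without any Slurm headers so it can be compiled on its own. Everything in it is an assumption for illustration: struct mock_job_desc is a hypothetical stand-in for the fields of job_desc_msg_t we would read, the partition names in blocked_partitions are made up, and treating a NULL script as "interactive" is a heuristic we have not verified against our Slurm version (an srun outside an allocation may look the same).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the job description fields this sketch reads. */
struct mock_job_desc {
    const char *partition; /* requested partition(s), comma-separated, or NULL */
    const char *script;    /* batch script text; ASSUMED to be NULL for salloc */
};

/* Made-up partition names on which interactive allocations are refused. */
static const char *const blocked_partitions[] = { "batch", "gpu" };

/* Return true if `name` is one of the comma-separated entries in `list`. */
bool name_in_list(const char *list, const char *name)
{
    size_t name_len = strlen(name);
    const char *p = list;

    while (*p != '\0') {
        size_t tok_len = strcspn(p, ","); /* length of the current entry */
        if (tok_len == name_len && strncmp(p, name, name_len) == 0)
            return true;
        p += tok_len;
        if (*p == ',')
            p++; /* skip the separator */
    }
    return false;
}

/* Return true if the job looks interactive and requests a blocked partition. */
bool should_reject(const struct mock_job_desc *job)
{
    size_t n = sizeof(blocked_partitions) / sizeof(blocked_partitions[0]);

    if (job->script != NULL)    /* has a script: treated as a batch job */
        return false;
    if (job->partition == NULL) /* no partition requested: not decided here */
        return false;

    for (size_t i = 0; i < n; i++) {
        if (name_in_list(job->partition, blocked_partitions[i]))
            return true;
    }
    return false;
}
```

In an actual plugin, job_submit() would presumably call something like should_reject() on the real job description and return an error code from slurm_errno.h instead of SLURM_SUCCESS when it returns true.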
*Expectation*
We expect a hello-world style result, with a message written to /tmp/min_repo.log, but that does not occur. It seems that the plugin does not get run at all when jobs are submitted. Jobs still run as expected, but the plugin seems to be ignored.
*Steps*
We compile:

    gcc -fPIC -DHAVE_CONFIG_H -I /modules/source/slurm-23.02.4 -g -O2 -pthread -fno-gcse -Werror -Wall -g -O0 -fno-strict-aliasing -MT job_submit_disallow_salloc.lo -MD -MP -MF .deps/job_submit_disallow_salloc.Tpo -c job_submit_disallow_salloc.c -o .libs/job_submit_disallow_salloc.o

    mv .deps/job_submit_disallow_salloc.Tpo .deps/job_submit_disallow_salloc.Plo

and link:

    gcc -shared -fPIC -DPIC .libs/job_submit_disallow_salloc.o -O2 -pthread -O0 -pthread -Wl,-soname -Wl,job_submit_disallow_salloc.so -o job_submit_disallow_salloc.so
Check links after copying to /usr/lib64/slurm:

    ldd /usr/lib64/slurm/job_submit_disallow_salloc.so
        linux-vdso.so.1 (0x00007ffe467aa000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f1c02095000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f1c01cd0000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f1c024b7000)
Can someone point out what we are doing incorrectly or how we might troubleshoot this issue?
Kindest regards, Glen
*Reproducer*
The minimal reproducer is basically a "hello world" for C extensions, which I've pasted below (I've also attached it for convenience):

#include <slurm/slurm.h>
#include <slurm/slurm_errno.h>
#include <stdio.h>
#include "src/slurmctld/slurmctld.h"

/* Symbols the plugin loader looks up to identify this plugin. */
const char plugin_name[] = "Min Reproducer";
const char plugin_type[] = "job_submit/disallow_salloc";
const uint32_t plugin_version = SLURM_VERSION_NUMBER;

/* Called for each job submission: write a marker to the log file. */
extern int job_submit(job_desc_msg_t *job_desc, uint32_t submit_uid, char **err_msg)
{
    FILE *fp;

    fp = fopen("/tmp/min_repo.log", "w");
    if (fp == NULL) /* fprintf() on a NULL stream would crash the caller */
        return SLURM_SUCCESS;
    fprintf(fp, "Hello!");
    fclose(fp);
    return SLURM_SUCCESS;
}

/* Called for each job modification: nothing to do here. */
int job_modify(job_desc_msg_t *job_desc, job_record_t *job_ptr, uint32_t submit_uid, char **err_msg)
{
    return SLURM_SUCCESS;
}
-- Ryan Cox Director Office of Research Computing Brigham Young University