Hi Ryan, 

My apologies for letting this reply languish. Thank you for your reply - we have a working plugin now. 

I believe the problem was that using the updated plugin without first restarting slurmctld was causing slurmctld to crash (for some reason I still haven't figured out), and I had attributed the crash to a problem with the plugin itself. 

I found that restarting slurmctld is required. Without a restart, even if I run scontrol reconfigure, I get:
salloc: error: Job submit/allocate failed: Unexpected message received. 
It's consistent: I just tested it again to double-check before sending this reply, and even the smallest change to the plugin causes slurmctld to crash if I don't restart it first. Maybe that is mentioned somewhere in the job_submit_plugins documentation, but if so I missed it, and that's pretty much all that we needed. 

Thanks again!

Kind Regards,
Glen

==========================================
Glen MacLachlan, PhD
Cyberinfrastructure Specialist 
Research Technology Services
The George Washington University
44983 Knoll Square
Enterprise Hall, 328L
Ashburn, VA 20147
==========================================






On Tue, Apr 9, 2024 at 4:47 PM Ryan Cox via slurm-users <slurm-users@lists.schedmd.com> wrote:
Glen,

I don't think I see it in your message, but are you pointing to the plugin in slurm.conf with JobSubmitPlugins=?  I assume you are but it's worth checking.

Ryan

On 4/9/24 10:19, Glen MacLachlan via slurm-users wrote:
Hi, 

We have a plugin in Lua that mostly does what we want, but there are features available in the C extension that are not available to Lua. For that reason, we are attempting to convert to C using the guidance found here: https://slurm.schedmd.com/job_submit_plugins.html#building

We arrived here because the Lua plugins don't seem to stretch far enough to cover the use case we were looking at, i.e., branching off of the value of alloc_id or, for that matter, get_sid().

The goal is to disallow interactive allocations (i.e., salloc) on specific partitions while allowing it on others. However, we've run into an issue with our C plugin right out of the gate and I've included a minimal reproducer as an example which is basically a "Hello World" type of test (job_submit_disallow_salloc.c, see attached). 
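
To make the goal concrete, here is a minimal sketch in plain C of the decision we eventually want the plugin to make. It is not the plugin itself and does not use the Slurm headers. The function name reject_interactive and the partition names "batch" and "gpu" are hypothetical, and treating "no batch script" as "came from salloc" is an assumption on our part, not something confirmed in this thread. It also only handles a single partition name, not a comma-separated list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list of partitions where interactive allocations are refused. */
const char *const blocked_partitions[] = { "batch", "gpu" };

/*
 * Return true when the request should be rejected.
 * Assumption: an interactive allocation arrives without a batch script,
 * so script == NULL is treated as "interactive".
 */
bool reject_interactive(const char *partition, const char *script)
{
        size_t n = sizeof(blocked_partitions) / sizeof(blocked_partitions[0]);

        /* Batch jobs, and requests that name no partition, are let through. */
        if (script != NULL || partition == NULL)
                return false;

        /* Reject only when the requested partition is on the blocked list. */
        for (size_t i = 0; i < n; i++) {
                if (strcmp(partition, blocked_partitions[i]) == 0)
                        return true;
        }
        return false;
}
```

In the real plugin, the two arguments would presumably come from the job description passed to job_submit(), and a true result would map to returning an error code instead of SLURM_SUCCESS.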

Expectation
What we expect to happen is a sort of hello-world result, with a message being written to /tmp/min_repo.log, but that does not occur. It seems that the plugin does not get run at all when jobs are submitted. Jobs still run as expected, but the plugin seems to be ignored. 

Steps
We compile 
gcc -fPIC -DHAVE_CONFIG_H -I /modules/source/slurm-23.02.4 -g -O2 -pthread -fno-gcse -Werror -Wall -g -O0 -fno-strict-aliasing -MT job_submit_disallow_salloc.lo -MD -MP -MF .deps/job_submit_disallow_salloc.Tpo -c job_submit_disallow_salloc.c -o .libs/job_submit_disallow_salloc.o

mv .deps/job_submit_disallow_salloc.Tpo .deps/job_submit_disallow_salloc.Plo

and link
gcc -shared -fPIC -DPIC .libs/job_submit_disallow_salloc.o -O2 -pthread -O0 -pthread -Wl,-soname -Wl,job_submit_disallow_salloc.so    -o job_submit_disallow_salloc.so

 
Check links after copying to /usr/lib64/slurm:
ldd /usr/lib64/slurm/job_submit_disallow_salloc.so
linux-vdso.so.1 (0x00007ffe467aa000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f1c02095000)
libc.so.6 => /lib64/libc.so.6 (0x00007f1c01cd0000)
/lib64/ld-linux-x86-64.so.2 (0x00007f1c024b7000)



Can someone point out what we are doing incorrectly or how we might troubleshoot this issue?

Kindest regards, 
Glen



Reproducer
The minimal reproducer is basically a "hello world" for C extensions which I've pasted below (I've also attached it for convenience):

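#include <slurm/slurm.h>
#include <slurm/slurm_errno.h>
#include <stdio.h>
#include "src/slurmctld/slurmctld.h"

/* Identification symbols that Slurm looks up when loading the plugin. */
const char plugin_name[] = "Min Reproducer";
const char plugin_type[] = "job_submit/disallow_salloc";
const uint32_t plugin_version = SLURM_VERSION_NUMBER;

/* Called on job submission; writes a marker file to show that it ran. */
extern int job_submit(job_desc_msg_t *job_desc, uint32_t submit_uid,
                      char **err_msg)
{
        FILE *fp;

        fp = fopen("/tmp/min_repo.log", "w");
        if (fp == NULL)
                return SLURM_SUCCESS;   /* cannot log; avoid using a NULL stream */

        fprintf(fp, "Hello!\n");
        fclose(fp);
        return SLURM_SUCCESS;
}

/* Called on job modification; nothing to do in this reproducer. */
extern int job_modify(job_desc_msg_t *job_desc, job_record_t *job_ptr,
                      uint32_t submit_uid, char **err_msg)
{
        return SLURM_SUCCESS;
}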




    

-- 
Ryan Cox
Director
Office of Research Computing
Brigham Young University

--
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-leave@lists.schedmd.com