[slurm-users] Get Information from a Node to the MailProg Command / Add arbitrary information to a job
Matthias Loose
m.loose at mindcode.de
Wed Jun 16 06:38:42 UTC 2021
Hi Slurm Users,
First time posting. I have a new Slurm setup where users can specify
an amount of local node disk space they wish to use. This is a gres
resource named "local", measured in GB. Once a user's job is scheduled
and starts executing, the node prolog creates a folder for this job on
the node and adds an XFS project quota for it, with the requested
amount as the soft limit and the requested amount +5% as the hard
limit. The user prolog then sets this folder as the user's $TMPDIR.
Lastly, the node epilog removes the quota and the folder when the job
completes.
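A minimal sketch of the limit arithmetic and the quota setup a prolog like this might do. It assumes "GB" means GiB (1024 MiB), rounds the +5% hard limit up, and only builds the `xfs_quota` command lines (for `subprocess.run()`); the project ID, mount point and function names are hypothetical, not taken from the actual prolog.

```python
from typing import List, Tuple

def quota_limits_mib(requested_gb: int, hard_pct: int = 5) -> Tuple[int, int]:
    """Return (soft, hard) limits in MiB for a gres/local request in GB.

    ASSUMPTION: 1 GB is treated as 1024 MiB. soft = requested amount,
    hard = requested amount + hard_pct percent, rounded up.
    """
    soft = requested_gb * 1024
    hard = (soft * (100 + hard_pct) + 99) // 100  # integer ceil, no float error
    return soft, hard

def xfs_quota_commands(job_dir: str, project_id: int, mountpoint: str,
                       requested_gb: int) -> List[List[str]]:
    """Build xfs_quota invocations a prolog could pass to subprocess.run().

    project_id and mountpoint are hypothetical: any unused project ID and
    the XFS filesystem that holds job_dir.
    """
    soft, hard = quota_limits_mib(requested_gb)
    return [
        # Tag job_dir (and everything created under it) with project_id.
        ["xfs_quota", "-x", "-c", f"project -s -p {job_dir} {project_id}", mountpoint],
        # Set the block soft/hard limits for that project.
        ["xfs_quota", "-x", "-c",
         f"limit -p bsoft={soft}m bhard={hard}m {project_id}", mountpoint],
    ]
```

For a 10 GB request this gives a 10240 MiB soft limit and a 10752 MiB hard limit. Using the job ID as the project ID is one convenient choice, since it is unique while the job runs.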
This all works great so far. Now I'm working on an email script that
should notify users if their "local" space was used up. Since Slurm
itself has no idea what gres:local actually is and only manages it as a
number, I have to do this myself. My idea was to check the quota in the
node epilog when the job terminates, but I've now run into a snag: how
do I get this information to the MailProg configured in slurm.conf?
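A sketch of the epilog-side check, under a loudly stated assumption: it expects `xfs_quota -x -c 'report -p -b -n -N'` to print one line per project of the form `#<id> <used> <soft> <hard> ...` in 1 KiB blocks. That layout should be verified against the local xfs_quota version before relying on it. The function names are hypothetical.

```python
import re
from typing import Optional, Tuple

def parse_project_usage(report: str, project_id: int) -> Optional[Tuple[int, int, int]]:
    """Return (used, soft, hard) for project_id from an xfs_quota report, or None.

    ASSUMPTION: report lines look like
    '#327   1048576   1048576   1101005   00 [--none--]'.
    """
    pattern = re.compile(rf"^\s*#{project_id}\s+(\d+)\s+(\d+)\s+(\d+)\b", re.MULTILINE)
    match = pattern.search(report)
    if match is None:
        return None
    used, soft, hard = (int(g) for g in match.groups())
    return used, soft, hard

def quota_status(used: int, soft: int, hard: int) -> str:
    """Classify usage; a limit of 0 means 'no limit' and is ignored."""
    if hard and used >= hard:
        return "hard limit reached"
    if soft and used >= soft:
        return "soft limit exceeded"
    return "ok"
```

The `\s+` after the ID keeps project 327 from matching a line for project 3270.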
The arguments to that program always appear to be in this form:
-s SLURM Job_id=327 Name=ddt_clone Ended, Run time 00:05:01, COMPLETED, ExitCode 0
and the environment of the script contains only the cluster name and
nothing else.
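A minimal sketch of pulling the job ID out of those arguments. It joins all arguments before searching, so it works whether the subject arrives as one quoted argument after `-s` or split into several words; it has only been reasoned about against the line above, and other job types may format the subject differently. The function name is hypothetical.

```python
import re
from typing import List, Optional

# Matches e.g. "Job_id=327" in the subject Slurm passes to MailProg.
JOB_ID_RE = re.compile(r"Job_id=(\d+)")

def job_id_from_mailprog_args(argv: List[str]) -> Optional[int]:
    """Return the numeric job ID found anywhere in the MailProg arguments."""
    match = JOB_ID_RE.search(" ".join(argv))
    return int(match.group(1)) if match else None
```

In the MailProg itself this would be called as `job_id_from_mailprog_args(sys.argv[1:])`.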
So the question becomes: how do I get the quota status at the end of
the job from the node epilog to the MailProg running on the head node?
I can parse the job ID from the script's argument line and then get all
job information via scontrol. So my first thought was that if I could
add my own data field to that output, it would solve my problem.
Unfortunately, I can't seem to find such an option.
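A sketch of the "get all information via scontrol" step: parsing one line of `scontrol -o show job <id>` output into a dict. It splits on whitespace and treats tokens without `=` as a continuation of the previous value, so values with spaces mostly survive. A continuation word that itself contains `=` would be misread as a new key. The function name is hypothetical.

```python
from typing import Dict, Optional

def parse_scontrol_job(line: str) -> Dict[str, str]:
    """Parse one line of `scontrol -o show job <id>` into {key: value}."""
    fields: Dict[str, str] = {}
    last_key: Optional[str] = None
    for token in line.split():
        key, sep, value = token.partition("=")  # split on the FIRST '=' only
        if sep and key:
            fields[key] = value  # e.g. TRES=cpu=1,mem=4G keeps "cpu=1,mem=4G"
            last_key = key
        elif last_key is not None:
            fields[last_key] += " " + token  # value containing spaces
    return fields
```

The MailProg could obtain the line with `subprocess.run(["scontrol", "-o", "show", "job", str(job_id)], capture_output=True, text=True).stdout`. Finished jobs only stay visible to scontrol for about MinJobAge seconds.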
Other than that, I've only come up with writing some sort of file to a
shared storage mount that the MailProg could then read.
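A minimal sketch of that shared-file handoff, assuming a directory mounted on both the compute nodes and the head node (the path and the `job_<id>.quota.json` naming are hypothetical). The epilog writes to a temporary file and renames it into place, so the MailProg never reads a half-written file. It also assumes the epilog has finished before MailProg runs for the END mail. That ordering should be checked, and a short retry loop in the reader may be needed.

```python
import json
import os
import tempfile
from typing import Optional

def status_path(shared_dir: str, job_id: int) -> str:
    # One small file per job, named by job ID (hypothetical naming scheme).
    return os.path.join(shared_dir, f"job_{job_id}.quota.json")

def write_quota_status(shared_dir: str, job_id: int, status: dict) -> str:
    """Node epilog side: write JSON atomically via temp file + rename."""
    path = status_path(shared_dir, job_id)
    fd, tmp = tempfile.mkstemp(dir=shared_dir, prefix=".tmp_")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(status, f)
        os.replace(tmp, path)  # rename within one directory
    except BaseException:
        os.unlink(tmp)
        raise
    return path

def read_quota_status(shared_dir: str, job_id: int, remove: bool = True) -> Optional[dict]:
    """MailProg side: return the status, or None if the epilog wrote nothing."""
    path = status_path(shared_dir, job_id)
    try:
        with open(path) as f:
            status = json.load(f)
    except FileNotFoundError:
        return None
    if remove:
        os.unlink(path)  # clean up once the mail has the information
    return status
```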
Can you think of a more elegant way to attach this information to the
job so that the MailProg on the head node can access it using the job ID?
Any help is appreciated!