[slurm-users] Slurm Jobscript Archiver
Lech Nieroda
lech.nieroda at uni-koeln.de
Fri Jun 14 09:21:18 UTC 2019
Hello Chris,
we’ve tried out your archiver and adapted it to our needs; it works quite well.
The changes:
- we get lots of jobs per day, ca. 3k-5k, so storing them as individual files would waste too many inodes and 4 KiB blocks. Instead, everything is written into two log files (job_script.log and job_env.log), with each line prefixed by "<timestamp> <user> <jobid>". This way one can easily grep and cut out the corresponding job script or environment. Long-term storage and compression are handled by logrotate with standard compression settings
- the parsing step can fail to produce a username, so we introduced a custom environment variable that stores the username and can be read directly by the archiver
- most of the program’s output, including debug output, is handled by the logger and stored in a jobarchive.log file with an appropriate timestamp
- the logger uses a va_list so that multi-argument log one-liners are possible
- signal handling is reduced to increasing/decreasing the debug level
- file handling is mostly relegated to HelperFn; directory trees are now created automatically
- the binary header of the env file and the binary footer of the script file are filtered out, so the resulting files are recognized as ASCII files
If you are interested in our modified version, let me know.
Kind regards,
Lech
> Am 09.05.2019 um 17:37 schrieb Christopher Benjamin Coffey <Chris.Coffey at nau.edu>:
>
> Hi All,
>
> We created a Slurm job script archiver which you may find handy. We initially attempted to do this through Slurm with a slurmctld prolog, but it really bogged the scheduler down. The new solution is a custom C++ program that uses inotify to watch for job scripts and environment files showing up in /var/spool/slurm/hash.* on the head node. When they do, the program copies the job script and environment out to a local archive directory. The program is multithreaded, with a dedicated thread watching each hash directory. It is fast and lightweight and has no side effects on the scheduler. By default it applies ACLs to the archived job scripts so that only the owner of a job script can read the files. Feel free to try it out and let us know how it works for you!
>
> https://github.com/nauhpc/job_archive
>
> Best,
> Chris
>
> —
> Christopher Coffey
> High-Performance Computing
> Northern Arizona University
> 928-523-1167
>
>