[slurm-users] how do array jobs stored in slurmdb database?

taleintervenor at sjtu.edu.cn taleintervenor at sjtu.edu.cn
Fri Jan 29 10:23:33 UTC 2021


Well, maybe the example in my first mail caused some misunderstanding. We only use sacct to check some job records manually during maintenance after a system fault. Our accounting and billing system is a commercial product which unfortunately does not provide the ability to adjust the billing rate for an individual job either. I'm not sure how it gets the job data from Slurm. But as long as sacct cannot find the job record, the billing system of course won't generate a bill for it.

-----Original Message-----
From: Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk>
Sent: January 29, 2021 15:40
To: slurm-users at lists.schedmd.com
Subject: Re: [slurm-users] how do array jobs stored in slurmdb database?

On 1/29/21 3:51 AM, taleintervenor at sjtu.edu.cn wrote:
> The reason we need to delete job records from the database is that our billing system calculates user cost from these historical records. But after a Slurm system fault there will be some specific jobs which should not be charged. It seems the most practical solution is to modify the database directly, since Slurm does not provide a command to delete job records.

I think the sreport command is normally used to generate accounting reports.  I have described this in my Wiki page https://wiki.fysik.dtu.dk/niflheim/Slurm_accounting#accounting-reports

I would like to understand how you have chosen to calculate user cost of a given job using the sacct command?  The sacct command will report accounting for each individual job, so which sacct options do you use to get the total cost value for a user with many jobs?
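
To make the question concrete, here is a minimal Python sketch of one way per-job sacct records could be summed into a per-user total. It is only an illustration: the sample records are made up, the function name cpu_seconds_per_user and its exclude parameter are hypothetical, and using CPUTimeRAW (CPU seconds) as the cost measure is an assumption, not a description of how any billing product works.

```python
from collections import defaultdict

# Hypothetical sample records, in the shape printed by a command such as:
#   sacct -a -X --parsable2 --noheader --format=User,JobID,CPUTimeRAW
# (one line per job allocation, fields separated by "|").
SACCT_OUTPUT = """\
alice|123456_1|3600
alice|123456_2|1800
bob|123500|7200
"""


def cpu_seconds_per_user(text, exclude=frozenset()):
    """Sum CPUTimeRAW per user, skipping any JobID listed in `exclude`."""
    totals = defaultdict(int)
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        user, job_id, cpu_raw = line.split("|")
        if job_id in exclude:
            continue  # job that should not be charged
        totals[user] += int(cpu_raw)
    return dict(totals)


totals = cpu_seconds_per_user(SACCT_OUTPUT)
totals_without_faulty = cpu_seconds_per_user(SACCT_OUTPUT, exclude={"123456_2"})
print(totals)
print(totals_without_faulty)
```

Passing a set of JobIDs to exclude leaves those jobs out of the sum without touching the accounting database.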

/Ole


> -----Original Message-----
> From: Ole Holm Nielsen <Ole.H.Nielsen at fysik.dtu.dk>
> Sent: January 29, 2021 0:14
> To: slurm-users at lists.schedmd.com
> Subject: Re: [slurm-users] how do array jobs stored in slurmdb database?
> 
> On 1/28/21 11:59 AM, taleintervenor at sjtu.edu.cn wrote:
>> From a query command such as ‘sacct -j 123456’ I can see a series of
>> jobs named 123456_1, 123456_2, etc. And I need to delete these job
>> records from the mysql database for some reason.
>>
>> But in job_table of slurmdb, there is only one record with id_job=123456.
>> No record has an id like 123456_2. After I deleted the
>> id_job=123456 record, the sacct result shows that the 123456_1 job disappeared,
>> but the other jobs in the array still exist. So how are these array jobs recorded in the database?
>> And how can I completely delete all the jobs in an array?
> 
> I think you need to study how job arrays are implemented in Slurm, 
> please read https://slurm.schedmd.com/job_array.html
> 
> You will discover that the individual jobs of a job array, when they start running, become independent jobs and obtain their own unique JobIDs.  It must be those JobIDs that will appear in the Slurm database.
> 
> This command illustrates the different JobID types (please read the squeue manual page about ArrayJobID,JobArrayID,JobID):
> 
> $ squeue  -j 3394902 -O ArrayJobID,JobArrayID,JobID
> ARRAY_JOB_ID        JOBID               JOBID
> 3394902             3394902_[18-91]     3394902
> 3394902             3394902_17          3394919
> 3394902             3394902_16          3394918
> 3394902             3394902_15          3394917
> 3394902             3394902_14          3394916
> 
> The last 4 jobs are running, while the first job is still pending.
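
The squeue output above can be read as a mapping from array task to JobID. The following is a minimal Python sketch, assuming the three-column layout shown above; the function name parse_array_tasks is made up for illustration and is not part of Slurm.

```python
# Text copied from the squeue output quoted above.
SQUEUE_OUTPUT = """\
ARRAY_JOB_ID        JOBID               JOBID
3394902             3394902_[18-91]     3394902
3394902             3394902_17          3394919
3394902             3394902_16          3394918
3394902             3394902_15          3394917
3394902             3394902_14          3394916
"""


def parse_array_tasks(text):
    """Return (array_job_id, task_spec, job_id) tuples, one per output row."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) != 3:
            continue  # ignore blank or malformed lines
        array_job_id, job_array_id, job_id = fields
        # "3394902_17" -> "17"; "3394902_[18-91]" -> "[18-91]"
        _, _, task_spec = job_array_id.partition("_")
        rows.append((int(array_job_id), task_spec, int(job_id)))
    return rows


rows = parse_array_tasks(SQUEUE_OUTPUT)
# Tasks with a single numeric index have their own JobID.
started = {spec: job_id for _, spec, job_id in rows if spec.isdigit()}
# A bracketed range is the part of the array that is still one pending record.
pending = [spec for _, spec, _ in rows if not spec.isdigit()]
print(started)
print(pending)
```

In this sample, tasks 14 to 17 each map to their own JobID (3394916 to 3394919), while the range [18-91] is still a single pending entry under JobID 3394902.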
> 
> Perhaps you may find my "showjob" script useful:
> https://github.com/OleHolmNielsen/Slurm_tools/tree/master/jobs
> In this script you can see how I work with array jobs.
> 
> I did not answer your question about how to delete array jobs in the Slurm database.  But in most cases manipulating the database directly is probably a bad idea.  I wonder why you want to delete jobs in the database at all?
> 
> Best regards,
> Ole





