No ideas on fixing this in Slurm itself, but in userspace, when faced with huge array jobs made up of really short tasks like this, I've nudged users toward batching several array elements into each job to extend its runtime. Say the user wants to run 50000 tasks of 30 seconds each. Batching those in groups of 10 makes for 5-minute jobs, so (off the top of my head, untested):

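#!/bin/bash
#SBATCH --array=1-50000:10

# Each array element runs SLURM_ARRAY_TASK_STEP consecutive tasks,
# starting at its own SLURM_ARRAY_TASK_ID (1, 11, 21, ...).
starttask=${SLURM_ARRAY_TASK_ID}
endtask=$(( starttask + SLURM_ARRAY_TASK_STEP - 1 ))
task=${starttask}
while [[ ${task} -le ${endtask} ]]; do
    someapp -param=${task}
    task=$(( task + 1 ))
done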
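
One thing the loop above does not cover is a task count that is not a multiple of the step, where the last array element would run past the end. A rough sketch of the range arithmetic with that clamp follows. The batch_range helper and its explicit total argument are made up for illustration, not anything Slurm provides, and it can be dry-run outside of Slurm:

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): print the first and last task
# index that a single array element should run.
#   $1 = first task of this element (SLURM_ARRAY_TASK_ID inside a job)
#   $2 = batch size (SLURM_ARRAY_TASK_STEP inside a job)
#   $3 = total number of tasks, passed in explicitly (an assumption)
batch_range() {
    local start=$1 step=$2 total=$3
    local end=$(( start + step - 1 ))
    # Clamp the final batch when total is not a multiple of step.
    if (( end > total )); then
        end=${total}
    fi
    echo "${start} ${end}"
}

# Dry run: with 50005 tasks in steps of 10, the last element starts at 50001.
batch_range 50001 10 50005
```

Inside a job script the two numbers could be read back with something like `read -r starttask endtask < <(batch_range "${SLURM_ARRAY_TASK_ID}" "${SLURM_ARRAY_TASK_STEP}" 50005)` before entering the loop.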

griznog

On Mon, Sep 9, 2024 at 1:58 PM Ransom, Geoffrey M. via slurm-users <slurm-users@lists.schedmd.com> wrote:

 

Hello

   We have another batch of new users and some more batches of large array jobs with very short runtimes, due either to errors in the jobs or just by design. While trying to deal with these issues by setting ArrayTaskThrottle and through user education, I had a thought: it would be very nice to have a per-user limit on how many jobs can start in a given minute. If a user submitted a 200000-task array job with 15-second tasks, the scheduler wouldn’t launch more than 100 or 200 per minute and would be less likely to bog down. With longer runtimes (1 hour or more), it would take a few extra minutes to start using all the resources the user is allowed, but that would not add much overall delay to the whole set of jobs.

 

I thought about adding something to our CLI filter, but these jobs usually request a runtime of 3-4 hours even though they run for less than 30 seconds, so the submit options don’t identify the problem jobs ahead of time.

 

We currently limit our users to 80% of the available resources, which is far more than Slurm needs to bog down with fast-turnover jobs. However, we have users who complain that they can’t use the other 20% when the cluster is not busy, so putting in lower default restrictions is not currently an option.

 

Has this already been discussed and found infeasible for technical reasons? (I haven’t found anything like this yet while searching the archives.)

 

I think Slurm used to have a feature-request severity on their bug submission site. Is there a severity level they prefer for suggestions like this?

 

Thanks

 

