[slurm-users] Job step aborted
    Mahmood Naderan 
    mahmood.nt at gmail.com
       
    Sat May 19 06:06:07 MDT 2018
    
    
  
Excuse me, how can I tell Slurm not to terminate the job until all steps
(tasks) are finished?
Regards,
Mahmood
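
A possible answer to the question above, as a minimal sketch: the srun man page documents a -W/--wait=<seconds> option that controls how long srun waits after the first task exits before terminating the remaining tasks, and a value of 0 is described there as an unlimited wait. The helper name wait_all_cmd and the program ./my_app are hypothetical; the function only prints the command line so it can be checked before being run on a real cluster.

```shell
#!/bin/sh
# wait_all_cmd is a hypothetical helper: it prints an srun command line
# and does not execute it.
# --wait=0 asks srun to wait without a time limit after the first task exits,
# instead of terminating the remaining tasks after the configured delay.
wait_all_cmd() {
    ntasks=$1
    shift
    printf 'srun --ntasks=%s --wait=0 %s\n' "$ntasks" "$*"
}

# Example: 10 tasks of a placeholder program.
wait_all_cmd 10 ./my_app
```

The default delay comes from the site's Slurm configuration, so the behaviour without --wait may differ between clusters.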
On Fri, May 18, 2018 at 10:35 AM, Mahmood Naderan <mahmood.nt at gmail.com> wrote:
> OK, I understand that. However, there is an issue with ntasks=1.
> Assume a user wants to launch an application and passes the number of
> cores as a command-line argument. Keeping in mind that the CPU limit for
> the partition is 20 cores, the following example
>
> [mahmood at rocks7 ~]$ srun --x11 -A y8 -p RUBY --mem=8GB --pty bash
> [mahmood at compute-0-6 ~]$ /state/partition1/scfd/sc -t10
>
> raises two problems:
> 1- Slurm assumes that the user's job is using only one core. That means
> a user can create 20 interactive sessions and, in each session,
> launch the program with 10 threads, bypassing the core limit I set
> before.
>
> 2- A user can start the session with ntasks=1 (or without specifying
> it) and then cheat the system by launching the program with more
> threads than the CPU limit (specifying -t50).
>
> Any idea?
>
> Regards,
> Mahmood
>
> On Thu, May 17, 2018 at 11:40 PM, Matthieu Hautreux
> <matthieu.hautreux at gmail.com> wrote:
>>
>>
>> It means what it says: your job was terminated because 9 of the 10 tasks
>> had exited more than 60 seconds earlier.
>>
>> The logic behind the 60 seconds (configurable) is described in the srun man
>> page. You should look at it closely.
>>
>> You should also look at the FAQ here https://slurm.schedmd.com/faq.html.
>>
>> You should set --ntasks=1, if I have guessed your goal correctly.
>>
>> HTH
>>
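
On the ntasks=1 concern raised in the thread, one common approach is to keep a single task but request the cores explicitly with --cpus-per-task, so that Slurm accounts for all of them against the partition limit. The sketch below reuses the account, partition, and memory values from the example in the thread; the helper name interactive_cmd is hypothetical, and it only prints the command line rather than running it.

```shell
#!/bin/sh
# interactive_cmd is a hypothetical helper: it prints an srun command line
# and does not execute it.
# --ntasks=1 keeps a single task; --cpus-per-task reserves the cores the
# multithreaded program will use, so they count against the core limit.
interactive_cmd() {
    cores=$1
    printf 'srun --x11 -A y8 -p RUBY --mem=8GB --ntasks=1 --cpus-per-task=%s --pty bash\n' "$cores"
}

# Example: reserve 10 cores for a program that will run 10 threads.
interactive_cmd 10
```

Inside such a session, Slurm sets SLURM_CPUS_PER_TASK, so the program could be started with a matching thread count, for example /state/partition1/scfd/sc -t"$SLURM_CPUS_PER_TASK". Whether a user can still start more threads than the reserved cores depends on the site's configuration (for example, cgroup-based core confinement), which the thread does not show.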
    
    