[slurm-users] Job step aborted
Mahmood Naderan
mahmood.nt at gmail.com
Fri May 18 00:05:49 MDT 2018
OK, I understand that. However, there is an issue with ntasks=1.
Assume a user wants to launch an application that takes the number of
cores as a command-line argument. Bearing in mind that the CPU limit for
the partition is 20 cores, the following example
[mahmood@rocks7 ~]$ srun --x11 -A y8 -p RUBY --mem=8GB --pty bash
[mahmood@compute-0-6 ~]$ /state/partition1/scfd/sc -t10
raises two problems:
1- Slurm assumes that the user job is using only one core. That means
a user can open 20 interactive sessions, launch the program with 10
threads in each of them, and so bypass the core limit I set before.
2- A user can start the session with ntasks=1 (or without specifying
it) and then cheat the system by launching the program with more
threads than the CPU limit (e.g. -t50).
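One common way to make the partition's core limit hold regardless of what the user passes to -t is to have Slurm confine each job to its allocated cores with the cgroup plugins. A minimal sketch, assuming a Slurm build with cgroup support; the option names should be checked against the slurm.conf and cgroup.conf man pages for the installed version:

```
# slurm.conf (excerpt): track and confine job processes with cgroups
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup

# cgroup.conf (excerpt): restrict each job step to its allocated cores
ConstrainCores=yes
```

With ConstrainCores=yes, a session that was allocated a single CPU and then runs sc -t50 still has all 50 threads scheduled on that one CPU, so twenty such sessions cannot use more than twenty CPUs between them. The Slurm daemons have to be restarted after changing these plugins.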
Any idea?
Regards,
Mahmood
On Thu, May 17, 2018 at 11:40 PM, Matthieu Hautreux
<matthieu.hautreux at gmail.com> wrote:
>
>
> It means what it says: your job was terminated because 9 of its 10 tasks
> had exited more than 60 s earlier.
>
> The logic behind the 60 seconds (configurable) is described in the srun man
> page. You should look at it closely.
>
> You should also look at the FAQ here https://slurm.schedmd.com/faq.html.
>
> You should set --ntasks=1, if I guess your goal correctly.
>
> HTH
>
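Building on the --ntasks=1 advice, the core count can be requested explicitly with --cpus-per-task and the thread count read back from the allocation instead of being typed by hand. A minimal sketch, assuming bash and that srun exports SLURM_CPUS_PER_TASK when --cpus-per-task is given; alloc_threads is a hypothetical helper, not part of Slurm:

```bash
#!/usr/bin/env bash
# Sketch only: size the thread count from the Slurm allocation instead of
# typing it by hand. alloc_threads is a hypothetical helper, not a Slurm tool.
#
# Interactive request for one task with 10 CPUs (standard srun options):
#   srun --x11 -A y8 -p RUBY --mem=8GB --ntasks=1 --cpus-per-task=10 --pty bash

alloc_threads() {
  # srun sets SLURM_CPUS_PER_TASK only when --cpus-per-task is given,
  # so fall back to a single thread when it is absent.
  local n="${SLURM_CPUS_PER_TASK:-1}"
  # Accept only a positive integer; anything else falls back to 1.
  if [[ "$n" =~ ^[1-9][0-9]*$ ]]; then
    echo "$n"
  else
    echo 1
  fi
}

# Inside the interactive shell, launch the program from the original post:
#   /state/partition1/scfd/sc -t"$(alloc_threads)"
```

On its own this is cooperative: it keeps well-behaved users consistent with their allocation, but nothing stops a user from typing -t50 by hand, so the limit still has to be enforced on the Slurm side.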