[slurm-users] Slurm overhead

Ryan Novosielski novosirj at rutgers.edu
Tue Apr 24 10:48:27 MDT 2018

I would likely crank up the debugging on the slurmd process and look at the log files to see what’s going on during that time. You could also watch the job via top or other means (on Linux, press “1” to see a line for each CPU core), or use strace on the process itself. Presumably something is either eating up 4 minutes, or the job is running 4 minutes more slowly and you’ll need to figure out why. I know that jobs run via our scheduler perform about on par for the hardware, and that jobs start almost immediately.
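A minimal sketch of the diagnostics suggested above, assuming root access on the compute node; log paths and debug levels are site-specific, so check them against your slurm.conf before running anything:

```shell
# Run slurmd in the foreground with high verbosity to watch job launch
# (stop the system slurmd service first; -D keeps it from daemonizing):
slurmd -D -vvvv

# Alternatively, set SlurmdDebug=debug5 in slurm.conf, restart slurmd,
# and follow its log (path varies; /var/log/slurmd.log is a common default):
tail -f /var/log/slurmd.log

# Watch per-core CPU usage while the job runs; press "1" inside top:
top

# Attach strace to the running job step to see where the time goes
# (replace <pid> with the process ID found via top or pgrep):
strace -f -p <pid>
```

These are diagnostic command fragments, not a script to run verbatim; slurmd must not be attached to while another copy holds its lock, and strace adds its own overhead while attached.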

> On Apr 22, 2018, at 2:06 AM, Mahmood Naderan <mahmood.nt at gmail.com> wrote:
> I ran some other tests and got nearly the same results. Those 4
> minutes in my previous post mean about 50% overhead: roughly 24000
> minutes for a direct run becomes about 35000 minutes via Slurm. I will
> post details later. The methodology I used is:
> 1- Submit a job to a specific node (compute-0-0) via Slurm on the
> frontend and get the elapsed run time (or add a time command in the
> script).
> 2- ssh to the specific node (compute-0-0) and run the program directly
> under the time command.
> So the hardware is the same. The frontend differs slightly from
> compute-0-0, but that is not important because, as I said before, the
> program is installed in /usr and not on the shared file system.
> I think the overhead of the Slurm process that queries the node to
> collect runtime information is not negligible. For example, squeue
> updates the run time every second. How can I tell Slurm not to query
> so often? For example, update the node information every 10 seconds.
> Though I am not sure how much effect that would have.
> Regards,
> Mahmood
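On the query-interval question: squeue only contacts the controller when it is invoked, and the run time it displays is computed from the job's start timestamp rather than by polling the node every second, so it is an unlikely source of per-job overhead. The periodic sampling that does run alongside a job comes from the accounting-gather plugin, whose interval is set in slurm.conf. A hedged fragment, with illustrative values rather than a recommendation:

```
# slurm.conf fragment (illustrative; verify plugin name against your site):
# sample task accounting every 30 seconds instead of the default;
# JobAcctGatherFrequency=0 disables periodic sampling entirely.
JobAcctGatherType=jobacct_gather/linux
JobAcctGatherFrequency=task=30
```

Changing this would test Mahmood's hypothesis directly: if the overhead persists with sampling effectively disabled, the accounting gather is not the cause.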
> On Fri, Apr 20, 2018 at 10:39 AM, Loris Bennett
> <loris.bennett at fu-berlin.de> wrote:
>> Hi Mahmood,
>> Rather than the overhead being 50%, maybe it is just 4 minutes.  If
>> another job runs for a week, that might not be a problem.  In addition,
>> you just have one data point, so it is rather difficult to draw any
>> conclusion.
>> However, I think that it is unlikely that Slurm is responsible for
>> this difference.  What can happen is that, if a node is powered down
>> before the job starts, then the clock starts ticking as soon as the job
>> is assigned to the node.  This means that the elapsed time also includes
>> the time for the node to be provisioned.  If this is not relevant in
>> your case, then you are probably just not comparing like with like,
>> e.g. is the hardware underlying /tmp identical in both cases?
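The like-with-like comparison discussed above can be sketched as a small script. The node name compute-0-0 comes from the thread; ./myprog stands in for the real program, and overhead_pct is a hypothetical helper, not part of Slurm:

```shell
#!/bin/sh
# A) via Slurm, pinned to the same node the direct run uses:
#    sbatch -w compute-0-0 --wrap '/usr/bin/time -p ./myprog'
# B) directly on that node, bypassing the scheduler:
#    ssh compute-0-0 '/usr/bin/time -p ./myprog'
#
# Compare the two "real" times. This helper prints the overhead of
# run A relative to run B as a percentage.
overhead_pct() {
    # $1 = elapsed seconds via Slurm, $2 = elapsed seconds direct
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / b * 100 }'
}

overhead_pct 720 480   # e.g. 12 min vs 8 min -> prints 50.0
```

Running several repetitions of both A and B, not a single pair, addresses the one-data-point concern raised above; transient load on the node can easily swamp a few minutes of difference.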

|| \\UTGERS,  	 |---------------------------*O*---------------------------
||_// the State	 |         Ryan Novosielski - novosirj at rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ	 | Office of Advanced Research Computing - MSB C630, Newark

