[slurm-users] Slurm overhead

John Hearns hearnsj at googlemail.com
Thu Apr 26 02:26:02 MDT 2018


Mahmood, do you have hyperthreading enabled?
That may be the root cause of your problem. If you have hyperthreading,
then as soon as you run more threads than the number of PHYSICAL cores you
will get over-subscription. With certain workloads that is fine - that
is what hyperthreading is all about.
However, HPC workloads have traditionally not benefited from hyperthreading.

I would suggest the following:

a) share the output of  cat /proc/cpuinfo  with us here so we can figure out
if HT is enabled
b) learn how to mimic HT being switched on or off by setting every odd
numbered CPU core to 'offline'
  This means you can 'play' with HT being on or off without a reboot
  (rough commands below, after this list)
c) reboot one of your servers and look at the BIOS settings
    That is a good idea anyway - please tell us whether HT is on or off. What is
the Power Profile? Are C-states disabled?
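
Something along these lines ought to do it (a rough, untested sketch - it assumes
the common Intel numbering where the odd logical CPUs are the hyperthread siblings
of the even ones; check topology/thread_siblings_list on your own nodes first):

  # Is HT enabled?  "Thread(s) per core: 2" means yes.
  lscpu | grep -i 'thread(s) per core'

  # Same check from /proc/cpuinfo: if "siblings" is twice "cpu cores", HT is on.
  grep -E 'siblings|cpu cores' /proc/cpuinfo | sort -u

  # Take every odd-numbered logical CPU offline (as root) to mimic HT off.
  for n in $(seq 1 2 $(( $(nproc --all) - 1 ))); do
      echo 0 > /sys/devices/system/cpu/cpu$n/online
  done

  # And bring them back on again:
  for n in $(seq 1 2 $(( $(nproc --all) - 1 ))); do
      echo 1 > /sys/devices/system/cpu/cpu$n/online
  done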






On 26 April 2018 at 10:08, Mahmood Naderan <mahmood.nt at gmail.com> wrote:

> It seems that the number of threads has some effects on the
> performance. Maybe some configurations issue exists in openmpi. I will
> investigate more on that. Thanks guys for the tips.
>
> Regards,
> Mahmood
>
>
>
>
> On Tue, Apr 24, 2018 at 9:18 PM, Ryan Novosielski <novosirj at rutgers.edu>
> wrote:
> > I would likely crank up the debugging on the slurmd process and look at
> the log files to see what’s going on in that time. You could also watch the
> job via top or other means (on Linux, you can press “1” to see line-by-line
> for each CPU core), or use strace on the process itself. Presumably
> something is happening that’s either eating up 4 minutes, or the job is
> running 4 minutes more slowly and you’ll need to figure out why. I know
> that our jobs run via the scheduler perform about on par for the hardware,
> and that jobs start fairly immediately.
>
>
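
For what it's worth, the kind of checking Ryan describes would look roughly like
this on the compute node (untested, and the PID below is a placeholder - adjust
paths and IDs for your site):

  # Turn up slurmd logging: set SlurmdDebug=debug3 in slurm.conf, restart slurmd
  # on that node, then watch its slurmd log file (location per your SlurmdLogFile).

  # Per-core view while the job runs - press "1" inside top:
  top

  # Attach strace to the job step to see where the time is going:
  ps -ef | grep slurmstepd          # find the step daemon's PID
  strace -f -tt -p <slurmstepd_pid> -o /tmp/step.strace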