Roger. It's the code that prints out the threads it sees - I bet it is the cgroups. I need to look at how that is configured as well.

For the time, that comes from the code itself. I'm guessing it has a start time and an end time in the code and just takes the difference. But again, this is something in the code. Unfortunately, the code uses the time to compute Mop/s total and Mop/s/thread, so a longer time means lower reported performance.

Thanks!

Jeff

On Wed, Apr 23, 2025 at 2:53 PM Michael DiDomenico via slurm-users <slurm-users@lists.schedmd.com> wrote:

the program probably says 32 threads, because it's just looking at the
box, not what slurm cgroups allow (assuming you're using them) for cpu

i think for an openmp program (not openmpi) you definitely want the
first command with --cpus-per-task=32

are you measuring the runtime inside the program or outside it? if
the latter the 10sec addition in time could be the slurm setup/node
allocation
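
As a rough way to check both points above from inside the job script, something like the following sketch might help. It assumes GNU coreutils `nproc` is available on the node and that the cgroup limit shows up as a cpuset/affinity restriction (which `nproc` honors); `BENCH_CMD` is a hypothetical placeholder, not something from the original script, and defaults to a no-op so the snippet runs anywhere.

```shell
#!/bin/bash
# Sketch only: compare what the node has with what this job may use.
cpus_installed=$(nproc --all)   # every processor on the box
cpus_allowed=$(nproc)           # processors this process may run on (honors affinity/cpuset)
                                # note: GNU nproc also honors OMP_NUM_THREADS if it is already set

# SLURM_CPUS_PER_TASK is only set when --cpus-per-task is given;
# outside Slurm, fall back to the allowed count.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-$cpus_allowed}"
echo "installed=$cpus_installed allowed=$cpus_allowed OMP_NUM_THREADS=$OMP_NUM_THREADS"

# Wall-clock time measured outside the program, to compare with the
# time the benchmark reports for itself.
# BENCH_CMD is a hypothetical placeholder; set it to the real binary (e.g. ./bt.C.x).
BENCH_CMD=${BENCH_CMD:-true}
SECONDS=0                       # bash counts whole seconds from here
$BENCH_CMD
echo "outer wall time: ${SECONDS}s"
```

If `allowed` comes back as 1 while `installed` is 32, the job is most likely pinned to a single core. If the outer wall time is noticeably larger than the time the program prints, the difference is probably job setup rather than the computation itself.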
On Wed, Apr 23, 2025 at 2:41 PM Jeffrey Layton <laytonjb@gmail.com> wrote:
>
> I tried using ntasks and cpus-per-task to get all 32 cores. So I added --ntasks=# --cpus-per-task=N to the sbatch command so that it now looks like:
>
> sbatch --nodes=1 --ntasks=1 --cpus-per-task=32 <script>
>
> It now takes 28 seconds (I ran it a few times).
>
> If I change the command to
>
> sbatch --nodes=1 --ntasks=32 --cpus-per-task=1 <script>
>
> It now takes about 30 seconds.
>
> Outside of Slurm it was only taking about 19.6 seconds. So either way it takes longer.
>
> Interestingly, the output from bt gives the Total Threads and Avail Threads. In all cases the answer is 32. If the code was only using 1 thread, I'm wondering why it would say Avail Threads is 32.
>
> I'm still not sure why it takes longer when Slurm is being used, but I'm reading as much as I can.
>
> Thanks!
>
> Jeff
>
>
> On Wed, Apr 23, 2025 at 2:15 PM Jeffrey Layton <laytonjb@gmail.com> wrote:
>>
>> Roger. I didn't configure Slurm so let me look at slurm.conf and gres.conf to see if they restrict a job to a single CPU.
>>
>> Thanks
>>
>> On Wed, Apr 23, 2025 at 1:48 PM Michael DiDomenico via slurm-users <slurm-users@lists.schedmd.com> wrote:
>>>
>>> without knowing anything about your environment, it's reasonable to
>>> suspect that maybe your openmp program is multi-threaded, but slurm is
>>> constraining your job to a single core. evidence of this should show
>>> up when running top on the node, watching the cpu% used for the
>>> program
>>>
>>> On Wed, Apr 23, 2025 at 1:28 PM Jeffrey Layton via slurm-users
>>> <slurm-users@lists.schedmd.com> wrote:
>>> >
>>> > Good morning,
>>> >
>>> > I'm running an NPB test, bt.C, that uses OpenMP and was built using the NV HPC SDK (version 25.1). I run it on a compute node by ssh-ing to the node. It runs in about 19.6 seconds.
>>> >
>>> > Then I run the code using a simple job:
>>> >
>>> > Command to submit job: sbatch --nodes=1 run-npb-omp
>>> >
>>> > The script run-npb-omp is the following:
>>> >
>>> > #!/bin/bash
>>> >
>>> > cd /home/.../NPB3.4-OMP/bin
>>> >
>>> > ./bt.C.x
>>> >
>>> >
>>> > When I use Slurm, the job takes 482 seconds.
>>> >
>>> > Nothing really appears in the logs. It doesn't do any IO. No data is copied anywhere. I'm kind of at a loss to figure out why. Any suggestions of where to look?
>>> >
>>> > Thanks!
>>> >
>>> > Jeff
>>> >
>>> >
>>> >
>>> > --
>>> > slurm-users mailing list -- slurm-users@lists.schedmd.com
>>> > To unsubscribe send an email to slurm-users-leave@lists.schedmd.com