Good morning,
I'm running an NPB test, bt.C, which uses OpenMP and was built with the NV HPC SDK (version 25.1). When I ssh to a compute node and run it there directly, it takes about 19.6 seconds.
Then I run the code using a simple job:
Command to submit job: sbatch --nodes=1 run-npb-omp
The script run-npb-omp is the following:
#!/bin/bash
cd /home/.../NPB3.4-OMP/bin
./bt.C.x
When I use Slurm, the job takes 482 seconds.
Nothing really appears in the logs. It doesn't do any I/O. No data is copied anywhere. I'm kind of at a loss to figure out why. Any suggestions of where to look?
Thanks!
Jeff
Without knowing anything about your environment, it's reasonable to suspect that your OpenMP program is multi-threaded but Slurm is constraining your job to a single core. Evidence of this should show up if you run top on the node and watch the CPU% used by the program.
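One way to gather that evidence from inside the job is to print which CPUs the process is allowed to run on. A minimal sketch, assuming a Linux compute node with /proc and GNU coreutils; the function name show_cpu_limits is made up for illustration and is not part of Slurm:

```shell
#!/bin/bash
# Hypothetical helper: report how many CPUs this process may actually use.
show_cpu_limits() {
    # nproc honours the CPU affinity mask, so inside a constrained job it
    # can be smaller than the number of cores installed on the node.
    echo "usable CPUs:    $(nproc)"
    # nproc --all reports every CPU installed on the node.
    echo "installed CPUs: $(nproc --all)"
    # The kernel's view of the affinity mask, e.g. "0-31" or just "0".
    grep Cpus_allowed_list /proc/self/status 2>/dev/null \
        || echo "Cpus_allowed_list: unavailable"
    # Slurm sets this only when --cpus-per-task is given.
    echo "SLURM_CPUS_PER_TASK=${SLURM_CPUS_PER_TASK:-unset}"
}

show_cpu_limits
```

Running this once over ssh and once from the batch script should show whether the two environments see the same CPUs.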
On Wed, Apr 23, 2025 at 1:28 PM Jeffrey Layton via slurm-users slurm-users@lists.schedmd.com wrote: [...]
-- slurm-users mailing list -- slurm-users@lists.schedmd.com To unsubscribe send an email to slurm-users-leave@lists.schedmd.com
Roger. I didn't configure Slurm so let me look at slurm.conf and gres.conf to see if they restrict a job to a single CPU.
Thanks
On Wed, Apr 23, 2025 at 1:48 PM Michael DiDomenico via slurm-users <slurm-users@lists.schedmd.com> wrote: [...]
I tried using ntasks and cpus-per-task to get all 32 cores. I added --ntasks=# --cpus-per-task=N to the sbatch command so that it now looks like:
sbatch --nodes=1 --ntasks=1 --cpus-per-task=32 <script>
It now takes 28 seconds (I ran it a few times).
If I change the command to
sbatch --nodes=1 --ntasks=32 --cpus-per-task=1 <script>
It now takes about 30 seconds.
Outside of Slurm it was only taking about 19.6 seconds. So either way it takes longer.
Interestingly, the output from bt reports Total Threads and Avail Threads. In all cases the answer is 32. If the code was only using 1 thread, I'm wondering why it would say Avail Threads is 32.
I'm still not sure why it takes longer when Slurm is being used, but I'm reading as much as I can.
Thanks!
Jeff
On Wed, Apr 23, 2025 at 2:15 PM Jeffrey Layton laytonjb@gmail.com wrote: [...]
The program probably says 32 threads because it's just looking at the box, not at what the Slurm cgroups allow for CPU (assuming you're using them).
I think for an OpenMP program (not Open MPI) you definitely want the first command, with --cpus-per-task=32.
Are you measuring the runtime inside the program or outside it? If the latter, the roughly 10 extra seconds could be the Slurm setup and node allocation.
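For reference, the same request can also live in the batch script itself as #SBATCH directives instead of on the sbatch command line. This is only a sketch: the BENCH variable and the fallback labels are placeholders, and the benchmark location is left to the reader because the original post elides the path.

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1            # one process ...
#SBATCH --cpus-per-task=32    # ... with 32 CPUs for its OpenMP threads

# Outside a Slurm job these variables are unset, so fall back to labels.
job_id="${SLURM_JOB_ID:-none}"
cpus="${SLURM_CPUS_PER_TASK:-unset}"
echo "job=$job_id cpus_per_task=$cpus"

# BENCH is a placeholder for the benchmark binary from the original post.
bench="${BENCH:-./bt.C.x}"
if [ -x "$bench" ]; then
    "$bench"
else
    echo "skipping run: $bench is not executable here"
fi
```

Submitted with a plain `sbatch run-npb-omp`, the directives take the place of the command-line options.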
On Wed, Apr 23, 2025 at 2:41 PM Jeffrey Layton laytonjb@gmail.com wrote: [...]
Roger. It's the code that prints out the threads it sees, so I bet it is the cgroups. I need to look at how that is configured as well.
The time comes from the code itself. I'm guessing it records a start time and an end time and just takes the difference. Unfortunately, the code uses that time to compute Mop/s total and Mop/s/thread, so a longer time means lower reported performance.
Thanks!
Jeff
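Since the reported rate depends only on the measured time, the numbers in this thread can be compared directly. A small sketch of the arithmetic, assuming awk is available and that the benchmark's operation count is the same in every run; time_ratio is a made-up helper:

```shell
#!/bin/bash
# Hypothetical helper: ratio of two run times, rounded to two decimals.
time_ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Times reported in this thread, in seconds.
echo "slowdown with plain sbatch:       $(time_ratio 482 19.6)x"
echo "slowdown with --cpus-per-task=32: $(time_ratio 28 19.6)x"
# Mop/s is operations divided by time, so for a fixed operation count the
# reported rate scales with the inverse of the time ratio.
echo "relative Mop/s at 28 s vs 19.6 s: $(time_ratio 19.6 28)"
```

A slowdown of roughly 25x on a 32-core node is in the range one might expect if the first job ran on a single core, though that is an inference and not something the logs confirm.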
On Wed, Apr 23, 2025 at 2:53 PM Michael DiDomenico via slurm-users <slurm-users@lists.schedmd.com> wrote: [...]
Besides the Slurm options, you might also need to set the OpenMP environment variable:
export OMP_NUM_THREADS=32 (the core count, not the thread count)
The same goes for other similar environment variables if you use any Python libs.
Best,
Feng
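A sketch of how the thread count could follow the Slurm request instead of being hard-coded to 32. It assumes the job was submitted with --cpus-per-task, which is when Slurm sets SLURM_CPUS_PER_TASK; the helper name pick_omp_threads is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: choose an OpenMP thread count for the current job.
pick_omp_threads() {
    # Prefer Slurm's per-task CPU count; fall back to the affinity-aware
    # nproc when the variable is unset (e.g. --cpus-per-task was not given).
    echo "${SLURM_CPUS_PER_TASK:-$(nproc)}"
}

export OMP_NUM_THREADS="$(pick_omp_threads)"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

Placed in the batch script before the benchmark is launched, this keeps the OpenMP setting and the Slurm request in step if the core count changes later.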
On Wed, Apr 23, 2025 at 3:22 PM Jeffrey Layton via slurm-users <slurm-users@lists.schedmd.com> wrote: [...]