Hi,
I'm new to slurm, but maybe someone can help me:
I'm trying to restrict the CPU usage to the actually requested/allocated resources using cgroup v2.
For this I made the following settings in slurm.conf:

ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup,task/affinity

And in cgroup.conf:

CgroupPlugin=cgroup/v2
CgroupAutomount=yes
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainDevices=yes
AllowedRAMSpace=98
cgroup v2 seems to be active on the compute node:
# mount | grep cgroup
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot)

# cat /sys/fs/cgroup/cgroup.subtree_control
cpuset cpu io memory pids
# cat /sys/fs/cgroup/system.slice/cgroup.subtree_control
cpuset cpu io memory pids
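(As a quick sanity check, a minimal snippet like this can be run inside a job step to see which cgroup it landed in and which CPUs the cpuset controller actually allows. This is a sketch assuming the standard cgroup v2 interface files; the exact cgroup path and whether cpuset is delegated to the step depend on the local setup:)

#!/bin/bash
# Run inside a job step (e.g. via srun or sbatch --wrap).
# On a pure cgroup v2 system /proc/self/cgroup contains a single "0::<path>" line.
cg_path=$(sed -n 's/^0:://p' /proc/self/cgroup)
echo "cgroup: ${cg_path}"
# cpuset.cpus.effective lists the CPUs the kernel will actually let this cgroup use.
cat "/sys/fs/cgroup${cg_path}/cpuset.cpus.effective" 2>/dev/null \
    || echo "cpuset.cpus.effective not readable here (path/permissions may differ)"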
Now, when I use sbatch to submit the following test script, the Python script that is started from the batch script utilizes all CPUs (96) at 100% on the allocated node, although I only ask for 4 CPUs (--cpus-per-task=4). I'd expect that the task cannot use more than these 4.
#!/bin/bash
#SBATCH --output=/local/users/appadmin/test-%j.log
#SBATCH --job-name=test
#SBATCH --chdir=/local/users/appadmin
#SBATCH --cpus-per-task=4
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --mem=64gb
#SBATCH --time=4:00:00
#SBATCH --partition=standard
#SBATCH --gpus=0
#SBATCH --export
#SBATCH --get-user-env=L
export PATH=/usr/local/bioinf/jupyterhub/bin:/usr/local/bioinf/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/local/bioinf/miniforge/condabin
source .bashrc
conda activate test
python test.py
The Python code in test.py is the following, using the cpu_load_generator package from [1]:
#!/usr/bin/env python
import sys
from cpu_load_generator import load_single_core, load_all_cores, from_profile
load_all_cores(duration_s=120, target_load=1) # generates load on all cores
Interestingly, when I use srun to launch an interactive job and run the Python script manually, I see with top that only 4 CPUs are running at 100%. And I also see Python errors thrown when the script tries to start the 5th process (which makes sense):
File "/usr/local/bioinf/miniforge/envs/test/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap self.run() File "/usr/local/bioinf/miniforge/envs/test/lib/python3.12/multiprocessing/process.py", line 108, in run self._target(*self._args, **self._kwargs) File "/usr/local/bioinf/miniforge/envs/test/lib/python3.12/site-packages/cpu_load_generator/_interface.py", line 24, in load_single_core process.cpu_affinity([core_num]) File "/usr/local/bioinf/miniforge/envs/test/lib/python3.12/site-packages/psutil/__init__.py", line 867, in cpu_affinity self._proc.cpu_affinity_set(list(set(cpus))) File "/usr/local/bioinf/miniforge/envs/test/lib/python3.12/site-packages/psutil/_pslinux.py", line 1714, in wrapper return fun(self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/bioinf/miniforge/envs/test/lib/python3.12/site-packages/psutil/_pslinux.py", line 2213, in cpu_affinity_set cext.proc_cpu_affinity_set(self.pid, cpus) OSError: [Errno 22] Invalid argument
What am I missing? Why are the CPU resources not restricted when I use sbatch?
Thanks for any input or hint
Dietmar
Hi Dietmar;
I tried this on ${my cluster}, as I switched to cgroups v2 quite recently.
I must say that on my setup it looks like it works as expected; see the grepped stdout from your reproducer below.
I use recent Slurm 23.11.4.
Wild guess: does your build machine have the bpf and dbus devel packages installed? (Both packages can be absent when building Slurm for cgroups v1.)
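(One way to check the binaries you already have installed before rebuilding anything; the plugin directory and file name below are what I'd expect for a stock x86_64 RPM install, so treat them as assumptions and adjust as needed:)

# does the installed Slurm ship a cgroup v2 plugin at all?
ls /usr/lib64/slurm/ | grep -i cgroup

# if cgroup_v2.so is present, was it linked against dbus?
ldd /usr/lib64/slurm/cgroup_v2.so | grep -i dbus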
cheers
josef
[jose@koios1 test_cgroups]$ cat slurm-7177217.out | grep eli
ValueError: CPU number 7 is not eligible; choose between [0, 1, 2, 3]
ValueError: CPU number 4 is not eligible; choose between [0, 1, 2, 3]
ValueError: CPU number 5 is not eligible; choose between [0, 1, 2, 3]
ValueError: CPU number 11 is not eligible; choose between [0, 1, 2, 3]
ValueError: CPU number 9 is not eligible; choose between [0, 1, 2, 3]
ValueError: CPU number 10 is not eligible; choose between [0, 1, 2, 3]
ValueError: CPU number 14 is not eligible; choose between [0, 1, 2, 3]
ValueError: CPU number 8 is not eligible; choose between [0, 1, 2, 3]
ValueError: CPU number 12 is not eligible; choose between [0, 1, 2, 3]
ValueError: CPU number 6 is not eligible; choose between [0, 1, 2, 3]
ValueError: CPU number 13 is not eligible; choose between [0, 1, 2, 3]
ValueError: CPU number 15 is not eligible; choose between [0, 1, 2, 3]
[jose@koios1 test_cgroups]$
On 28. 02. 24 14:28, Dietmar Rieder via slurm-users wrote: ...
Hi,
I'm running Slurm 22.05.11, which is available with OpenHPC 3.x. Do you think an upgrade is needed?
Best Dietmar
On 2/28/24 14:55, Josef Dvoracek via slurm-users wrote: ...
> I'm running Slurm 22.05.11, which is available with OpenHPC 3.x.
> Do you think an upgrade is needed?
I feel that a lot of Slurm operators tend not to use third-party sources of Slurm binaries, as you do not have the build environment fully in your hands.
But before making such a complex decision, perhaps look for the build logs of the Slurm you use (somewhere in the OpenHPC build system?) and check whether it was built with the libraries needed to get cgroups v2 working.
Not having the cgroups v2 dependencies at build time is only one of the possible causes.
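(For the build-time part specifically, a couple of quick checks against the packages or build tree you are using; the package name below is a guess, OpenHPC may name it differently:)

# library dependencies recorded in the slurmd package
# (a libdbus-1 requirement should show up if the cgroup v2 plugin was built with dbus)
rpm -qp --requires slurm-slurmd-*.rpm | grep -i -E 'dbus|bpf'

# or grep the configure log from the build for the relevant checks
grep -i -E 'dbus|bpf' config.log | head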
josef
Hi Josef, hi list,
I have now rebuilt the RPMs from OpenHPC, but using the original sources from version 23.11.4.
The configure command that is generated from the spec is the following:
./configure --build=x86_64-redhat-linux-gnu \
  --host=x86_64-redhat-linux-gnu \
  --program-prefix= \
  --disable-dependency-tracking \
  --prefix=/usr \
  --exec-prefix=/usr \
  --bindir=/usr/bin \
  --sbindir=/usr/sbin \
  --sysconfdir=/etc/slurm \
  --datadir=/usr/share \
  --includedir=/usr/include \
  --libdir=/usr/lib64 \
  --libexecdir=/usr/libexec \
  --localstatedir=/var \
  --sharedstatedir=/var/lib \
  --mandir=/usr/share/man \
  --infodir=/usr/share/info \
  --enable-multiple-slurmd \
  --with-pmix=/opt/ohpc/admin/pmix \
  --with-hwloc=/opt/ohpc/pub/libs/hwloc
(Am I missing something here?)

The configure output shows:
[...]
checking for bpf installation... /usr
checking for dbus-1... yes
[...]
config.log:
dbus_CFLAGS='-I/usr/include/dbus-1.0 -I/usr/lib64/dbus-1.0/include '
dbus_LIBS='-ldbus-1'

confdefs.h:
#define WITH_CGROUP 1
#define HAVE_BPF 1
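So the build seems to have found the cgroup v2 dependencies. To double-check what the running daemons report after installing the rebuilt RPMs, something like this should help (if I understand correctly, recent Slurm versions also print the cgroup.conf settings as part of scontrol show config, but treat that as an assumption):

# what the daemons think is configured
scontrol show config | grep -i -E 'ProctrackType|TaskPlugin'

# recent versions append the cgroup configuration here as well
scontrol show config | grep -i cgroup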
However, I still can't see any CPU limits when I use sbatch to run a batch job.
$ sbatch --time 5 --ntasks-per-node=1 --nodes=1 --cpus-per-task=1 --wrap 'grep Cpus /proc/$$/status'
$ cat slurm-72.out
Cpus_allowed:       ffffffff,ffffffff,ffffffff
Cpus_allowed_list:  0-95
The logs from the head node (leto) and the compute node (apollo-01) show:
Feb 29 12:55:05 leto slurmctld[272883]: slurmctld: _slurm_rpc_submit_batch_job: JobId=72 InitPrio=1 usec=365
Feb 29 12:55:05 apollo-01 slurmd[172835]: slurmd: task/affinity: task_p_slurmd_batch_request: task_p_slurmd_batch_request: 72
Feb 29 12:55:05 apollo-01 slurmd[172835]: slurmd: task/affinity: task_p_slurmd_batch_request: task_p_slurmd_batch_request: 72
Feb 29 12:55:05 apollo-01 slurmd[172835]: slurmd: task/affinity: batch_bind: job 72 CPU input mask for node: 0xFFFFFFFFFFFFFFFFFFFFFFFF
Feb 29 12:55:05 apollo-01 slurmd[172835]: slurmd: task/affinity: batch_bind: job 72 CPU input mask for node: 0xFFFFFFFFFFFFFFFFFFFFFFFF
Feb 29 12:55:05 apollo-01 slurmd[172835]: slurmd: task/affinity: batch_bind: job 72 CPU final HW mask for node: 0xFFFFFFFFFFFFFFFFFFFFFFFF
Feb 29 12:55:05 apollo-01 slurmd[172835]: slurmd: task/affinity: batch_bind: job 72 CPU final HW mask for node: 0xFFFFFFFFFFFFFFFFFFFFFFFF
Feb 29 12:55:05 apollo-01 slurmd[172835]: slurmd: Launching batch job 72 for UID 50001
Feb 29 12:55:05 apollo-01 slurmd[172835]: slurmd: Launching batch job 72 for UID 50001
Feb 29 12:55:06 apollo-01 kernel: slurm.epilog.cl (172966): drop_caches: 3
Feb 29 12:55:06 apollo-01 kernel: slurm.epilog.cl (172966): drop_caches: 3
Feb 29 12:55:05 leto slurmctld[272883]: slurmctld: sched/backfill: _start_job: Started JobId=72 in standard on apollo-01
Feb 29 12:55:05 leto slurmctld[272883]: slurmctld: _job_complete: JobId=72 WEXITSTATUS 0
Feb 29 12:55:05 leto slurmctld[272883]: slurmctld: _job_complete: JobId=72 done
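I should probably also check what the job was actually allocated, not just what the affinity masks say; if the allocation itself already spans all 96 CPUs, the cgroup limits would technically be working and the surprise would be in the allocation:

# while the job is known to the controller
scontrol show job 72 | grep -E 'Partition|NumCPUs|TRES'

# or from accounting, if slurmdbd is running
sacct -j 72 --format=JobID,ReqCPUS,AllocCPUS,Partition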
Best Dietmar
On 2/28/24 16:25, Josef Dvoracek via slurm-users wrote: ...
Hi list,
I finally got it working. I completely overlooked that I had set OverSubscribe=EXCLUSIVE for the partition that I used for testing, stupid me...
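For anyone hitting the same thing: OverSubscribe=EXCLUSIVE gives every job a whole-node allocation, so all 96 CPUs really were allocated to my jobs and the cgroup/affinity limits had nothing left to restrict. A quick way to spot it:

scontrol show partition standard | grep -i oversubscribe

With that removed (the default is OverSubscribe=NO), --cpus-per-task=4 translates into a 4-CPU cpuset again.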
Sorry for the noise, and thanks again for your answers.
Best Dietmar
On 2/29/24 13:19, Dietmar Rieder via slurm-users wrote: ...
Hi Dietmar,
What do you find in the output file of this job:
sbatch --time 5 --cpus-per-task=1 --wrap 'grep Cpus /proc/$$/status'
On our 64-core machines with hyperthreading enabled I see e.g.:
Cpus_allowed:       04000000,00000000,04000000,00000000
Cpus_allowed_list:  58,122
Greetings Hermann
On 2/28/24 14:28, Dietmar Rieder via slurm-users wrote: ...
Hi Hermann,
I get:
Cpus_allowed:       ffffffff,ffffffff,ffffffff
Cpus_allowed_list:  0-95
Best Dietmar
P.S.: Best regards from the CCB
On 2/28/24 15:01, Hermann Schwärzler via slurm-users wrote: ...