[slurm-users] Problem with Cuda program in multi-cluster

mohammed shambakey shambakey1 at gmail.com
Tue Jul 4 18:44:34 UTC 2023


Hi

I work on 3 clusters: A, B, and C. Clusters A and C each have 3 compute nodes
plus a head node, and in each of them one of the 3 compute nodes has an old
GPU. All nodes on all clusters run Ubuntu 22.04, except for the 2 GPU nodes,
which both run Ubuntu 18.04 to suit the old GPU card. The installed Slurm
version on all clusters is 23.11.0-0rc1.

Cluster B has only 2 compute nodes and a head node. I tried to submit an
sbatch script with a CUDA program from cluster B, to be executed on either
cluster A or C (where the GPU nodes reside). Previously this used to work,
but after updating the system I get the following error:

srun: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found
(required by srun)
srun: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found
(required by srun)
srun: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found
(required by /hpcshared/slurm_vm/usr/lib/slurm/libslurmfull.so)
srun: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found
(required by /hpcshared/slurm_vm/usr/lib/slurm/libslurmfull.so)
srun: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found
(required by /hpcshared/slurm_vm/usr/lib/slurm/libslurmfull.so)
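
For reference, this is roughly how I compare what a node provides with what
the shared Slurm binaries require, on one of the Ubuntu 18.04 GPU nodes (the
srun path below is only my guess at where it sits under the shared install;
libslurmfull.so is at the path shown in the error):

# glibc provided by the node itself (2.27 on the GPU nodes, 2.35 elsewhere)
ldd --version | head -n 1

# GLIBC versions required by the shared Slurm library and (assumed path) srun
objdump -T /hpcshared/slurm_vm/usr/lib/slurm/libslurmfull.so | grep -o 'GLIBC_[0-9.]*' | sort -Vu
objdump -T /hpcshared/slurm_vm/usr/bin/srun | grep -o 'GLIBC_[0-9.]*' | sort -Vu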

The installed glibc is 2.35 on all nodes, except for the 2 GPU nodes (glibc
2.27). I tried to run the same sbatch script locally on each of clusters A
and C, and it works fine. The problem happens only when using "sbatch -Mall"
from cluster B. Just to be sure, I also ran another sbatch job with the
multi-cluster option that does NOT involve a CUDA program, and it worked
fine.
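
For reference, the failing submission looks roughly like this (the partition,
gres, and program names below are placeholders, not my exact configuration):

#!/bin/bash
#SBATCH --job-name=cuda_test
#SBATCH --partition=gpu            # placeholder partition name
#SBATCH --gres=gpu:1               # request the old GPU on the node that has one
srun ./my_cuda_program             # placeholder CUDA binary

and it is submitted from cluster B with "sbatch -Mall cuda_test.sh".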

Should I install the same glibc version (2.33 or 2.34) on all nodes, or is
there a better way to fix this?

Regards

-- 
Mohammed