No, sadly there’s no topology.conf in use.
Thanks,
Daniel Healy
On Thu, Jan 9, 2025 at 8:28 AM Steffen Grunewald <steffen.grunewald@aei.mpg.de> wrote:
On Thu, 2025-01-09 at 07:51:40 -0500, Slurm users wrote:
Hello there and good morning from Baltimore.
I have a small cluster with 100 nodes. When the cluster is completely empty of all jobs, the first job gets allocated to node41. In other clusters, the first job gets allocated to node01. If I specify node01, the allocation works perfectly. I have my partition's node list set to node[01-99], so having node41 used first is a surprise to me. We also have many other partitions which start with node41, but the partition being used for the allocation starts with node01.
Does anyone know what would cause this?
Just a wild guess, but do you have a topology.conf file that somehow makes this node look most reasonable to use for a single-node job? (Topology attempts to assign, or hold back, sections of your network to maximize interconnect bandwidth for multi-node jobs. Your node41 might be one - or the first one of a series - that would leave bigger chunks unused for bigger tasks.)
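The best-fit idea behind that guess can be shown with a toy Python model. It is NOT Slurm's actual selection code: it only assumes that a single job is placed on the leaf switch with the fewest free nodes that can still hold it. The switch names and the node split (node[01-40], node[41-48], node[49-99]) are hypothetical. With such a layout, a one-node job lands on the smallest switch, so node41 would be picked first and the larger blocks would stay free for bigger jobs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LeafSwitch:
    """A hypothetical leaf switch and the currently free nodes attached to it."""
    name: str
    free_nodes: list[str]


def pick_switch_best_fit(switches: list[LeafSwitch], nodes_needed: int) -> LeafSwitch | None:
    """Return the switch with the fewest free nodes that can still fit the job.

    Ties go to the switch listed first, because min() keeps the first minimum.
    This toy model never spans a job across several switches.
    """
    candidates = [s for s in switches if len(s.free_nodes) >= nodes_needed]
    if not candidates:
        return None
    return min(candidates, key=lambda s: len(s.free_nodes))


def allocate(switches: list[LeafSwitch], nodes_needed: int) -> list[str]:
    """Pick the lowest-numbered free nodes on the best-fit switch."""
    switch = pick_switch_best_fit(switches, nodes_needed)
    if switch is None:
        return []
    return sorted(switch.free_nodes)[:nodes_needed]


# Hypothetical layout covering node[01-99] on three leaf switches of different size.
switches = [
    LeafSwitch("s0", [f"node{i:02d}" for i in range(1, 41)]),   # 40 nodes
    LeafSwitch("s1", [f"node{i:02d}" for i in range(41, 49)]),  # 8 nodes
    LeafSwitch("s2", [f"node{i:02d}" for i in range(49, 100)]), # 51 nodes
]

if __name__ == "__main__":
    print(allocate(switches, 1))       # single-node job on an empty cluster
    print(allocate(switches, 20)[:3])  # too big for s1, so s0 is the best fit
```

In this model, a single-node job gets node41, while a 20-node job skips the 8-node switch and starts at node01. Real Slurm placement also depends on settings this sketch ignores, so it is only meant to illustrate the reasoning.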
HTH, Steffen
-- 
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1 * D-14476 Potsdam-Golm * Germany
Phone: +49-331-567 7274 Mail: steffen.grunewald(at)aei.mpg.de