On Thu, 2025-01-09 at 07:51:40 -0500, Slurm users wrote:
> Hello there and good morning from Baltimore.
>
> I have a small cluster with 100 nodes. When the cluster is completely empty
> of all jobs, the first job gets allocated to node 41. In other clusters,
> the first job gets allocated to node 01. If I specify node 01, the
> allocation works perfectly. I have my partition NodeName set as
> node[01-99], so having node41 used first is a surprise to me. We also have
> many other partitions which start with node41, but the partition being used
> for the allocation starts with node01.
>
> Does anyone know what would cause this?

Just a wild guess, but do you have a topology.conf file that somehow makes
node41 look like the most reasonable choice for a single-node job?

(The topology plugin tries to assign, or hold back, sections of your
network so that multi-node jobs get the best interconnect bandwidth. Your
node41 might be the one, or the first of a series, whose use leaves the
bigger chunks free for bigger jobs.)
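
If you want to test that guess, a quick check is to see which leaf switch
node41 belongs to and whether it is the first node listed there. Below is a
rough Python sketch, not anything Slurm ships: the switch names and node
ranges in EXAMPLE are invented for illustration (I do not know your
layout), the helper names are mine, and the host-range parser only handles
simple expressions like node[01-40]. `scontrol show topology` should print
what slurmctld actually loaded, if a topology plugin is configured at all.

```python
import re

# Hypothetical topology.conf contents, NOT the original poster's file.
EXAMPLE = """
# leaf switches list nodes, the top switch lists other switches
SwitchName=leaf1 Nodes=node[01-40]
SwitchName=leaf2 Nodes=node[41-99]
SwitchName=spine Switches=leaf[1-2]
"""


def expand_hostlist(expr):
    """Expand a simple host expression such as 'node[01-03,07]'.

    Handles at most one bracket group per name; this is a simplification,
    not a full replacement for Slurm's own hostlist handling.
    """
    hosts = []
    # Split on commas that are outside of [...] groups.
    for part in re.split(r",(?![^\[]*\])", expr):
        m = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", part)
        if not m:
            hosts.append(part)
            continue
        prefix, ranges, suffix = m.groups()
        for r in ranges.split(","):
            lo, _, hi = r.partition("-")
            hi = hi or lo
            width = len(lo)  # keep zero padding, e.g. 01 rather than 1
            for i in range(int(lo), int(hi) + 1):
                hosts.append(f"{prefix}{i:0{width}d}{suffix}")
    return hosts


def leaf_switches(conf_text):
    """Return {switch name: [node names]} for lines that have Nodes=."""
    leaves = {}
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = dict(kv.split("=", 1) for kv in line.split() if "=" in kv)
        fields = {k.lower(): v for k, v in fields.items()}
        if "switchname" in fields and "nodes" in fields:
            leaves[fields["switchname"]] = expand_hostlist(fields["nodes"])
    return leaves


def find_switch(conf_text, node):
    """Return (switch name, position of node on that switch) or None."""
    for name, nodes in leaf_switches(conf_text).items():
        if node in nodes:
            return name, nodes.index(node)
    return None


if __name__ == "__main__":
    print(find_switch(EXAMPLE, "node41"))
```

With the made-up layout above, node41 comes out at position 0 on leaf2,
i.e. the first node of the second switch. If your real topology.conf shows
something similar, that would at least be consistent with my guess; it
would not prove that topology is the cause.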
HTH,
Steffen
--
Steffen Grunewald, Cluster Administrator
Max Planck Institute for Gravitational Physics (Albert Einstein Institute)
Am Mühlenberg 1 * D-14476 Potsdam-Golm * Germany
~~~
Fon: +49-331-567 7274
Mail: steffen.grunewald(at)aei.mpg.de
~~~