[slurm-users] sacct: error
Patrick Goetz
pgoetz at math.utexas.edu
Fri May 4 07:14:34 MDT 2018
I concur with this. Make sure your nodes are listed in the /etc/hosts file on
the SMS (system management server). Also, if you name them with a base name
plus a numerical sequence, you can configure them all with a single line in
slurm.conf (using the example below):
NodeName=radonc[01-04] CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
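As a rough illustration of the /etc/hosts step, the Python sketch below expands a simple name range such as radonc[01-04] and pairs each name with an address. It assumes the four addresses from the quoted NodeAddr list belong to radonc01 through radonc04 in that order, so check that against your actual cluster before using the output. The helper names expand_simple_range and hosts_lines are hypothetical, and the expansion covers only a single [start-end] bracket, not Slurm's full hostlist syntax.

```python
import re


def expand_simple_range(expr: str) -> list[str]:
    """Expand a single-bracket range such as 'radonc[01-04]'.

    Simplified sketch: handles exactly one '[start-end]' range and keeps
    the zero-padding width of the start value. Slurm's own hostlist
    syntax is richer (comma lists, several brackets); that is not covered.
    """
    m = re.fullmatch(r"([^\[\]]+)\[(\d+)-(\d+)\]", expr)
    if m is None:
        return [expr]  # no range: treat the expression as one hostname
    prefix, start, end = m.group(1), m.group(2), m.group(3)
    width = len(start)  # '01' -> keep two digits
    return [f"{prefix}{i:0{width}d}" for i in range(int(start), int(end) + 1)]


def hosts_lines(expr: str, addrs: list[str]) -> list[str]:
    """Build '/etc/hosts'-style 'ADDRESS<TAB>NAME' lines."""
    names = expand_simple_range(expr)
    if len(names) != len(addrs):
        raise ValueError(f"{len(names)} names but {len(addrs)} addresses")
    return [f"{addr}\t{name}" for name, addr in zip(names, addrs)]


if __name__ == "__main__":
    # Assumed mapping: addresses listed in the same order as radonc01..radonc04.
    addrs = ["10.112.0.5", "10.112.0.6", "10.112.0.14", "10.112.0.16"]
    for line in hosts_lines("radonc[01-04]", addrs):
        print(line)
```

The printed lines can be reviewed and appended to /etc/hosts on the SMS and the compute nodes, after which the single NodeName line above needs no NodeAddr at all.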
On 05/04/2018 12:05 AM, Raymond Wan wrote:
> Hi Eric,
>
>
> On Fri, May 4, 2018 at 6:04 AM, Eric F. Alemany <ealemany at stanford.edu> wrote:
>> # COMPUTE NODES
>> NodeName=radonc[01-04] NodeAddr=10.112.0.5 10.112.0.6 10.112.0.14 10.112.0.16 CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2 State=UNKNOWN
>> PartitionName=debug Nodes=radonc[01-04] Default=YES MaxTime=INFINITE State=UP
>
>
> I don't know what the problem is, but my *guess*, based on my own
> configuration file, is that we have one node per line under "NodeName".
> We also don't have NodeAddr, but maybe that's OK; instead, the IP
> addresses of the nodes in our cluster are hard-coded in /etc/hosts.
> Also, State is not given.
>
> So, if I formatted yours to look like ours, it would look something like:
>
> NodeName=radonc01 CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
> NodeName=radonc02 CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
> NodeName=radonc03 CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
> NodeName=radonc04 CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
> PartitionName=debug Nodes=radonc[01-04] Default=YES MaxTime=INFINITE State=UP
>
> Maybe the problem is with NodeAddr: you might have to separate the
> values with commas instead of spaces. With spaces, it might have
> trouble parsing the line.
>
> That's my guess...
>
> Ray
>
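On the NodeAddr question in the quoted message: slurm.conf lines consist of Key=Value pairs separated by whitespace, so spaces inside a NodeAddr value split it into separate tokens. The toy Python tokenizer below (split_conf_line is a hypothetical helper, not Slurm's real parser) shows how the spaced version leaves three of the four addresses as stray tokens, while the comma-separated version keeps all four under NodeAddr. That the comma form is what Slurm expects is an assumption consistent with how slurm.conf writes other node lists; confirm it against the slurm.conf man page for your Slurm version.

```python
def split_conf_line(line: str) -> tuple[dict[str, str], list[str]]:
    """Split a slurm.conf-style line into Key=Value pairs.

    Simplified model, not Slurm's parser: tokens are separated by
    whitespace and each token is expected to look like Key=Value.
    Tokens without '=' are returned separately as 'stray' tokens.
    """
    pairs: dict[str, str] = {}
    stray: list[str] = []
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            pairs[key] = value
        else:
            stray.append(token)  # no '=': cannot be a Key=Value pair
    return pairs, stray


spaced = ("NodeName=radonc[01-04] NodeAddr=10.112.0.5 10.112.0.6 "
          "10.112.0.14 10.112.0.16 CPUs=32")
comma = ("NodeName=radonc[01-04] "
         "NodeAddr=10.112.0.5,10.112.0.6,10.112.0.14,10.112.0.16 CPUs=32")

if __name__ == "__main__":
    for conf_line in (spaced, comma):
        pairs, stray = split_conf_line(conf_line)
        print(pairs.get("NodeAddr"), stray)
```

With the spaced line only 10.112.0.5 ends up in NodeAddr; the comma-separated line leaves no stray tokens.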