[slurm-users] sacct: error

Patrick Goetz pgoetz at math.utexas.edu
Fri May 4 07:14:34 MDT 2018


I concur with this.  Make sure your nodes are in the /etc/hosts file on 
the SMS.  Also, if you name them by base + numerical sequence, you can 
configure them with a single line in Slurm (using the example below):

NodeName=radonc[01-04] CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
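
The bracket notation radonc[01-04] is a hostlist expression for radonc01 through radonc04, so the single line above stands for four per-node lines. Here is a minimal Python sketch of that expansion. The helper expand_hostlist is hypothetical. It handles only one bracketed group of numbers or ranges, while Slurm's own hostlist syntax supports more forms.

```python
import re


def expand_hostlist(expr: str) -> list[str]:
    """Expand a simple hostlist such as 'radonc[01-04]' into host names.

    Hypothetical, simplified helper: supports one bracketed group of
    comma-separated numbers or lo-hi ranges. Zero-padding width is taken
    from the lower bound of each range.
    """
    m = re.fullmatch(r"([^\[\]]*)\[([^\]]+)\]([^\[\]]*)", expr)
    if m is None:
        return [expr]  # no brackets: a single plain host name
    prefix, body, suffix = m.groups()
    names = []
    for part in body.split(","):
        if "-" in part:
            lo, hi = part.split("-", 1)
            width = len(lo)  # "01" -> keep two digits
            for n in range(int(lo), int(hi) + 1):
                names.append(f"{prefix}{n:0{width}d}{suffix}")
        else:
            names.append(f"{prefix}{part}{suffix}")
    return names


ATTRS = "CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2"

# Print the per-node lines that the one-line range definition stands for.
for name in expand_hostlist("radonc[01-04]"):
    print(f"NodeName={name} {ATTRS}")
```

The printed lines match the one-node-per-line layout in the reply quoted below. The single-line form is simply easier to maintain as the cluster grows.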

On 05/04/2018 12:05 AM, Raymond Wan wrote:
> Hi Eric,
> 
> 
> On Fri, May 4, 2018 at 6:04 AM, Eric F. Alemany <ealemany at stanford.edu> wrote:
>> # COMPUTE NODES
>> NodeName=radonc[01-04] NodeAddr=10.112.0.5 10.112.0.6 10.112.0.14 10.112.0.16 CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2   State=UNKNOWN
>> PartitionName=debug Nodes=radonc[01-04] Default=YES MaxTime=INFINITE State=UP
> 
> 
> I don't know what the problem is, but my *guess*, based on my own
> configuration file, is that we have one node per line under "NodeName".
> We also don't use NodeAddr, but maybe that's OK.  Instead, the IP
> addresses of the nodes in our cluster are hard-coded in /etc/hosts.
> We also don't set State.
> 
> So, if I formatted yours to look like ours, it would look something like:
> 
> NodeName=radonc01 CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
> NodeName=radonc02 CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
> NodeName=radonc03 CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
> NodeName=radonc04 CPUs=32 RealMemory=64402 Sockets=2 CoresPerSocket=8 ThreadsPerCore=2
> PartitionName=debug Nodes=radonc[01-04] Default=YES MaxTime=INFINITE State=UP
> 
> Maybe the problem is with NodeAddr: you might have to separate the
> values with commas instead of spaces.  With spaces, the parser might
> not read them correctly.
> 
> That's my guess...
> 
> Ray
> 
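
The following is a minimal Python sketch of why the spacing in NodeAddr could matter. It uses a simplified model, not Slurm's actual parser. That model assumes a slurm.conf line is split on whitespace into Key=Value tokens. The helpers parse_line and hosts_entries are hypothetical. The in-order pairing of the four addresses with radonc01 to radonc04 is also an assumption. The script additionally prints /etc/hosts-style lines like those both replies recommend.

```python
def parse_line(line: str) -> tuple[dict[str, str], list[str]]:
    """Split one config line on whitespace into Key=Value pairs.

    Simplified model (an assumption, not Slurm's real parser): any token
    without '=' is collected as a stray token instead of being assigned.
    """
    pairs: dict[str, str] = {}
    stray: list[str] = []
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:
            pairs[key] = value
        else:
            stray.append(token)
    return pairs, stray


def hosts_entries(node_addr: str, names: list[str]) -> list[str]:
    """Pair comma-separated addresses with node names in order and
    return /etc/hosts-style "address name" lines (order is assumed)."""
    addrs = node_addr.split(",")
    if len(addrs) != len(names):
        raise ValueError(f"{len(addrs)} addresses for {len(names)} nodes")
    return [f"{addr} {name}" for addr, name in zip(addrs, names)]


# The four names that radonc[01-04] stands for.
NAMES = [f"radonc{n:02d}" for n in range(1, 5)]

SPACED = ("NodeName=radonc[01-04] "
          "NodeAddr=10.112.0.5 10.112.0.6 10.112.0.14 10.112.0.16 CPUs=32")
COMMA = ("NodeName=radonc[01-04] "
         "NodeAddr=10.112.0.5,10.112.0.6,10.112.0.14,10.112.0.16 CPUs=32")

pairs, stray = parse_line(SPACED)
print("spaces:", pairs["NodeAddr"], "stray:", stray)
# spaces: 10.112.0.5 stray: ['10.112.0.6', '10.112.0.14', '10.112.0.16']

pairs, stray = parse_line(COMMA)
print("commas: stray:", stray)
for entry in hosts_entries(pairs["NodeAddr"], NAMES):
    print(entry)
```

In this model, the space-separated form assigns only the first address to NodeAddr and leaves three orphan tokens. The comma-separated form gives one address per node.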
