On 04.12.24 11:30, Hermann Schwärzler via slurm-users slurm-users@lists.schedmd.com wrote:
Hi Steven,
yes, you have the syntax a bit wrong. If you consult the documentation (or the man page) of slurm.conf, you will find this in the "NODE CONFIGURATION" section (in the paragraph about "NodeName"):
Note that if the short form of the hostname is not used, it may prevent use of hostlist expressions (the numeric portion in brackets must be at the end of the string)
So the respective part in your slurm.conf should be
NodeName=node[1-7] ...
and you have to configure your name resolution (default domain?) such that these short names are resolvable to IP-addresses.
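To illustrate both points, here is a minimal Python sketch (not Slurm's own parser). It expands a simple hostlist expression whose bracketed range sits at the end of the string, and reports which of the resulting short names do not resolve. The helper names `expand_hostlist` and `unresolvable` are made up for this example, and only a single `[a-b]` range without zero padding is handled.

```python
import re
import socket


def expand_hostlist(expr):
    """Expand a simple 'prefix[a-b]' expression.

    Illustrative only: the bracketed range must be the last thing in the
    string, mirroring the rule quoted from the slurm.conf man page.
    """
    m = re.fullmatch(r"([^\[\]]+)\[(\d+)-(\d+)\]", expr)
    if m is None:
        raise ValueError(
            f"expected 'prefix[a-b]' with the brackets at the end: {expr!r}"
        )
    prefix, lo, hi = m.group(1), int(m.group(2)), int(m.group(3))
    return [f"{prefix}{i}" for i in range(lo, hi + 1)]


def unresolvable(names, resolve=socket.gethostbyname):
    """Return the names that the resolver cannot turn into an IP address."""
    failed = []
    for name in names:
        try:
            resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(name)
    return failed
```

Calling `unresolvable(expand_hostlist("node[1-7]"))` on a node would list the short names that still need a DNS or /etc/hosts entry, while `expand_hostlist("node[1-7].ods.vuw.ac.nz")` raises a `ValueError` because the brackets are not at the end.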
If that's not feasible you might have to use e.g. something like this
NodeName=DEFAULT CPUs=20 RealMemory=48
NodeName=node1.ods.vuw.ac.nz
NodeName=node2.ods.vuw.ac.nz
...
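If typing one line per node gets tedious, the lines can be generated. The following is only a sketch: `node_lines` is a hypothetical helper, and the prefix, range, domain and default values are simply the ones from this thread.

```python
def node_lines(prefix, first, last, domain, defaults):
    """Build a NodeName=DEFAULT line followed by one line per fully qualified node."""
    lines = [f"NodeName=DEFAULT {defaults}"]
    lines += [f"NodeName={prefix}{i}.{domain}" for i in range(first, last + 1)]
    return lines


# Values copied from the thread; RealMemory=48 is kept as posted,
# although it is most likely not the intended value.
print("\n".join(node_lines("node", 1, 7, "ods.vuw.ac.nz", "CPUs=20 RealMemory=48")))
```

The output can be pasted into the "COMPUTE NODES" part of slurm.conf in place of the bracketed expression.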
BTW: do your nodes only have 48 MB of memory? The unit in which "RealMemory" has to be specified is megabytes.
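To get a value in the right unit, one option is to read MemTotal from /proc/meminfo (reported in kB) and convert it. This is a rough sketch: `real_memory_mb` is a hypothetical helper and the sample figure below is invented, not taken from your nodes. Running `slurmd -C` on a node should also print the configuration it detects, including RealMemory.

```python
def real_memory_mb(meminfo_text):
    """Return MemTotal from /proc/meminfo-style text, converted from kB to MB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kb = int(line.split()[1])
            return kb // 1024  # round down so the value never exceeds what the node has
    raise ValueError("MemTotal not found")


# Invented sample for illustration; on a real node, read /proc/meminfo instead.
sample = "MemTotal:       49152000 kB\nMemFree:        40000000 kB\n"
print(f"RealMemory={real_memory_mb(sample)}")  # prints RealMemory=48000
```

On a node, the same function can be applied to the contents of /proc/meminfo, for example `real_memory_mb(open("/proc/meminfo").read())`.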
Regards, Hermann
On 12/4/24 01:47, Steven Jones via slurm-users wrote:
I guess I have the syntax wrong,
[root@node1 slurm]# /usr/sbin/slurmd -D
slurmd: fatal: Unable to create NodeAddr list from node[1-7].ods.vuw.ac.nz
[root@node1 slurm]# tail /etc/slurm/slurm.conf
#ResumeRate=
#SuspendExcNodes=
#SuspendExcParts=
#SuspendRate=
#SuspendTime=
#
#
# COMPUTE NODES
NodeName=node[1-7].ods.vuw.ac.nz CPUs=20 RealMemory=48 State=UNKNOWN
PartitionName=debug Nodes=ALL Default=YES MaxTime=INFINITE State=UP
[root@node1 slurm]#
regards
Steven
*From:* Steven Jones via slurm-users <slurm-users@lists.schedmd.com>
*Sent:* Wednesday, 4 December 2024 1:28 pm
*To:* slurm-users@schedmd.com <slurm-users@schedmd.com>
*Subject:* [slurm-users] Re: Slurm not running on a warewulf node

Well that is a start, TY.
[root@node1 slurm]# /usr/sbin/slurmd -D
slurmd: fatal: Unable to determine this slurmd's NodeName
Where is this set?
regards
Steven
*From:* Jeffrey R. Lang <JRLang@uwyo.edu>
*Sent:* Wednesday, 4 December 2024 1:17 pm
*To:* Steven Jones <steven.jones@vuw.ac.nz>; slurm-users@schedmd.com <slurm-users@schedmd.com>
*Subject:* RE: Slurm not running on a warewulf node
Steve
Try running the failing process from the command line with the -D option.
Per man page: Run slurmd in the foreground. Error and debug messages will be copied to stderr.
*Jeffrey R. Lang*
Advanced Research Computing Center
University of Wyoming, Information Technology Center
1000 E. University Ave
Laramie, WY 82071
Email: jrlang@uwyo.edu
Work: 307.766.3381
*From:* Steven Jones via slurm-users <slurm-users@lists.schedmd.com>
*Sent:* Tuesday, December 3, 2024 5:39 PM
*To:* slurm-users@schedmd.com
*Subject:* [slurm-users] slurm not running on a warewulf node
Hi,
I have set a log creation/location in slurm.conf as,
SlurmdLogFile=/var/log/slurm/slurmd.log
But it is 0 length.
Slurm will not run. What else do I need to do to log why it's failing, please?
regards
Steven
Hello Steven, if it's Warewulf v4, you can have a look at https://github.com/warewulf/warewulf/blob/main/etc/examples/slurm.conf.ww, which is a template for a slurm.conf generated from the nodes available in Warewulf.
kind regards, Christian