<div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-family:tahoma,sans-serif">I have included my login node in the list of nodes. Not all cores are included though. Please see the output of "scontrol" below</div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif">[mahmood@rocks7 ~]$ scontrol show nodes<br>NodeName=compute-0-0 Arch=x86_64 CoresPerSocket=1<br> CPUAlloc=0 CPUTot=32 CPULoad=31.96<br> AvailableFeatures=rack-0,32CPUs<br> ActiveFeatures=rack-0,32CPUs<br> Gres=(null)<br> NodeAddr=10.1.1.254 NodeHostName=compute-0-0 Version=18.08<br> OS=Linux 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017<br> RealMemory=64261 AllocMem=0 FreeMem=5187 Sockets=32 Boards=1<br> State=IDLE ThreadsPerCore=1 TmpDisk=444124 Weight=20511900 Owner=N/A MCS_label=N/A<br> Partitions=CLUSTER,WHEEL<br> BootTime=2018-12-24T18:16:49 SlurmdStartTime=2019-01-02T23:53:20<br> CfgTRES=cpu=32,mem=64261M,billing=47<br> AllocTRES=<br> CapWatts=n/a<br> CurrentWatts=0 LowestJoules=0 ConsumedJoules=0<br> ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s<br><br><br>NodeName=compute-0-1 Arch=x86_64 CoresPerSocket=1<br> CPUAlloc=6 CPUTot=32 CPULoad=25.90<br> AvailableFeatures=rack-0,32CPUs<br> ActiveFeatures=rack-0,32CPUs<br> Gres=(null)<br> NodeAddr=10.1.1.253 NodeHostName=compute-0-1 Version=18.08<br> OS=Linux 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017<br> RealMemory=64261 AllocMem=4096 FreeMem=509 Sockets=32 Boards=1<br> State=MIXED ThreadsPerCore=1 TmpDisk=444124 Weight=20511899 Owner=N/A MCS_label=N/A<br> Partitions=CLUSTER,WHEEL,RUBY,EMERALD,QEMU<br> BootTime=2018-12-24T18:07:22 SlurmdStartTime=2019-01-02T23:53:20<br> CfgTRES=cpu=32,mem=64261M,billing=47<br> AllocTRES=cpu=6,mem=4G<br> CapWatts=n/a<br> CurrentWatts=0 LowestJoules=0 ConsumedJoules=0<br> ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s<br><br><br>NodeName=compute-0-2 Arch=x86_64 CoresPerSocket=1<br> CPUAlloc=6 CPUTot=32 CPULoad=5.95<br> AvailableFeatures=rack-0,32CPUs<br> ActiveFeatures=rack-0,32CPUs<br> Gres=(null)<br> NodeAddr=10.1.1.252 NodeHostName=compute-0-2 Version=18.08<br> OS=Linux 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017<br> RealMemory=64261 AllocMem=4096 FreeMem=7285 Sockets=32 Boards=1<br> State=MIXED ThreadsPerCore=1 TmpDisk=444124 Weight=20511898 Owner=N/A MCS_label=N/A<br> Partitions=CLUSTER,WHEEL,RUBY,EMERALD,QEMU<br> BootTime=2018-12-24T18:10:56 SlurmdStartTime=2019-01-02T23:53:20<br> CfgTRES=cpu=32,mem=64261M,billing=47<br> AllocTRES=cpu=6,mem=4G<br> CapWatts=n/a<br> CurrentWatts=0 LowestJoules=0 ConsumedJoules=0<br> ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s<br><br><br>NodeName=compute-0-3 Arch=x86_64 CoresPerSocket=1<br> CPUAlloc=6 CPUTot=56 CPULoad=36.87<br> AvailableFeatures=rack-0,56CPUs<br> ActiveFeatures=rack-0,56CPUs<br> Gres=(null)<br> NodeAddr=10.1.1.251 NodeHostName=compute-0-3 Version=18.08<br> OS=Linux 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017<br> RealMemory=64147 AllocMem=4096 FreeMem=15274 Sockets=56 Boards=1<br> State=MIXED ThreadsPerCore=1 TmpDisk=913567 Weight=20535897 Owner=N/A MCS_label=N/A<br> Partitions=CLUSTER,WHEEL,RUBY,EMERALD,QEMU<br> BootTime=2018-12-24T18:02:51 SlurmdStartTime=2019-01-02T23:53:20<br> CfgTRES=cpu=56,mem=64147M,billing=71<br> AllocTRES=cpu=6,mem=4G<br> CapWatts=n/a<br> CurrentWatts=0 
LowestJoules=0 ConsumedJoules=0<br> ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s<br><br><br>NodeName=compute-0-4 Arch=x86_64 CoresPerSocket=1<br> CPUAlloc=6 CPUTot=56 CPULoad=37.47<br> AvailableFeatures=rack-0,56CPUs<br> ActiveFeatures=rack-0,56CPUs<br> Gres=(null)<br> NodeAddr=10.1.1.250 NodeHostName=compute-0-4 Version=18.08<br> OS=Linux 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017<br> RealMemory=64147 AllocMem=4096 FreeMem=15233 Sockets=56 Boards=1<br> State=MIXED ThreadsPerCore=1 TmpDisk=50268 Weight=20535896 Owner=N/A MCS_label=N/A<br> Partitions=CLUSTER,WHEEL,RUBY,EMERALD,QEMU<br> BootTime=2018-12-24T18:05:38 SlurmdStartTime=2019-01-02T23:53:20<br> CfgTRES=cpu=56,mem=64147M,billing=71<br> AllocTRES=cpu=6,mem=4G<br> CapWatts=n/a<br> CurrentWatts=0 LowestJoules=0 ConsumedJoules=0<br> ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s<br><br><br>NodeName=rocks7 Arch=x86_64 CoresPerSocket=1<br> CPUAlloc=0 CPUTot=8 CPULoad=23.40<br> AvailableFeatures=(null)<br> ActiveFeatures=(null)<br> Gres=(null)<br> NodeAddr=10.1.1.1 NodeHostName=rocks7 Version=18.08<br> OS=Linux 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017<br> RealMemory=64261 AllocMem=0 FreeMem=322 Sockets=8 Boards=1<br> State=IDLE ThreadsPerCore=1 TmpDisk=272013 Weight=1 Owner=N/A MCS_label=N/A<br> Partitions=WHEEL,QEMU<br> BootTime=2018-12-24T17:47:14 SlurmdStartTime=2019-01-02T23:53:20<br> CfgTRES=cpu=8,mem=64261M,billing=8<br> AllocTRES=<br> CapWatts=n/a<br> CurrentWatts=0 LowestJoules=0 ConsumedJoules=0<br> ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s<br><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif">And here are some salloc examples:</div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif">[mahmood@rocks7 ~]$ salloc<br>salloc: Granted job allocation 275<br>[mahmood@rocks7 ~]$ exit<br>exit<br>salloc: Relinquishing job allocation 275<br>[mahmood@rocks7 ~]$ salloc -n1<br>salloc: Granted job allocation 276<br>[mahmood@rocks7 ~]$ exit<br>exit<br>salloc: Relinquishing job allocation 276<br>[mahmood@rocks7 ~]$ salloc --nodelist=compute-0-2<br>salloc: Granted job allocation 277<br>[mahmood@rocks7 ~]$ exit<br>exit<br>salloc: Relinquishing job allocation 277<br>[mahmood@rocks7 ~]$ salloc -n1 hostname<br>salloc: Granted job allocation 278<br><a href="http://rocks7.jupiterclusterscu.com">rocks7.jupiterclusterscu.com</a><br>salloc: Relinquishing job allocation 278<br>salloc: Job allocation 278 has been revoked.<br>[mahmood@rocks7 ~]$ <br></div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif">As you can see whenever I run salloc, I see the rocks7 prompt which is the login node.</div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br></div><div class="gmail_default" style="font-family:tahoma,sans-serif"><br clear="all"></div><div><div dir="ltr" class="gmail_signature"><div dir="ltr"><font 
face="tahoma,sans-serif">Regards,<br>Mahmood</font><br><br><br></div></div></div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr">On Wed, Jan 2, 2019 at 10:13 PM Brian Johanson <<a href="mailto:bjohanso@psc.edu">bjohanso@psc.edu</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">sallocdefaultcommand specified in slurm.conf will change the default <br>
behavior when salloc is executed without appending a command and also <br>
explain conflicting behavior between installations.<br>
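
You can check whether it is set on a given cluster with something like:

    scontrol show config | grep -i salloc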

SallocDefaultCommand
    Normally, salloc(1) will run the user's default shell when a command to execute is not specified on the salloc command line. If SallocDefaultCommand is specified, salloc will instead run the configured command. The command is passed to '/bin/sh -c', so shell metacharacters are allowed, and commands with multiple arguments should be quoted. For instance:

        SallocDefaultCommand = "$SHELL"

    would run the shell in the user's $SHELL environment variable. And

        SallocDefaultCommand = "srun -n1 -N1 --mem-per-cpu=0 --pty --preserve-env --mpi=none $SHELL"

    would spawn the user's default shell on the allocated resources, but not consume any of the CPU or memory resources, configure it as a pseudo-terminal, and preserve all of the job's environment variables (i.e. not overwrite them with the job step's allocation information).

    For systems with generic resources (GRES) defined, the SallocDefaultCommand value should explicitly specify a zero count for the configured GRES. Failure to do so will result in the launched shell consuming those GRES and preventing subsequent srun commands from using them. For example, on Cray systems add "--gres=craynetwork:0" as shown below:

        SallocDefaultCommand = "srun -n1 -N1 --mem-per-cpu=0 --gres=craynetwork:0 --pty --preserve-env --mpi=none $SHELL"

    For systems with TaskPlugin set, adding an option of "--cpu-bind=no" is recommended if the default shell should have access to all of the CPUs allocated to the job on that node, otherwise the shell may be limited to a single CPU or core.
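
Putting those pieces together, a setting that drops the user into a shell on the first allocated node could look something like this (just a sketch; adjust the options to your site and Slurm version):

    SallocDefaultCommand = "srun -n1 -N1 --mem-per-cpu=0 --cpu-bind=no --pty --preserve-env --mpi=none $SHELL"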

On 1/2/2019 12:38 PM, Ryan Novosielski wrote:
> I don’t think that’s true (and others have shared documentation regarding interactive jobs and the S commands). There was documentation shared for how this works, and it seems as if it has been ignored.
>
> [novosirj@amarel2 ~]$ salloc -n1
> salloc: Pending job allocation 83053985
> salloc: job 83053985 queued and waiting for resources
> salloc: job 83053985 has been allocated resources
> salloc: Granted job allocation 83053985
> salloc: Waiting for resource configuration
> salloc: Nodes slepner012 are ready for job
>
> This is the behavior I’ve always seen. If I include a command at the end of the line, it appears to simply run it in the “new” shell that is created by salloc (which you’ll notice you can exit via CTRL-D or exit).
>
> [novosirj@amarel2 ~]$ salloc -n1 hostname
> salloc: Pending job allocation 83054458
> salloc: job 83054458 queued and waiting for resources
> salloc: job 83054458 has been allocated resources
> salloc: Granted job allocation 83054458
> salloc: Waiting for resource configuration
> salloc: Nodes slepner012 are ready for job
> amarel2.amarel.rutgers.edu
> salloc: Relinquishing job allocation 83054458
>
> You can, however, tell it to srun something in that shell instead:
>
> [novosirj@amarel2 ~]$ salloc -n1 srun hostname
> salloc: Pending job allocation 83054462
> salloc: job 83054462 queued and waiting for resources
> salloc: job 83054462 has been allocated resources
> salloc: Granted job allocation 83054462
> salloc: Waiting for resource configuration
> salloc: Nodes node073 are ready for job
> node073.perceval.rutgers.edu
> salloc: Relinquishing job allocation 83054462
>
> When you use salloc, it starts an allocation and sets up the environment:
>
> [novosirj@amarel2 ~]$ env | grep SLURM
> SLURM_NODELIST=slepner012
> SLURM_JOB_NAME=bash
> SLURM_NODE_ALIASES=(null)
> SLURM_MEM_PER_CPU=4096
> SLURM_NNODES=1
> SLURM_JOBID=83053985
> SLURM_NTASKS=1
> SLURM_TASKS_PER_NODE=1
> SLURM_JOB_ID=83053985
> SLURM_SUBMIT_DIR=/cache/home/novosirj
> SLURM_NPROCS=1
> SLURM_JOB_NODELIST=slepner012
> SLURM_CLUSTER_NAME=amarel
> SLURM_JOB_CPUS_PER_NODE=1
> SLURM_SUBMIT_HOST=amarel2.amarel.rutgers.edu
> SLURM_JOB_PARTITION=main
> SLURM_JOB_NUM_NODES=1
>
> If you run “srun” subsequently, it will run on the compute node, but a regular command will run right where you are:
>
> [novosirj@amarel2 ~]$ srun hostname
> slepner012.amarel.rutgers.edu
>
> [novosirj@amarel2 ~]$ hostname
> amarel2.amarel.rutgers.edu
>
> Again, I’d advise Mahmood to read the documentation that was already provided. It really doesn’t matter what behavior is requested; that’s not what this command does. If one wants to run a script on a compute node, the correct command is sbatch. I’m not sure what advantage salloc with srun has. I assume it’s so you can open an allocation and then occasionally send srun commands over to it.
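>
> For example (a rough sketch; the script name and directives are just placeholders), a file myjob.sh containing:
>
>     #!/bin/bash
>     #SBATCH -n 1
>     #SBATCH --job-name=test
>     hostname
>
> submitted with "sbatch myjob.sh" would run hostname on whichever compute node the scheduler picks, not on the login node.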
>
> --
> ____
> || \\UTGERS, |---------------------------*O*---------------------------
> ||_// the State | Ryan Novosielski - novosirj@rutgers.edu
> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> || \\ of NJ | Office of Advanced Research Computing - MSB C630, Newark
> `'
>