[slurm-users] salloc with bash scripts problem

Ryan Novosielski novosirj at rutgers.edu
Wed Jan 2 10:38:04 MST 2019


I don’t think that’s true. Others have already shared documentation on interactive jobs and the S commands, covering exactly how this works, and it seems as if it has been ignored.

[novosirj at amarel2 ~]$ salloc -n1
salloc: Pending job allocation 83053985
salloc: job 83053985 queued and waiting for resources
salloc: job 83053985 has been allocated resources
salloc: Granted job allocation 83053985
salloc: Waiting for resource configuration
salloc: Nodes slepner012 are ready for job

This is the behavior I’ve always seen. If I include a command at the end of the line, it simply runs in the “new” shell that salloc creates on the submit host (which, you’ll notice, you can exit via CTRL-D or exit):

[novosirj at amarel2 ~]$ salloc -n1 hostname
salloc: Pending job allocation 83054458
salloc: job 83054458 queued and waiting for resources
salloc: job 83054458 has been allocated resources
salloc: Granted job allocation 83054458
salloc: Waiting for resource configuration
salloc: Nodes slepner012 are ready for job
amarel2.amarel.rutgers.edu
salloc: Relinquishing job allocation 83054458

You can, however, tell it to srun something in that shell instead:

[novosirj at amarel2 ~]$ salloc -n1 srun hostname
salloc: Pending job allocation 83054462
salloc: job 83054462 queued and waiting for resources
salloc: job 83054462 has been allocated resources
salloc: Granted job allocation 83054462
salloc: Waiting for resource configuration
salloc: Nodes node073 are ready for job
node073.perceval.rutgers.edu
salloc: Relinquishing job allocation 83054462

When you use salloc, it starts an allocation and sets the SLURM_* variables up in your environment:

[novosirj at amarel2 ~]$ env | grep SLURM
SLURM_NODELIST=slepner012
SLURM_JOB_NAME=bash
SLURM_NODE_ALIASES=(null)
SLURM_MEM_PER_CPU=4096
SLURM_NNODES=1
SLURM_JOBID=83053985
SLURM_NTASKS=1
SLURM_TASKS_PER_NODE=1
SLURM_JOB_ID=83053985
SLURM_SUBMIT_DIR=/cache/home/novosirj
SLURM_NPROCS=1
SLURM_JOB_NODELIST=slepner012
SLURM_CLUSTER_NAME=amarel
SLURM_JOB_CPUS_PER_NODE=1
SLURM_SUBMIT_HOST=amarel2.amarel.rutgers.edu
SLURM_JOB_PARTITION=main
SLURM_JOB_NUM_NODES=1

If you subsequently run “srun”, the command runs on the allocated compute node, but a regular command still runs right where you are, on the login node:

[novosirj at amarel2 ~]$ srun hostname
slepner012.amarel.rutgers.edu

[novosirj at amarel2 ~]$ hostname
amarel2.amarel.rutgers.edu

Again, I’d advise Mahmood to read the documentation that was already provided. It really doesn’t matter what behavior is wanted; that’s simply not what this command does. If one wants to run a script on a compute node, the correct command is sbatch (a rough sketch follows below). I’m not sure what advantage salloc followed by srun has over that; I assume it’s so you can open an allocation once and then occasionally send srun commands over to it.
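
For what it’s worth, a minimal sbatch sketch for the qemu case might look something like the following. The resource numbers, the time limit, and the assumption that ./qemu.sh sits in the submit directory are guesses on my part, not anything I know about Mahmood’s site:

#!/bin/bash
#SBATCH --job-name=qemu        # name shown in squeue
#SBATCH --ntasks=1             # one task: the qemu-system-x86_64 process
#SBATCH --cpus-per-task=4      # assumed; match the vCPUs given to the guest
#SBATCH --mem=8G               # assumed; guest memory plus some overhead
#SBATCH --time=08:00:00        # assumed wall-clock limit

# Everything below runs on whichever compute node Slurm picks, not on the login node.
./qemu.sh

Submit it with something like “sbatch qemu.slurm” and Slurm handles the node selection. X11 forwarding is a separate question; depending on the Slurm version and site setup that’s srun’s --x11 option or a spank X11 plugin, and it may not make sense in a non-interactive batch job anyway.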

--
____
|| \\UTGERS,  	 |---------------------------*O*---------------------------
||_// the State	 |         Ryan Novosielski - novosirj at rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ	 | Office of Advanced Research Computing - MSB C630, Newark
     `'

> On Jan 2, 2019, at 12:20 PM, Terry Jones <terry at jon.es> wrote:
> 
> I know very little about how SLURM works, but this sounds like it's a configuration issue - that it hasn't been configured in a way that indicates the login nodes cannot also be used as compute nodes. When I run salloc on the cluster I use, I *always* get a shell on a compute node, never on the login node that I ran salloc on.
> 
> Terry
> 
> 
> On Wed, Jan 2, 2019 at 4:56 PM Mahmood Naderan <mahmood.nt at gmail.com> wrote:
> Currently, users run "salloc --spankx11 ./qemu.sh" where qemu.sh is a script to run a qemu-system-x86_64 command.
> When user (1) runs that command, qemu runs on the login node, since the user is logged in to the login node. When user (2) runs that command, his qemu process also runs on the login node, and so on.
> 
> That is not what I want!
> I expected slurm to dispatch the jobs on compute nodes.
> 
> 
> Regards,
> Mahmood
> 
> 
> 
> 
> On Wed, Jan 2, 2019 at 7:39 PM Renfro, Michael <Renfro at tntech.edu> wrote:
> Not sure what the reasons behind “have to manually ssh to a node” are, but salloc and srun can be used to allocate resources and run commands on the allocated resources:
> 
> Before allocation, regular commands run locally, and no Slurm-related variables are present:
> 
> =====
> 
> [renfro at login ~]$ hostname
> login
> [renfro at login ~]$ echo $SLURM_TASKS_PER_NODE
> 
> 
> =====
> 
> After allocation, regular commands still run locally, Slurm-related variables are present, and srun runs commands on the allocated node (my prompt change inside a job is a local thing, not done by default):
> 
> =====
> 
> [renfro at login ~]$ salloc
> salloc: Granted job allocation 147867
> [renfro at login(job 147867) ~]$ hostname
> login
> [renfro at login(job 147867) ~]$ echo $SLURM_TASKS_PER_NODE
> 1
> [renfro at login(job 147867) ~]$ srun hostname
> node004
> [renfro at login(job 147867) ~]$ exit
> exit
> salloc: Relinquishing job allocation 147867
> [renfro at login ~]$
> 
> =====
> 
> Lots of people get interactive shells on a reserved node with some variant of ‘srun --pty $SHELL -I’, which doesn’t require explicitly running salloc or ssh, so what are you trying to accomplish in the end?
> 
> --
> Mike Renfro, PhD / HPC Systems Administrator, Information Technology Services
> 931 372-3601     / Tennessee Tech University
> 
> > On Jan 2, 2019, at 9:24 AM, Mahmood Naderan <mahmood.nt at gmail.com> wrote:
> >
> > I want to know if there is any way to push the node selection onto Slurm, rather than it being a manual step done by the user.
> > Currently, I have to manually ssh to a node and try to "allocate resources" using salloc.
> >
> >
> > Regards,
> > Mahmood
> 
