[slurm-users] Running pyMPI on several nodes

Benson Muite benson_muite at emailplus.org
Tue Aug 13 06:30:48 UTC 2019


Hi Pälle!

Great. It would be helpful to know how the nodes shared the etc directory. NFS?

Benson

On 8/13/19 9:25 AM, Pär Lundö wrote:
>
> Hi!
>
> I have now had the chance to look into this matter more thoroughly, 
> and it seems the problem was that the nodes are diskless and shared 
> some data (e.g. the "etc" dir). I removed that dependency and mounted 
> each node on a unique set of folders, which resolved the issue. 
> Presumably this can be done in other ways unknown to me, but it 
> helped me, and I can now run MPI across multiple nodes.
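>
> Roughly, the idea is that the server now exports a separate root per 
> node instead of one shared tree. A sketch of what that can look like 
> (the paths here are only illustrative, not my exact setup):
>
>     # /etc/exports on the boot server: one writable root per node
>     /srv/nodes/lxclient10  lxclient10(rw,sync,no_root_squash)
>     /srv/nodes/lxclient11  lxclient11(rw,sync,no_root_squash)
>
> That way each node gets its own writable /etc and /var/spool/slurmd, 
> so per-node state no longer collides.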
>
> Thank you for your help!
>
> Best regards,
> Pälle L
>
> On 2019-07-16 15:49, Benson Muite wrote:
>>
>> Hi,
>>
>> Does a regular MPI program run on two nodes? For example helloworld:
>>
>> https://people.sc.fsu.edu/~jburkardt/c_src/hello_mpi/hello_mpi.c
>>
>> https://people.sc.fsu.edu/~jburkardt/py_src/hello_mpi/hello_mpi.py
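>>
>> Something as small as the following (a sketch assuming mpi4py is 
>> installed) is enough to check that ranks land on both nodes:
>>
>>     # hello_mpi.py: each MPI rank reports its rank, size and host
>>     from mpi4py import MPI
>>     import socket
>>
>>     comm = MPI.COMM_WORLD
>>     print("rank %d of %d on %s"
>>           % (comm.Get_rank(), comm.Get_size(), socket.gethostname()))
>>
>> If the printed hostnames cover both nodes, basic MPI wire-up works.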
>>
>> Benson
>>
>> On 7/16/19 4:30 PM, Pär Lundö wrote:
>>> Hi,
>>> Thank you for your quick answer!
>>> I'll look into that, but they share the same hosts file and the 
>>> DHCP server sets their hostnames.
>>>
>>> However, I came across a setting in the slurm.conf file, "TmpFS", 
>>> and there was a note regarding it in the MPI guide on the Slurm 
>>> website. I implemented the proposed changes, but still no luck.
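>>>
>>> (The change was essentially one line in slurm.conf; the path here is 
>>> only an example of the form, not necessarily what the guide gives:)
>>>
>>>     TmpFS=/tmp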
>>>
>>> Best regards,
>>> Palle
>>>
>>> ------------------------------------------------------------------------
>>> *From:* "slurm-users" <slurm-users-bounces at lists.schedmd.com>
>>> *Sent:* 16 July 2019 12:32
>>> *To:* "Slurm User Community List" <slurm-users at lists.schedmd.com>
>>> *Subject:* Re: [slurm-users] Running pyMPI on several nodes
>>>
>>> srun: error: Application launch failed: Invalid node name specified
>>>
>>> Hearns' Law: all batch system problems are DNS problems.
>>>
>>> Seriously though - check out your name resolution both on the head 
>>> node and the compute nodes.
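>>>
>>> For example (using the node names from your earlier mail), something 
>>> like this on each machine should return consistent results:
>>>
>>>     getent hosts lxclient10 lxclient11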
>>>
>>>
>>> On Tue, 16 Jul 2019 at 08:49, Pär Lundö <par.lundo at foi.se> wrote:
>>>
>>>     Hi,
>>>
>>>     I have now had the time to look at some of your suggestions.
>>>
>>>     First I tried running "srun -N1 hostname" via an sbatch script,
>>>     while having two nodes up and running.
>>>     "sinfo" showed that both nodes were up and idle prior to
>>>     submitting the sbatch script.
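>>>     (For reference, the sbatch script was essentially just the
>>>     following; this is a sketch of what I described, not a
>>>     verbatim copy:)
>>>
>>>         #!/bin/bash
>>>         #SBATCH -N 1
>>>         srun -N1 hostname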
>>>     After submitting the job, I receive an error stating that:
>>>
>>>     "srun: error: Task launch for 86.0 failed on node lxclient11:
>>>     Invalid node name specified.
>>>     srun: error: Application launch failed: Invalid node name specified
>>>     srun: Job step aborted: Waiting up to 32 seconds for job step to
>>>     finish.
>>>     srun: error: Timed out waiting for job step to complete"
>>>
>>>
>>>     From the log file on the client I get a more detailed error:
>>>     " Launching batch job 86 for UID 1000
>>>     [86.batch] error: Invalid host_index -1 for job 86
>>>     [86.batch] error: Host lxclient10 not in hostlist lxclient11
>>>     [86.batch] task_pre_launch: Using sched_affinity for tasks
>>>     rpc_launch_tasks: Invalid node list (lxclient10 not in lxclient11)"
>>>
>>>     My two nodes are called lxclient10 and lxclient11.
>>>     Why is my batch job launched with UID 1000? Shouldn't it be
>>>     launched via the slurm user (which in my case has UID 64030)?
>>>     What is meant by a node not being in the node list?
>>>     The two nodes and the server share the same set of IP-address
>>>     entries in the "/etc/hosts" file.
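>>>     (That is, all three machines carry entries of this form; the
>>>     addresses and the server name here are placeholders:)
>>>
>>>         192.168.1.1    lxserver
>>>         192.168.1.10   lxclient10
>>>         192.168.1.11   lxclient11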
>>>
>>>     -> This was resolved: lxclient10 was marked as down. After
>>>     bringing it back up, submitting the same sbatch script
>>>     resulted in no error.
>>>     However, running it on two nodes I get an error:
>>>     "srun: error: Job Step 88.0 aborted before step completely launched.
>>>     srun: error: Job step aborted: Waiting up to 32 seconds for job
>>>     step to finish.
>>>     srun: error: task 1 launch failed: Unspecified error
>>>     srun: error: lxclient10: task 0: Killed"
>>>
>>>     And in the slurmd log file on the client I get an error
>>>     similar to the one previously stated: pmix cannot bind the
>>>     UNIX socket /var/spool/slurmd/stepd.slurm.pmix.88.0: Address
>>>     already in use (98)
>>>
>>>     I ran the lsof command, but I don't really know what I am
>>>     looking for. If I grep for the different node names, I can see
>>>     that the two nodes have mounted the NFS partition and that a
>>>     link is established.
>>>
>>>     "As an aside, you have checked that your username exists on that
>>>     compue server?      getent passwd par
>>>     Also that your home directory is mounted - or something
>>>     substituting for your home directory?"
>>>     Yes, the user slurm exists on both nodes and have the same uid.
>>>
>>>     "Have you tried
>>>
>>>
>>>             srun -N# -n# mpirun python3 ....
>>>
>>>
>>>     Perhaps you have no MPI environment being set up for the
>>>     processes?  There was no "--mpi" flag in your "srun" command and
>>>     we don't know if you have a default value for that or not.
>>>
>>>     "
>>>
>>>     In my slurm.conf file I do specify "MpiDefault=pmix". (And it
>>>     can be seen in the log file that there is something wrong with
>>>     pmix: the address is already in use.)
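>>>
>>>     (So, if I understand it, an explicit flag such as
>>>
>>>         srun -N2 -n16 --mpi=pmix python3 <script>
>>>
>>>     should be equivalent to what my default already gives; the node
>>>     and task counts are just the ones from my earlier test.)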
>>>
>>>     One thing that struck me now is that I run these nodes as a
>>>     pair of diskless nodes, which boot and mount the same
>>>     filesystem supplied by a server. They run different PIDs for
>>>     different processes, which should not affect one another,
>>>     right?
>>>
>>>
>>>     Best regards,
>>>
>>>     Palle
>>>
>>>     On 2019-07-12 19:34, Pär Lundö wrote:
>>>
>>>         Hi,
>>>
>>>         Thank you so much for your quick responses!
>>>         It is much appreciated.
>>>         I don't have access to the cluster until next week, but I'll
>>>         be sure to follow up on all of your suggestions and get back
>>>         to you then.
>>>
>>>         Have a nice weekend!
>>>         Best regards
>>>         Palle
>>>
>>>         ------------------------------------------------------------------------
>>>         *From:* "slurm-users"
>>>         <slurm-users-bounces at lists.schedmd.com>
>>>         <mailto:slurm-users-bounces at lists.schedmd.com>
>>>         *Sent:* 12 juli 2019 17:37
>>>         *To:* "Slurm User Community List"
>>>         <slurm-users at lists.schedmd.com>
>>>         <mailto:slurm-users at lists.schedmd.com>
>>>         *Subject:* Re: [slurm-users] Running pyMPI on several nodes
>>>
>>>         Par, by 'poking around' Chris means to use tools such as
>>>         netstat and lsof.
>>>         Also, I would look at ps -eaf --forest to make sure there are
>>>         no 'orphaned' jobs sitting on that compute node.
>>>
>>>         Having said that though, I have a dim memory of a classic
>>>         PBSPro error message which says something about a network
>>>         connection,
>>>         but really means that you cannot open a remote session on
>>>         that compute server.
>>>
>>>         As an aside, have you checked that your username exists on
>>>         that compute server?      getent passwd par
>>>         Also that your home directory is mounted - or something
>>>         substituting for your home directory?
>>>
>>>
>>>         On Fri, 12 Jul 2019 at 15:55, Chris Samuel
>>>         <chris at csamuel.org> wrote:
>>>
>>>             On 12/7/19 7:39 am, Pär Lundö wrote:
>>>
>>>             > Presumably, the first 8 tasks originate from the first
>>>             > node (in this case lxclient11), and the other node
>>>             > (lxclient10) responds as predicted.
>>>
>>>             That looks right, it seems the other node has two
>>>             processes fighting
>>>             over the same socket and that's breaking Slurm there.
>>>
>>>             > Is it neccessary to have passwordless ssh
>>>             communication alongside the
>>>             > munge authentication?
>>>
>>>             No, srun doesn't need (or use) that at all.
>>>
>>>             > In addition I checked the slurmctld-log from both the
>>>             server and client
>>>             > and found something (noted in bold):
>>>
>>>             This is from the slurmd log on the client from the look
>>>             of it.
>>>
>>>             > *[2019-07-12T14:57:53.771][83.0] task_p_pre_launch:
>>>             > Using sched affinity for tasks*
>>>             > *[...]slurm.pmix.83.0: Address already in use[98]*
>>>             > [2019-07-12T14:57:53.682][83.0] error: lxclient[0]
>>>             > /pmix.server.c:386 [pmix_stepd_init] mpi/pmix: ERROR:
>>>             > pmixp_usock_create_srv
>>>             > [2019-07-12T14:57:53.683][83.0] error: (null) [0]
>>>             > /mpi_pmix:156 [p_mpi_hook_slurmstepd_prefork] mpi/pmix:
>>>             > ERROR: pmixp_stepd_init() failed
>>>
>>>             That indicates that something else has grabbed the
>>>             socket it wants and
>>>             that's why the setup of the MPI ranks on the second node
>>>             fails.
>>>
>>>             You'll want to poke around there to see what's using it.
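>>>
>>>             For example, something along these lines, with the
>>>             socket path substituted from your log (lsof and ss are
>>>             generic tools here, nothing Slurm-specific):
>>>
>>>                 lsof /path/to/stepd.slurm.pmix.83.0
>>>                 ss -xlp | grep pmix
>>>
>>>             That should show which process is holding the socket.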
>>>
>>>             Best of luck!
>>>             Chris
>>>             -- 
>>>               Chris Samuel  : http://www.csamuel.org/  :  Berkeley, CA, USA
>>>