[slurm-users] [External] Re: R jobs crashing when run in parallel

Prentice Bisbal pbisbal at pppl.gov
Mon Mar 29 17:28:23 UTC 2021

It sounds to me like configuration drift on your cluster. I would check
that libpcre2 is actually (still?) installed on all your cluster nodes.
I'll bet that the failing jobs land on a particular subset of nodes, or
even a single node, and that libpcre2 has somehow disappeared from those
nodes.
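A quick way to test that hypothesis is to ask the dynamic linker cache on each node whether it knows about the library from the error message. This is just a sketch: the library name comes from Simon's error, but the script name and the srun invocation in the comment are illustrative assumptions about your site.

```shell
#!/bin/sh
# Check whether the library from the error message is known to the
# dynamic linker on this node. Run it once per node, e.g. with
#   srun -w <nodename> -N1 sh check_pcre.sh
# (script name and srun usage are illustrative).
LIB=libpcre2-8.so.0

# ldconfig may live in /sbin and not be on a normal user's PATH.
if { ldconfig -p 2>/dev/null || /sbin/ldconfig -p; } | grep -q "$LIB"; then
    echo "OK: $LIB found in linker cache on $(hostname)"
else
    echo "MISSING: $LIB on $(hostname)"
fi
```

Any node that prints MISSING is a candidate for the drift described above; running ldd against the R binary on the same node will show exactly which shared objects fail to resolve.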


On Mon, Mar 29, 2021, 12:43 PM Patrick Goetz <pgoetz at math.utexas.edu> wrote:

> Could this be a function of the R script you're trying to run, or are
> you saying you get this error running the same script which works at
> other times?
> On 3/29/21 7:47 AM, Simon Andrews wrote:
> > I've got a weird problem on our Slurm cluster.  If I submit lots of R
> > jobs to the queue, then as soon as more than about 7 of them are
> > running at the same time I start to get failures, saying:
> >
> > /bi/apps/R/4.0.4/lib64/R/bin/exec/R: error while loading shared
> > libraries: libpcre2-8.so.0: cannot open shared object file: No such file
> > or directory
> >
> > ...which makes no sense because that library is definitely there, and
> > other jobs on the same nodes worked both before and after the failed
> > jobs.  I recently ran 500 identical jobs and 152 of them failed in this
> > way.
> >
> > There are no errors in the log files on the compute nodes where this
> > failed and it happens across multiple nodes so it's not a single one
> > being strange.  The R binary is on an isilon network share, but the
> > libpcre2 library is on the local disk for the node.
> >
> > Anyone come across anything like this before?  Any suggestions for fixes?
> >
> > Thanks
> >
> > Simon.
> >
> >

More information about the slurm-users mailing list