[slurm-users] Error running jobs with srun
Elisabetta Falivene
e.falivene at ilabroma.com
Wed Nov 8 16:54:32 MST 2017
I am the admin and I have no documentation :D I'll try the third option.
Thank you very much
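
For the third option quoted below, the comparison could be scripted roughly as follows. This is only a sketch under assumptions: the helper names (shared_mounts, compare_mounts, read_mtab) are made up for illustration, the list of network filesystem types is a guess that needs adjusting per site, and it assumes "srun cat /etc/mtab" lands on a worker node. If the master itself exports a directory, that directory shows up as a local filesystem on the master and will be listed under "worker_only".

```python
import subprocess

# Filesystem types treated as "shared" (an assumed list, extend as needed).
SHARED_FS_TYPES = {"nfs", "nfs4", "lustre", "gpfs", "beegfs", "cifs", "glusterfs", "ceph"}


def shared_mounts(mtab_text):
    """Return the set of network-filesystem mount points in mtab-format text."""
    mounts = set()
    for line in mtab_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # mtab columns: device, mount point, filesystem type, options, ...
        mount_point, fs_type = fields[1], fields[2]
        if fs_type in SHARED_FS_TYPES:
            mounts.add(mount_point)
    return mounts


def compare_mounts(master_text, worker_text):
    """Group network mount points by where they appear."""
    master = shared_mounts(master_text)
    worker = shared_mounts(worker_text)
    return {
        "both": sorted(master & worker),
        "master_only": sorted(master - worker),
        "worker_only": sorted(worker - master),
    }


def read_mtab(on_worker=False):
    """Read /etc/mtab locally, or on a compute node by going through srun."""
    cmd = ["cat", "/etc/mtab"]
    if on_worker:
        cmd = ["srun"] + cmd
    return subprocess.check_output(cmd, universal_newlines=True)
```

Run on the master node, compare_mounts(read_mtab(), read_mtab(on_worker=True)) would then give the three groups to look through.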
On Thursday, 9 November 2017, Lachlan Musicman <datakid at gmail.com> wrote:
> On 9 November 2017 at 10:35, Elisabetta Falivene
> <e.falivene at ilabroma.com> wrote:
>
>> Wow, thank you. Is there a way to check which directories the master and
>> the nodes share?
>>
>
> There's no explicit way.
> 1. Check the cluster documentation written by the cluster admins
> 2. Ask the cluster admins
> 3. Run "mount" or "cat /etc/mtab" or "df -H" on the master node and check
> against the same commands on a worker node (by getting an interactive
> terminal: "srun --pty bash" )
>
> Cheers
> L.
>
> ------
> "The antidote to apocalypticism is *apocalyptic civics*. Apocalyptic
> civics is the insistence that we cannot ignore the truth, nor should we
> panic about it. It is a shared consciousness that our institutions have
> failed and our ecosystem is collapsing, yet we are still here — and we are
> creative agents who can shape our destinies. Apocalyptic civics is the
> conviction that the only way out is through, and the only way through is
> together. "
>
> *Greg Bloom* @greggish
> https://twitter.com/greggish/status/873177525903609857
>
>
>
>> On Wednesday, 8 November 2017, Lachlan Musicman <datakid at gmail.com>
>> wrote:
>>
>>> On 9 November 2017 at 09:19, Elisabetta Falivene <
>>> e.falivene at ilabroma.com> wrote:
>>>
>>>> I'm getting this message anytime I try to execute any job on my
>>>> cluster.
>>>> (node01 is the name of my first of eight nodes and is up and running)
>>>>
>>>> Trying a simple Python script:
>>>> *root at mycluster:/tmp# srun python test.py *
>>>> *slurmd[node01]: error: task/cgroup: unable to build job physical cores*
>>>> */usr/bin/python: can't open file 'test.py': [Errno 2] No such file or
>>>> directory*
>>>> *srun: error: node01: task 0: Exited with exit code 2*
>>>>
>>>>
>>> This error - which I've seen too many times to mention - is because the
>>> file isn't visible to the node.
>>>
>>> E.g., if all the cluster nodes share /opt and /home but not /root, and
>>> you run "srun python test.py" from /root, then node01 can't find it
>>> (because on node01, /root/test.py doesn't exist).
>>>
>>> Cheers
>>> L.
>>>
>>>
>>
>
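
The failure explained in the quoted reply above (the script lives in a directory the node cannot see) could be caught before calling srun with a small check like the one below. This is a hedged sketch, not part of Slurm: visible_on_nodes is a hypothetical helper, and the list of shared directories has to be supplied by hand, for example after comparing mounts on the master and a worker. It does not resolve symlinks.

```python
import os


def visible_on_nodes(path, shared_dirs):
    """Return True if `path` lies inside one of the directories in `shared_dirs`.

    Relative paths are resolved against the current working directory,
    which is what "srun python test.py" effectively relies on.
    """
    target = os.path.abspath(path)
    for directory in shared_dirs:
        root = os.path.abspath(directory)
        # commonpath compares whole path components, so /homework does not
        # count as being inside /home.
        if os.path.commonpath([target, root]) == root:
            return True
    return False
```

With the shared directories from the quoted example (/opt and /home, an assumption about the cluster), visible_on_nodes("/root/test.py", ["/opt", "/home"]) is False, so the script would need to be copied to a shared location first.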