[slurm-users] Slurm doesn't call mpiexec or mpirun when run through a GUI app
pbisbal at pppl.gov
Fri Mar 22 17:14:04 UTC 2019
On 3/22/19 12:40 PM, Reuti wrote:
>> On 22.03.2019 at 16:20, Prentice Bisbal <pbisbal at pppl.gov> wrote:
>> On 3/21/19 6:56 PM, Reuti wrote:
>>> On 21.03.2019 at 23:43, Prentice Bisbal wrote:
>>>> My users here have developed a GUI application that serves as a front end to various physics codes they use. From this GUI, they can submit jobs to Slurm. On Tuesday, we upgraded Slurm from 18.08.5-2 to 18.08.6-2, and a user has reported a problem when submitting Slurm jobs through this GUI app that does not occur when the same sbatch script is submitted with sbatch on the command line.
>>>> When I replaced the mpirun command with an equivalent srun command, everything works as desired, so the user can get back to work and be productive.
>>>> While srun is a suitable workaround, and is arguably the correct way to run an MPI job, I'd like to understand what is going on here. Any idea what is going wrong, or additional steps I can take to get more debug information?
>>> Was an alias for `mpirun` introduced? An alias can shadow the real executable: `which mpirun` would still return the correct path, but that binary would never be executed. Running
>>> $ type mpirun
>>> $ alias mpirun
>>> in the job script may tell.
>> Unfortunately, the script is in tcsh,
> Oh, I didn't notice this – correct.
>> so the 'type' command doesn't work, since
> Is it really running in `tcsh`? The commands look generic and are available in various shells. Does SLURM honor the first line of a script and/or use a default? In Bash, a function would shadow `mpirun` too.
> (I'm more used to GridEngine, where how the scripts are started can be configured either way.)
Yes, it's running /bin/tcsh as the interpreter. When I used 'type' as
you instructed, I got an error from tcsh, which I wouldn't have gotten
in bash. Slurm respects the interpreter line at the start of the script.
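
Since 'type' is not available in tcsh, here is a minimal sketch of a
shell-independent check, assuming the job script can call out to /bin/sh.
The helper name find_in_path is hypothetical (it is not part of Slurm or
of any MPI distribution). It only lists every executable of a given name
on $PATH, in search order, so it will not reveal aliases; within tcsh
itself, the built-in 'which' does report aliases.

```shell
#!/bin/sh
# find_in_path NAME
# Hypothetical helper: print every executable file called NAME found on
# $PATH, in the order the shell would search. The first line printed is
# the one a plain "NAME ..." invocation would normally run.
find_in_path() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $PATH; do
        # An empty PATH entry means the current directory.
        [ -z "$dir" ] && dir=.
        if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    IFS=$old_ifs
}

# Report both launchers mentioned in this thread.
find_in_path mpirun
find_in_path mpiexec
```

Saved under some name such as find_in_path.sh (a made-up file name) and
run with 'sh ./find_in_path.sh' from the job script, the output of a
GUI-submitted job can be compared with that of a job submitted from the
command line.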
(I know you're more of a GridEngine guy. You helped me a lot through the
GridEngine mailing list when I used SGE. I was actually surprised to see
you here. Welcome to Slurm!)
> In "tcsh" I see that a defined "jobcmd" can have some effect.
> -- Reuti
>> it's a bash built-in command. I did use the 'alias' command to see all the defined aliases, and mpirun and mpiexec are not aliased. Any other ideas?
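
One more check that might yield debug information, sketched under the
assumption (not confirmed anywhere in this thread) that the difference
lies in the environment: by default sbatch passes the submitting
process's environment on to the job, so a job submitted from the GUI may
see a different PATH than one submitted from a terminal. Each job could
dump its environment with something like
'env | sort > env.${SLURM_JOB_ID}.txt', which works from tcsh because env
and sort are external commands. The function name env_diff below is
hypothetical.

```shell
#!/bin/sh
# env_diff FILE_A FILE_B
# Hypothetical helper: compare two environment dumps (one NAME=value per
# line) and print only the lines that differ.
#   no indent    -> line appears only in FILE_A
#   leading tab  -> line appears only in FILE_B
# Caveat: values that span several lines will be split across lines and
# may show up as spurious differences.
env_diff() {
    a=$(mktemp) || return 1
    b=$(mktemp) || return 1
    # comm requires sorted input; re-sort both files in the C locale so
    # the ordering is the same no matter where the dumps were produced.
    LC_ALL=C sort "$1" > "$a"
    LC_ALL=C sort "$2" > "$b"
    # -3 suppresses the column of lines common to both files.
    LC_ALL=C comm -3 "$a" "$b"
    rm -f "$a" "$b"
}

# Example call (file names are made up):
#   env_diff env.cli.txt env.gui.txt
```

Differences in PATH, LD_LIBRARY_PATH or SHELL between the two dumps would
be the first things to look at.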