[slurm-users] [External] Re: What is an easy way to prevent users run programs on the master/login node.
Marcus Wagner
wagner at itc.rwth-aachen.de
Thu May 20 05:13:37 UTC 2021
Hi Prentice,
you are right. I have now looked into the wrapper script (it is not my area, and I had never worked on it before).
In fact, the MPI processes are spawned on the backend nodes; the only process remaining on the login/frontend node is the spawner process.
The wrapper checks the load on the nodes and then creates a corresponding hostfile:
Host nrm214: current load 0.53 => 96 slots left
Host nrm215: current load 0.14 => 96 slots left
Host nrm212: current load 0.09 => 96 slots left
Host nrm213: current load 0.13 => 96 slots left
Used hosts:
nrm214 0 (current load is: 0.53)
nrm215 0 (current load is: 0.14)
nrm212 2.0 (current load is: 0.09)
nrm213 0 (current load is: 0.13)
Writing to /tmp/mw445520/login_60004/hostfile-613910
Contents:
nrm212:2
And then spawns the job:
Command: /opt/intel/impi/2018.4.274/compilers_and_libraries/linux/mpi/bin64/mpirun -launcher ssh -machinefile /tmp/mw445520/login_60004/hostfile-63375 -np 2 <code>
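For readers who want to see the idea in code, here is a minimal sketch of the host selection step, not the actual wrapper. It assumes that the free slots on a host are its core count minus the integer part of its load, and that the least-loaded hosts are filled first. Both assumptions are consistent with the output above but are guesses. The function name build_hostfile and the default of 96 cores per host are hypothetical.

```python
import math


def build_hostfile(loads, np_requested, cores_per_host=96):
    """Return hostfile lines of the form 'host:slots' for np_requested ranks.

    loads maps a host name to its current load average.
    """
    # Assumption: free slots = cores minus the integer part of the load.
    free = {
        host: max(cores_per_host - math.floor(load), 0)
        for host, load in loads.items()
    }

    lines = []
    remaining = np_requested
    # Assumption: fill the least-loaded hosts first.
    for host in sorted(loads, key=loads.get):
        if remaining == 0:
            break
        take = min(free[host], remaining)
        if take > 0:
            lines.append(f"{host}:{take}")
            remaining -= take

    if remaining:
        raise RuntimeError(f"not enough free slots: {remaining} ranks unplaced")
    return lines


# Load values taken from the wrapper output shown above.
loads = {"nrm214": 0.53, "nrm215": 0.14, "nrm212": 0.09, "nrm213": 0.13}
print("\n".join(build_hostfile(loads, 2)))
```

With the loads shown above and two ranks requested, this prints nrm212:2, which matches the hostfile contents in the output. The resulting lines would then be written to a file and passed to mpirun via -machinefile, as in the command above.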
I hope this clears things up a little.
Best
Marcus
On 27.04.2021 at 17:48, Prentice Bisbal wrote:
> But won't that first process be able to use 100% of a core? What if enough users do this such that every core is at 100% utilization? Or, what if the application is MPI + OpenMP? In that case, that one process on the login node could spawn multiple threads that use the remaining cores on the login node.
>
> Prentice
>
> On 4/26/21 2:01 AM, Marcus Wagner wrote:
>> Hi,
>>
>> we also have a wrapper script, together with a number of "MPI-Backends".
>> If mpiexec is called on the login nodes, only the first process is started on the login node, the rest runs on the MPI backends.
>>
>> Best
>> Marcus
>>
>> On 25.04.2021 at 09:46, Patrick Begou wrote:
>>> Hi,
>>>
>>> I also saw a cluster setup where mpirun or mpiexec commands were
>>> replaced by a shell script just saying "please use srun or sbatch...".
>>>
>>> Patrick
>>>
>>> On 24/04/2021 at 10:03, Ole Holm Nielsen wrote:
>>>> On 24-04-2021 04:37, Cristóbal Navarro wrote:
>>>>> Hi Community,
>>>>> I have a set of users still not so familiar with Slurm, and yesterday
>>>>> they bypassed srun/sbatch and just ran their CPU program directly on
>>>>> the head/login node thinking it would still run on the compute node.
>>>>> I am aware that I will need to teach them some basic usage, but in
>>>>> the meantime, how have you solved this type of user-behavior
>>>>> problem? Is there a preferred way to restrict the master/login
>>>>> resources, or actions, for regular users?
>>>>
>>>> We restrict user limits in /etc/security/limits.conf so users can't
>>>> run very long or very big tasks on the login nodes:
>>>>
>>>> # Normal user limits
>>>> * hard cpu 20
>>>> * hard rss 50000000
>>>> * hard data 50000000
>>>> * soft stack 40000000
>>>> * hard stack 50000000
>>>> * hard nproc 250
>>>>
>>>> /Ole
>>>>
>>>
>>>
>>
>
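As a small aid for checking that limits like the ones Ole quotes from /etc/security/limits.conf are actually in effect, here is a minimal sketch that prints the soft and hard limits of the current process using Python's standard resource module. Run it in a fresh login session on the login node. The names LIMITS, fmt and current_limits are hypothetical. Note that limits.conf gives cpu in minutes and data/stack in KB, while getrlimit reports seconds and bytes, so "hard cpu 20" should appear as 1200. As far as I know, the rss limit is ignored by current Linux kernels, so it is left out here.

```python
import resource

# Resource limits corresponding to the limits.conf items cpu, data, stack, nproc.
LIMITS = {
    "cpu (seconds)": resource.RLIMIT_CPU,
    "data (bytes)": resource.RLIMIT_DATA,
    "stack (bytes)": resource.RLIMIT_STACK,
    "nproc": resource.RLIMIT_NPROC,
}


def fmt(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def current_limits():
    """Return {name: (soft, hard)} for the current process."""
    return {
        name: tuple(fmt(v) for v in resource.getrlimit(res))
        for name, res in LIMITS.items()
    }


for name, (soft, hard) in current_limits().items():
    print(f"{name:15s} soft={soft} hard={hard}")
```

The same values can be checked from the shell with "ulimit -a" (soft limits) and "ulimit -Ha" (hard limits).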
--
Dipl.-Inf. Marcus Wagner
IT Center
Group: Linux Systems Group
Department: Systems and Operations
RWTH Aachen University
Seffenter Weg 23
52074 Aachen
Tel: +49 241 80-24383
Fax: +49 241 80-624383
wagner at itc.rwth-aachen.de
www.itc.rwth-aachen.de
Social media channels of the IT Center:
https://blog.rwth-aachen.de/itc/
https://www.facebook.com/itcenterrwth
https://www.linkedin.com/company/itcenterrwth
https://twitter.com/ITCenterRWTH
https://www.youtube.com/channel/UCKKDJJukeRwO0LP-ac8x8rQ