[slurm-users] Prevent users from updating their jobs
Diego Zuccato
diego.zuccato at unibo.it
Fri Dec 17 08:44:58 UTC 2021
Well, there could be a way: make them "pay" (in some way) for the
requested resources.
Payment can be anything: in our case, the more resources a user
allocates, the lower the priority their group gets. If enough users are
impacted by the bad behaviour, they'll become your allies: if they have
access to tools like seff to check other users' job efficiency, and they
notice their own jobs have low priority, they'll be the ones sending
nastygrams to their colleagues and you won't have to do anything.
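One way to implement that "payment" is the fairshare factor of the
multifactor priority plugin; a minimal slurm.conf sketch (the weights
below are purely illustrative, not our exact settings, and fairshare
needs job accounting through slurmdbd):

PriorityType=priority/multifactor
PriorityDecayHalfLife=14-0
PriorityWeightFairshare=100000
PriorityWeightAge=1000

The larger the fairshare weight relative to the other factors, the more
a group's recent usage drags down the priority of its pending jobs.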
On 16/12/2021 22:04, Fulcomer, Samuel wrote:
> There's no clear answer to this. It depends a bit on how you've
> segregated your resources.
>
> In our environment, GPU and bigmem nodes are in their own partitions.
> There's nothing to prevent a user from specifying a list of potential
> partitions in the job submission, so there would be no need for them to
> do a post-submission "scontrol update jobid" to push a job into a
> partition that violated the spirit of the service.
>
> Our practice has been to periodically look at running jobs to see if
> they are using (or have used, in the case of bigmem) less than their
> requested resources, and send the owners a nastygram telling them to stop
> doing that.
>
> Creating a Lua job submission script that, e.g., blocks jobs in the gpu
> queue that don't request GPUs only helps to weed out the naive users. A
> subversive user could request a GPU and then only use the allocated cores
> and memory. There's no way to deal with that other than monitoring running
> jobs and sending nastygrams, with removal of access after repeated
> offenses.
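>
> A rough sketch of such a filter, assuming the job_submit/lua plugin is
> enabled; the partition name "gpu", the messages, and the exact job_desc
> field holding the GPU request (gres vs. tres_per_node, depending on the
> Slurm version) are assumptions to adapt, not a drop-in script:
>
> -- job_submit.lua (sketch)
> -- Reject jobs aimed at the gpu partition that carry no GPU request.
> local function requests_gpu(desc)
>    -- Depending on the Slurm version the GPU request may show up in
>    -- desc.gres or desc.tres_per_node; unknown fields are simply nil.
>    local g = desc.gres or desc.tres_per_node
>    return g ~= nil and string.match(g, "gpu") ~= nil
> end
>
> function slurm_job_submit(job_desc, part_list, submit_uid)
>    if job_desc.partition == "gpu" and not requests_gpu(job_desc) then
>       slurm.log_user("Jobs in the gpu partition must request a GPU")
>       return slurm.ERROR
>    end
>    return slurm.SUCCESS
> end
>
> function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
>    -- This hook also runs for "scontrol update job ...", so a similar
>    -- check can catch attempts to move an existing job into the gpu
>    -- partition after submission (job_desc only carries the changes).
>    if job_desc.partition == "gpu" and not requests_gpu(job_desc) then
>       slurm.log_user("Cannot move a job into the gpu partition without a GPU request")
>       return slurm.ERROR
>    end
>    return slurm.SUCCESS
> end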
>
> On Thu, Dec 16, 2021 at 3:36 PM Jordi Blasco <jbllistes at gmail.com>
> wrote:
>
> Hi everyone,
>
> I was wondering if there is a way to prevent users from updating
> their jobs with "scontrol update job".
>
> Here is the justification.
>
> A hypothetical user submits a job requesting a regular node, but
> then realises that the large-memory nodes or the GPU nodes are
> idle. With the command above, the user can move the job onto one of
> those resources just to avoid waiting, without any real need for
> them.
>
> Any suggestions to prevent that?
>
> Cheers,
>
> Jordi
>
> sbatch --mem=1G -t 0:10:00 --wrap="srun -n 1 sleep 360"
> scontrol update job 791 Features=smp
>
> [user01 at slurm-simulator ~]$ sacct -j 791 -o "jobid,nodelist,user"
>        JobID        NodeList      User
> ------------ --------------- ---------
>          791           smp-1    user01
>
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786