Hello ~
However, as soon as the base QoS job is created, the large QoS job is 
immediately canceled without any waiting time.

But in the slurmctld log, there is a grace time log.
[2023-11-02T11:37:36.589] debug:  setting 3600 sec preemption grace time for JobId=153 to reclaim resources for JobId=154

Could you help me understand what might be going wrong?

Note that, by default, Slurm sends SIGTERM to the immediate children of
slurmstepd (which might be gpu_burn in your case) at _the beginning_ of
the GraceTime, to notify them of the approaching termination.

If the processes react to SIGTERM by terminating, which is generally the
case, you may have the impression that GraceTime is not honored.

To benefit from the GraceTime, your program must either trap SIGTERM
with a signal handler, or you must enable the send_user_signal flag in
PreemptParameters and submit your job with --signal and a different signal.
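
As a rough illustration of the first option, here is a minimal Python sketch of a job payload that traps SIGTERM instead of dying on it. The names run, work_step and checkpoint are hypothetical placeholders, not part of Slurm; I am assuming the payload is a Python process started directly by slurmstepd, so that it is the process receiving the signal.

```python
import signal
import threading

# Set once SIGTERM has been delivered to this process.
stop_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record the termination notice instead of exiting immediately."""
    stop_requested.set()


# Replace the default SIGTERM action (terminate) with our handler.
signal.signal(signal.SIGTERM, handle_sigterm)


def run(work_step, checkpoint):
    """Call work_step() until SIGTERM arrives, then call checkpoint() once.

    work_step and checkpoint are hypothetical callables supplied by the
    job; each work_step should be short compared with the GraceTime.
    """
    while not stop_requested.is_set():
        work_step()
    checkpoint()
```

With a handler like this, the process keeps running after the SIGTERM sent at the start of the GraceTime and can use the remaining time to save its state. The same pattern should apply to whichever signal you pass to --signal when send_user_signal is enabled; only the signal registered in signal.signal() changes.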

