Hi,

As their example was limited to "allgpus", I posted my take on this on the NVIDIA developer blog. It is basically the same, but it looks up the group ID from the dcgmi group JSON output using jp instead of using a file.

https://developer.nvidia.com/blog/job-statistics-nvidia-data-center-gpu-mana...

prolog
group=$(sudo -u $SLURM_JOB_USER dcgmi group -c j$SLURM_JOB_ID)
if [ $? -eq 0 ]; then
    groupid=$(echo $group | awk '{print $10}')
    sudo -u $SLURM_JOB_USER dcgmi group --group $groupid --add $SLURM_JOB_GPUS
    sudo -u $SLURM_JOB_USER dcgmi stats --group $groupid --enable
    sudo -u $SLURM_JOB_USER dcgmi stats --group $groupid --jstart $SLURM_JOBID
fi
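The awk '{print $10}' step in the prolog depends on the exact word count of the message dcgmi prints when it creates a group. A slightly more defensive variant, sketched here as a hypothetical helper (parse_group_id is not part of DCGM or Slurm), extracts the number following "group ID of" and prints nothing if the message does not match. It assumes the message has the form `Successfully created group "jNNN" with a group ID of N`, which is the same assumption the awk field index relies on.

```shell
#!/bin/sh
# Hypothetical helper, not part of DCGM: extract the numeric group ID from the
# message printed by "dcgmi group -c NAME".
# Assumption: the message contains "... with a group ID of N".
parse_group_id() {
    # -n: print nothing by default; the trailing p prints the line only if
    # the substitution matched, so unexpected output yields an empty string.
    printf '%s\n' "$1" | sed -n 's/.*group ID of \([0-9][0-9]*\).*/\1/p'
}

# Usage inside the prolog (sketch):
#   groupid=$(parse_group_id "$group")
#   [ -n "$groupid" ] || exit 0   # give up quietly if the ID could not be parsed
```

With this, an empty result can be treated as a failure before calling `dcgmi group --add`, instead of passing an arbitrary tenth word on to dcgmi.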
epilog
OUTPUTDIR=/tmp/
sudo -u $SLURM_JOB_USER dcgmi stats --jstop $SLURM_JOBID
sudo -u $SLURM_JOB_USER dcgmi stats --verbose --job $SLURM_JOBID | sudo -u $SLURM_JOB_USER tee $OUTPUTDIR/dcgm-gpu-stats-$HOSTNAME-$SLURM_JOBID.out
groupid=$(sudo -u $SLURM_JOB_USER dcgmi group -l --json | jp "body.Groups.children.[*][0][?children.\"Group Name\".value=='j$SLURM_JOBID'].children.\"Group ID\".value | [0] " | sed s/\"//g)
sudo -u $SLURM_JOB_USER dcgmi group --delete $groupid
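If jp finds no group named j$SLURM_JOBID (for example because the prolog failed), the query should return null, and after the sed the epilog would end up running `dcgmi group --delete null`. A minimal guard could look like the sketch below; it assumes DCGM group IDs are plain non-negative integers, and is_valid_group_id is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical guard: accept only a purely numeric group ID, so that an empty
# string or jp's "null" (no matching group found) is never passed on to
# "dcgmi group --delete".
# Assumption: DCGM group IDs are non-negative integers.
is_valid_group_id() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;  # empty, or contains a non-digit character
        *) return 0 ;;
    esac
}

# Usage inside the epilog (sketch):
#   if is_valid_group_id "$groupid"; then
#       sudo -u $SLURM_JOB_USER dcgmi group --delete $groupid
#   fi
```

Using a `case` pattern keeps the check POSIX sh compatible, so it works whether the epilog runs under bash or a plain /bin/sh.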
Best regards
--
Markus Kötter, +49 681 870832434
30159 Hannover, Lange Laube 6
Helmholtz Center for Information Security