[slurm-users] GPU / cgroup challenges
R. Paul Wiegand
rpwiegand at gmail.com
Tue May 1 18:15:48 MDT 2018
Yes, I am sure they are all the same. Typically, I just run scontrol reconfig;
however, I have also tried restarting all of the daemons.
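
One way to double-check that the files really are identical is to compare
checksums of each node's copies. Below is a minimal sketch, not a tested
procedure. It assumes the copies have already been gathered into one local
directory per node (for example with scp or a parallel shell); that layout
and the helper names digest and find_mismatches are hypothetical. The file
names are the ones mentioned in this thread.

```python
import hashlib
from pathlib import Path

# File names taken from this thread; adjust to match the site's setup.
CONFIG_FILES = ["slurm.conf", "cgroup.conf", "cgroup_allowed_devices_file.conf"]


def digest(path):
    """Return the SHA-256 hex digest of a file, or None if it is missing."""
    p = Path(path)
    if not p.is_file():
        return None
    return hashlib.sha256(p.read_bytes()).hexdigest()


def find_mismatches(node_dirs, names=CONFIG_FILES):
    """Return {file name: {node: digest}} for files on which nodes disagree.

    node_dirs maps a node name to the directory holding that node's copies
    (an assumed layout; the gathering step is not shown here).
    A file that is missing on only some nodes also counts as a mismatch.
    """
    mismatches = {}
    for name in names:
        digests = {node: digest(Path(d) / name) for node, d in node_dirs.items()}
        if len(set(digests.values())) > 1:
            mismatches[name] = digests
    return mismatches
```

An empty result means every listed file hashed the same on every node. A file
that is absent everywhere is not reported, so that case needs a separate look.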
We are moving to CentOS 7.4 in a few weeks during our downtime. A chain of
version constraints (QDR -> OFED version -> Lustre client version) delayed
our upgrade.
Should I just wait and test after the upgrade?
On Tue, May 1, 2018, 19:56 Christopher Samuel <chris at csamuel.org> wrote:
> On 02/05/18 09:31, R. Paul Wiegand wrote:
>
> > Slurm 17.11.0 on CentOS 7.1
>
> That's quite old on both fronts (RHEL 7.1 is from 2015). We started on
> that same Slurm release but didn't do the GPU cgroup stuff until a later
> version (17.11.3 on RHEL 7.4).
>
> I don't see anything in the NEWS file about relevant cgroup changes
> though (there is a cgroup affinity fix but that's unrelated).
>
> You do have identical slurm.conf, cgroup.conf,
> cgroup_allowed_devices_file.conf etc on all the compute nodes too?
> Slurmd and slurmctld have both been restarted since they were
> configured?
>
> All the best,
> Chris
> --
> Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC
>
>