[slurm-users] I just had a "conversation" with ChatGPT about working DMTCP, OpenMPI and SLURM. Here are the results
Christopher Samuel
chris at csamuel.org
Sat Feb 18 20:38:12 UTC 2023

On 2/10/23 11:06 am, Analabha Roy wrote:
> I'm having some complex issues coordinating OpenMPI, SLURM, and DMTCP in 
> my cluster.
If you're looking to try checkpointing MPI applications, you may want to
experiment with the MANA ("MPI-Agnostic, Network-Agnostic MPI") plugin 
for DMTCP here: https://github.com/mpickpt/mana
We (NERSC) are collaborating with the developers and it is installed on 
Cori (our older Cray system) for people to experiment with. The 
documentation for it may be useful to others who'd like to try it out;
it's got a nice description of how it works, too, which even I, as a
non-programmer, can understand:
https://docs.nersc.gov/development/checkpoint-restart/mana/
Pay special attention to the caveats in our docs though!
I've not used it myself, though I'm peripherally involved in giving advice
on system-related issues.
All the best,
Chris
-- 
Chris Samuel  :  http://www.csamuel.org/  :  Berkeley, CA, USA