[slurm-users] [External] Re: Status of BLCR?

George Wm Turner turnerg at iu.edu
Sun Oct 6 13:23:36 UTC 2019


I stumbled across CRIU (Checkpoint/Restore In Userspace) <https://criu.org/Main_Page> a couple of weeks ago.  I have not used it yet; it's on my to-do list. They claim that it’s packaged with most distros; I checked RHEL/CentOS and it was there. Be careful of mismatched package/kernel versions; that is a good reason to go with the version included in your distro.  BLCR was last updated in January 2013; back in the day, it worked well enough for simpler apps; for complicated MPI apps, less so.

   - geo



> On Oct 4, 2019, at 11:17 PM, Renfro, Michael <Renfro at tntech.edu> wrote:
> 
> DMTCP might be an option? Pretty sure there are RPMs for it in RHEL/CentOS 7. Don’t recall it being any trouble to install.
> 
> http://dmtcp.sourceforge.net/
> 
> On Oct 4, 2019, at 9:47 PM, Eliot Moss <moss at cs.umass.edu> wrote:
> 
>> Dear slurm users --
>> 
>> I'm new to slurm (somewhat experienced with Grid Engine, though that's
>> not relevant to this post).  I have access to two slurm based clusters,
>> and have an application that (a) can be _very_ long running (more than
>> 8 weeks for one execution, though the compute and I/O demands of one
>> such job are not huge by modern standards) and that (b) is not at all
>> practical to convert to do its own checkpoints.  (I am running traces
>> from the valgrind program of every memory reference and branch made
>> when running individual SPEC benchmarks; this is then piped to 8
>> downstream analyzers, mostly Java programs.)
>> 
>> From what I have read, BLCR would meet my needs for checkpointing,
>> but the admins of both clusters are reluctant to pursue BLCR support.
>> I myself am wondering whether it is still working, etc., and what it
>> means that built-in support has been removed, etc.  Can someone offer
>> a brief explanation of the status and recent history of BLCR w.r.t.
>> slurm?
>> 
>> Many thanks!   Eliot Moss, UMass Amherst Computer Science
>> 
