[slurm-users] [External] Re: Status of BLCR?
George Wm Turner
turnerg at iu.edu
Sun Oct 6 13:23:36 UTC 2019
I stumbled across CRIU (Checkpoint/Restore In Userspace), https://criu.org/Main_Page, a couple of weeks ago. I have not used it yet; it's on my to-do list. They claim that it's packaged with most distros; I checked RHEL/CentOS and it was there. Be careful with package/kernel versions; that is a good reason to go with the version included in your distro. BLCR was last updated in January 2013. Back in the day, it worked well enough for simpler apps, but less so for complicated MPI apps.
- geo
> On Oct 4, 2019, at 11:17 PM, Renfro, Michael <Renfro at tntech.edu> wrote:
>
> DMTCP might be an option? Pretty sure there are RPMs for it in RHEL/CentOS 7. Don’t recall it being any trouble to install.
>
> http://dmtcp.sourceforge.net/
>
> On Oct 4, 2019, at 9:47 PM, Eliot Moss <moss at cs.umass.edu> wrote:
>
>> Dear slurm users --
>>
>> I'm new to slurm (somewhat experienced with Grid Engine, though that's
>> not relevant to this post). I have access to two slurm-based clusters,
>> and have an application that (a) can be _very_ long-running (more than
>> 8 weeks for one execution, though the compute and I/O demands of one
>> such job are not huge by modern standards) and that (b) is not at all
>> practical to convert to do its own checkpoints. (I am using valgrind
>> to trace every memory reference and branch made when running
>> individual SPEC benchmarks; this trace is then piped to 8 downstream
>> analyzers, mostly Java programs.)
>>
>> From what I have read, BLCR would meet my needs for checkpointing,
>> but the admins of both clusters are reluctant to pursue BLCR support.
>> I myself am wondering whether it still works, and what it means that
>> built-in support has been removed. Can someone offer a brief
>> explanation of the status and recent history of BLCR with respect to
>> slurm?
>>
>> Many thanks! Eliot Moss, UMass Amherst Computer Science
>>