Message boards : Number crunching : Rosetta Checkpointing
Author | Message |
---|---|
Robert Gammon Joined: 9 Nov 07 Posts: 14 Credit: 969,848 RAC: 0 |
I have an XP laptop that runs BOINC. It's old and somewhat unreliable, but I cannot afford to replace it now. The power cord is frayed and MUST stay in a PARTICULAR position for the machine to stay powered up. This means that if the machine gets bumped or moved, we get a sudden, unexpected power failure. This is no different from someone experiencing a power failure due to lightning, just LOTS more frequent. In addition, the laptop has to be moved to get to a location with internet access (wireless access only). On the intentional power-down events, I have done an orderly shutdown of BOINC prior to shutting down. In both cases (orderly shutdown and unexpected power loss), Rosetta REPEATS the WU from 0.0%, almost regardless of how far along the WU is. It was suggested by the moderator that suspending the project before intentional power-downs would go a long way toward solving the problem. Well, I just suspended the project, told BOINC to shut down, then powered down the computer using XP's Shutdown command. I moved the computer to get wireless internet access, checked a few things, and did an XP Shutdown again. When I powered back up, restarted BOINC, and resumed Rosetta, the Rosetta 5.98 WU that was 92+% complete had reset to 0.0%. This is very frustrating!!! |
Paul Joined: 29 Oct 05 Posts: 193 Credit: 66,424,820 RAC: 9,690 |
Thank you for crunching R@H. This issue sounds frustrating and, unfortunately, there is likely little that can be done about it. Some work units save checkpoints more often than others. In many cases, the checkpoint is at the end of a model. If you are 99% complete with a model and your machine restarts, you will restart the work unit at 0%. Maybe someone else has better ideas. Sorry, and thanks for crunching R@H. Thx! Paul |
Adam Gajdacs (Mr. Fusion) Joined: 26 Nov 05 Posts: 13 Credit: 2,819,688 RAC: 2,580 |
Suspending BOINC/projects only temporarily stops them from running (and thus using CPU time), but nothing else. What would probably work, though, is: - set the project/BOINC preferences to leave applications in memory when preempted - do not exit BOINC when you want to turn the computer off - instead of shutting down your system, use hibernate (which is usually the preferred method for laptops anyway). Hibernate saves a snapshot of the system memory to disk (it needs at least as much free hard disk space as you have physical memory), including the state of all processes, so that they are restored to the exact same state the next time you power up the system. That means workunits, checkpointed or not, should continue processing from the point where they were before you hibernated the system. |
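
Paul's point in this thread, that a checkpoint is often written only at the end of a model, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Rosetta's or BOINC's actual code: `run_work_unit`, the JSON checkpoint file, and the model and step counts are all hypothetical names and numbers chosen for the example.

```python
import json
import os
import tempfile


def run_work_unit(checkpoint_path, n_models, steps_per_model, interrupt_after=None):
    """Run a toy work unit that checkpoints only when a whole model finishes.

    Hypothetical sketch: returns the number of steps executed in this session.
    `interrupt_after` simulates a power loss after that many steps.
    """
    # Resume from the last completed model if a checkpoint file exists.
    models_done = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            models_done = json.load(f)["models_done"]

    steps_run = 0
    for model in range(models_done, n_models):
        for _ in range(steps_per_model):
            if interrupt_after is not None and steps_run >= interrupt_after:
                # Simulated power loss: progress inside this model is lost.
                return steps_run
            steps_run += 1
        # The only place state is saved: the end of a model.
        with open(checkpoint_path, "w") as f:
            json.dump({"models_done": model + 1}, f)
    return steps_run


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "checkpoint.json")
        # Power fails 99 steps into the first 100-step model: nothing was saved.
        run_work_unit(path, n_models=3, steps_per_model=100, interrupt_after=99)
        print("checkpoint exists after interruption:", os.path.exists(path))
        # The restart has to redo everything from the beginning.
        print("steps needed after restart:", run_work_unit(path, 3, 100))
```

In this sketch, an interruption before the first model completes leaves no checkpoint at all, which matches the "back to 0.0%" behaviour described above. Hibernating sidesteps the issue because the process's memory is restored as-is, so no checkpoint is needed.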
©2024 University of Washington
https://www.bakerlab.org