Message boards : Number crunching : Why can't Rosetta checkpoint more often (compared to WCG)? +feedback
Author | Message |
---|---|
Nemesis Joined: 12 Mar 06 Posts: 149 Credit: 21,395 RAC: 0 |
Rosetta, 1st credit, invalidated after many hours. WCG, "finished", not updated on website, I quit distributed computing for a while (couple days). Then I come back to WCG to see if it updated.. and it did! Whoo, after many hours, it didn't reset, the timer or the checkpoint (at least not significantly)... You're singing my song! Maybe this will become my personal crusade - to get the 1% and Completion Time problems fixed. Right now, there has been no acknowledgement that the Rosetta programmers are working on it, or that they intend to work on it. BTW, there is an entire thread devoted to this topic. Nemesis n. A righteous infliction of retribution manifested by an appropriate agent. |
Nemesis Joined: 12 Mar 06 Posts: 149 Credit: 21,395 RAC: 0 |
I realize that.. Because Rosetta doesn't checkpoint until the end of a model, a stopped task has to start over from the last completed model. If it was still in its first model, it starts over from the beginning, and the clock starts over as well. I've never run WCG, but it sounds like it checkpoints and saves the crunching time when you stop it. That's totally up to the science app programmers and how they decide to do it. Nemesis n. A righteous infliction of retribution manifested by an appropriate agent. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
So I'd like to hear from a Rosetta dev why they can't resume work (save often) in the middle of a crunching... See Bin Qian's comments from when checkpointing was originally added to Rosetta almost a year ago. As mentioned in that thread, the new version of BOINC also has new features that try to preempt one project to begin another only at a checkpoint. Rosetta Moderator: Mod.Sense |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Well, Rosetta doesn't count in 1 second increments. It's not as if Rosetta made a calculation every second that showed the time remaining increasing. It's just BOINC showing you *its* guess based on the increasing CPU time. So, 1 second of time passing increases the CPU time used by ~1 second, and BOINC takes the total CPU time used so far, along with the % complete and its history of how long it took you to complete tasks in the past, and shows you the result as the estimated time to completion. You can see this better if you let a task run longer. Later in the run, when % completed is over 50%, the runtime still increases one second at a time, but the estimated time to completion doesn't change every second. I mention this simply to point out that the numbers you are observing are a level removed from the numbers Rosetta's programs are working with. So it further complicates reaching the goal of a smoothly declining timeline. I don't know all the details about how the numbers get revised and how they are communicated back to the BOINC Manager. Nor am I the one that can improve how it works. I'm just trying to explain the parts that I can. The need for, and benefits of, improvement are pretty clear. So, I'm confident we will see some improvements in future releases. Rosetta Moderator: Mod.Sense |
Angus Joined: 17 Sep 05 Posts: 412 Credit: 321,053 RAC: 0 |
I don't know all the details about how the numbers get revised and how they are communicated back to the BOINC Manager. Nor am I the one that can improve how it works. I'm just trying to explain the parts that I can. The need for, and benefits of improvement are pretty clear. So, I'm confident we will see some improvements in future releases. And still, the developers and real project people are strangely silent on this. No comments or acknowledgements in the 1% thread. Proudly Banned from Predictator@Home and now Cosmology@home as well. Added SETI to the list today. Temporary ban only - so need to work harder :) "You can't fix stupid" (Ron White) |
Rhiju Volunteer moderator Joined: 8 Jan 06 Posts: 223 Credit: 3,546 RAC: 0 |
Hi John, Mod.Sense, and others: Thanks for bringing this up. More checkpointing and better time-to-completion feedback were big causes of controversy last year (and again now!) -- we did put checkpointing into larger jobs, but never really addressed the problem of accurately estimating time to completion. We've been too busy getting rid of early bugs and putting new science modes into Rosetta! Things have settled down, though. The development team will discuss both issues early next week. Thanks, Rhiju http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=11332 |
[STS]LoB Joined: 18 Mar 07 Posts: 4 Credit: 678,612 RAC: 0 |
Hey Rhiju, has Version 5.59 increased the checkpointing frequency? I ask because of the heavily increased rate of updates to the progress display (xx%)... |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Version 5.59 tackled the % complete. Additional checkpoints will be added in the coming weeks. See Rhiju's post on the Ralph boards. Rosetta Moderator: Mod.Sense |
[STS]LoB Joined: 18 Mar 07 Posts: 4 Credit: 678,612 RAC: 0 |
Thanks! |
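
To make Mod.Sense's point about the time estimate concrete, here is a rough Python sketch of how a progress-based projection behaves when the reported % complete sits still while CPU time keeps ticking. This is not BOINC's actual code: the function name `projected_remaining` and its parameters are made up for illustration, and it deliberately leaves out the history of past task runtimes that the post says BOINC also uses, falling back to a caller-supplied guess only when no progress has been reported yet.

```python
def projected_remaining(cpu_time_s, fraction_done, fallback_s=None):
    """Project the remaining CPU seconds from the progress reported so far.

    Hypothetical helper, not a BOINC API. It assumes work proceeds at a
    constant rate, so total time ~= cpu_time_s / fraction_done.
    """
    if not 0.0 <= fraction_done <= 1.0:
        raise ValueError("fraction_done must be between 0 and 1")
    if fraction_done == 0.0:
        # No progress reported yet: nothing to project from, so return the
        # caller's guess (e.g. from past tasks), or None if there is none.
        return fallback_s
    projected_total = cpu_time_s / fraction_done
    return projected_total - cpu_time_s


# A task that reports 1% and then stays there: every extra second of CPU
# time pushes the projected remaining time up instead of down.
stuck = [projected_remaining(t, 0.01) for t in (60, 61, 62)]

# A task that is half done after an hour projects one more hour.
halfway = projected_remaining(3600, 0.5)
```

With the fraction pinned at 1%, each second of CPU time adds about 99 seconds to the projection, which is the "time remaining increasing" effect discussed above. More frequent progress updates from the science app would give an estimate like this something better to work from.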
©2025 University of Washington
https://www.bakerlab.org