Message boards : Number crunching : Check pointing needs fixxed
Author | Message |
---|---|
Ron Peterson Joined: 6 Oct 05 Posts: 23 Credit: 4,268,694 RAC: 0 |
I don't run applications non-stop on my home computer. Yes, I pause and restart between different projects. But every time I get above about 20% with 10 to 15 hours and pause, then restart, I lose all work, going back to 0%. I want the old checkpointing system back. Ron edit - It's only been like this for a few weeks. edit2 - And yes, I leave the application in memory. |
anders n Joined: 19 Sep 05 Posts: 403 Credit: 537,991 RAC: 0 |
Hi Ron, if you check your WU when it has restarted, does the CPU time reset to 0? And if you check the graphics, what model does it start at? Anders n |
Ron Peterson Joined: 6 Oct 05 Posts: 23 Credit: 4,268,694 RAC: 0 |
No, just the progress goes to 0%. And no, I've not checked the graphics. |
anders n Joined: 19 Sep 05 Posts: 403 Credit: 537,991 RAC: 0 |
You don't lose more work now than you did before; it's just the % complete that is off when you restart BOINC. Hope they find a fix for it soon. Happy crunching, Anders n |
Ron Peterson Joined: 6 Oct 05 Posts: 23 Credit: 4,268,694 RAC: 0 |
*grumble* After getting to about 12% it restarted, yet again. Graphics show it in the "ab initio + relax" start-up stage. Is it worth running Rosetta at all till this is fixed? |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Ron, I'd say your results speak for themselves. The only ones that show any sign of error are the ones that were aborted by the user. Anders n is correct: the only thing that has really changed here is that the % complete is updated more frequently. It used to update only at the end of each model. Now it updates every 5 seconds. With that change comes a new quirk: after a full restart of a task (i.e. it was removed from memory), the % complete starts at zero, even though many hours of work may already be retained in the task. These tasks will still finish at the normal time. Only the indication of how far along the task is, is incorrect. ...and yes, I've said many, many times in threads throughout the boards that checkpointing is the next big thing (from a user's point of view) that we will see in the next Rosetta release. Rosetta Moderator: Mod.Sense |
Ron Peterson Joined: 6 Oct 05 Posts: 23 Credit: 4,268,694 RAC: 0 |
"Ron, I'd say your results speak for themselves. The only ones that show any sign of error are the ones that were aborted by the user." I'll see what this run does. 15 hours + 17% done after two restarts. Yes, I aborted earlier ones because it looked like they had lost progress. |
Purple Rabbit Joined: 24 Sep 05 Posts: 28 Credit: 4,296,740 RAC: 3,006 |
I think one of the unintended consequences of the new user-friendly progress bar is that you don't know whether a model has completed. I used to wait for 10%, 20%, etc. before doing anything drastic with BOINC. Now these points don't necessarily seem to correspond to the end of a model. My anecdotal and non-scientific observations suggest that I may have thrown away some work because the progress is only updated to reality at the end of each model. This isn't a complaint. It's an observation. It may also be an example of: "You can't please everyone no matter what you do" :-) |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
It looks like there is a bug in the % complete logic. It should jump back to the last checkpointed value after a restart. We'll fix this for the next release. You can't really keep track of the models via the BOINC Manager, but the Rosetta graphics show the model number. The % progress and time to complete are estimates because of the variable run times per model and the per-model time resolution used to avoid going over the run time preference. |
Purple Rabbit Joined: 24 Sep 05 Posts: 28 Credit: 4,296,740 RAC: 3,006 |
"You can't really keep track of the models via the boinc manager but the rosetta graphics has the model number." Well yes, but most of my computers are remote Linux machines...sigh. It's a PITA, but certainly not a show stopper. The BOINC progress report is all I have, so any efforts to make it more accurate with respect to model completion will be appreciated (but you knew that already!). I only mention this as constructive criticism so you know the problem I have, and in the hope that the next version will be better. |
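
The behaviour described in the thread (work is retained across a restart, but the displayed % complete starts again at zero instead of jumping back to the last checkpointed value) can be illustrated with a minimal sketch. This is not Rosetta or BOINC code: `TaskState`, `save_checkpoint`, `load_checkpoint`, `reported_progress_after_restart`, and the JSON checkpoint file are all hypothetical names and choices made up for illustration, assuming only that the progress fraction is stored alongside the rest of the checkpointed state.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class TaskState:
    """Hypothetical checkpointed state; not Rosetta's real data structure."""
    models_done: int = 0        # models completed so far
    cpu_seconds: float = 0.0    # CPU time accumulated so far
    fraction_done: float = 0.0  # progress shown to the user, 0.0 to 1.0


def save_checkpoint(state: TaskState, path: str) -> None:
    # Write to a temporary file first, then rename, so a crash mid-write
    # cannot leave a half-written checkpoint behind.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump(asdict(state), f)
    os.replace(tmp_path, path)


def load_checkpoint(path: str) -> TaskState:
    # No checkpoint yet means the task really does start from scratch.
    if not os.path.exists(path):
        return TaskState()
    with open(path) as f:
        return TaskState(**json.load(f))


def reported_progress_after_restart(path: str, restore: bool = True) -> float:
    """Progress value displayed right after the task is reloaded.

    restore=False mimics the quirk from the thread: the work is kept,
    but the display starts at 0.0.
    restore=True mimics the intended behaviour: the display jumps back
    to the last checkpointed value.
    """
    state = load_checkpoint(path)
    return state.fraction_done if restore else 0.0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        ckpt = os.path.join(workdir, "checkpoint.json")
        save_checkpoint(TaskState(models_done=2, cpu_seconds=54000.0,
                                  fraction_done=0.17), ckpt)
        print("quirk:   ", reported_progress_after_restart(ckpt, restore=False))
        print("intended:", reported_progress_after_restart(ckpt, restore=True))
```

In both cases `load_checkpoint` returns the same completed models and CPU time; only the number shown to the user differs, which matches the point made above that no additional work is lost.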
©2024 University of Washington
https://www.bakerlab.org