Resuming a task after a computer shutdown

IVAN DENIA

Joined: 18 Feb 07
Posts: 2
Credit: 826,792
RAC: 0
Message 52338 - Posted: 9 Apr 2008, 7:24:32 UTC

Two of my computers are not on 24 hours a day, only while I am at work. The next day, those computers rarely resume the task where I left it. Today one was at 41% of the task and the other at 19%, and when the computers were turned on after yesterday's shutdown, they both started from 0%. (This happens very often.) Is there any way to avoid this, or is it a program error?

Thanks
Paul D. Buck

Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 52340 - Posted: 9 Apr 2008, 7:42:17 UTC

The problem is one of checkpointing.

The two models, for whatever reason, had not yet written a checkpoint, so when they resume they start from the beginning.

There is a setting that ALLOWS checkpointing, and if you have it set high, that is potentially the problem. Note that this setting does not force checkpointing; it only allows it.

So if it is set to 10 hours between checkpoints, the application will not write one until at least that much time has passed.

However, even if it is set lower, that does not mean the program WILL checkpoint, only that it is allowed to when it is ready.
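
To illustrate the "allow, not force" behaviour described here, a minimal sketch follows. It is not BOINC or Rosetta code; the function name `may_checkpoint` and its parameters are hypothetical, chosen only to show that a checkpoint needs both the minimum interval to have elapsed and the application to be ready.

```python
def may_checkpoint(now, last_checkpoint, min_interval, app_ready):
    """Return True only when both conditions hold: the user's minimum
    interval has elapsed AND the application has reached a point where
    it can save its state. (Hypothetical helper, times in seconds.)"""
    interval_elapsed = (now - last_checkpoint) >= min_interval
    return interval_elapsed and app_ready


three_hours = 3 * 3600

# 10-hour setting: a ready application is still not allowed to write.
print(may_checkpoint(three_hours, 0, 10 * 3600, app_ready=True))

# 60-second setting, but the application is not ready: nothing is written.
print(may_checkpoint(three_hours, 0, 60, app_ready=False))

# Both conditions hold: a checkpoint can be written.
print(may_checkpoint(three_hours, 0, 60, app_ready=True))
```

Lowering the setting only removes the first obstacle; the second depends on the application itself.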
IVAN DENIA

Joined: 18 Feb 07
Posts: 2
Credit: 826,792
RAC: 0
Message 52341 - Posted: 9 Apr 2008, 10:23:50 UTC

Thank you, but can you tell me where I can find this setting so I can modify it?
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 52342 - Posted: 9 Apr 2008, 12:24:25 UTC
Last modified: 9 Apr 2008, 12:25:14 UTC

Paul is referring to the setting for how often to write to disk. It is in your computational preferences on the website (see the "[Participants]" link at the top of this message board), or you can change it for a specific host in the advanced view: advanced tab, then select preferences..., then the disk and memory tab.

The other issue is just the percentage. If you run with a 1 hour runtime preference on Rosetta, 10% is only 6 minutes of work lost. The default is 3 hours, so 20% is still only 36 minutes. The application does (for almost all types of Rosetta work) checkpoint often enough to preserve almost all of the work done. It is a balancing act between spending time checkpointing and wearing on the disk drive on one side, and getting more work done on the other.
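
The arithmetic behind those figures can be sketched as below. The helper name `minutes_at_progress` is hypothetical, and applying the 3 hour default to the 41% and 19% tasks from the original question is an assumption, since the poster's actual runtime preference is not stated.

```python
def minutes_at_progress(runtime_hours, percent_done):
    """Minutes of computing represented by a given progress percentage,
    assuming progress tracks the runtime preference linearly."""
    return runtime_hours * 60 * percent_done / 100


# The two cases from the post above.
print(minutes_at_progress(1, 10))   # 1 hour preference, 10% done
print(minutes_at_progress(3, 20))   # 3 hour default, 20% done

# Assumption: the tasks in the original question used the 3 hour default.
for pct in (41, 19):
    print(f"{pct}% of 3 hours is about {minutes_at_progress(3, pct):.0f} minutes")
```

This is the most that could be lost if no checkpoint had been written at all; with regular checkpoints the real loss is smaller.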
Rosetta Moderator: Mod.Sense
Paul D. Buck

Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 52343 - Posted: 9 Apr 2008, 15:49:25 UTC

Ivan,

I was OK enough to write the answer, but not to remember or figure out where the setting was.

My apologies.

Mod.Sense

Thank you for filling in my lapse ...
Heidi1

Joined: 11 Aug 07
Posts: 49
Credit: 1,786,248
RAC: 0
Message 52350 - Posted: 9 Apr 2008, 21:43:17 UTC

Check out this thread: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=4037

I was having checkpointing issues too, and the debug file mentioned there has helped me. It won't actually fix the problem, but you can monitor it during the day and when you shut down, so you can either wait a little longer until the next checkpoint or at least know how much work you're losing. The checkpointing then won't take you by surprise. Another possible fix is also mentioned in the thread, which may interest you.

BTW, I found my checkpoints were occurring every 45 minutes or so.
Feet1st

Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 52353 - Posted: 9 Apr 2008, 22:38:32 UTC

I did the same as Heidi, only the tasks my machine was working on were checkpointing every 3 minutes or so. Today I see one is checkpointing about every half hour and the other about every 50 minutes, though one model must have taken longer, because one checkpoint took almost 2 hours. The second has 13 models done in 20 hours; the first has 29 models done after only 12.5 hours. Either way, these two apparently checkpoint only at the end of each model. Some types of work can checkpoint several times within each model.
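
Since these two tasks appear to checkpoint only at the end of a model, the average time per model gives a rough idea of the average gap between checkpoints. A small sketch using the figures quoted in this post (the helper name `avg_minutes_per_model` is hypothetical):

```python
def avg_minutes_per_model(hours_elapsed, models_done):
    """Average minutes spent per completed model."""
    return hours_elapsed * 60 / models_done


for label, hours, models in [
    ("29 models in 12.5 hours", 12.5, 29),
    ("13 models in 20 hours", 20, 13),
]:
    avg = avg_minutes_per_model(hours, models)
    print(f"{label}: about {avg:.0f} minutes per model")
```

The averages come out near 26 and 92 minutes. The second is well above the roughly 50 minutes seen between recent checkpoints, which fits the observation that some models took much longer than others.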
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
