Message boards : Number crunching : WU run times out of whack
Author | Message |
---|---|
ramostol Joined: 6 Feb 07 Posts: 64 Credit: 584,052 RAC: 0 |
Well, my problem is this: as I understand it, Rosetta currently saves a checkpoint only after finishing a model (decoy). I have had models running for 5 h 30 min, and since I move around a bit I have to avoid such sessions: I cannot handle such models if they choose inopportune moments to appear. Currently (5.59) I receive models needing less time to complete, but even now I have seen models exceeding 2 h by a not insignificant amount. If no intermediate saving is performed I am in trouble, either wasting too much computing time or, worse, getting stuck with a model I cannot complete within a reasonable period. So I seem to be able to limit my problems to the length of the first model of a WU (though I have noted exceptions), and most of the time 5.59 manages quite nicely. I am aware that I do not lose 8 h of work when shutting down the computer, but I question my effectiveness even when losing 1 h of computing. I have also observed the quirks of the statistics, so I no longer pay too much attention to the completion reports. Maybe WUs have generally become shorter, but I cannot spend too much time experimenting in this matter. I imagine I can cope until a new version appears and I can find out how often you plan to save a WU in progress. And then it all depends on whether I find my computing to be of sufficient use or not. -- R. A. Mostol |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
ramostol, you are correct that there are some longer running models, and also that some long running models do not (yet) save checkpoints in between models. That's why they are working on the checkpointing. At present (Rosetta v5.59), some types of work units can take checkpoints in mid-model, and others cannot. My point was just that the amount of work lost or preserved, for a given type of work unit, is the same, regardless of the runtime preference. Rosetta Moderator: Mod.Sense |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
The checkpointing within a model will be for the pose and jumping jobs. The ab relax jobs already have checkpointing within a model. We will be able to set the checkpointing interval and will probably start at 5 minutes or so and see how it goes. |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
See section on checkpoint here. Call the BOINC API boinc_time_to_checkpoint() when a checkpoint is possible and it will tell you if you should. I presume the returned value is based upon the user's General Preference (see <disk_interval> property in global_prefs.xml) for how frequently to write to disk. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
Feet1st, I'll add the api call and see how it goes with the default interval (60 sec). It may be overkill so I might increase the minimum interval. |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
David E K wrote: "Feet1st, I'll add the api call and see how it goes with the default interval (60 sec). It may be overkill so I might increase the minimum interval." Perfect! Yes, BOINC only allows me to say the MOST I'd like it to use my disk... not the AVERAGE. But if a user had specified 300 seconds, or 900 seconds (15 min.) as I think I've got mine set, then you don't want to be writing any more often than that. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
the checkpoint files are small |
ramostol Joined: 6 Feb 07 Posts: 64 Credit: 584,052 RAC: 0 |
The information in the recent messages in this thread is just what we needed to calm down and let the developers do their best. In spite of some remaining irregularities I feel that Rosetta 5.59 is the most stable release to date for the Mac platform, and secure and predictable checkpointing will make most of these irregularities unimportant in ensuring that Rosetta functions in a satisfying way. Thanks. -- R. A. Mostol |
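
The checkpointing pattern discussed in this thread (ask the client whether a checkpoint is wanted each time one is possible, and never write more often than the user's disk preference) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the real BOINC API is C/C++, and `SimulatedClient`, `run_model`, `work_lost` and `app_min_interval` are hypothetical names invented here. The two client methods merely play the roles of `boinc_time_to_checkpoint()` and `boinc_checkpoint_completed()`, and the rule "enough time since the last checkpoint" follows Feet1st's presumption above rather than a confirmed description of the client.

```python
from dataclasses import dataclass


@dataclass
class SimulatedClient:
    """Hypothetical stand-in for the client side of checkpointing.

    Assumed rule: a checkpoint is wanted once at least `disk_interval`
    seconds have passed since the last completed checkpoint.
    """
    disk_interval: float          # user's "write to disk at most every N seconds"
    last_checkpoint: float = 0.0  # simulated time of the last checkpoint

    def time_to_checkpoint(self, now):
        # Plays the role of boinc_time_to_checkpoint() in the real C API.
        return now - self.last_checkpoint >= self.disk_interval

    def checkpoint_completed(self, now):
        # Plays the role of boinc_checkpoint_completed().
        self.last_checkpoint = now


def run_model(client, total_steps, seconds_per_step, app_min_interval=0.0):
    """Run one simulated model and return the times at which state was saved."""
    saved_at = []
    for step in range(1, total_steps + 1):
        now = step * seconds_per_step  # simulated clock, no real sleeping
        # A checkpoint is possible here; check the app's own minimum
        # (hypothetical) and then ask the client whether one is wanted.
        app_ok = now - client.last_checkpoint >= app_min_interval
        if app_ok and client.time_to_checkpoint(now):
            saved_at.append(now)  # a real app would write its state file here
            client.checkpoint_completed(now)
    return saved_at


def work_lost(saved_at, shutdown_time):
    """Seconds of computing lost if the machine is shut down at shutdown_time."""
    earlier = [t for t in saved_at if t <= shutdown_time]
    return shutdown_time - (earlier[-1] if earlier else 0)


if __name__ == "__main__":
    # 20 steps of 60 s each, with a 300 s disk preference.
    saves = run_model(SimulatedClient(disk_interval=300), 20, 60)
    print(saves)                   # [300, 600, 900, 1200]
    print(work_lost(saves, 1000))  # 100
    print(work_lost([], 1000))     # 1000 (no mid-model checkpoints at all)
```

With a 900-second preference the same run saves only once, at 900 s, which matches the point that the application should not write more often than the user asked for. The last line shows ramostol's situation: without checkpoints inside a model, a shutdown loses everything computed since the model started.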
©2024 University of Washington
https://www.bakerlab.org