Message boards : Number crunching : Report long-running models here
Author | Message |
---|---|
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 28 |
This wu https://boinc.bakerlab.org/rosetta/workunit.php?wuid=225141500 strikes me as odd. I have the run time set to the 6 hour standard, but this one seems to be in a loop. It has 10:03 to go for completion, drops to 10:02 then quickly jumps back to 10:03. It has been doing that for at least the last 30 minutes, maybe more. Collectively it has run 08:17:46. I have suspended it pending advice. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
adrianxw, mini's watchdog kicks in after preferred runtime plus 4 hours. So you haven't quite made it there yet. You are looking at time remaining, which is simply computed from the % complete and the CPU time. The watchdog will look at CPU time used. Since the task is running long, Rosetta is slowing the rate at which it increases the % complete, and so it makes the resulting estimated completion time bounce around. But it is better than reaching 100% and not being done yet. Since your runtime is 6hrs, let it run for at least 10.25hrs before you worry about it. The watchdog should cleanly take care of it before that time. And if you are running the new 6.6 client, use the runtime shown in the Rosetta graphic, NOT the one shown in the BOINC manager. I've seen cases where they are significantly different. Rosetta Moderator: Mod.Sense |
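The arithmetic in the post above can be sketched roughly as follows. This is only an illustration: the function names and the simple linear extrapolation are assumptions made for the example, not code from BOINC or Rosetta. The 4-hour margin is the figure quoted in the post.

```python
WATCHDOG_MARGIN_HOURS = 4.0  # margin quoted in the post; not read from any real config


def estimated_remaining_hours(cpu_hours_used, fraction_done):
    """Assumed linear extrapolation: remaining = used * (1 - f) / f."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return cpu_hours_used * (1.0 - fraction_done) / fraction_done


def watchdog_cutoff_hours(preferred_runtime_hours):
    """CPU time after which the watchdog would end the task (hypothetical helper)."""
    return preferred_runtime_hours + WATCHDOG_MARGIN_HOURS


def watchdog_should_stop(cpu_hours_used, preferred_runtime_hours):
    """True once the CPU time used reaches the cutoff."""
    return cpu_hours_used >= watchdog_cutoff_hours(preferred_runtime_hours)
```

With a 6-hour preference the cutoff in this sketch is 10 CPU hours, so a task that has used about 8.3 hours is still below it. If the % complete advances more slowly while CPU time keeps growing, the extrapolated remaining time can stay flat or rise, which matches the bouncing estimate described above.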
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 28 |
Fair enough, it is running again, and doing the same thing with respect to the 10:03->10:02->10:03. BOINC is 6.6.20. BOINC Manager says it 08:29:30 now whilst the BOINC graphic, which I don't normally look at says 08:27:30. I notice the run time is on the parameters list now, (may have been for ages, it is not something I usually look at), and that the 6 hours comes from there. Is that a scientific optimum or a twitchy cruncher limit? (Edit spelling, gee, shakes head and mumbles...) (Edit again, job was still 10:03 when suddenly it went to "Ready to report" after 08:48:27) (Edit again, hmm and miserly credit as well, ah well) Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,281,662 RAC: 1,150 |
Fair enough, it is running again, and doing the same thing with respect to the 10:03->10:02->10:03. BOINC is 6.6.20. BOINC Manager says it 08:29:30 now whilst the BOINC graphic, which I don't normally look at says 08:27:30. From what I've seen, 6 hours is now the default setting for those who haven't set their workunit size to something else. It's better for handling typical workunits now than the previous default of 2 hours, and larger values put less of a load on the server. |
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 28 |
My question was more, "Is there scientific benefit from having longer run times?". If the team get more out of 2 x 12 hour units than 4 x 6 then I'd like to know. I don't know enough about their models to answer this myself. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Paul D. Buck Send message Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
According to the preferences the default is 3 hours if not selected. I have not selected a value, and as best as I can recall the run times are still about 3 hours ... |
adrianxw Send message Joined: 18 Sep 05 Posts: 653 Credit: 11,840,739 RAC: 28 |
Well, perhaps that is true for you Paul, but I just checked mine again and it is quite definitely 6. Wave upon wave of demented avengers march cheerfully out of obscurity into the dream. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
The benefit to the science is based more on how many hours of crunching per day you do, not on how many hours per task. But fewer hits to the servers per day from your machine leaves them free to service more participants with the same server hardware. The preference for runtime is set in the Rosetta-specific preferences on the website. Click on the "[ Participants ]" link above. Keep in mind that preferences exist for each venue you have set up. Rosetta Moderator: Mod.Sense |
Paul D. Buck Send message Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
Well, perhaps that is true for you Paul, but I just checked mine again and it is quite definitely 6.
from the settings. If you had changed it to 6, it will be 6 ... Unless we are accessing different web sites ... :) |
Nothing But Idle Time Send message Joined: 28 Sep 05 Posts: 209 Credit: 139,545 RAC: 0 |
Adrianxw: My question was more, "Is there scientific benefit from having longer run times?". If the team get more out of 2 x 12 hour units than 4 x 6 then I'd like to know. I don't know enough about their models to answer this myself. I'm curious about the topic that adrianxw raises. Bear with me 'cause I'm no linguist nor writer. Tasks begin with a seed, some starting point from which models are developed, no? Is this akin to a tree with roots and branches that grow from the seed? The longer the tree lives (task run time) the more roots and branches, no? So based on this it "seems" that a long run time is better, more output based on the one input seed. No two trees look alike. A different seed leads to different roots and branches and different outcome I guess? So if you run a unique seed for 24 hours you get one set of roots, branches, models. For 12 hours work presumably you get say half as many roots and branches. So if you run a task for 12 hours instead of 24 then how do you uncover/discover those roots and branches that would have been revealed if the task was allowed to run 24 hours? Also, after a 12-hour run you are assigned a new task of whatever kind with another unique starting seed. So, a participant running 4, 6-hour tasks would likely get 4 different kinds of "proteins to model" with 4 different seeds. While a single task of 24 hours length would produce results for one "protein" study. Anyway, what I'm trying to convey in a peculiar way is that I don't see how one can compare what is ostensibly achieved from a host running 4x6 tasks, or 1x24 task, or 8x3 tasks, or whatever. I don't see how you can say that it doesn't matter whether you run 1x24 or 4x6, because it seems to me that it does matter. The missing link for me to understand this is the whole concept of a "seed", where it comes from and how it relates from one task to another. 
That is, if I run a task with a given seed for 12 hours, is there another task sent out with a different seed that ostensibly reveals the information that would have been produced if the first task had run for 24 hours? Or does the project not care about that? I think people want to know the optimal run time that produces the optimum scientific output. You say it doesn't matter. What needs explaining is how the "seed" concept renders the run time as moot and irrelevant with regard to scientific achievement. Sorry, wish I could do better. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Idle, you have the basic idea. I believe the point you are missing is that there are literally trillions of trillions (of trillions...) of potential branches possible. And so the objective is to pursue some small subset of those. It doesn't matter to the science if the completed "trees" are small with 10 branches, or large with 30 branches. Every completed "branch" adds to the subset the scientists are seeking to review. When they set out to study a protein, they decide which of the approaches could best be applied to the study, and they have some round number of models they would like to complete (say 100,000, it varies depending on many factors). They don't define any specific 100,000 that must be reviewed. They are originated randomly in order to get a sampling of what's out there. And so tasks will be created and sent to clients until the desired 100,000 models are completed and returned. If some specific tasks are not returned (i.e. some seeds die), it doesn't harm the overall growth of the forest. It was an outcome that was anticipated from the start. To follow your analogy, if the goal was to produce a forest with 100,000 branches in 10 years, you would plant seeds in the appropriate number, and of appropriate tree variety to assure that the goal will be met with some reasonable margin of error for weather, germination rates, lightning strikes, and insect damage. Rosetta@home works much the same way. They have some idea how many models the average machine is going to produce, and how many WUs they have to create to get the desired number of results. Rosetta Moderator: Mod.Sense |
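The forest analogy above can be put into rough numbers. The sketch below is an assumption-laden illustration, not the project's actual scheduling logic: the function names, the per-task model counts, and the return rate are all hypothetical. Only the 100,000-model target comes from the post.

```python
import math


def tasks_to_issue(target_models, avg_models_per_task, return_rate):
    """Tasks to send so that, on average, enough models come back (hypothetical)."""
    if avg_models_per_task <= 0 or not 0.0 < return_rate <= 1.0:
        raise ValueError("need avg_models_per_task > 0 and 0 < return_rate <= 1")
    return math.ceil(target_models / (avg_models_per_task * return_rate))


def models_from_schedule(task_hours, n_tasks, hours_per_model):
    """Simplified model count: each task completes only whole models."""
    return n_tasks * math.floor(task_hours / hours_per_model)
```

Under these assumptions, `tasks_to_issue(100_000, 20, 0.8)` gives 6250 tasks, and four 6-hour tasks yield the same number of models as one 24-hour task when each model takes 1.5 hours (`models_from_schedule(6, 4, 1.5)` and `models_from_schedule(24, 1, 1.5)` both give 16). That is the sense in which total crunching hours, rather than hours per task, drive the result.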
Nothing But Idle Time Send message Joined: 28 Sep 05 Posts: 209 Credit: 139,545 RAC: 0 |
Maybe if I watched the screen saver and observed how models are generated I might better comprehend the methodology. I just never wanted to waste cpu cycles on it nor visually sate myself with wiggling molecules like some people apparently do. Graphics add nothing to the scientific value. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Anyway, the science isn't as concerned about whether it is a birch branch or a maple. So long as the total forest has the desired number of branches in it in the desired timeframe, then they have what they need. Rosetta Moderator: Mod.Sense |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,281,662 RAC: 1,150 |
A long-running 1.67 workunit: 5/24/2009 8:56:43 PM rosetta@home Starting epsilon_BOINC_ABRELAX_CONTROL_SAVE_ALL_OUT_IGNORE_THE_REST-S25-9-S3-3--epsilon-_12490_15365_0 5/24/2009 8:56:44 PM rosetta@home Starting task epsilon_BOINC_ABRELAX_CONTROL_SAVE_ALL_OUT_IGNORE_THE_REST-S25-9-S3-3--epsilon-_12490_15365_0 using minirosetta version 167 I requested 12-hour workunits. So far, this one has used nearly 10 CPU hours, is 1.230% completed, and the time to completion estimate is nearly 32 hours and constantly increasing. Since I'm preparing to upgrade BOINC on this machine to 6.6.28, I may have to abort this workunit if it takes too long to complete. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2126 Credit: 41,253,914 RAC: 7,960 |
A long-running 1.67 workunit: Officially I think a long-running WU is "Requested time + 3 hours" = 15 hours for you. Don't worry about the time-to-completion figure because that's just an extrapolation of how long decoys are taking in the current WU. When the current one finishes it'll recalculate whether it can complete another in the remainder of 12 hours and finish immediately if it can't. You may also need to reduce run time to 8 hours to complete what you've got outstanding. Or just abort them all now if you've got time to do the upgrade. No real point hanging around when you're just 3 days before deadline. |
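The end-of-model check described in the post above might look something like this. It is a guess at the logic based only on that description; the function name and the use of a simple average are assumptions, not minirosetta's actual code.

```python
def should_start_another_model(cpu_hours_used, models_done, target_hours):
    """Start another model only if an average-length one still fits (assumed rule)."""
    if models_done == 0:
        # No history yet, so there is nothing to extrapolate from.
        return True
    avg_hours_per_model = cpu_hours_used / models_done
    return cpu_hours_used + avg_hours_per_model <= target_hours
```

For a 12-hour target, a task whose first model took 10 CPU hours would stop after that model in this sketch, since a second 10-hour model would not fit.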
Snags Send message Joined: 22 Feb 07 Posts: 198 Credit: 2,888,320 RAC: 0 |
A long-running 1.67 workunit: There is something odd going on here though I'm not sure it's indicative of a long-running model. The % complete number is based on your requested runtime and while math is not my strong subject I feel perfectly confident in stating that 10 hours is not 1.23% of 12 hours. I suspect Rosetta is not in fact receiving any CPU time but BOINC thinks it is (it's BOINC providing the "to completion" estimate, not Rosetta). I don't know how to check this on a Windows machine; Task Manager, maybe? Can you open the graphics window? Are the numbers there the same as (or very close to) the numbers in BOINC manager? Before you give up and abort the WU you might try stopping and restarting BOINC and see if that shakes anything loose. Snags |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,281,662 RAC: 1,150 |
I tried suspending all workunits, then rebooting, since I haven't seen any other information on how to restart BOINC. This restarted this workunit with only 11 CPU minutes shown as already used. This lost any more time it had already used, but then allowed it to complete successfully in about 12 more hours as far as I can tell. I've postponed the BOINC upgrade long enough to finish all workunits already downloaded, but haven't started the upgrade yet. I suspect that the workunit had run into the lockfile problem, and therefore had the minirosetta program mainly waiting in hope the lockfile problem would go away, but this restart did not seem to preserve enough information that I could check this. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Just a few observations on the recent posts here: 1) the watchdog kicks in at target runtime plus 4 hours. If a task runs longer than that, the watchdog will pack it up and send it home. 2) the 6.6.x BOINC client versions now show ELAPSED time, rather than CPU time. So it is entirely possible people report tasks using time and not progressing when their machine doesn't have any CPU time available to run low priority tasks such as BOINC applications. 3) You can upgrade BOINC any time. Even with work in progress. The Rosetta application is still the same, and this is what is truly processing the work, so the BOINC upgrade should not pose a problem. Rosetta Moderator: Mod.Sense |
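The difference between elapsed time and CPU time in point 2 can be seen with a small, generic Python sketch. It has nothing to do with BOINC's own accounting; `measure` is a hypothetical helper, and sleeping stands in for a task that is waiting rather than computing.

```python
import time


def measure(work):
    """Return (elapsed_seconds, cpu_seconds) spent while running work()."""
    wall_start = time.perf_counter()  # wall-clock (elapsed) timer
    cpu_start = time.process_time()   # CPU time charged to this process
    work()
    elapsed = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return elapsed, cpu


# A sleeping "task" accumulates elapsed time but almost no CPU time.
elapsed, cpu = measure(lambda: time.sleep(0.2))
print(f"elapsed={elapsed:.2f}s cpu={cpu:.2f}s")
```

A task starved of CPU behaves like the sleeping call here: its elapsed time keeps climbing while its CPU time, and therefore its progress, barely moves.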
LizzieBarry Send message Joined: 25 Feb 08 Posts: 76 Credit: 201,862 RAC: 0 |
3) You can upgrade BOINC any time. Even with work in progress. The Rosetta application is still the same, and this is what is truly processing the work, so the BOINC upgrade should not pose a problem. I'd expect that to be the case, but it's never worked for me. Queued WUs don't get picked up by the new BOINC version and a load more come down in their place. I can see the old WUs sitting on this website, but they never get run and end up expiring. I thought that happened to everyone. Am I wrong? Looks like it :( |
©2024 University of Washington
https://www.bakerlab.org