Message boards : Rosetta@home Science : 90% failure rate
Author | Message |
---|---|
sharder8 Joined: 2 Feb 06 Posts: 7 Credit: 15,648,378 RAC: 0 |
Of the 20 computers that I'm running/have run Rosetta on, only one has had a 90%+ failure rate. That one is a dual Xeon 450 running @ 500MHz. Consequently, I moved that one to another project, as I thought it was a machine problem. That box has crunched [FAD], DIMES, and RC5-72 without any problems. In this case, it probably isn't much of a loss to the Rosetta project. Recently, though, another machine started having problems and would end up with an error message containing "daily quota met". The only way I was able to recover was to do a complete uninstall, followed by a clean install. Unfortunately, it now continues to get the error regardless of what I do. That machine is a Mobile 2800+ Sempron. It's currently running DIMES and RC5 without any problems. Finally, I've run into the 1% "stuck" problem. This one is starting to get really tiring, and I've stopped Rosetta on the 2 machines that accounted for by far the majority of my jobs stuck at 1%. I understand that this problem is being worked on, and I will continue crunching Rosetta on the remainder of my machines. Harder |
R/B Joined: 8 Dec 05 Posts: 195 Credit: 28,095 RAC: 0 |
What is the RC-5 project? I've looked but didn't see anything about it. Thank you. Founder of BOINC GROUP - Objectivists - Philosophically minded rational data crunchers. |
Dimitris Hatzopoulos Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0 |
R/B wrote: "What is the RC-5 project? I've looked but didn't see anything about it. Thank you." It's a project trying to crack the RC5 encryption algorithm by brute-force key search. RC5 Best UFO Resources Wikipedia R@h How-To: Join Distributed Computing projects that benefit humanity |
Pphalan Joined: 5 Nov 05 Posts: 53 Credit: 291,580 RAC: 0 |
Well, after coming home from overseas I still see a problem with the program, and looking at other threads here I see no explanation. It's been a while since the winter holidays. I have shut down 7 of my machines. I have a machine turning in result after result with no CPU time shown, no points, and no errors on the client. That will be number 8 I am shutting down on this project. Rosetta is about to lose not only me, forever, but my whole team. I have talked to friends on other teams, and you guys would not believe the real dislike that's brewing out there for this program. The attitude around here seems to be "so what?" Well, guess what happens when you get people out there calling Rosetta a lousy DC project in the forums? Explain this to me: https://boinc.bakerlab.org/rosetta/results.php?hostid=58422 Results for computer This machine used to do a good job on every DC project on it... It's doing nothing worth anything now. http://www.christianboards.org/forum.php http://usalug.org/phpBB2/index.php |
BennyRop Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=1323#12948 is a thread where some others are discussing your problem showing up on their Win98 machines (no time, no credit). This means we need more Win98 machines testing Ralph, monitored by people who keep track of their machines. The 90% failure rate that happened before you left was described elsewhere as a batch of failing WUs. For this problem, do you have the option of upgrading to Win2k or WinXP, or jumping to Linux? (That would help prove that it's an OS issue, not hardware.) Keep in mind that this client is going through the same types of problems that other medical apps had in their early days, which those of us lucky enough to have come in after the problems were ironed out never got to see. (This is my first time experiencing the "early stage".) But things are improving, although it looks like we'll need a 4.84 client update for the Win98 users. David(s)/Rom, etc.: How can we help the programmers track down this problem? |
Pphalan Joined: 5 Nov 05 Posts: 53 Credit: 291,580 RAC: 0 |
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=1323#12948 I want you to bear in mind that this machine ran Rosetta perfectly, set up exactly as it is now. Then I had the high failure rate on all nine, and not one of those machines is identical. I do not have the option of using a newer Windows OS. It's a matter of what I spend the money on: another cruncher, or licenses just for machines doing DC projects. Thank you for your response... http://www.christianboards.org/forum.php http://usalug.org/phpBB2/index.php |
Whl. Joined: 29 Dec 05 Posts: 203 Credit: 275,802 RAC: 0 |
I don't have time to attach and report back to Ralph right now, or to babysit this thing anymore (too much else happening). My machines were working fine up until 4.83 was released. I will let the existing jobs in the cache run and empty, and try back here in a month or so. Hope you sort out all the bugs, guys. Good luck and all the best. |
Dimitris Hatzopoulos Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0 |
Pphalan wrote:
Just so we're all on the same page here: from what I understand, on the Win98 PCs in question the SCIENTIFIC computations work fine (from what I can tell by watching the results output of Pphalan's PC), but NO CREDITS are granted, because BOINC reports 0 seconds and claims 0 credits. Also, AFAIK, everything credit-related (timing, claiming, etc.) is still done IN BOINC, not in the science application, for ALL BOINC projects except SETI-Beta. Apparently the fixes for 4.83 had an effect on BOINC's timing under Win98. I guess the project could run a script to correct the credits for WUs that complete correctly but whose time spent is misreported because of the Win98/BOINC/Rosetta interaction. So the big fuss is (again) about (temporary?) credits. Personally, I'd be upset if my PCs spent the time without producing any useful results. I guess everyone is entitled to his priorities. Best UFO Resources Wikipedia R@h How-To: Join Distributed Computing projects that benefit humanity |
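Why a 0-second report becomes a 0-credit claim can be sketched from BOINC's benchmark-based credit definition: one cobblestone is 1/100 of a day of CPU time on a reference host scoring 1,000 MFLOPS (Whetstone) and 1,000 MIPS (Dhrystone), so the claim scales linearly with reported CPU time. The Python below is a simplified sketch of that relationship, not BOINC's actual code; the `Result` record, its field names, and the `needs_credit_fix` rule are hypothetical stand-ins for the kind of correction script suggested in the post.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86400.0


def claimed_credit(cpu_seconds: float, whetstone_mflops: float, dhrystone_mips: float) -> float:
    """Simplified benchmark-based claim: 100 credits per CPU-day on a
    reference host scoring 1000 MFLOPS and 1000 MIPS."""
    benchmark_factor = (whetstone_mflops + dhrystone_mips) / 2.0 / 1000.0
    return 100.0 * (cpu_seconds / SECONDS_PER_DAY) * benchmark_factor


@dataclass
class Result:
    # Hypothetical record; field names are illustrative, not a real BOINC schema.
    result_id: int
    exit_status: int
    decoys: int
    cpu_seconds: float


def needs_credit_fix(r: Result) -> bool:
    # Hypothetical rule: the science finished (exit 0, decoys produced)
    # but BOINC recorded no CPU time, as described for the Win98 hosts.
    return r.exit_status == 0 and r.decoys > 0 and r.cpu_seconds == 0.0


# A reported CPU time of 0 gives a claim of 0, whatever the benchmarks are.
print(claimed_credit(0.0, 1000.0, 1000.0))
print(needs_credit_fix(Result(result_id=1, exit_status=0, decoys=11, cpu_seconds=0.0)))
```

Because CPU time is a plain multiplier in this sketch, no benchmark score can rescue a result whose time was recorded as zero; a correction script would have to substitute some other time estimate, which is itself an assumption.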
Pphalan Joined: 5 Nov 05 Posts: 53 Credit: 291,580 RAC: 0 |
Pphalan wrote: I said at the beginning of this thread what my priorities are. I have no measure of whether the machine is doing anything useful, though... For all I know it's turned in nothing. LOL http://www.christianboards.org/forum.php http://usalug.org/phpBB2/index.php |
Pphalan Joined: 5 Nov 05 Posts: 53 Credit: 291,580 RAC: 0 |
We should be clear of the "bad" work units by now. There still is a 7% chance of getting a bad random number seed but it should in no way be at 90%. Batch 205 is most definitely done by now. My second post in this thread. http://www.christianboards.org/forum.php http://usalug.org/phpBB2/index.php |
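The point that a 7% chance of a bad random number seed cannot explain a 90% failure rate can be checked with a binomial tail. This is a sketch under assumptions not stated in the thread: each work unit is taken to fail independently with probability 0.07, and one host is taken to have returned 20 WUs, so 18 failures out of 20 corresponds to 90%. Under those assumptions the expected count is only 1.4 failures and the chance of 18 or more is vanishingly small, which points to something systematic (such as a bad batch) rather than seed luck.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


P_BAD_SEED = 0.07  # per-WU failure chance quoted in the post
N_WUS = 20         # hypothetical number of WUs returned by one host
K_FAILED = 18      # 18 of 20 failed, i.e. a 90% failure rate

expected_failures = N_WUS * P_BAD_SEED
tail = prob_at_least(K_FAILED, N_WUS, P_BAD_SEED)
print(f"expected failures: {expected_failures:.1f}")
print(f"P(at least {K_FAILED} of {N_WUS} fail): {tail:.2e}")
```

The independence assumption is exactly what a bad batch breaks: if every WU from one batch shares the same defect, failures cluster and the binomial model no longer applies.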
Dimitris Hatzopoulos Joined: 5 Jan 06 Posts: 336 Credit: 80,939 RAC: 0 |
Pphalan wrote: "I have no measure of whether the machine is doing anything useful, though... For all I know it's turned in nothing. LOL"
Assuming you're not joking, it's rather easy to tell whether a machine is doing the scientific work or not: you can just click on the resultid URL, e.g.: https://boinc.bakerlab.org/rosetta/result.php?resultid=15867586
Exit status 0 (0x0)
stderr out
<core_client_version>5.3.1</core_client_version>
<stderr_txt>
# random seed: 1822271
# cpu_run_time_pref: 7200
# DONE :: 1 starting structures built 11 (nstruct) times
# This process generated 11 decoys from 11 attempts
</stderr_txt>
So you can see that your PC computed 11 predicted protein structures within the 2 hrs (7200 sec) it ran on this particular WorkUnit, and exited with a status of 0 (success). On WUs/PCs with problems, there are lots of different error codes, which people report in the various error-reporting threads in "Number Crunching". This particular issue is a glitch with how BOINC can track process time under Win98 and I've seen it discussed in various other BOINC projects. My 2 cents... Best UFO Resources Wikipedia R@h How-To: Join Distributed Computing projects that benefit humanity |
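Reading the result page by eye works for one WU; for many results, the same check can be scripted. This sketch assumes you have already copied a result's text (the exit status line plus the stderr block) into a string; it does not fetch anything from the Rosetta server, the `summarize_result` name is hypothetical, and its regular expressions match only the lines in the sample output quoted in this post, so other application versions may print different lines.

```python
import re


def summarize_result(text: str) -> dict:
    """Extract a few fields from a copied Rosetta result page.

    The patterns match the sample output quoted in this thread only;
    other application versions may print different lines.
    """
    summary = {}
    exit_status = re.search(r"Exit status\s+(-?\d+)", text)
    seed = re.search(r"#\s*random seed:\s*(\d+)", text)
    run_pref = re.search(r"#\s*cpu_run_time_pref:\s*(\d+)", text)
    decoys = re.search(r"generated\s+(\d+)\s+decoys\s+from\s+(\d+)\s+attempts", text)
    if exit_status:
        summary["exit_status"] = int(exit_status.group(1))
    if seed:
        summary["random_seed"] = int(seed.group(1))
    if run_pref:
        # cpu_run_time_pref is in seconds; 7200 s is the 2 hours mentioned above.
        summary["cpu_run_time_pref_hours"] = int(run_pref.group(1)) / 3600
    if decoys:
        summary["decoys"] = int(decoys.group(1))
        summary["attempts"] = int(decoys.group(2))
    # "Did useful work" here means: clean exit and at least one decoy built.
    summary["did_science"] = summary.get("exit_status") == 0 and summary.get("decoys", 0) > 0
    return summary


sample = """Exit status 0 (0x0)
stderr out
<core_client_version>5.3.1</core_client_version>
<stderr_txt>
# random seed: 1822271
# cpu_run_time_pref: 7200
# DONE :: 1 starting structures built 11 (nstruct) times
# This process generated 11 decoys from 11 attempts
</stderr_txt>"""

print(summarize_result(sample))
```

Note that this checks the science output only; it says nothing about the CPU time BOINC recorded, which is the separate Win98 timing issue discussed in this thread.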
Andrew Joined: 19 Sep 05 Posts: 162 Credit: 105,512 RAC: 0 |
Dimitris Hatzopoulos wrote: "This particular issue is a glitch with how BOINC can track process time under Win98 and I've seen it discussed in various other BOINC projects." This is a known issue with BOINC, not Rosetta. It is one reason why the only officially supported Windows platforms are XP, 2000, and Server 2003. https://boinc.bakerlab.org/rosetta/rah_requirements.php Some people don't have any issues running Win98, others do... you unfortunately are one of the unlucky ones. |
BennyRop Joined: 17 Dec 05 Posts: 555 Credit: 140,800 RAC: 0 |
Does the error show up in Win98SE, or just Win98? (Or the reverse?) |
Johnathon Joined: 5 Nov 05 Posts: 120 Credit: 138,226 RAC: 0 |
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=1177#13069 |
Whl. Joined: 29 Dec 05 Posts: 203 Credit: 275,802 RAC: 0 |
I see Dr Baker says the science is unaffected with the Win98 problem, so I will continue with those machines. |
dgnuff Joined: 1 Nov 05 Posts: 350 Credit: 24,773,605 RAC: 0 |
Pphalan wrote: "I said at the beginning of this thread what my priorities are. I have no measure of whether the machine is doing anything useful, though... For all I know it's turned in nothing. LOL" I suppose pointing out that error results are extremely useful doesn't matter to you. Even if a WU errors out, it helps to identify which WUs have bugs. As any programmer will tell you, it's impossible to fix a bug you can't find. I've had my fair share of error WUs, and as far as I'm concerned, they were useful. They did not return any scientific results, but they DID help the Rosetta team debug and improve the application. |
Pphalan Joined: 5 Nov 05 Posts: 53 Credit: 291,580 RAC: 0 |
I said at the beginning of this thread what my priorities are. I have no measure of whether the machine is doing anything useful, though... For all I know it's turned in nothing. LOL As I understand it now, the problem is with BOINC, not Rosetta. So how's an error with BOINC doing any good for Rosetta? Oh, my primary machine uploaded some more errors for you... it's XP Pro. And all my remotes that keep dropping the program are XP. They have not been added back; it's just too much of a pain. http://www.christianboards.org/forum.php http://usalug.org/phpBB2/index.php |
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
This is an important point which perhaps still needs to be spelt out clearly to newcomers, and indeed to all who joined before last Xmas: by the standards of Einstein or SETI, Rosetta is a permanent Beta project. It is getting better (due mainly to using Ralph for alpha testing of new WUs) and it will continue to get better for a while yet, but it will never be as reliable as Einstein or SETI. Some users will be turned away by that, especially those who seek every last credit. Fair enough: as donors you have the right to donate wherever you feel most happy. Maybe they want every last credit, or maybe they have a lot of boxes at a lot of different places and want the most reliable project going. For other users, the science is more important than the credits, and reliability is important but not absolutely critical. They'd be happier now than they were last winter. If you want to run Rosetta code that is tested to around the SETI standard of quality, and is being used for production runs on real proteins, then I'd suggest World Community Grid, selecting the option for the Human Proteome Folding project. My son runs that and has not had any problems at all. They are using an older version of Rosetta (version 4.21), maybe not so fast at solving the proteins, but it does seem to have the wrinkles ironed out. Dr Baker is involved with both projects and has been quoted as saying that both projects are important steps towards solving the problems of protein structures. To anyone who is still unhappy with the level of reliability here, I'd suggest going here and following the link for people who already run BOINC. I'd also suggest checking back every couple of months, as the reliability continues to improve here. I don't suggest checking back if you want the very highest reliability; stick with the production model over at the grid. hope that helps |
Pphalan Joined: 5 Nov 05 Posts: 53 Credit: 291,580 RAC: 0 |
I said at the beginning of this thread what my priorities are. I have no measure of whether the machine is doing anything useful, though... For all I know it's turned in nothing. LOL Thank you very much. My background is in Physics and Electrical Engineering; I want to run just one project. I have too much of a Communication systems background to ever run SETI... Those massive radio noise makers called stars and the vast size of the galaxy make it futile. lol Keep up the good work... I appreciate it. http://www.christianboards.org/forum.php http://usalug.org/phpBB2/index.php |
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
Pphalan wrote: "I have too much of a Communication systems background to ever run SETI... Those massive radio noise makers called stars and the vast size of the galaxy make it futile. lol" On that at least we agree! You will see from my stats the relative importance I've given SETI ;-) Pphalan wrote: "Keep up the good work... I appreciate it." Me too - the quality of the feedback and responsiveness to users is what keeps me donating time here, even tho physics is my favoured field. |
©2025 University of Washington
https://www.bakerlab.org