Message boards : Number crunching : Need help fixing problems or avoiding Rosetta Mini
Author | Message |
---|---|
Alan Roberts Joined: 7 Jun 06 Posts: 61 Credit: 6,901,926 RAC: 0 |
I've reached the end of my ability to keep up with failing Rosetta Mini work units. I'm allowed the off-hours use of four dual-CPU Xeon servers (261560, 262128, 261547, and 262119) which crunch Beta jobs without problems, but every time I see their stats fall off and go look, a Mini job is "hung." Hung in this case means CPU time is being consumed well beyond the limit, with no increase in percentage complete. Far worse, once a Mini job is in this state it stops obeying BOINC's time-of-day suspend rules. BOINC shows the job suspended, but Windows shows it under load, churning away. This is unacceptable: I crunch on these machines off-hours with the permission of a business! There are multiple Pentium D desktop systems that are seeing large numbers of Mini failures (260104, 259209, and 259666 are examples). I've only been able to spot-check results on these, but the failures seem to be Mini jobs, while Beta jobs seem to complete. Some of these have so many failures that their daily WU quota has dropped, and they are going idle. All of this gear is stock Dell hardware: no overclocking or other envelope-pushing going on. In fact I have them configured to only utilize a single core/CPU, trying to make sure I avoid high system temperatures (and Beta jobs don't seem to be failing). In order from most to least desirable:
|
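The "hung" condition described in the post above (CPU time keeps growing while percentage complete does not move) can be written down as a simple check over periodic samples. This is only a minimal sketch, not a BOINC feature: the `Sample` type, the `looks_hung` function, and the 30-minute default threshold are assumptions made for illustration, and how the samples are collected from the client is deliberately left out.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    """One observation of a task (hypothetical structure, not a BOINC type)."""
    cpu_seconds: float    # CPU time consumed so far
    fraction_done: float  # progress reported by the task, 0.0 to 1.0


def looks_hung(samples: Sequence[Sample], min_cpu_seconds: float = 1800.0) -> bool:
    """Return True if CPU time has advanced by at least min_cpu_seconds
    since the reported progress last changed.

    The threshold is an assumed value; pick one that suits the work units.
    """
    if len(samples) < 2:
        return False
    latest = samples[-1]
    # Walk backwards to the oldest consecutive sample with unchanged progress.
    stalled_since = latest
    for sample in reversed(samples[:-1]):
        if sample.fraction_done != latest.fraction_done:
            break
        stalled_since = sample
    return latest.cpu_seconds - stalled_since.cpu_seconds >= min_cpu_seconds
```

A task whose progress rises with every sample is never flagged, while one that burns CPU time at a constant percentage is flagged once the stalled stretch exceeds the threshold.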
Pepo Joined: 28 Sep 05 Posts: 115 Credit: 101,358 RAC: 0 |
"I've reached the end of my ability to keep up with failing Rosetta Mini work units. [...]" For quick (temporary) help, take a look at these two messages. Peter |
funkydude Joined: 15 Jun 08 Posts: 28 Credit: 397,934 RAC: 0 |
Hey, out of curiosity, what BOINC client are you using? I had severe problems with failures on another project using the normal client, after installing the beta to take advantage of "protected mode" I've never had a problem since. Link to betas https://boinc.berkeley.edu/download_all.php |
Pepo Joined: 28 Sep 05 Posts: 115 Credit: 101,358 RAC: 0 |
"Hey, out of curiosity, what BOINC client are you using?" ATM 6.2.4+6.2.11. If curious, take a look at any person's recently returned result and (on a particular computer) the client number should be marked there in the stderr.out section as <core_client_version>N.N.N</core_client_version>. (Unless it is faked, which also happens on some private builds.) "I had severe problems with failures on another project using the normal client, after installing the beta to take advantage of 'protected mode' I've never had a problem since." If I may know, what type of problems and where? Might be worth tracking down and fixing. "Link to betas https://boinc.berkeley.edu/download_all.php" Beware that using test versions is often risky if you do not follow the BOINC alpha information channels... Peter |
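The version lookup described in the post above can be automated once the stderr.out text of a result has been saved. The following is a minimal sketch: only the `<core_client_version>` tag comes from the thread, while the function name `core_client_version` and the assumption that the version is always three dot-separated numbers are illustrative.

```python
import re
from typing import Optional, Tuple

# Matches the tag mentioned in the thread; assumes a three-part numeric version.
_VERSION_RE = re.compile(
    r"<core_client_version>\s*(\d+)\.(\d+)\.(\d+)\s*</core_client_version>"
)


def core_client_version(stderr_text: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, release) from a result's stderr text,
    or None if the tag is missing."""
    match = _VERSION_RE.search(stderr_text)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)), int(match.group(3)))
```

Because the result is a tuple of integers, versions compare naturally: `core_client_version(text) >= (6, 0, 0)` tells a 6.x client apart from a 5.10.x one, provided the tag was found.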
Alan Roberts Joined: 7 Jun 06 Posts: 61 Credit: 6,901,926 RAC: 0 |
Thanks for the pointer. I've got the servers locked down to Beta jobs, and they are obeying BOINC's time-of-day suspends (helps with the geo-politics). I can deploy to the Pentium D desktops that are throwing all the errors on Mini jobs this evening. Not sure if the previous poster was asking me about my BOINC version, but I think everything is at 5.10.30 or higher. I've only been running BOINC updates when I happened to be visiting a machine and it was near a job boundary. |
funkydude Joined: 15 Jun 08 Posts: 28 Credit: 397,934 RAC: 0 |
It was with the climateprediction project, had constant error after error until using the beta (6.x) to get protected mode. Perhaps your problems are from an old beta? I saw 6.2.4 somewhere there. |
Pepo Joined: 28 Sep 05 Posts: 115 Credit: 101,358 RAC: 0 |
(Off-topic here): "It was with the climateprediction project, had constant error after error until using the beta (6.x) to get protected mode." Have you ever asked for help there? You started crunching a few WUs in mid-June, but all with BOINC 5.10.45; there is no beta to be seen in your results. Peter |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
(Off-topic here): what has climate prediction got to do with Rosetta? you should be posting this over in climate or over in the cafe boards here at RAH |
Pepo Joined: 28 Sep 05 Posts: 115 Credit: 101,358 RAC: 0 |
(Off-topic here): If you have not noticed yet, I'll reveal it for you: BOINC ;-) "you should be posting this over in climate or over in the cafe boards here at RAH" Maybe you're right (hence my "(Off-topic here)" note), but still, the applications and clients depend strongly on one another, and each suffers from the other's flaws. Understand it as a search for one more possible common source of errors. Peter |
funkydude Joined: 15 Jun 08 Posts: 28 Credit: 397,934 RAC: 0 |
I gave up on climate prediction after changing to beta; I didn't like how long it took for 1 result. Rosetta was my first project after beta,
©2024 University of Washington
https://www.bakerlab.org