Message boards : Number crunching : Problems with Rosetta version 5.46
Author | Message |
---|---|
Chu Joined: 23 Feb 06 Posts: 120 Credit: 112,439 RAC: 0 |
Please report here any problems you have observed with Rosetta version 5.46. |
Marky-UK Joined: 1 Nov 05 Posts: 73 Credit: 1,689,495 RAC: 0 |
What could be causing these compute errors? It's only happening on one of my hosts in the last few weeks. https://boinc.bakerlab.org/rosetta/result.php?resultid=62506015 https://boinc.bakerlab.org/rosetta/result.php?resultid=62470017 https://boinc.bakerlab.org/rosetta/result.php?resultid=62378522 https://boinc.bakerlab.org/rosetta/result.php?resultid=62351637 https://boinc.bakerlab.org/rosetta/result.php?resultid=61390501 That host has been fine running Rosetta for ages. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Marky's WUs are from 5.45 and 5.46, and all seem to end with -107. And MOST of the WUs on this Win/XP machine are now failing. One after 47 seconds, others after more than two hours. Rosetta Moderator: Mod.Sense |
Marky-UK Joined: 1 Nov 05 Posts: 73 Credit: 1,689,495 RAC: 0 |
"And MOST of the WUs on this Win/XP machine are now failing." Grrr, so they are. I've set that host to "no new work" for Rosetta until the cause is found. |
Chu Joined: 23 Feb 06 Posts: 120 Credit: 112,439 RAC: 0 |
Have you tried resetting the project to see if it helps? Those workunits themselves seem to be fine, and if this happens all the time on a single host, my guess is that some files have become corrupted. Another possibility is a hardware problem, though this can be ruled out if the host has no problems running other programs. "What could be causing these compute errors? It's only happening on one of my hosts in the last few weeks." |
StephenYavorsky Joined: 24 Mar 06 Posts: 9 Credit: 87,195 RAC: 0 |
"Please report here any problems you have observed with Rosetta version 5.46." "Waiting for memory": I have never seen this message previously, but two Rosetta units, the final one in the queue from 5.45 and the first from 5.46, have both just now stopped with the message "waiting for memory." This has not happened before. |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
"Please report here any problems you have observed with Rosetta version 5.46." There are two new settings in your "General Preferences": "use at most X percent ram while active" and "use at most X percent of ram when not active". By default they are set to 50 and 90 respectively. Try increasing them. tony |
StephenYavorsky Joined: 24 Mar 06 Posts: 9 Credit: 87,195 RAC: 0 |
"Please report here any problems you have observed with Rosetta version 5.46." Thanks, Tony, I've found the settings and I'm sure it will help. |
meshmar Joined: 1 Apr 06 Posts: 26 Credit: 176,432 RAC: 0 |
"Please report here any problems you have observed with Rosetta version 5.46." I was aware of the change in preferences and had already changed mine. Only one of my 'crunchers' had a problem, and only with some of the Rosetta WUs. These WUs seem to grab a LOT more memory than others, and that leads to the 'waiting for memory' problem. |
Viromancy Joined: 23 Sep 06 Posts: 8 Credit: 125,713 RAC: 0 |
Still some watchdog terminations with version 5.46: https://boinc.bakerlab.org/rosetta/result.php?resultid=62694055 https://boinc.bakerlab.org/rosetta/result.php?resultid=62738141. Haven't seen this type of error before. |
Marky-UK Joined: 1 Nov 05 Posts: 73 Credit: 1,689,495 RAC: 0 |
"Have you tried resetting the project to see if it helps? Those workunits themselves seem to be fine, and if this happens all the time on a single host, my guess is that some files have become corrupted. Another possibility is a hardware problem, though this can be ruled out if the host has no problems running other programs." I have tried that now, and that host is still failing, on almost every WU now. The same host now also fails to run the new Human Proteome Folding WUs from WCG, and that's Rosetta too. But every other project, including the others from WCG, runs fine, as does any other bit of software I run on it. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
I got this error on one of the WU's that was recently completed: <core_client_version>5.8.8</core_client_version> <![CDATA[ <stderr_txt> # random seed: 1489227 # cpu_run_time_pref: 28800 ********************************************************************** Rosetta score is stuck or going too long. Watchdog is ending the run! Stuck at score -252.879 for 3600 seconds ********************************************************************** GZIP SILENT FILE: .aac4z1.out https://boinc.bakerlab.org/rosetta/result.php?resultid=62674491 |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Same error on the next WU to complete: <core_client_version>5.8.8</core_client_version> <![CDATA[ <stderr_txt> # random seed: 1432700 # cpu_run_time_pref: 28800 ********************************************************************** Rosetta score is stuck or going too long. Watchdog is ending the run! Stuck at score -192.27 for 3600 seconds ********************************************************************** GZIP SILENT FILE: .aand73.out^ https://boinc.bakerlab.org/rosetta/result.php?resultid=62736709 |
Viromancy Joined: 23 Sep 06 Posts: 8 Credit: 125,713 RAC: 0 |
... Is your machine overclocked? I had the same problem: Rosetta worked fine for months, then these access violation errors started creeping in, then they became more common, then almost overnight 4 of every 5 WUs were failing with the same type of error. No other project was affected (I wasn't running Human Proteome Folding at the time) and no other piece of software ever showed any kind of instability. I'd been running my processor and memory at the highest stable clock setting I could find, and when I stepped the overclock down by a tiny amount (about 1.5%, from 3.46GHz to 3.40GHz), Rosetta suddenly became completely stable again. I've hardly had any access violation errors since. If you've overclocked that machine, even if everything else runs okay, it might be worth dropping the speed down a little bit and seeing what happens. |
EigenState Joined: 16 Feb 07 Posts: 4 Credit: 1,667 RAC: 0 |
I have only been running BOINC for a week, and that only for Einstein@Home until yesterday, 15 February 2007, when I attempted to attach to Rosetta as well. From all I can tell, the attachment and download went smoothly enough. However, as soon as any Rosetta Work Unit began its calculations, I immediately received a Compute Error. More surprisingly perhaps, Rosetta was then detached from my BOINC Manager. I tried to re-attach two more times with basically identical results. Inspection of my Results Log indicates that three WUs were terminated as Compute Errors, three were terminated claiming the user had detached, and one remains In Progress despite Rosetta having been detached. What is common to all is that after each attachment, Rosetta was spontaneously detached; I did not request the detach action. Examples from my Results Log follow: Compute Error: https://boinc.bakerlab.org/rosetta/result.php?resultid=62900991 Client Detached: https://boinc.bakerlab.org/rosetta/result.php?resultid=62885782 In Progress: https://boinc.bakerlab.org/rosetta/result.php?resultid=62902733 If I have been doing something incorrectly, advice as to how to correct those mistakes would be most welcome. If this is a problem with Rosetta 5.46, then perhaps this information will be useful in identifying and correcting those problems, and I can wait and re-attach to Rosetta once they are resolved. |
anders n Joined: 19 Sep 05 Posts: 403 Credit: 537,991 RAC: 0 |
"If I have been doing something incorrectly, advice as to how to correct those mistakes would be most welcome. If this is a problem with Rosetta 5.46, then perhaps this information will be useful in identifying and correcting those problems, and I can wait and re-attach to Rosetta once they are resolved." @EigenState Do you use BAM? |
EigenState Joined: 16 Feb 07 Posts: 4 Credit: 1,667 RAC: 0 |
Yes, I do use BAM. Whether I use it properly is an entirely different question; I hope the answer is yes, but I am not certain of that. |
FluffyChicken Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0 |
"If I have been doing something incorrectly, advice as to how to correct those mistakes would be most welcome. If this is a problem with Rosetta 5.46, then perhaps this information will be useful in identifying and correcting those problems, and I can wait and re-attach to Rosetta once they are resolved." That is, an account manager: BAM is the BoincStats Account Manager, and GridRepublic is a similar one. If so, you need to attach through the account manager itself and check that it has updated. If not, some more computer info and a spin-off to a new post would be good. At this point I would recommend uninstalling BOINC, downloading the now updated again 5.8.11 version of BOINC, and reinstalling (or just installing 5.8.11 over the top; I was just making sure everything was cleaned out). Team mauisun.org |
FluffyChicken Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0 |
"Yes, I do use BAM. If I use it properly is an entirely different question to which I hope the answer would be yes, but I am not certain of that." As above, you have to attach using the host options in BAM. If you try to attach yourself and it is a project BOINC supports, then when it contacts BAM it will kick the project off (unfortunately, no questions asked). To help you out, here is the link to your host list: http://www.boincstats.com/bam/host_list.php Since you already have a Rosetta@home account, you may have to find it first in BAM: http://www.boincstats.com/bam/project_sign_up.php Team mauisun.org |
Marky-UK Joined: 1 Nov 05 Posts: 73 Credit: 1,689,495 RAC: 0 |
"Is your machine overclocked?" Nope, it was running at the standard speed. Just for the heck of it, though, I've now underclocked it by 6% to see how it goes. |
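
The two memory preferences Astro describes earlier in the thread can be illustrated with a little arithmetic. The sketch below is a simplified model, not BOINC's actual scheduling code: it assumes the cap is simply a percentage of total RAM. The host size (512 MB), the task's memory need (300 MB), and the function names `memory_budget_mb` and `can_run` are all hypothetical; only the default percentages, 50 and 90, come from the thread.

```python
def memory_budget_mb(total_ram_mb: float, max_percent: float) -> float:
    """Return the RAM (in MB) allowed under a percentage cap (simplified model)."""
    if not 0 <= max_percent <= 100:
        raise ValueError("max_percent must be between 0 and 100")
    return total_ram_mb * max_percent / 100.0


def can_run(task_needs_mb: float, total_ram_mb: float, max_percent: float) -> bool:
    """True if the task fits in the budget; False models 'waiting for memory'."""
    return task_needs_mb <= memory_budget_mb(total_ram_mb, max_percent)


# Hypothetical numbers chosen for illustration only.
TOTAL_RAM_MB = 512
TASK_NEEDS_MB = 300

if __name__ == "__main__":
    # The two defaults quoted in the thread: 50% while active, 90% when not active.
    for label, percent in [("computer in use", 50), ("computer idle", 90)]:
        budget = memory_budget_mb(TOTAL_RAM_MB, percent)
        fits = can_run(TASK_NEEDS_MB, TOTAL_RAM_MB, percent)
        status = "runs" if fits else "waiting for memory"
        print(f"{label} ({percent}%): budget {budget:.0f} MB, task {status}")
```

Under these assumed numbers the task would fit only under the 90% cap, which is consistent with the advice to raise the percentages when work units stall with "waiting for memory".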
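
The watchdog messages Greg_BE posted ("Stuck at score ... for 3600 seconds") suggest a check that ends a run when the score has not changed for an hour. The following is a minimal sketch of that idea only. It is not Rosetta's watchdog code, and the function name `find_stall` and the sample data are made up for illustration; the 3600-second limit is taken from the log output.

```python
from typing import Iterable, Optional, Tuple


def find_stall(
    samples: Iterable[Tuple[float, float]],
    limit_seconds: float = 3600.0,
) -> Optional[Tuple[float, float]]:
    """Scan (elapsed_seconds, score) samples in time order.

    Returns (score, stalled_seconds) for the first sample at which the score
    has stayed unchanged for at least limit_seconds, or None if no stall occurs.
    """
    last_score: Optional[float] = None
    last_change = 0.0
    for elapsed, score in samples:
        if last_score is None or score != last_score:
            # Score moved (or first sample): restart the stall timer.
            last_score = score
            last_change = elapsed
        elif elapsed - last_change >= limit_seconds:
            return last_score, elapsed - last_change
    return None


if __name__ == "__main__":
    # Hypothetical trace: the score stops changing after 600 seconds.
    trace = [(0, -100.0), (600, -252.879), (1800, -252.879), (4200, -252.879)]
    stall = find_stall(trace)
    if stall is not None:
        score, seconds = stall
        print(f"Stuck at score {score} for {seconds:.0f} seconds")
```

A real watchdog would poll a running process rather than scan a finished list, and would likely also enforce an overall time limit, as the "or going too long" part of the message hints.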
©2025 University of Washington
https://www.bakerlab.org