Message boards : Number crunching : Problems with Rosetta version 5.80
Author | Message |
---|---|
Beezlebub Joined: 18 Oct 05 Posts: 40 Credit: 260,375 RAC: 0 |
This Capri14 WU ended with a "client error" but has a debug readout, which might be useful: https://boinc.bakerlab.org/rosetta/result.php?resultid=105518403 e6600 quad @ 2.5 GHz: 2418 floating point, 5227 integer; e6750 dual @ 3.71 GHz: 3598 floating point, 7918 integer |
DJStarfox Joined: 19 Jul 07 Posts: 145 Credit: 1,250,162 RAC: 0 |
Just noticed each WU is consuming about 248MB of RAM. With 2 GB of RAM, this was not a problem until the Q6600 went into the system. 4 WUs are consuming 1/2 of the system memory. I posted for him because I noticed the same thing. My post here went unanswered. https://boinc.bakerlab.org/rosetta/forum_thread.php?id=3564 |
Path7 Joined: 25 Aug 07 Posts: 128 Credit: 61,751 RAC: 0 |
Please report problems with this version. Thanks! While crunching Rosetta Beta 5.80, WU 95855024, on my (1 core) AMD Sempron 3000+ processor, BOINC reported a “Waiting for memory” error. My computer (Windows XP Home SP2) has 448 MB of memory, which exceeds the recommended system requirements. To get rid of this problem, I gave 5.80 more memory by raising the “Use at most 50% of memory when computer is in use” setting to 60%. This has solved the problem (so far). Off topic: the screen saver looks like a beautiful piece of art! Path7. |
Gorkan Joined: 13 Sep 07 Posts: 10 Credit: 151,300 RAC: 0 |
I dunno, looks like it was chewing on something it didn't want to swallow. On the plus side, it didn't leave a mess on the floor. <core_client_version>5.10.20</core_client_version> <![CDATA[ <message> - exit code -1073741819 (0xc0000005) </message> <stderr_txt> # cpu_run_time_pref: 10800 # random seed: 2944148 Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x0092541B read attempt to address 0x16481000 Engaging BOINC Windows Runtime Debugger... ******************** |
Paul Joined: 29 Oct 05 Posts: 193 Credit: 66,501,314 RAC: 9,302 |
Thanks for the help with the preferences. I made some changes. After more investigation into the computation errors, it is clear that of my 5 systems, only the one with a quad-core Q6600 processor is getting them. Of course, this is also the busiest system. It looks like about 3 WUs fail for every successful one. Any suggestions are welcome. I can provide the debug info if it will help. https://boinc.bakerlab.org/rosetta/result.php?resultid=105762597 https://boinc.bakerlab.org/rosetta/result.php?resultid=105843964 https://boinc.bakerlab.org/rosetta/result.php?resultid=105811123 https://boinc.bakerlab.org/rosetta/result.php?resultid=105739580 Thx! Paul |
The_Bad_Penguin Joined: 5 Jun 06 Posts: 2751 Credit: 4,271,025 RAC: 0 |
Here is a double failure... t030__BOINC_CAPRI14_DOCK_FIXBACKBONE_POSE_LOOPS-t030_-plexinmonomer__2083_2234 stderr out <core_client_version>5.10.13</core_client_version> <![CDATA[ <message> - exit code -1073741819 (0xc0000005) </message> <stderr_txt> # cpu_run_time_pref: 10800 # random seed: 3553867 si </stderr_txt> ]]> Validate state Invalid |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
I got 0 credits for this wu: too many results: There is a quirk with the BOINC server software which reissues a task too soon. Seems to only happen when one machine gets a compute error. This is discussed in an existing thread, and there is an item on the BOINC boards to get this corrected. Rosetta Moderator: Mod.Sense |
[RKN] schatten1411 , Mitglied des Teams und des VEREINS Rechenkraft.net Joined: 25 Apr 07 Posts: 12 Credit: 441,995 RAC: 0 |
Same error as in 5.78, now in the beta too? 104916668 545978 11 Sep 2007 15:23:03 UTC 14 Sep 2007 5:36:06 UTC Over Success Done 9,430.36 50.13 20.00 You are working on it right now, but what about fixing the errors for the WUs that are already finished? |
RC Joined: 27 Sep 05 Posts: 13 Credit: 262,048 RAC: 0 |
OK, great! I'm glad you were able to catch one. Assuming that others behave the same way (a bit of a stretch with only a single one observed, but it's all we have to go by)... the fact that it is still on model one is the reason why the task fails and only 20 credits are granted. Here's another one |
The_Bad_Penguin Joined: 5 Jun 06 Posts: 2751 Credit: 4,271,025 RAC: 0 |
And a second "double failure" here 1g4u__BOINC_CAPRI14_DOCK_FIXBACKBONE_POSE_LOOPS-1g4u_-lig_plexinmonomer__2085_1427 stderr out <core_client_version>5.10.13</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # cpu_run_time_pref: 10800 # random seed: 2944674 ERROR:: Exit from: .pose.cc line: 769 </stderr_txt> ]]> |
The_Bad_Penguin Joined: 5 Jun 06 Posts: 2751 Credit: 4,271,025 RAC: 0 |
It's a good thing I don't care too much for credits as I do the science... Just look at the amount of time that is being "wasted"... My wu's are set for 3 hrs (10,800 seconds), and this one ran for over TWICE that, 21,653 seconds !!! All for 20 credits... Here it is... 1g4u__BOINC_CAPRI14_DOCK_FIXBACKBONE_POSE_LOOPS-1g4u_-plexinmonomer__2083_3748_0 stderr out <core_client_version>5.10.13</core_client_version> <![CDATA[ <stderr_txt> # cpu_run_time_pref: 10800 # random seed: 3582353 ********************************************************************** Rosetta score is stuck or going too long. Watchdog is ending the run! Stuck at score -69.8041 for 1800 seconds ********************************************************************** GZIP SILENT FILE: .xx1g4u.out </stderr_txt> ]]> Validate state Valid Claimed credit 93.156397645739 Granted credit 20 application version 5.80 |
David Emigh Joined: 13 Mar 06 Posts: 158 Credit: 417,178 RAC: 0 |
Here are a couple of failed WUs, both of them Capri14... WU 95938082 WU 95780562 I think it is important to note that this computer has not had a single failure on non-Capri14 WUs, but has a dismal record of about 7 failures for each 8 attempts with Capri14... Rosie, Rosie, she's our gal, If she can't do it, no one shall! |
tazrt Joined: 31 Aug 06 Posts: 6 Credit: 468,735 RAC: 0 |
Hi, I also have some trouble with 3 Capri WUs. 2 of them are valid (granted credit for 6-8 h runtime = 20 credits) but got stuck: WUID:96055341 and WUID:95800425. 1 Capri is invalid: Access Violation (0xc0000005), WUID:95766573. The PC is a non-overclocked Q6600 with 2 GB RAM. Target CPU runtime: 12 h. |
Daniel Joined: 4 Nov 05 Posts: 1 Credit: 11,084 RAC: 0 |
Still running 5.80 since Friday with no errors. System: Athlon64-3000, Win2k SP4, 1 GB RAM |
Rolly Joined: 31 Dec 05 Posts: 4 Credit: 717,205 RAC: 0 |
I also noticed a first failure on my system, Result 105943441. It seems the unit also hung somewhere during computation. I was surprised that this non-Capri unit is also using the Beta core. I understand using the Beta core for a competition on rosetta@home, but for less urgent workunits I would think it better to test it first on Ralph? |
Jmarks Joined: 16 Jul 07 Posts: 132 Credit: 98,025 RAC: 0 |
Here are 4 more https://boinc.bakerlab.org/rosetta/result.php?resultid=104541449 https://boinc.bakerlab.org/rosetta/result.php?resultid=104542777 https://boinc.bakerlab.org/rosetta/result.php?resultid=104585618 https://boinc.bakerlab.org/rosetta/result.php?resultid=104621606 I hope that the few CAPRI14 that actually make it through are worth it. Jmarks |
Paul Joined: 29 Oct 05 Posts: 193 Credit: 66,501,314 RAC: 9,302 |
I am starting to wonder if this problem with the failed work units is related to multicore or Q6600 processors. Could it be a memory management issue with the WUs attempting to access the same memory locations creating a lock or race condition? I finally disconnected my Q6600 computer from Rosetta and started on other projects. Thus far, no computation errors. Most of my other computers are Core Duo and they report no issues. Is anyone else using an optimized BOINC client? My system: Q6600, 2 GB RAM, 500 GB disk, 400 MB swap (this is very small), XP Home. Is anyone having this problem with Vista 32 or 64? I will try increasing my swap space to 2 GB and see if it corrects the problem. Thx! Paul |
The_Bad_Penguin Joined: 5 Jun 06 Posts: 2751 Credit: 4,271,025 RAC: 0 |
Good question ! May be sheer coincidence, but seems we're hearing about this with the Q6600's more than "average"... I'm running standard BOINC client, Q6600, 2 GB RAM, Swap = 75% of page file, Vista Premium (32). EDIT--> Just noticed interesting post here. I am starting to wonder if this problem with the failed work units is related to multicore or Q6600 processors. Could it be a memory management issue with the WUs attempting to access the same memory locations creating a lock or race condition? |
Jmarks Joined: 16 Jul 07 Posts: 132 Credit: 98,025 RAC: 0 |
Good question ! May be sheer coincidence, but seems we're hearing about this with the Q6600's more than "average"... I have a dual core e6600 4 gig and 70% of mine are bad also. Jmarks |
Jmarks Joined: 16 Jul 07 Posts: 132 Credit: 98,025 RAC: 0 |
Good question ! May be sheer coincidence, but seems we're hearing about this with the Q6600's more than "average"... I bet it has more to do with the fact that we have more memory available than other PCs, so we get more of those WUs. Jmarks |
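
A few of the figures quoted in this thread can be cross-checked with a short Python sketch. It is only an illustration and not part of BOINC or Rosetta: the helper names `to_unsigned32` and `memory_allowed_mb` are made up here, and applying the roughly 248 MB per WU figure (reported on a different machine) to the 448 MB machine is an assumption.

```python
def to_unsigned32(exit_code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF


def memory_allowed_mb(total_mb: float, percent: float) -> float:
    """Memory available under a 'use at most N% of memory' preference."""
    return total_mb * percent / 100.0


# The stderr dumps show "exit code -1073741819 (0xc0000005)".
# Both numbers are the same 32-bit pattern, read as signed and unsigned.
print(hex(to_unsigned32(-1073741819)))  # 0xc0000005

# Assumption: a WU needs about 248 MB, the figure reported earlier
# in the thread for a different machine.
WU_MB = 248

# 448 MB machine: compare the 50% and 60% preference settings.
for percent in (50, 60):
    allowed = memory_allowed_mb(448, percent)
    print(f"{percent}% of 448 MB = {allowed:.1f} MB, fits one WU: {allowed >= WU_MB}")

# Quad core with 2 GB (taken as 2048 MB): four WUs running at once.
print(f"4 WUs use {4 * WU_MB} MB, {4 * WU_MB / 2048:.0%} of 2 GB")

# Watchdog case: actual run time against the 10,800 s preference.
print(f"run time ratio: {21653 / 10800:.2f}")
```

Under those assumptions, 50% of 448 MB is 224 MB, which is below the assumed per-WU footprint, while 60% is 268.8 MB, which is above it. That would be consistent with the "Waiting for memory" message going away after the preference change, though the thread does not confirm the actual footprint on that machine.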
©2024 University of Washington
https://www.bakerlab.org