Message boards : Number crunching : minirosetta v1.19 bug thread
Author | Message |
---|---|
James Thompson Send message Joined: 13 Oct 05 Posts: 46 Credit: 186,109 RAC: 0 |
We have an updated version of minirosetta v1.19 which should fix some of the stability issues with v1.15. Post minirosetta v1.19 bugs here. |
David Emigh Send message Joined: 13 Mar 06 Posts: 158 Credit: 417,178 RAC: 0 |
Here is an access violation error after 68,000+ seconds of CPU time: Reason: Access Violation (0xc0000005) at address 0x005C3051 write attempt to address 0x00000024 There is a large and detailed debugger message. Rosie, Rosie, she's our gal, If she can't do it, no one shall! |
glaesum Send message Joined: 16 Oct 06 Posts: 21 Credit: 508,632 RAC: 0 |
things must be going pretty well as the thread is so quiet... good news too with win98 OS - the 1.19 app is running, completing and validating although an error message is still getting thrown up. no idea if this matters or not. on all three wus completed so far this is the message: Task ID 161439715 Name score13_hb_envtest62_A_1ctf__3171_14411_0 Workunit 147493846 Received 8 May 2008 11:10:33 UTC Outcome Success <core_client_version>5.10.30</core_client_version> <![CDATA[ <stderr_txt> AllocateAndInitializeSid Error 120 failed to create shared mem segment # cpu_run_time_pref: 14400 ====================================================== DONE :: 1 starting structures 13875.8 cpu seconds This process generated 3 decoys from 3 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down... called boinc_finish </stderr_txt> ]]> work unit ID nos are: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=147390671 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=147405464 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=147493846 |
radu Send message Joined: 7 May 08 Posts: 4 Credit: 66,301 RAC: 0 |
I get a crash when I detach from the project. I'm not sure if this is a minirosetta bug. Log messages seem to show that minirosetta was running when the crash occurred. I'm running Gentoo linux 2.6.24-r7. boinc-5.10.45 Logs: 08-May-2008 16:07:47 [rosetta@home] Starting task fa_max_dis_9-2vik_-test_2008-5-6_3222_134_0 using minirosetta version 119 08-May-2008 16:09:29 [rosetta@home] Resetting project 08-May-2008 16:09:30 [rosetta@home] Detaching from project SIGSEGV: segmentation violation Stack trace (9 frames): /usr/bin/boinc_client[0x46cbf9] /lib/libpthread.so.0[0x2aba6d950ed0] /usr/bin/boinc_client[0x40afec] /usr/bin/boinc_client[0x43060e] /usr/bin/boinc_client[0x4310bc] /usr/bin/boinc_client[0x422319] /usr/bin/boinc_client[0x4516a4] /lib/libc.so.6(__libc_start_main+0xf4)[0x2aba6ddfdb74] /usr/bin/boinc_client(__gxx_personality_v0+0x1b1)[0x4048f9] Exiting... |
Pepo Send message Joined: 28 Sep 05 Posts: 115 Credit: 101,358 RAC: 0 |
I get a crash when I detach from the project. It is quite possible (and logical IMO) that the client forcibly terminates all related processes upon detach. Otherwise it could not clean up client_state.xml, slots/ and projects/. Peter |
radu Send message Joined: 7 May 08 Posts: 4 Credit: 66,301 RAC: 0 |
I get a crash when I detach from the project. I'm new to BOINC so I don't know how the detach operation is handled. I don't use the gui manager and boinc_client appears to be the only BOINC related process running: $ ps -e | grep boinc 6279 ? 00:00:05 boinc_client Anyway killing related processes should not generate segmentation faults, so it's clearly an error in boinc_client. I don't know if it has anything to do with minirosetta though. |
Pepo Send message Joined: 28 Sep 05 Posts: 115 Credit: 101,358 RAC: 0 |
I get a crash when I detach from the project. I'm sorry, you are right. I was thinking of Rosetta crashing and overlooked that it was actually the client that crashed. Of course it should not. (And actually the application should also exit cleanly if asked to by the client.) I don't know if it has anything to do with minirosetta though. It should not. Which client, 5.10.45? Peter |
radu Send message Joined: 7 May 08 Posts: 4 Credit: 66,301 RAC: 0 |
It should not. Which client, 5.10.45? yes, 5.10.45 |
Rob Send message Joined: 16 Oct 06 Posts: 3 Credit: 121,375 RAC: 0 |
Someone forgot to post the Minirosetta 1.19 details on the version thread. |
Alexander Klauer Send message Joined: 10 Mar 08 Posts: 3 Credit: 110,308 RAC: 0 |
Hi, I switched off my computer yesterday, in the middle (maybe 60%) of a task. When I switched it back on today, I got Fri 09 May 2008 09:51:30 AM CEST|rosetta@home|URL: https://boinc.bakerlab.org/rosetta/; Computer ID: 762923; location: (none); project prefs: default Fri 09 May 2008 09:51:31 AM CEST|rosetta@home|Restarting task fa_max_dis_9-1ptq_-test_2008-5-6_3222_268_0 using minirosetta version 119 Fri 09 May 2008 09:52:00 AM CEST|rosetta@home|Computation for task fa_max_dis_9-1ptq_-test_2008-5-6_3222_268_0 finished Fri 09 May 2008 09:52:01 AM CEST|rosetta@home|Starting lambda_repressor_folding_3191_8370_0 Fri 09 May 2008 09:52:01 AM CEST|rosetta@home|Starting task lambda_repressor_folding_3191_8370_0 using rosetta_beta version 596 Fri 09 May 2008 09:52:03 AM CEST|rosetta@home|Started upload of fa_max_dis_9-1ptq_-test_2008-5-6_3222_268_0_0 Fri 09 May 2008 09:52:14 AM CEST|rosetta@home|Finished upload of fa_max_dis_9-1ptq_-test_2008-5-6_3222_268_0_0 so the task finished virtually immediately after restart. When I switched my computer on yesterday morning, I also had some task crunching at 0%. Back then I believed an old task had been restarted from the beginning due to some fluke, but now it seems more likely that the same thing happened as today. To me, it seems too much of a coincidence for a task interrupted in the middle to finish immediately after resuming, twice in a row. |
Betting Slip Send message Joined: 26 Sep 05 Posts: 71 Credit: 5,702,246 RAC: 0 |
Really All access violations https://boinc.bakerlab.org/rosetta/result.php?resultid=161740698 https://boinc.bakerlab.org/rosetta/result.php?resultid=160201341 https://boinc.bakerlab.org/rosetta/result.php?resultid=159794241 https://boinc.bakerlab.org/rosetta/result.php?resultid=160129454 https://boinc.bakerlab.org/rosetta/result.php?resultid=160185394 https://boinc.bakerlab.org/rosetta/result.php?resultid=161332559 https://boinc.bakerlab.org/rosetta/result.php?resultid=159408171 |
Rom Walton (BOINC) Volunteer moderator Project developer Send message Joined: 17 Sep 05 Posts: 18 Credit: 40,071 RAC: 0 |
All those crashes are a result of an out of memory error. ----- Rom My Blog |
Ian_D Send message Joined: 21 Sep 05 Posts: 55 Credit: 4,216,173 RAC: 0 |
My latest weirdness <core_client_version>5.10.30</core_client_version> <![CDATA[ <message> Maximum memory exceeded </message> ]]> resultid=161607307 |
Quidgydog Send message Joined: 28 Sep 06 Posts: 3 Credit: 499,462 RAC: 0 |
Having exactly the same issue as I was having with the v1.15 WU. WU just sits there, CPU time not running, no progress. Log file...... Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x7C82A714 read attempt to address 0x00D767E5 Engaging BOINC Windows Runtime Debugger... I'm detaching this computer until this is resolved. |
Betting Slip Send message Joined: 26 Sep 05 Posts: 71 Credit: 5,702,246 RAC: 0 |
With 4Gb of memory what do I do to put it right? |
Pepo Send message Joined: 28 Sep 05 Posts: 115 Credit: 101,358 RAC: 0 |
All those crashes are a result of an out of memory error. You could one day run out of memory even with 64 GB of RAM... (Do you know the sentence about 64 KB of RAM?) How much pagefile do you have available there? Any other memory load, like other projects' applications, preempted and waiting in memory? Take an occasional look at Task Manager, Performance tab: what are the Commit Charge values like? If the Total (or Peak) ever reaches the Limit, that's it. You're running at least 7 projects on the host; each Rosetta task can require up to 600-900 MB, CPDN at least some 200-300 MB, other projects need something as well, and it is a quad... Peter |
Betting Slip Send message Joined: 26 Sep 05 Posts: 71 Credit: 5,702,246 RAC: 0 |
All those crashes are a result of an out of memory error. Yes, I understand, but my commit charge is a fraction of my available charge, 10% at the moment. I have increased my page file to 6GB with a total memory of 4GB on Win XP Pro 64. It just strikes me that the very knowledgeable Rom is arrogant enough to point to the cause without indicating any sort of a solution. |
alpha Send message Joined: 4 Nov 06 Posts: 27 Credit: 1,550,107 RAC: 0 |
This work unit finished earlier than expected, but with no errors: https://boinc.bakerlab.org/rosetta/result.php?resultid=161362748 Claimed 130.48, granted 32.86. :( |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Fat Loss, I'm guessing that the error is an indication that the task grew to exceed the maximum memory it was configured for, and so was terminated by BOINC. And so, regardless of your machine's physical configuration or % memory used to BOINC etc. etc. it still would have failed. So that would tend to indicate a logic problem in Mini, or perhaps a task that should be created with a higher memory maximum allowed. We'll have to wait to see what DK finds. Rosetta Moderator: Mod.Sense |
Rom Walton (BOINC) Volunteer moderator Project developer Send message Joined: 17 Sep 05 Posts: 18 Credit: 40,071 RAC: 0 |
In this particular case there isn't anything that any of us can do; I've passed the info on to the MiniRosetta devs. Basically MiniRosetta is a 32-bit process, and generally 32-bit processes are limited to 2GB of user-mode memory. MiniRosetta hit that limit, so when it asked for more the OS said NO, leading to the crash. The sign that this sort of problem has occurred is the pair of lines "LoadLibraryA( dbghelp.dll ): GetLastError = 8" and "- Virtual Memory Usage -" in the debugger output. Sorry for not explaining the situation sooner; I was heading for bed and I started thinking about how I was going to help the devs debug this problem in the wild if they are unable to reproduce this issue in the lab. At present there isn't anything in the BOINC application framework that'll help them debug this in the wild. ----- Rom My Blog |
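
For anyone sorting through their own failed results, the three failure signatures mentioned in this thread can be told apart with a quick text scan of the task's stderr output. The following is only a rough sketch in Python: the function name `classify_stderr` and the category labels are made up for illustration, and treating the quoted strings as reliable markers is an assumption based solely on the posts above, not on anything documented by BOINC or the Rosetta developers.

```python
import re

# Marker strings quoted in this thread. Treating them as reliable
# signatures is an assumption, not documented BOINC behaviour.
OOM_MARKERS = (
    "LoadLibraryA( dbghelp.dll ): GetLastError = 8",
    "- Virtual Memory Usage -",
)
ACCESS_VIOLATION = re.compile(
    r"Access Violation \((0x[0-9a-fA-F]+)\) at address (0x[0-9a-fA-F]+)"
)

# Usual user-mode address space of a 32-bit Windows process (2GB).
USER_MODE_LIMIT_BYTES = 2 * 1024**3


def classify_stderr(text: str) -> str:
    """Return a rough, hypothetical label for a failed task's stderr text."""
    # BOINC stopped the task because it exceeded its configured maximum.
    if "Maximum memory exceeded" in text:
        return "boinc_memory_limit"
    # Both markers present: the process likely ran out of address space.
    # Checked before the generic access violation, which also appears here.
    if all(marker in text for marker in OOM_MARKERS):
        return "address_space_exhausted"
    if ACCESS_VIOLATION.search(text):
        return "access_violation"
    return "unknown"


if __name__ == "__main__":
    sample = (
        "Reason: Access Violation (0xc0000005) at address 0x005C3051 "
        "write attempt to address 0x00000024"
    )
    print(classify_stderr(sample))
```

The out-of-memory check runs before the plain access-violation check because, going by the results linked above, a crash caused by address space exhaustion is also reported as an access violation.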
©2024 University of Washington
https://www.bakerlab.org