Message boards : Number crunching : Problems with Rosetta version 5.85 (or 5.86 for linux)
Author | Message |
---|---|
Michael G.R. Joined: 11 Nov 05 Posts: 264 Credit: 11,247,510 RAC: 0 |
I seem to have a memory leak with Rosetta Beta 5.85. When the work-unit was processing, my system showed the process using 168MB of memory and total memory utilization of 1.3GB. When I "suspended" it, my total memory utilization dropped to 800MB. I think I had something similar yesterday. I'd check now, but I'm running a 5.82 unit that doesn't have that problem... |
Mikkie Joined: 26 Nov 07 Posts: 2 Credit: 2,770 RAC: 0 |
Rosetta Beta 5.85 WUs are using lots of memory and making my PCs crash. I am cancelling all 5.85 jobs. Where cancelling is not possible, Rosetta will be shut off until this problem is solved. When running those I also use way too much VM [1.3 GB]. I've got 12 beta 5.85 WUs left. What to do? Delete and wait for better times, or suspend them [deadlines? on the horizon] till they have a fix for it? |
MattDavis Joined: 22 Sep 05 Posts: 206 Credit: 1,377,748 RAC: 0 |
Rosetta Beta 5.85 WUs are using lots of memory and making my PCs crash. I am cancelling all 5.85 jobs. Where cancelling is not possible, Rosetta will be shut off until this problem is solved. If you have enough VM then just let them run. That's what I did. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
What to do? Delete and wait for better times or suspend [deadlines? on the horizon] them till they have a fix for it? Unless it is impacting your machine or your user experience, just letting them run is the simplest solution. And certain tasks take more memory than others, so you can't presume all 5.85 tasks will try to use that much memory. They are running against different proteins, using different approaches, and it is only some specific combinations of the two that expose the memory issues. Just to be clear, waiting for a fix is not one of the options. The way BOINC works, the programs that will be used to run a given task are defined at the time the task is created on the server. So, any fix could only help future tasks, not those already out. Rosetta Moderator: Mod.Sense |
AM Joined: 15 Jul 06 Posts: 7 Credit: 522,822 RAC: 6 |
How can you pick which version of Rosetta to run? E.g. 5.82 vs. 5.85 |
David Emigh Joined: 13 Mar 06 Posts: 158 Credit: 417,178 RAC: 0 |
How can you pick which version of Rosetta to run? E.g. 5.82 vs. 5.85 You can't. When you connect to the project, your computer is assigned tasks from either of the two active application versions, based on the needs of the project and the attributes (mostly memory) of your computer. Rosie, Rosie, she's our gal, If she can't do it, no one shall! |
Keith T. Joined: 1 Mar 07 Posts: 58 Credit: 34,135 RAC: 0 |
My last 2 WU's have ended in Compute Errors after significant run times. The second one went 90 minutes past my requested run time. https://boinc.bakerlab.org/rosetta/result.php?resultid=122980874 https://boinc.bakerlab.org/rosetta/result.php?resultid=122892647 Task ID 122892647 Name CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_2428442_0 Workunit 111712477 Created 25 Nov 2007 3:50:14 UTC Sent 25 Nov 2007 5:56:18 UTC Received 27 Nov 2007 6:36:15 UTC Server state Over Outcome Client error Client state Compute error Exit status -1073741819 (0xc0000005) Computer ID 428259 Report deadline 5 Dec 2007 5:56:18 UTC CPU time 5829.65625 stderr out <core_client_version>5.10.7</core_client_version> <![CDATA[ <message> - exit code -1073741819 (0xc0000005) </message> <stderr_txt> # cpu_run_time_pref: 7200 # random seed: 850599 # cpu_run_time_pref: 7200 Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x7C9105F8 read attempt to address 0x00150010 Engaging BOINC Windows Runtime Debugger... Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x7C9105F8 read attempt to address 0x00150010 Engaging BOINC Windows Runtime Debugger... </stderr_txt> ]]> Validate state Invalid Claimed credit 15.6741180286706 Granted credit 0 application version 5.82 ------------------------------------------- Task ID 122980874 Name 1gidA_BOINC_RNA_ABINITIO_SAVE_ALL_OUT_RNA_CONTACT_RNA_LONG_RANGE_CONTACT_RNA_SASA-1gidA-_2330_11990_0 Workunit 111793856 Created 25 Nov 2007 12:57:46 UTC Sent 25 Nov 2007 15:12:19 UTC Received 28 Nov 2007 15:25:02 UTC Server state Over Outcome Client error Client state Compute error Exit status 1 (0x1) Computer ID 428259 Report deadline 5 Dec 2007 15:12:19 UTC CPU time 12625.0625 stderr out <core_client_version>5.10.7</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # cpu_run_time_pref: 7200 # random seed: 1787552 # cpu_run_time_pref: 7200 # random seed: 1787552 # cpu_run_time_pref: 7200 # random seed: 1787552 # cpu_run_time_pref: 7200 # random seed: 1787552 # cpu_run_time_pref: 7200 # cpu_run_time_pref: 7200 # random seed: 1787552 # cpu_run_time_pref: 7200 # random seed: 1787552 # cpu_run_time_pref: 7200 # random seed: 1787552 # cpu_run_time_pref: 7200 # random seed: 1787552 ERROR:: Exit from: .pose.cc line: 3910 </stderr_txt> ]]> Validate state Invalid Claimed credit 33.8586815867762 Granted credit 0 application version 5.85 |
David Emigh Joined: 13 Mar 06 Posts: 158 Credit: 417,178 RAC: 0 |
{...} Some of the current workunits have HUGE models that take hours to generate a single decoy. When the manager completes each decoy it takes a look at the amount of time left in the runtime preference then takes its best guess as to whether to start another model or terminate the run. I had a success recently that went about 270 minutes under my requested run time: WU 111804144 This is not a killer computer, but it's no slouch either (Athlon64x2 3800+, 3GB RAM). Still, it took an average of nearly 5 hours to generate each decoy. When it completed the 4th decoy, the BOINC manager called it a day rather than start on the 5th. I realize this reply does not address the question of why these WU's failed, and I apologize for that shortcoming. Rosie, Rosie, she's our gal, If she can't do it, no one shall! |
Dr Who Fan Joined: 28 May 06 Posts: 70 Credit: 268,055 RAC: 300 |
they just keep crashing... https://boinc.bakerlab.org/rosetta/result.php?resultid=122996820 Exit status 255 (0xff) CPU time 1.125 stderr out <core_client_version>5.10.28</core_client_version> <![CDATA[ <message> The extended attributes are inconsistent. (0xff) - exit code 255 (0xff) </message> <stderr_txt> # cpu_run_time_pref: 7200 </stderr_txt> ]]> |
Rhiju Volunteer moderator Joined: 8 Jan 06 Posts: 223 Credit: 3,546 RAC: 0 |
Sorry, these "1gid" workunits have been canceled... looks like there are a few particular platforms where they consistently crash. Thanks for posting! |
Luuklag Joined: 13 Sep 07 Posts: 262 Credit: 4,171 RAC: 0 |
Sorry, these "1gid" workunits have been canceled... looks like there are a few particular platforms where they consistently crash. Thanks for posting! so abort 1gid WU's or just try? |
David Emigh Joined: 13 Mar 06 Posts: 158 Credit: 417,178 RAC: 0 |
so abort 1gid WU's or just try? For myself, since my system isn't crashing on these, I will continue to run whatever is in my queue. Rosie, Rosie, she's our gal, If she can't do it, no one shall! |
kratko Joined: 16 Aug 06 Posts: 2 Credit: 338,009 RAC: 0 |
Since I'm running it on a production server, I can't afford to load it any more than I did recently. I switched to seti@home for this machine (2x3,6 GHz hyperthreading, which really computes faster than two physical CPUs). Just to let you know... |
Ed and Harriet Griffith Joined: 17 Sep 05 Posts: 39 Credit: 1,901,974 RAC: 727 |
When I try to connect I get, "11/29/2007 11:35:29 AM|rosetta@home|Message from server: Project encountered internal error: shared memory" |
AM Joined: 15 Jul 06 Posts: 7 Credit: 522,822 RAC: 6 |
I see things are back up. I hope the next version is a little easier on the page file. I like using R@H, but it's sapping resources left and right. |
Kao_Valin Joined: 26 Jun 07 Posts: 2 Credit: 693,586 RAC: 0 |
I've never had a problem until the new version 5.85. It keeps crashing my system. I've got R@H running on all four cores at 100% with 2GB RAM, but it still decided to use over 2GB of paging file. It is ridiculous how much paging file this is using. I was pulling well over 2100 credit/day; now that number is going into the toilet. It'll crash and won't get restarted again for days (I'm not home during the week). You guys are losing countless hours of WU with this buggy hog you've unleashed on us. |
Gary D Joined: 5 Jun 06 Posts: 2 Credit: 11,265 RAC: 0 |
After reading some posts, I see I am having the same problem with 5.85: C++ errors, virtual memory, system wants to terminate in an unusual way. I'm running AVG antivirus and it is also having problems with this app. What's going on? I ran R@H for several years and never had any problems. When are you guys going to get this fixed? I aborted several units, but I read that I need to just suspend, so I'll be doing that for a while until a workaround is forthcoming. Hope it's soon, because I like to keep this project running as long as I can without my computer crashing. I just have 256MB of RAM (I know I need a new computer), but there are lots of us out there who just don't have the money for an upgrade right now. |
TA_JC Joined: 7 Nov 05 Posts: 13 Credit: 6,953,387 RAC: 1,846 |
I'm baffled here. I've had maybe 2 WUs exit with the "unusual way" message lately, but since the project came back up the other day I can't seem to complete any WUs on my main machine. My other machines aren't having any problems at all. What gives??? |
TA_JC Joined: 7 Nov 05 Posts: 13 Credit: 6,953,387 RAC: 1,846 |
I'm baffled here. I've had maybe 2 WUs exit with the "unusual way" message lately, but since the project came back up the other day I can't seem to complete any WUs on my main machine. My other machines aren't having any problems at all. What gives??? Finally got some 5.82s, dumped the rest of the 5.85s, and am crunching again :) |
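
Editor's note on the exit codes in Keith T.'s post: the stderr output reports the same status twice, once as a signed decimal (-1073741819) and once as hex (0xc0000005). The two are the same 32-bit value, and 0xC0000005 is the Windows status code for an access violation, which matches the "Access Violation" line in the log. A minimal Python sketch of that conversion follows; the helper names `to_unsigned32` and `describe` and the `KNOWN_STATUS` table are illustrative only and are not part of BOINC or Rosetta.

```python
def to_unsigned32(exit_code: int) -> int:
    """Reinterpret a signed 32-bit exit status as its unsigned value."""
    return exit_code & 0xFFFFFFFF


# Illustrative lookup table (hypothetical, deliberately tiny).
# 0xC0000005 is the Windows NTSTATUS value STATUS_ACCESS_VIOLATION.
KNOWN_STATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def describe(exit_code: int) -> str:
    """Format an exit status the way the log does: hex, plus a name if known."""
    value = to_unsigned32(exit_code)
    name = KNOWN_STATUS.get(value, "unknown")
    return f"{value:#x} ({name})"


if __name__ == "__main__":
    # Exit statuses quoted in this thread.
    for code in (-1073741819, 1, 255):
        print(code, "->", describe(code))
```

The other two statuses in the thread (1 and 255) are small positive values, so they come through unchanged as 0x1 and 0xff; the table above does not attempt to name them.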
©2024 University of Washington
https://www.bakerlab.org