Problems with Rosetta version 5.85 (or 5.86 for linux)

Michael G.R.

Joined: 11 Nov 05
Posts: 264
Credit: 11,247,510
RAC: 0
Message 49108 - Posted: 27 Nov 2007, 19:44:58 UTC - in response to Message 49101.  

I seem to have a memory leak with Rosetta Beta 5.85. When the work-unit was processing, my system showed the process using 168MB of memory and total memory utilization of 1.3GB. When I "suspended" it, my total memory utilization dropped to 800MB.


I think I had something similar yesterday. I'd check now, but I'm running a 5.82 unit that doesn't have that problem...

Mikkie
Joined: 26 Nov 07
Posts: 2
Credit: 2,770
RAC: 0
Message 49124 - Posted: 28 Nov 2007, 12:56:48 UTC - in response to Message 49091.  
Last modified: 28 Nov 2007, 13:06:27 UTC

Rosetta Beta 5.85 WUs are using lots of memory and making my PCs crash. I am cancelling all 5.85 jobs. If they can't be cancelled, Rosetta will be shut off until this problem is solved.

When running those I also use way too much VM [1.3 GB]. I've got 12 beta 5.85 WUs left. What to do? Delete them and wait for better times, or suspend them [deadlines on the horizon?] until there is a fix?

MattDavis
Joined: 22 Sep 05
Posts: 206
Credit: 1,377,748
RAC: 0
Message 49125 - Posted: 28 Nov 2007, 13:12:38 UTC - in response to Message 49124.  

Rosetta Beta 5.85 WUs are using lots of memory and making my PCs crash. I am cancelling all 5.85 jobs. If they can't be cancelled, Rosetta will be shut off until this problem is solved.

When running those I also use way too much VM [1.3 GB]. I've got 12 beta 5.85 WUs left. What to do? Delete them and wait for better times, or suspend them [deadlines on the horizon?] until there is a fix?


If you have enough VM then just let them run. That's what I did.

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 49127 - Posted: 28 Nov 2007, 14:18:34 UTC - in response to Message 49124.  

What to do? Delete them and wait for better times, or suspend them [deadlines on the horizon?] until there is a fix?


Unless it is impacting your machine or your user experience, just letting them run is the simplest solution. And certain tasks take more memory than others, so you can't presume all 5.85 tasks will try to use that much memory. They are running against different proteins, using different approaches, and it is only some specific combinations of the two that expose the memory issues.

Just to be clear, waiting for a fix is not one of the options. The way BOINC works, the programs that will be used to run a given task are defined at the time the task is created on the server. So, any fix could only help future tasks, not those already out.
Rosetta Moderator: Mod.Sense

AM
Joined: 15 Jul 06
Posts: 7
Credit: 522,822
RAC: 6
Message 49128 - Posted: 28 Nov 2007, 14:46:06 UTC

How can you pick which version of Rosetta to run? E.g. 5.82 vs. 5.85

David Emigh
Joined: 13 Mar 06
Posts: 158
Credit: 417,178
RAC: 0
Message 49129 - Posted: 28 Nov 2007, 15:05:30 UTC - in response to Message 49128.  

How can you pick which version of Rosetta to run? E.g. 5.82 vs. 5.85


You can't.

When you connect to the project, your computer is assigned tasks from either of the two active application versions, based on the needs of the project and the attributes (mostly memory) of your computer.
Rosie, Rosie, she's our gal,
If she can't do it, no one shall!

Keith T.
Joined: 1 Mar 07
Posts: 58
Credit: 34,135
RAC: 0
Message 49131 - Posted: 28 Nov 2007, 15:42:56 UTC
Last modified: 28 Nov 2007, 15:44:51 UTC

My last 2 WU's have ended in Compute Errors after significant run times. The second one went 90 minutes past my requested run time.

https://boinc.bakerlab.org/rosetta/result.php?resultid=122980874
https://boinc.bakerlab.org/rosetta/result.php?resultid=122892647

Task ID 122892647
Name CNTRL_01ABRELAX_SAVE_ALL_OUT_-1ubi_-_filters_1782_2428442_0
Workunit 111712477
Created 25 Nov 2007 3:50:14 UTC
Sent 25 Nov 2007 5:56:18 UTC
Received 27 Nov 2007 6:36:15 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status -1073741819 (0xc0000005)
Computer ID 428259
Report deadline 5 Dec 2007 5:56:18 UTC
CPU time 5829.65625
stderr out <core_client_version>5.10.7</core_client_version>
<![CDATA[
<message>
- exit code -1073741819 (0xc0000005)
</message>
<stderr_txt>
# cpu_run_time_pref: 7200
# random seed: 850599
# cpu_run_time_pref: 7200


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x7C9105F8 read attempt to address 0x00150010

Engaging BOINC Windows Runtime Debugger...



Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x7C9105F8 read attempt to address 0x00150010

Engaging BOINC Windows Runtime Debugger...


</stderr_txt>
]]>
Validate state Invalid
Claimed credit 15.6741180286706
Granted credit 0
application version 5.82

-------------------------------------------

Task ID 122980874
Name 1gidA_BOINC_RNA_ABINITIO_SAVE_ALL_OUT_RNA_CONTACT_RNA_LONG_RANGE_CONTACT_RNA_SASA-1gidA-_2330_11990_0
Workunit 111793856
Created 25 Nov 2007 12:57:46 UTC
Sent 25 Nov 2007 15:12:19 UTC
Received 28 Nov 2007 15:25:02 UTC
Server state Over
Outcome Client error
Client state Compute error
Exit status 1 (0x1)
Computer ID 428259
Report deadline 5 Dec 2007 15:12:19 UTC
CPU time 12625.0625
stderr out <core_client_version>5.10.7</core_client_version>
<![CDATA[
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# cpu_run_time_pref: 7200
# random seed: 1787552
# cpu_run_time_pref: 7200
# random seed: 1787552
# cpu_run_time_pref: 7200
# random seed: 1787552
# cpu_run_time_pref: 7200
# random seed: 1787552
# cpu_run_time_pref: 7200
# cpu_run_time_pref: 7200
# random seed: 1787552
# cpu_run_time_pref: 7200
# random seed: 1787552
# cpu_run_time_pref: 7200
# random seed: 1787552
# cpu_run_time_pref: 7200
# random seed: 1787552
ERROR:: Exit from: .pose.cc line: 3910

</stderr_txt>
]]>
Validate state Invalid
Claimed credit 33.8586815867762
Granted credit 0
application version 5.85
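
A note on reading the first exit status above: -1073741819 and 0xc0000005 are the same 32-bit value, printed once as a signed integer and once as unsigned hex. A minimal Python sketch of the conversion follows. The helper name `to_unsigned_hex` is made up for illustration; it is not part of BOINC or Rosetta.

```python
def to_unsigned_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex."""
    # Masking with 0xFFFFFFFF maps negative values onto their
    # two's-complement unsigned 32-bit equivalent.
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


# Exit statuses copied from the task reports in this thread.
print(to_unsigned_hex(-1073741819))  # 0xc0000005
print(to_unsigned_hex(1))            # 0x00000001
print(to_unsigned_hex(255))          # 0x000000ff
```

The first value matches the "Access Violation (0xc0000005)" line in the stderr output of task 122892647.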

David Emigh
Joined: 13 Mar 06
Posts: 158
Credit: 417,178
RAC: 0
Message 49132 - Posted: 28 Nov 2007, 16:30:28 UTC - in response to Message 49131.  
Last modified: 28 Nov 2007, 16:48:45 UTC

{...}
The second one went 90 minutes past my requested run time.
{...}


Some of the current workunits have HUGE models that take hours to generate a single decoy.

When the manager completes each decoy it takes a look at the amount of time left in the runtime preference then takes its best guess as to whether to start another model or terminate the run.

I had a success recently that went about 270 minutes under my requested run time: WU 111804144

This is not a killer computer, but it's no slouch either (Athlon64x2 3800+, 3GB RAM). Still, it took an average of nearly 5 hours to generate each decoy. When it completed the 4th decoy, the BOINC manager called it a day rather than start on the 5th.
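
To make the idea concrete, here is a rough sketch of that kind of check in Python. It is only an assumption about the general shape of the logic, not the actual Rosetta or BOINC code: the function name, the "average of completed decoys" estimate, and the 1440-minute preference used in the example are all hypothetical.

```python
def should_start_another_decoy(elapsed_min: float,
                               decoys_done: int,
                               target_min: float) -> bool:
    """Guess whether one more decoy fits inside the requested run time."""
    if decoys_done == 0:
        # Nothing to estimate from yet, so the first decoy is always attempted.
        return True
    # Assumed estimate: the next decoy takes as long as the average so far.
    avg_decoy_min = elapsed_min / decoys_done
    return elapsed_min + avg_decoy_min <= target_min


# Illustrative numbers only: decoys averaging 292.5 minutes each,
# checked against a hypothetical 1440-minute run time preference.
print(should_start_another_decoy(877.5, 3, 1440))   # True
print(should_start_another_decoy(1170.0, 4, 1440))  # False
```

With these made-up numbers the run stops after the 4th decoy, 270 minutes short of the preference. The same check also shows how a run can overshoot: the first decoy is always attempted, however long it ends up taking.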

I realize this reply does not address the question of why these WU's failed, and I apologize for that shortcoming.
Rosie, Rosie, she's our gal,
If she can't do it, no one shall!

Dr Who Fan
Joined: 28 May 06
Posts: 70
Credit: 268,007
RAC: 313
Message 49135 - Posted: 28 Nov 2007, 16:58:03 UTC
Last modified: 28 Nov 2007, 16:59:15 UTC

they just keep crashing...

https://boinc.bakerlab.org/rosetta/result.php?resultid=122996820

Exit status 255 (0xff)
CPU time 1.125
stderr out

<core_client_version>5.10.28</core_client_version>
<![CDATA[
<message>
The extended attributes are inconsistent. (0xff) - exit code 255 (0xff)
</message>
<stderr_txt>
# cpu_run_time_pref: 7200

</stderr_txt>
]]>

Rhiju
Volunteer moderator
Joined: 8 Jan 06
Posts: 223
Credit: 3,546
RAC: 0
Message 49146 - Posted: 28 Nov 2007, 19:44:44 UTC - in response to Message 49135.  
Last modified: 28 Nov 2007, 19:45:08 UTC

Sorry, these "1gid" workunits have been canceled... looks like there are a few particular platforms where they consistently crash. Thanks for posting!


Luuklag
Joined: 13 Sep 07
Posts: 262
Credit: 4,171
RAC: 0
Message 49149 - Posted: 28 Nov 2007, 20:56:14 UTC - in response to Message 49146.  

Sorry, these "1gid" workunits have been canceled... looks like there are a few particular platforms where they consistently crash. Thanks for posting!


So, abort the 1gid WUs or just try them?

David Emigh
Joined: 13 Mar 06
Posts: 158
Credit: 417,178
RAC: 0
Message 49150 - Posted: 28 Nov 2007, 21:01:42 UTC - in response to Message 49149.  

So, abort the 1gid WUs or just try them?


For myself, since my system isn't crashing on these, I will continue to run whatever is in my queue.

Rosie, Rosie, she's our gal,
If she can't do it, no one shall!

kratko
Joined: 16 Aug 06
Posts: 2
Credit: 338,009
RAC: 0
Message 49167 - Posted: 29 Nov 2007, 0:14:18 UTC

Since I'm running it on a production server, I can't afford to load it more than I did recently. I switched to seti@home for this machine (2 x 3.6 GHz with hyperthreading, which really computes faster than two physical CPUs).
Just to let you know...

Ed and Harriet Griffith
Joined: 17 Sep 05
Posts: 39
Credit: 1,901,974
RAC: 727
Message 49225 - Posted: 29 Nov 2007, 16:42:41 UTC

When I try to connect I get, "11/29/2007 11:35:29 AM|rosetta@home|Message from server: Project encountered internal error: shared memory"



AM
Joined: 15 Jul 06
Posts: 7
Credit: 522,822
RAC: 6
Message 49238 - Posted: 30 Nov 2007, 20:50:10 UTC

I see things are back up. I hope the next version is a little easier on the page file. I like using R@H, but it's sapping resources left and right.

Kao_Valin
Joined: 26 Jun 07
Posts: 2
Credit: 693,586
RAC: 0
Message 49256 - Posted: 1 Dec 2007, 2:43:14 UTC
Last modified: 1 Dec 2007, 2:45:25 UTC

I've never had a problem until the new version 5.85. It keeps crashing my system. I've got R@H running on all four cores at 100% with 2GB RAM, but it still decided to use over 2GB of paging file. It is ridiculous how much paging file this is using.

I was pulling well over 2100 credit/day; now that number is going into the toilet. It'll crash and won't get restarted again for days (I'm not home during the week). You guys are losing countless hours of WU with this buggy hog you've unleashed on us.

Gary D
Joined: 5 Jun 06
Posts: 2
Credit: 11,265
RAC: 0
Message 49258 - Posted: 1 Dec 2007, 4:19:15 UTC

After reading some posts, I see I am having the same problems with 5.85: C++ errors, virtual memory, the system wants to terminate in an unusual way. I'm running AVG antivirus and it is also having problems with this app. What's going on? I ran R@H for several years and never had any problems. When are you guys going to get this fixed? I aborted several units, but I read that I need to just suspend them, so I'll be doing that for a while until a workaround is forthcoming. Hope it's soon, because I like to keep this project running as long as I can without my computer crashing. I only have 256 MB of RAM (I know I need a new computer), but there are lots of us out there who just don't have the money for an upgrade right now.

TA_JC
Joined: 7 Nov 05
Posts: 13
Credit: 6,953,387
RAC: 1,846
Message 49275 - Posted: 1 Dec 2007, 15:53:14 UTC

I'm baffled here. I've had maybe 2 WUs exit with the "unusual way" message lately, but since the project came back up the other day I can't seem to complete any WUs on my main machine. My other machines aren't having any problems at all. What gives???

TA_JC
Joined: 7 Nov 05
Posts: 13
Credit: 6,953,387
RAC: 1,846
Message 49309 - Posted: 2 Dec 2007, 1:25:54 UTC - in response to Message 49275.  

I'm baffled here. I've had maybe 2 WUs exit with the "unusual way" message lately, but since the project came back up the other day I can't seem to complete any WUs on my main machine. My other machines aren't having any problems at all. What gives???



Finally got some 5.82s, dumped the rest of the 5.85s, and am crunching again :)