Problems with Rosetta version 5.80

Message boards : Number crunching : Problems with Rosetta version 5.80

Ingemar
Joined: 28 Feb 06
Posts: 20
Credit: 1,680
RAC: 0
Message 46153 - Posted: 13 Sep 2007, 20:58:05 UTC

Please report problems with this version. Thanks!

Jmarks
Joined: 16 Jul 07
Posts: 132
Credit: 98,025
RAC: 0
Message 46198 - Posted: 14 Sep 2007, 15:29:22 UTC

No it isn't.
104430949 94762978 9 Sep 2007
Jmarks

DJStarfox
Joined: 19 Jul 07
Posts: 145
Credit: 1,250,162
RAC: 0
Message 46202 - Posted: 14 Sep 2007, 16:12:51 UTC - in response to Message 46153.  

Please report problems with this version. Thanks!


5.80 needs a lot more memory than previous Betas. BOINC says waiting for memory on a 512MB linux system with 2 CPUs. This did not happen on previous versions of Rosetta. Is this a permanent change? One task runs but the other (second set of threads) below is waiting for memory.

%CPU %MEM VSZ RSS STAT START TIME COMMAND
100 43.5 356264 224188 RN 10:39 87:56 rosetta_beta_5.80_i686-pc-linux-gnu
0.0 43.5 356264 224188 SN 10:39 0:00 rosetta_beta_5.80_i686-pc-linux-gnu
0.0 43.5 356264 224188 SN 10:39 0:00 rosetta_beta_5.80_i686-pc-linux-gnu
0.0 43.5 356264 224188 SN 10:39 0:00 rosetta_beta_5.80_i686-pc-linux-gnu

0.1 37.7 320764 194128 SN 10:39 0:06 rosetta_beta_5.80_i686-pc-linux-gnu
0.0 37.7 320764 194128 SN 10:39 0:00 rosetta_beta_5.80_i686-pc-linux-gnu
0.0 37.7 320764 194128 SN 10:39 0:00 rosetta_beta_5.80_i686-pc-linux-gnu
0.0 37.7 320764 194128 SN 10:39 0:00 rosetta_beta_5.80_i686-pc-linux-gnu
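
A rough way to read the listing above, as a minimal sketch: it assumes the VSZ and RSS columns are in KiB (the usual ps convention) and that rows with identical values are threads of one task sharing the same memory, so each task is counted once. The function names are made up for illustration.

```python
# (VSZ, RSS) pairs in KiB, copied from the eight rows of the listing above.
PS_ROWS = [(356264, 224188)] * 4 + [(320764, 194128)] * 4


def distinct_footprints(rows):
    """Collapse rows with identical (VSZ, RSS) values.

    Assumption: identical rows are threads of one task sharing one
    address space, so their memory should only be counted once.
    """
    unique = []
    for row in rows:
        if row not in unique:
            unique.append(row)
    return unique


def total_rss_mib(rows):
    """Sum the resident set size of the distinct tasks, in MiB."""
    return sum(rss for _vsz, rss in distinct_footprints(rows)) / 1024


tasks = distinct_footprints(PS_ROWS)
total_mib = total_rss_mib(PS_ROWS)
print(f"{len(tasks)} tasks, about {total_mib:.0f} MiB resident")
```

Under those assumptions the two tasks together hold roughly 80% of a 512 MB machine (the %MEM column shows 43.5 and 37.7), which would fit the "waiting for memory" message.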


Wits End
Joined: 16 Apr 07
Posts: 4
Credit: 29,477
RAC: 0
Message 46203 - Posted: 14 Sep 2007, 16:23:08 UTC - in response to Message 46153.  
Last modified: 14 Sep 2007, 16:25:52 UTC

First two WUs under v5.80: first validated (95717876), second failed (95776999)!

David Emigh
Joined: 13 Mar 06
Posts: 158
Credit: 417,178
RAC: 0
Message 46221 - Posted: 14 Sep 2007, 19:17:29 UTC - in response to Message 46203.  

First two WUs under v5.80: first validated (95717876), second failed (95776999)!


Perhaps it is only coincidence, but I notice that the failed WU was a Capri WU, the successful WU was not...

Rosie, Rosie, she's our gal,
If she can't do it, no one shall!

Rayburner
Joined: 4 Oct 05
Posts: 32
Credit: 16,518,823
RAC: 0
Message 46222 - Posted: 14 Sep 2007, 19:18:01 UTC

Hi!

Two validate errors lately. Is there a special reason for that?

https://boinc.bakerlab.org/rosetta/result.php?resultid=105570644

https://boinc.bakerlab.org/rosetta/result.php?resultid=104716132

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 46223 - Posted: 14 Sep 2007, 19:21:39 UTC
Last modified: 14 Sep 2007, 19:21:57 UTC

I moved Rayburner's post here. One of those was 5.78; the other was 5.80.
Rosetta Moderator: Mod.Sense

anders n
Joined: 19 Sep 05
Posts: 403
Credit: 537,991
RAC: 0
Message 46224 - Posted: 14 Sep 2007, 20:10:13 UTC

Validate error on this Wu.

Anders n

Mark Henderson
Joined: 24 May 06
Posts: 9
Credit: 643,001
RAC: 0
Message 46230 - Posted: 14 Sep 2007, 22:22:11 UTC
Last modified: 14 Sep 2007, 22:24:49 UTC

I had a compute error today on 5.80 and a watchdog termination on another task yesterday using 5.78, both on my AMD X2 4800. I have run Rosetta for a long time, and these are the first two errors I remember.

The_Bad_Penguin
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 46234 - Posted: 14 Sep 2007, 23:22:15 UTC
Last modified: 14 Sep 2007, 23:25:45 UTC

Here we go again... (1he8__BOINC_CAPRI14_DOCK_FIXBACKBONE_POSE_LOOPS-1he8_-plexinmonomer__2083_1421_0)

I did look at the screen at about 3 hours... I think it said model 1, step 513; the percentage indicator was at 95.9x% - 96.xx% and increasing. Nothing was visibly moving in any of the graphic representations.

Watchdog shut down...

~60+ credits requested for ~4 hours on a single core of a Core2Quad, 20 credits granted...

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 46239 - Posted: 15 Sep 2007, 1:06:06 UTC - in response to Message 46234.  

~60+ credits requested for ~4 hours on a single core of a Core2Quad, 20 credits granted...


OK, great! I'm glad you were able to catch one. Assuming that others behave the same way (a bit of a stretch with only a single one observed, but it's all we have to go by)... the fact that it is still on model one is the reason why the task fails and only 20 credits are granted.

If you had completed several models, then (at least by design, to my knowledge) those completed results would be reported back and credit issued for them. So one of my outstanding questions was, "Is the partial reporting of tasks that run for a while and then fail working properly?" Based on your observation, it sounds like it is working as well as I would have expected. But the long-running single models are basically exhibiting a worst-case scenario where extensive time is spent and only 20 credits are issued.

Wow, your task shows the score was stuck for 1,800 seconds. I take it Rhiju has increased the timeout for the watchdog.
Rosetta Moderator: Mod.Sense

Ingemar
Joined: 28 Feb 06
Posts: 20
Credit: 1,680
RAC: 0
Message 46241 - Posted: 15 Sep 2007, 1:52:54 UTC

It appears that some of the Capri docking runs get stuck and are terminated by the watchdog. The watchdog seems to do its job; the problem seems to be the simulations. This is the first time we have run large-scale tests on some new simulation modes, and we will have to analyze why some runs get stuck or crash. CAPRI (Critical Assessment of PRediction of Interactions) is a competition where we try to predict the structure of protein-protein complexes. We have a deadline for submitting our models to this competition coming up soon, and that's why you see so many Capri-something jobs. They will soon be out of the queue.

And yes we did increase the watchdog timeout.

The_Bad_Penguin
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 46242 - Posted: 15 Sep 2007, 2:19:06 UTC

Again, I'm in it for the science, not the credits. So, if the info I am able to provide is helpful, great. Hope it helps for this round (or the next) of the competition (good luck!)...

Paul
Joined: 29 Oct 05
Posts: 193
Credit: 66,501,314
RAC: 9,302
Message 46245 - Posted: 15 Sep 2007, 3:09:13 UTC

I continue to get computation errors running Rosetta 5.80

I had very few of these errors over the last few months and recently I have received many of them.

What can I do to correct this condition?

thx

PRaney
Thx!

Paul


The_Bad_Penguin
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 46246 - Posted: 15 Sep 2007, 3:16:28 UTC - in response to Message 46245.  
Last modified: 15 Sep 2007, 3:17:18 UTC

Seems to be a bunch of:

Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Out Of Memory (C++ Exception) (0xe06d7363) at address 0x7C812A5B

Engaging BOINC Windows Runtime Debugger...




I continue to get computation errors running Rosetta 5.80

I had very few of these errors over the last few months and recently I have received many of them.

What can I do to correct this condition?

thx

PRaney


The_Bad_Penguin
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 46251 - Posted: 15 Sep 2007, 4:56:20 UTC
Last modified: 15 Sep 2007, 4:57:09 UTC

Jim's post refers to this invalid result

Michael B
Joined: 13 Feb 06
Posts: 19
Credit: 306,566
RAC: 0
Message 46254 - Posted: 15 Sep 2007, 6:38:58 UTC


One of my BOINC Managers won't let me attach to rosetta...keeps saying project is offline.

Rayburner
Joined: 4 Oct 05
Posts: 32
Credit: 16,518,823
RAC: 0
Message 46262 - Posted: 15 Sep 2007, 11:49:54 UTC

I got 0 credits for this WU ("too many results"):

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=94605647

Paul
Joined: 29 Oct 05
Posts: 193
Credit: 66,501,314
RAC: 9,302
Message 46263 - Posted: 15 Sep 2007, 12:09:25 UTC

Just noticed each WU is consuming about 248MB of RAM. With 2 GB of RAM, this was not a problem until the Q6600 went into the system. 4 WUs are consuming 1/2 of the system memory.

What changed in 5.8 to cause the massive memory consumption and all of the computation errors? Can you do anything to pull in the memory requirements? Did the previous versions hold memory requirements at about 128MB per WU?
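
The arithmetic behind those figures can be checked with a small sketch. The 248 MB per WU, 4 WUs, and 2 GB total come from the post; the 128 MB figure is only the number asked about, not a confirmed value for earlier versions, and `memory_share` is a made-up helper name.

```python
def memory_share(task_count, mb_per_task, total_mb):
    """Fraction of system memory used by task_count tasks of equal size."""
    return task_count * mb_per_task / total_mb


TOTAL_MB = 2 * 1024  # 2 GB of RAM, as stated in the post

# Four work units at about 248 MB each, as observed under 5.80.
share_580 = memory_share(4, 248, TOTAL_MB)

# Four work units at 128 MB each: the unconfirmed figure asked about.
share_128 = memory_share(4, 128, TOTAL_MB)

print(f"5.80: {share_580:.1%} of RAM, at 128 MB each: {share_128:.1%}")
```

So four tasks at 248 MB come to 992 MB, just under half of 2 GB, which matches the "1/2 of the system memory" estimate.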


Thx!

Paul


Jmarks
Joined: 16 Jul 07
Posts: 132
Credit: 98,025
RAC: 0
Message 46264 - Posted: 15 Sep 2007, 13:03:09 UTC - in response to Message 46263.  
Last modified: 15 Sep 2007, 13:04:49 UTC

Just noticed each WU is consuming about 248MB of RAM. With 2 GB of RAM, this was not a problem until the Q6600 went into the system. 4 WUs are consuming 1/2 of the system memory.

What changed in 5.8 to cause the massive memory consumption and all of the computation errors? Can you do anything to pull in the memory requirements? Did the previous versions hold memory requirements at about 128MB per WU?



Go into Your Account and Edit
General preferences
Disk and memory usage
Use at most - 50% of memory when computer is in use
*** Lower this to what you want.
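
To see what lowering that percentage would do, here is a simplified sketch. It assumes every task needs the same amount of memory and that the client simply stops starting tasks once the cap is reached; the real BOINC scheduler is more involved, and `tasks_that_fit` is a hypothetical helper, not a BOINC function.

```python
import math


def tasks_that_fit(total_mb, cap_percent, mb_per_task):
    """Number of equal-sized tasks that fit under a 'use at most N%' cap."""
    budget_mb = total_mb * cap_percent / 100
    return math.floor(budget_mb / mb_per_task)


# 2 GB machine and about 248 MB per task, the figures from the quoted post.
at_50 = tasks_that_fit(2048, 50, 248)
at_40 = tasks_that_fit(2048, 40, 248)

print(f"50% cap: {at_50} tasks, 40% cap: {at_40} tasks")
```

Under this model the 50% setting just allows all four tasks on the quad core, and dropping it to 40% would leave one core without a Rosetta task.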

P.S. This post is not about 5.80; you should start a separate thread in 'Number Crunching'.
Jmarks

©2024 University of Washington
https://www.bakerlab.org