Problems with Rosetta version 5.85 (or 5.86 for linux)

P . P . L .

Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 49313 - Posted: 2 Dec 2007, 1:53:19 UTC

I don't know if it's the app or the WUs, but these are using a lot of memory also: up to 98% of system resources.

w0x7_1_MolecularRep_1_w0x7_1_ffas03-1-2b0v_StructuralGenomics_a_2336_53689_0

using rosetta_beta version 585.

Pete.


Luuklag
Joined: 13 Sep 07
Posts: 262
Credit: 4,171
RAC: 0
Message 49325 - Posted: 2 Dec 2007, 14:46:12 UTC - in response to Message 49313.  
Last modified: 2 Dec 2007, 14:58:56 UTC

I don't know if it's the app or the WUs, but these are using a lot of memory also: up to 98% of system resources.

w0x7_1_MolecularRep_1_w0x7_1_ffas03-1-2b0v_StructuralGenomics_a_2336_53689_0

using rosetta_beta version 585.

Pete.



Same with mine. BOINC is allowed to use 90% of the CPU, but my CPU is running at 100% with MSN and WMP open.
My physical memory usage is 190,000 KB out of 1024 MB, and my pagefile is using 1.4 GB.

This isn't a problem yet; my PC can handle it quite well. But it doesn't need to get any bigger, or my system will start slowing down.

[edit] My PC has 1024 MB of memory in modules, but Ctrl+Alt+Del says I have 1691 MB of memory, so is the difference of roughly 700 MB virtual memory?
[edit2] My BOINC Messages tab says I have 1.65 GB of virtual memory, so is the 1.4 GB I mentioned above the virtual memory?

Thomas Leibold
Joined: 30 Jul 06
Posts: 55
Credit: 19,627,164
RAC: 0
Message 49341 - Posted: 3 Dec 2007, 8:13:52 UTC

Is there any way to find out what caused the validate error on workunit 112697569 ?
The server 679308 is a new machine with dual Quad-Core Opteron 2346HE and 16GB of memory running OpenSuSE 10.3 in 64-bit mode. All other results from the server completed without any errors.

The same workunit was assigned to another computer, but that result has not been returned yet.
Team Helix

Dr Who Fan
Joined: 28 May 06
Posts: 76
Credit: 272,544
RAC: 485
Message 49342 - Posted: 3 Dec 2007, 8:53:20 UTC
Last modified: 3 Dec 2007, 8:57:29 UTC

VALIDATE ERROR
https://boinc.bakerlab.org/rosetta/result.php?resultid=124099003

Outcome Validate error
Client state Done
Exit status 0 (0x0)
Computer ID 230539
CPU time 6719.932789
stderr out

<core_client_version>5.10.28</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 7200
# random seed: 1314598
==
</stderr_txt>
]]>

Validate state Invalid
Claimed credit 8.82837829407972
Granted credit 0
application version 5.85

Dr Who Fan
Joined: 28 May 06
Posts: 76
Credit: 272,544
RAC: 485
Message 49343 - Posted: 3 Dec 2007, 8:55:48 UTC
Last modified: 3 Dec 2007, 8:58:13 UTC

VALIDATE ERROR
https://boinc.bakerlab.org/rosetta/result.php?resultid=123840131

Outcome Validate error
Client state Done
Exit status 0 (0x0)
Computer ID 623895
CPU time 6879.734375
stderr out

<core_client_version>5.10.28</core_client_version>
<![CDATA[
<stderr_txt>

</stderr_txt>
]]>

Claimed credit 18.6579562086075
Granted credit 0
application version 5.85

Dr Who Fan
Joined: 28 May 06
Posts: 76
Credit: 272,544
RAC: 485
Message 49344 - Posted: 3 Dec 2007, 9:00:49 UTC

VALIDATE ERROR
https://boinc.bakerlab.org/rosetta/result.php?resultid=123770365

Outcome Validate error
Client state Done
Exit status 0 (0x0)
Computer ID 623895
CPU time 5466.265625
stderr out

<core_client_version>5.10.28</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 7200
# random seed: 1569590
==
</stderr_txt>
]]>

Validate state Invalid
Claimed credit 14.8246050060423
Granted credit 0
application version 5.85

Conan
Joined: 11 Oct 05
Posts: 151
Credit: 4,244,078
RAC: 2,272
Message 49351 - Posted: 3 Dec 2007, 14:26:11 UTC

Can someone from the project tell me what happened with this WU?

No error came up and it was successful but I get about 1 cr/h for it.

What is the go with this ???


AMD_is_logical
Joined: 20 Dec 05
Posts: 299
Credit: 31,460,681
RAC: 0
Message 49354 - Posted: 3 Dec 2007, 16:14:53 UTC

The WU https://boinc.bakerlab.org/rosetta/result.php?resultid=123991015
seemed to crunch correctly and end normally judging by the stderr file, but it has a validate error.

From the stderr file:

<core_client_version>5.2.13</core_client_version>
<stderr_txt>
Graphics are disabled due to configuration...
# cpu_run_time_pref: 36000
# random seed: 1421496
======================================================
DONE ::     1 starting structures  36007.2 cpu seconds
This process generated    727 decoys from     727 attempts
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>

Luuklag
Joined: 13 Sep 07
Posts: 262
Credit: 4,171
RAC: 0
Message 49377 - Posted: 4 Dec 2007, 16:50:30 UTC - in response to Message 49351.  

Can someone from the project tell me what happened with this WU?

No error came up and it was successful but I get about 1 cr/h for it.

What is the go with this ???



That is what you think: 1 cr/h. But in reality granted credit is based on the average claimed credit per decoy, multiplied by the number of decoys. Since you only created 2 decoys, your credit is roughly 8.5 credits per decoy. So you just created very few decoys. Or most people create one within a short amount of time and then the task finishes, so the average credit per decoy becomes small.

Am I right about this, or am I missing something? Because I'm not sure ;)

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 49383 - Posted: 4 Dec 2007, 17:06:32 UTC - in response to Message 49377.  
Last modified: 4 Dec 2007, 20:14:47 UTC

Am I right about this, or am I missing something? Because I'm not sure ;)


Let me try to explain by example. I found another WU in the same batch. Same name, so same protein, same batch number etc. So comparing the two:

Conan's WU 63,257 seconds, 2 decoys, 17.46 credits
Related WU 10,100 seconds, 5 decoys, 43.69 credits

The granted credit (and thus the average of credit claims so far) indicates that the second case was the more typical user experience for those tasks. So the average credit per model reflects that most machines are crunching these models much more easily than Conan's machine did.

The potential reasons for this are too numerous to mention, and include potential problems both on Conan's machine and in the Rosetta application. It is also possible that everything is working perfectly on both ends, and that the particular starting point of one (or both) of those two models was unusually difficult to study. If you'd like to discuss such potential reasons in more detail, please open a new thread.
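
The arithmetic behind the two tasks compared above can be checked with a short sketch. The figures are the ones quoted in this post; the function and variable names are made up for illustration only.

```python
# Figures quoted above: (CPU seconds, decoys, granted credit).
tasks = {
    "Conan's WU": (63_257, 2, 17.46),
    "Related WU": (10_100, 5, 43.69),
}


def credit_rates(cpu_seconds, decoys, credit):
    """Return (credit per decoy, credit per CPU hour) for one task."""
    return credit / decoys, credit / (cpu_seconds / 3600)


for name, figures in tasks.items():
    per_decoy, per_hour = credit_rates(*figures)
    print(f"{name}: {per_decoy:.2f} credits/decoy, {per_hour:.2f} credits/hour")
```

Both tasks come out at about 8.7 credits per decoy. The difference is in time per decoy, which is why Conan's task works out to roughly 1 credit per hour while the related task works out to roughly 15.6.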
Rosetta Moderator: Mod.Sense

Luuklag
Joined: 13 Sep 07
Posts: 262
Credit: 4,171
RAC: 0
Message 49387 - Posted: 4 Dec 2007, 18:07:06 UTC

So I was right, at least a bit :) But that's much clearer and easier to understand :)

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 49388 - Posted: 4 Dec 2007, 18:15:54 UTC - in response to Message 49387.  
Last modified: 4 Dec 2007, 18:17:03 UTC

So I was right, at least a bit :) But that's much clearer and easier to understand :)


Yep.

I should ALSO point out that Conan's "user experience for those tasks" goes into the average as well. So the report of their completed WU raises the average credit claimed per model. This is why you see references elsewhere to this all averaging out over time.

In theory, Conan has in the past reported results for a task whose models their machine crunched easily, after the credit awarded per model had already been pushed higher by another user who found it difficult. So Conan received credit reflecting that the task is occasionally difficult (i.e. time-consuming) to process a model. This time around, the luck was reversed: it was Conan who hit the long-time-per-model case and was awarded less credit than you would normally expect.
Rosetta Moderator: Mod.Sense

Luuklag
Joined: 13 Sep 07
Posts: 262
Credit: 4,171
RAC: 0
Message 49395 - Posted: 4 Dec 2007, 20:09:40 UTC - in response to Message 49388.  

Now I was wondering: does that credit adapt over time, as more results come in and it finds out the task really is that difficult? Or is it just bad luck and it stays like this?

And when do WUs get credit? I.e. if it's the first one of a batch there is nothing to compare against, so either it has to wait, or it gets exactly the credit claimed?

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 49398 - Posted: 4 Dec 2007, 20:20:46 UTC

Yes, the average evolves as results come in. But, as averages tend to do, it stabilizes very quickly.

I believe the first to report gets the credit claimed. After that, the granted credit is based on the average credit per model of the previous reports. Then, after the granted credit is determined, the user's claimed credit is accumulated into the average.

This approach prevents anyone from attempting to manipulate the credit per user to their own advantage. Distorting benchmarks or whatever will benefit (very very slightly) everyone that reports AFTER you.
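
The order of operations described above can be sketched roughly as follows. This is only an illustration of the description in this post, not the project's actual server code: the class and method names are hypothetical, and weighting the average by the number of models is an assumption.

```python
class BatchCreditAverage:
    """Hypothetical running average of claimed credit per model for one batch."""

    def __init__(self):
        self.total_claimed = 0.0  # sum of all claimed credit reported so far
        self.total_models = 0     # sum of all models (decoys) reported so far

    def grant(self, claimed_credit, models):
        """Return the granted credit for a report, then fold the claim in."""
        if models <= 0:
            raise ValueError("a report must contain at least one model")
        if self.total_models == 0:
            # First report of the batch: nothing to compare against,
            # so the claim is granted as-is.
            granted = claimed_credit
        else:
            # Later reports: average credit per model of PREVIOUS reports,
            # multiplied by the number of models in this report.
            granted = self.total_claimed / self.total_models * models
        # Only after granting is the new claim accumulated into the average,
        # so a report never influences its own granted credit.
        self.total_claimed += claimed_credit
        self.total_models += models
        return granted


batch = BatchCreditAverage()
print(batch.grant(claimed_credit=40.0, models=5))  # first report: 40.0
print(batch.grant(claimed_credit=20.0, models=2))  # 8.0 per model * 2 = 16.0
```

Because a claim is folded in only after its own credit has been granted, inflating a claim changes what later reporters receive, not what the inflating user receives.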
Rosetta Moderator: Mod.Sense

dcdc
Joined: 3 Nov 05
Posts: 1832
Credit: 119,821,902
RAC: 15,180
Message 49399 - Posted: 4 Dec 2007, 20:33:47 UTC - in response to Message 49395.  

Now I was wondering: does that credit adapt over time, as more results come in and it finds out the task really is that difficult? Or is it just bad luck and it stays like this?

And when do WUs get credit? I.e. if it's the first one of a batch there is nothing to compare against, so either it has to wait, or it gets exactly the credit claimed?

If the decoys are computationally intensive then they'll generally be granted a lot of credit right from the start, as the first computers to return results will request a lot of credit for them. The credit granted will average out after this, though, so there is more variation initially.

Luuklag
Joined: 13 Sep 07
Posts: 262
Credit: 4,171
RAC: 0
Message 49400 - Posted: 4 Dec 2007, 21:00:13 UTC

At the beginning of my WU I could see the graphics, but now, at about 50%, the Show graphics button is greyed out.

Someone else also had this problem and posted in Q&A, in a topic that had something to do with CPU run time preferences.

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 49402 - Posted: 4 Dec 2007, 21:26:59 UTC - in response to Message 49400.  

At the beginning of my WU I could see the graphics, but now, at about 50%, the Show graphics button is greyed out.

Someone else also had this problem and posted in Q&A, in a topic that had something to do with CPU run time preferences.


You can only display the graphic while the task is running. If the BOINC Manager has rotated to another project, and the status goes to "waiting to run", "wait for memory", or "suspended..." etc. then the button is grayed out.
Rosetta Moderator: Mod.Sense

Luuklag
Joined: 13 Sep 07
Posts: 262
Credit: 4,171
RAC: 0
Message 49425 - Posted: 5 Dec 2007, 14:25:57 UTC - in response to Message 49402.  


You can only display the graphic while the task is running. If the BOINC Manager has rotated to another project, and the status goes to "waiting to run", "wait for memory", or "suspended..." etc. then the button is grayed out.


That's something I know, but I am only running Rosetta, and the task was running. That's what bothered me.

Mistified
Joined: 13 Jun 07
Posts: 1
Credit: 35,150,310
RAC: 0
Message 49443 - Posted: 6 Dec 2007, 13:55:28 UTC

Recently (in the past 12? hours) I've only gotten WUs like this one: 113612147 for my computer.

These WUs consume 1.2 GB of virtual memory apiece, which virtually exhausts the available VM on my system.

Is this a bug in v5.85 of the software, or is it just these WUs that are that memory-intensive?

In any case, why isn't BOINC/Rosetta respecting the settings I've made in my profile and in the Boinc Manager with regards to memory use? It clearly states there that it should not use more than 50% of memory and 50% of swap space. This should allow one such workunit to run at a time, leaving the second core idle - if the software actually respected the limitations, that is.

With two workunits like these running I don't have the memory capacity left to run the applications I need to, which means I need to suspend the project for most of the time.
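
The expectation described above (a 50% limit leaving room for only one such workunit) is simple arithmetic, sketched below. This is not BOINC's actual scheduling logic, and the swap size is a hypothetical figure chosen for illustration; only the 1.2 GB per WU and the 50% limit come from this post.

```python
import math


def tasks_that_fit(total_mb, limit_fraction, per_task_mb):
    """How many tasks of per_task_mb fit inside limit_fraction of total_mb."""
    return math.floor(total_mb * limit_fraction / per_task_mb)


swap_mb = 4096            # hypothetical swap size, not stated in the post
per_task_mb = 1.2 * 1024  # 1.2 GB of virtual memory per WU, as reported above

print(tasks_that_fit(swap_mb, 0.50, per_task_mb))
```

With these assumed numbers, half of the swap space is 2048 MB, which holds one 1.2 GB task but not two.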

upstatelabs
Joined: 22 Jun 06
Posts: 10
Credit: 516,767
RAC: 0
Message 49451 - Posted: 6 Dec 2007, 17:13:30 UTC - in response to Message 49258.  
Last modified: 6 Dec 2007, 17:17:55 UTC

I also have several machines that are having problems with unexpected BOINC stops, VM errors and C++ runtime errors. I don't check the machines every day, so often a week or more goes by with no crunching on a system. Can Rosetta@home stop sending out these problem WUs? It's a pain to have to reset machines.