Message boards : Number crunching : Problems with Rosetta version 5.85 (or 5.86 for linux)
Author | Message |
---|---|
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
I don't know if it's the app or the WUs, but these are also using a lot of memory: up to 98% of system resources. w0x7_1_MolecularRep_1_w0x7_1_ffas03-1-2b0v_StructuralGenomics_a_2336_53689_0 using rosetta_beta version 585. Pete. |
Luuklag Joined: 13 Sep 07 Posts: 262 Credit: 4,171 RAC: 0 |
"I don't know if it's the app or the WUs, but these are also using a lot [...]" Same with mine. It is allowed to use 90% of the CPU, but my CPU is running at 100% with MSN and WMP open. My normal memory use is 190,000 KB out of 1024 MB, and my pagefile is using 1.4 GB. This isn't a problem yet; my PC can handle it quite well, but it shouldn't get any bigger or my system will start going down... [edit] My PC has 1024 MB of memory installed, but Ctrl+Alt+Del says I have 1691 MB of memory, so is the difference of roughly 700 MB virtual memory? [edit2] My BOINC message tab says I have 1.65 GB of virtual memory, so is the 1.4 GB figure I mentioned above the virtual memory? |
Thomas Leibold Joined: 30 Jul 06 Posts: 55 Credit: 19,627,164 RAC: 0 |
Is there any way to find out what caused the validate error on workunit 112697569 ? The server 679308 is a new machine with dual Quad-Core Opteron 2346HE and 16GB of memory running OpenSuSE 10.3 in 64-bit mode. All other results from the server completed without any errors. The same workunit was assigned to another computer, but that result has not been returned yet. Team Helix |
Dr Who Fan Joined: 28 May 06 Posts: 76 Credit: 272,544 RAC: 485 |
VALIDATE ERROR https://boinc.bakerlab.org/rosetta/result.php?resultid=124099003 Outcome Validate error Client state Done Exit status 0 (0x0) Computer ID 230539 CPU time 6719.932789 stderr out <core_client_version>5.10.28</core_client_version> <![CDATA[ <stderr_txt> # cpu_run_time_pref: 7200 # random seed: 1314598 == </stderr_txt> ]]> Validate state Invalid Claimed credit 8.82837829407972 Granted credit 0 application version 5.85 |
Dr Who Fan Joined: 28 May 06 Posts: 76 Credit: 272,544 RAC: 485 |
VALIDATE ERROR https://boinc.bakerlab.org/rosetta/result.php?resultid=123840131 Outcome Validate error Client state Done Exit status 0 (0x0) Computer ID 623895 CPU time 6879.734375 stderr out <core_client_version>5.10.28</core_client_version> <![CDATA[ <stderr_txt> </stderr_txt> ]]> Claimed credit 18.6579562086075 Granted credit 0 application version 5.85 |
Dr Who Fan Joined: 28 May 06 Posts: 76 Credit: 272,544 RAC: 485 |
VALIDATE ERROR https://boinc.bakerlab.org/rosetta/result.php?resultid=123770365 Outcome Validate error Client state Done Exit status 0 (0x0) Computer ID 623895 CPU time 5466.265625 stderr out <core_client_version>5.10.28</core_client_version> <![CDATA[ <stderr_txt> # cpu_run_time_pref: 7200 # random seed: 1569590 == </stderr_txt> ]]> Validate state Invalid Claimed credit 14.8246050060423 Granted credit 0 application version 5.85 |
Conan Joined: 11 Oct 05 Posts: 151 Credit: 4,244,078 RAC: 2,272 |
Can someone from the project tell me what happened with this WU? No error came up and it was successful, but I got about 1 cr/h for it. What is the go with this? |
AMD_is_logical Joined: 20 Dec 05 Posts: 299 Credit: 31,460,681 RAC: 0 |
The WU https://boinc.bakerlab.org/rosetta/result.php?resultid=123991015 seemed to crunch correctly and end normally judging by the stderr file, but it has a validate error. From the stderr file: <core_client_version>5.2.13</core_client_version> <stderr_txt> Graphics are disabled due to configuration... # cpu_run_time_pref: 36000 # random seed: 1421496 ====================================================== DONE :: 1 starting structures 36007.2 cpu seconds This process generated 727 decoys from 727 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down... </stderr_txt> |
Luuklag Joined: 13 Sep 07 Posts: 262 Credit: 4,171 RAC: 0 |
"Can someone from the project tell me what happened with this WU?" That is what you think: 1 cr/h. But in reality, granted credit is based on the average of claimed credit, times the number of decoys. Since you only created 2 decoys, your credit is roughly 8.5 credits per decoy. So you just created very few decoys, or most people create one within a short amount of time and then the task is finished, so the average credit per decoy becomes small. Am I right about this, or am I missing something? I'm not sure ;) |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
"Am I right about this, or am I missing something? I'm not sure ;)" Let me try to explain by example. I found another WU in the same batch. Same name, so same protein, same batch number, etc. Comparing the two: Conan's WU: 63,257 seconds, 2 decoys, 17.46 credits. Related WU: 10,100 seconds, 5 decoys, 43.69 credits. The granted credit (and thus the average of credit claims so far) indicates that the second case was the more typical user experience for those tasks. So the average credit per model reflects that most models crunch much more easily than they did on Conan's machine. The potential reasons for this are too numerous to mention, and include potential problems both on Conan's machine and in the Rosetta application. It is also possible that everything is working perfectly on both ends, and that the particular starting point of one (or both) of those two models was unusually difficult to study. If you'd like to discuss such potential reasons in more detail, please open a new thread. Rosetta Moderator: Mod.Sense |
Luuklag Joined: 13 Sep 07 Posts: 262 Credit: 4,171 RAC: 0 |
So I was right, at least a bit :) But that's much clearer and easier to understand :) |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
"So I was right, at least a bit :) But that's much clearer and easier to understand :)" Yep. I should ALSO point out that Conan's "user experience for those tasks" goes into the average as well. So the report of their completed WU brings up the average credit claimed per model. This is why you see references elsewhere to this all averaging out over time. In theory, in the past Conan has reported results for a task where their machine crunched models easily, but the credit awarded per model had already been adjusted higher by another user who found it difficult. And so Conan received credit that reflected that the task is occasionally difficult (i.e. time consuming) to process a model for. This time around, the luck was reversed: it was Conan who hit the long time-per-model case and had less credit awarded than you would normally expect. Rosetta Moderator: Mod.Sense |
Luuklag Joined: 13 Sep 07 Posts: 262 Credit: 4,171 RAC: 0 |
Now I was wondering: does that credit adapt over time, when more results come in and it finds out the task really is that difficult? Or is it just bad luck and it stays like this? And when do WUs get credit? I.e., if it's the first one of a batch there is nothing to compare against, so either it has to wait, or it gets precisely the credit that was claimed? |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Yes, the average evolves as results come in. But, as averages tend to do, it stabilizes very quickly. I believe the first to report gets the credit claimed. After that, the granted credit is based on the average credit per model from previous reports. Then, after the granted credit is determined, the user's claimed credit is accumulated into the average. This approach prevents anyone from attempting to manipulate the credit per user to their own advantage. Distorting benchmarks or whatever will benefit (very, very slightly) everyone that reports AFTER you. Rosetta Moderator: Mod.Sense |
dcdc Joined: 3 Nov 05 Posts: 1832 Credit: 119,821,902 RAC: 15,180 |
"Now I was wondering: does that credit adapt over time, when more results come in and it finds out the task really is that difficult? Or is it just bad luck and it stays like this?" If the decoys are computationally intensive, then they'll generally be granted a lot of credit right from the start, as the first computers to return results will request a lot of credit for them. The credit granted will average out after this, though, so there is more variation initially. |
Luuklag Joined: 13 Sep 07 Posts: 262 Credit: 4,171 RAC: 0 |
At the beginning of my WU I could see the graphics, but now, at about 50%, the Show graphics button is greyed out. Someone else also had this problem and posted in Q&A, in a topic that had something to do with CPU run time preferences. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
"At the beginning of my WU I could see the graphics, but now, at about 50%, the Show graphics button is greyed out." You can only display the graphic while the task is running. If the BOINC Manager has rotated to another project, and the status goes to "waiting to run", "wait for memory", "suspended...", etc., then the button is grayed out. Rosetta Moderator: Mod.Sense |
Luuklag Joined: 13 Sep 07 Posts: 262 Credit: 4,171 RAC: 0 |
That's something I know, but I am only running Rosetta, and the task was running. That's what bothered me. |
Mistified Joined: 13 Jun 07 Posts: 1 Credit: 35,150,310 RAC: 0 |
Recently (in the past 12? hours) I've only gotten WUs like this one: 113612147 for my computer. These WUs consume 1.2 GB of virtual memory apiece, which virtually exhausts the available VM on my system. Is this a bug in v5.85 of the software, or is it just these WUs that are that memory-intensive? In any case, why isn't BOINC/Rosetta respecting the settings I've made in my profile and in the BOINC Manager with regard to memory use? It clearly states there that it should not use more than 50% of memory and 50% of swap space. This should allow one such workunit to run at a time, leaving the second core idle, if the software actually respected the limitations, that is. With two workunits like these running I don't have the memory capacity left to run the applications I need to, which means I need to suspend the project most of the time. |
upstatelabs Joined: 22 Jun 06 Posts: 10 Credit: 516,767 RAC: 0 |
I also have several machines that are having problems with unexpected BOINC stops, VM errors and C++ runtime errors. I don't check machines every day, so often a week or more goes by with no crunching on a system. Can Rosetta@home stop sending out these problem WUs? It's a pain to have to reset machines. |
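
For readers following the credit discussion in this thread, here is a minimal sketch of the scheme as Mod.Sense describes it: the first report in a batch is granted what it claimed, later reports are granted the running average credit per model times the number of models they produced, and each claim is folded into the average only after its grant is decided. This is not the project's actual server code. The class name `BatchCreditAverage`, the method `grant`, and the choice to compute the average as total claimed credit divided by total models are all assumptions made for illustration; the real averaging may differ.

```python
from dataclasses import dataclass


@dataclass
class BatchCreditAverage:
    """Running average of claimed credit per model for one batch.

    Hypothetical illustration only; names and averaging method are assumed.
    """

    total_claimed: float = 0.0  # sum of claimed credit over all reports so far
    total_models: int = 0       # sum of models (decoys) over all reports so far

    def grant(self, claimed_credit: float, models: int) -> float:
        """Return the granted credit for one report, then update the average."""
        if models <= 0:
            raise ValueError("a report must contain at least one model")

        if self.total_models == 0:
            # First report in the batch: nothing to compare against,
            # so the claim is granted as-is.
            granted = claimed_credit
        else:
            # Later reports: average claimed credit per model so far,
            # times the number of models in this report.
            granted = (self.total_claimed / self.total_models) * models

        # The claim enters the average only AFTER the grant is decided,
        # so an inflated claim cannot raise the claimant's own grant.
        self.total_claimed += claimed_credit
        self.total_models += models
        return granted


batch = BatchCreditAverage()
print(batch.grant(40.0, 5))   # first report: granted its claim, 40.0
print(batch.grant(100.0, 2))  # 8.0 per model so far, times 2 models: 16.0
print(batch.grant(10.0, 1))   # average is now 140 / 7 = 20.0 per model: 20.0
```

The second call shows the case Conan ran into: a slow result claims a lot (100.0) but is granted only the prevailing per-model average. That matches the numbers quoted above, where both work units come out at roughly 8.7 credits per model (17.46 / 2 and 43.69 / 5) despite very different run times.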
©2024 University of Washington
https://www.bakerlab.org