Message boards : Number crunching : Minirosetta v1.40 bug thread
Author | Message |
---|---|
Sarel Joined: 11 May 06 Posts: 51 Credit: 81,712 RAC: 0 |
Please report any bugs in this version here. Sarel. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
The link on the homepage to the bugs thread leads you to the v1.39 thread. Rosetta Moderator: Mod.Sense |
Chu Joined: 23 Feb 06 Posts: 120 Credit: 112,439 RAC: 0 |
We have also located the graphics problem that occurs when a non-protein ligand is displayed, and we have implemented a fix for it. Please let us know if you still observe such problems. |
Sarel Joined: 11 May 06 Posts: 51 Credit: 81,712 RAC: 0 |
Thanks! Fixed... Sarel |
Naesbye Joined: 30 Jul 08 Posts: 5 Credit: 201,436 RAC: 0 |
My first 1.40 unit ended with a computation error. |
Odd Braathun Joined: 2 Sep 08 Posts: 9 Credit: 16,125 RAC: 0 |
Problem with this task: Task ID 206078107, Name 1vcc__BOINC_ABRELAX_SPLIT_CONTROL_IGNORE_THE_REST-S25-9-S3-3--1vcc_-_4677_199_0, Workunit 188017112. It exited numerous times with no "finished" file, and BOINC said to reset the project. Odd |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
I have this task running now and it is very slow to progress. I watched it, and it is only advancing 0.001% every 20 seconds. It has been running for 8 hrs 20 min and is at 98.050%. My run time preference is 6 hrs, and I haven't had a task overrun by this much before. Could it be the new mini app 1.40 or the task itself? 1hzh_2fiw_fchbonds_20_30sarel_SAVE_ALL_OUT_4704_76 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=188078859 I'll let it run to the end. pete. |
Odd Braathun Joined: 2 Sep 08 Posts: 9 Credit: 16,125 RAC: 0 |
I have had one of these, too, but have now aborted it: Task ID 206030023, Name 1hzh_1juv_fchbonds_20_30sarel_SAVE_ALL_OUT_4704_27_0, Workunit 187974469. I also had Task ID 206101035, Name IL23p40_p40BrubYhbond_design_jecorn_SAVE_ALL_OUT_IGNORE_THE_REST_ip40_2osa_4683_197_0, Workunit 188036989. That task ran smoothly for 2 hours but ended with a validate error. Odd |
Aegis Maelstrom Joined: 29 Oct 08 Posts: 61 Credit: 2,137,555 RAC: 0 |
Hi, I'm having a similar problem to the one above. Task IL23p40_p40BrubYhbond_design_jecorn_SAVE_ALL_OUT_IGNORE_THE_REST_ip40_1wr2_4683_55_1 has restarted twice so far and is now processing: 2008-11-09 07:36:00|rosetta@home|Starting IL23p40_p40BrubYhbond_design_jecorn_SAVE_ALL_OUT_IGNORE_THE_REST_ip40_1wr2_4683_55_1 2008-11-09 07:36:35|rosetta@home|Starting task IL23p40_p40BrubYhbond_design_jecorn_SAVE_ALL_OUT_IGNORE_THE_REST_ip40_1wr2_4683_55_1 using minirosetta version 140 2008-11-09 09:36:44|rosetta@home|Task IL23p40_p40BrubYhbond_design_jecorn_SAVE_ALL_OUT_IGNORE_THE_REST_ip40_1wr2_4683_55_1 exited with zero status but no 'finished' file 2008-11-09 09:36:45|rosetta@home|If this happens repeatedly you may need to reset the project. 2008-11-09 09:38:42|rosetta@home|Restarting task IL23p40_p40BrubYhbond_design_jecorn_SAVE_ALL_OUT_IGNORE_THE_REST_ip40_1wr2_4683_55_1 using minirosetta version 140 2008-11-09 12:16:02|rosetta@home|Task IL23p40_p40BrubYhbond_design_jecorn_SAVE_ALL_OUT_IGNORE_THE_REST_ip40_1wr2_4683_55_1 exited with zero status but no 'finished' file 2008-11-09 12:16:03|rosetta@home|If this happens repeatedly you may need to reset the project. 2008-11-09 12:16:48|rosetta@home|Restarting task IL23p40_p40BrubYhbond_design_jecorn_SAVE_ALL_OUT_IGNORE_THE_REST_ip40_1wr2_4683_55_1 using minirosetta version 140 Just before the second restart, its progress was near 60% and it was past step 11000 of "model 1", I guess unfolding/testing a beautifully folded protein (the step around 10000 had a lower "low energy" than step 11000); I took a snapshot. When it restarted, it began again from about 18% with a protein that was not yet well folded. The elapsed time was reduced as well. What I would like to ask first is that you add some checkpoints; it would help with both processing and bug-testing. Now I am waiting to see whether this workunit can finish. |
Aegis Maelstrom Joined: 29 Oct 08 Posts: 61 Credit: 2,137,555 RAC: 0 |
The workunit restarted a third time, seemingly at the same point as the previous time (the percentage "completed" was higher, but I had checked a couple of minutes earlier and it was again at step 10000, so by then it was probably at 11000). The WU started for the fourth time, now at 24%, but I guess it resumed from the same point as before. When I restarted the WU after temporarily halting it once again, it went back to 17%. Now I can see 18.23% and step 523. I am now halting this task and my work with Rosetta. When BOINC tried to download a different task, I got the following log: 2008-11-09 14:29:23|rosetta@home|Message from server: No work sent 2008-11-09 14:29:23|rosetta@home|Message from server: Your preferences limit memory usage to 452 MB, and 488 MB is needed The problem seems to be higher memory usage, although one of the mods recently assured us that memory requirements have not increased. I could increase the amount of memory dedicated to BOINC, but I would like to have this problem explained and ironed out. Frankly, as this is yet another computational problem within a few days, any explanation from the Rosetta developers/maintainers would be highly appreciated. Thanks for your co-operation and good luck. |
Path7 Joined: 25 Aug 07 Posts: 128 Credit: 61,751 RAC: 0 |
Hello all, I just saw an error from this WU: loopbuild_boinc4_hombench_loopbuild_t308__IGNORE_THE_REST_1UKVY_1_4693_12_0 <core_client_version>6.2.25</core_client_version> <![CDATA[ <stderr_txt> # cpu_run_time_pref: 21600 # cpu_run_time_pref: 21600 # cpu_run_time_pref: 21600 # cpu_run_time_pref: 21600 Too many restarts with no progress. Keep application in memory while preempted. ====================================================== DONE :: 1 starting structures 24.3206 cpu seconds This process generated 0 decoys from 0 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down... called boinc_finish </stderr_txt> <message> <file_xfer_error> <file_name>loopbuild_boinc4_hombench_loopbuild_t308__IGNORE_THE_REST_1UKVY_1_4693_12_0_0</file_name> <error_code>-161</error_code> </file_xfer_error> </message> Well, it looks like two errors: too many restarts and a file_xfer error. Be aware: I'm running WCG's (beta) BOINC 6.2.25, which seems to be pretty stable (so far). Have a nice day, Path7. |
Path7 Joined: 25 Aug 07 Posts: 128 Credit: 61,751 RAC: 0 |
And another error: oopbuild_boinc4_hombench_loopbuild_t326__IGNORE_THE_REST_1I1QB_3_4700_8_0 failed with: ERROR: NANs occured in hbonding! ERROR:: Exit from: ....srccorescoringhbondshbonds_geom.cc line: 763 called boinc_finish Have a nice day, Path7. |
neil.hunter14 Joined: 9 May 06 Posts: 10 Credit: 278,867 RAC: 0 |
I grabbed a few WUs on both an XP and a Linux machine. Both show the same problem for me: they get to around 98% complete, then seem to just hang there. They never complete, so I have aborted all 1.40 WUs on both PCs for now. Neil, UK. |
neil.hunter14 Joined: 9 May 06 Posts: 10 Credit: 278,867 RAC: 0 |
Further to my previous post: they all seem to end up stuck with 9m 53s remaining to the end of the WU. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Well, it finally finished after 11 hrs. I'm not very happy; something needs to be fixed. <core_client_version>5.10.45</core_client_version> <![CDATA[ <stderr_txt> # cpu_run_time_pref: 21600 ====================================================== DONE :: 1 starting structures 39696.6 cpu seconds This process generated 1 decoys from 1 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down... called boinc_finish </stderr_txt> Server state: Over, Outcome: Success, Client state: Done, CPU time: 39,697.10 s, claimed credit: 278.57, granted credit: 16.10. By the way, the credit is a bad joke. pete. |
Allan Hojgaard Joined: 4 May 08 Posts: 9 Credit: 591,749 RAC: 0 |
Adding my share of long-running WUs: 1hzh_2pww_fchbonds_20_30sarel_SAVE_ALL_OUT_4704_86 Result: <core_client_version>6.2.18</core_client_version> <![CDATA[ <stderr_txt> # cpu_run_time_pref: 21600 # cpu_run_time_pref: 21600 ====================================================== DONE :: 1 starting structures 39652.9 cpu seconds This process generated 1 decoys from 1 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down... called boinc_finish </stderr_txt> ]]> As many have said before, I do not mind crunching large WUs, but I would like to be warned about them beforehand and credited accordingly. Currently one of my cores is working on 1hzh_1a58_fchbonds_20_30sarel_SAVE_ALL_OUT_4704_87_0; it has been running for 14 hours and 24 minutes and has reached 98.840%. I am sure I will get very low credit for it, like the others in this thread. This is what the graphics show me: http://www.home.no/kalumba/rosetta.png Until the mess has been sorted out or properly explained, I'm crunching for another project. I'll keep visiting the forum frequently, as Rosetta@Home is my favourite project. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Looks like I have another runaway task: it's at 6 hrs 45 min, at 97.655%, and as slow as wet cement, advancing about 0.001% every 10 sec. That's better than the last one, but not by much. I bet I won't get much credit for it if and when it finishes. 1hzh_2fe5_fchbonds_20_30sarel_SAVE_ALL_OUT_4704_76 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=188078846 pete. |
AMD_is_logical Joined: 20 Dec 05 Posts: 299 Credit: 31,460,681 RAC: 0 |
Re "And another error": I got the same error on one of my Linux nodes: h005__BOINC_ABRELAX_RANGE_yebf_IGNORE_THE_REST-S25-7-S3-8--h005_-_4675_19_0 |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,281,662 RAC: 1,150 |
I've got another one of those workunits that are running longer than expected: 11/9/2008 5:57:49 PM|rosetta@home|Starting 1hzh_1o9g_fchbonds_20_30sarel_SAVE_ALL_OUT_4704_155_1 11/9/2008 5:57:54 PM|rosetta@home|Starting task 1hzh_1o9g_fchbonds_20_30sarel_SAVE_ALL_OUT_4704_155_1 using minirosetta version 140 Last night, it had accumulated about 6 CPU hours and claimed that it would finish in another 10 CPU minutes. This morning, it has accumulated over 12 CPU hours and claims that it will finish in another 9 CPU minutes and 56 seconds. It is also currently the most memory-hungry process on my machine. Windows Task Manager recently showed it using over 256,000K of memory (over 10 times as much as the next process); that then dropped to a little over 200,000K and is now 223,132K. Since it hasn't let any other process take a turn on its CPU core for much longer than the 2 hours I've tried to set, I'll suspend it for a while and see if that helps. The other person with a similar workunit had a compute error after about 6 CPU hours. |
caesar1987 Joined: 28 Nov 06 Posts: 13 Credit: 22,268 RAC: 0 |
Same for me: it says that it will finish in 9 minutes and 51 sec. But mine has accumulated only 5 hours 5 min, and it has shown the same remaining time for the last hour. "mini"rosetta memory usage: approx. 290,000 K, VM size: 320,000 K!!! What's mini about this? |
©2024 University of Washington
https://www.bakerlab.org