Message boards : Number crunching : Problems with Rosetta version 5.46
Author | Message |
---|---|
EigenState Joined: 16 Feb 07 Posts: 4 Credit: 1,667 RAC: 0 |
Yes, I do use BAM. If I use it properly is an entirely different question to which I hope the answer would be yes, but I am not certain of that. As above, you have to attach using the host options in BAM. If you try to attach yourself and it is a project boinc support, when it contacts BAM it will kick the project off (unfortunately, no questions asked). To help you out: http://www.boincstats.com/bam/host_list.php Link to your host list OK, I did try to attach to Rosetta directly through the BOINC Manager, so that might explain the detachments I observed. I also did have BAM set to attach to Rosetta, but so far nothing has actually happened. Being on dialup, I just cannot allow the connection to stand open forever. So is there a way to force the attachment through BAM to proceed? |
anders n Joined: 19 Sep 05 Posts: 403 Credit: 537,991 RAC: 0 |
Yes, I do use BAM. If I use it properly is an entirely different question to which I hope the answer would be yes, but I am not certain of that. Under Tools in BOINC, press Synchronize with BAM. |
EigenState Joined: 16 Feb 07 Posts: 4 Credit: 1,667 RAC: 0 |
I have successfully attached to Rosetta, and am currently calculating a Work Unit. Thanks to all of you for the help, and my apologies for taking this thread off topic. |
Michael.L Joined: 12 Nov 06 Posts: 67 Credit: 31,295 RAC: 0 |
Result ID- 62740659 Winny XP home. AMD 3200+. CAPRI 12 ND 73 GLOBAL DOCKING 1562 9497 00 - Was stuck at score -199.144 for 3600 seconds. stderr out <core_client_version>5.4.11</core_client_version> <stderr_txt> # random seed: 1432284 # cpu_run_time_pref: 14400 ********************************************************************** Rosetta score is stuck or going too long. Watchdog is ending the run! Stuck at score -199.144 for 3600 seconds ********************************************************************** GZIP SILENT FILE: .aand73.out </stderr_txt> |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
I think the docking units are getting stuck on our AMD chips for some reason. I had a similar error on this: CAPRI_12_ND73_GLOBAL_DOCKING_1562_9081_0 and a similar one. Both were global docking. I read somewhere in here or over in Dr. Baker's message board area that this was likely to happen at random with these WUs. |
Chu Joined: 23 Feb 06 Posts: 120 Credit: 112,439 RAC: 0 |
Hi all, we actually did find a bug that caused a very high rate of "stuck" trajectories for docking workunits, and we have included the fix in the V5.46 update. Those trajectories are not actually stuck; the watchdog thread is fooled into thinking they are. Although this fix improved things a lot, we also found that it does not completely solve the problem. Compared to other "protein folding or farlx" workunits, docking workunits generally have fewer "acceptance" steps, and therefore energy values do not change as frequently. That is believed to be the culprit, as the watchdog thread checks the energy value periodically to decide whether a run is stuck or not. We have proposed a more robust solution to this problem and plan to include it in the next scheduled update. Thank you all for the help. I think the docking units are getting stuck on our AMD chips for some reason. |
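To illustrate the false "stuck" verdict described above, here is a minimal sketch of a score-based watchdog. It is not Rosetta's actual watchdog code: the class `ScoreWatchdog`, the helper `first_stuck_time`, the 10-minute check interval, and the sample scores are all hypothetical. Only the 3600-second limit is taken from the stderr output posted in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScoreWatchdog:
    """Hypothetical watchdog: flags a run when the score is unchanged for too long."""
    timeout_s: float = 3600.0           # limit seen in the posted stderr output
    last_score: Optional[float] = None  # score seen at the last change
    last_change_t: float = 0.0          # time of the last score change

    def check(self, now_s: float, score: float) -> bool:
        """Record one periodic sample; return True if the run looks stuck."""
        if self.last_score is None or score != self.last_score:
            self.last_score = score
            self.last_change_t = now_s
            return False
        return (now_s - self.last_change_t) >= self.timeout_s


def first_stuck_time(samples, timeout_s=3600.0):
    """Return the time of the first 'stuck' verdict, or None if there is none."""
    dog = ScoreWatchdog(timeout_s=timeout_s)
    for t, score in samples:
        if dog.check(t, score):
            return t
    return None


# Folding-like run (assumed): a move is accepted before every 10-minute check.
folding = [(t, -100.0 - t / 600.0) for t in range(0, 7201, 600)]
# Docking-like run (assumed): still working, but no move is accepted for an hour.
docking = [(t, -199.144) for t in range(0, 7201, 600)]

print(first_stuck_time(folding))  # None
print(first_stuck_time(docking))  # 3600
```

The docking-like run is flagged even though nothing is wrong with it, which is the failure mode Chu describes: an unchanged energy value is being used as a stand-in for "no progress".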
Viromancy Joined: 23 Sep 06 Posts: 8 Credit: 125,713 RAC: 0 |
Nope, it was running at the standard speed. Just for the heck of it though, I've now underclocked it 6% to see how it goes. Has underclocking helped? |
Vagelis Stefas Joined: 27 Aug 06 Posts: 5 Credit: 118,856 RAC: 0 |
Problem with Rosetta 5.46 WU name: DOC_1MLC_R070216_pose_u_pert_from_farl_abs_tot_1571_1078_0 Target run time was 6 hours. The WU was at 96.5% and stated that it wanted another 13 minutes to complete. Having run about 6 hours, that wasn't too unreasonable. After an extra hour of running it still reported that it wanted about 15 minutes to complete. So I checked the graphics in that WU and it seemed to be stuck. I paused and started over, only to see that an hour of processing was gone (back to 5:47). Now it seems to work fine, but I can't babysit Rosetta forever. The computer was not overclocked, and other than Rosetta no other major program was running. Computer ID: 377933 |
(_KoDAk_) Joined: 18 Jul 06 Posts: 109 Credit: 1,859,263 RAC: 0 |
Q: I have five work units with "Client error" but CPU time of ~55,000. Will I be granted credit for these work units? https://boinc.bakerlab.org/rosetta/results.php?hostid=350614&offset=0 |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
I moved Kodak's post here. He's got at least one of the three WUs that was reported more than 24 hours ago and has been granted zero credit. Is the daily credit granting script running? The other two SHOULD have SOME credit granted by the daily script. It will show on the result page. But I believe the maximum credit awarded by the script is 20 credits. Your machine is crunching at a stellar rate and claims over 300 credits for each of these tasks, and is typically granted significantly more than it claims on successful tasks. All of your failure codes seem to be -107s, which often points to problems accessing memory. Have you done memory tests on this machine? Rosetta Moderator: Mod.Sense |
288VKYUjwsXfAaTXn6SFJC4LVPRf Joined: 16 Dec 05 Posts: 31 Credit: 153,110 RAC: 0 |
Failed WU I also had a WU, a DOC that after 1 hour of processing didn't pass step 507. It was a huge result, an energy of -1100. Maybe that's important to know. So I aborted it |
(_KoDAk_) Joined: 18 Jul 06 Posts: 109 Credit: 1,859,263 RAC: 0 |
All of your failure codes seem to be -107s, which often points to problems accessing memory. Have you done memory tests on this machine? I run 24-hour work units (pending 3 days). Before that, for 5-6 days, I ran 12-hour work units, and the rest of the time I ran on AUTO (~3 hours). I tested the machine with OCCT (ran fine for 35 min). MB: P5B Deluxe (BIOS 1004). CPU: E6600 (FSB 390 MHz). RAM: Corsair Value 667 (OC @ 780 MHz, 2.0 V). RightMark Memory Analyzer stability test, 10 min: "Fine". |
Marky-UK Joined: 1 Nov 05 Posts: 73 Credit: 1,689,495 RAC: 0 |
Nope, it was running at the standard speed. Just for the heck of it though, I've now underclocked it 6% to see how it goes. Looks like it - I don't think the host has failed on a WU since. I'll keep an eye on it though. |
Vagelis Stefas Joined: 27 Aug 06 Posts: 5 Credit: 118,856 RAC: 0 |
Failed WU In my case the problem in the WU appeared in model 51 or 57, step 307 something. |
Michael.L Joined: 12 Nov 06 Posts: 67 Credit: 31,295 RAC: 0 |
Name DOC_1BRC_R070216_pose_u_pert_bbmin_from_farlx_abs_tol_1571_1116_0 Workunit 56364454 Created 18 Feb 2007 4:51:58 UTC Sent 18 Feb 2007 4:57:04 UTC Received 18 Feb 2007 21:35:33 UTC Server state Over Outcome Success Client state Done Exit status 0 (0x0) Computer ID 416175 Report deadline 28 Feb 2007 4:57:04 UTC CPU time 7124.953125 stderr out <core_client_version>5.4.11</core_client_version> <stderr_txt> # random seed: 1079855 # cpu_run_time_pref: 14400 ********************************************************************** Rosetta score is stuck or going too long. Watchdog is ending the run! Stuck at score -549.143 for 3600 seconds ********************************************************************** GZIP SILENT FILE: .ii1BRC.out </stderr_txt> Validate state Valid Claimed credit 22.1295493695625 Granted credit 20 application version 5.46 --AMD 3200 64bit WXP Home |
mark Joined: 3 Sep 06 Posts: 1 Credit: 633 RAC: 0 |
I'm running the latest BOINC from Seti, and Rosetta is simply not resuming when Seti tries to switch: Sun 18 Feb 2007 01:25:54 PM CST|SETI@home|Task 24no03aa.2465.11330.379824.3.199_1 exited with zero status but no 'finished' file Sun 18 Feb 2007 01:25:54 PM CST|SETI@home|If this happens repeatedly you may need to reset the project. Sun 18 Feb 2007 02:34:05 PM CST|SETI@home|Restarting task 24no03aa.2465.11330.379824.3.199_1 using setiathome_enhanced version 512 Sun 18 Feb 2007 02:34:07 PM CST|rosetta@home|Task DOC_1BVK_R070216_pose_u_pert_bbmin_from_farlx_abs_tol_1571_2226_0 exited with zero status but no 'finished' file Sun 18 Feb 2007 02:34:07 PM CST|rosetta@home|If this happens repeatedly you may need to reset the project. Sun 18 Feb 2007 03:34:17 PM CST|rosetta@home|Restarting task DOC_1BVK_R070216_pose_u_pert_bbmin_from_farlx_abs_tol_1571_2226_0 using rosetta version 546 Sun 18 Feb 2007 03:34:22 PM CST|SETI@home|Task 24no03aa.2465.11330.379824.3.199_1 exited with zero status but no 'finished' file Sun 18 Feb 2007 03:34:22 PM CST|SETI@home|If this happens repeatedly you may need to reset the project. Sun 18 Feb 2007 04:01:43 PM CST||Restarting DOC_1BVK_R070216_pose_u_pert_bbmin_from_farlx_abs_tol_1571_2226_0 - message timeout Sun 18 Feb 2007 04:01:44 PM CST||[error] Process 6459 not found Sun 18 Feb 2007 05:06:48 PM CST|SETI@home|Restarting task 24no03aa.2465.11330.379824.3.199_1 using setiathome_enhanced version 512 Sun 18 Feb 2007 05:06:50 PM CST|rosetta@home|Task DOC_1BVK_R070216_pose_u_pert_bbmin_from_farlx_abs_tol_1571_2226_0 exited with zero status but no 'finished' file Sun 18 Feb 2007 05:06:50 PM CST|rosetta@home|If this happens repeatedly you may need to reset the project. 
Sun 18 Feb 2007 06:29:25 PM CST|rosetta@home|Restarting task DOC_1BVK_R070216_pose_u_pert_bbmin_from_farlx_abs_tol_1571_2226_0 using rosetta version 546 Sun 18 Feb 2007 06:29:27 PM CST|SETI@home|Task 24no03aa.2465.11330.379824.3.199_1 exited with zero status but no 'finished' file Sun 18 Feb 2007 06:29:27 PM CST|SETI@home|If this happens repeatedly you may need to reset the project. Sun 18 Feb 2007 07:56:03 PM CST||Restarting DOC_1BVK_R070216_pose_u_pert_bbmin_from_farlx_abs_tol_1571_2226_0 - message timeout Sun 18 Feb 2007 07:56:03 PM CST|rosetta@home|Restarting task DOC_1BVK_R070216_pose_u_pert_bbmin_from_farlx_abs_tol_1571_2226_0 using rosetta version 546 Sun 18 Feb 2007 07:56:04 PM CST||[error] Process 14659 not found Sun 18 Feb 2007 09:23:22 PM CST|SETI@home|Restarting task 24no03aa.2465.11330.379824.3.199_1 using setiathome_enhanced version 512 Mon 19 Feb 2007 01:04:00 AM CST||Restarting 24no03aa.2465.11330.379824.3.199_1 - message timeout Mon 19 Feb 2007 01:04:00 AM CST|SETI@home|Restarting task 24no03aa.2465.11330.379824.3.199_1 using setiathome_enhanced version 512 Mon 19 Feb 2007 01:04:02 AM CST||[error] Process 21992 not found I have reset both projects several times. Any ideas? |
Michael.L Joined: 12 Nov 06 Posts: 67 Credit: 31,295 RAC: 0 |
Result ID 63293328 Name DOC_1CSE_R070216_pose_u_pert_bbmin_from_farlx_abs_tol_1571_1911_0 Workunit 56411744 Created 18 Feb 2007 12:36:16 UTC Sent 18 Feb 2007 12:41:57 UTC Received 19 Feb 2007 17:13:20 UTC Server state Over Outcome Success Client state Done Exit status 0 (0x0) Computer ID 416175 Report deadline 28 Feb 2007 12:41:57 UTC CPU time 13927.140625 stderr out <core_client_version>5.4.11</core_client_version> <stderr_txt> # random seed: 1039060 # cpu_run_time_pref: 14400 ********************************************************************** Rosetta score is stuck or going too long. Watchdog is ending the run! Stuck at score 3696.27 for 3600 seconds ********************************************************************** GZIP SILENT FILE: .ii1CSE.out </stderr_txt> Validate state Valid Claimed credit 43.2566138514458 Granted credit 20 application version 5.46 -- Do we still need to report these?? |
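For anyone collecting these watchdog reports, the score and duration can be pulled out of the stderr text with a short script. This is only a sketch: the function name `parse_watchdog` is made up for illustration, and the pattern assumes the message wording stays exactly as in the stderr output posted in this thread.

```python
import re

# Matches the watchdog line as it appears in the posted stderr output.
STUCK_RE = re.compile(r"Stuck at score\s+(-?\d+(?:\.\d+)?)\s+for\s+(\d+)\s+seconds")


def parse_watchdog(stderr_text):
    """Return (score, seconds) from a watchdog message, or None if absent."""
    match = STUCK_RE.search(stderr_text)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))


sample = (
    "Rosetta score is stuck or going too long. Watchdog is ending the run! "
    "Stuck at score 3696.27 for 3600 seconds"
)
print(parse_watchdog(sample))  # (3696.27, 3600)
```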
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
... A general note, not specific to this WU: Please note that Rosetta's estimates of time left to completion are only accurate immediately after they drop in value. Then they slowly increase until the next time they drop in value again. The increases can be disregarded. The value will be approximately right providing the WU does not get stuck. If a WU does get stuck then the estimated time to complete will go on increasing slowly forever (if the clock is still running) or will stay the same forever (if the clock has stuck also). In brief, you cannot rely on the time left to complete when there is a risk of a stuck WU. River~~ |
Viromancy Joined: 23 Sep 06 Posts: 8 Credit: 125,713 RAC: 0 |
Another watchdog termination in 5.46...this time after quite an impressive amount of time: https://boinc.bakerlab.org/rosetta/result.php?resultid=63410631 |
TeAm Enterprise Joined: 28 Sep 05 Posts: 18 Credit: 27,911,735 RAC: 8 |
Message log from my Core 2 Duo, computer 286302, is below. WU ending in 40997_0 started at 2:27am and was still unfinished at 5:31pm when I suspended it. We'll see if it resumes when a core is available. Watchdog didn't do much for me here. Should have taken less than 3 hours to complete. 2/19/2007 2:27:10 AM|rosetta@home|Starting ep10__BOINC_ABRELAX_hom001__1569_40997_0 2/19/2007 2:27:10 AM|rosetta@home|Starting task ep10__BOINC_ABRELAX_hom001__1569_40997_0 using rosetta version 546 2/19/2007 2:27:12 AM|rosetta@home|[file_xfer] Started upload of file ep10__BOINC_ABRELAX_hom001__1569_23899_0_0 2/19/2007 2:27:17 AM|rosetta@home|[file_xfer] Finished upload of file ep10__BOINC_ABRELAX_hom001__1569_23899_0_0 19/2007 5:30:59 PM|rosetta@home|Sending scheduler request: Requested by user 2/19/2007 5:30:59 PM|rosetta@home|Reporting 5 tasks 2/19/2007 5:31:04 PM|rosetta@home|Scheduler RPC succeeded [server version 509] 2/19/2007 5:31:04 PM|rosetta@home|Deferring communication for 4 min 2 sec 2/19/2007 5:31:04 PM|rosetta@home|Reason: requested by project 2/19/2007 5:31:16 PM|rosetta@home|Starting BAK1topH_TnC_loop_model_1561_16682_0 2/19/2007 5:31:16 PM|rosetta@home|Starting task BAK1topH_TnC_loop_model_1561_16682_0 using rosetta version 546 Crunch with friends - TeAm Anandtech |
©2025 University of Washington
https://www.bakerlab.org