Message boards : Number crunching : Problems with Minirosetta v1.54
Author | Message |
---|---|
TomaszPawel Joined: 28 Apr 07 Posts: 54 Credit: 2,791,145 RAC: 0 |
|
TomaszPawel Joined: 28 Apr 07 Posts: 54 Credit: 2,791,145 RAC: 0 |
|
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
TomaszPawel cites two cases where 99 models were completed in less than an hour with a 6.6.20 Win XP client and resulted in a validate error from miniRosetta v1.54. WU names: 243895936 rest3d85_ip40_2oqk.patchdock.7.pdb_0003_fa_dock.xml_score12_pert38_DOCK_10797_652_0 244107786 rest3d85_ip40_2w4f.patchdock.1.pdb_0001_fa_dock.xml_score12_pert38_DOCK_10797_943_0 Rosetta Moderator: Mod.Sense |
Paul D. Buck Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
Validate error with 80 decoys, 10K seconds: lb_all_multi_threshold.1.5_hb_t327__IGNORE_THE_REST_2F2EA_3_10393_1_2 |
TomaszPawel Joined: 28 Apr 07 Posts: 54 Credit: 2,791,145 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=245909228 Reason: Divide by Zero (0xc000008e) at address 0x004E51A9 |
TomaszPawel Joined: 28 Apr 07 Posts: 54 Credit: 2,791,145 RAC: 0 |
|
TomaszPawel Joined: 28 Apr 07 Posts: 54 Credit: 2,791,145 RAC: 0 |
|
William Kahler Joined: 26 Oct 06 Posts: 1 Credit: 323,177 RAC: 530 |
MiniRosetta 1.54 constantly crashing after ~5 seconds & (note to Bill G) w/Boinc 6.4.x & 6.6.x (Error Code 5). It runs a little slow for first 5 seconds of CPU time w/last stable Boinc 5.x & finishes ok. No difference with protected app. or not. Complete BOINC un/re-install & Rosetta de/re-attach no help. Dell Core Duo 2 GHz w/2 Gig Ram. WinXP Sp3 Home Edition (up to date). 24/7, no throttle, no graphics/screensaver, leave in memory. Stand alone or with other projects. Memtest x2/Prime95/Dell Diagnostics run fine. thoughts? suggestions? |
Gavin Shaw Joined: 1 Feb 07 Posts: 10 Credit: 506,456 RAC: 0 |
And another big upload. Task 246174559 ran for 4 hours with 82 decoys. File upload size was 8.9MB. Took a while to upload. Hate to see what it would have been if there were 99 decoys... Never surrender and never give up. In the darkest hour there is always hope. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi there. I got this on Ubuntu x64 this morning, haven't had any in a while. That's 41min run time. Docking_benchmark_unbound__1AVZ.unbound.mppk.pdb.gzdock_score12_hi.xml_11475_29_0 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=224594412 Over__Validate error__Done__2,496.64 ====================================================== DONE :: 1 starting structures 2496.42 cpu seconds This process generated 99 decoys from 99 attempts ====================================================== pete. |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi me again. This was a big one, 7.04MB result file for a six hour run. Docking_benchmark_natives__1FIN.mppk.pdb.gzdock_score_docking_hi.xml_11477_209_0 https://boinc.bakerlab.org/rosetta/workunit.php?wuid=224813196 ====================================================== DONE :: 1 starting structures 21620.9 cpu seconds This process generated 75 decoys from 75 attempts ====================================================== pete. |
Sid Celery Joined: 11 Feb 08 Posts: 2125 Credit: 41,249,734 RAC: 8,235 |
Very few errors nowadays, but just came up with two compute errors: Docking_benchmark_unbound__1ATN.unbound.mppk.pdb.gzdock_score_docking_hi.xml_11476_94_1 Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x005C1D7D read attempt to address 0xC49A08B0 res_careful_ourward_cst_chunk_0_8_hb_t342__IGNORE_THE_REST_1VKBA_5_10927_2_2 ERROR: [ERROR] Unable to open constraints file: resample_outward0.05_ub0.1_lb0.02_median.t342_.cst ERROR:: Exit from: `..\..\src\core\scoring\constraints\ConstraintIO.cc` line: 330 BOINC:: Error reading and gzipping output datafile: default.out Running AMD9850 Vista64 8Gb RAM Boinc 6.6.20 |
Yaroslav Isakov Joined: 2 Nov 07 Posts: 11 Credit: 98,027 RAC: 0 |
|
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
|
Sid Celery Joined: 11 Feb 08 Posts: 2125 Credit: 41,249,734 RAC: 8,235 |
"Hello, I have a problem: very long pending status in my last WUs" I'm assuming this is fixed now. 17 of my WUs have been allocated credit since the original post, but I have another 15 pending credit - 13 hours worth. Just awaiting catch-up, I assume. The Server Status page is showing all systems 'Running'. I also noticed credit was taking more than 4 minutes to come through in the days leading up to the outage, so the problem may've been building up for a few days. |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
BOINC v6.6.20 seems to be causing failures due to too many restarts. https://boinc.bakerlab.org/rosetta/result.php?resultid=247095859 https://boinc.bakerlab.org/rosetta/result.php?resultid=246620233 It suggests keeping tasks in memory. But I've always had it configured to do so. I've also limited the memory available to BOINC while the computer is in use. This seems to cause BOINC to begin and then suspend the tasks numerous times during the day. When the task attempts to run and then exceeds the memory bound, it goes to a status of waiting for memory. But it no longer appears in the Windows task list, hence was removed from memory. I have a HT P4, so 2 CPUs. As the primary task cycles through periods with lower memory usage, it attempts to fire up the second core. Only to find it ends up short of memory again a few minutes later as the second task gears up and uses more, or the first cycles into another phase of higher memory usage. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
Speedy Joined: 25 Sep 05 Posts: 163 Credit: 808,098 RAC: 0 |
This task has been in my pending list since 29 Apr 2009 22:36:17 UTC Since 30 Apr 2009 3:12:54 UTC & Since 30 Apr 2009 3:12:54 UTC Any ideas as to why this is happening? Have a crunching good day!! |
WilMar Joined: 29 Mar 09 Posts: 1 Credit: 1,984 RAC: 0 |
Hello! Now, with the advent of version 1.64, I'm having difficulty uploading my last crunched file from version 1.54. I repeatedly get the following messages: 30/04/2009 13:40:31|rosetta@home|Started upload of lb_all_multi_threshold.0.5_hb_t311__IGNORE_THE_REST_1ZK8A_1_10279_7_2_0 30/04/2009 13:42:19||Project communication failed: attempting access to reference site 30/04/2009 13:42:19|rosetta@home|Temporarily failed upload of lb_all_multi_threshold.0.5_hb_t311__IGNORE_THE_REST_1ZK8A_1_10279_7_2_0: connect() failed 30/04/2009 13:42:19|rosetta@home|Backing off 1 hr 50 min 57 sec on upload of lb_all_multi_threshold.0.5_hb_t311__IGNORE_THE_REST_1ZK8A_1_10279_7_2_0 30/04/2009 13:42:21||Internet access OK - project servers may be temporarily down. According to the server status page, all servers are running. So why is this happening, and how can I fix it? Martin |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,281,662 RAC: 1,150 |
"BOINC v6.6.20 seems to be causing failures due to too many restarts." BOINC 6.6.20 is working better for me, so let's compare our machines and settings. My newer machine, with BOINC 6.6.20 under 64-bit Vista SP1 with 8 GB of memory, does not appear to have any memory problems. My 32-bit Vista SP1 machine, with BOINC 6.2.28, originally came with 1 GB of memory. I found that wasn't enough to even start running two minirosetta@home workunits at the same time. After enough other problems showed up which I decided were memory problems, I used this site to find out how much memory my motherboard could handle, and then ordered enough to raise it to the 2 GB limit for my motherboard: http://www.crucial.com/ This was enough to allow it to start running two minirosetta workunits at once on my 2 CPU cores, but still not enough to run them well. Eventually, I raised both the amount of disk space BOINC is allowed to use and the amount of swap space BOINC is allowed to use. It's not clear which of the last two steps was actually needed, if not both of them, but that combination handled the memory problems on that machine. At least some versions of BOINC do not divide up the available swap space in the most efficient way - they first divide it up into equal shares for each BOINC project you have subscribed to, then divide those shares into smaller shares for each CPU core. If these smaller shares aren't large enough, BOINC can't preserve any work done since the last checkpoint by simply swapping a task out into the swap space on the hard drive. Does the HT stand for hyperthreaded, a method of appearing to have twice as many CPU cores by giving each one of them an extra set of registers? If so, I've seen messages from other BOINC users saying that this does not increase the total throughput very much. Therefore, until you are able to handle the memory and swapfile problems, you may find it worthwhile to tell BOINC to use only one of the two apparent CPU cores on your machine. |
robertmiles Joined: 16 Jun 08 Posts: 1232 Credit: 14,281,662 RAC: 1,150 |
I've recently had two workunits with the lockfile problem: https://boinc.bakerlab.org/rosetta/result.php?resultid=247527853 https://boinc.bakerlab.org/rosetta/result.php?resultid=247443039 Both were then completed successfully by someone else. Could minirosetta be modified to check for the lockfile problem sooner, and at least produce more debug information about it instead of wasting CPU time first? |
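
The swap-space split that robertmiles describes above (equal shares per attached project, then equal shares per CPU core) can be illustrated with a little arithmetic. This is only a sketch of the scheme as the post describes it, not BOINC's actual code; the function names and every number below (2048 MB of swap, 4 projects, a 400 MB task) are hypothetical values chosen for illustration.

```python
def swap_share_per_task(total_swap_mb: float, n_projects: int, n_cores: int) -> float:
    """Swap available to one task under the split described in the post.

    Assumed scheme (from the post, not from BOINC source): the total is
    divided equally among projects, then each project's share is divided
    equally among CPU cores.
    """
    if n_projects < 1 or n_cores < 1:
        raise ValueError("need at least one project and one core")
    per_project = total_swap_mb / n_projects
    return per_project / n_cores


def task_fits(task_mb: float, total_swap_mb: float, n_projects: int, n_cores: int) -> bool:
    """True if a task of task_mb fits inside its per-task swap share."""
    return task_mb <= swap_share_per_task(total_swap_mb, n_projects, n_cores)


# Hypothetical numbers: 2048 MB of swap, 4 attached projects, 2 cores.
share = swap_share_per_task(2048, 4, 2)   # 2048 / 4 / 2 = 256 MB per task
print(share, task_fits(400, 2048, 4, 2))  # a 400 MB task does not fit
print(task_fits(400, 2048, 1, 2))         # with one project, the share is 1024 MB
```

Under these assumptions, attaching to fewer projects or using fewer cores raises the per-task share, which is consistent with the suggestion to run on only one of the two apparent cores until the memory problem is sorted out.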