Message boards : Number crunching : Problems with Minirosetta v1.54
Author | Message |
---|---|
rembertw Joined: 21 Apr 07 Posts: 14 Credit: 628,529 RAC: 0 |
Mod.Sense: In the meantime I have set that computer to NNT (No New Tasks) and changed the preferred runtime. I will reactivate that computer and evaluate on Saturday or after the weekend. You'll be informed :) |
BrnmccO1 Joined: 26 Jun 07 Posts: 17 Credit: 578,825 RAC: 0 |
Very good so far: zero error results on all machines for a long time. This 1.54 is much better than the previous versions, much more stable, etc. Keep up the good work stamping out the bugs. It's been a long time since I've reviewed the results on all my crunchers and found no compute errors. If things keep going the way they are, we might break 100 Tflops yet! |
svincent Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
Workunit 205979363 Task 228619747 Name loopbuild_ref_tex_cst_hombench_loopbuild_tex_cst_t332__IGNORE_THE_REST_2FLIA_6_6646_10_1 Mac OS X 10.4.11 This failed after 216 seconds; tail of stderr below: Setting database description ... Setting up checkpointing ... BOINC:: Worker startup. Starting watchdog... Watchdog active. # cpu_run_time_pref: 14400 Hbond tripped. interpolate rotamers bin out of range: ARG 1.43667e-05 nan nan nan nan nan 81 81 19 20 2147483649 22 1.43667e-06 nan ERROR:: Exit from: src/core/scoring/dunbrack/RotamericSingleResidueDunbrackLibrary.tmpl.hh line: 593 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish |
Yaroslav Isakov Joined: 2 Nov 07 Posts: 11 Credit: 98,027 RAC: 0 |
|
LizzieBarry Joined: 25 Feb 08 Posts: 76 Credit: 201,862 RAC: 0 |
Hello, I have some problems with Minirosetta 1.54 I got a couple of validate errors too: Task 228125280 Task 228133134 There's nothing more frustrating than completing a job OK only for it to fail when uploaded. I notice yours are a bit different, though. The first ones just include the line: hbond tripped The other two show: Starting work on structure: _1JUDA_2_00001 Not sure if one leads to the other, but hbond tripped seems to be coming up in reports more regularly. |
epcorian Joined: 1 Jan 09 Posts: 16 Credit: 253,062 RAC: 0 |
I think I spoke too soon... that first WU crunched successfully, but only one other WU out of the 8 succeeded. 2/8: better, but still not good. I might try replacing Vista 64 with XP 64 another weekend when I'm bored. Just for curiosity's sake, I had my P4 and Atom 330 PCs running 32-bit XP SP3 crunch some Minis, and they did just fine. So this weekend I installed a fresh copy of XP x64, upgraded it to SP2, installed my x64 version of NOD32 antivirus, and told BOINC to "use at most 75% of the processors" (meaning 3 of 4 cores on my Q6600), and it's crunching Minis and Betas without a problem! 1 successful Beta, 5 successful Minis, with 4 more coming down the pipe. So it looks like Mini does not like Vista x64, and on my adventures on Google it turns out that XP x64 is actually based on the Server 2003 code tree while Vista is based on crap. :) |
Paul D. Buck Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
Just noted that I have two tasks that failed. One had an exception, the other a validate error with 99 decoys ... Validate Error Exception Does the system have an issue with too many decoys? The reissue has not returned ... |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Just noted that I have two tasks that failed. One had an exception, the other a validate error with 99 decoys ... If I remember correctly, they have created a 99-model stop limit to keep the tasks from running forever. |
Paul D. Buck Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
Yeah, the 99 stop limit was to avoid a problem with the size of the file that is zipped up and uploaded. However, I was just wondering if there is now a new companion problem where the validator does not properly handle those results... or the result itself is somehow bad... Given that I have gone back to the 3rd of Feb and have well over a hundred (220) results with only three errors, this is a puzzlement ... {edit} added number .. Also, I note that the runtime is only 145 seconds ... so that was fast work ... :) |
Pharrg Joined: 10 Jul 06 Posts: 10 Credit: 6,478 RAC: 0 |
I started running Rosetta this morning on a 64-bit Vista machine, and all seems to be working well. It's been working well on other projects too. Here is what I'm running: Core i7 920 CPU, Asus P6T6 WS Revolution motherboard, 6 GB DDR3 triple-channel RAM, Vista Home Premium SP1 64-bit, 64-bit BOINC 6.6.7. As I said, no problems yet, and a number of WUs have completed already. |
Pharrg Joined: 10 Jul 06 Posts: 10 Credit: 6,478 RAC: 0 |
Ok, after a number of successful completions, I did see one that looks like it failed. Message as follows: 2/16/2009 7:49:12 PM rosetta@home Computation for task ss-neg-1i17__7365_4677_1 finished 2/16/2009 7:49:12 PM rosetta@home Output file ss-neg-1i17__7365_4677_1_0 for task ss-neg-1i17__7365_4677_1 absent Don't know the cause of that one... |
Paul D. Buck Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
Well, a couple hundred tasks and several with the same error, multiple systems (3 different), based on Xeon, Q9300, and i7 processors, various amounts of available RAM, though in common all are running Win XP Pro 32-Bit: 228932012 229013783 229066094 229072515 The error: Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x004E3308 read attempt to address 0x00000000 |
Yaroslav Isakov Joined: 2 Nov 07 Posts: 11 Credit: 98,027 RAC: 0 |
Hey, you're right, all my errors are with Hbond tripped in stderr, so I think that it's a source of problems |
Pharrg Joined: 10 Jul 06 Posts: 10 Credit: 6,478 RAC: 0 |
So... I completed a bunch more tasks successfully, then got a 2nd task where it said the output file was missing. Anyone else getting these? 2/17/2009 6:20:35 AM rosetta@home Computation for task ss-neg-1i17__7365_5964_0 finished 2/17/2009 6:20:35 AM rosetta@home Output file ss-neg-1i17__7365_5964_0_0 for task ss-neg-1i17__7365_5964_0 absent I noticed that both tasks that gave the 'absent output file' message had names that started with the same prefix: ss-neg-1i17__7365_ Perhaps a bug in that one? |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
I had one of those fail too. Firewall blocked it from reporting the symbol tables :( Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Looks like Pharrg actually had three of these fail ss-neg-1i17__7365_5964_0 ss-neg-1i17__7365_5190_1 (wingman failed too) ss-neg-1i17__7365_4677_1 (wingman failed too) Rosetta Moderator: Mod.Sense |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
I had two more similar tasks on my machines, so I suspended others to try and run them. I've got an ss-neg-1je9 that seems normal so far, but my other ss-neg-1i17 doesn't seem able to display graphics: black window, no pane lines, on WinXP. |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
Yep, my next ss-neg-1i17 failed too. As soon as you bring up the graphic, which never gets beyond black, Windows Task Manager shows the graphic thread as "not responding". |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Two ss-neg tasks died on me as well; I have a 3rd in progress at 50% complete so far. Here are the failures: ss-neg-1i17__7365_1743_0 ss-neg-1i17__7365_542_1 They both do the following: initialization is OK, but then, just as it is about to start, it errors out: Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x004E3308 read attempt to address 0x00000000 ---------- |
Sid Celery Joined: 11 Feb 08 Posts: 2125 Credit: 41,249,734 RAC: 8,235 |
|
©2024 University of Washington
https://www.bakerlab.org