minirosetta v1.15 bug thread

Evan

Joined: 23 Dec 05
Posts: 268
Credit: 402,585
RAC: 0
Message 52810 - Posted: 30 Apr 2008, 14:06:08 UTC

I hope this is the last of the few. It got there in the end but needed a bit of kicking: 159497259

David Emigh
Joined: 13 Mar 06
Posts: 158
Credit: 417,178
RAC: 0
Message 52814 - Posted: 30 Apr 2008, 21:12:59 UTC

Finally, a success with mini 1.15!

This computer, hostid=599043, made it to the end of a 24 hour run on resultid=159274026.

It may be important to note that this is a 64 bit processor with 4GB RAM and 8GB of swap space...
Rosie, Rosie, she's our gal,
If she can't do it, no one shall!

David Emigh
Joined: 13 Mar 06
Posts: 158
Credit: 417,178
RAC: 0
Message 52815 - Posted: 1 May 2008, 2:16:43 UTC - in response to Message 52794.  

I am 0/5 now on mini 1.15 workunits with these two computers:

hostid=623950
hostid=663412

Both have successfully completed mini 1.15 workunits on RALPH.
{...}


Make that 0/8 now. Neither of the two above linked computers, normally stable, reliable crunchers, has been able to successfully complete a mini 1.15 workunit since that application version was released on Rosetta.

I have set the runtime preference for the two computers down to 3 hours, since that time frame seems to have worked for my wingmen on all of these failed tasks.

Dave Mickey
Joined: 29 Dec 07
Posts: 33
Credit: 4,136,957
RAC: 0
Message 52816 - Posted: 1 May 2008, 2:53:27 UTC

One cruncher (333 MHz Celeron, 192 MB, NT4) got some mini 1.15s, and every one
(of about 5 or 6) did the thing where they say they have started
but accumulate no elapsed CPU time (it stays at 0), the time to completion
is reported as blank, and they certainly seem to be going nowhere.
After aborting them (having given each a fair chance to actually compute), I
turned off new work for a while (SETI ran in the meantime). The machine then
collected some Beta 5.96 WUs and is crunching them now, seemingly
OK. I'll have to wait and see if more 1.15s come along, but it doesn't
look good. There seems to be a solid distinction between runnable and not.

Other machine (500M P3, 512M, W2K) has gotten one mini 1.15, and ran it
OK.


David Emigh
Joined: 13 Mar 06
Posts: 158
Credit: 417,178
RAC: 0
Message 52821 - Posted: 1 May 2008, 15:15:35 UTC
Last modified: 1 May 2008, 15:17:53 UTC

I suspect the focus of development energy has already shifted to 1.16, but if there is any information still to be gleaned from the failures of 1.15, the data in this post may be significant.

I changed the runtime preference for the two computers that I've been whining about for several days now, decreasing it from 24 hours to 3 hours. The result was dramatic.

On one system, my success rate with mini 1.15 went from 0% to 100%. That computer is now 5/5 with the formerly un-runnable application.
link to host


The other system is still struggling, but the type of error reported has changed from "compute error" to "validate error."
link to host

glaesum
Joined: 16 Oct 06
Posts: 21
Credit: 508,632
RAC: 0
Message 52823 - Posted: 1 May 2008, 16:29:27 UTC

So... it doesn't look terribly hopeful that minirosetta will be fully ready for the launch of CASP8 next Monday, does it?

mewbysea
Joined: 29 Jan 06
Posts: 17
Credit: 15,843,832
RAC: 1,327
Message 52826 - Posted: 1 May 2008, 23:56:08 UTC - in response to Message 52821.  



I changed the runtime preference for the two computers that I've been whining about for several days now, decreasing it from 24 hours to 3 hours. The result was dramatic.

On one system, my success rate with mini 1.15 went from 0% to 100%. That computer is now 5/5 with the formerly un-runnable application.
link to host


The other system is still struggling, but the type of error reported has changed from "compute error" to "validate error."
link to host


I've been having the same problem as David with a runtime preference of 10 hours. I'll adjust to a shorter runtime and see if that helps.


sslickerson
Joined: 14 Oct 05
Posts: 101
Credit: 578,497
RAC: 0
Message 52827 - Posted: 2 May 2008, 1:35:54 UTC

I've been away from the boards for a few days and I just noticed that 3 of my WUs errored out (all minirosetta). After reading about the problems everyone else has had, this appears to be a fairly well-known issue. My question is this: am I going to get credit for the lost time (about 600 credits by my count)?

Timothy

staffann
Joined: 7 Oct 07
Posts: 7
Credit: 69,937
RAC: 0
Message 52834 - Posted: 2 May 2008, 12:24:22 UTC

First posted in the "problems with minirosetta 1.+" thread but realised that this is probably a better place:

I just had a minirosetta 1.15 task fail right after I hit the graphics button. My computer runs WinXP SP2 on an Athlon 64 X2 3800+ and has an NVIDIA graphics card. AV is Avast.

https://boinc.bakerlab.org/rosetta/result.php?resultid=160186210

2008-05-02 13:42:02|World Community Grid|Sending scheduler request: To report completed tasks
2008-05-02 13:42:02|World Community Grid|Reporting 1 tasks
2008-05-02 13:42:07|World Community Grid|Scheduler RPC succeeded [server version 601]
2008-05-02 13:42:07|World Community Grid|Deferring communication for 1 min 1 sec
2008-05-02 13:42:07|World Community Grid|Reason: requested by project
2008-05-02 13:45:11|rosetta@home|Deferring communication for 1 min 0 sec
2008-05-02 13:45:11|rosetta@home|Reason: Unrecoverable error for result 1who__BOINC_ABINITIO_IGNORE_THE_REST-S25-9-S3-3--1who_-_3092_1911_0 ( - exit code -1073741819 (0xc0000005))
2008-05-02 13:45:11|rosetta@home|Computation for task 1who__BOINC_ABINITIO_IGNORE_THE_REST-S25-9-S3-3--1who_-_3092_1911_0 finished
2008-05-02 13:45:11|rosetta@home|Output file 1who__BOINC_ABINITIO_IGNORE_THE_REST-S25-9-S3-3--1who_-_3092_1911_0_0 for task 1who__BOINC_ABINITIO_IGNORE_THE_REST-S25-9-S3-3--1who_-_3092_1911_0 absent
2008-05-02 13:45:11|World Community Grid|Resuming task faah4030_NSC119913_chem3D_B_xmd05230_02_1 using faah version 603
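
A note on the exit code in the log above: the two numbers are the same 32-bit value, printed once as a signed decimal and once as unsigned hex. 0xc0000005 is the Windows status code for an access violation. The short sketch below shows the conversion; the helper names and the one-entry lookup table are made up for illustration and are not part of BOINC or Rosetta.

```python
def to_unsigned_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as unsigned and format it as hex."""
    return f"0x{exit_code & 0xFFFFFFFF:08x}"


# Deliberately tiny lookup table (illustrative only, not a complete list).
KNOWN_CODES = {
    "0xc0000005": "STATUS_ACCESS_VIOLATION",
}


def describe(exit_code: int) -> str:
    """Return a one-line description of a signed Windows exit code."""
    hex_code = to_unsigned_hex(exit_code)
    name = KNOWN_CODES.get(hex_code, "unknown")
    return f"{exit_code} = {hex_code} ({name})"


print(describe(-1073741819))
```

An access violation means the process touched memory it was not allowed to use; the code alone does not say which part of the application was responsible.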


Computer & BOINC data:
2008-05-02 09:56:38||Starting BOINC client version 5.10.20 for windows_intelx86
2008-05-02 09:56:38||log flags: task, file_xfer, sched_ops
2008-05-02 09:56:38||Libraries: libcurl/7.16.4 OpenSSL/0.9.8e zlib/1.2.3
2008-05-02 09:56:38||Data directory: C:ProgramBOINC
2008-05-02 09:56:38|SETI@home|Found app_info.xml; using anonymous platform
2008-05-02 09:56:38||Processor: 2 AuthenticAMD AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ [x86 Family 15 Model 43 Stepping 1]
2008-05-02 09:56:38||Processor features: fpu tsc pae nx sse sse2 3dnow mmx
2008-05-02 09:56:38||OS: Microsoft Windows XP: Professional Edition, Service Pack 2, (05.01.2600.00)
2008-05-02 09:56:38||Memory: 1023.23 MB physical, 3.40 GB virtual
2008-05-02 09:56:38||Disk: 76.32 GB total, 5.24 GB free
2008-05-02 09:56:38||Local time is UTC +1 hours
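
The client messages pasted in this post follow a timestamp|project|message layout, with an empty project field for client-wide messages. For anyone who wants to sift a longer log for Rosetta errors, here is a minimal sketch; the `LogLine` type and `parse_line` helper are hypothetical names, and the sample lines are copied from the post.

```python
from typing import NamedTuple


class LogLine(NamedTuple):
    timestamp: str
    project: str  # empty string for client-wide messages
    message: str


def parse_line(line: str) -> LogLine:
    """Split one pipe-delimited client message into its three fields."""
    timestamp, project, message = line.rstrip("\n").split("|", 2)
    return LogLine(timestamp, project, message)


SAMPLE = [
    "2008-05-02 13:42:07|World Community Grid|Deferring communication for 1 min 1 sec",
    "2008-05-02 13:45:11|rosetta@home|Reason: Unrecoverable error for result "
    "1who__BOINC_ABINITIO_IGNORE_THE_REST-S25-9-S3-3--1who_-_3092_1911_0 "
    "( - exit code -1073741819 (0xc0000005))",
    "2008-05-02 09:56:38||log flags: task, file_xfer, sched_ops",
]

parsed = [parse_line(line) for line in SAMPLE]

# Keep only rosetta@home messages that mention an error.
errors = [
    entry for entry in parsed
    if entry.project == "rosetta@home" and "error" in entry.message.lower()
]

for entry in errors:
    print(entry.timestamp, entry.message)
```

Splitting with a maximum of two splits keeps any later "|" characters inside the message field intact.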

david @ TPS
Joined: 26 Nov 06
Posts: 3
Credit: 881,762
RAC: 0
Message 52835 - Posted: 2 May 2008, 13:37:07 UTC

My Celeron with WinME seems happy with the mini's.

Dave

amgthis
Joined: 25 Mar 06
Posts: 81
Credit: 203,879,282
RAC: 0
Message 52839 - Posted: 2 May 2008, 20:46:06 UTC

My minirosetta 1.15 tasks crash about 50-60% of the time on AMD 64s,
Intel Conroe duals, and Intel Kentsfield quads; none are overclocked.

Typical error is # 107374.


(_KoDAk_)
Joined: 18 Jul 06
Posts: 109
Credit: 1,859,263
RAC: 0
Message 52841 - Posted: 3 May 2008, 6:43:41 UTC

https://boinc.bakerlab.org/rosetta/results.php?hostid=736555
Invalid

Task ID 159962434
Task ID 159962433
Task ID 159962424
Task ID 159318279
Task ID 159128030

- Unhandled Exception Record -
Reason: Out Of Memory (C++ Exception) (0xe06d7363) at address 0x766C42EB

WTF ?????
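
On the code in that exception record: 0xe06d7363 is the generic code the Microsoft Visual C++ runtime raises for a thrown C++ exception that nothing caught. Its low three bytes are the ASCII characters "msc". The code on its own says nothing about memory; the "Out Of Memory" label comes from the report text. A small sketch of the decoding, with a function name invented for illustration:

```python
def decode_exception_code(code: int) -> str:
    """Split a 32-bit exception code into its top byte and a 3-character ASCII tag."""
    top_byte = (code >> 24) & 0xFF
    tag = (code & 0xFFFFFF).to_bytes(3, "big").decode("ascii")
    return f"top byte 0x{top_byte:02x}, tag {tag!r}"


print(decode_exception_code(0xE06D7363))
```

This only works for codes whose low bytes happen to be printable ASCII, as they are here.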

Blacksun
Joined: 2 May 07
Posts: 2
Credit: 1,284,699
RAC: 0
Message 52842 - Posted: 3 May 2008, 7:12:01 UTC

Client error

Task ID 160171745 and
Task ID 160164808

mfg Blacksun

Evan
Joined: 23 Dec 05
Posts: 268
Credit: 402,585
RAC: 0
Message 52844 - Posted: 3 May 2008, 8:35:52 UTC

Compute error
160204008

(_KoDAk_)
Joined: 18 Jul 06
Posts: 109
Credit: 1,859,263
RAC: 0
Message 52849 - Posted: 3 May 2008, 17:27:20 UTC

Client error
https://boinc.bakerlab.org/rosetta/result.php?resultid=158424537
https://boinc.bakerlab.org/rosetta/result.php?resultid=158422144
https://boinc.bakerlab.org/rosetta/result.php?resultid=158418985


Venturini Dario[VENETO]
Joined: 25 May 07
Posts: 22
Credit: 245,028
RAC: 0
Message 52851 - Posted: 3 May 2008, 19:07:19 UTC

Not sure if this is a bug, but it's definitely weird:

Workunit 146096927

stderr out	

<core_client_version>5.10.45</core_client_version>
<![CDATA[
<stderr_txt>
# cpu_run_time_pref: 14400
======================================================
DONE ::     1 starting structures  14353.2 cpu seconds
This process generated     14 decoys from      14 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
called boinc_finish
# cpu_run_time_pref: 14400
======================================================
DONE ::     1 starting structures  15500.1 cpu seconds
This process generated      1 decoys from       1 attempts
======================================================

BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
called boinc_finish

</stderr_txt>
]]>


Validate state Valid
Claimed credit 80.519063844009
Granted credit 5.15238734890731
application version 1.15
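
What stands out in the stderr above is that it contains two complete run summaries for one task: 14 decoys in 14353.2 CPU seconds, then 1 decoy in 15500.1 CPU seconds, the second exceeding the cpu_run_time_pref of 14400 seconds (4 hours). The sketch below pulls those figures out of the text so the two blocks can be compared; the regular expression is an assumption based only on the lines shown in this post, not on any documented minirosetta output format.

```python
import re

# The relevant lines, copied from the stderr output in the post.
STDERR = """\
# cpu_run_time_pref: 14400
DONE ::     1 starting structures  14353.2 cpu seconds
This process generated     14 decoys from      14 attempts
# cpu_run_time_pref: 14400
DONE ::     1 starting structures  15500.1 cpu seconds
This process generated      1 decoys from       1 attempts
"""

# Matches one "DONE" line followed directly by its "generated" line.
RUN_RE = re.compile(
    r"DONE ::\s+\d+ starting structures\s+([\d.]+) cpu seconds\s*\n"
    r"This process generated\s+(\d+) decoys from\s+(\d+) attempts"
)


def summarize(stderr_text: str) -> list:
    """Return one dict per run summary found in the stderr text."""
    return [
        {"cpu_seconds": float(cpu), "decoys": int(decoys), "attempts": int(attempts)}
        for cpu, decoys, attempts in RUN_RE.findall(stderr_text)
    ]


runs = summarize(STDERR)
pref = int(re.search(r"cpu_run_time_pref:\s+(\d+)", STDERR).group(1))

for number, run in enumerate(runs, start=1):
    over = run["cpu_seconds"] > pref
    print(number, run, "over preference" if over else "within preference")
```

Whether the second block reflects a restart or something else cannot be told from the stderr alone.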

amgthis
Joined: 25 Mar 06
Posts: 81
Credit: 203,879,282
RAC: 0
Message 52856 - Posted: 4 May 2008, 4:02:03 UTC

The minirosetta 1.15 units just continually crash. Why keep queuing them for
distribution before the problems are sorted? People are wasting kilowatts of power
for nothing in the meantime...
I would think we would just line up 5.96 units until the bugs are sorted, instead
of wasting thousands of watts of energy for nothing.

????


sslickerson
Joined: 14 Oct 05
Posts: 101
Credit: 578,497
RAC: 0
Message 52861 - Posted: 4 May 2008, 14:48:15 UTC - in response to Message 52856.  

The mini rosetta 1.15 units just continually crash. Why keep queuing them to
distribute until the problems are sorted? People are wasting k watts of power
for nothing in the meantime...
I would think we would just line up 5.96 units until the bugs were sorted instead
of wasting thousands of watts of energy for nothing.

????


I just abort them as soon as I see them, but I'm sure that may be a problem for someone such as yourself with 14K RAC. Sadly, I only have 1 computer...

I would set Rosetta to "no new work" for the time being and come back later when this gets all sorted out.






glaesum
Joined: 16 Oct 06
Posts: 21
Credit: 508,632
RAC: 0
Message 52863 - Posted: 4 May 2008, 17:01:41 UTC - in response to Message 52835.  

My Celeron with WinME seems happy with the mini's.

Dave


That's interesting, as it must mean that Win98 is only one tweak away from working...

Meanwhile, as long as I'm trashing under 50% of the tasks sent, I'll keep going on that PC (no problems on the XP machine at all).

amgthis
Joined: 25 Mar 06
Posts: 81
Credit: 203,879,282
RAC: 0
Message 52870 - Posted: 5 May 2008, 13:51:35 UTC - in response to Message 52861.  

The mini rosetta 1.15 units just continually crash. Why keep queuing them to
distribute until the problems are sorted? People are wasting k watts of power
for nothing in the meantime...
I would think we would just line up 5.96 units until the bugs were sorted instead
of wasting thousands of watts of energy for nothing.

????


I just abort them as soon as I see them but I'm sure that may be a problem for someone such as yourself with 14K RAC, sadly I only have 1 computer...

I would set Rosetta to "no new work" for the time being and come back later when this gets all sorted out.

Yes, you are right. I should stop whining and do as you say or just run another
project in the meantime. Hopefully it will be sorted out soon.

/amgthis


