Problems with Rosetta version 5.40

Message boards : Number crunching : Problems with Rosetta version 5.40

sslickerson
Joined: 14 Oct 05
Posts: 101
Credit: 578,497
RAC: 0
Message 31099 - Posted: 14 Nov 2006, 1:04:45 UTC

I just had a bunch of computational errors:

47017244,
47015664,
47011260,
46994328,

Validation Error also:

46992660



ID: 31099 · Rating: 0
Stuart
Joined: 17 Oct 06
Posts: 1
Credit: 76,349
RAC: 0
Message 31100 - Posted: 14 Nov 2006, 1:09:17 UTC

I am getting errors too, and I have never had any before :(
Does anyone know if it's the WU or what?
ID: 31100 · Rating: 0
MattDavis
Joined: 22 Sep 05
Posts: 206
Credit: 1,377,748
RAC: 0
Message 31108 - Posted: 14 Nov 2006, 5:06:35 UTC

So... many... ERRORS!!!!
ID: 31108 · Rating: -1
Chu
Joined: 23 Feb 06
Posts: 120
Credit: 112,439
RAC: 0
Message 31109 - Posted: 14 Nov 2006, 5:09:20 UTC

We found the problem. Unfortunately, the newly updated application 5.40 has backward compatibility issues with some of the existing docking jobs in the queue, which work just fine with 5.36. After finding the conflict, we removed most of the conflicting jobs from the queue to minimize the damage, and we hope that not too many of these jobs were sent out together with the new 5.40 application. Please accept my apology for not being careful enough to check for this issue. It has taught us a very important lesson about how to sync Rosetta with Ralph so that this kind of problem does not happen again.
ID: 31109 · Rating: 0
MattDavis
Joined: 22 Sep 05
Posts: 206
Credit: 1,377,748
RAC: 0
Message 31110 - Posted: 14 Nov 2006, 5:55:31 UTC

We forgive you <3
ID: 31110 · Rating: -1
Team TMR
Joined: 2 Nov 05
Posts: 21
Credit: 1,583,679
RAC: 0
Message 31118 - Posted: 14 Nov 2006, 8:28:42 UTC

I woke up this morning to find that over 20 WUs failed overnight. It's good to see the cause has already been found though.
ID: 31118 · Rating: 0
FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 31121 - Posted: 14 Nov 2006, 9:05:11 UTC

Can you not use both versions at the same time, like I have seen some other projects do (Leiden Classical, for example)? Use 5.36 for docking and 5.40 for whatever works at the moment, until a 5.41 comes out with fixes.
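As a rough illustration of the idea, routing could key off the work unit name. This is only a sketch: the `DOC_` prefix comes from the docking result names quoted in this thread, while `pick_app_version` and the non-docking example name are made up, and it says nothing about how the BOINC scheduler actually assigns application versions.

```python
def pick_app_version(workunit_name: str) -> str:
    """Return the application version a work unit would be sent to.

    Hypothetical rule: docking jobs (names starting with "DOC_") stay on
    the older 5.36, everything else goes to 5.40.
    """
    if workunit_name.startswith("DOC_"):
        return "5.36"
    return "5.40"


if __name__ == "__main__":
    # The first name is taken from an error report in this thread;
    # the second is an invented placeholder for a non-docking job.
    for name in ("DOC_1MLC_R061030_st_model_08_1383_1166_1", "EXAMPLE_fibril_job_0"):
        print(name, "->", pick_app_version(name))
```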
Team mauisun.org
ID: 31121 · Rating: 0
Evil-Dragon
Joined: 4 Mar 06
Posts: 1
Credit: 67,507
RAC: 0
Message 31123 - Posted: 14 Nov 2006, 9:36:35 UTC

Most of my WUs errored out with "Error 1 (0x1)".

Will keep an eye on it now that the broken jobs have been removed.
ID: 31123 · Rating: 0
Oldman
Joined: 17 Oct 06
Posts: 4
Credit: 1,706,631
RAC: 0
Message 31126 - Posted: 14 Nov 2006, 11:03:10 UTC

11/14/2006 12:26:21 AM|rosetta@home|Unrecoverable error for result DOC_1MLC_R061030_st_model_08_1383_1166_1 (Incorrect function. (0x1) - exit code 1 (0x1))

This was the error code I got last night and the WU was about 1/3 complete.
ID: 31126 · Rating: 0
alexpoon
Joined: 28 Dec 05
Posts: 6
Credit: 1,846
RAC: 0
Message 31128 - Posted: 14 Nov 2006, 11:48:26 UTC
Last modified: 14 Nov 2006, 11:50:02 UTC

11/14/2006 11:47:48|rosetta@home|Unrecoverable error for result DOC_R061113_2SIC_p2_fa_relax_from_native_unbound_1392_260_0 ( - exit code -1073741819 (0xc0000005))
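For anyone comparing these codes: the negative decimal and the hex value in that log line are the same 32-bit number, read as signed and unsigned respectively. 0xC0000005 is the Windows status code for an access violation. A small Python sketch of the conversion (the helper names are made up for illustration):

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned value."""
    return code & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed exit code."""
    value &= 0xFFFFFFFF
    # If the top bit is set, the signed reading is negative.
    return value - 0x100000000 if value & 0x80000000 else value


if __name__ == "__main__":
    print(hex(to_unsigned32(-1073741819)))  # 0xc0000005
    print(to_signed32(0xC0000005))          # -1073741819
```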

ID: 31128 · Rating: 0
Rayburner
Joined: 4 Oct 05
Posts: 32
Credit: 16,518,823
RAC: 0
Message 31129 - Posted: 14 Nov 2006, 12:46:21 UTC

Two validation errors:

https://boinc.bakerlab.org/rosetta/result.php?resultid=46997671

https://boinc.bakerlab.org/rosetta/result.php?resultid=46989353

These WUs were canceled by the Rosetta admins. Is this because of the issue described below? They did complete successfully on my machine, though (see links above).

Best Regards

Rayburner


ID: 31129 · Rating: 0
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 31138 - Posted: 14 Nov 2006, 18:58:22 UTC

Ray, I believe they cancelled the WUs because they had a high error rate, not a 100% error rate, so some successful results would be expected. They cancelled the WUs to avoid further problems for users until they can address them in a new version and test it on Ralph.
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
ID: 31138 · Rating: 0
rob147147
Joined: 5 Jan 06
Posts: 4
Credit: 115,444
RAC: 0
Message 31139 - Posted: 14 Nov 2006, 19:02:34 UTC

Same as the people below, it seems:

14/11/2006 18:28:34|rosetta@home|Unrecoverable error for result DOC_1MLC_R061030_st_model_04_1383_1473_0 (Incorrect function. (0x1) - exit code 1 (0x1))
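The message-log lines quoted in this thread all look like `timestamp|project|message`, so the failed result name and exit code can be pulled out with a short script when tallying errors. A minimal sketch, assuming that layout holds for every line; the pattern is based only on the lines posted here, and `FailedResult` and `parse_line` are invented names:

```python
import re
from typing import NamedTuple, Optional


class FailedResult(NamedTuple):
    timestamp: str
    project: str
    result_name: str
    exit_code: int


# Matches e.g. "Unrecoverable error for result NAME (... exit code 1 (0x1))"
_PATTERN = re.compile(
    r"Unrecoverable error for result (?P<name>\S+) "
    r"\(.*exit code (?P<code>-?\d+) \(0x[0-9a-fA-F]+\)\)"
)


def parse_line(line: str) -> Optional[FailedResult]:
    """Parse one log line; return None if it is not an unrecoverable error."""
    parts = line.strip().split("|", 2)
    if len(parts) != 3:
        return None
    timestamp, project, message = parts
    match = _PATTERN.search(message)
    if match is None:
        return None
    return FailedResult(timestamp, project, match.group("name"), int(match.group("code")))


if __name__ == "__main__":
    sample = (
        "14/11/2006 18:28:34|rosetta@home|Unrecoverable error for result "
        "DOC_1MLC_R061030_st_model_04_1383_1473_0 "
        "(Incorrect function. (0x1) - exit code 1 (0x1))"
    )
    print(parse_line(sample))
```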

ID: 31139 · Rating: 0
Keith Akins
Joined: 22 Oct 05
Posts: 176
Credit: 71,779
RAC: 0
Message 31142 - Posted: 14 Nov 2006, 19:42:55 UTC
Last modified: 14 Nov 2006, 19:49:00 UTC

IGNORE THIS: this is the last WU for 5.36

https://boinc.bakerlab.org/rosetta/result.php?resultid=46960837

DOC_2PTC_R061030_st_model_06_1388_690_0

<core_client_version>5.4.9</core_client_version>
<stderr_txt>
# random seed: 3021911
# cpu_run_time_pref: 28800
WARNING! error deleting file .hf2PTC.out
WARNING! error deleting file .hf2PTC.out.bonds
WARNING! error deleting file .hf2PTC.out.rot_templates
======================================================
DONE :: 1 starting structures built 48 (nstruct) times
This process generated 48 decoys from 48 attempts
0 starting pdbs were skipped
======================================================


BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...

</stderr_txt>
ID: 31142 · Rating: 0
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 31145 - Posted: 14 Nov 2006, 20:20:39 UTC

I had one of these DOC_2SNI WUs error out after 1 hr 10 min.

ID: 31145 · Rating: 0
scsimodo
Joined: 17 Sep 05
Posts: 93
Credit: 946,359
RAC: 0
Message 31146 - Posted: 14 Nov 2006, 21:15:11 UTC
Last modified: 14 Nov 2006, 21:17:36 UTC

Something's seriously broken, don't know if it's the client or the WUs. Many DOC-WUs are dying after approx. 1 hour, and those new fibril-WUs don't even think about surviving the first step! Have a look at my results

I turned on the graphics, and immediately after switching from "searching backbone" to "search all atoms" the graphics changed and the WU died. Turning on the graphics after the switch to "search all atoms" seems to work fine. Will try this again and keep an eye on it.

[EDIT] Nope, not reproducible! Fourth fibril-WU runs fine even with graphics on[/EDIT]

ID: 31146 · Rating: 0
Keith Akins
Joined: 22 Oct 05
Posts: 176
Credit: 71,779
RAC: 0
Message 31147 - Posted: 14 Nov 2006, 21:19:46 UTC
Last modified: 14 Nov 2006, 21:21:43 UTC

Same here except no graphics running:

https://boinc.bakerlab.org/rosetta/result.php?resultid=47002079

DOC_1MEL_R061030_st_model_08_1382_1261_0

<core_client_version>5.4.9</core_client_version>
<message>
Incorrect function. (0x1) - exit code 1 (0x1)
</message>
<stderr_txt>
# random seed: 3136555
# cpu_run_time_pref: 28800
ERROR:: Exit at: .docking.cc line:3479

</stderr_txt>
ID: 31147 · Rating: 0
Chu
Joined: 23 Feb 06
Posts: 120
Credit: 112,439
RAC: 0
Message 31148 - Posted: 14 Nov 2006, 21:28:21 UTC
Last modified: 14 Nov 2006, 21:29:47 UTC

If an exit code -1 (or ERROR:: Exit at: .docking.cc line:3479) occurs, that is the result of the conflict between the existing docking jobs and the new 5.40 application. The failure rate under this condition is not 100%, but it is pretty high. So yesterday, when we found the problem, we had to cancel all the jobs in that batch, since most of them were still in the queue. Sorry again for causing this mess.

Looking into your fibril runs right now. Those WUs should be compatible with the new application 5.40 and we have so far had many WUs returned successfully. Has anybody seen the same type of errors on the fibril WUs also?


ID: 31148 · Rating: 0
Chu
Joined: 23 Feb 06
Posts: 120
Credit: 112,439
RAC: 0
Message 31149 - Posted: 14 Nov 2006, 21:40:20 UTC - in response to Message 31121.  

The mistake we made was not keeping backward compatibility in the new application. So when the application was updated to 5.40, it conflicted with some of the docking jobs that had been submitted to the queue earlier. This is an important lesson for us, and we will coordinate better in the future so that this type of mistake is not repeated.

The incompatibility can be fixed easily with a new command line flag in 5.40, but none of the old jobs in the queue have that flag, since 5.36 does not require it.
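To make the failure mode concrete: a job queued for the old version carries only the old command line, so anything the new version needs has to have a safe default when it is missing. The sketch below shows one generic way to do that with Python's `argparse`. It is not how Rosetta handles its flags; the flag name `--new_docking_protocol` and the function names are hypothetical, since the thread does not name the real flag.

```python
import argparse
from typing import Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="app_sketch")
    # Hypothetical flag: the real flag name is not given in the thread.
    # store_true defaults to False, so command lines written for the
    # older version (which never pass it) keep the legacy behaviour.
    parser.add_argument(
        "--new_docking_protocol",
        action="store_true",
        help="opt in to the newer behaviour; absent means legacy",
    )
    return parser


def select_protocol(argv: Sequence[str]) -> str:
    """Return which code path a given command line would take."""
    args = build_parser().parse_args(argv)
    return "new" if args.new_docking_protocol else "legacy"


if __name__ == "__main__":
    print(select_protocol([]))                          # old job, no flag
    print(select_protocol(["--new_docking_protocol"]))  # new job opts in
```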

I did not know we could run multiple versions at the same time; I will suggest it to our BOINC team manager.

Can you not use both versions at the same time like I have seen some other projects do (Leiden Classical for example). Use 5.36 for Docking and 5.40 for whatever works at the moment until a 5.41 comes out with fixes.


ID: 31149 · Rating: 0
scsimodo
Joined: 17 Sep 05
Posts: 93
Credit: 946,359
RAC: 0
Message 31151 - Posted: 14 Nov 2006, 21:54:11 UTC - in response to Message 31148.  
Last modified: 14 Nov 2006, 22:00:27 UTC


Looking into your fibril runs right now. Those WUs should be compatible with the new application 5.40 and we have so far had many WUs returned successfully. Has anybody seen the same type of errors on the fibril WUs also?

I just managed to deliver 1 successful fibril-WU; all the others (5) died. Bad ratio, if you ask me :) There is still one in my queue, so let's see how this one behaves. I'll let it run without touching the graphics...





ID: 31151 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org