Message boards : Number crunching : Problems with Rosetta version 5.40
Author | Message |
---|---|
sslickerson Send message Joined: 14 Oct 05 Posts: 101 Credit: 578,497 RAC: 0 |
|
Stuart Send message Joined: 17 Oct 06 Posts: 1 Credit: 76,349 RAC: 0 |
I am getting errors too, when I have never had any before :( Anyone know if it's the WU or what? |
MattDavis Send message Joined: 22 Sep 05 Posts: 206 Credit: 1,377,748 RAC: 0 |
So... many... ERRORS!!!! |
Chu Send message Joined: 23 Feb 06 Posts: 120 Credit: 112,439 RAC: 0 |
We found the problem. Unfortunately, the newly updated application 5.40 has backward compatibility issues with some of the existing docking jobs in the queue, which work just fine with 5.36. After finding the conflict, we removed most of the conflicting jobs from the queue to minimize the damage, and we hope that not too many such jobs were sent out together with the new 5.40 application. Please accept my apology for not being careful enough to check for this issue. It has taught us a very important lesson on how to sync Rosetta with Ralph so that this kind of problem does not happen again. |
MattDavis Send message Joined: 22 Sep 05 Posts: 206 Credit: 1,377,748 RAC: 0 |
We forgive you <3 |
Team TMR Send message Joined: 2 Nov 05 Posts: 21 Credit: 1,583,679 RAC: 0 |
I woke up this morning to find that over 20 WUs failed overnight. It's good to see the cause has already been found though. |
FluffyChicken Send message Joined: 1 Nov 05 Posts: 1260 Credit: 369,635 RAC: 0 |
Can you not use both versions at the same time, like I have seen some other projects do (Leiden Classical, for example)? Use 5.36 for Docking and 5.40 for whatever works at the moment, until a 5.41 comes out with fixes. Team mauisun.org |
Evil-Dragon Send message Joined: 4 Mar 06 Posts: 1 Credit: 67,507 RAC: 0 |
|
Oldman Send message Joined: 17 Oct 06 Posts: 4 Credit: 1,706,631 RAC: 0 |
11/14/2006 12:26:21 AM|rosetta@home|Unrecoverable error for result DOC_1MLC_R061030_st_model_08_1383_1166_1 (Incorrect function. (0x1) - exit code 1 (0x1)) This was the error code I got last night and the WU was about 1/3 complete. |
alexpoon Send message Joined: 28 Dec 05 Posts: 6 Credit: 1,846 RAC: 0 |
11/14/2006 11:47:48|rosetta@home|Unrecoverable error for result DOC_R061113_2SIC_p2_fa_relax_from_native_unbound_1392_260_0 ( - exit code -1073741819 (0xc0000005)) |
Rayburner Send message Joined: 4 Oct 05 Posts: 32 Credit: 16,518,823 RAC: 0 |
Two validation errors: https://boinc.bakerlab.org/rosetta/result.php?resultid=46997671 https://boinc.bakerlab.org/rosetta/result.php?resultid=46989353 These WUs were canceled by Rosetta Admins. Is this because of the issue described below? However, they were completed successfully on my machine (see links above). Best Regards Rayburner |
Feet1st Send message Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
Ray, I believe they cancelled the WUs because they had a higher error rate, not a 100% error rate. So some successful results would be expected. But they cancelled the WUs to avoid further user problems, until they can address them in a new version, and testing on Ralph. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
rob147147 Send message Joined: 5 Jan 06 Posts: 4 Credit: 115,444 RAC: 0 |
Same as the people below, it seems: 14/11/2006 18:28:34|rosetta@home|Unrecoverable error for result DOC_1MLC_R061030_st_model_04_1383_1473_0 (Incorrect function. (0x1) - exit code 1 (0x1)) |
Keith Akins Send message Joined: 22 Oct 05 Posts: 176 Credit: 71,779 RAC: 0 |
IGNORE THIS: this is the last WU for 5.36: https://boinc.bakerlab.org/rosetta/result.php?resultid=46960837 DOC_2PTC_R061030_st_model_06_1388_690_0 <core_client_version>5.4.9</core_client_version> <stderr_txt> # random seed: 3021911 # cpu_run_time_pref: 28800 WARNING! error deleting file .hf2PTC.out WARNING! error deleting file .hf2PTC.out.bonds WARNING! error deleting file .hf2PTC.out.rot_templates ====================================================== DONE :: 1 starting structures built 48 (nstruct) times This process generated 48 decoys from 48 attempts 0 starting pdbs were skipped ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down... </stderr_txt> |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
I had one of these DOC_2SNI WUs error out after 1 hr 10 min. |
scsimodo Send message Joined: 17 Sep 05 Posts: 93 Credit: 946,359 RAC: 0 |
Something's seriously broken, don't know if it's the client or the WUs. Many DOC-WUs are dying after approx. 1 hour, and those new fibril-WUs don't even think about surviving the first step! Have a look at my results. I turned on the graphics, and immediately after switching from "searching backbone" to "search all atoms" the graphics changed and the WU died. Turning on the graphics after the switch to "search all atoms" seems to work fine. Will try this again and keep an eye on it. [EDIT] Nope, not reproducible! Fourth fibril-WU runs fine even with graphics on[/EDIT] |
Keith Akins Send message Joined: 22 Oct 05 Posts: 176 Credit: 71,779 RAC: 0 |
Same here except no graphics running: https://boinc.bakerlab.org/rosetta/result.php?resultid=47002079 DOC_1MEL_R061030_st_model_08_1382_1261_0 <core_client_version>5.4.9</core_client_version> <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> # random seed: 3136555 # cpu_run_time_pref: 28800 ERROR:: Exit at: .docking.cc line:3479 </stderr_txt> |
Chu Send message Joined: 23 Feb 06 Posts: 120 Credit: 112,439 RAC: 0 |
If an exit code -1 (or ERROR:: Exit at: .docking.cc line:3479) occurs, that is the result of the conflict between the existing docking jobs and the new 5.40 application. The failure rate under this condition is not 100%, but it is pretty high. So yesterday, when we found the problem, we had to cancel all the jobs in that batch, since most of them were still in the queue. Sorry again for causing this mess. Looking into your fibril runs right now. Those WUs should be compatible with the new application 5.40, and so far we have had many WUs returned successfully. Has anybody else seen the same type of errors on the fibril WUs? |
Chu Send message Joined: 23 Feb 06 Posts: 120 Credit: 112,439 RAC: 0 |
The mistake we made was not keeping backward compatibility in the new application. So when the application was updated to 5.40, it conflicted with some of the docking jobs that had been submitted to the queue earlier. This is an important lesson for us, and we will coordinate better in the future so that this type of mistake is not repeated. The incompatibility can be fixed easily with a new command line flag in 5.40, but none of the old jobs in the queue have such a flag, because 5.36 does not require it. I do not know whether we can run multiple versions at the same time, but I will suggest it to our BOINC team manager. Quoted from FluffyChicken: "Can you not use both versions at the same time like I have seen some other projects do (Leiden Classical for example). Use 5.36 for Docking and 5.40 for whatever works at the moment until a 5.41 comes out with fixes." |
scsimodo Send message Joined: 17 Sep 05 Posts: 93 Credit: 946,359 RAC: 0 |
Just managed to deliver 1 successful fibril-WU; all the others (5) died. Bad ratio, if you ask me :) Still one in my queue, let's see how this one behaves. I'll let this one run without touching the graphics... |
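
For anyone who wants to tally the failures reported in this thread, here is a minimal Python sketch that parses the three message-log lines quoted in the posts above and groups them by exit code. The line layout (`timestamp|project|message`) and the regular expression are inferred only from those quoted lines, not from BOINC documentation, and the helper names `parse_error` and `summarize` are made up for illustration. It also shows that the negative exit code -1073741819 is the same value as 0xc0000005 once it is read as an unsigned 32-bit number; on Windows that value is the access-violation status code.

```python
import re
from collections import Counter

# Message-log lines copied verbatim from the posts in this thread.
LOG_LINES = [
    "11/14/2006 12:26:21 AM|rosetta@home|Unrecoverable error for result DOC_1MLC_R061030_st_model_08_1383_1166_1 (Incorrect function. (0x1) - exit code 1 (0x1))",
    "11/14/2006 11:47:48|rosetta@home|Unrecoverable error for result DOC_R061113_2SIC_p2_fa_relax_from_native_unbound_1392_260_0 ( - exit code -1073741819 (0xc0000005))",
    "14/11/2006 18:28:34|rosetta@home|Unrecoverable error for result DOC_1MLC_R061030_st_model_04_1383_1473_0 (Incorrect function. (0x1) - exit code 1 (0x1))",
]

# Assumed pattern, based only on the three lines above.
ERROR_RE = re.compile(
    r"Unrecoverable error for result (?P<result>\S+) "
    r"\(.*exit code (?P<code>-?\d+) \(0x[0-9a-fA-F]+\)\)"
)


def parse_error(line):
    """Return a dict describing one error line, or None if it does not match."""
    parts = line.split("|", 2)  # timestamp | project | message
    if len(parts) != 3:
        return None
    match = ERROR_RE.search(parts[2])
    if match is None:
        return None
    code = int(match.group("code"))
    return {
        "project": parts[1],
        "result": match.group("result"),
        "exit_code": code,
        # Reinterpret the signed exit code as an unsigned 32-bit value.
        "exit_hex": format(code & 0xFFFFFFFF, "#x"),
    }


def summarize(lines):
    """Count parsed error lines per unsigned hex exit code."""
    parsed = [p for p in map(parse_error, lines) if p is not None]
    return Counter(p["exit_hex"] for p in parsed)


if __name__ == "__main__":
    for hex_code, count in summarize(LOG_LINES).items():
        print(hex_code, count)
```

Pasting further lines from your own message log into `LOG_LINES` should work as long as they follow the same layout; lines that do not match are skipped rather than raising an error.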
©2024 University of Washington
https://www.bakerlab.org