Message boards : Number crunching : Minirosetta v1.40 bug thread
Author | Message |
---|---|
Chu Send message Joined: 23 Feb 06 Posts: 120 Credit: 112,439 RAC: 0 |
Hi everyone, the fix for the NaN hbonding problem will be included in the next update (probably after this weekend). We are still investigating the lockfile problem and the problem that some WUs cannot be suspended. Sorry for the trouble and inconvenience; we will try our best to prevent such problems from happening on such a large scale in future. Please continue to report other errors and problems that are not mentioned above. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
overlapping loop regions error https://boinc.bakerlab.org/rosetta/result.php?resultid=210296907 cc_0_6_nocst_homo_bench_foldcst_chunk_general_t286__olange_IGNORE_THE_REST_1FXWF_7_4848_20_1 died at 13 secs <core_client_version>6.2.19</core_client_version> <![CDATA[ <message> Incorrect function. (0x1) - exit code 1 (0x1) </message> <stderr_txt> recovering checkpoint of tag S_1FXWF_7_00000001 with id abrelax_rg_state Loops::add_loop error -- overlapping loop regions existing loop begin/end: 123/182 new loop begin/end: 182/202 ERROR:: Exit from: ....srcprotocolsloopsLoopClass.cc line: 233 called boinc_finish </stderr_txt> ]]> |
Evan Send message Joined: 23 Dec 05 Posts: 268 Credit: 402,585 RAC: 0 |
message 4366 This message from James describes among other matters some of the problems that are being solved on RALPH Conan - those loop boundary errors were input errors by the person who submitted those workunits. The validate errors are the result of a new format added that's not yet supported by the BOINC server, and we'll have to update our server code to deal with it over the weekend. That slow workunit bug looks like something that we fixed several months ago, we've alerted the person who submitted those jobs and he's looking into them. |
rochester new york Send message Joined: 2 Jul 06 Posts: 2842 Credit: 2,020,043 RAC: 0 |
|
Sid Celery Send message Joined: 11 Feb 08 Posts: 2127 Credit: 41,266,340 RAC: 8,573 |
Hi everyone, Thanks to you and David for the above comments. As I'm out of work (just with results to upload) I'll take the opportunity to delete all the lockfiles again, as previously advised, and reset the project. Seems to me like the perfect opportunity. I suggest others with similar problems do the same. |
Evan Send message Joined: 23 Dec 05 Posts: 268 Credit: 402,585 RAC: 0 |
I'll take the opportunity to delete all the lockfiles again, as previously advised, and reset the project. Seems to me like the perfect opportunity. Make sure you upload your results before you reset, or you will lose everything. |
Alec Rosa Send message Joined: 11 Nov 08 Posts: 18 Credit: 2,635 RAC: 0 |
So, once again, the lock file thingy! https://boinc.bakerlab.org/rosetta/result.php?resultid=211319613 |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2127 Credit: 41,266,340 RAC: 8,573 |
I'll take the opportunity to delete all the lockfiles again, as previously advised, and reset the project. Seems to me like the perfect opportunity. Good point. I realised that just in time. I've set Boinc Manager not to get new WUs just yet and waiting for the upload to go through successfully before I reset. Just noticed only 43k successes in the last 24hours. Some are obviously going through, but I don't know if there's a bottleneck or a problem receiving them on the Rosetta side. Nothing's going through for me yet. |
Mike Tyka Send message Joined: 20 Oct 05 Posts: 96 Credit: 2,190 RAC: 0 |
*All* the lock files? Where do they accumulate? Do they accumulate after every job, or only after failed ones? This might be a lead towards solving this silly lockfile problem! Cheers, Mike http://beautifulproteins.blogspot.com/ http://www.miketyka.com/ |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2127 Credit: 41,266,340 RAC: 8,573 |
As I'm out of work (just with results to upload) I'll take the opportunity to delete all the lockfiles again, as previously advised, and reset the project. Seems to me like the perfect opportunity. In fact I spoke too soon before checking. The last time this was mentioned there were numerous 0-byte boinc_lockfiles in C:\ProgramData\BOINC\slots (and folders 1, 2, 3, 4 etc) - under Vista64 btw. This time the slots folder was empty, so no lockfiles, even though I got many WUs with too many errors after repeated "Can't acquire lockfile" messages. I'd been away from home 11/27 to 11/30. See my results. Also, note this validate error here: Server state Over. I don't think I've noticed this particular one before. |
Alec Rosa Send message Joined: 11 Nov 08 Posts: 18 Credit: 2,635 RAC: 0 |
I'll take the opportunity to delete all the lockfiles again, as previously advised, and reset the project. Seems to me like the perfect opportunity. That was wise. What I've been doing is to set Boinc Manager not to get new tasks too. I then click 'Update', so that the client communicates with the Rosetta server(s). Finally, to avoid doing something wrong, I reboot the computer. That makes the Rosetta slots disappear (and with them the lock file(s)). Like magic. Of course, when I allow a new WU to be downloaded, the process is fraked up. Again. |
robertmiles Send message Joined: 16 Jun 08 Posts: 1233 Credit: 14,284,221 RAC: 1,121 |
Just noticed only 43k successes in the last 24hours. Some are obviously going through, but I don't know if there's a bottleneck or a problem receiving them on the Rosetta side. Nothing's going through for me yet. The uploads server hasn't caught up with uploading all the results from all the workunits that completed during the recent fileserver problem. If you have enough free disk space to hold the results, and have told BOINC it can use enough of it that Rosetta@home's share will hold a few days' worth of results, all you should really have to do is wait for the uploads server to catch up. |
P . P . L . Send message Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Hi. I've got two more of these; they don't want to stop when preempted. 1tig__BOINC_ABRELAX_SPLIT_SPLIT2_NOHATR_IGNORE_THE_REST-S25-9-S3-3--1tig_-_4845_1488_0 1c9oA_BOINC_ABRELAX_SPLIT_SPLIT_IGNORE_THE_REST-S25-9-S3-3--1c9oA-_4678_404_1 pete. |
ramostol Send message Joined: 6 Feb 07 Posts: 64 Credit: 584,052 RAC: 0 |
My (MacBook) "abinitio_nohomfrag_70_A_1unpA_4466"-tasks show a failure rate of three out of four, all failures terminate after some hours' computing with finishing file absent. Cannot link to a result, as I am unable to report to the project at the moment. |
[B^S] HenryHunter Send message Joined: 28 May 08 Posts: 1 Credit: 72,915 RAC: 0 |
Please report any bugs in this version here. 02.12.2008 04:58:10|rosetta@home|Message from server: Server error: can't attach shared memory any solution? CU |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Please report any bugs in this version here. see here if things do not resolve themselves automatically. The team created a new server for task processing as the main server was getting overloaded. The address has changed, but should correct automatically. if not see the link. |
mikylinux Send message Joined: 25 Jul 07 Posts: 3 Credit: 73,155 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=211014218 https://boinc.bakerlab.org/rosetta/result.php?resultid=208363538 https://boinc.bakerlab.org/rosetta/result.php?resultid=208319555 https://boinc.bakerlab.org/rosetta/result.php?resultid=206052369 And workunits: https://boinc.bakerlab.org/rosetta/result.php?resultid=209971190 and https://boinc.bakerlab.org/rosetta/result.php?resultid=210257656 have been running for 37 and 19 hours. I'll wait a bit and then stop the tasks. |
upstatelabs Send message Joined: 22 Jun 06 Posts: 10 Credit: 516,767 RAC: 0 |
I have a pair of errors to report: 12/1/2008 11:07:42 PM|rosetta@home|Task 1vie__BOINC_ABRELAX_SPLIT_SPLIT2_NOHATR_IGNORE_THE_REST-S25-9-S3-3--1vie_-_4845_1494_0 exited with zero status but no 'finished' file 12/1/2008 11:07:42 PM|rosetta@home|If this happens repeatedly you may need to reset the project. 12/1/2008 11:07:43 PM|rosetta@home|Restarting task 1vie__BOINC_ABRELAX_SPLIT_SPLIT2_NOHATR_IGNORE_THE_REST-S25-9-S3-3--1vie_-_4845_1494_0 using minirosetta version 140 12/1/2008 11:08:24 PM|rosetta@home|Task 1vie__BOINC_ABRELAX_SPLIT_SPLIT2_NOHATR_IGNORE_THE_REST-S25-9-S3-3--1vie_-_4845_1494_0 exited with zero status but no 'finished' file 12/1/2008 11:08:24 PM|rosetta@home|If this happens repeatedly you may need to reset the project. 12/1/2008 11:08:24 PM|rosetta@home|Restarting task 1vie__BOINC_ABRELAX_SPLIT_SPLIT2_NOHATR_IGNORE_THE_REST-S25-9-S3-3--1vie_-_4845_1494_0 using minirosetta version 140 12/1/2008 11:09:05 PM|rosetta@home|Task 1vie__BOINC_ABRELAX_SPLIT_SPLIT2_NOHATR_IGNORE_THE_REST-S25-9-S3-3--1vie_-_4845_1494_0 exited with zero status but no 'finished' file Above repeating ~50 times. And this: 12/2/2008 5:19:27 AM|rosetta@home|Task 1vie__BOINC_ABRELAX_SPLIT_SPLIT2_NOHATR_IGNORE_THE_REST-S25-9-S3-3--1vie_-_4845_1471_0 exited with zero status but no 'finished' file 12/2/2008 5:19:27 AM|rosetta@home|If this happens repeatedly you may need to reset the project. 12/2/2008 5:19:27 AM|rosetta@home|Restarting task 1vie__BOINC_ABRELAX_SPLIT_SPLIT2_NOHATR_IGNORE_THE_REST-S25-9-S3-3--1vie_-_4845_1471_0 using minirosetta version 140 12/2/2008 5:20:08 AM|rosetta@home|Task 1vie__BOINC_ABRELAX_SPLIT_SPLIT2_NOHATR_IGNORE_THE_REST-S25-9-S3-3--1vie_-_4845_1471_0 exited with zero status but no 'finished' file 12/2/2008 5:20:08 AM|rosetta@home|If this happens repeatedly you may need to reset the project. 
12/2/2008 5:20:08 AM|rosetta@home|Restarting task 1vie__BOINC_ABRELAX_SPLIT_SPLIT2_NOHATR_IGNORE_THE_REST-S25-9-S3-3--1vie_-_4845_1471_0 using minirosetta version 140 Again repeating many times. Could someone look into this? Thanks! |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
I have a pair of errors to report: Could you post the links, either in plain text or as a link, so people can look directly at the files you're talking about? Because you have two systems on Rosetta, it would take quite a long time to isolate the tasks you are talking about. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2127 Credit: 41,266,340 RAC: 8,573 |
As I'm out of work (just with results to upload) I'll take the opportunity to delete all the lockfiles again, as previously advised, and reset the project. Seems to me like the perfect opportunity. Final note, because I'm now officially depressed: After uploading all previous results, changing server urls, resetting the project, dl'ing new WUs, my first 4 MiniRosetta WUs all crashed out in the usual way between 10 and 100 minutes. Can't acquire lockfile. I now have 7 folders inside the slots folder (named 0, 1, 2, 3, 4, 5 & 6) four of which contain a 0-byte boinc_lockfile, while only 2 mini-rosetta WUs are currently running. I guess I should've let those WUs abort with the usual Computation Error so they could report properly, but I was that p'd off I aborted them to let some infallible Rosetta 5.98 WUs run. |
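
Several posts above describe 0-byte boinc_lockfile files left behind in the numbered folders under the BOINC slots folder. For anyone who wants to check this without clicking through each folder, here is a minimal sketch in Python. It only lists matching files and deletes nothing. The function name `find_empty_lockfiles` is hypothetical, and the slots path is the Vista location mentioned in the thread, so adjust it for your own install.

```python
from pathlib import Path


def find_empty_lockfiles(slots_dir):
    """Return paths of zero-byte boinc_lockfile files one level below slots_dir."""
    root = Path(slots_dir)
    if not root.is_dir():
        # Slots folder missing (or wrong path): nothing to report.
        return []
    hits = []
    for slot in sorted(root.iterdir()):
        lock = slot / "boinc_lockfile"
        # Only count real slot folders that hold an empty lockfile.
        if slot.is_dir() and lock.is_file() and lock.stat().st_size == 0:
            hits.append(lock)
    return hits


if __name__ == "__main__":
    # Assumption: the slots location reported in the thread for Vista64.
    for path in find_empty_lockfiles(r"C:\ProgramData\BOINC\slots"):
        print(path)
```

Comparing the list against the number of tasks actually running (for example, four lockfiles with only two tasks active) would show whether stale lockfiles are being left behind.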
©2024 University of Washington
https://www.bakerlab.org