Message boards : Number crunching : Compute and Client Error on a whole lot of work units with exit status -185 (0xffffff47)
Author | Message |
---|---|
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Server state Over Outcome Client error Client state Compute error Exit status -185 (0xffffff47) <core_client_version>5.10.28</core_client_version> <![CDATA[ <message> Can't link input file </message> ]]> This happened with tasks from 5.81, 5.82 and 5.85. I lost about 19 tasks to this error, and they all contain the same message in the log. This is just one sample: 11/23/2007 4:35:55 PM|rosetta@home|Computation for task w007_1_MolecularRep_1_w007_1_ffas03-1-2b0v_StructuralGenomics_a_2325_7744_0 finished 11/23/2007 4:35:55 PM|rosetta@home|Output file w007_1_MolecularRep_1_w007_1_ffas03-1-2b0v_StructuralGenomics_a_2325_7744_0_0 for task w007_1_MolecularRep_1_w007_1_ffas03-1-2b0v_StructuralGenomics_a_2325_7744_0 absent 11/23/2007 4:39:55 PM|rosetta@home|Starting 1uis__BOINC_SYMM_FOLD_AND_DOCK_RELAX-1uis_-crystal_foldanddock__2318_61717_0 11/23/2007 4:43:56 PM|rosetta@home|[error] Can't link projects/boinc.bakerlab.org_rosetta/rosetta_beta_5.85_windows_intelx86.exe to slots/1/rosetta_beta_5.85_windows_intelx86.exe Later one task shows this: 11/23/2007 4:43:56 PM|rosetta@home|Computation for task 1uis__BOINC_SYMM_FOLD_AND_DOCK_RELAX-1uis_-crystal_foldanddock__2318_61717_0 finished 11/23/2007 4:43:56 PM|rosetta@home|Output file 1uis__BOINC_SYMM_FOLD_AND_DOCK_RELAX-1uis_-crystal_foldanddock__2318_61717_0_0 for task 1uis__BOINC_SYMM_FOLD_AND_DOCK_RELAX-1uis_-crystal_foldanddock__2318_61717_0 absent 11/23/2007 4:43:58 PM|rosetta@home|Finished upload of 4ubpA_FRAGPRED_ABRELAX_SAVE_ALL_OUT-4ubpA-__2309_17581_0_0 Later it shows this, after the firewall found something it did not like and I had to allow BOINC through the firewall. This seems odd, though, as it was OK when I restarted BOINC after the install and put the firewall on auto-learn:
11/23/2007 4:43:56 PM|rosetta@home|Computation for task 1uis__BOINC_SYMM_FOLD_AND_DOCK_RELAX-1uis_-crystal_foldanddock__2318_61717_0 finished 11/23/2007 4:43:56 PM|rosetta@home|Output file 1uis__BOINC_SYMM_FOLD_AND_DOCK_RELAX-1uis_-crystal_foldanddock__2318_61717_0_0 for task 1uis__BOINC_SYMM_FOLD_AND_DOCK_RELAX-1uis_-crystal_foldanddock__2318_61717_0 absent 11/23/2007 4:43:58 PM|rosetta@home|Finished upload of 4ubpA_FRAGPRED_ABRELAX_SAVE_ALL_OUT-4ubpA-__2309_17581_0_0 11/24/2007 3:26:21 PM|rosetta@home|Fetching scheduler list 11/24/2007 3:26:26 PM|rosetta@home|Master file download succeeded 11/24/2007 3:26:32 PM|rosetta@home|Sending scheduler request: Requested by user. Requesting 669600 seconds of work, reporting 45 completed tasks 11/24/2007 3:26:52 PM|rosetta@home|Scheduler request succeeded: got 36 new tasks Now it is running OK and shows this message: 11/24/2007 4:59:59 PM|rosetta@home|Sending scheduler request: To fetch work. Requesting 19 seconds of work, reporting 0 completed tasks 11/24/2007 5:00:04 PM|rosetta@home|Scheduler request succeeded: got 1 new tasks 11/24/2007 6:29:09 PM|rosetta@home|Sending scheduler request: To fetch work. Requesting 138 seconds of work, reporting 0 completed tasks 11/24/2007 6:29:14 PM|rosetta@home|Scheduler request succeeded: got 1 new tasks The firewall now knows all the functions of BOINC. Is the error related to the firewall, or is it something to do with BOINC and all the error messages I saw earlier in other threads about too much information or wrong filenames on the server? I took a hit of over 20 credits due to these errors. That makes me very mad! It will take a long time to build those back and get back to climbing to where I was before I had some other problems. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Greg, it sounds as though your firewall saw Rosetta itself trying to use the internet (rather than the usual BOINC client). This occurs when a failure is captured and Rosetta tries to report it back. The report needs the program symbol tables, etc., so it can't be sent via BOINC the way all normal communication is done. Rosetta Moderator: Mod.Sense |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
OK, that fits, because the firewall also 'learned' Rosetta 5.85, but not the 5.82 stuff I have queued. So if I am reading this right, if rosie gets an error, then that version tried to access the internet, and not the BOINC manager? So if 5.82 gets any errors, the firewall will block it because it has not learned that version yet? But what I don't get is this: did I lose credit the normal way because all 19 or so work units errored out, or did I lose credit because rosie was not able to connect to the firewall? Some of these were the crystal fold tasks in 5.81, and a whole lot of others were in 5.82 and 5.85. Take for instance Task ID 122381870 Name 1i8f__BOINC_SYMM_FOLD_AND_DOCK_RELAX-1i8f_-crystal_foldanddock__2318_54614_0 Workunit 111250295: it got the client and compute errors and all the rest of the stuff I posted from the stderr text, in addition to the error message from the BOINC manager. Also: Task ID 122177508 Name 2a43__BOINC_RHO_OMEGA1_OMEGA2_HALFBACKBONEHB_RNA_ABINITIO-2a43_-_2322_27_0 Workunit 111064109, same problem. But to lose nearly half of the work units I crunched really irritates me. So someone should review my load of failed tasks, some of which I got second-hand due to the same error. The majority were 5.81 Crystal Fold with a mix of others to go with it. Final question: if the firewall learns all the versions of rosie and it already knows the BOINC manager, then what will happen if there is another error with a task? Am I going to get the same client and compute errors, or will it figure out what it needs to do to correct the issue, complete its computing, and report as a success? "Greg, it sounds as though your firewall saw Rosetta trying to use the internet (not the usual BOINC). This occurs when failures are captured and it is trying to report them back. It needs the program symbol tables and etc. and so it can't be done via BOINC the way all normal communication is done." |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
"OK, that fits, because the firewall also 'learned' Rosetta 5.85, but not the 5.82 stuff I have queued. So if I am reading this right, if rosie gets an error, then that version tried to access the internet, and not the BOINC manager? So if 5.82 gets any errors, the firewall will block it because it has not learned that version yet?" Exactly. "But what I don't get is this: did I lose credit the normal way because all 19 or so work units errored out, or did I lose credit because rosie was not able to connect to the firewall?" The tasks errored out, which is what caused Rosetta's request to access the internet through your firewall. Any credit lost, issued, granted by the nightly script or whatever is due to the task error, not the inability to send in all the diagnostics. Rosetta Moderator: Mod.Sense |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"OK, that fits, because the firewall also 'learned' Rosetta 5.85, but not the 5.82 stuff I have queued. So if I am reading this right, if rosie gets an error, then that version tried to access the internet, and not the BOINC manager? So if 5.82 gets any errors, the firewall will block it because it has not learned that version yet?" Thanks for clearing that up. As for all the errors, is it worth posting them or just leaving them be? It's around 19 or so different tasks across 3 different versions, mostly that troublesome crystal_fold task. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
There have been a few different issues around lately: some .out files getting overly full, some going missing, and some tasks consuming large amounts of swap space and/or memory during the run. If your issues are in line with those, I would say they have already been posted. Otherwise, the Problems with... thread for the release of the task would be the place to post. Rosetta Moderator: Mod.Sense |
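
For readers puzzling over the exit status in the thread title: the two values are the same number, with 0xffffff47 being -185 written as an unsigned 32-bit (two's complement) value. Below is a minimal Python sketch of the conversion, assuming a 32-bit exit status. The helper names `to_hex32` and `from_hex32` are made up for illustration and are not part of BOINC or Rosetta.

```python
# Illustrative helpers (hypothetical names, not BOINC code).
# Assumption: the exit status is a signed 32-bit integer.

def to_hex32(status: int) -> str:
    """Format a signed exit status as an unsigned 32-bit hex string."""
    # Masking with 0xFFFFFFFF maps a negative value onto its
    # two's complement bit pattern.
    return f"0x{status & 0xFFFFFFFF:08x}"


def from_hex32(text: str) -> int:
    """Turn a 32-bit hex string back into a signed integer."""
    value = int(text, 16)
    # If the top bit is set, the pattern encodes a negative number.
    if value & 0x80000000:
        value -= 0x100000000
    return value


if __name__ == "__main__":
    print(to_hex32(-185))            # 0xffffff47
    print(from_hex32("0xffffff47"))  # -185
```

This only explains why both forms appear side by side in the task report; it says nothing about what caused the error.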
©2024 University of Washington
https://www.bakerlab.org