Why does this still happen.

Author	Message
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0	Message 56450 - Posted: 23 Oct 2008, 21:58:52 UTC
I guess I'm not the only one this happens to. Why can't the project cancel tasks that haven't been started? Other projects do this, and it saves wasting time.
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=181582962
=====================================================
DONE :: 1 starting structures 21135.6 cpu seconds
This process generated 42 decoys from 42 attempts
======================================================
BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down...
called boinc_finish
</stderr_txt>
]]>
Validate state: Workunit error - check skipped
Claimed credit: 148.221875622837
Granted credit: 0
application version: 1.34
pete

Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0	Message 56454 - Posted: 24 Oct 2008, 3:16:39 UTC
Two words, Peter: BOINC bug.
http://boinc.berkeley.edu/trac/ticket/276
Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/

P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0	Message 56469 - Posted: 25 Oct 2008, 5:21:12 UTC
My thanks to whoever fixed this one up.
pete.

FoldingSolutions Joined: 2 Apr 06 Posts: 129 Credit: 3,506,690 RAC: 0	Message 56515 - Posted: 29 Oct 2008, 19:53:52 UTC
Task ID - 202592794
Work unit ID - 185058479
Sent - 27 Oct 2008 20:17:33 UTC
Time reported or deadline - 29 Oct 2008 19:33:36 UTC
Server state - Over
Outcome - Client error
Client state - Compute error
CPU time (sec) - 70,590.59
Claimed credit - 329.12
Granted credit - ---
Shouldn't there be some kind of credit compensation for 20 hours of wasted CPU time??

Greg_BE Joined: 30 May 06 Posts: 5770 Credit: 6,139,760 RAC: 0	Message 56674 - Posted: 3 Nov 2008, 18:20:24 UTC - in response to Message 56515.
You should post this info in the 1.34 thread in case the team didn't see it. Be sure to tell them you had an exit code 255, as that will help them narrow down the issue.

P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0	Message 60301 - Posted: 24 Mar 2009, 20:52:02 UTC
Hi. Looks like this is not fixed yet; I wasted 6 hrs on it. Why are tasks getting sent out when others are still not past their deadlines? I could have been doing something else.
Workunit error - check skipped
Over_Success_Done_21,377.59_154.48_0.00
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=214522002
pete.

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 60302 - Posted: 24 Mar 2009, 20:55:26 UTC Last modified: 24 Mar 2009, 20:56:35 UTC
The timestamps are a bit misleading. The deadline is always 10 days. If you look at it again, the 10-day deadline was indeed crossed, and this caused the task to be reissued. Then, after that, a result came in.
Rosetta Moderator: Mod.Sense
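
The deadline check described in the post above can be done by hand: add 10 days to the time a task was sent and compare that with the time the result was reported. Here is a minimal Python sketch, assuming the fixed 10-day deadline stated in the post. The function names and the sample timestamps are hypothetical and are not taken from the workunit page.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed 10-day deadline, as stated in the post above.
DEADLINE = timedelta(days=10)


def deadline_for(sent: datetime) -> datetime:
    """Return the moment at which a task sent at `sent` becomes overdue."""
    return sent + DEADLINE


def was_late(sent: datetime, reported: datetime) -> bool:
    """True if the result was reported after the deadline had passed."""
    return reported > deadline_for(sent)


# Hypothetical timestamps, for illustration only.
sent = datetime(2009, 3, 10, 12, 0, tzinfo=timezone.utc)
on_time = datetime(2009, 3, 19, 12, 0, tzinfo=timezone.utc)
late = datetime(2009, 3, 21, 8, 30, tzinfo=timezone.utc)

print(deadline_for(sent))
print(was_late(sent, on_time))
print(was_late(sent, late))
```

A result that comes back after `deadline_for(sent)` is the case where, per the post, the task has already been reissued to someone else.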

P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0	Message 60306 - Posted: 25 Mar 2009, 4:25:47 UTC
Hi there Mod Sense. Well, that doesn't make me feel all warm & fuzzy. If they can't return the work on time, then I see that as a waste of time. I'm just going to have to abort all the tasks that were sent out only because the earlier copies are overdue; I don't like wasting the time. More work for me, but so be it. I'm guessing my result for that one won't be used at all, and it might be a better answer!
pete.

mikey Joined: 5 Jan 06 Posts: 1899 Credit: 12,884,078 RAC: 216	Message 60309 - Posted: 25 Mar 2009, 9:16:45 UTC - in response to Message 60306.
It just means you need to lower the cache for this Project. If you have special reasons why you can't do that, then, as you suggested, this may not be the Project for you. A 10-day cache is pretty long and, unless you have a very slow PC, would result in a ton of workunits. My computer is taking roughly 2 to 2 1/2 hours per workunit. That is, say, 9 units per day times 10 days, which is 90 workunits, just for this Project alone!
I just looked at your 2 PCs and both seem to have a very short cache already. One PC has one workunit and the other has 2 workunits that haven't been returned yet. I wonder if BOINC is having problems? It should be able to tell that a unit is near its deadline and switch to high-priority crunching for that unit, so that it gets returned on time. Do you crunch 24/7?
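
The estimate of roughly 90 workunits in the post above can be reproduced with a quick calculation. The sketch below assumes a single core crunching 24/7 and a constant runtime per workunit; the function name is made up for illustration. With 2.5 hours per workunit it gives 96, and with 2 hours it gives 120, which is the same ballpark as the rounded figure of 9 per day used in the post.

```python
import math


def workunits_to_fill_cache(cache_days: float, hours_per_workunit: float) -> int:
    """Estimate how many workunits fit in a cache of `cache_days` days.

    Assumptions (not from BOINC itself): one core, crunching 24 hours
    a day, and every workunit taking the same number of hours.
    """
    total_hours = cache_days * 24
    return math.floor(total_hours / hours_per_workunit)


# Figures from the post: a 10-day cache, 2 to 2 1/2 hours per workunit.
print(workunits_to_fill_cache(10, 2.5))
print(workunits_to_fill_cache(10, 2.0))
```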

Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0	Message 60316 - Posted: 25 Mar 2009, 13:30:09 UTC
mikey, it wasn't peter that was late with the results, so a smaller cache doesn't help with what he's talking about. You could also increase your Rosetta preference for runtime and have a cleaner task list if you are crunching all the time anyway. The default runtime is 3 hours, but you can set it as high as 24 hrs. If you make changes to the target runtime, make them gradually. BOINC will still request enough work units for the time based on the old preference before it sees that they have begun running longer. So it is best to make changes when you are requesting only a small cache, and to make changes of just a notch or two per day.
peter, I hear ya. I would just point out that not every late task results in a credit problem. It only seems to happen if one fails, a second is late, a third is issued, and then the second is reported back. So what I'm saying is, don't just go by the last digit on the WU name to judge. Also keep in mind that chances are the late result will not come back in time to conflict with you. Although with a larger cache, you would have time to go look and see if it came in.
Perhaps you could add to the trac item, and post about this issue on other project boards as well. It's gone unfixed for a long time.
Rosetta Moderator: Mod.Sense
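
The sequence described in the post above (one copy fails, a second is late, a third is issued, then the second is reported back) can be written down as an ordered pattern and checked against a workunit's history. This is only a sketch: the event names and the tuple layout are hypothetical and do not reflect BOINC's actual data model.

```python
# Hypothetical event format: (copy_number, what_happened), in time order.
PROBLEM_PATTERN = [
    (1, "failed"),
    (2, "deadline_missed"),
    (3, "issued"),
    (2, "reported"),
]


def matches_problem_pattern(events):
    """True if the events contain PROBLEM_PATTERN as an ordered subsequence.

    `step in remaining` advances the iterator until it finds `step`,
    so each later step is only searched for after the earlier ones.
    """
    remaining = iter(events)
    return all(step in remaining for step in PROBLEM_PATTERN)


history = [
    (1, "issued"), (1, "failed"),
    (2, "issued"), (2, "deadline_missed"),
    (3, "issued"),
    (2, "reported"),
]
print(matches_problem_pattern(history))
```

A history in which the second copy is reported before the third is issued does not match, which fits the point that not every late task leads to a credit problem.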

mikey Joined: 5 Jan 06 Posts: 1899 Credit: 12,884,078 RAC: 216	Message 60360 - Posted: 28 Mar 2009, 13:13:24 UTC - in response to Message 60316.
Quote (Message 60316): "mikey, it wasn't peter that was late with the results, so a smaller cache doesn't help what he's talking about."
Whoops, sorry.
Quote (Message 60316): "peter, I hear ya. I would just point out that it is not every time a task is late that results in a credit problem. It only seems to be if one fails, a second is late and then a third is issued and then the second is reported back. So what I'm saying is, don't just go by the last digit on the WU name to judge. Also keep in mind that chances are that the late result will not come back in time to conflict with you. Although with a larger cache, you would have time to go look and see if it came in. Perhaps you could add to the trac item, and post about this issue on other project boards as well. It's gone unfixed for a long time."
This is a long-time BOINC thing, if I understand it this time: if person A gets a unit but doesn't return it before the deadline, the project reissues the unit, sending it to person B. But then if person A returns the unit before person B, person A does get credit and person B gets the "too many results" error message. Dr. A, and others, knew about this long, long ago and decided it was not a big deal since it only happened rarely. The way I see to solve the problem is to not allow person A to return the unit once it has been reissued, giving them an error message if they do try to return it. If person B returns the unit before person A, then person A does get an error message. That is why at SETI they toyed with the idea of only sending reissued units to computers that could return them within 24 hours or less. This would also clear the database of 'hanging' units quicker.
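
The fix suggested in the post above (refuse a late result once its task has been reissued) can be modelled in a few lines. This is a toy model of the suggestion only, not BOINC's scheduler or validator code; the class, method names, and return strings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Workunit:
    """Toy workunit that tracks which hosts had their copy reissued."""
    reissued_hosts: set = field(default_factory=set)
    accepted: list = field(default_factory=list)

    def mark_reissued(self, late_host: str) -> None:
        """Record that `late_host` missed the deadline and was replaced."""
        self.reissued_hosts.add(late_host)

    def report(self, host: str) -> str:
        """Accept a result unless the host's copy was already reissued."""
        if host in self.reissued_hosts:
            return "rejected: task was reissued"
        self.accepted.append(host)
        return "accepted"


wu = Workunit()
wu.mark_reissued("person A")   # A missed the deadline; a copy goes to B
print(wu.report("person A"))   # A's late result is turned away
print(wu.report("person B"))   # B's result is the one that counts
```

Under this policy the order in which A and B report no longer matters: B's result is always accepted, and A always receives the error.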