Message boards : Number crunching : Workunit error - check skipped?
Author | Message |
---|---|
Tom Philippart Joined: 29 May 06 Posts: 183 Credit: 834,667 RAC: 0 |
Could anyone please explain to me what happened here? https://boinc.bakerlab.org/rosetta/workunit.php?wuid=50835993 This is my result: https://boinc.bakerlab.org/rosetta/result.php?resultid=59077178 Thanks |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0 |
I haven't seen that one before. I'll look into it. |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0 |
Tom, We purged the logs last night so I couldn't track down the error. It may have been a corrupted result file. I granted the claimed credit for you. |
Tom Philippart Joined: 29 May 06 Posts: 183 Credit: 834,667 RAC: 0 |
thanks! |
Monkey Joined: 14 Nov 06 Posts: 1 Credit: 1,001,689 RAC: 0 |
I seem to have the same problem. It seems that I was the only one that did the workunit. Workunit: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=50393160 Result: https://boinc.bakerlab.org/rosetta/result.php?resultid=60300613 |
Marky-UK Joined: 1 Nov 05 Posts: 73 Credit: 1,689,495 RAC: 0 |
Quote: "I seem to have the same problem. It seems that I was the only one that did the workunit." The "max # of total results" is set to 2, which was probably exceeded because of the first two results that were never returned. I guess "No reply" results don't count as errors; if they did, that would have stopped the 3rd result being sent out. If that's the case, I'd think that a "max # of total results" of 2 is too low. The WU in Tom Philippart's post (WU 50835993) went funny because the second result was not returned by its deadline, so a 3rd copy was sent out. Before the 3rd was returned, the second result came back late and passed validation, so the "max # of success results" was hit and the 3rd was rejected. IMHO, if results returned after their deadline are accepted, the "max # of success results" must be higher than 1. And "max # of total results" probably needs to be higher than 2. |
Tom Philippart Joined: 29 May 06 Posts: 183 Credit: 834,667 RAC: 0 |
not the same problem, but look at this: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=51477081 again the same computer, I hope it isn't on my side though :( |
Marky-UK Joined: 1 Nov 05 Posts: 73 Credit: 1,689,495 RAC: 0 |
Quote: "not the same problem, but look at this:" I've got loads of WUs I've returned today that are all stuck in Pending state. The credit for my team hasn't increased at all since before 0800 UTC. |
alpha Joined: 4 Nov 06 Posts: 27 Credit: 1,550,107 RAC: 0 |
I've got a validate error on this result, which is this work unit. It seems to have appeared out of the blue on a very stable machine. The only thing I noticed is that the other two computers that were given this WU generated a "client error" and "unknown" outcome. |
alpha Joined: 4 Nov 06 Posts: 27 Credit: 1,550,107 RAC: 0 |
Bump. Can someone advise why I wasn't credited for the above work unit? |
Marky-UK Joined: 1 Nov 05 Posts: 73 Credit: 1,689,495 RAC: 0 |
The first result errored, which should have killed the entire WU, but two more results were sent out after that. When you returned your result (assuming it was valid), you didn't get any credit because the max # of error results had been hit. The time period is close to when the validator server failed, so maybe that's why the extra two results were sent out. I still think the settings for max # of error/total/success results are too low on all WUs. Is a project admin going to respond to my points in this post? |
anders n Joined: 19 Sep 05 Posts: 403 Credit: 537,991 RAC: 0 |
Quote: "Bump." If you look at the bottom of the result page you see this text: Validate state: Invalid; Claimed credit: 20.7750264779357; Granted credit: 20; application version: 5.45. As you see, you got 20 credits for the WU :) Happy crunching, Anders n |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
The 20 credits is the maximum the daily credit granting script allows. The fact that you see the credits on the result page, but not in the WU display, further confirms that these credits were granted by the daily script. About the time of your reported problem, the project's validator went down, so I suspect that is why your result failed to validate properly. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0 |
Quote: "The first result errored, which should have killed the entire WU, but two more results were sent out after that. When you returned your result (assuming it was valid), you didn't get any credit because the max # of error results had been hit." The result in question was invalid because it may have been corrupted for some reason and/or the validator was not able to read the result file. We set the max #s low because we like to keep the lifespan of work units to a minimum without having to decrease the delay bound (since users have requested a longer delay bound). It does seem odd to us that the scheduler may send more results than the max # of total results, though. It may help to start using the reliable_time scheduler option, which attempts to send old results to reliable hosts, after we update the server this week. Maybe with this option we could increase the max #s. |
Marky-UK Joined: 1 Nov 05 Posts: 73 Credit: 1,689,495 RAC: 0 |
Quote: "We set the max #s low because we like to keep the lifespan of work units to a minimum without having to decrease the delay bound (since users have requested a longer delay bound). It does seem odd to us that the scheduler may send more results than the max # of total results, though. It may help to start using the reliable_time scheduler option, which attempts to send old results to reliable hosts, after we update the server this week. Maybe with this option we could increase the max #s." I can understand wanting to avoid WUs erroring out many times, but what about results that get returned just 1 hour after the deadline? With the current settings, a second result will already have been sent out, but it won't get any credit because the late result gets it. |
David E K Volunteer moderator Project administrator Project developer Project scientist Joined: 1 Jul 05 Posts: 1480 Credit: 4,334,829 RAC: 0 |
This is definitely an issue. It would be nice if the scheduler just didn't send out the third result. After the server update, we'll look into a fix. We may just modify the validator. |
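
The late-result case discussed in this thread (WU 50835993, as described by Marky-UK) can be replayed with a toy model. This is only a sketch under stated assumptions: the `Workunit` class, its `report` method, and the outcome strings are hypothetical names invented for illustration, and the logic is a simplification, not the project's actual scheduler or validator code. The only value taken from the thread is a "max # of success results" of 1.

```python
from dataclasses import dataclass, field


@dataclass
class Workunit:
    """Toy model of one workunit (hypothetical, not BOINC's real structures)."""
    max_success_results: int = 1          # limit implied by the thread
    successes: int = 0
    outcomes: dict = field(default_factory=dict)

    def report(self, result_name: str, valid: bool) -> str:
        """Record a returned result and say whether it earns credit."""
        if not valid:
            outcome = "invalid"
        elif self.successes >= self.max_success_results:
            # The success limit was already reached by an earlier return,
            # so this copy is rejected even though the work was fine.
            outcome = "rejected (success limit reached)"
        else:
            self.successes += 1
            outcome = "credited"
        self.outcomes[result_name] = outcome
        return outcome


def replay(max_success_results: int) -> dict:
    """Replay the order of events Marky-UK describes for WU 50835993."""
    wu = Workunit(max_success_results=max_success_results)
    # Result 2 missed its deadline, so a replacement (result 3) was sent out.
    # Result 2 then came back late and passed validation first.
    wu.report("result 2 (late)", valid=True)
    # Result 3 was returned afterwards.
    wu.report("result 3 (replacement)", valid=True)
    return wu.outcomes


if __name__ == "__main__":
    for limit in (1, 2):
        print(limit, replay(limit))
```

With a limit of 1 the replacement copy is rejected; with a limit of 2 both copies are credited in this model, which is the change Marky-UK argues for.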
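
Feet1st's point about the daily credit granting script can be checked against the numbers anders n quoted. The sketch below assumes the script simply caps the claimed credit at 20; the function name and the behaviour for claims below the cap are assumptions for illustration, since the thread only states the maximum.

```python
DAILY_SCRIPT_CAP = 20.0  # maximum quoted by Feet1st in this thread


def granted_by_daily_script(claimed: float, cap: float = DAILY_SCRIPT_CAP) -> float:
    """Return the granted credit, assuming a simple cap on the claim.

    Hypothetical helper: the real script's rules are not shown in the thread.
    """
    return min(claimed, cap)


if __name__ == "__main__":
    # Claimed credit from the result page quoted above.
    print(granted_by_daily_script(20.7750264779357))
```

Under that assumption, the claim of 20.7750264779357 is cut to the 20 credits shown on the result page.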
©2025 University of Washington
https://www.bakerlab.org