All tasks beginning with lb* have computation error

Message boards : Number crunching : All tasks beginning with lb* have computation error


Rayfen Windspear

Joined: 13 May 09
Posts: 6
Credit: 113,749
RAC: 0
Message 61669 - Posted: 11 Jun 2009, 5:58:53 UTC

All of the tasks I get that start with lb* fail after about an hour with a computation error. Has anyone else had this problem, or is it just some weird localized thing?

Perhaps my AMD Phenom quadcore is incapable of doing the computations for it?

Just FYI, it normally takes about 4 hours to complete a task, so basically it gets about 1/4 complete before it decides something is wrong.

Oh yeah and I think I already know the answer to this question but... do I get credit when it fails like that?
ID: 61669 · Rating: 0
Hammeh

Joined: 11 Nov 08
Posts: 63
Credit: 211,283
RAC: 0
Message 61676 - Posted: 11 Jun 2009, 9:04:15 UTC

^+1

All of my lb* tasks are also failing with computation errors, some after 7 hours (run time is set to 3) and some after 10 minutes. I am aborting all of the lb* tasks I have.
ID: 61676 · Rating: 0
Wang Solutions

Joined: 16 Jul 06
Posts: 3
Credit: 1,909,342
RAC: 0
Message 61682 - Posted: 11 Jun 2009, 12:07:22 UTC

Same issue here. All lb_threading are failing, sometimes running for up to 8 hours (run time set to 4 hours) before doing so. I am aborting all.

Join the No.1 Australian Team!
ID: 61682 · Rating: 0
googloo

Joined: 15 Sep 06
Posts: 133
Credit: 22,724,831
RAC: 3,039
Message 61683 - Posted: 11 Jun 2009, 12:49:15 UTC

After 7+ hours of crunching and a computation error on one, the second looked like it was going to do the same, so I aborted them all.
ID: 61683 · Rating: 0
Hammeh

Joined: 11 Nov 08
Posts: 63
Credit: 211,283
RAC: 0
Message 61689 - Posted: 11 Jun 2009, 15:50:39 UTC

They are failing with error 161. The validator is giving the full claimed credit for these results, so no harm done.
ID: 61689 · Rating: 0
Rayfen Windspear

Joined: 13 May 09
Posts: 6
Credit: 113,749
RAC: 0
Message 61694 - Posted: 11 Jun 2009, 18:20:28 UTC - in response to Message 61689.  

I just caught one in progress before it goes to error. I opened the graphics and it pretty much just sits there with straight lines and the steps don't increase or anything. It seems they just stall out for whatever reason.
ID: 61694 · Rating: 0
bono_vox

Joined: 5 Dec 05
Posts: 8
Credit: 371,092
RAC: 0
Message 61697 - Posted: 11 Jun 2009, 21:00:19 UTC - in response to Message 61689.  

They are failing with error 161. The validator is giving the full claimed credit for these results, so no harm done.


No it doesn't, or at least not in my case.

257062181 304453 7 Jun 2009 12:41:55 UTC 10 Jun 2009 22:27:57 UTC Over Client error Compute error 29,120.28 74.19 ---
257106400 304453 7 Jun 2009 16:54:25 UTC 11 Jun 2009 19:15:49 UTC Over Client error Compute error 29,052.34 74.01 ---

But after those 2 failed WUs, so far I've returned 4 valid ones.
ID: 61697 · Rating: 0
Hammeh

Joined: 11 Nov 08
Posts: 63
Credit: 211,283
RAC: 0
Message 61698 - Posted: 11 Jun 2009, 21:10:25 UTC - in response to Message 61697.  
Last modified: 11 Jun 2009, 21:10:50 UTC

They are failing with error 161. The validator is giving the full claimed credit for these results, so no harm done.


No it doesn't, or at least not in my case.

257062181 304453 7 Jun 2009 12:41:55 UTC 10 Jun 2009 22:27:57 UTC Over Client error Compute error 29,120.28 74.19 ---
257106400 304453 7 Jun 2009 16:54:25 UTC 11 Jun 2009 19:15:49 UTC Over Client error Compute error 29,052.34 74.01 ---

But after those 2 failed WUs, so far I've returned 4 valid ones.


Yes it does.
Have a look at the actual work unit page for the first task you have outlined above: https://boinc.bakerlab.org/rosetta/result.php?resultid=257062181
At the bottom of the stderr out section it says:
<message>
<file_xfer_error>
<file_name>lb_thread_all_multi_hb_t326__IGNORE_THE_REST_12732_232_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>

That is the error message, and below it the page says:
Validate state Invalid
Claimed credit 74.1858587672511
Granted credit 74.1858587672511
application version 1.71

Therefore the credit for the workunit has been granted even though it does not show up on the tasks list because the result is labelled "invalid".
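For anyone who wants to check their own failed tasks the same way, here is a minimal Python sketch that pulls the file name and error code out of stderr text pasted from a result page. It is not taken from the Rosetta or BOINC code: the function name `find_xfer_errors` and the regular expression are made up for illustration, and it assumes the error block looks exactly like the fragment quoted above.

```python
import re

# Fragment copied from the stderr output quoted above.
STDERR_SAMPLE = """<message>
<file_xfer_error>
<file_name>lb_thread_all_multi_hb_t326__IGNORE_THE_REST_12732_232_0_0</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>"""

# Matches one <file_xfer_error> block and captures the file name and numeric code.
# Assumption: the tags appear in this order with only whitespace between them.
XFER_ERROR = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)</error_code>\s*"
    r"</file_xfer_error>"
)


def find_xfer_errors(stderr_text):
    """Return a list of (file_name, error_code) pairs found in stderr_text."""
    return [
        (match.group("name"), int(match.group("code")))
        for match in XFER_ERROR.finditer(stderr_text)
    ]


if __name__ == "__main__":
    for name, code in find_xfer_errors(STDERR_SAMPLE):
        print(f"{name}: error {code}")
```

A regular expression is used rather than an XML parser because the stderr section mixes plain log text with these tags, so the whole thing is not guaranteed to parse as XML.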

Hammeh
ID: 61698 · Rating: 0
bono_vox

Joined: 5 Dec 05
Posts: 8
Credit: 371,092
RAC: 0
Message 61700 - Posted: 11 Jun 2009, 21:15:23 UTC - in response to Message 61698.  


Therefore the credit for the workunit has been granted even though it does not show up on the tasks list because the result is labelled "invalid".

Hammeh


Many thanks Hammeh. You are correct.
ID: 61700 · Rating: 0
Rayfen Windspear

Joined: 13 May 09
Posts: 6
Credit: 113,749
RAC: 0
Message 61704 - Posted: 11 Jun 2009, 23:52:20 UTC
Last modified: 11 Jun 2009, 23:53:28 UTC

Seems like it's not ALL the lb* ones that are dying. I had one that was frozen earlier today and now it seems to be doing fine... the only problem is that it's about 8 hours in and only at 60%, and it usually only takes 3 hours to finish WUs.

From the looks of the graphics though, the lb* ones are incredibly complex.

I'm not worried about the friggen credits, I'm worried about getting the data crunched correctly.
ID: 61704 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 61709 - Posted: 12 Jun 2009, 12:03:50 UTC

From what I've seen and heard, it is not all lb tasks that have a problem. However, for many people, all of their problem tasks start with lb (other completed tasks did as well, but they don't notice). There were some lb tasks that were created incorrectly, and those seem to be causing the file transfer errors. So there is nothing to change on your end to correct this. I'm hoping most of the problem tasks are now purged from the system.
Rosetta Moderator: Mod.Sense
ID: 61709 · Rating: 0
Mike Tyka

Joined: 20 Oct 05
Posts: 96
Credit: 2,190
RAC: 0
Message 61714 - Posted: 12 Jun 2009, 17:00:41 UTC

Yes, this was a rotten batch. I don't know how this got through RALPH; the person submitting these jobs has been notified.

Sorry about the errors. The latest WUs seem to be running smoothly though!

Thanks for crunching!

Mike
http://beautifulproteins.blogspot.com/
http://www.miketyka.com/
ID: 61714 · Rating: 0
Rayfen Windspear

Joined: 13 May 09
Posts: 6
Credit: 113,749
RAC: 0
Message 61763 - Posted: 15 Jun 2009, 13:21:16 UTC - in response to Message 61714.  

Yes, this was a rotten batch. I don't know how this got through RALPH; the person submitting these jobs has been notified.

Sorry about the errors. The latest WUs seem to be running smoothly though!

Thanks for crunching!

Mike



Yeah, I haven't seen any more problems with them from what I can tell. I only have my computer send in reports every 2 days, so I check them all now and then, and they all seem fine now.
ID: 61763 · Rating: 0




©2024 University of Washington
https://www.bakerlab.org