Message boards : Number crunching : to many validate/compute errors
Author | Message |
---|---|
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
This is getting stupid. A combined total of 9 compute or validate errors in 3 days, some within hours of each other. I thought you guys would do a better job of filtering out bad tasks. I hate wasting CPU time/power on projects that will not validate or that go down in flames at the end of their run. Until CASP8 is over, I am tempted to set my run time to 2 hrs instead of 4 so I don't waste time on 'junk'. It is still another 6 hours before I get home and see what errors showed up this time. You guys know that you are losing good computing power due to a lot of hung tasks and validate/compute errors. The lack of any response about these problems is not a good track record for RAH. |
Jeremy Send message Joined: 15 May 08 Posts: 13 Credit: 2,636 RAC: 0 |
100% agreed |
jaxom1 Send message Joined: 5 Jun 06 Posts: 180 Credit: 1,586,889 RAC: 0 |
Although I am not as angry as you seem to be, this was one of the reasons Poem is getting more work from me recently. (Quoting Greg_BE: "This is getting stupid") |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
I just aborted the rest of the rb06 tasks. I was getting about a 50% error rate, either compute or validate. Time for some new work. |
David E K Volunteer moderator Project administrator Project developer Project scientist Send message Joined: 1 Jul 05 Posts: 1018 Credit: 4,334,829 RAC: 0 |
Sorry for all the errors. You should still be granted credit. The rb_06 tasks had a high failure/invalid rate with minirosetta version 1.25 but should have run okay with version 1.28. A relatively small number of rb_06 jobs were issued with 1.25, and it seems like you got a bunch of them. The validate errors were actually not a waste of CPU time, though, and I'll try to explain why. There are a few filters in place to conserve CPU time. Basically, if the structure does not look protein-like at certain stages (that is, it does not pass the filters), the process moves on to the next model. The typical behavior, which is in place in version 1.28 and rosetta++, is for the failed structure to get tagged and written to the final result file. In version 1.25, which was pushed out too quickly for CASP, the handling of failed structures was not set up correctly, and the failed structure was not written to the final result file. rb_06 was an unusually large CASP target for ab initio modeling, so the pass rate was low, and with version 1.25 models were often not sent back to our servers, which caused the validate errors. rb_06 was a Robetta target; Robetta is completely automated and does not use Ralph. Instead, it initially sends out a small batch of jobs and makes sure the success rate is high before sending out more. Unfortunately, you got a bunch of the initial batch before we were able to update to version 1.28. Sorry for the late response. We are working on getting minirosetta more stable, and I'll be talking to the developers about the current status of CASP jobs using minirosetta on R@h. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Thank you for taking the time to respond. So I was just unlucky. Well, on to better things then. |
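
For readers following David E K's explanation, the difference between the two versions can be illustrated with a small toy model. This is only a sketch under stated assumptions, not minirosetta or BOINC code: the names `Model`, `run_models`, `validates`, the numeric score, and the 0.5 threshold are all hypothetical, and the rule that an empty result file fails validation is an assumption made for illustration. The `write_failed` flag stands in for the behavior described for version 1.28 (failed structures are tagged and written out) versus version 1.25 (failed structures are silently dropped).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Model:
    index: int
    score: float          # hypothetical "protein-likeness" score
    status: str = "ok"


def run_models(scores: List[float],
               passes_filter: Callable[[Model], bool],
               write_failed: bool) -> List[Model]:
    """Return the records that end up in the (toy) final result file."""
    result_file: List[Model] = []
    for i, score in enumerate(scores):
        model = Model(i, score)
        if not passes_filter(model):
            # The filter rejected this structure: stop working on it and
            # continue with the next model to save CPU time.
            if write_failed:
                # 1.28-like behavior: tag the failure and still record it.
                model.status = "failed_filter"
                result_file.append(model)
            # 1.25-like behavior (write_failed=False): nothing is recorded.
            continue
        result_file.append(model)
    return result_file


def validates(result_file: List[Model]) -> bool:
    """Toy validator (assumption): an empty result file is invalid."""
    return len(result_file) > 0


def looks_protein_like(model: Model) -> bool:
    # Hypothetical filter threshold, chosen only for this example.
    return model.score >= 0.5


# A hard target: every model in this run fails the filter.
hard_target_scores = [0.1, 0.2, 0.3]

new_style = run_models(hard_target_scores, looks_protein_like, write_failed=True)
old_style = run_models(hard_target_scores, looks_protein_like, write_failed=False)

print(len(new_style), validates(new_style))
print(len(old_style), validates(old_style))
```

With a low pass rate, as described for rb_06, the run that drops failed structures has nothing to report even though the CPU time was spent doing the filtering work, which is the situation the post attributes to version 1.25.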
©2024 University of Washington
https://www.bakerlab.org