Questions and Answers : Unix/Linux : Computation error?
Author | Message |
---|---|
Viktor Astrom Joined: 7 Apr 06 Posts: 3 Credit: 1,113,859 RAC: 0 |
I have a problem with Rosetta jobs that stop working on 3 different Linux machines running Debian Sarge with different 2.6 kernels. World Community Grid works flawlessly, but Rosetta stops executing. I get a yellow line in BoincView 1.2.6 every time BOINC tries to run the broken project, until I manually terminate the jobs. BOINC never seems to stop donating CPU time to these computations (every 1h cycle or so it tries to run the project, leaving the computers sitting idle). I have been waiting more than 10h for it to terminate the job itself, and it never has. If this continues I will be forced to pause my contribution to Rosetta, as a lot of my jobs for this project go awry (one even hung at 98% done, talk about frustrating). When I manually cancel them in BoincView, the Status column shows "Computation error (aborted by user)". Then BOINC finally gets new work from Rosetta and starts working again. I use boinc_client 5.4.11 from Backports and Rosetta 5.40. |
asmodeus Joined: 8 Aug 07 Posts: 1 Credit: 210 RAC: 0 |
I have the same problem, and it seems to happen mostly if you suspend and restart the project twice. Possibly the same happens when BOINC switches between projects. |
Evan Joined: 10 Aug 07 Posts: 1 Credit: 8,944 RAC: 0 |
Can the process recover from a hard shutdown? A couple of mine have been failing recently. It's possible that they were both just in the queue when I had to shut the machine down improperly (for reasons unrelated to BOINC/Rosetta). I am concerned because I also switched from Fedora to Ubuntu. I am now using the package from Synaptic instead of the official version. Is it possible that this is the problem too? |
mrm Joined: 11 Jan 08 Posts: 1 Credit: 53,153 RAC: 0 |
Can the process recover from a hard shutdown? I seem to have a related issue. My setup calls for a break in computation during office hours. I noticed that when I restart BOINC I get a CPU utilisation of 6.5 (which is as planned). The next day it falls to ~4, and on the following days to ~3, ~2, etc. The number of Rosetta processes increases, but they are mostly stalled. I'll change my setup to run jobs for 24h without breaks and see if that improves the situation. If it does, I'll reconsider whether to apply my computation resources to another project. Babying Rosetta jobs takes too much of my time already :( |
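
One way to check whether the Rosetta processes described in this thread are really stalled, rather than just slow, is to sample their accumulated CPU time twice and see which ones made no progress in between. The following is only a minimal sketch, not a BOINC tool. It assumes a Linux `/proc` filesystem and that the (kernel-truncated) process name contains "rosetta"; the function names `read_cpu_ticks`, `find_stalled` and `sample_and_report` are hypothetical and chosen for illustration.

```python
import os
import time


def read_cpu_ticks(name_fragment="rosetta"):
    """Return {pid: utime + stime in clock ticks} for matching processes.

    Assumption: the process name (comm) contains name_fragment.
    """
    ticks = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as fh:
                stat = fh.read()
        except OSError:
            continue  # process exited between listdir() and open()
        # The command name sits in parentheses and may contain spaces,
        # so split around the last closing parenthesis.
        head, _, tail = stat.rpartition(")")
        comm = head.partition("(")[2]
        if name_fragment not in comm.lower():
            continue
        fields = tail.split()
        # tail starts at field 3 (state); utime and stime are fields 14, 15.
        ticks[int(entry)] = int(fields[11]) + int(fields[12])
    return ticks


def find_stalled(before, after, min_ticks=1):
    """PIDs seen in both samples whose CPU time grew by less than min_ticks."""
    return sorted(
        pid for pid in before
        if pid in after and after[pid] - before[pid] < min_ticks
    )


def sample_and_report(interval=60):
    """Take two samples `interval` seconds apart and print stalled PIDs."""
    first = read_cpu_ticks()
    time.sleep(interval)
    second = read_cpu_ticks()
    print("stalled PIDs:", find_stalled(first, second))
```

Calling `sample_and_report()` while BOINC is supposed to be running Rosetta would list the processes that used no CPU during the interval. A process that BOINC has deliberately suspended or preempted will also show up, so the result is a hint for where to look, not proof of a hang.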
©2024 University of Washington
https://www.bakerlab.org