Computation error?

Questions and Answers : Unix/Linux : Computation error?

Viktor Astrom
Joined: 7 Apr 06
Posts: 3
Credit: 1,113,859
RAC: 0
Message 31668 - Posted: 25 Nov 2006, 22:34:53 UTC

I have a problem with Rosetta jobs that stop working on 3 different Linux machines running Debian Sarge with different 2.6 kernels.

World Community Grid works flawlessly, but Rosetta stops executing. I get a yellow line in BoincView 1.2.6 every time BOINC tries to run the broken jobs, until I manually terminate them. BOINC never seems to stop donating CPU time to these computations (roughly every 1 h cycle it tries to run the project again, leaving the computers sitting idle). I have been waiting more than 10 h for BOINC to terminate the job itself, and it never does.

If this continues I will be forced to pause my contribution to Rosetta, as a lot of my jobs for this project go awry (one even hung at 98% done, talk about frustrating). When I manually cancel them in BoincView, the Status column shows "Computation error (aborted by user)". Only then does BOINC fetch a new job from Rosetta and start working again.

I use boinc_client 5.4.11 from Backports and Rosetta 5.40.
ID: 31668 · Rating: 1
asmodeus
Joined: 8 Aug 07
Posts: 1
Credit: 210
RAC: 0
Message 44929 - Posted: 13 Aug 2007, 7:17:34 UTC

I have the same problem, and it seems to happen mostly if you suspend and restart the project twice. Possibly the same happens when BOINC switches between projects.
ID: 44929 · Rating: 0
Evan
Joined: 10 Aug 07
Posts: 1
Credit: 8,944
RAC: 0
Message 45072 - Posted: 17 Aug 2007, 1:21:04 UTC

Can the process recover from a hard shutdown?

A couple of mine have been failing recently. It's possible that they were both just in the queue when I had to shut the machine down improperly (for reasons unrelated to BOINC/Rosetta).

I am concerned because I also switched from Fedora to Ubuntu. I am now using the package from Synaptic instead of the official version. Could that be the problem too?
ID: 45072 · Rating: 0
mrm
Joined: 11 Jan 08
Posts: 1
Credit: 53,153
RAC: 0
Message 51218 - Posted: 7 Feb 2008, 13:58:01 UTC - in response to Message 45072.  

Can the process recover from a hard shutdown?

A couple of mine have been failing recently. It's possible that they were both just in the queue when I had to shut the machine down improperly (for reasons unrelated to BOINC/Rosetta).


I seem to have a related issue. My setup calls for a break in computation during office hours. I noticed that when I restart BOINC I get a CPU utilisation of 6.5 (which is as planned). The next day it falls to ~4, and on the following days to ~3, ~2, etc. The number of Rosetta processes increases, but they are mostly stalled.
I'll change my setup to run jobs for 24 h without breaks and see if that improves the situation. If yes, I'll reconsider whether to apply my computing resources to another project. Babying Rosetta jobs takes too much of my time already :(
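
For anyone who wants to check the "more processes, mostly stalled" symptom without watching top by hand, here is a rough sketch (not an official BOINC or Rosetta tool). It takes two samples of the CPU time recorded in /proc/<pid>/stat and lists the processes whose counter did not move in between. The "rosetta" name prefix, the 60-second interval, and the function names are all assumptions made for illustration. Note that a task BOINC has deliberately suspended will also show no progress, so a pid in the list is only a hint.

```python
import os
import time


def parse_cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name sits in parentheses and may contain spaces,
    # so split on the last ')' before reading the numeric fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is the process state (field 3 in proc(5)); utime and stime
    # are fields 14 and 15, which land at indexes 11 and 12 here.
    return int(fields[11]) + int(fields[12])


def snapshot(name_prefix="rosetta"):
    """Map pid -> CPU ticks for processes whose name starts with name_prefix.

    The prefix is an assumption; adjust it to the binary name on your system.
    """
    ticks = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as handle:
                line = handle.read()
        except OSError:
            continue  # the process exited between listdir() and open()
        comm = line[line.index("(") + 1:line.rindex(")")]
        if comm.startswith(name_prefix):
            ticks[int(entry)] = parse_cpu_ticks(line)
    return ticks


def stalled_pids(before, after):
    """Pids present in both snapshots whose CPU tick count did not change."""
    return sorted(
        pid for pid in before if pid in after and after[pid] == before[pid]
    )


def report(interval_seconds=60):
    """Take two snapshots interval_seconds apart and print a short summary."""
    first = snapshot()
    time.sleep(interval_seconds)
    second = snapshot()
    print("matching processes:", len(second))
    print("no CPU progress:", stalled_pids(first, second))
```

Calling report() on an affected Linux machine prints how many matching processes exist and which of them used no CPU during the interval; running it on successive days would show whether the stalled count grows the way described above.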

ID: 51218 · Rating: 0

©2024 University of Washington
https://www.bakerlab.org