hard lock on linux

Message boards : Number crunching : hard lock on linux


root

Joined: 5 Mar 10
Posts: 3
Credit: 8,189
RAC: 0
Message 65589 - Posted: 19 Mar 2010, 12:24:13 UTC
Last modified: 19 Mar 2010, 12:24:54 UTC

Hello.

I have a stability problem running Rosetta on my Linux box. For details, please see the thread on the BOINC forum: http://boinc.berkeley.edu/dev/forum_thread.php?id=5556

It does not look like a hardware problem to me. Since I haven't found any reports of similar problems, I can only ask: what else can I do to localize the problem?

Thanks in advance.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 65592 - Posted: 19 Mar 2010, 16:13:58 UTC
Last modified: 19 Mar 2010, 16:14:45 UTC

The E7300 you described in the other post is reporting problems locating required files, as in this task. So that doesn't point to hardware issues at all. It points to permission (authority) or antivirus problems or, less commonly, to network instability (which would more often cause a signature violation instead).
Rosetta Moderator: Mod.Sense
root

Joined: 5 Mar 10
Posts: 3
Credit: 8,189
RAC: 0
Message 65593 - Posted: 19 Mar 2010, 17:15:40 UTC - in response to Message 65592.  

The problem is not related to the missing files at all. I found no other way to keep Rosetta from running at BOINC startup than to delete some important Rosetta files. I'm not familiar with BOINC, and that was a way to switch to POEM without an immediate hard lock.

Error reporting is useful only when you know exactly what it means :-)
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,178,626
RAC: 3,201
Message 65597 - Posted: 20 Mar 2010, 10:57:27 UTC - in response to Message 65593.  

The problem is not related to the missing files at all. I found no other way to keep Rosetta from running at BOINC startup than to delete some important Rosetta files. I'm not familiar with BOINC, and that was a way to switch to POEM without an immediate hard lock.

Error reporting is useful only when you know exactly what it means :-)


It looks like you have 2 GB of memory in that E7300. Do you have the setting that leaves work units in memory when they are swapped out set to yes? It is under Your Account, Computing Preferences, where the top section says
"Leave applications in memory while suspended?
(suspended applications will consume swap space if 'yes') yes"

Make sure yours says yes both here at Rosetta and at all your other BOINC projects too. No, it does not solve all problems, but it does solve some of them and is worth trying.
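
As a rough sketch, the same preference can also be set locally through a global_prefs_override.xml file in the BOINC data directory, rather than on each project's web site. The fragment below assumes you want to override only this one preference; the client has to re-read its local preferences (or be restarted) before the change takes effect.

```xml
<!-- global_prefs_override.xml: local override of the web preferences -->
<global_preferences>
  <!-- 1 = keep suspended applications in memory (they may use swap space) -->
  <leave_apps_in_memory>1</leave_apps_in_memory>
</global_preferences>
```

A local override applies to every project on that machine, so it is an alternative to checking the setting at each project separately.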

You might also need to reload BOINC and detach from and reattach to Rosetta. Deleting BOINC-related files is always a bad thing, especially the important ones.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 65600 - Posted: 20 Mar 2010, 21:09:19 UTC

Oh, I see, so the missing file was not the true cause. You intentionally deleted it.

I haven't heard any other similar reports. So I tend to lean towards the things suggested by mikey.

I would just point out that you could use the <start_delay> setting in the cc_config.xml file to delay running tasks after BOINC starts, which gives you some time to suspend a given project if you wish. You can read more about the settings and usage of this file in the BOINC client configuration documentation.
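
For illustration, a minimal cc_config.xml using that setting might look like the sketch below. The 120-second value is only an assumed example; pick whatever delay gives you enough time to open the manager and suspend the project. The file lives in the BOINC data directory and is read when the client starts.

```xml
<!-- cc_config.xml: client configuration, placed in the BOINC data directory -->
<cc_config>
  <options>
    <!-- Seconds to wait after client startup before running any tasks.
         120 is an illustrative value, not a recommendation. -->
    <start_delay>120</start_delay>
  </options>
</cc_config>
```

With this in place, no task should begin computing until the delay has elapsed, so deleting project files to keep Rosetta from starting should not be necessary.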
Rosetta Moderator: Mod.Sense
DJStarfox

Joined: 19 Jul 07
Posts: 145
Credit: 1,250,162
RAC: 0
Message 65620 - Posted: 22 Mar 2010, 17:55:23 UTC
Last modified: 22 Mar 2010, 17:55:39 UTC

Just as a test, try reducing your memory speed by one step. For example, if your memory is running at 800 MHz, go down to 667 MHz. Then, 1) abort all work units, 2) reset the Rosetta project, 3) get new tasks for testing.

Also, I highly recommend Mod.Sense's suggestion of adding the start delay parameter.



