Questions and Answers : Unix/Linux : Rosetta Client Routinely Hangs
Author | Message |
---|---|
Raster Joined: 1 Apr 06 Posts: 2 Credit: 213,377 RAC: 0 |
I'm running BOINC 5.8.16 for i686-pc-linux-gnu. I've noticed that the Rosetta clients routinely hang at the beginning, in the middle, or near the end of processing a work unit. The hardware is a dual-processor machine with hyper-threaded Intel Xeons, 2 cores per CPU, so the OS sees essentially 8 CPUs. I have it configured to process up to 8 work units simultaneously; however, after running for about 2 weeks, work units start to get stuck. This morning it was down to 1 active process. I've tried the beta version of BOINC with the same results. Do I just need to restart the BOINC client every week or so? Thanks, Mike Morgan |
DJStarfox Joined: 19 Jul 07 Posts: 145 Credit: 1,250,162 RAC: 0 |
First, check your memory usage and settings. With 8 cores, chances are R@H will run out of memory before 8 processes can run. BOINC will (sometimes) leave them in memory but not "run" them if there isn't enough free memory according to your settings. R@H uses between 120 and 360 MB of memory for each task. Second, memory contention would explain what several users and I have experienced with R@H: after a WU is suspended (it doesn't matter why or how), it will not resume properly even though BOINC thinks it's running. Eventually it crashes or you have to kill the PID, resulting in a compute error. A workaround is to limit the number of CPUs, set the memory limit settings high, and set "leave suspended applications in memory = yes". This will not solve the problem, but it makes the bug occur less often. Does that describe your problem? Read more here: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=3481 |
Raster Joined: 1 Apr 06 Posts: 2 Credit: 213,377 RAC: 0 |
Thanks for the suggestions! I think they may minimize the problem. I think I'm hitting condition #2 because of condition #1. My machine has 2 GB of RAM but is configured to use 50% of available memory "while computer is in use" and 90% when it is otherwise idle. Rosetta is my only project on this machine, so I suspect that when the machine was idle it was able to start 8 clients, but when it detected that the machine was busy it suspended some WUs. Since my preferences were set to not keep the processes in memory, I guess I hit the bug you described in condition #2. After several weeks of suspend/resume failures I was left with just one WU being processed. Is this a BOINC defect? The discussions at the link you provided seem to suggest it's an R@H problem. Thanks, Mike |
DJStarfox Joined: 19 Jul 07 Posts: 145 Credit: 1,250,162 RAC: 0 |
"Is this a BOINC defect? The discussions at the link you provided seem to suggest it's a R@H problem." It is definitely a Rosetta problem. Other projects I've run, including CPDN, Einstein, Seasonal Attribution, and SETI, do not have this problem uninitializing. I've posted about it several times, but the admins ignore me and other users who confirm my diagnosis. I don't think they care much about their Linux application. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
DJ, I wouldn't say you are ignored. I've been thinking we should start a thread in the Number Crunching forum about Linux task preemption problems. The tips and things to check that you've added here would be a good start toward putting helpful information about the topic in a single place. I'd also like to collect the symptoms all in one place; anyone who reverses your recommendations can see which configurations seem to expose the problem. I should also point out that just because other projects do not see the problem does not mean the Rosetta team will be able to make the fix. I believe BOINC is in charge of ending and tearing the thread down when preempted tasks are not retained in memory. So, if the thread isn't ending when BOINC wants it to, it may prove to be a Linux bug in the end. Please start a thread to discuss this in detail. List specifics about Linux versions and memory preferences, and perhaps we can get some input on whether there are flavors of Linux that don't have the problem, or whether other factors determine when people see it occur and when they don't. Rosetta Moderator: Mod.Sense |
DJStarfox Joined: 19 Jul 07 Posts: 145 Credit: 1,250,162 RAC: 0 |
|
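
The memory arithmetic discussed in the thread can be sketched roughly as follows. This is only an illustration, not how BOINC actually schedules work: the helper `max_tasks` is hypothetical, "2G" is assumed to mean 2048 MB, memory used by the OS and other programs is ignored, and the 120 to 360 MB per-task range is the figure quoted by DJStarfox above.

```python
def max_tasks(total_ram_mb, allowed_pct, per_task_mb, cpus=8):
    """Estimate how many tasks fit in the RAM share BOINC may use.

    Hypothetical helper for illustration only; the result is capped
    at the number of CPUs the OS reports.
    """
    budget_mb = total_ram_mb * allowed_pct // 100
    return min(cpus, budget_mb // per_task_mb)


TOTAL_RAM_MB = 2048  # assumption: "2G" in the thread means 2048 MB

# 50% "while computer is in use", 90% when idle (Raster's settings)
for label, pct in (("in use", 50), ("idle", 90)):
    for per_task_mb in (120, 360):  # per-task range quoted in the thread
        n = max_tasks(TOTAL_RAM_MB, pct, per_task_mb)
        print(f"{label:6s} {pct}% limit, {per_task_mb} MB/task: {n} tasks fit")
```

Under these simplified assumptions, tasks near the 360 MB end of the range would leave room for about 5 tasks when the machine is idle but only about 2 when it is in use, which is consistent with Raster's guess that tasks were being suspended whenever the machine became busy.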
©2024 University of Washington
https://www.bakerlab.org