Message boards : Number crunching : Rosetta Process Stalls
Author | Message |
---|---|
BadThad Joined: 8 Nov 05 Posts: 30 Credit: 71,834,523 RAC: 0 |
It seems to have started doing this just a week or two ago. New PC: CPU: Intel Pentium D 915 2800MHz @ 2800MHz; Motherboard: Intel D945GPM; Memory: 1024 MB of Corsair DDR2-667; PS: OCZ ModStream; Video card: ATI X700 Pro 256MB Radeon; Hard drive: Seagate 7200.10 320.0 GB @ 7200 RPM; OS: XP Pro with all updates. The BOINC client shows two processes running, but the time is not incrementing and neither core shows a load in Task Manager. If I shut down the client and restart it, it runs fine... for a while. Today only ONE process was running, on a single core. I restarted BOINC and everything was fine, but when I came back 4-5 hours later I found Rosetta "stalled" again. The screensaver is set to blank, and no other processes are running except antivirus (SAV 10.0.1). I know a lot about PCs, so I'm confident there are no viruses or malware on the system; it's very clean. Temperatures and voltages are fine, and the PC is working perfectly. Any ideas as to why the Rosetta processes are stalling? |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Which BOINC version are you running? Rosetta Moderator: Mod.Sense |
BadThad Joined: 8 Nov 05 Posts: 30 Credit: 71,834,523 RAC: 0 |
Quote: "Which BOINC version are you running?" Whatever the latest version is. Last night I tried a simple uninstall/reinstall, but it didn't help. Last I checked, both processes were "running" but stalled. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
It looks like most of your hosts are running 5.4.11, but this one is on 5.2.11. Similar BOINC issues have been reported. It might be helpful if you could identify the specific host. The BOINC Manager seems to lose contact with the running threads, and it doesn't seem to detect when they end (generally with a "no heartbeat" indication), so it doesn't schedule more tasks. Rosetta Moderator: Mod.Sense |
Betting Slip Joined: 26 Sep 05 Posts: 71 Credit: 5,702,246 RAC: 0 |
Quote: "It seems to have started doing this just a week or two ago. New PC:" I have the same problem on one of my machines, and it appears to affect only my dual-core Pentium D processor. I have suspended Rosetta on that machine. |
BadThad Joined: 8 Nov 05 Posts: 30 Credit: 71,834,523 RAC: 0 |
Quote: "It looks like most of your hosts are running 5.4.11, but this one is on 5.2.11. Similar BOINC issues have been reported. It might be helpful if you could identify the specific host. The BOINC Manager seems to lose contact with the running threads, and it doesn't seem to detect when they end (generally with a "no heartbeat" indication), so it doesn't schedule more tasks." This is the machine I'm having the problem with. Tonight I'll check the core temps at 100% load using Intel TAT to make sure it's not throttling. I'd be surprised, because the regular Intel temperature monitoring utility shows load temps of about 70°C per core. P4/PD CPUs don't normally start throttling until about 85+°C, and I'm well below that... but one never knows. Looking over the results, there are a ton of "compute errors". I guess that could be the root of the problem? Maybe I need to run some diagnostics on that PC to check for a bad CPU or bad RAM modules? |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
It looks like they are all -107 exit codes. This is one of the primary symptoms of PCs that have problems running BOINC/Rosetta as the screensaver... yet you said you aren't using the screensaver. Do you display the graphics? (If so, it can be prone to the same problems as the screensaver.) Interestingly, that host shows significantly more floating-point ops per second than integer ops: measured floating point speed 1411.95 million ops/sec; measured integer speed 1189.46 million ops/sec. Also interesting: when a task completes normally, you are granted roughly double the credit you claim. Please let us know the results of your tests tonight. Rosetta Moderator: Mod.Sense |
Dotsch Joined: 12 Feb 06 Posts: 111 Credit: 241,803 RAC: 0 |
Quote: "The BOINC client shows two processes running, but the time is not incrementing and neither core shows a load in Task Manager. If I shut down the client and restart it, it runs fine... for a while. Today only ONE process was running, on a single core. I restarted BOINC and everything was fine, but when I came back 4-5 hours later I found Rosetta "stalled" again." This is a problem in the BOINC API. David Anderson has debugged it and written a fix, which is included in BOINC API 5.8.0. Picking up the fix requires recompiling the science application. This can also happen with BOINC 5.4.x, but it is mostly seen when CPU throttling is used (BOINC client 5.6.x/5.7.x and 5.8.x). |
BadThad Joined: 8 Nov 05 Posts: 30 Credit: 71,834,523 RAC: 0 |
Quote: "It looks like they are all -107 exit codes. This is one of the primary symptoms of PCs that have problems running BOINC/Rosetta as the screensaver... yet you said you aren't using the screensaver. Do you display the graphics? (If so, it can be prone to the same problems as the screensaver.)" To be honest, I never looked at it that closely. I have a fairly large number of PCs running the project, and micro-management is time-consuming. When I installed the client, I (as always) unchecked the "set as screensaver" option. On this PC, I just run the normal client. The Intel TAT tool will not run on a PD processor, so all I have is the regular Intel utility. But my load temps are still fine, about 63°C with two R@H processes running. For now I've backed the RAM speed down to 533 from 667, even though the RAM is rated at 667MHz. I'll see if that has any effect. |
BadThad Joined: 8 Nov 05 Posts: 30 Credit: 71,834,523 RAC: 0 |
Hmmm... that's interesting, thanks. I seem to have this problem on only one PC, which is a bit strange. I'm wondering whether it may have something to do with the ATI video driver? That's the most distinctive characteristic of this PC compared to my others. |
Dotsch Joined: 12 Feb 06 Posts: 111 Credit: 241,803 RAC: 0 |
Quote: "Any ideas as to why the Rosetta processes are stalling?" No. The problem occurs only occasionally, on some hosts, with various projects, but on some hosts and projects it happens more often, mostly when the science app is suspended too often. To pick up the BOINC API fix, the science application must be recompiled. |
BadThad Joined: 8 Nov 05 Posts: 30 Credit: 71,834,523 RAC: 0 |
Thanks! Hopefully this will happen soon. |
BadThad Joined: 8 Nov 05 Posts: 30 Credit: 71,834,523 RAC: 0 |
OK, it looks like the CPU is definitely throttling. I traced back my completed work, and the problem seems to have started when I uninstalled the Intel Active Monitor program. It seems the BIOS started making the throttling decision when I did that; now the Intel program is doing it, but it appears to have a higher threshold. I came home yesterday to find the PC turned off. My wife said it was beeping with an error about being "too hot", which came from the Intel software. I raised the warning threshold to 85°C in the software, but this morning I found the R@H process stalled again. The temperature seems to max out around 73°C with R@H. This dumb Pentium D 920 should not be throttling at such a low temperature. Apparently I'm going to have to buy an aftermarket cooler to tame this beast. I swear, Intel never designed this CPU to run at 100% load 24/7. I'm about ready to toss the thing out the window, lol. The system is adequately cooled, IMO. There's a huge 120mm fan in the bottom of the power supply, which sits just above the CPU. Then I have a 60mm rear exhaust fan and an 80mm front fan, all in a 100% aluminum case with managed cables. I guess I'll have to do some aggressive work here to cool this Pentium D! Thanks to all who replied! |
Betting Slip Joined: 26 Sep 05 Posts: 71 Credit: 5,702,246 RAC: 0 |
Quote: "OK, it looks like the CPU is definitely throttling. I traced back my completed work, and the problem seems to have started when I uninstalled the Intel Active Monitor program. It seems the BIOS started making the throttling decision when I did that; now the Intel program is doing it, but it appears to have a higher threshold." I have a 3 GHz Pentium D running 24/7, and it never gets above 35°C. It doesn't run Rosetta at the moment because it stalls. |
BadThad Joined: 8 Nov 05 Posts: 30 Credit: 71,834,523 RAC: 0 |
Sure, idle temp is fairly low; my system idles around 40°C. At least now you know why it stalls. Since last night I've been testing a little program I found on the web that controls the throttling. I'm not letting the CPU throttle! It doesn't need to throttle until about 80-85°C, and I'm not running that hot. When I last checked, the system was running at 68-72°C under R@H. If this program works, I'll be happy. |
Betting Slip Joined: 26 Sep 05 Posts: 71 Credit: 5,702,246 RAC: 0 |
It runs at 35°C working on other projects; it's not idle, it's at 100%. |
BadThad Joined: 8 Nov 05 Posts: 30 Credit: 71,834,523 RAC: 0 |
Wow, 35°C at 100% CPU load? That's unreal! Well, I've given up on this PC for my daughter; it's just too hot. I've taken the Pentium D and put it in a new case with an Arctic Freezer cooler. I just got this going last night, and in the new setup it runs at 35°C idle and a max of 48°C with Rosetta. I've decided to make this PC nothing but a full-time cruncher in my herd of PCs. Good news for my daughter: I'm putting a Gigabyte DS3 with a C2D E6300 into her case... that should take care of my temperature problems with her case for good. |
©2024 University of Washington
https://www.bakerlab.org