Temperature spike in beginning of Rosetta WU

Message boards : Number crunching : Temperature spike in beginning of Rosetta WU


Mark
Joined: 1 Dec 12
Posts: 10
Credit: 20,184
RAC: 0
Message 77654 - Posted: 18 Nov 2014, 3:53:49 UTC

Normally, when my computer is crunching at full speed on either SETI or Rosetta, the CPU temp measures about 60C. About 10-15 minutes into a Rosetta WU, the CPU temp jumps to 83C for 5-10 seconds. Is this normal?

Better yet, can someone explain what's happening?

Thanks
Polian
Joined: 21 Sep 05
Posts: 152
Credit: 10,141,266
RAC: 0
Message 77655 - Posted: 18 Nov 2014, 7:58:16 UTC

A logical guess could be that your CPU is hitting thermal interlock temperatures and is throttling the clock down to prevent damage; thus the resulting drop in temperature. It might be time to perform an inspection on your cooling system.
Mark
Joined: 1 Dec 12
Posts: 10
Credit: 20,184
RAC: 0
Message 77658 - Posted: 19 Nov 2014, 0:47:40 UTC

Except that it's not a drop but a spike in temperature. It happens repeatably near the beginning of Rosetta units. Does Rosetta do something in setting up that uses more of the processor?
The universe is not only stranger than you imagine, it is stranger than you can imagine.

dcdc
Joined: 3 Nov 05
Posts: 1832
Credit: 119,870,251
RAC: 1,154
Message 77660 - Posted: 19 Nov 2014, 1:04:04 UTC

I think Polian's point is that it might be throttled and then hit full speed during the spike before throttling again. It could be a few other things like one thread working the cpu harder than two because of a memory bottleneck or something like that. Try downloading cpu-z to see what's happening, but cooling is a good starting point to fix it. Might need to redo the thermal compound if the heatsink is clean.
Mark
Joined: 1 Dec 12
Posts: 10
Credit: 20,184
RAC: 0
Message 77661 - Posted: 19 Nov 2014, 1:12:59 UTC

Except that it "hits full speed" repeatably in the beginning of Rosetta units and nowhere else. The odds of this happening by chance are astronomical.
The universe is not only stranger than you imagine, it is stranger than you can imagine.

Polian
Joined: 21 Sep 05
Posts: 152
Credit: 10,141,266
RAC: 0
Message 77662 - Posted: 19 Nov 2014, 2:55:57 UTC - in response to Message 77660.  

dcdc wrote: "I think Polian's point is that it might be throttled and then hit full speed during the spike before throttling again. It could be a few other things like one thread working the cpu harder than two because of a memory bottleneck or something like that. Try downloading cpu-z to see what's happening, but cooling is a good starting point to fix it. Might need to redo the thermal compound if the heatsink is clean."


Yes, that's what I was trying to convey, thanks! CPU-Z would tell you for sure as dcdc says.

Mark wrote: "Except that it 'hits full speed' repeatably in the beginning of Rosetta units and nowhere else. The odds of this happening by chance are astronomical."


The cooling is still the most plausible explanation. Check your fan/water loop and thermal interface material. Stock heat sink grease or TIM pads have a shorter lifespan than, say, Arctic Silver. Assuming that you're running at stock clock speeds, 83C is far too hot. A generic explanation (since I'm not really in the know here) would be that Rosetta is more taxing on the CPU than SETI; I don't know. I get higher temps when doing stability testing with Prime95 (and even higher yet with IntelBurnTest) than I do when running Rosetta.
Sid Celery
Joined: 11 Feb 08
Posts: 2166
Credit: 41,606,783
RAC: 3,431
Message 77664 - Posted: 19 Nov 2014, 5:52:33 UTC

I'm experimenting with overclocking at the moment and found speedfan to be pretty vital in seeing what's happening with cpu & motherboard temperatures and fan-speed responses.

This followed an incident where the power-connector to the motherboard melted into its socket(!). Thankfully I saved the motherboard and have upgraded both the fan and power-supply.
dcdc
Joined: 3 Nov 05
Posts: 1832
Credit: 119,870,251
RAC: 1,154
Message 77667 - Posted: 19 Nov 2014, 23:36:02 UTC - in response to Message 77661.  

Mark wrote: "Except that it 'hits full speed' repeatably in the beginning of Rosetta units and nowhere else. The odds of this happening by chance are astronomical."

Rosetta takes a little while to get going on the CPU which is why I suggested it might be that one thread is taxing the CPU more than two threads, or that one thread might run fine and when the second thread kicks in, it drags the temp up briefly before throttling kicks in. Either way, those scenarios would point to the cooling not being adequate, as Polian says.

Mark
Joined: 1 Dec 12
Posts: 10
Credit: 20,184
RAC: 0
Message 77669 - Posted: 20 Nov 2014, 1:49:32 UTC

Cpu-Z tells me I am getting my full processor speed and am not throttling.

I was hoping someone here might watch temps as closely as I do and might have noticed this behavior. Or better yet, there might be someone familiar with the code who could tell me what the application is doing during those spikes.

It is stock cooling, and it is a few years old, so it probably doesn't have the best thermal solution. But nothing else takes it over about 60C (at the current ambient temp). I may look for another backup project rather than take my heatsink apart to remove the stock thermal grease or pad and reapply Arctic Silver.

Thanks for the help.
The universe is not only stranger than you imagine, it is stranger than you can imagine.

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 77670 - Posted: 20 Nov 2014, 3:11:15 UTC
Last modified: 20 Nov 2014, 3:12:04 UTC

Well, in general terms, R@h is a very intense application. I've heard some say it can put more stress on the overall system than many benchmarks and stress tests. This is because it uses a lot of memory, and has intense floating point operations going on. It will make full use of L2 cache too. Many benchmarks and tests do some things and not others, but you don't get all of it happening at the same time.

Having said that, I wouldn't think a 5-10 second temp. spike would be of too much concern. Some other methods of addressing it would be to reduce the % of CPU used by BOINC (although that would reduce your throughput all day long, not just during the temp. spike); or to bring in a throttle mechanism that reduces CPU speed briefly during periods of high temp.; or to have more than one backup project, to reduce the likelihood of having a large number of tasks with the same runtime properties running at the same time.
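
To make the throttle idea concrete, here is a minimal sketch in Python of a temperature-triggered throttle with hysteresis. It is only an illustration, not how BOINC or any particular tool does it: the function names and the 80C/70C thresholds are assumptions picked for the example, and nothing here reads a real sensor.

```python
def next_throttle_state(temp_c, throttled, high_c=80.0, low_c=70.0):
    """Decide whether to throttle after one temperature reading.

    high_c and low_c are hypothetical thresholds: throttle at or above
    high_c, release at or below low_c, and otherwise keep the previous
    state so the decision does not flap on every reading.
    """
    if temp_c >= high_c:
        return True
    if temp_c <= low_c:
        return False
    return throttled


def simulate(readings_c, high_c=80.0, low_c=70.0):
    """Run the decision over a list of readings, starting unthrottled."""
    state = False
    states = []
    for temp_c in readings_c:
        state = next_throttle_state(temp_c, state, high_c, low_c)
        states.append(state)
    return states


if __name__ == "__main__":
    # A short spike like the one described in this thread.
    print(simulate([60, 83, 75, 69, 60]))
```

The gap between the two thresholds is the design choice that matters: with a single threshold, a reading hovering around it would switch the throttle on and off every second.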
Rosetta Moderator: Mod.Sense
Link
Joined: 4 May 07
Posts: 356
Credit: 382,349
RAC: 0
Message 77671 - Posted: 20 Nov 2014, 13:07:44 UTC - in response to Message 77670.  
Last modified: 20 Nov 2014, 13:10:24 UTC

Mod.Sense wrote: "Well, in general terms, R@h is a very intense application."

Not really. On all my systems, both Intel and AMD, I always see 3-4°C less with Rosetta than with SETI optimized applications.

Mod.Sense wrote: "or to bring in a throttle mechanism that reduces CPU speed briefly during periods of high temp."

Any CPU that doesn't fall into the category "ancient" has such a mechanism. But as you say, a few seconds of a slightly too high temperature is not an issue, especially since the CPU will throttle itself if it really gets too hot.

Another thing that's possible (and actually very likely): a simple read error. A CPU actually can't suddenly get 20-30° warmer and then suddenly cold again (unless the cooler falls off and then magically comes back into its place). This would also explain the 5-10 seconds, since the temperature is checked every few seconds. At least on my AMD system I often get garbage data from the sensors, so suddenly I have 99° as a max. temperature (for you it seems to be 83°).
.
Mark
Joined: 1 Dec 12
Posts: 10
Credit: 20,184
RAC: 0
Message 77682 - Posted: 23 Nov 2014, 0:57:04 UTC - in response to Message 77671.  

Link wrote: "Another thing that's possible (and actually very likely): a simple read error. A CPU actually can't suddenly get 20-30° warmer and then suddenly cold again (unless the cooler falls off and then magically comes back into its place). This would also explain the 5-10 seconds, since the temperature is checked every few seconds. At least on my AMD system I often get garbage data from the sensors, so suddenly I have 99° as a max. temperature (for you it seems to be 83°)."

I use speedfan to graph temperatures. It updates the temps every second. I typically see 1-2 degrees of jitter in the readings. The only times I see this 20 degree jump, lasting 5-10 seconds, are near the beginning of a R@H unit and during a backup (Acronis True Image). Sometimes I see this twice, near each other, in a R@H unit. It happens 3 times during a backup. I pause Boinc during backups. In all cases the max temp is different; 83C was an average. I have one temp sensor that reads -128C. I assume this is disconnected.

(Note: speedfan reports Temp1, Temp2, Temp3, HD0, and Core. Temp1 is the highest temp in the system. On the boards at speedfan's homepage I found info that the highest read temp is probably really the core. When Temp1 is 60C, Core is about 45C. I am assuming Temp1 is really the core temperature.)
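
For anyone who wants to check a temperature log for the same pattern, here is a rough sketch in Python. It assumes the readings have already been exported as a plain list of one-per-second values in degrees C (the export step is not shown and depends on the tool); the function name and the 15 degree cutoff are made up for the example. It uses the median of the log as the baseline, so a few short spikes do not shift it.

```python
from statistics import median


def find_spikes(temps_c, min_jump_c=15.0):
    """Return (start_index, duration_in_samples, peak_c) for each spike.

    A sample counts as part of a spike when it is at least min_jump_c
    above the median of the whole log. With one reading per second,
    the duration in samples is also the duration in seconds.
    """
    baseline = median(temps_c)
    spikes = []
    start = None
    for i, temp_c in enumerate(temps_c):
        hot = temp_c - baseline >= min_jump_c
        if hot and start is None:
            start = i
        elif not hot and start is not None:
            spikes.append((start, i - start, max(temps_c[start:i])))
            start = None
    if start is not None:
        # The log ended while still inside a spike.
        spikes.append((start, len(temps_c) - start, max(temps_c[start:])))
    return spikes


if __name__ == "__main__":
    log = [60, 61, 59, 60, 82, 84, 83, 81, 80, 60, 61, 60, 59, 60, 60]
    print(find_spikes(log))
```

A spike that lasts several consecutive one-second samples, as in the sample log, is harder to explain as a single garbage sensor read than a one-sample outlier would be.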

Thanks for the insights.
The universe is not only stranger than you imagine, it is stranger than you can imagine.

Polian
Joined: 21 Sep 05
Posts: 152
Credit: 10,141,266
RAC: 0
Message 77683 - Posted: 23 Nov 2014, 1:12:25 UTC

Sorry that I couldn't be of more specific help. I've never seen this behavior on any of my computers as long as I've been a Rosetta cruncher. Considering that no one else has reported anything similar to the issue you're seeing, I've got nothing better to offer than that it's an aberration of some sort with your PC.
Mark
Joined: 1 Dec 12
Posts: 10
Credit: 20,184
RAC: 0
Message 77684 - Posted: 23 Nov 2014, 3:42:04 UTC - in response to Message 77683.  

Polian wrote: "Sorry that I couldn't be of more specific help. I've never seen this behavior on any of my computers as long as I've been a Rosetta cruncher. Considering that no one else has reported anything similar to the issue you're seeing, I've got nothing better to offer than that it's an aberration of some sort with your PC."

No worries. Thanks for trying :)
The universe is not only stranger than you imagine, it is stranger than you can imagine.

Link
Joined: 4 May 07
Posts: 356
Credit: 382,349
RAC: 0
Message 77687 - Posted: 24 Nov 2014, 9:30:08 UTC - in response to Message 77682.  

Mark wrote: "I use speedfan to graph temperatures. It updates the temps every second. I typically see 1-2 degrees of jitter in the readings. The only times I see this 20 degree jump, lasting 5-10 seconds, are near the beginning of a R@H unit and during a backup (Acronis True Image). Sometimes I see this twice, near each other, in a R@H unit. It happens 3 times during a backup. I pause Boinc during backups. In all cases the max temp is different; 83C was an average."

Are you crunching on all cores when "crunching at full speed"?
.
sgaboinc
Joined: 2 Apr 14
Posts: 282
Credit: 208,966
RAC: 0
Message 77692 - Posted: 27 Nov 2014, 15:33:05 UTC - in response to Message 77670.  
Last modified: 27 Nov 2014, 16:24:39 UTC

modsense:
Well, in general terms, R@h is a very intense application. I've heard some say it can put more stress on the overall system than many benchmarks and stress tests. This is because it uses a lot of memory, and has intense floating point operations going on. It will make full use of L2 cache too. Many benchmarks and tests do some things and not others, but you don't get all of it happening at the same time.

Having said that, I wouldn't think a 5-10 second temp. spike would be of too much concern. Some other methods of addressing it would be to reduce the % of CPU used by BOINC (although that would reduce your throughput all day long, not just during the temp. spike); or to bring in a throttle mechanism that reduces CPU speed briefly during periods of high temp.; or to have more than one backup project, to reduce the likelihood of having a large number of tasks with the same runtime properties running at the same time.



agree r@h is sort of comparable to (or more intense than) some of those benchmark and stress apps in its heavy 'weight'-ness, and of all things it is a *real* one compared to those synthetic benchmarks lol

as i'm running linux, i used 'cpupower frequency-set' to set the max frequency the cpu runs at when r@h is running. i'm not too sure what an equivalent utility in MS Windows would be (speedfan?). in a way i'm throttling it. i'd guess one could use a similar utility to manage that.

one of the factors which i think may lead to initial higher temperatures / frequencies is intel's 'turboboost' or equivalent features. 'turboboost' is basically the cpu's internal overclocking mechanism, which sets internal limits based on TDP power.

if i leave 'turboboost' on i've seen r@h push the envelope to some 75-85 deg C. I'm not sure if in the original post's case the high temperature may be caused by a similar feature followed by automatic throttling in the cpu. (i think 'turboboost' may be disabled in the bios setup screens)

to avoid this situation, i set the max cpu frequency/speed such that it runs in the norm of 60-65 deg C. it still produces pretty good throughput for the r@h jobs running on all cores
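
as a rough illustration of the frequency cap described above, here is a small Python sketch that only builds the 'cpupower frequency-set' command line and does not run it. the helper name and the 2400 MHz value are hypothetical; the right cap depends on the cpu and has to be found by watching temperatures. actually applying the setting normally needs root.

```python
def cpupower_max_cmd(max_mhz):
    """Build the argument list for capping the maximum CPU frequency.

    Uses the --max option of 'cpupower frequency-set' with the value
    given in MHz. The command is only constructed here, not executed.
    """
    if max_mhz <= 0:
        raise ValueError("max_mhz must be positive")
    return ["cpupower", "frequency-set", "--max", f"{int(max_mhz)}MHz"]


if __name__ == "__main__":
    # 2400 is an example value only, not a recommendation.
    print(" ".join(cpupower_max_cmd(2400)))
```

to apply it, the list could be passed to subprocess.run from a root shell; that step is left out on purpose so the sketch is safe to run anywhere.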

the other issue may be to check the heatsink and fan etc, could it be that the fan is running at low speeds or even *stopped* when r@h launches? that may point to perhaps the cpu pwm fan control app (bios?) or perhaps some of its parameter settings.
Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 77696 - Posted: 28 Nov 2014, 3:59:17 UTC - in response to Message 77692.  

My gaming laptop (ASUS ROG) runs @ 95 C during summer (drops down to around 85 C during winter). It's been running like this for almost 3 years now... Tj. max is 105C, so yeah... you could let your CPU run as fast as it wants. Worst thing that could happen is throttling.
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 77700 - Posted: 28 Nov 2014, 15:23:09 UTC

Always keep in mind that temp. of CPU is one thing, but temp of the disk drive is another. So if having such a hot CPU starts raising the temp on the disk, then that probably shortens life of the disk. But for a few seconds at a time, I don't think the disk will see any change.

FWIW, I'm thinking that what you must be observing is the unzipping of some of the data that is used to process a task. I believe that's one of the first things a task does as it starts, and it would be rather intensive if the task gets to run flat out (i.e. no higher priority tasks preempting it).

But I'd double check for dust bunnies. Esp. between the fan and the heat sink.
Rosetta Moderator: Mod.Sense
Sid Celery
Joined: 11 Feb 08
Posts: 2166
Credit: 41,606,783
RAC: 3,431
Message 77712 - Posted: 2 Dec 2014, 16:03:44 UTC

As I continue my overclocking experiments I've been watching temperatures at start-up pretty closely - on an AMD FX8120 if that makes a difference. Just to say I'm not seeing any spike at the start and restart of tasks.

I upgraded my CPU fan some months ago and while temperatures improved, the improvement wasn't as much as I expected tbh. By the nature of these things I boosted my clock-speed a little more and was back where I started on temperatures, up to the point where I had the PSU incident I mentioned earlier.

After a brief chat with the guy who does my hardware upgrades he casually mentioned that case fans (rather than CPU fans) make a difference too. I thought that'd only be marginal, but they're cheap so I thought I'd cover that angle.

All temps dropped 10C! I dropped their speed to take advantage of their quiet options and was still 8C better off. I think I found out why I wasn't getting the expected benefit from the CPU fan earlier in the year.

Overall this year, I've increased the multiplier from 16.5 (already OC'd from 15.5 stock) to 18.0 at much the same voltages as before and temps around 49C - anything from 12-15C lower than I started.
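
To put those multiplier numbers in perspective, here is a quick back-of-the-envelope calculation in Python. The 200 MHz base clock is an assumption (it is the usual reference clock on this platform, but I have not stated mine above), and the function names are just for the example.

```python
def overclock_percent(stock_multiplier, new_multiplier):
    """Percentage increase in clock speed relative to the stock multiplier."""
    return (new_multiplier / stock_multiplier - 1.0) * 100.0


def core_clock_mhz(multiplier, base_clock_mhz=200.0):
    """Core clock in MHz; base_clock_mhz=200.0 is an assumed reference clock."""
    return multiplier * base_clock_mhz


if __name__ == "__main__":
    for mult in (15.5, 16.5, 18.0):
        pct = overclock_percent(15.5, mult)
        print(f"x{mult}: {core_clock_mhz(mult):.0f} MHz, {pct:.1f}% over stock")
```

Under that assumption, 18.0 works out to roughly 16% above the stock 15.5 multiplier.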

The lesson being, there's a lot that can be done cheaply to balance up cooling rather than obsessing over symptoms.

Using:
Arctic Cooling Freezer A30 AMD CPU Cooler - £27 / $42
Cooler Master: JetFlo 120 - £10 / $16



©2025 University of Washington
https://www.bakerlab.org