Message boards : Number crunching : Strange problem with dual Xeon machine
Author | Message |
---|---|
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
"Mod.Sense - can you have a look at this as the scores don't look right. Any ideas?" I don't have any additional means to look into this either. But let's compare your 2.66 GHz Xeon with your 3 GHz P4. I found WUs with the same name and batch number in the results list for each. Xeon WU: CPU seconds 8,074; models 2; claimed 33.27; granted 10.39. P4 WU: CPU seconds 9,430; models 3; claimed 20.60; granted 15.57. Credit is granted based on the completed models. The Xeon completed only two, and so received two-thirds of the credit of the P4, which completed three. But your Xeon is running 8 CPUs, and the P4 only 2. So if you calculate credit per hour across all CPUs from the above, the Xeon is pulling much more credit per hour (37.06 vs 11.89). And once it has been reporting work consistently for 2 weeks, you will see this reflected in the RAC for the Xeon. Rosetta Moderator: Mod.Sense |
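The per-hour figures in the post above can be reproduced with a short calculation. This is a minimal sketch, assuming every core runs a similar work unit at the same time and that the quoted figures are granted credit per hour for the whole machine; the function name is made up for illustration.

```python
def credit_per_hour(granted, cpu_seconds, cores):
    """Granted credit per wall-clock hour for the whole machine,
    assuming every core runs a similar work unit at the same time."""
    per_core_hour = granted / cpu_seconds * 3600
    return per_core_hour * cores

# Figures taken from the two matching work units quoted in the post.
xeon = credit_per_hour(granted=10.39, cpu_seconds=8074, cores=8)
p4 = credit_per_hour(granted=15.57, cpu_seconds=9430, cores=2)
print(f"Xeon: {xeon:.2f}  P4: {p4:.2f}")  # Xeon: 37.06  P4: 11.89
```

Per core, the two machines come out fairly close (roughly 4.6 vs 5.9 credits per CPU hour); the gap in the totals comes from the core count.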
Dusty Send message Joined: 1 Mar 08 Posts: 41 Credit: 2,667,354 RAC: 0 |
Thanks; regarding the memory, that's a big relief. I keep thinking that something is grossly wrong in my CMOS, but for the most part I have the defaults set. I do have it set so that the CPU doesn't slow down if it gets hot, and I installed a 4-fan controller and am keeping the case fans on max. I still can't get over why the Integer speed is so low. Thanks for the link to CPUID! Running that, it says the CPU is running at 2.66 GHz, but I was surprised that the max bandwidth of the FBDIMM PC2-5300 DDR2 667 RAM is only 333 MHz. I guess they split the bandwidth for each CPU? I have one 4-gig RAM stick in Slot 0 of each of 4 banks. This RAM was recommended by ASUS as being compatible, but I just couldn't afford faster RAM. |
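On the 333 MHz reading: DDR2 memory transfers data on both clock edges, so a module reported at 333 MHz is the same thing as DDR2-667, and the PC2-5300 label is its approximate peak transfer rate in MB/s. A quick sketch of the arithmetic, assuming the standard 64-bit (8-byte) module data width; the function name is illustrative only.

```python
def ddr_peak_mb_per_s(io_clock_mhz, bus_bytes=8):
    """Peak transfer rate of one DDR module in MB/s.
    Two transfers per clock (double data rate), bus_bytes per transfer."""
    transfers_per_s_millions = io_clock_mhz * 2
    return transfers_per_s_millions * bus_bytes

clock = 1000 / 3  # the 333.33 MHz clock reported by the tool
print(round(clock * 2))                 # 667 (effective MT/s)
print(round(ddr_peak_mb_per_s(clock)))  # 5333 (MB/s, marketed as PC2-5300)
```

So the 333 MHz figure does not by itself indicate that the bandwidth is being split between the CPUs.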
Dusty Send message Joined: 1 Mar 08 Posts: 41 Credit: 2,667,354 RAC: 0 |
Thank you very, very much! I'm sure that my tweaking BIOS settings a couple of times a day isn't helping at all, so I'm trying to do it as little as possible. I really appreciate you taking the time to search for identical work units across my systems and compare them. I hadn't thought of that! |
Send message Joined: 3 Nov 05 Posts: 1832 Credit: 119,920,601 RAC: 3,042 |
Those Xeons are Core2/Penryn based and should get at least 90% of the credit per core per hour that my Core 2 Duo gets (my C2D is 3.2 GHz, but has only 2/3 of the cache per core and is the slightly slower Conroe rather than Penryn). As it stands, it is getting around 10 credits per 10,000-second task where it should be getting more like 55. Something's not right... I think either the Sandra benchmarks will flag it up, or maybe there's something wrong with your BOINC/Rosetta installation? |
Dusty Send message Joined: 1 Mar 08 Posts: 41 Credit: 2,667,354 RAC: 0 |
I downloaded Sandra Lite this morning, and the benchmark test is coming up with the same Integer speed as the Rosetta benchmarks: approx. 48563 Dhrystone MIPS. Strangely, one of the comparison CPUs in Sandra is a Clovertown 2.33 GHz model (E5345), which shows 84426 MIPS. Whetstone is similar: 41932 MFLOPS for mine (2.66 GHz E5430 Harpertown) and 58749 MFLOPS for the 2.33 GHz Clovertown. Since the 2.33 GHz Clovertown was actually cheaper than my Harpertown processors, I'm scratching my head at these results.... Could the difference be because he's using XP Pro/64 while I'm using Pro/32? |
Ingleside Send message Joined: 25 Sep 05 Posts: 107 Credit: 1,514,472 RAC: 0 |
"Further, comparing it to another 2.66 Dual CPU Xeon X5355 machine (although not the same exact model) on BOINC showed that computer (owned by ROBiie) showed a FPS of 2531.02 but an astounding 8193.16 Integer speed!" While win-64 will probably have little or no effect in Rosetta@home, didn't you say your system has 16 GB RAM? Meaning, over 75% of the installed memory can't be used as long as it runs 32-bit... The BOINC benchmark seems to be higher on 64-bit, especially integer, but this isn't a good indication of the actual performance increase. I've no idea if Rosetta@home is influenced by cache size and memory bandwidth, so it is possibly memory-bandwidth-limited when it tries to run 8 instances, even with the large cache size... One method to test this would be to see how it performs when running only 1 instance, 2 instances, 3 ... up to 8. But, with the large variations in Rosetta WUs, you should preferably test this by running the exact same WU on all cores... Hmm, it would be possible to test if single/dual-channel memory has any effect, even if it runs 8 different WUs... "I make so many mistakes. But then just think of all the mistakes I don't make, although I might." |
Dusty Send message Joined: 1 Mar 08 Posts: 41 Credit: 2,667,354 RAC: 0 |
Thank you!! I completely forgot about the memory ceiling for a 32-bit OS. That explains why it only shows 4 gig of memory even though I have 16 gig installed. I was looking in my BIOS to configure the memory. They have several different options: Rank Interleaving 1:1, 2:1 and 4:1. The default is 4:1, and that's where I left it. It also allows Branch Sparing, but its default is disabled, so that's where I kept it too. I'll have to see whether I get more credit per WU by running fewer WUs at once, but in the long run I wonder if it would just balance out--less WUs complete, but fewer ones completed faster for more credit per WU....
|
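The share of installed RAM that goes unused under a 32-bit OS can be worked out directly. A rough sketch, assuming the OS sees at most 4 GiB; in practice slightly less is usable because part of the address space is reserved for devices, and the 3.5 GiB figure below is only an assumed example, not a measurement from this machine.

```python
def unused_fraction(installed_gib, usable_gib):
    """Share of installed RAM that the OS cannot address."""
    return 1 - usable_gib / installed_gib

print(f"{unused_fraction(16, 4.0):.0%}")  # 75% with the full 4 GiB visible
print(f"{unused_fraction(16, 3.5):.0%}")  # 78% with an assumed 3.5 GiB visible
```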
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
"I wonder if it would just balance out--less WUs complete, but fewer ones completed faster for more credit per WU...." I think the point was that if a test running, say, 2 WUs shows better credit per hour per core (with 2 cores) than when running 8 WUs at once, it would basically prove that memory is constrained, and therefore it would indicate it might be worth installing an OS that can support all of your memory. Then you could expect to run all 8 at once and yield the same (better) credit per core you saw while running 2. Rosetta Moderator: Mod.Sense |
Dusty Send message Joined: 1 Mar 08 Posts: 41 Credit: 2,667,354 RAC: 0 |
I see. I didn't catch that. Thank you! I'll give it a try right away. |
Ingleside Send message Joined: 25 Sep 05 Posts: 107 Credit: 1,514,472 RAC: 0 |
"I wonder if it would just balance out--less WUs complete, but fewer ones completed faster for more credit per WU...." Yes, if running 1 instance shows, for example, 50 credit/hour per core and 8 instances show 30 credit/hour per core, it's a clear indication your computer is memory-bandwidth-limited in Rosetta@home. In this example Rosetta@home is likely maxed out at 5 or 6 cores, meaning it's probably possible to find another BOINC project that isn't memory-bandwidth-limited and run it for 25% of the time "for free", since Rosetta@home will get the same credit/day regardless of whether it uses all 8 cores or only 6 of them... Or, switching to faster memory would increase Rosetta@home production... If, on the other hand, 1 instance gives, for example, 50 credit/hour per core and 8 instances give 47 credit/hour per core, it indicates Rosetta@home is not memory-bandwidth-limited, and there's likely another reason for your computer's mediocre Rosetta@home production... As for switching to a 64-bit OS, this should be done regardless of whatever test results you're getting, since running an OS that can't use 78% of installed memory doesn't make much sense... I wouldn't expect the OS switch to change anything significantly for Rosetta@home... ... Except... I've no idea if it's true or not for some mainboards, but if you're running 4x 4 GB memory sticks, it's maybe possible your mainboard somehow only uses the first memory stick in win32, so in practice it's in single-channel mode in win32... "I make so many mistakes. But then just think of all the mistakes I don't make, although I might." |
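The comparison described in the post above could be automated along these lines. This is only a sketch: the 10% tolerance is an arbitrary assumption, the function names are hypothetical, and the sample figures are the illustrative ones from the post, not measurements.

```python
def looks_bandwidth_limited(per_core_single, per_core_all, tolerance=0.10):
    """True if credit/hour per core drops by more than `tolerance`
    when going from one instance to all cores busy."""
    drop = 1 - per_core_all / per_core_single
    return drop > tolerance

def saturating_cores(per_core_single, per_core_all, cores):
    """Rough number of cores at which total throughput would top out,
    assuming each core could otherwise sustain the single-instance rate."""
    return per_core_all * cores / per_core_single

print(looks_bandwidth_limited(50, 30))  # True
print(looks_bandwidth_limited(50, 47))  # False
print(saturating_cores(50, 30, 8))      # 4.8
```

With the 50 vs 30 example, total throughput is equivalent to about 5 unconstrained cores, which is in line with the "5 or 6 cores" estimate in the post.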
Dusty Send message Joined: 1 Mar 08 Posts: 41 Credit: 2,667,354 RAC: 0 |
Running the memory bandwidth test in Sandra revealed a memory issue I'm trying to figure out. There were two notes after running the benchmark (my system was at the bottom of the chart as far as performance goes). The first note says, "System bandwidth appears FSB limited. Attempt to increase FSB." The second note says, "Low bandwidth efficiency (advanced). Check memory timings and settings." I changed the memory configuration from Branch Sequencing to Branch Interleaving under the Northbridge Chipset Configuration menu, and I no longer get the "Low bandwidth efficiency" message, but I still get the FSB-limited message. Where do I change FSB? In the BIOS under Advanced CPU Settings, there is no option for changing FSB. There IS an option for Ratio CMOS Setting, and I've set it to the max of 8. It was originally at the default of 6. I also have Virtualization Technology disabled. I don't know what would be limiting my FSB. All BIOS options to slow the CPU down for overheating (CPU TM, Speedstep) are disabled. There is also a Rank Interleaving option under the Northbridge Chipset Configuration. The default is 4:1, and that's what mine is set at. There is also a 1:1 and a 2:1 option, but I haven't tried those yet. Would changing this improve performance, or is it already at the best setting? My CPUs are 1333 FSB capable, and I'm sure the ASUS DSEB-D16/SAS mobo is also. There is a note in the mobo manual that says, "The FBDIMM 800 Mhz has to work with the 1600FSB CPU or above. Otherwise, the memory module downgrades and runs at the speed of 667Mhz." I'm using PC2-5300 DDR667 FBDIMMs anyway, and since the Harpertown CPUs only run at 1333 FSB, the memory shouldn't be downgrading. Sandra shows my memory timings at 5.0-5-5-15. One final note.
After changing the Northbridge chipset from Branch Sequencing to Branch Interleaving, and forcing the BIOS Ratio CMOS Setting from 6 to 8, I re-ran the Processor Arithmetic test in Sandra, and this time my 2.66 GHz Harpertown is beating the 2.33 GHz Clovertown comparison CPU. My numbers are now Dhrystone ALU 89834 MIPS and Whetstone iSSE3 77535 MFLOPS. I'm still getting the FSB-limited message under the Memory Bandwidth test, however. |
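Putting the before and after Sandra figures from the posts side by side shows how large the change was. A small sketch using only the numbers quoted in this thread, and assuming the earlier and later figures come from the same Sandra tests:

```python
# Sandra results quoted before and after the BIOS changes.
before = {"Dhrystone MIPS": 48563, "Whetstone MFLOPS": 41932}
after = {"Dhrystone MIPS": 89834, "Whetstone MFLOPS": 77535}

for name in before:
    speedup = after[name] / before[name]
    print(f"{name}: {speedup:.2f}x")
# Dhrystone MIPS: 1.85x
# Whetstone MFLOPS: 1.85x
```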
Dusty Send message Joined: 1 Mar 08 Posts: 41 Credit: 2,667,354 RAC: 0 |
Wow, so much to learn... Many thanks to everyone for helping answer my myriad questions regarding system performance!!! I'm running two WUs right now and have about 3 hours till they are completed. Then I'll see what credit I received for them. Yes, I was considering removing the other 3 sticks of 4-gig RAM; it's just that I don't have another FBDIMM-capable motherboard lying around.... However, if leaving them in there is somehow slowing me down, then I'll remove them! Please see my rather lengthy post on Sandra testing this morning. I got some significant improvements in CPU performance, although memory bandwidth due to FSB is still a problem. I'm still trying to figure out how to increase FSB in my BIOS. I think it's the Ratio CMOS Setting, and that's now set to the max of 8 (it was at 6). Actually, according to Sandra's Computer Overview section, my FSB is running at 1.33 GHz. Why I'm still getting an FSB-limited message under Memory Bandwidth testing is very confusing.
|
Dusty Send message Joined: 1 Mar 08 Posts: 41 Credit: 2,667,354 RAC: 0 |
I just noticed that my memory Bandwidth Efficiency in Sandra is only 52.48%!!! The program mentioned the risks of sharing bandwidth with the on-board VGA adapter (just like Paul D. Buck mentioned earlier in this thread), so when the new PCI Express 2.0 card arrives Monday it will be interesting to see how it impacts the memory bandwidth efficiency. I was really shocked to see how low it really is on my machine! I removed two banks of 4G FBDIMMs, making sure they were not Slots 0 & 1. My Bandwidth Efficiency dropped to 37.03%!! So, even though XP Pro/32 cannot utilize anything above 4 gig, apparently the system can use the DIMMs to increase bandwidth. Interesting!!!
|
Ingleside Send message Joined: 25 Sep 05 Posts: 107 Credit: 1,514,472 RAC: 0 |
"Wow, so much to learn... Many thanks to everyone for helping answer my myriad questions regarding system performance!!!" Well, if the system has only 1 stick, the memory bandwidth should drop to half (or possibly a quarter) of what it is now, meaning much worse than currently. If it's not dropping, it at least indicates win32 is only using 1 memory stick...
Taking a look at the manual, wow, 16 memory slots, 4 channels, so it should at least in theory have a ton of memory bandwidth... Hmm, with 4 sticks, the optimal is to put one in each "channel". Make sure they're in "DIMM_00", "DIMM_10", "DIMM_20", "DIMM_30"; it's likely easy to get this wrong... Hmm, I would guess the BIOS "System Memory Information" will show where each stick is placed? In the BIOS, on "NorthBridge Chipset Configuration", I would guess the optimal is: "MCH Branch Mode" - Interleaving; "*** sparing" - disabled; "Branch 0/1" - enabled; "Rank Interleaving" - good question... Hmm, not sure if 1:1 is best here or not, with 2 sticks in each "branch"... You'll have to test this... "I make so many mistakes. But then just think of all the mistakes I don't make, although I might." |
Send message Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
You can also get odd results in the case of using "oversized" RAM sticks ... in other words, by using the 4G sticks the system is fumbling around confused by the "excess" ... it has been so long since I have struggled with these issues that I cannot offer more. You COULD also try over at the SAH boards even though you are interested in RAH processing. Some of the overclocking crowd over there may be able to offer more sage advice as to settings ... Though it is limiting me as to projects in the long run, and I am still struggling with some issues with the new system ... I am sure glad I am moving to Mac dominance in this house ... far fewer odd issues ... Even better, my brother is getting half a ton of old PC parts I have been lugging about for like forever ... |
Ingleside Send message Joined: 25 Sep 05 Posts: 107 Credit: 1,514,472 RAC: 0 |
"You can also get odd results in the case of using "oversized" ram sticks ... in other words, but using the 4G sticks the system is fumbling around confused by the "excess" ... it has been so long since I have struggled with these issues that I cannot offer more." According to the manual the board supports up to 128 GB of memory, with 8 GB DIMMs, but it's possible win32 gets confused by so much memory... Long time since last, Paul. :) "I make so many mistakes. But then just think of all the mistakes I don't make, although I might." |
Dusty Send message Joined: 1 Mar 08 Posts: 41 Credit: 2,667,354 RAC: 0 |
I verified they are in DIMM_00, etc. I've been changing BIOS settings one at a time, rebooting, then re-running Sandra. Branch Interleaving gives the best performance regarding bandwidth efficiency, and sparing is disabled. As for Rank Interleaving, I tried all three choices and 4:1 was the best, but even the worst didn't lower my bandwidth efficiency by more than 10%. As of right now, it's showing 52% efficiency and that's the best I've been able to reach. It would be interesting to swap out all four 4-gig DIMMs and replace them with 1-gig DIMMs, but that would be an expensive test... I think I'll wait for the video card to arrive and see how that improves bandwidth. I've been rebooting the machine so often this morning that I question the validity of the last two results posted for that machine. Rosetta says the claimed credit was approx. 15, but the granted credit was 32. I wonder if that's because I boosted the Ratio CMOS Setting from 6 to 8, and now the CPU is performing much better even though memory bandwidth is still suffering. The Rosetta benchmarks were based on the CPU before I made the changes... Now I'm back to running 6 processes. I noticed that whenever I change the number of CPUs available, Rosetta re-runs the benchmark test. We'll see what this next batch shows in a few hours. I've changed just about every BIOS setting I can find, so I think I've run out of options there to improve memory bandwidth efficiency. Interestingly enough, I just finished running Sandra on my two other 2.4 GHz quad machines. They both have 4 gigs of DDR800 RAM installed (so no excess RAM for XP Pro/32 to be confused over), and both have PCI Express V1 video cards (Intel D975XBX2KR mobos). They show memory bandwidth efficiencies of 56% and 58%. So, it seems that my memory bandwidth deficiencies are not limited to the dual Xeon machine alone. Granted credits for those two machines are in the low 50s. My head hurts.... ;o)
|
Dusty Send message Joined: 1 Mar 08 Posts: 41 Credit: 2,667,354 RAC: 0 |
WOOOOOOOTTT! OK, the last series of 8 WUs were in the 10 ksec range and had granted credits in the mid-50s. Much better than the 10-15 I was getting before!!! Thanks to everyone for all the great advice. I learned a whole lot about CPU and memory performance, not to mention all those pesky BIOS settings that can really cripple a system. I'm still looking forward to seeing if my memory bandwidth efficiency improves with the addition of a video card instead of using the MB video, but otherwise I'm much happier with the results! |
©2025 University of Washington
https://www.bakerlab.org