Message boards : Number crunching : System Restarts Win 7 Intel i7
Author | Message |
---|---|
Paul Send message Joined: 29 Oct 05 Posts: 193 Credit: 66,418,517 RAC: 9,548 |
All: I have R@H running on several systems with no issues. I just installed BOINC on Win 7 running on a Core i7 and it is having lots of problems. The system restarts, most of my WUs go to 100% with computation error, nothing is working correctly here. I did not see any posts so I assume everyone else is fine and I need to take a closer look at this system. I am not sure where to look. If you have similar issues, please post. thx Thx! Paul |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,178,626 RAC: 3,201 |
All: I do not have these problems but will help start the troubleshooting process....how many projects are you running on that pc? It is best to try and stick with one until you get this solved if possible. If more than one project do you have the setting "Leave applications in memory while suspended? (suspended applications will consume swap space if 'yes')" set to yes? It is under your account, computing preferences and then in the top section. Do you have the pc hyper-threaded meaning using all 8 cpu's or are you using just 4 cpu's? |
Paul Send message Joined: 29 Oct 05 Posts: 193 Credit: 66,418,517 RAC: 9,548 |
I only crunch R@H. When I look at the failed WUs, they all have some "failure to find file" message in them. I have "Leave applications in memory while suspended" turned on. I have 9GB of RAM and Win 7 64-bit so I don't think I am using much swap space. I am thinking about setting my swap space to 0K. HyperThreading is on so I get 8 WUs. thx for the help. Thx! Paul |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Failure to find a file would tend to point either to a network problem, where the BOINC client was unable to download the file, or to an authority problem, where the file downloaded but the BOINC client is not authorized to access it. Wow Paul, that's a lot of machines! Which host ID is having problems? ...oh here it is, the only Win7 machine: https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=1187725 I see some tasks had the file problem as you indicated: app_version download error: couldn't get input files: <file_xfer_error> <file_name>minirosetta_database_rev33769.zip</file_name> <error_code>-120</error_code> <error_message>signature verification failed</error_message> But others ran for over an hour before generating an exception, which would imply that you have now received a good copy of the database. That would seem like progress. At this point, the machine has consumed its full quota of work for the day and won't be able to download more until a good result is returned or a day passes. Have you installed on Win7 elsewhere? There were some new "features" from M$ that were catching people there, as I recall. Rosetta Moderator: Mod.Sense |
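For anyone sorting through many failed tasks, the error fragment quoted above can be pulled apart programmatically. This is only a minimal sketch: the closing `</file_xfer_error>` tag is an assumption added so the fragment parses (the quoted log stops before it), and `parse_file_xfer_error` is a hypothetical helper name, not part of BOINC.

```python
import xml.etree.ElementTree as ET

# Fragment copied from the failed task output quoted in the thread.
# The closing tag is an assumption added here so the fragment is well-formed.
FRAGMENT = """
<file_xfer_error>
  <file_name>minirosetta_database_rev33769.zip</file_name>
  <error_code>-120</error_code>
  <error_message>signature verification failed</error_message>
</file_xfer_error>
"""


def parse_file_xfer_error(xml_text):
    """Return (file_name, error_code, error_message) from one error fragment."""
    root = ET.fromstring(xml_text.strip())
    return (
        root.findtext("file_name"),
        int(root.findtext("error_code")),
        root.findtext("error_message"),
    )


name, code, message = parse_file_xfer_error(FRAGMENT)
print(f"{name}: error {code} ({message})")
```

Grouping failures by file name and message this way makes it easier to see whether every failure traces back to the same download, as it appears to here.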
Paul Send message Joined: 29 Oct 05 Posts: 193 Credit: 66,418,517 RAC: 9,548 |
Thx for the response. I finally had a successful WU. Maybe I just had a bad batch, maybe it is an AV thing, I don't know. I currently have BOINC limited to 60% of the CPU. It looks like I have 3 more WUs that will complete in the next few min. If all of them succeed, I will increase the max CPU percentage to 80%. Maybe I have a CPU or heat issue. I will keep watching things. I got my Q6600 back online today as well. work work work Thx! Paul |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
No, shouldn't be a heat issue. If you think about it, none of the failures even ran long enough to make a heat issue :) Yes, perhaps AV quarantined files (i.e. moved them from their expected location) Rosetta Moderator: Mod.Sense |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,178,626 RAC: 3,201 |
Paul wrote: "Thx for the response." Limiting the CPU to a percentage like you do has been a problem in some cases. It is better to reduce your cache than to limit your CPU percentage. Open her up but limit your cache to 0.01; that way you should only get 1 unit per CPU at a time. Hopefully you are not on a pay-as-you-go cable plan. On the AV question, you should exclude the BOINC directories from your AV, as sometimes they get overaggressive and cause problems. Periodically you will see threads on all the boards saying AV such-and-such is causing crashes with BOINC, then the next month it is a different AV doing it. On the heat issue, do you have 'cool and quiet' or whatever Intel calls it turned on in the BIOS? If so, turn it off; it too can be overaggressive. |
dcdc Send message Joined: 3 Nov 05 Posts: 1832 Credit: 119,673,616 RAC: 11,118 |
I'd say the first thing to check is prime95. There's a 64-bit version. It was failing for me on my most recent build because of a memory error that memtest86+ wasn't picking up. It ran fine when I clocked the memory down... |
Paul Send message Joined: 29 Oct 05 Posts: 193 Credit: 66,418,517 RAC: 9,548 |
This is really weird. I had no issues all day yesterday, but it looks like at 1:30 AM I had a reboot?? Does Win 7 have a way to record the crash? I looked at the failed work units and it does not look like a CPU issue, but it is odd that the computer completely restarts. I have smartfan enabled because the thing is super loud if I don't. I added two case fans to keep this cool and they don't make much noise. It looks like the CPUs are staying cool. I am going to cap the CPU at 60% for a few days and see if that fixes the problem. If so, I will increase that limit over time to see where the problems begin. Troubleshooting - ugh Thx! Paul |
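On the question of whether Win 7 records the crash: unexpected restarts are normally written to the System event log, which the built-in `wevtutil` tool can query. The sketch below only builds and runs that query; the choice of event IDs (41, 1001, 6008) is an assumption about which entries are relevant to this machine, and `build_query_command` is a hypothetical helper name.

```python
import subprocess
import sys

# Event IDs commonly associated with unexpected restarts (assumed relevant here):
#   41   Kernel-Power: the system rebooted without cleanly shutting down
#   1001 BugCheck: the computer rebooted after a STOP error
#   6008 EventLog: the previous system shutdown was unexpected
RESTART_EVENT_IDS = (41, 1001, 6008)


def build_query_command(event_ids=RESTART_EVENT_IDS, count=10):
    """Build a wevtutil command listing the newest matching System events."""
    conditions = " or ".join(f"EventID={eid}" for eid in event_ids)
    xpath = f"*[System[({conditions})]]"
    return [
        "wevtutil", "qe", "System",
        f"/q:{xpath}",   # XPath filter on the event ID
        f"/c:{count}",   # maximum number of events to return
        "/rd:true",      # newest events first
        "/f:text",       # human-readable output
    ]


# Only attempt the query on Windows; elsewhere the tool does not exist.
if __name__ == "__main__" and sys.platform == "win32":
    result = subprocess.run(build_query_command(), capture_output=True, text=True)
    print(result.stdout or result.stderr)
```

The timestamps on these entries can then be compared with the times the work units failed, for example the 1:30 AM reboot mentioned above.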
Paul Send message Joined: 29 Oct 05 Posts: 193 Credit: 66,418,517 RAC: 9,548 |
I just looked at the failed units and hope someone can point me in the right direction. Any insight is greatly appreciated Thx! Paul |
dcdc Send message Joined: 3 Nov 05 Posts: 1832 Credit: 119,673,616 RAC: 11,118 |
Rosetta might validate the results but it doesn't mean the computer isn't making mistakes. My Phenom II submitted a couple of Rosetta tasks successfully but would crash or restart occasionally, and wouldn't pass Prime95. Prime95 showed that the problem only went away when I dropped the DDR3 from 667MHz (auto) to 400 or 533MHz... If it doesn't pass Prime95 then something is wrong and it is likely to be submitting incorrect results. P.S. all versions (inc 64-bit) are here: http://www.mersenne.org/freesoft/ |
Paul Send message Joined: 29 Oct 05 Posts: 193 Credit: 66,418,517 RAC: 9,548 |
no overclocking on this system at all I will try prime95 and let you know. Thx! Paul |
Paul Send message Joined: 29 Oct 05 Posts: 193 Credit: 66,418,517 RAC: 9,548 |
No memtest on this computer yet. It is interesting that it can go for hours between reboots. 9GB of RAM provides lots of workspace. BOINC now shares this computer with Folding@Home because the ATI HD4850 needs something to do. No reboot last night but I did have 1 failed WU on R@H with a computation error. Is there a good memtest tool for Win 7 that can check 9GB - 12GB of RAM? It would be good to get to the root of the problem. thx Thx! Paul |
Paul Send message Joined: 29 Oct 05 Posts: 193 Credit: 66,418,517 RAC: 9,548 |
The last failed WUs had the same message: Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x009946BF read attempt to address 0x98D67424 Engaging BOINC Windows Runtime Debugger... Anyone have ideas as to what I can do to make this go away? Thx! Paul |
dcdc Send message Joined: 3 Nov 05 Posts: 1832 Credit: 119,673,616 RAC: 11,118 |
Paul wrote: "No memtest on this computer yet. It is interesting that it can go for hours between reboots. 9GB of RAM provides lots of workspace." There is - memtest86+ (note the +) will do 12GB: http://www.memtest.org/download/4.00/memtest86+-4.00.iso.zip but it didn't find my memory errors. Memtest reported that the memory was running at 475MHz (or something around there), while in Windows it was running at 667MHz, and it was the speed causing the problems! Prime95 is the most reliable method to test for stability. If that fails then you can change settings or remove memory until it passes, to isolate the fault. |
Paul Send message Joined: 29 Oct 05 Posts: 193 Credit: 66,418,517 RAC: 9,548 |
Thanks for all of the suggestions. I encountered a couple of STOP errors today and all of them had to do with page file corruption issues. It looks like this is usually caused by an ill-behaved driver. All of the drivers are now current so I will let things run for a few days and see what happens. Windows 7 does not find the most recent drivers, just drivers that worked once in the past. Thanks again for all the help and keep crunching! Thx! Paul |
Paul Send message Joined: 29 Oct 05 Posts: 193 Credit: 66,418,517 RAC: 9,548 |
All of my failed work units indicate Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x005763AF write attempt to address 0x00000027 It looks like Minirosetta 1.98 had a similar issue with Win 7. I will move my comments to the Minirosetta 2.00 bug thread. Thx! Paul |
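The two exception messages posted in this thread share a fixed layout, so the addresses can be extracted and compared across failed work units. This is a rough sketch under stated assumptions: the regular expression matches only the wording quoted here, `parse_access_violation` is a hypothetical helper, and the 64 KiB "near null" threshold is a rule of thumb rather than anything BOINC or Rosetta defines.

```python
import re

# Matches the message layout quoted in this thread (an assumption about format).
PATTERN = re.compile(
    r"Reason: (?P<reason>.+?) \((?P<code>0x[0-9a-fA-F]+)\) "
    r"at address (?P<ip>0x[0-9a-fA-F]+) "
    r"(?P<access>read|write) attempt to address (?P<target>0x[0-9a-fA-F]+)"
)

# Hypothetical threshold: treat targets in the first 64 KiB as "near null".
NEAR_NULL_LIMIT = 0x10000


def parse_access_violation(text):
    """Extract the fields of an access-violation line, or None if no match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    target = int(match.group("target"), 16)
    return {
        "reason": match.group("reason"),
        "code": int(match.group("code"), 16),
        "instruction": int(match.group("ip"), 16),
        "access": match.group("access"),
        "target": target,
        "near_null": target < NEAR_NULL_LIMIT,
    }


first = parse_access_violation(
    "Reason: Access Violation (0xc0000005) at address 0x009946BF "
    "read attempt to address 0x98D67424"
)
second = parse_access_violation(
    "Reason: Access Violation (0xc0000005) at address 0x005763AF "
    "write attempt to address 0x00000027"
)
print(first["access"], hex(first["target"]), first["near_null"])
print(second["access"], hex(second["target"]), second["near_null"])
```

A target address very close to zero, as in the second message, often suggests a null-pointer dereference in the application, but faulty memory or drivers can produce the same symptom, so this is a hint rather than a diagnosis.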
Paul Send message Joined: 29 Oct 05 Posts: 193 Credit: 66,418,517 RAC: 9,548 |
Updated my video and storage drivers and no reboots for 24 hours. Now the Core i7 is starting to show some progress. It would be great to get this system up to 2,500 - 3,000 credits a day. It might not make it with Folding@Home running but I can't ignore the ATI HD 4850 and it needs something to do. What kind of credit should I expect from this system? thx Thx! Paul |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,178,626 RAC: 3,201 |
Paul wrote: "Updated my video and storage drivers and no reboots for 24 hours." Well, since you are running BOINC 6.10.18 you can attach to Collatz and get that GPU crunching on their units. Here is the website: http://boinc.thesonntags.com/collatz/ If you go into the website settings for them you can say you only want GPU units, and then you can still crunch here with the CPU. Collatz is a math problem project and works through BOINC, so no extra software is needed, unlike Folding. As for credits, you should be able to get in the 30,000+ RAC range with your GPU alone over there. You can use all 8 of your CPUs here and your GPU there for a total of 9 units crunching all at once. |
DJStarfox Send message Joined: 19 Jul 07 Posts: 145 Credit: 1,250,162 RAC: 0 |
Paul wrote: "I only crunch R@H. When I look at the failed WUs, they all have some failure to find file message in them." 9 GB is a lot of RAM... must be 3 GB per stick. What is your memory's speed? If you're overclocking your RAM, or the RAM has errors, it can cause a computer to spontaneously reboot (ECC exceptions). I recommend: download a copy of memtest86++ and burn the ISO to a CD. Let us know if it finds any errors (probably 2 hours to run). http://www.memtest86.com/download.html |
©2024 University of Washington
https://www.bakerlab.org