Message boards : Number crunching : Having problems with client and compute errors.
Author | Message |
---|---|
NUTNDUN Joined: 31 Dec 08 Posts: 11 Credit: 1,047,811 RAC: 0 |
I don't know what I need to do other than back it off to stock speed, but it runs Prime95 for hours with no problems. The machine I am having problems with is a Q9550 quad core running at 3.4 GHz right now. I had it running at 3.7 GHz on Sunday through the night, and Monday is when I started getting the compute errors. When I got home last night I added the second hard drive I was waiting on, set up a RAID 0 array and reinstalled the OS, which is Windows 7. Once the OS was installed I got BOINC up and running and had the CPU running at 3.4 GHz all night. I didn't have any problems until today, when for some reason it sent in a couple more compute error work units. Besides setting it to run at stock speeds, is there anything else that may be causing the problem? Could the RAID 0 have an effect on BOINC or Rosetta? I don't understand how the system can bench fine on Prime95 but error all the time on Rosetta. Could my problem be the memory instead? I have its voltage set to auto, which is 1.8 V, but if I am not mistaken it is spec'd for 2.1 V. It is 2 x 2 GB OCZ Platinum. Thanks in advance for any help. |
Dagorath Joined: 20 Apr 06 Posts: 32 Credit: 29,176 RAC: 0 |
When you change two things at the same time (for example, the OC and the RAID), you make it difficult to find the cause of the problem by process of elimination. Leave the RAID as it is and don't change ANYTHING else but the OC. Put the clocks and voltages back to stock. Ignore compute errors on any tasks that started crunching while it was OC'd, because they may have been tainted by the OC before they errored. Let it run on stock settings for several days, not just several hours. If you still get compute errors on BOINC tasks, then leave it at stock speeds and tweak other things one at a time until it runs right. It could be the RAID, BOINC settings, a dirty power supply or 100 other things, but don't make the mistake of thinking that if it still gives errors at stock speeds then the problem can't be the OC. When you eventually get it running right at stock speed for several weeks, try a conservative OC for at least a week. If that works, try a little more OC. Prime95 is only one test, and lots of people think it's not as rigorous a test as many BOINC projects are. Passing Prime95 doesn't mean it will pass other tests. |
NUTNDUN Joined: 31 Dec 08 Posts: 11 Credit: 1,047,811 RAC: 0 |
I agree with your point. I am going to set it back to stock speeds and see how things go. I copied part of one of the errored work units, I will paste it here. I don't know if maybe someone can tell from it if it was the cpu or the memory or maybe something else. Task ID 223951384 Name 1nkuA_BOINC_MPZN_with_zinc_abrelax_cs_frags_6231_93299_0 Workunit 204102693 Created 26 Jan 2009 23:11:36 UTC Sent 26 Jan 2009 23:14:04 UTC Received 27 Jan 2009 16:29:34 UTC Server state Over Outcome Client error Client state Compute error Exit status -1073741819 (0xc0000005) Computer ID 989975 Report deadline 5 Feb 2009 23:14:04 UTC CPU time 6529.999 stderr out <core_client_version>6.4.5</core_client_version> <![CDATA[ <message> - exit code -1073741819 (0xc0000005) </message> <stderr_txt> BOINC:: Initializing ... ok. [2009- 1-27 8:45: 5:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing core... Initializing options.... ok Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip <unzip> <-oq> <../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip> <-d./> Firstarg=true; pp=-d./ firstarg: <-d./> End of unzipping. Setting database description ... Setting up checkpointing ... Setting up folding (abrelax) ... Beginning folding (abrelax) ... BOINC:: Worker startup. Starting watchdog... Watchdog active. Starting work on structure: _00001 Starting work on structure: _00002 Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x7730B19B read attempt to address 0x8C5799DC Engaging BOINC Windows Runtime Debugger... 
******************** BOINC Windows Runtime Debugger Version 6.5.0 Dump Timestamp : 01/27/09 10:34:03 Install Directory : Data Directory : C:BOINC_DATA Project Symstore : https://boinc.bakerlab.org/rosetta/symstore LoadLibraryA( C:BOINC_DATAdbghelp.dll ): GetLastError = 126 Loaded Library : dbghelp.dll LoadLibraryA( C:BOINC_DATAsymsrv.dll ): GetLastError = 126 LoadLibraryA( symsrv.dll ): GetLastError = 126 LoadLibraryA( C:BOINC_DATAsrcsrv.dll ): GetLastError = 126 LoadLibraryA( srcsrv.dll ): GetLastError = 126 LoadLibraryA( C:BOINC_DATAversion.dll ): GetLastError = 126 Loaded Library : version.dll Debugger Engine : 4.0.5.0 Symbol Search Path: C:BOINC_DATAslots2;C:BOINC_DATAprojectsboinc.bakerlab.org_rosetta;srv*C:BOINC_DATAprojectsboinc.bakerlab.org_rosettasymbols*http://msdl.microsoft.com/download/symbols;srv*C:BOINC_DATAprojectsboinc.bakerlab.org_rosettasymbols*https://boinc.bakerlab.org/rosetta/symstore;srv*C:BOINC_DATAprojectsboinc.bakerlab.org_rosettasymbols*http://boinc.berkeley.edu/symstore ModLoad: 00400000 00724000 C:BOINC_DATAprojectsboinc.bakerlab.org_rosettaminirosetta_1.54_windows_x86_64.exe (-nosymbols- Symbols Loaded) Linked PDB Filename : D:boinc_buildminirosetta_windowsminiVisual StudioBoincReleaseminirosetta_1.54_windows_intelx86.pdb |
Dagorath Joined: 20 Apr 06 Posts: 32 Credit: 29,176 RAC: 0 |
One clue is the "Reason: Access Violation (0xc0000005)" line in your error report. An Access Violation means the application tried to read an address that was not part of its assigned address space. Programs are not allowed to read addresses outside their assigned memory space. Access Violations are often caused by pointers gone awry. Faulty memory, an OC'd or faulty CPU, or programming errors can all cause a pointer to go awry and point to an address outside the program's assigned address space. The error report itself doesn't give enough info to favor any one of those possible causes over the others. Experience with OC, however, tells us that if you're OCing and getting Access Violations, the first thing you should do is drop back to stock settings. |
NUTNDUN Joined: 31 Dec 08 Posts: 11 Credit: 1,047,811 RAC: 0 |
I am going to drop it back as soon as I get home from work. I called my wife and had her shut the PC down for now. I have a feeling it is the memory; it didn't like even +1 on the FSB over stock 1066, and I still had it on auto voltage, which puts it at 1.8 volts, and I am pretty sure it needs to be at 2.1. I will drop it back, get it running steady again for a couple of weeks and slowly start upping it. I know it will do 4 GHz easily, but it is a matter of me finding the right settings. I just need to realize it is going to take some time. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"I am going to drop it back as soon as I get home from work. I called my wife and had her shut the PC down for now. I have a feeling it is the memory; it didn't like even +1 on the FSB over stock 1066, and I still had it on auto voltage, which puts it at 1.8 volts, and I am pretty sure it needs to be at 2.1." What your machine can do and what Rosie likes are two different things. I noticed with Einstein I could push my machine to the max, but Rosie didn't like that speed at all. I was OC'ing my machine pretty hard and ran into access violations within minutes of completion or halfway through the task. I dropped my speed back about 25 MHz and then slowly crept it back up to just below the crash point, and everything is fine now. So you can use higher than stock speed, but you need to find how high you can go while staying below the speed you are at now. |
NUTNDUN Joined: 31 Dec 08 Posts: 11 Credit: 1,047,811 RAC: 0 |
I think I was trying to go too far too quickly. I just built my two new folders last week and got the last of the parts yesterday to finish mine, even though it was already running. I have a lot to learn with the newer systems and so many options as far as voltages and OC settings go. Hopefully I can get her under control and running right. Thank you all for your help. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
"I think I was trying to go too far too quickly. I just built my two new folders last week and got the last of the parts yesterday to finish mine, even though it was already running. I have a lot to learn with the newer systems and so many options as far as voltages and OC settings go. Hopefully I can get her under control and running right. Thank you all for your help." If you have any Asus boards, you should use the AI Suite and just use the clock speed adjustment, and don't mess with voltages. |
NUTNDUN Joined: 31 Dec 08 Posts: 11 Credit: 1,047,811 RAC: 0 |
"If you have any Asus boards, you should use the AI Suite and just use the clock speed adjustment, and don't mess with voltages." I have Gigabyte UD3P boards. I could use EasyTune to do it, but I don't like having a bunch of extra programs running. I will just take my time and learn to be patient, and only change one thing at a time so I can keep track of what messes things up. |
NUTNDUN Joined: 31 Dec 08 Posts: 11 Credit: 1,047,811 RAC: 0 |
I just wanted to update my thread. I got everything taken care of. I didn't have my voltages set correctly; now everything is good to go and it is currently running at 4.03 GHz. Thanks for the help. |
Gil Joined: 10 Oct 06 Posts: 16 Credit: 30,279,035 RAC: 0 |
Hi guys, I am getting computation errors regularly. They all display the following messages: "Computation for 'TASK ID' finished" and "Output file 'TASK ID' absent". Does anyone know what could be wrong? (I have an OC'd QX9650 @ 400 FSB * 9.) I hope that's not the issue, though. Thanks in advance for a helpful answer! Gil |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Gil, the 1.54 version has really stabilized the running of Rosetta in most environments. See my advice on OC here. Prove to yourself that OC is not the problem. If you have further problems, I'd suggest you post in the Number Crunching thread for the Rosetta version that was running when the problem occurred. Rosetta Moderator: Mod.Sense |
Paul D. Buck Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
Gil, When I looked at the tasks in your computer I saw about 6-8 with random failures at various addresses. This is a fairly low number as these things might go ... except that the indication SEEMS to me to be probably related to the OC ... perhaps the memory is not quite making it ... So, as Mod.Sense says, that is the first step. Turn off the OC back to stock and run a hundred off ... if they are still erroring out ... we go to step two ... |
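
For anyone puzzling over the exit status in the error report earlier in the thread: it is printed two ways, as -1073741819 and as 0xc0000005. These are the same 32-bit value, read once as a signed integer and once as unsigned hex. Here is a minimal Python sketch of that conversion. The helper names and the one-entry lookup table are made up for illustration and are not part of BOINC or Rosetta.

```python
# Sketch: relate the signed exit status in the error report to the hex code
# printed beside it. The value -1073741819 is copied from the report above.

def to_unsigned32(status: int) -> int:
    """Reinterpret a signed 32-bit exit status as an unsigned 32-bit value."""
    return status & 0xFFFFFFFF

# Hypothetical lookup table for this sketch only (not a BOINC API).
KNOWN_CODES = {
    0xC0000005: "Access Violation",
}

def describe(status: int) -> str:
    """Format an exit status as zero-padded hex plus a name, if we know one."""
    code = to_unsigned32(status)
    name = KNOWN_CODES.get(code, "unknown")
    return f"0x{code:08x} ({name})"

print(describe(-1073741819))  # prints: 0xc0000005 (Access Violation)
```

This only identifies the kind of failure. It says nothing about whether the OC, the memory or the program itself was the cause, which is why the advice above is still to drop back to stock first.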
©2024 University of Washington
https://www.bakerlab.org