Message boards : Number crunching : compute errors
Author | Message |
---|---|
Bubba Joined: 2 Aug 07 Posts: 2 Credit: 616,815 RAC: 0 |
Reinstalled BOINC and tried several setups. I am now only crunching 4 failed units a day! Running Win7 64-bit and BOINC 64-bit.
Task 455390035
Name: jsr_decoys_cst_2i6c_abrelax_34261_380_0
Workunit: 415606757
Created: 12 Oct 2011 7:41:19 UTC
Sent: 12 Oct 2011 7:52:19 UTC
Received: 12 Oct 2011 22:03:49 UTC
Server state: Over
Outcome: Client error
Client state: New
Exit status: 0 (0x0)
Computer ID: 1482956
Report deadline: 22 Oct 2011 7:52:19 UTC
CPU time: 10317.08
stderr out:
<core_client_version>6.12.34</core_client_version>
<![CDATA[
<stderr_txt>
[2011-10-12 2:42:15:] :: BOINC:: Initializing ... ok.
[2011-10-12 2:42:15:] :: BOINC :: boinc_init()
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev42272.zip
Unpacking WU data ...
Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/2i6cA_cst.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
Setting up folding (abrelax) ...
Beginning folding (abrelax) ...
BOINC:: Worker startup.
Starting watchdog...
Watchdog active.
Starting work on structure: _00001
Starting work on structure: _00002
Starting work on structure: _00003
======================================================
DONE :: 1 starting structures 10316.1 cpu seconds
This process generated 3 decoys from 3 attempts
======================================================
BOINC :: WS_max 2.91103e+008
BOINC :: Watchdog shutting down...
BOINC :: BOINC support services shutting down cleanly ...
called boinc_finish
</stderr_txt>
]]>
Validate state: Invalid
Claimed credit: 85.0680325680518
Granted credit: 85.0680325680518
Application version: --- |
dcdc Joined: 3 Nov 05 Posts: 1832 Credit: 119,905,033 RAC: 1,827 |
Bad RAM, excessive overclocking, excessive temperatures, a bad BOINC/Rosetta file, or a faulty PSU (often very difficult to catch) would be where I'd start looking, in that order... A Prime95 stress test is a good starting point. |
Bubba Joined: 2 Aug 07 Posts: 2 Credit: 616,815 RAC: 0 |
That does not seem to be the issue, since GPUGrid and WCG run without problems. IMHO, it is Rosetta! |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Rosetta utilizes memory more intensively than many other BOINC applications, and comparisons to an application running primarily on another processor in the machine (i.e. the GPU) are really not meaningful. Look at it the other way around: if the problem is with Rosetta, then why are there not more people having such a problem? Rosetta Moderator: Mod.Sense |
Symeon Joined: 14 Sep 11 Posts: 1 Credit: 3,166 RAC: 0 |
I'm also having this error. My CPU and RAM are overclocked but pass all stress tests; I don't see why my CPU/RAM would make faulty calculations only with Rosetta and not in the stress tests and Memtest86+... |
P . P . L . Joined: 20 Aug 06 Posts: 581 Credit: 4,865,274 RAC: 0 |
Symeon wrote: "I'm also having this error. My CPU and RAM are overclocked but pass all stress tests; I don't see why my CPU/RAM would make faulty calculations only with Rosetta and not in the stress tests and Memtest86+..." Well, put your system back to stock and see if you still get errors. If not, then you've got your answer! Too easy. My 2c worth. |
dcdc Joined: 3 Nov 05 Posts: 1832 Credit: 119,905,033 RAC: 1,827 |
Symeon wrote: "I'm also having this error. My CPU and RAM are overclocked but pass all stress tests; I don't see why my CPU/RAM would make faulty calculations only with Rosetta and not in the stress tests and Memtest86+..." Your errors appear to be related to the default.out problem that's mentioned in these threads: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=5833 https://boinc.bakerlab.org/rosetta/forum_thread.php?id=5835 So there's nothing to worry about if those are the only errors. Danny |
robertmiles Joined: 16 Jun 08 Posts: 1235 Credit: 14,370,910 RAC: 1,389 |
Symeon wrote: "I'm also having this error. My CPU and RAM are overclocked but pass all stress tests; I don't see why my CPU/RAM would make faulty calculations only with Rosetta and not in the stress tests and Memtest86+..." From what I've seen, the error messages referring to default.out only mean that, due to an earlier error, no output file named default.out was produced. Therefore, you need to check the earlier error messages as well. |
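Following that advice, the useful step is to skip past the default.out complaint and look at what went wrong before it. Below is a minimal Python sketch of that idea, assuming you have pasted a task's stderr text into a string. The keyword list and the sample stderr are invented for illustration; the thread does not say what the earlier error lines actually look like, and this is not a Rosetta or BOINC tool.

```python
from typing import List, Sequence

# Hypothetical keyword list: a guess at what "earlier error" lines might contain.
DEFAULT_KEYWORDS = ("error", "exception", "failed")


def errors_before_marker(stderr_text: str,
                         marker: str = "default.out",
                         keywords: Sequence[str] = DEFAULT_KEYWORDS) -> List[str]:
    """Return error-looking lines that appear before the first mention of `marker`.

    If `marker` never appears, every error-looking line is returned.
    """
    earlier: List[str] = []
    for line in stderr_text.splitlines():
        if marker in line:
            # Stop at the default.out complaint: it is only a symptom.
            break
        lowered = line.lower()
        if any(word in lowered for word in keywords):
            earlier.append(line.strip())
    return earlier


# Invented sample stderr, not real Rosetta output.
sample = (
    "Initializing core... ok\n"
    "ERROR: unknown mover name\n"
    "Cannot open default.out\n"
)
print(errors_before_marker(sample))  # ['ERROR: unknown mover name']
```

Matching on keywords is deliberately crude; in practice you would read the whole stderr, but the point is the same: the first real failure is above the default.out line, not at it.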
TPCBF Joined: 29 Nov 10 Posts: 111 Credit: 5,203,269 RAC: 3,147 |
I'm now getting the same kind of compute errors too, claiming problems with the .out file, which I think is a red herring. These, along with a number of validate errors, started to show up after the update to 3.17... Ralf |
Rocco Moretti Joined: 18 May 10 Posts: 66 Credit: 585,745 RAC: 0 |
TPCBF wrote: "I'm now getting the same kind of compute errors too, claiming problems with the .out file, which I think is a red herring. These, along with a number of validate errors, started to show up after the update to 3.17..." cmiles talked about the reason in another thread. Basically, one of the changes in 3.17 is that one of the protein movers used in protein-protein interface design changed its name. (This was done to avoid a name collision with another protein mover that was added to Rosetta.) This meant that runs which worked perfectly well in previous versions of Rosetta@Home now crash. In theory, these sorts of issues should be discovered when we test new versions of the client on RALPH@home, but this one slipped through and wasn't discovered until a large number of jobs using the renamed mover had been launched. The fallout is that the validators and assimilators on the servers are swamped by the large number of jobs that were sent out and came back with errors almost immediately. We've killed the bad jobs with as much firepower as we can reasonably bring to bear, but unfortunately it'll take a little while for the servers to work through the backlog of bad jobs that have already been sent out. Thanks for your patience. |
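To see why a rename alone is enough to crash jobs that used to run fine, here is a toy Python sketch of a name-to-mover registry. Everything in it is hypothetical: the mover names, the registry dictionaries and `run_job` are made up for illustration, the thread does not name the renamed mover, and this is not how Rosetta's own code is organized.

```python
from typing import Callable, Dict, List

Mover = Callable[[str], str]  # toy mover: takes a "pose" string, returns a new one


def interface_design(pose: str) -> str:
    return pose + " +interface_design"


def new_unrelated_mover(pose: str) -> str:
    return pose + " +new_mover"


# Hypothetical names; the real renamed mover is not given in the thread.
OLD_NAME = "InterfaceDesignMover"
RENAMED = "InterfaceDesignMoverV2"

# Registry as shipped before the update: name -> mover.
OLD_REGISTRY: Dict[str, Mover] = {OLD_NAME: interface_design}

# Registry after the update: the same mover under a new name, plus a new mover.
NEW_REGISTRY: Dict[str, Mover] = {
    RENAMED: interface_design,
    "SomeNewMover": new_unrelated_mover,
}


def run_job(registry: Dict[str, Mover], mover_names: List[str], pose: str) -> str:
    """Apply the named movers in order; any unknown name aborts the whole job."""
    for name in mover_names:
        mover = registry.get(name)
        if mover is None:
            raise RuntimeError(f"unknown mover {name!r}; job aborted")
        pose = mover(pose)
    return pose


job = [OLD_NAME]  # a job description written against the old name
print(run_job(OLD_REGISTRY, job, "pose"))  # pose +interface_design
try:
    run_job(NEW_REGISTRY, job, "pose")
except RuntimeError as exc:
    print(exc)  # unknown mover 'InterfaceDesignMover'; job aborted
```

The job itself is unchanged between the two runs; only the name it refers to has disappeared from the registry. A common mitigation in registry designs like this is to keep the old name registered as a deprecated alias for a while so existing job descriptions keep working.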
©2025 University of Washington
https://www.bakerlab.org