Message boards : Number crunching : Problems with Minirosetta v1.54
Author | Message |
---|---|
trick@planet3dnow Send message Joined: 21 Feb 09 Posts: 8 Credit: 53,370 RAC: 0 |
Hi! As already posted here: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=4771 on my PC (this one here): https://boinc.bakerlab.org/rosetta/show_host_detail.php?hostid=1012657 I get lots of validate errors and several client errors (too many to link each of them here). The usual symptom is greatly increased processing time: the work units should run 3 hours, but they run for 7 hours. When I notice that a work unit is taking much too long, should I abort it, or let it run until it fails to validate after 7 hours? |
Alberthuang Send message Joined: 5 Dec 05 Posts: 6 Credit: 171,257 RAC: 0 |
My computer runs Windows XP SP3 with BOINC manager version 5.10.45. It computed two workunits (1hz6A_BOINC_ABINITIO_IGNORE_THE_REST-MOO18-S25-9-S3-9--1hz6A-_7873_76 and lr5_E_01_hbond_bb_sc_rlbd_2hsb_SAVE_ALL_OUT_8261_652) with minirosetta version 1.54, and both ended with a compute error, so both were invalid. The first one (workunit 1hz6A_BOINC_ABINITIO_IGNORE_THE_REST-MOO18-S25-9-S3-9--1hz6A-_7873_76) used more than 4.5 hours of CPU time on my computer. When it crashed, a Windows message reported a Windows C++ Runtime error. I was using Mozilla Firefox 3.0 at the time, and Firefox also closed unexpectedly at almost the same moment. The task details follow: Task ID 234173364 Name 1hz6A_BOINC_ABINITIO_IGNORE_THE_REST-MOO18-S25-9-S3-9--1hz6A-_7873_76_0 Workunit 213483545 Created 9 Mar 2009 7:21:46 UTC Sent 9 Mar 2009 7:23:00 UTC Received 17 Mar 2009 8:07:24 UTC Server state Over Outcome Client error Client state Compute error Exit status -1073741819 (0xc0000005) Computer ID 224205 Report deadline 19 Mar 2009 7:23:00 UTC CPU time 17563.45 stderr out <core_client_version>5.10.45</core_client_version> <![CDATA[ <message> - exit code -1073741819 (0xc0000005) </message> <stderr_txt> BOINC:: Initializing ... ok. [2009- 3-16 14:16:21:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing core... Initializing options.... ok Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... 
Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip <unzip> <-oq> <../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip> <-d./> Firstarg=true; pp=-d./ firstarg: <-d./> End of unzipping. Setting database description ... Setting up checkpointing ... Setting up folding (abrelax) ... Beginning folding (abrelax) ... BOINC:: Worker startup. Starting watchdog... Watchdog active. Starting work on structure: _MOO18U9X9X_00001 # cpu_run_time_pref: 21600 Starting work on structure: _MOO18U9X9X_00002 Starting work on structure: _MOO18U9X9X_00003 Starting work on structure: _MOO18U9X9X_00004 BOINC:: Initializing ... ok. [2009- 3-17 11:23:26:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing core... Initializing options.... ok Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip <unzip> <-oq> <../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip> <-d./> Firstarg=true; pp=-d./ firstarg: <-d./> End of unzipping. Setting database description ... Setting up checkpointing ... Setting up folding (abrelax) ... Beginning folding (abrelax) ... BOINC:: Worker startup. Starting watchdog... Watchdog active. # cpu_run_time_pref: 21600 Starting work on structure: _MOO18U9X9X_00004 Continuing computation from checkpoint: chk_S_MOO18U9X9X_00000004_ClassicAbinitio__stage_1 ... success! Continuing computation from checkpoint: chk_S_MOO18U9X9X_00000004_ClassicAbinitio__stage_2 ... success! 
Starting work on structure: _MOO18U9X9X_00005 Starting work on structure: _MOO18U9X9X_00006 Starting work on structure: _MOO18U9X9X_00007 Starting work on structure: _MOO18U9X9X_00008 Starting work on structure: _MOO18U9X9X_00009 Starting work on structure: _MOO18U9X9X_00010 Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x0055B8C1 write attempt to address 0x00000024 Engaging BOINC Windows Runtime Debugger... ******************** BOINC Windows Runtime Debugger Version 6.5.0 Dump Timestamp : 03/17/09 16:01:02 Install Directory : C:Program FilesBOINC Data Directory : C:Program FilesBOINC Project Symstore : Loaded Library : C:Program FilesBOINC\dbghelp.dll Loaded Library : C:Program FilesBOINC\symsrv.dll Loaded Library : C:Program FilesBOINC\srcsrv.dll LoadLibraryA( C:Program FilesBOINC\version.dll ): GetLastError = 126 Loaded Library : version.dll Debugger Engine : 4.0.5.0 Symbol Search Path: C:Program FilesBOINCslots1;C:Program FilesBOINCprojectsboinc.bakerlab.org_rosetta;srv*C:Program FilesBOINCprojectsboinc.bakerlab.org_rosettasymbols*http://msdl.microsoft.com/download/symbols;srv*C:Program FilesBOINCprojectsboinc.bakerlab.org_rosettasymbols*http://boinc.berkeley.edu/symstore ModLoad: 00400000 00724000 C:Program FilesBOINCprojectsboinc.bakerlab.org_rosettaminirosetta_1.54_windows_intelx86.exe (-nosymbols- Symbols Loaded) Linked PDB Filename : D:boinc_buildminirosetta_windowsminiVisual StudioBoincReleaseminirosetta_1.54_windows_intelx86.pdb ModLoad: 7c920000 00094000 C:WINDOWSsystem32ntdll.dll (5.1.2600.5512) (PDB Symbols Loaded) Linked PDB Filename : ntdll.pdb File Version : 5.1.2600.5512 (xpsp.080413-2111) Company Name : Microsoft Corporation Product Name : Microsoft(R) Windows(R) Operating System Product Version : 5.1.2600.5512 ModLoad: 7c800000 0011f000 C:WINDOWSsystem32kernel32.dll (5.1.2600.5512) (PDB Symbols Loaded) Linked PDB Filename : kernel32.pdb File Version : 5.1.2600.5512 
(xpsp.080413-2111) Company Name : Microsoft Corporation Product Name : Microsoft(R) Windows(R) Operating System Product Version : 5.1.2600.5512 ModLoad: 77d10000 0008f000 C:WINDOWSsystem32USER32.dll (5.1.2600.5512) (PDB Symbols Loaded) Linked PDB Filename : user32.pdb File Version : 5.1.2600.5512 (xpsp.080413-2105) Company Name : Microsoft Corporation Product Name : Microsoft(R) Windows(R) Operating System Product Version : 5.1.2600.5512 ModLoad: 77ef0000 00049000 C:WINDOWSsystem32GDI32.dll (5.1.2600.5698) (PDB Symbols Loaded) Linked PDB Filename : gdi32.pdb File Version : 5.1.2600.5698 (xpsp_sp3_gdr.081022-1932) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 5.1.2600.5698 ModLoad: 77da0000 000a7000 C:WINDOWSsystem32ADVAPI32.dll (5.1.2600.5512) (PDB Symbols Loaded) Linked PDB Filename : advapi32.pdb File Version : 5.1.2600.5512 (xpsp.080413-2113) Company Name : Microsoft Corporation Product Name : Microsoft(R) Windows(R) Operating System Product Version : 5.1.2600.5512 ModLoad: 77e50000 00092000 C:WINDOWSsystem32RPCRT4.dll (5.1.2600.5512) (PDB Symbols Loaded) Linked PDB Filename : rpcrt4.pdb File Version : 5.1.2600.5512 (xpsp.080413-2108) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 5.1.2600.5512 ModLoad: 77fc0000 00011000 C:WINDOWSsystem32Secur32.dll (5.1.2600.5512) (PDB Symbols Loaded) Linked PDB Filename : secur32.pdb File Version : 5.1.2600.5512 (xpsp.080413-2113) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 5.1.2600.5512 ModLoad: 76300000 0001d000 C:WINDOWSsystem32IMM32.DLL (5.1.2600.5512) (PDB Symbols Loaded) Linked PDB Filename : imm32.pdb File Version : 5.1.2600.5512 (xpsp.080413-2105) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 5.1.2600.5512 ModLoad: 621f0000 00009000 C:WINDOWSsystem32LPK.DLL (5.1.2600.5512) (PDB 
Symbols Loaded) Linked PDB Filename : lpk.pdb File Version : 5.1.2600.5512 (xpsp.080413-2105) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 5.1.2600.5512 ModLoad: 73fa0000 0006b000 C:WINDOWSsystem32USP10.dll (1.420.2600.5512) (PDB Symbols Loaded) Linked PDB Filename : usp10.pdb File Version : 1.0420.2600.5512 (xpsp.080413-2105) Company Name : Microsoft Corporation Product Name : Microsoft(R) Uniscribe Unicode script processor Product Version : 1.0420.2600.5512 ModLoad: 76cb0000 00020000 C:WINDOWSsystem32NTMARTA.DLL (5.1.2600.5512) (PDB Symbols Loaded) Linked PDB Filename : ntmarta.pdb File Version : 5.1.2600.5512 (xpsp.080413-2113) Company Name : Microsoft Corporation Product Name : Microsoft(R) Windows(R) Operating System Product Version : 5.1.2600.5512 ModLoad: 77be0000 00058000 C:WINDOWSsystem32msvcrt.dll (7.0.2600.5512) (PDB Symbols Loaded) Linked PDB Filename : msvcrt.pdb File Version : 7.0.2600.5512 (xpsp.080413-2111) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 7.0.2600.5512 ModLoad: 76990000 0013d000 C:WINDOWSsystem32ole32.dll (5.1.2600.5512) (PDB Symbols Loaded) Linked PDB Filename : ole32.pdb File Version : 5.1.2600.5512 (xpsp.080413-2108) Company Name : Microsoft Corporation Product Name : Microsoft(R) Windows(R) Operating System Product Version : 5.1.2600.5512 ModLoad: 71b70000 00013000 C:WINDOWSsystem32SAMLIB.dll (5.1.2600.5512) (PDB Symbols Loaded) Linked PDB Filename : samlib.pdb File Version : 5.1.2600.5512 (xpsp.080413-2113) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 5.1.2600.5512 ModLoad: 76f30000 0002c000 C:WINDOWSsystem32WLDAP32.dll (5.1.2600.5512) (PDB Symbols Loaded) Linked PDB Filename : wldap32.pdb File Version : 5.1.2600.5512 (xpsp.080413-2113) Company Name : Microsoft Corporation Product Name : Microsoft(R) Windows(R) Operating System Product Version : 
5.1.2600.5512 ModLoad: 0b610000 00115000 C:Program FilesBOINCdbghelp.dll (6.6.7.5) (PDB Symbols Loaded) Linked PDB Filename : dbghelp.pdb File Version : 6.6.0007.5 (debuggers(dbg).051021-1446) Company Name : Microsoft Corporation Product Name : Debugging Tools for Windows(R) Product Version : 6.6.0007.5 ModLoad: 0b830000 00083000 C:Program FilesBOINCsymsrv.dll (6.6.7.5) (PDB Symbols Loaded) Linked PDB Filename : symsrv.pdb File Version : 6.6.0007.5 (debuggers(dbg).051021-1446) Company Name : Microsoft Corporation Product Name : Debugging Tools for Windows(R) Product Version : 6.6.0007.5 ModLoad: 0b8c0000 0003a000 C:Program FilesBOINCsrcsrv.dll (6.6.7.5) (PDB Symbols Loaded) Linked PDB Filename : srcsrv.pdb File Version : 6.6.0007.5 (debuggers(dbg).051021-1446) Company Name : Microsoft Corporation Product Name : Debugging Tools for Windows(R) Product Version : 6.6.0007.5 ModLoad: 77bd0000 00008000 C:WINDOWSsystem32version.dll (5.1.2600.5512) (PDB Symbols Loaded) Linked PDB Filename : version.pdb File Version : 5.1.2600.5512 (xpsp.080413-2105) Company Name : Microsoft Corporation Product Name : Microsoft® Windows® Operating System Product Version : 5.1.2600.5512 *** Dump of the Process Statistics: *** - I/O Operations Counters - Read: 4199, Write: 0, Other 4119 - I/O Transfers Counters - Read: 0, Write: 283156, Other 0 - Paged Pool Usage - QuotaPagedPoolUsage: 29464, QuotaPeakPagedPoolUsage: 29484 QuotaNonPagedPoolUsage: 3856, QuotaPeakNonPagedPoolUsage: 5104 - Virtual Memory Usage - VirtualSize: 288505856, PeakVirtualSize: 294109184 - Pagefile Usage - PagefileUsage: 177410048, PeakPagefileUsage: 180875264 - Working Set Size - WorkingSetSize: 44548096, PeakWorkingSetSize: 142151680, PageFaultCount: 4153040 *** Dump of thread ID 1256 (state: Waiting): *** - Information - Status: Wait Reason: UserRequest, , Kernel Time: 929636736.000000, User Time: 118402555904.000000, Wait Time: 1696694.000000 - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at 
address 0x0055B8C1 write attempt to address 0x00000024 - Registers - eax=097646c8 ebx=097646cc ecx=038ffe20 edx=038ffe20 esi=097646a0 edi=00000000 eip=0055b8c1 esp=0012c02c ebp=0ab9f938 cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00010202 - Callstack - ChildEBP RetAddr Args to Child 0012c048 0061138e 00000000 fa25f9aa 0b235e50 097646a0 minirosetta_1.54_windows_intelx!+0x0 0012c068 006113fe 0b235e50 fa25f98a 00000001 097646a0 minirosetta_1.54_windows_intelx!+0x0 00000000 00000000 00000000 00000000 00000000 00000000 minirosetta_1.54_windows_intelx!+0x0 *** Dump of thread ID 672 (state: Waiting): *** - Information - Status: Wait Reason: ExecutionDelay, , Kernel Time: 1902736.000000, User Time: 6909936.000000, Wait Time: 1696719.000000 - Registers - eax=0164fb44 ebx=00000000 ecx=fa3739f2 edx=00000000 esi=00000000 edi=0164ff70 eip=7c92e4f4 esp=0164ff40 ebp=0164ff98 cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202 - Callstack - ChildEBP RetAddr Args to Child 0164ff3c 7c92d1fc 7c8023f1 00000000 0164ff70 00000000 ntdll!_KiFastSystemCallRet@0+0x0 FPO: [0,0,0] 0164ff40 7c8023f1 00000000 0164ff70 00000000 7c802446 ntdll!_NtDelayExecution@8+0x0 FPO: [2,0,0] 0164ff98 7c802455 00000064 00000000 0164ffec 00411a7b kernel32!_SleepEx@8+0x0 0164ffa8 00411a7b 00000064 00000000 7c80b713 00000000 kernel32!_Sleep@4+0x0 0164ffec 00000000 00411a70 00000000 00000000 2f73fcd8 minirosetta_1.54_windows_intelx!+0x0 *** Dump of thread ID 1808 (state: Waiting): *** - Information - Status: Wait Reason: ExecutionDelay, , Kernel Time: 100144.000000, User Time: 0.000000, Wait Time: 1696642.000000 - Registers - eax=0272fe28 ebx=021c4a01 ecx=0272e734 edx=00001f9a esi=00000000 edi=0272fdf8 eip=7c92e4f4 esp=0272fdc8 ebp=0272fe20 cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202 - Callstack - ChildEBP RetAddr Args to Child 0272fdc4 7c92d1fc 7c8023f1 00000000 0272fdf8 00000122 ntdll!_KiFastSystemCallRet@0+0x0 FPO: [0,0,0] 0272fdc8 7c8023f1 00000000 0272fdf8 00000122 09778748 
ntdll!_NtDelayExecution@8+0x0 FPO: [2,0,0] 0272fe20 7c802455 000007d0 00000000 7c802446 0079aa61 kernel32!_SleepEx@8+0x0 0272fe30 0079aa61 000007d0 f845c7b2 0012bfe0 021c4a38 kernel32!_Sleep@4+0x0 0272fe38 f845c7b2 0012bfe0 021c4a38 0272ff6c 021c4a38 minirosetta_1.54_windows_intelx!+0x0 0272fe3c 0012bfe0 021c4a38 0272ff6c 021c4a38 00000001 minirosetta_1.54_windows_intelx!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = 'f845c7b2' 0272ff3c 7c937de9 7c937ea0 7c800000 0272ff7c 00000000 minirosetta_1.54_windows_intelx!+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '0012bfe0' 0272ffe0 7c80b71f 00000000 00000000 00000000 0041eb46 ntdll!_LdrpGetProcedureAddress@20+0x0 SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '7c937de9' 0272ffe4 00000000 00000000 00000000 0041eb46 021c4a38 kernel32!_BaseThreadStart@8+0x0 FPO: [0,0,0] SymFromAddr(): GetLastError = '126' SymGetLineFromAddr(): GetLastError = '126' SymGetModuleInfo(): GetLastError = '126' Address = '7c80b71f' *** Debug Message Dump **** *** Foreground Window Data *** Window Name : Window Class : Window Process ID: 0 Window Thread ID : 0 Exiting... </stderr_txt> ]]> Validate state Invalid Claimed credit 32.9406239634204 Granted credit 0 application version 1.54 The other one (workunit lr5_E_01_hbond_bb_sc_rlbd_2hsb_SAVE_ALL_OUT_8261_652) ran for less than half an hour on my computer, and no error message appeared when it crashed. I was also using Mozilla Firefox 3.0 at the time but, strangely, Firefox did not close unexpectedly this time. 
The task detail is in the following: Task ID 236172160 Name lr5_E_01_hbond_bb_sc_rlbd_2hsb_SAVE_ALL_OUT_8261_652_1 Workunit 215347031 Created 17 Mar 2009 8:05:59 UTC Sent 17 Mar 2009 8:07:24 UTC Received 20 Mar 2009 17:36:16 UTC Server state Over Outcome Client error Client state Compute error Exit status -1073741819 (0xc0000005) Computer ID 224205 Report deadline 27 Mar 2009 8:07:24 UTC CPU time 1436.896 stderr out <core_client_version>5.10.45</core_client_version> <![CDATA[ <message> - exit code -1073741819 (0xc0000005) </message> <stderr_txt> BOINC:: Initializing ... ok. [2009- 3-21 1: 5:10:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing core... Initializing options.... ok Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip <unzip> <-oq> <../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip> <-d./> Firstarg=true; pp=-d./ firstarg: <-d./> End of unzipping. Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/mtyka_lr5_D_score12.zip <unzip> <-oq> <../../projects/boinc.bakerlab.org_rosetta/mtyka_lr5_D_score12.zip> <-d./> Firstarg=true; pp=-d./ firstarg: <-d./> End of unzipping. Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/lr5_2hsb.out.zip <unzip> <-oq> <../../projects/boinc.bakerlab.org_rosetta/lr5_2hsb.out.zip> <-d./> Firstarg=true; pp=-d./ firstarg: <-d./> End of unzipping. Setting database description ... Setting up checkpointing ... Initializing score function: Initializing relax mover: Starting protocol... Silent Output Mode Jobdist startup.. BOINC:: Worker startup. Starting watchdog... 
Watchdog active. Starting work on structure: S_shuffle_00001 <--- S_00002_0000216_0_test_6.0.out Fullatom mode .. # cpu_run_time_pref: 21600 Unhandled Exception Detected... - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x0055B8C1 write attempt to address 0x00000024 Engaging BOINC Windows Runtime Debugger... ******************** BOINC Windows Runtime Debugger Version 6.5.0 Dump Timestamp : 03/21/09 01:34:14 Install Directory : C:Program FilesBOINC Data Directory : C:Program FilesBOINC Project Symstore : LoadLibraryA( C:Program FilesBOINC\dbghelp.dll ): GetLastError = 1455 LoadLibraryA( dbghelp.dll ): GetLastError = 1455 *** Dump of the Process Statistics: *** - I/O Operations Counters - Read: 10715, Write: 0, Other 3493 - I/O Transfers Counters - Read: 0, Write: 200794, Other 0 - Paged Pool Usage - QuotaPagedPoolUsage: 29464, QuotaPeakPagedPoolUsage: 29464 QuotaNonPagedPoolUsage: 4416, QuotaPeakNonPagedPoolUsage: 5664 - Virtual Memory Usage - VirtualSize: 288079872, PeakVirtualSize: 296271872 - Pagefile Usage - PagefileUsage: 192016384, PeakPagefileUsage: 208936960 - Working Set Size - WorkingSetSize: 136130560, PeakWorkingSetSize: 213221376, PageFaultCount: 366777 *** Dump of thread ID 1164 (state: Waiting): *** - Information - Status: Wait Reason: UserRequest, , Kernel Time: 93334208.000000, User Time: 14287143936.000000, Wait Time: 2525130.000000 - Unhandled Exception Record - Reason: Access Violation (0xc0000005) at address 0x0055B8C1 write attempt to address 0x00000024 *** Dump of thread ID 3344 (state: Waiting): *** - Information - Status: Wait Reason: ExecutionDelay, , Kernel Time: 300432.000000, User Time: 300432.000000, Wait Time: 2525124.000000 *** Dump of thread ID 2416 (state: Waiting): *** - Information - Status: Wait Reason: ExecutionDelay, , Kernel Time: 0.000000, User Time: 100144.000000, Wait Time: 2524973.000000 *** Debug Message Dump **** *** Foreground Window Data *** Window Name : Window Class : Window Process 
ID: 0 Window Thread ID : 0 Exiting... </stderr_txt> ]]> Validate state Invalid Claimed credit 2.76822531352161 Granted credit 2.76822531352161 application version 1.54 Another computer that processed this workunit also ended with a compute error. Its task details follow: Task ID 236168980 Name lr5_E_01_hbond_bb_sc_rlbd_2hsb_SAVE_ALL_OUT_8261_652_0 Workunit 215347031 Created 17 Mar 2009 7:49:09 UTC Sent 17 Mar 2009 7:50:56 UTC Received 17 Mar 2009 8:05:56 UTC Server state Over Outcome Client error Client state Compute error Exit status -185 (0xffffff47) Computer ID 868926 Report deadline 27 Mar 2009 7:50:56 UTC CPU time 0 stderr out <core_client_version>5.10.45</core_client_version> <![CDATA[ <message> Input file minirosetta_1.54_windows_intelx86.exe missing or invalid: -163 </message> ]]> Validate state Invalid Claimed credit 0 Granted credit 0 application version 1.54 |
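Editor's note: each exit status in the task reports above appears twice, once as a signed decimal and once in hex. The two forms are the same 32-bit value. A minimal Python sketch of the conversion follows; the helper name `exit_status_to_hex` is made up for illustration and is not part of BOINC.

```python
def exit_status_to_hex(status: int) -> str:
    """Render a signed 32-bit exit status in the unsigned hex form
    used in the task reports (hypothetical helper, not a BOINC API)."""
    # Masking with 0xFFFFFFFF reinterprets the negative value as its
    # 32-bit two's-complement bit pattern.
    return f"0x{status & 0xFFFFFFFF:08x}"


print(exit_status_to_hex(-1073741819))  # the access-violation status above
print(exit_status_to_hex(-185))         # the "missing or invalid" status above
```

The first value is the code the debugger dump labels "Access Violation"; the second belongs to the report where the executable was missing or invalid.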
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
hi! I can only tell you that the v1.54 mini version now includes code both to end such tasks sooner and to report information that helps determine why those models are running so long. Prior to these enhancements, the watchdog would wait until the task had run for 3 or 4 times longer than the runtime preference, and the results from a task ended by the watchdog that way were not as useful for studying what occurred. I've been asking why such tasks are not receiving credit from the nightly credit granting script, but have not yet received any word. Rosetta Moderator: Mod.Sense |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,281,662 RAC: 1,150 |
I just tried resetting the Rosetta@home project and got these error messages (with no Rosetta@home workunit running, none downloaded but not run, and the last one already reported): 3/22/2009 2:03:10 AM|rosetta@home|Resetting project 3/22/2009 2:03:16 AM|rosetta@home|[error] Couldn't delete file projects/boinc.bakerlab.org_rosetta/minirosetta_1.54_windows_intelx86.exe Attempts to delete the file manually also failed, with error messages about being unable to move it to the deleted items folder. I currently have Rosetta@home on no new tasks, to keep it this way until you can give me some usable advice about how to finish the reset. I run BOINC 6.2.28 under Vista SP1. |
Greg_BE Send message Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
This task, 5croA_BOINC_ABINITIO_IGNORE_THE_REST-MOO56-S25-11-S3-13--5croA-_7876_63, crashed on two computers and was never returned by a third. I got a validate error, another person got a compute error, and the third host never reported either an error or a completion. |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
robertmiles Sounds like a reboot is in order to clear all of the locks. I've never heard of that happening before. Perhaps something like anti-virus software has taken a lock on the file to perform a scan? Curious, why were you resetting the project? Rosetta Moderator: Mod.Sense |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,281,662 RAC: 1,150 |
robertmiles A reboot may have helped - it was part of the procedure I described trying over on Ralph@home, and was able to remove the lockfiles for a while. I was resetting the project because that's what the error messages from the lockfile problem suggest I may need to do. However, it doesn't seem to have helped enough, since the first Rosetta@home workunit my machine completed since the reset had the lockfile problem again: https://boinc.bakerlab.org/rosetta/result.php?resultid=237629070 Two more Rosetta@home workunits that started later aren't finished, but at least don't seem to have run into the lockfile problem yet. My antivirus program, and also my three antispyware programs, are able to finish scanning a file in much less time than it needs for Rosetta@home and Ralph@home workunits to fail due to too many restarts from a lockfile problem, so I'd expect a lock from any of them to cause lockfile error messages for only a short time, followed by a successful minirosetta restart. A suggestion - modify minirosetta to check for the lockfile as it starts up (preferably before any effort to create one), report the results of this check if it can, and if this first check for the lockfile finds one, don't waste as much time restarting over and over before declaring the workunit failed. Another suggestion - modify minirosetta to report which slot it ran in, since the problem looks like it may be specific to workunits assigned to specific slots, due to what looks like its inability to remove lockfiles left by previous workunits assigned to the same slot but already completed since the last reboot. I leave BOINC running nearly 24 hours a day, often days between reboots, which may have something to do with why I'm seeing the lockfile problem as often as I do. I'm still using BOINC 6.2.28 under 32-bit Vista SP1. |
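Editor's note: the two suggestions above (check for an existing lockfile at startup, report the slot, and give up sooner when a lockfile is already there) can be sketched roughly as follows. This is an illustration in Python under stated assumptions, not minirosetta or BOINC code: the function names and both restart limits are hypothetical, and only the file name `boinc_lockfile` comes from the thread.

```python
import os

LOCKFILE_NAME = "boinc_lockfile"  # file name as reported in this thread


def lockfile_report(slot_dir):
    """Describe any lockfile already present in a slot directory.

    Hypothetical helper: it only inspects the file system and does not
    call any real BOINC or minirosetta API.
    """
    path = os.path.join(slot_dir, LOCKFILE_NAME)
    present = os.path.exists(path)
    return {
        # The last path component is the slot number, e.g. "3".
        "slot": os.path.basename(os.path.normpath(slot_dir)),
        "lockfile_present": present,
        "lockfile_size": os.path.getsize(path) if present else None,
    }


def restart_budget(report, normal=100, when_locked=3):
    """Allow fewer restarts when a lockfile was already there at startup.

    Both limits are made-up illustration values, not BOINC defaults.
    """
    return when_locked if report["lockfile_present"] else normal
```

Writing the report to stderr before the first attempt to create the lockfile would put the slot number and the pre-existing lock state into the task result, which is the information the suggestions ask for.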
Snags Send message Joined: 22 Feb 07 Posts: 198 Credit: 2,888,320 RAC: 0 |
robertmiles You might be interested in this announcement by Bernd over at Einstein@home. He has made an Einstein Windows app specifically to collect more info on the CPU throttling=too many exits/can't acquire lockfile errors. Hopefully his discoveries will prove useful here on rosetta@home as well. |
Feet1st Send message Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
This task is currently using 496MB on my machine. Max was 536MB. It is called 2P09A_BOINC_MPZN_vanilla_abrelax_9106_6681_0 What is the status now that the minimum recommended memory is 512MB? Are there still WUs created that will only go to systems with more? My machine has 2GB, but I was wondering if this task is using more than planned. That task seems to be running normally otherwise. It is 22hrs into my 24hr preference. Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
Paul D. Buck Send message Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
ERROR: dis==0 in pairtermderiv! ERROR:: Exit from: src/core/scoring/methods/PairEnergy.cc line: 338 Task ID:237330352 |
robertmiles Send message Joined: 16 Jun 08 Posts: 1232 Credit: 14,281,662 RAC: 1,150 |
A workunit that ran for a while, then ran into the lockfile problem: https://boinc.bakerlab.org/rosetta/result.php?resultid=238431267 Two of the five subdirectories under the slots directory contain a large number of files, and appear to be for the two workunits now in progress. Two are empty. The other subdirectory contains only 3 files, and appears to be left over from this failed workunit. File boinc_lockfile appears to be empty, since its size is zero. It's marked as still in use, though, so I can't check this. The contents of stderr.txt start with this: BOINC:: Initializing ... ok. [2009- 3-25 22:55: 2:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing core... Initializing options.... ok Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip <unzip> <-oq> <../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev26003.zip> <-d./> Firstarg=true; pp=-d./ firstarg: <-d./> End of unzipping. Setting database description ... Setting up checkpointing ... Setting up folding (abrelax) ... Beginning folding (abrelax) ... BOINC:: Worker startup. Starting watchdog... Watchdog active. 
Starting work on structure: _U9X3X_00001 # cpu_run_time_pref: 43200 Starting work on structure: _U9X3X_00002 Starting work on structure: _U9X3X_00003 Starting work on structure: _U9X3X_00004 Starting work on structure: _U9X3X_00005 Starting work on structure: _U9X3X_00006 Starting work on structure: _U9X3X_00007 Starting work on structure: _U9X3X_00008 Starting work on structure: _U9X3X_00009 Starting work on structure: _U9X3X_00010 Starting work on structure: _U9X3X_00011 Starting work on structure: _U9X3X_00012 Starting work on structure: _U9X3X_00013 Starting work on structure: _U9X3X_00014 Starting work on structure: _U9X3X_00015 Starting work on structure: _U9X3X_00016 Starting work on structure: _U9X3X_00017 Starting work on structure: _U9X3X_00018 Starting work on structure: _U9X3X_00019 Starting work on structure: _U9X3X_00020 BOINC:: Initializing ... ok. Can't acquire lockfile - exiting BOINC:: Initializing ... ok. Can't acquire lockfile - exiting BOINC:: Initializing ... ok. Can't acquire lockfile - exiting BOINC:: Initializing ... ok. Can't acquire lockfile - exiting The contents of stdout.txt are: Created shared memory segment Created semaphore Do these results mean that Rosetta@home never tries to clear up these three files for failed workunits? Should it? They appear to prevent any workunits from Rosetta@home or Ralph@home from being able to run in this slot until the next reboot - often meaning a few days for me. I haven't seen them have a similar effect on workunits from other BOINC projects, though. |
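Editor's note: the manual inspection described above (counting files in each slot subdirectory and looking for a leftover `boinc_lockfile`) could be automated along these lines. This is a sketch under assumptions: the function name is hypothetical, and the threshold of 3 files simply mirrors the leftover directory described in this post rather than any documented BOINC behaviour.

```python
import os


def scan_slots(slots_root, max_leftover_files=3):
    """List each slot subdirectory with its file count and lockfile state.

    A slot is flagged as suspect when it holds a boinc_lockfile but only
    a handful of files, like the leftover directory described above.
    Hypothetical helper; the threshold is an illustration value.
    """
    results = []
    for name in sorted(os.listdir(slots_root)):
        slot = os.path.join(slots_root, name)
        if not os.path.isdir(slot):
            continue
        files = [f for f in os.listdir(slot)
                 if os.path.isfile(os.path.join(slot, f))]
        has_lock = "boinc_lockfile" in files
        suspect = has_lock and len(files) <= max_leftover_files
        results.append((name, len(files), has_lock, suspect))
    return results
```

Calling `scan_slots` on the BOINC `slots` directory returns one tuple per slot: name, file count, whether a lockfile is present, and whether the slot looks like a leftover.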
Hammeh Send message Joined: 11 Nov 08 Posts: 63 Credit: 211,283 RAC: 0 |
Can anyone shed some light on this WU? I just started crunching for Rosetta, and it didn't report any client-side errors. 217630163 Thanks |
Chilean Send message Joined: 16 Oct 05 Posts: 711 Credit: 26,694,507 RAC: 0 |
|
Hammeh Send message Joined: 11 Nov 08 Posts: 63 Credit: 211,283 RAC: 0 |
Nope here is some system info: Amd Phenom x4 9600 (not overclocked) 3GB RAM Windows Vista Home Premium 32-bit BOINC version 6.4.7 |
svincent Send message Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
Validate error on this workunit 218443282 on Mac. cc_natcst_1_8_nocstinrelax_hb_t327__IGNORE_THE_REST_2FSWA_7_9505_20_1 An unlikely 99 decoys from 99 attempts: a wingman had the same problem. Starting work on structure: _2FSWA_7_00098 Starting work on structure: _2FSWA_7_00099 ====================================================== DONE :: 1 starting structures 145.451 cpu seconds This process generated 99 decoys from 99 attempts ====================================================== BOINC :: Watchdog shutting down... BOINC :: BOINC support services shutting down... called boinc_finish </stderr_txt> |
Paul D. Buck Send message Joined: 17 Sep 05 Posts: 815 Credit: 1,812,737 RAC: 0 |
|
LizzieBarry Send message Joined: 25 Feb 08 Posts: 76 Credit: 201,862 RAC: 0 |
frb_1_8_bestfrag_hb_t313___IGNORE_THE_REST_1F9TA_5_9696_15_0 7 hours running (3hr default), no decoys, Validate Error. I've been noticing these "frb" WUs are singularly unsuccessful. What are the stats on their successful completion? I'd say they were minimal. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2125 Credit: 41,249,734 RAC: 8,235 |
frb_1_8_bestfrag_hb_t313___IGNORE_THE_REST_1F9TA_5_9696_15_0 Oh, I don't know... frb_1_8_ecut_hb_t322___IGNORE_THE_REST_1VPMA_12_9712_12_0 # cpu_run_time_pref: 14400 CPU time 14099.2 Claimed credit 69.0173659142213 Granted credit 229.296476006251 No complaints here!!! :) |
Path7 Send message Joined: 25 Aug 07 Posts: 128 Credit: 61,751 RAC: 0 |
Success & Error on the same WU Hello all, This WU: frb_0_8_el_chosen_hb_t312___IGNORE_THE_REST_1XV2A_15_9667_54_0 has officially been reported as: Outcome = Success. However, the WU ran for only 4309.559 seconds (cpu_run_time_pref: 21600) and ended with an error: Starting work on structure: _1XV2A_15_00008 interpolate rotamers bin out of range: GLN -107.207 180 -7e-005 -6.1e-005 -5.1e-005 34 36 8 9 37 2 0.2793 0 ERROR:: Exit from: d:boinc_buildminirosetta_windowsminisrccore/scoring/dunbrack/RotamericSingleResidueDunbrackLibrary.tmpl.hh line: 593 called boinc_finish Have a nice day, Path7. |
Murasaki Send message Joined: 20 Apr 06 Posts: 303 Credit: 511,418 RAC: 0 |
Another WU with 99 successful decoys ala_2he4_p40-1.ala.ppk_dock_random.xml_RANDOM12_BOUND_DOCK_9895_843_0 # cpu_run_time_pref: 21600 ====================================================== DONE :: 1 starting structures 6841.62 cpu seconds This process generated 99 decoys from 99 attempts ====================================================== My preferred run time is 6 hours, but this one completed in less than 2. Either this is an extremely quick model or something odd occurred. |
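Editor's note: comparing the reported CPU time with the run-time preference, as done by hand above, can be sketched as a small parser over the stderr text. The regular expressions assume the exact wording quoted in this post (`cpu_run_time_pref`, `cpu seconds`, `decoys from ... attempts`); the function name is hypothetical.

```python
import re

# Excerpt of the stderr output quoted in the post above.
SAMPLE = (
    "# cpu_run_time_pref: 21600 "
    "====================================================== "
    "DONE :: 1 starting structures 6841.62 cpu seconds "
    "This process generated 99 decoys from 99 attempts"
)


def summarize_stderr(text):
    """Extract run-time preference, CPU time and decoy counts.

    Hypothetical helper; fields that cannot be found are returned as None.
    """
    pref = re.search(r"cpu_run_time_pref:\s*(\d+)", text)
    cpu = re.search(r"([\d.]+) cpu seconds", text)
    decoys = re.search(r"generated (\d+) decoys from (\d+) attempts", text)
    pref_s = int(pref.group(1)) if pref else None
    cpu_s = float(cpu.group(1)) if cpu else None
    return {
        "pref_seconds": pref_s,
        "cpu_seconds": cpu_s,
        "decoys": int(decoys.group(1)) if decoys else None,
        "attempts": int(decoys.group(2)) if decoys else None,
        # Share of the preferred run time that was actually used.
        "fraction_of_pref": (cpu_s / pref_s
                             if pref_s and cpu_s is not None else None),
    }


print(summarize_stderr(SAMPLE))
```

For the task above, the fraction comes out a little under one third, which matches the observation that a 6 hour preference finished in less than 2 hours.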