Questions and Answers : Unix/Linux : WUs fail with code 131
Author | Message |
---|---|
Kenneth Larsen Joined: 17 Sep 05 Posts: 3 Credit: 112,217 RAC: 0 |
Most of my work units have started failing over the last few days with this error: 4.43 process exited with code 131 (0x83) [0x864ff8f] [0x86c183c] [0xb7f5d420] [0x8511831] [0x85148b6] [0x80d4914] [0x823a774] [0x8232651] [0x8363677] [0x86c6e84] [0x8048121] SIGSEGV: segmentation violation Stack trace (11 frames): Exiting... I'm using BOINC v4.43 under Fedora Core 4 on an Athlon XP 3000+ with 512 MB RAM (no overclocking). Any idea what's wrong? |
ralic Joined: 22 Sep 05 Posts: 16 Credit: 46,481 RAC: 0 |
I've also had one do this. Looking at the logs, it happened just at the point where BOINC had instructed the WUs to be removed from RAM prior to starting the 5-day benchmark. Relevant log extract below: 2005-09-24 09:17:43 [---] Suspending computation and network activity - running CPU benchmarks 2005-09-24 09:17:43 [rosetta@home] Pausing result 1pvaA_abrelax_18978_0 (removed from memory) 2005-09-24 09:17:43 [rosetta@home] Pausing result 1pvaA_abrelax_18992_0 (removed from memory) 2005-09-24 09:17:44 [rosetta@home] Unrecoverable error for result 1pvaA_abrelax_18992_0 (process exited with code 131 (0x83)) 2005-09-24 09:17:44 [---] request_reschedule_cpus: process exited 2005-09-24 09:17:45 [---] Running CPU benchmarks 2005-09-24 09:18:13 [---] Aborting CPU benchmarks, one or more active tasks are still running. 2005-09-24 09:18:13 [---] Resuming computation and network activity 2005-09-24 09:18:13 [---] request_reschedule_cpus: Resuming activities 2005-09-24 09:18:13 [rosetta@home] Deferring communication with project for 31 seconds 2005-09-24 09:18:13 [rosetta@home] Computation for result 1pvaA_abrelax_18992_0 finished 2005-09-24 09:18:13 [rosetta@home] Starting result 1pvaA_abrelax_18993_0 using rosetta version 4.77 2005-09-24 09:18:14 [---] ACTIVE_TASK_SET::check_app_exited(): pid 9550 not found 2005-09-24 09:18:15 [---] ACTIVE_TASK_SET::check_app_exited(): pid 9551 not found [edit] Here's a link to the result id: 28059 [/edit] |
Juerschi Joined: 17 Sep 05 Posts: 8 Credit: 14,145 RAC: 0 |
My Linux host had a problem quite similar to the one on ralic's host. The WU was removed from memory but wasn't errored out. Benchmarking started but was aborted because one or more active tasks were still running. The ACTIVE_TASK_SET error message was much the same as the one ralic posted; only the pid number is different. |
Desti Joined: 16 Sep 05 Posts: 50 Credit: 3,018 RAC: 0 |
|
ralic Joined: 22 Sep 05 Posts: 16 Credit: 46,481 RAC: 0 |
"I've also had one do this." Well, I've had another one do it, and benchmarks were not in the picture this time. Relevant log extract below: 26/09/2005 04:37:30|rosetta@home|Starting result 1pvaA_abrelax_16851_1 using rosetta version 4.77 26/09/2005 04:38:02|rosetta@home|Unrecoverable error for result 1pvaA_abrelax_16851_1 (process exited with code 131 (0x83)) 26/09/2005 04:38:02||request_reschedule_cpus: process exited 26/09/2005 04:38:02|rosetta@home|Deferring communication with project for 1 minutes and 0 seconds 26/09/2005 04:38:02|rosetta@home|Computation for result 1pvaA_abrelax_16851_1 finished Result id: 28244. The error message in the result is slightly different this time, since it includes the following line: No heartbeat from core client for 31 sec - exiting |
daniele Joined: 12 Oct 06 Posts: 18 Credit: 20,328 RAC: 0 |
Last night I had the same error on one WU, but there was nothing remarkable in stderr.txt. I have two other WUs with nearly the same name; in a few hours I'll see whether they get the same error. |
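
For anyone who wants to check how many of their own results hit this error, here is a minimal sketch that scans BOINC client log lines for the "Unrecoverable error" message quoted in the posts above. It is only an illustration: the function name `failed_results` and the `sample` list are hypothetical, the sample lines are copied from the log extracts in this thread, and the pattern assumes the message wording stays exactly as shown there. It makes no claim about what causes code 131.

```python
import re

# Matches the "Unrecoverable error" message as quoted in this thread.
# The pattern ignores the timestamp and project prefix, so it works for
# both log layouts posted above (assumption: the wording is unchanged).
FAILURE = re.compile(
    r"Unrecoverable error for result (?P<result>\S+) "
    r"\(process exited with code (?P<code>\d+) \((?P<hex>0x[0-9a-fA-F]+)\)\)"
)


def failed_results(log_lines):
    """Return (result_name, exit_code) pairs for every failed result."""
    failures = []
    for line in log_lines:
        match = FAILURE.search(line)
        if match:
            failures.append((match.group("result"), int(match.group("code"))))
    return failures


# Sample lines copied from the log extracts posted in this thread.
sample = [
    "2005-09-24 09:17:43 [rosetta@home] Pausing result 1pvaA_abrelax_18992_0 (removed from memory)",
    "2005-09-24 09:17:44 [rosetta@home] Unrecoverable error for result 1pvaA_abrelax_18992_0 (process exited with code 131 (0x83))",
    "26/09/2005 04:38:02|rosetta@home|Unrecoverable error for result 1pvaA_abrelax_16851_1 (process exited with code 131 (0x83))",
]

for name, code in failed_results(sample):
    # Print the decimal code together with its hex form, as BOINC does.
    print(f"{name}: exit code {code} ({code:#x})")
```

To use it on a real log, pass the lines of the client's message log instead of `sample`; the location and name of that log file depend on the installation, so it is left out here.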
©2024 University of Washington
https://www.bakerlab.org