Some WUs stop executing on Linux

Questions and Answers : Unix/Linux : Some WUs stop executing on Linux


NilsB

Joined: 6 May 06
Posts: 1
Credit: 821
RAC: 0
Message 18539 - Posted: 12 Jun 2006, 21:02:08 UTC

There are several WUs that hang:

Work unit ID:
19785174, 19758621

The symptom is that BOINC doesn't spend any time on these WUs after 1:28 hours. They simply stop executing and CPU consumption goes to 0, even if I let BOINC run for several hours.

Other WUs on this system work fine, as do other projects.

BOINC Manager 5.4.9 on Linux
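
One way to confirm the symptom described above (CPU consumption dropping to 0) is to sample a process's accumulated CPU time twice and see whether it advanced. The following is a minimal Python sketch, not part of BOINC or Rosetta. It assumes a Linux /proc filesystem; the function names and the 60-second default interval are illustrative choices only.

```python
import time


def cpu_ticks_from_stat(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split on the last ')' before splitting the remaining fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def read_cpu_ticks(pid: int) -> int:
    """Read the accumulated CPU ticks of a running process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        return cpu_ticks_from_stat(f.read())


def is_stalled(pid: int, interval_s: float = 60.0) -> bool:
    """Return True if the process used no CPU time during the interval."""
    before = read_cpu_ticks(pid)
    time.sleep(interval_s)
    return read_cpu_ticks(pid) == before
```

Calling `is_stalled(pid)` with the PID of the suspect process returns True when its CPU time did not change over the interval. A task that BOINC has merely suspended will also look stalled, so the result is only meaningful while the task is shown as running.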
ID: 18539 · Rating: 3
hugothehermit

Joined: 26 Sep 05
Posts: 238
Credit: 314,893
RAC: 0
Message 19027 - Posted: 21 Jun 2006, 0:32:49 UTC

G'day NilsB

Welcome to Rosetta@Home

Rosetta does occasionally have Linux errors (3.52% last time I saw).

You can of course abort them if you see them, but the programme will eventually stop itself. The programme will also send debugging information about the work unit that failed, so the Rosetta@Home team can reduce these errors even further.

Hope that helps

Hugo.
ID: 19027 · Rating: 0
Christian

Joined: 24 Nov 05
Posts: 1
Credit: 221,416
RAC: 0
Message 19126 - Posted: 22 Jun 2006, 20:44:37 UTC

I have the same problem on 2 Linux machines...
ID: 19126 · Rating: 0
Jean-David Beyer

Joined: 2 Nov 05
Posts: 188
Credit: 6,417,521
RAC: 5,608
Message 30286 - Posted: 30 Oct 2006, 12:07:05 UTC - in response to Message 19027.  

I also have this problem. I noticed it yesterday and it is still stuck today.

The work unit is 1n0u_HIGHFREQ_ABRELAX_7_1_NATIVe_ONLY_BARCODE__1312_9043_0.
It has accumulated 00:58:44 of CPU time. The BOINC client gives it an hour of CPU from time to time, and it seems to use none of it.

I am running Red Hat Enterprise Linux 3 ES (up to date) on a machine with dual 3.06 GHz hyperthreaded Xeon processors and 8 GB of RAM, and the stuck work unit leaves one hyperthreaded processor idle whenever it is scheduled. Other Rosetta applications run just fine, and one completed sometime yesterday.

You say "the programme will eventually stop itself." How long is eventually? Because eventually I will wish to abort it.
ID: 30286 · Rating: 0
bozho

Joined: 21 Dec 05
Posts: 1
Credit: 46,904
RAC: 0
Message 38621 - Posted: 29 Mar 2007, 13:00:56 UTC - in response to Message 30286.  

I have a similar problem.
Rosetta hangs:

6275 ? SN 59:56 rosetta_5.54_i686-pc-linux-gnu aa z025 _ -relax -looprlx -nstruct 5 -farlx -ex1 -ex2 -random_loop -loop_model -termini -short_range_hb_weight 0.50 -long_range_hb_weight 1.0 -farlx_cycle_ratio 1.0 -idl_no_chain_break -vary_omega -output_silent_gz -output_chi_silent -protein_name_prefix hom002_ -frags_name_prefix boinc_hom002_ -s z025_4_1g1cA__96.pdb -paths paths_200_z025.txt -do_farlx_checkpointing -checkpointing_interval 10 -fix_disulf disulf -cpu_run_time 10800 -watchdog -constant_seed -jran 3770064

6276 ? SN 0:00 rosetta_5.54_i686-pc-linux-gnu aa z025 _ -relax -looprlx -nstruct 5 -farlx -ex1 -ex2 -random_loop -loop_model -termini -short_range_hb_weight 0.50 -long_range_hb_weight 1.0 -farlx_cycle_ratio 1.0 -idl_no_chain_break -vary_omega -output_silent_gz -output_chi_silent -protein_name_prefix hom002_ -frags_name_prefix boinc_hom002_ -s z025_4_1g1cA__96.pdb -paths paths_200_z025.txt -do_farlx_checkpointing -checkpointing_interval 10 -fix_disulf disulf -cpu_run_time 10800 -watchdog -constant_seed -jran 3770064

6277 ? SN 0:00 rosetta_5.54_i686-pc-linux-gnu aa z025 _ -relax -looprlx -nstruct 5 -farlx -ex1 -ex2 -random_loop -loop_model -termini -short_range_hb_weight 0.50 -long_range_hb_weight 1.0 -farlx_cycle_ratio 1.0 -idl_no_chain_break -vary_omega -output_silent_gz -output_chi_silent -protein_name_prefix hom002_ -frags_name_prefix boinc_hom002_ -s z025_4_1g1cA__96.pdb -paths paths_200_z025.txt -do_farlx_checkpointing -checkpointing_interval 10 -fix_disulf disulf -cpu_run_time 10800 -watchdog -constant_seed -jran 3770064

6278 ? SN 0:00 rosetta_5.54_i686-pc-linux-gnu aa z025 _ -relax -looprlx -nstruct 5 -farlx -ex1 -ex2 -random_loop -loop_model -termini -short_range_hb_weight 0.50 -long_range_hb_weight 1.0 -farlx_cycle_ratio 1.0 -idl_no_chain_break -vary_omega -output_silent_gz -output_chi_silent -protein_name_prefix hom002_ -frags_name_prefix boinc_hom002_ -s z025_4_1g1cA__96.pdb -paths paths_200_z025.txt -do_farlx_checkpointing -checkpointing_interval 10 -fix_disulf disulf -cpu_run_time 10800 -watchdog -constant_seed -jran 3770064

The process has to be killed manually. (I think a week is enough time to wait.)

OS: Slackware 11, updated to current,
rosetta_5.54_i686-pc-linux-gnu
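
The process listing in this post can also be checked mechanically. As a rough sketch, the Python below parses one line of `ps` output in the layout shown (PID, TTY, STAT, TIME, command) and compares the accumulated CPU time with the value passed to `-cpu_run_time`. It assumes that TIME is minutes:seconds (optionally hours:minutes:seconds) and that `-cpu_run_time` is a target in seconds. Neither assumption is confirmed in this thread, and the function names are hypothetical.

```python
def parse_cpu_time(text):
    """Convert a ps TIME value such as '59:56' or '1:28:00' to seconds."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_ps_line(line):
    """Extract the PID, state, CPU seconds and run-time target from a ps line."""
    fields = line.split()
    args = fields[4:]  # the command and its arguments
    target = None
    if "-cpu_run_time" in args:
        # Assumption: the value following the flag is a number of seconds.
        target = int(args[args.index("-cpu_run_time") + 1])
    return {
        "pid": int(fields[0]),
        "stat": fields[2],
        "cpu_seconds": parse_cpu_time(fields[3]),
        "target_seconds": target,
    }


def below_target(info):
    """True if the process has used less CPU time than its stated target."""
    return info["target_seconds"] is not None and info["cpu_seconds"] < info["target_seconds"]
```

For PID 6275 in the listing, 59:56 is 3596 seconds against a target of 10800, so the process is still well short of its target. Combined with a CPU time that has not moved for days, that is consistent with a hang rather than a normal finish.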
ID: 38621 · Rating: 0



