Message boards : Number crunching : Minirosetta 3.14
Author | Message |
---|---|
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Robert, those are the odd ones. The watchdog can't get at them, because it doesn't get any CPU either. If you exit and restart BOINC, they will generally straighten themselves out. If 600MB was more than comfortable for your machine, then it does little harm to cancel one once in a while. Rosetta Moderator: Mod.Sense |
robertmiles Joined: 16 Jun 08 Posts: 1233 Credit: 14,324,975 RAC: 3,637 |
Another 3.14 workunit that stopped using any CPU time. https://boinc.bakerlab.org/rosetta/result.php?resultid=442045657 T0610_3ot2.pdb_boinc_lr_control_nativechainA_loopbuild_threading_cst_relax_wangyr_IGNORE_THE_REST_30423_909 Max RAM usage: 95 MB; CPU time at last checkpoint: 00:24:30; CPU time: 00:24:33; Elapsed time: 08:52:22; Estimated time remaining: 26:52:02; Fraction done: 3.325%; Virtual memory size: 325.62 MB; Working set size: 333.91 MB. Note the large difference between Max RAM usage and the Working set size. Peak working set: 341.920 MB. System: BOINC 6.12.33; 64-bit Windows Vista Home Premium SP2; 8 GB memory, BOINC allowed to use 40% of it; Leave applications in memory when suspended; Tthrottle64 V4.20 running, but only to display the temperatures. Already aborted, rather than wait for an answer. Rosetta@Home is on No New Tasks and will probably stay there until I see some signs on RALPH@Home that something is being done about this. 600 MB is reasonable on this computer; going many hours doing nothing useful is not. |
Paul van Dijken Joined: 21 Jun 10 Posts: 2 Credit: 1,123,717 RAC: 0 |
After running WU rb_08_23_25236_50085_rs_stg0_lrlxMultiCst_t000__casp9__aln2_SAVE_ALL_OUT_30565_13_0 for 13+ hours and no progress beyond 12.700%, I aborted it. This was the 3rd time in a few days it happened. I stopped downloading Rosetta. Any estimate when this issue is going to be solved? |
Ed Joined: 2 Aug 11 Posts: 31 Credit: 662,563 RAC: 0 |
I am having a different problem. My BOINC is set to run 65% SETI and 35% Rosetta, but it is constantly running Rosetta in priority mode. This has been going on for days. It is like every WU is coming down and immediately goes into priority. I have had no SETI WU for days, so Rosetta has been getting all the cycles. Now that SETI is sending out WU again I expected things to balance back out, but that is not happening. SETI is getting no CPU time at all. I have suspended Rosetta to give SETI some run time. This had better stop. Anyone have any suggestions? |
robertmiles Joined: 16 Jun 08 Posts: 1233 Credit: 14,324,975 RAC: 3,637 |
I am having a different problem. My BOINC is set to run 65% Seti and 35% Rosetta, but it is constantly running Rosetta in priority mode. This has been going on for days. It is like every WU is coming down and immediately goes into priority. Do the Rosetta workunits happen to have due dates before those for SETI? Is the total expected time to run all the Rosetta workunits greater than 35% of the time to their due dates? Have you tried setting both Rosetta and SETI on No New Tasks until close to finishing all the downloaded workunits, then unsetting this for SETI first and getting a few SETI workunits, then unsetting it for Rosetta as well? |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
The thing I would try is leaving it alone. The lack of work from SETI is causing the BOINC Manager to get work from Rosetta, but then when it looks at the 35% resource share, and the debt to SETI, it starts to worry about completing the tasks on time, so it sets them to run first (which is all that "high priority" means after all). You shouldn't have to suspend projects and micro-manage things to get the resource share you have selected... when work is available. When work is not available, it gets work from where it can... and makes it up to the other project when it starts producing work again. Rosetta Moderator: Mod.Sense |
Ed Joined: 2 Aug 11 Posts: 31 Credit: 662,563 RAC: 0 |
Thanks guys! I am going to leave it alone, but I have set Rosetta to "no new WU". When it runs dry, SETI will get its time again. It could be that, during the time when Rosetta had no work and SETI was getting all the time, a "debt" was built up and it is now being worked off. Who knows, but I thank you all for your analysis and recommendations. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Thanks guys! By setting no new work you will create a debt again, and the next time you start the project you will get an overload of Rosetta work and SETI will shut down until the debt is settled. The best thing to do is to set your percentage of Rosetta much lower than SETI. Then the debt issue will not be a factor and Rosetta work will take a back seat to SETI until SETI dries up again. |
robertmiles Joined: 16 Jun 08 Posts: 1233 Credit: 14,324,975 RAC: 3,637 |
Thanks guys! I've tried something similar, and found that if you set some BOINC project to such a low percentage that giving it only that percentage will not allow all the workunits you have already downloaded from that project to complete on time, at least one of those workunits will almost immediately go into high priority mode. Shortening the queue of already downloaded workunits, if appropriate, before making any such change, works better. |
Greg_BE Joined: 30 May 06 Posts: 5691 Credit: 5,859,226 RAC: 0 |
Thanks guys! So then what would happen if he set both projects to no new work, let the tasks clear out, redid his percentages and extra days to what he thinks will work, and then allowed new work to come in? This way he could start clean and let BOINC Manager figure out what to do based on the new parameters. |
robertmiles Joined: 16 Jun 08 Posts: 1233 Credit: 14,324,975 RAC: 3,637 |
Thanks guys! I've tried that also. It can start with an imbalance in the workunits, with the first project that asks for workunits getting more than its share. Generally not as bad an imbalance, though. |
Mod.Sense Volunteer moderator Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
The best way for debt to balance out is to leave it alone. Now that SETI has work coming, the BOINC Manager will figure out what it needs to do to balance the debt and deliver the resource shares you have selected. As you say, there may have been a debt owed to Rosetta. If the two started equal, and SETI work dried up, you get a larger than average pile of Rosetta work. Then SETI comes back with work, and BOINC will figure out that it needs to both complete the tasks it has from Rosetta, and begin getting more from SETI than Rosetta to achieve the desired resource share. All of the adjusting of resource shares, flagging as no new work, etc. is simply making it impossible for BOINC to figure out what you want. Rosetta Moderator: Mod.Sense |
Ed Joined: 2 Aug 11 Posts: 31 Credit: 662,563 RAC: 0 |
Looks like BOINC has finally balanced out, as I am now getting a more normal distribution of processing time between the two projects. Thanks for the recommendations. |
svincent Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
Task 448806358 (T590_cc_rs_stg0_lrlxMultiCst_t000__casp9__aln1_SAVE_ALL_OUT_31304_125_0) failed on Mac ERROR: seqpos >=1 && seqpos <= size() ERROR:: Exit from: src/core/conformation/Conformation.hh line: 268 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish </stderr_txt> ]]> |
svincent Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
A couple of tasks called test_needle* failed in the middle of computation under W7 with the same error message: ERROR: std::abs( coordsys_rot.det() - 1.0 ) < 1e-6 ERROR:: Exit from: ......srccoreposesymmetryutil.cc line: 740 BOINC:: Error reading and gzipping output datafile: default.out called boinc_finish The tasks were 449325696 and 449325673 ------- Another test_needle* task, 449325408, ran for an excessive length of time (>7 hours on a 3 hour preference), generating 2 decoys. The result was valid, but there's a message in the log about an H-bond being tripped. Hbond tripped: [2011- 9-20 10:24:55:] BOINC:: CPU time: 25291.3s, 14400s + 10800s[2011- 9-20 15: 1:52:] :: BOINC InternalDecoyCount: 2 ====================================================== DONE :: 2 starting structures 25291.3 cpu seconds This process generated 2 decoys from 2 attempts ====================================================== called boinc_finish |
alpha Joined: 4 Nov 06 Posts: 27 Credit: 1,550,107 RAC: 0 |
Compute error for work unit 408829805: https://boinc.bakerlab.org/rosetta/result.php?resultid=450197233 The only problem I see is: upload failure: <file_xfer_error> <file_name>1AI8.ppk1.nobb_docking_benchmark_8Sep2011_30843_72_1_0</file_name> <error_code>-131</error_code> </file_xfer_error> |
[AF>france>pas-de-calais]symaski62 Joined: 19 Sep 05 Posts: 47 Credit: 33,871 RAC: 0 |
https://boinc.bakerlab.org/rosetta/result.php?resultid=452608726 Task ID 452608726 Name Aug20_needle_13start_test_SAVE_ALL_OUT__31431_61348_0 <core_client_version>6.12.33</core_client_version> <![CDATA[ <stderr_txt> [2011-10- 1 0:22:16:] :: BOINC:: Initializing ... ok. [2011-10- 1 0:22:16:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev42272.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/Aug20_13start_needle.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... BOINC:: Worker startup. Starting watchdog... Watchdog active. # cpu_run_time_pref: 86400 [2011-10- 1 10:18:19:] :: BOINC:: Initializing ... ok. [2011-10- 1 10:18:19:] :: BOINC :: boinc_init() BOINC:: Setting up shared resources ... ok. BOINC:: Setting up semaphores ... ok. BOINC:: Updating status ... ok. BOINC:: Registering timer callback... ok. BOINC:: Worker initialized successfully. Registering options.. Registered extra options. Initializing broker options ... Registered extra options. Initializing core... 
Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Initializing options.... ok Options::initialize() Options::adding_options() Options::initialize() Check specs. Options::initialize() End reached Loaded options.... ok Processed options.... ok Initializing random generators... ok Initialization complete. Setting WU description ... Unpacking zip data: ../../projects/boinc.bakerlab.org_rosetta/minirosetta_database_rev42272.zip Unpacking WU data ... Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/Aug20_13start_needle.zip Setting database description ... Setting up checkpointing ... Setting up graphics native ... BOINC:: Worker startup. Starting watchdog... Watchdog active. # cpu_run_time_pref: 86400 Continuing computation from checkpoint: chk_S_00008_FragmentSampler__stage1 ... success! ERROR: std::abs( coordsys_rot.det() - 1.0 ) < 1e-6 ERROR:: Exit from: ......srccoreposesymmetryutil.cc line: 740 called boinc_finish </stderr_txt> ]]> |
entigy Joined: 2 Nov 05 Posts: 5 Credit: 990,830 RAC: 0 |
I've just reconnected to Rosetta after some time away, and the 2 units I've completed both have a 'validation error'. Is this going to happen with all the remaining Mini 3.14 units I have? If so, I might as well detach again ...... |
robertmiles Joined: 16 Jun 08 Posts: 1233 Credit: 14,324,975 RAC: 3,637 |
I've just reconnected to Rosetta after some time away, and the 2 units I've completed both have a 'validation error'. My three computers have already been on No New Tasks for Rosetta for weeks, but due to a different 3.14 problem. On some computers, including those, 3.14 workunits tend to crash in a way that does not manage to tell BOINC that the workunit is no longer running and some other workunit can now be started. I'm getting better 3.14 results on RALPH@Home, though, so the developers may be working out a way to change the workunit inputs in a way that gives better results without changing the 3.14 program yet. Therefore, I'd suggest setting Rosetta on No New Tasks for now, but letting the remaining workunits run to see if they will all at least finish properly. |
svincent Joined: 30 Dec 05 Posts: 219 Credit: 12,120,035 RAC: 0 |
Task Aug20_needle_9start_test_SAVE_ALL_OUT__31432_91316_0 (452661954) failed on W7 after taking 7 hours on a 3 hour preference. Watchdog active. Hbond tripped: [2011-10- 5 14: 9:21:] BOINC:: CPU time: 25478.3s, 14400s + 10800s[2011-10- 5 20:46:31:] :: BOINC WARNING! cannot get file size for default.out.gz: could not open file. Output exists: default.out.gz Size: -1 InternalDecoyCount: 0 (GZ) |
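
For anyone wanting to check their own tasks against the figures quoted in this thread, here is a rough Python sketch of the two sanity checks people have been doing by eye: comparing CPU time with elapsed time to spot a task that has stopped getting CPU, and adding up the "14400s + 10800s" pair from the watchdog log line to see how far past that total a task ran. This is not part of BOINC or Rosetta. The function names and the 10% threshold are made up for illustration, and reading the two log terms as a fixed allowance plus the run time preference is an assumption based only on the 3 hour (10800 s) preference mentioned above.

```python
def parse_hms(text: str) -> int:
    """Convert an 'HH:MM:SS' string (as shown in BOINC Manager) to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def cpu_fraction(cpu_time: str, elapsed_time: str) -> float:
    """Fraction of wall-clock time the task actually spent on the CPU."""
    elapsed = parse_hms(elapsed_time)
    if elapsed == 0:
        return 0.0
    return parse_hms(cpu_time) / elapsed


def looks_stalled(cpu_time: str, elapsed_time: str, threshold: float = 0.10) -> bool:
    """Flag a task whose CPU share is below `threshold`.

    The 10% default is an arbitrary illustrative cut-off, not a BOINC setting.
    """
    return cpu_fraction(cpu_time, elapsed_time) < threshold


def watchdog_overrun(cpu_seconds: float, allowance_seconds: float,
                     preference_seconds: float) -> float:
    """Seconds of CPU time beyond the sum of the two terms in the log line.

    Assumption: the log's 'A s + B s' is a fixed allowance plus the user's
    run time preference; only the arithmetic itself is certain.
    """
    return cpu_seconds - (allowance_seconds + preference_seconds)


if __name__ == "__main__":
    # Figures from the stalled workunit reported earlier in the thread.
    share = cpu_fraction("00:24:33", "08:52:22")
    print(f"CPU share: {share:.1%}, stalled: {looks_stalled('00:24:33', '08:52:22')}")

    # Figures from the 'CPU time: 25291.3s, 14400s + 10800s' log line.
    print(f"Overrun: {watchdog_overrun(25291.3, 14400, 10800):.1f} s")
```

With the numbers from the stalled workunit, the CPU share comes out under 5% of elapsed time, which is why it stands out. A task that is merely suspended or waiting its turn would also show a low share, so treat a flag as a prompt to look at the task, not as proof it has hung.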
©2024 University of Washington
https://www.bakerlab.org