Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,743,381 RAC: 15,221 |
"Is this expected?" "Several users have reported fold_and_dock tasks trying (and usually eventually failing) to allocate tens of gigabytes of memory. If you have a vast amount of swap space they might be able to complete, but you’re probably as well just aborting them and doing something else instead." Don't most people have way more than 10s of GB of swap space? Most people have their OS set to the default of using whatever is required. One of my machines has 2TB free, two have 1TB free, and the other three have about 200GB free. I assume those tasks would have a shot at running on my machines. I don't know if I've had any, and may not have noticed. I only notice if something isn't using the processor very well (the CPU time and wall time are drastically different in Boinctasks), or if it's crashing (the task changes colour and shows "computation error"). |
Brian Nixon Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
What’s not clear is whether those tasks legitimately need all the memory they’re allocating (in which case they should run to completion on a system with sufficient resources) – or whether it’s a bug, and the tasks will simply keep requesting more memory (however much the system makes available) until they fail. I disable the paging file on every machine I operate. If that causes anything to fail to run, I need more RAM! |
Joined: 16 Jun 08 Posts: 1235 Credit: 14,372,156 RAC: 211 |
"Is this expected?" "Several users have reported fold_and_dock tasks trying (and usually eventually failing) to allocate tens of gigabytes of memory. If you have a vast amount of swap space they might be able to complete, but you’re probably as well just aborting them and doing something else instead." You appear to be assuming that the available swap space is all of the unused space on the hard drive. I doubt if this is correct. However, I just tried checking my Windows 10 computer, and it appears that Windows 10 no longer uses anything called swap space. I remember that some previous versions of Windows had a reserved area on the hard drive known as swap space, |
Brian Nixon Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
‘Swap space’ is the name for disk-space-as-memory in UNIX-family operating systems. Windows calls it a ‘paging file’, and it’s still very much present in Windows 10. (Control Panel > System > Advanced system settings > Advanced tab > Performance group > Advanced tab > Virtual memory group) |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,743,381 RAC: 15,221 |
"What’s not clear is whether those tasks legitimately need all the memory they’re allocating (in which case they should run to completion on a system with sufficient resources) – or whether it’s a bug, and the tasks will simply keep requesting more memory (however much the system makes available) until they fail." If you have an SSD, paging isn't as bad as it used to be with those rust spinners, where if it started being used, the interface became so slow you weren't sure if what you clicked was being done slowly or it hadn't noticed, so you click again then it ends up loading 5 of them slowing it down even further, and you spend so much time waiting for the computer you have no time to go to work to earn money to buy more RAM. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,743,381 RAC: 15,221 |
"Is this expected?" "Several users have reported fold_and_dock tasks trying (and usually eventually failing) to allocate tens of gigabytes of memory. If you have a vast amount of swap space they might be able to complete, but you’re probably as well just aborting them and doing something else instead." On this Windows 10 computer (and all of them unless you change it), paging is automatic. It will increase it if it needs it, but doesn't reserve a chunk of disk for it. |
Joined: 16 Jun 08 Posts: 1235 Credit: 14,372,156 RAC: 211 |
"‘Swap space’ is the name for disk-space-as-memory in UNIX-family operating systems." Thanks. Windows 10 made it hard to find the right control panel if you haven't used it lately! Try Windows System / System and Security / System for the first few steps. About 4.8 GB currently on my computer. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,743,381 RAC: 15,221 |
"‘Swap space’ is the name for disk-space-as-memory in UNIX-family operating systems." Not sure why Microsoft has split all the controls up and put them in stupid places that only they think make sense. I often have to use the search to find the simplest of things. I want the old control panel with a sensible list. It's usually actually quicker to google the problem. That 4.8GB should automatically increase as required though. At least we never get "out of memory" messages like in very early Windows. |
Aravah Joined: 12 Apr 20 Posts: 6 Credit: 1,101,172 RAC: 0 |
Thanks to all who replied to my previous message. I aborted that task and all has returned to normal :) Except I notice some tasks are awarded very few credits for the same GFLOPS as other tasks that are given an order of magnitude more. For example, just 35.12 credits for the first one but 326.58 for the second. Is it to do with whether you're reporting the task result first or second? First task: Name: rb_09_10_37482_36690__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_1010270_821_0; Workunit: 1127882972; Created: 12 Sep 2020, 0:02:14 UTC; Sent: 12 Sep 2020, 0:07:00 UTC; Report deadline: 15 Sep 2020, 0:07:00 UTC; Received: 14 Sep 2020, 4:25:44 UTC; Server state: Over; Outcome: Success; Client state: Done; Exit status: 0 (0x00000000); Computer ID: 4466108; Run time: 12 hours 55 min 41 sec; CPU time: 11 hours 4 min; Validate state: Valid; Credit: 35.12; Device peak FLOPS: 3.09 GFLOPS; Application version: Rosetta v4.20 x86_64-pc-linux-gnu; Peak working set size: 1,135.22 MB; Peak swap size: 1,276.45 MB; Peak disk usage: 29.31 MB. Second task: Name: pdl1_graft_v1_SAVE_ALL_OUT_IGNORE_THE_REST_8ae9bz6m_1009438_2_0; Workunit: 1127762369; Created: 11 Sep 2020, 18:09:26 UTC; Sent: 11 Sep 2020, 19:34:11 UTC; Report deadline: 14 Sep 2020, 19:34:11 UTC; Received: 14 Sep 2020, 2:58:16 UTC; Server state: Over; Outcome: Success; Client state: Done; Exit status: 0 (0x00000000); Computer ID: 4466108; Run time: 11 hours 49 min 45 sec; CPU time: 9 hours 59 min 46 sec; Validate state: Valid; Credit: 326.58; Device peak FLOPS: 3.09 GFLOPS; Application version: Rosetta v4.20 x86_64-pc-linux-gnu; Peak working set size: 1,010.59 MB; Peak swap size: 1,143.78 MB; Peak disk usage: 30.09 MB. |
Brian Nixon Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
Rosetta tasks are almost never sent to more than one computer. The low credit one looks like a bug which crops up occasionally: somehow the task ran more than once, and only the last (very short) run received credit, even though it was the first run that did almost all the work. |
Mr P Hucker Joined: 12 Aug 06 Posts: 1600 Credit: 12,743,381 RAC: 15,221 |
"Rosetta tasks are almost never sent to more than one computer. The low credit one looks like a bug which crops up occasionally: somehow the task ran more than once, and only the last (very short) run received credit, even though it was the first run that did almost all the work." Is this just a credit bug? Is the data from the big first run still saved? |
Brian Nixon Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
Good question. I don’t know whether the results from the second run are added to, or overwrite, those from the first. |
Joined: 8 Apr 20 Posts: 30 Credit: 12,995,617 RAC: 1,024 |
I don't really know if this is a persistent issue. I noticed today that some finished foldit WUs stopped at 45.78% while uploading their results. They allow re-attempts to send the results, but retrying manually causes a very slow upload even without any upload transfer limits. Anyway, just thought team R@h should know. |
Aravah Joined: 12 Apr 20 Posts: 6 Credit: 1,101,172 RAC: 0 |
Thanks for your feedback. For clarification, the two jobs are, as far as I can tell, unrelated but have very different credit scores. I did not mean to imply they were first and second runs of the same task. I know some projects send out jobs to multiple computers; I did not know if Rosetta was one of these. |
Aravah Joined: 12 Apr 20 Posts: 6 Credit: 1,101,172 RAC: 0 |
tx |
Joined: 16 May 13 Posts: 4 Credit: 160,977 RAC: 0 |
I'm still having issues with the graphics on Ubuntu 18.04. It shows "No shared mem". |
Falconet Joined: 9 Mar 09 Posts: 354 Credit: 1,302,201 RAC: 1,069 |
Seeing some tasks with a large log with these lines: AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS. [ ERROR ]: Caught exception: File: ......srcprotocolsmotif_graftingmoversMotifGraftMover.cc:537 For this scaffold there are not suitable scaffold grafts within your constrains ------------------------ Begin developer's backtrace ------------------------- BACKTRACE: ------------------------- End developer's backtrace -------------------------- AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS. https://boinc.bakerlab.org/rosetta/result.php?resultid=1263012644 https://boinc.bakerlab.org/rosetta/result.php?resultid=1263011946 In those examples, one had the number of decoys at the end and the other one didn't. They are validating but I sure hope this isn't wasted electricity. epcam_breaker_graft_v1_SAVE_ALL_OUT_IGNORE_THE_REST_2lt3jd5h_1009432_4_0 pdl1_graft_v1_SAVE_ALL_OUT_IGNORE_THE_REST_7es6gq8a_1009506_4_0 EDIT: On another PC https://boinc.bakerlab.org/rosetta/result.php?resultid=1262076666 https://boinc.bakerlab.org/rosetta/result.php?resultid=1262076045 Seems limited to epcam_breaker and pdl1_graft tasks from what I can tell. |
Detto Joined: 10 Apr 20 Posts: 2 Credit: 788,565 RAC: 0 |
For the 3rd time since April I only got 3 credits for a work unit: https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1130847248 Any insights? |
Joined: 28 Mar 20 Posts: 1766 Credit: 18,534,891 RAC: 65 |
"For the 3rd time since April I only got 3 credits for a work unit :" Nope. The system completed a Task of exactly the same type without issue. Is there only 1 instance of BOINC running on that system? The difference between CPU time and Runtime indicates the system is doing a fair bit of work other than processing BOINC Tasks, but it's nowhere near as big a difference as other systems that aren't having low Credit issues. <core_client_version>7.16.11</core_client_version> <![CDATA[ <stderr_txt> command: rosetta_4.20_x86_64-apple-darwin -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers.index -in:file:native 00001.pdb -silent_gz 1 -frag9 00001.200.9mers.index -out:file:silent default.out -ex1 1 -abinitio::rsd_wt_loop 0.5 -relax::default_repeats 5 -abinitio::use_filters false -abinitio::increase_cycles 10 -abinitio::rsd_wt_helix 0.5 -beta 1 -abinitio::rg_reweight 0.5 -in:file:boinc_wu_zip Norn_pssm_struct_profile_layered_design_less_IVYW_wt1_091_c1__0.05_0018_8ffb15d87a6b0ee88cff77a7acba3bea_BJH8LOZG_data.zip -out:file:silent default.out -silent_gz -mute all -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 2711465 Using database: database_357d5d93529_n_methyl/minirosetta_database ====================================================== DONE :: 1 starting structures 28683.4 cpu seconds This process generated 133 decoys from 133 attempts ====================================================== BOINC :: WS_max 4.60468e+08 11:45:31 (22004): called boinc_finish(0) </stderr_txt> ]]> <core_client_version>7.16.6</core_client_version> <![CDATA[ <stderr_txt> command: rosetta_4.20_x86_64-apple-darwin -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers.index -in:file:native 00001.pdb -silent_gz 1 -frag9 00001.200.9mers.index -out:file:silent default.out -ex1 1 -abinitio::rsd_wt_loop 0.5 -relax::default_repeats 5 -abinitio::use_filters false -abinitio::increase_cycles 10 -abinitio::rsd_wt_helix 0.5 -beta 1 -abinitio::rg_reweight 0.5 -in:file:boinc_wu_zip Norn_pssm_plus_struct_profile_091_c1_barrel6_3_c0851511e59b6__0.05_0009_8c3dcb16fc078c91ae0a41d4b95a66fc_M0Z4KTXS_data.zip -out:file:silent default.out -silent_gz -mute all -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 2841497 Using database: database_357d5d93529_n_methyl/minirosetta_database ====================================================== DONE :: 1 starting structures 28729.4 cpu seconds This process generated 130 decoys from 130 attempts ====================================================== BOINC :: WS_max 4.43654e+08 19:49:09 (1228): called boinc_finish(0) ====================================================== DONE :: 1 starting structures 28916.2 cpu seconds This process generated 1 decoys from 1 attempts ====================================================== BOINC :: WS_max 2.25206e+08 19:58:00 (3178): called boinc_finish(0) </stderr_txt> ]]> It's the usual cause: the Task finished, and yet continued to run and produced one more Decoy from another starting structure. That processing time was added to the earlier processing time, but that final Decoy wasn't added to the previous Decoys produced, so Credit was granted based on that one Decoy, and none of the previous work. Grant Darwin NT |
Falconet Joined: 9 Mar 09 Posts: 354 Credit: 1,302,201 RAC: 1,069 |
DELETED |
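
Two of the credit questions raised in this thread can be checked with a few lines of Python. What follows is a minimal sketch, not anything the project itself provides. The function names (`find_runs`, `looks_like_rerun`, `credit_per_cpu_hour`) are hypothetical, the regular expression only assumes the summary wording seen in the stderr logs quoted above, and the first task's CPU time is listed without seconds, so it is assumed here to be 0 seconds.

```python
import re

# Summary block written to stderr at the end of a run, in the wording
# quoted in this thread. Whitespace is matched loosely because the real
# log spreads this text over several lines.
SUMMARY = re.compile(
    r"DONE\s*::\s*(\d+)\s+starting structures\s+(\d+(?:\.\d+)?)\s+cpu seconds\s+"
    r"This process generated\s+(\d+)\s+decoys from\s+(\d+)\s+attempts"
)


def find_runs(stderr_text):
    """Return one dict per run summary found in the stderr text, in order."""
    return [
        {
            "cpu_seconds": float(m.group(2)),
            "decoys": int(m.group(3)),
            "attempts": int(m.group(4)),
        }
        for m in SUMMARY.finditer(stderr_text)
    ]


def looks_like_rerun(stderr_text):
    """True if the log has several summaries and the last reports fewer decoys than the first."""
    runs = find_runs(stderr_text)
    return len(runs) > 1 and runs[-1]["decoys"] < runs[0]["decoys"]


def credit_per_cpu_hour(credit, hours, minutes, seconds=0):
    """Normalise granted credit by CPU time so tasks of different length can be compared."""
    cpu_hours = hours + minutes / 60 + seconds / 3600
    return credit / cpu_hours


# Summary lines copied from the two stderr logs quoted in the thread.
NORMAL_LOG = (
    "DONE :: 1 starting structures 28683.4 cpu seconds "
    "This process generated 133 decoys from 133 attempts"
)
RERUN_LOG = (
    "DONE :: 1 starting structures 28729.4 cpu seconds "
    "This process generated 130 decoys from 130 attempts "
    "BOINC :: WS_max 4.43654e+08 19:49:09 (1228): called boinc_finish(0) "
    "DONE :: 1 starting structures 28916.2 cpu seconds "
    "This process generated 1 decoys from 1 attempts"
)

print("normal log flagged:", looks_like_rerun(NORMAL_LOG))
print("rerun log flagged:", looks_like_rerun(RERUN_LOG))

# Figures from the two unrelated tasks posted earlier in the thread.
rb_task = credit_per_cpu_hour(35.12, 11, 4)          # rb_09_10_... task, seconds assumed 0
pdl1_task = credit_per_cpu_hour(326.58, 9, 59, 46)   # pdl1_graft_v1_... task
print(f"credits per CPU hour: {rb_task:.2f} vs {pdl1_task:.2f}")
```

A log with a second, much smaller summary block matches the pattern described in the reply signed Grant, where only the last decoy count is used for credit. The per-hour figures for the two unrelated tasks differ by roughly a factor of ten, which is the gap the original poster pointed out.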
©2025 University of Washington
https://www.bakerlab.org