Problems and Technical Issues with Rosetta@home

Mr P Hucker · Joined: 12 Aug 06 · Posts: 1602 · Credit: 13,010,866 · RAC: 338
Message 98994 - Posted: 13 Sep 2020, 18:02:21 UTC - in response to Message 98989. Last modified: 13 Sep 2020, 18:03:11 UTC
Quoted: Is this expected?
Quoted: Several users have reported fold_and_dock tasks trying (and usually eventually failing) to allocate tens of gigabytes of memory. If you have a vast amount of swap space they might be able to complete, but you’re probably as well just aborting them and doing something else instead.
Don't most people have way more than 10s of GB of swap space? Most people have their OS set to the default of using whatever is required. One of my machines has 2TB free, two have 1TB free, and the other three have about 200GB free. I assume those tasks would have a shot at running on my machines. I don't know if I've had any, and may not have noticed. I only notice if something isn't using the processor very well (the CPU time and wall time are drastically different in Boinctasks), or if it's crashing (the task changes colour and shows "computation error").

Brian Nixon · Joined: 12 Apr 20 · Posts: 293 · Credit: 8,432,366 · RAC: 0
Message 98996 - Posted: 13 Sep 2020, 19:21:12 UTC - in response to Message 98994. Last modified: 13 Sep 2020, 19:55:46 UTC
What’s not clear is whether those tasks legitimately need all the memory they’re allocating (in which case they should run to completion on a system with sufficient resources) – or whether it’s a bug, and the tasks will simply keep requesting more memory (however much the system makes available) until they fail.
I disable the paging file on every machine I operate. If that causes anything to fail to run, I need more RAM!

robertmiles · Joined: 16 Jun 08 · Posts: 1262 · Credit: 14,421,737 · RAC: 0
Message 98997 - Posted: 13 Sep 2020, 19:41:11 UTC - in response to Message 98994.
Quoted: Is this expected?
Quoted: Several users have reported fold_and_dock tasks trying (and usually eventually failing) to allocate tens of gigabytes of memory. If you have a vast amount of swap space they might be able to complete, but you’re probably as well just aborting them and doing something else instead.
Quoted: Don't most people have way more than 10s of GB of swap space? Most people have their OS set to the default of using whatever is required. One of my machines has 2TB free, two have 1TB free, and the other three have about 200GB free. I assume those tasks would have a shot at running on my machines. I don't know if I've had any, and may not have noticed. I only notice if something isn't using the processor very well (the CPU time and wall time are drastically different in Boinctasks), or if it's crashing (the task changes colour and shows "computation error").
You appear to be assuming that the available swap space is all of the unused space on the hard drive. I doubt if this is correct.
However, I just tried checking my Windows 10 computer, and it appears that Windows 10 no longer uses anything called swap space. I remember that some previous versions of Windows had a reserved area on the hard drive known as swap space,

Brian Nixon · Joined: 12 Apr 20 · Posts: 293 · Credit: 8,432,366 · RAC: 0
Message 98998 - Posted: 13 Sep 2020, 19:55:30 UTC - in response to Message 98997. Last modified: 13 Sep 2020, 19:58:09 UTC
‘Swap space’ is the name for disk-space-as-memory in UNIX-family operating systems. Windows calls it a ‘paging file’, and it’s still very much present in Windows 10. (Control Panel > System > Advanced system settings > Advanced tab > Performance group > Advanced tab > Virtual memory group)
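On a Linux host, the configured swap can be checked directly rather than inferred from free disk space. The following is a minimal sketch, assuming the usual `/proc/meminfo` layout of `Name: value kB` lines; the helper names `parse_meminfo` and `swap_summary` and the `SAMPLE` figures are made up for illustration and do not come from any machine in this thread.

```python
# Hypothetical sample in /proc/meminfo format; the numbers are invented.
SAMPLE = """MemTotal:       16308044 kB
MemFree:         1203456 kB
SwapTotal:       2097148 kB
SwapFree:        1572860 kB
"""


def parse_meminfo(text):
    """Map each 'Name: value kB' line to an integer value (kiB for kB fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a 'Name: value' line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_summary(text):
    """Return total, free and used swap in kiB from meminfo-style text."""
    info = parse_meminfo(text)
    total = info.get("SwapTotal", 0)
    free = info.get("SwapFree", 0)
    return {"total_kib": total, "free_kib": free, "used_kib": total - free}


# On a real Linux system, pass open("/proc/meminfo").read() instead of SAMPLE.
print(swap_summary(SAMPLE))
```

A host with the paging file or swap disabled would simply report a total of 0 here, which is the configuration where a runaway allocation fails soonest.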

Mr P Hucker · Joined: 12 Aug 06 · Posts: 1602 · Credit: 13,010,866 · RAC: 338
Message 98999 - Posted: 13 Sep 2020, 19:56:07 UTC - in response to Message 98996.
Quoted: What’s not clear is whether those tasks legitimately need all the memory they’re allocating (in which case they should run to completion on a system with sufficient resources) – or whether it’s a bug, and the tasks will simply keep requesting more memory (however much the system makes available) until they fail.
Quoted: I disable the page file on every machine I operate. If that causes anything to fail to run, I need more RAM!
If you have an SSD, paging isn't as bad as it used to be with those rust spinners, where if it started being used, the interface became so slow you weren't sure if what you clicked was being done slowly or it hadn't noticed, so you click again then it ends up loading 5 of them slowing it down even further, and you spend so much time waiting for the computer you have no time to go to work to earn money to buy more RAM.

Mr P Hucker · Joined: 12 Aug 06 · Posts: 1602 · Credit: 13,010,866 · RAC: 338
Message 99000 - Posted: 13 Sep 2020, 19:58:08 UTC - in response to Message 98997.
Quoted: Is this expected?
Quoted: Several users have reported fold_and_dock tasks trying (and usually eventually failing) to allocate tens of gigabytes of memory. If you have a vast amount of swap space they might be able to complete, but you’re probably as well just aborting them and doing something else instead.
Quoted: Don't most people have way more than 10s of GB of swap space? Most people have their OS set to the default of using whatever is required. One of my machines has 2TB free, two have 1TB free, and the other three have about 200GB free. I assume those tasks would have a shot at running on my machines. I don't know if I've had any, and may not have noticed. I only notice if something isn't using the processor very well (the CPU time and wall time are drastically different in Boinctasks), or if it's crashing (the task changes colour and shows "computation error").
Quoted: You appear to be assuming that the available swap space is all of the unused space on the hard drive. I doubt if this is correct. However, I just tried checking my Windows 10 computer, and it appears that Windows 10 no longer uses anything called swap space. I remember that some previous versions of Windows had a reserved area on the hard drive known as swap space,
On this Windows 10 computer (and all of them unless you change it), paging is automatic. It will increase the paging file if it needs to, but doesn't reserve a chunk of disk for it.

robertmiles · Joined: 16 Jun 08 · Posts: 1262 · Credit: 14,421,737 · RAC: 0
Message 99001 - Posted: 13 Sep 2020, 22:52:17 UTC - in response to Message 98998.
Quoted: ‘Swap space’ is the name for disk-space-as-memory in UNIX-family operating systems. Windows calls it a ‘paging file’, and it’s still very much present in Windows 10. (Control Panel > System > Advanced system settings > Advanced tab > Performance group > Advanced tab > Virtual memory group)
Thanks. Windows 10 made it hard to find the right control panel if you haven't used it lately! Try Windows System / System and Security / System for the first few steps.
About 4.8 GB currently on my computer.

Mr P Hucker · Joined: 12 Aug 06 · Posts: 1602 · Credit: 13,010,866 · RAC: 338
Message 99002 - Posted: 13 Sep 2020, 23:11:16 UTC - in response to Message 99001. Last modified: 13 Sep 2020, 23:12:02 UTC
Quoted: ‘Swap space’ is the name for disk-space-as-memory in UNIX-family operating systems. Windows calls it a ‘paging file’, and it’s still very much present in Windows 10. (Control Panel > System > Advanced system settings > Advanced tab > Performance group > Advanced tab > Virtual memory group)
Quoted: Thanks. Windows 10 made it hard to find the right control panel if you haven't used it lately! Try Windows System / System and Security / System for the first few steps. About 4.8 GB currently on my computer.
Not sure why Microsoft has split all the controls up and put them in stupid places that only they think make sense. I often have to use the search to find the simplest of things. I want the old control panel with a sensible list. It's usually actually quicker to google the problem.
That 4.8GB should automatically increase as required though. At least we never get "out of memory" messages like in very early Windows.

Aravah · Joined: 12 Apr 20 · Posts: 6 · Credit: 1,101,172 · RAC: 0
Message 99006 - Posted: 14 Sep 2020, 22:49:50 UTC
Thanks to all who replied to my previous message. I aborted that task and all has returned to normal :)
Except I notice some tasks are awarded very few credits for the same Gflops as other tasks that are given an order of magnitude more. For example, just 35.12 credits for the first one but 326.58 for the second. Is it to do with whether you're reporting the task result first or second?
Name: rb_09_10_37482_36690__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_1010270_821_0
Workunit: 1127882972
Created: 12 Sep 2020, 0:02:14 UTC
Sent: 12 Sep 2020, 0:07:00 UTC
Report deadline: 15 Sep 2020, 0:07:00 UTC
Received: 14 Sep 2020, 4:25:44 UTC
Server state: Over
Outcome: Success
Client state: Done
Exit status: 0 (0x00000000)
Computer ID: 4466108
Run time: 12 hours 55 min 41 sec
CPU time: 11 hours 4 min
Validate state: Valid
Credit: 35.12
Device peak FLOPS: 3.09 GFLOPS
Application version: Rosetta v4.20 x86_64-pc-linux-gnu
Peak working set size: 1,135.22 MB
Peak swap size: 1,276.45 MB
Peak disk usage: 29.31 MB
Name: pdl1_graft_v1_SAVE_ALL_OUT_IGNORE_THE_REST_8ae9bz6m_1009438_2_0
Workunit: 1127762369
Created: 11 Sep 2020, 18:09:26 UTC
Sent: 11 Sep 2020, 19:34:11 UTC
Report deadline: 14 Sep 2020, 19:34:11 UTC
Received: 14 Sep 2020, 2:58:16 UTC
Server state: Over
Outcome: Success
Client state: Done
Exit status: 0 (0x00000000)
Computer ID: 4466108
Run time: 11 hours 49 min 45 sec
CPU time: 9 hours 59 min 46 sec
Validate state: Valid
Credit: 326.58
Device peak FLOPS: 3.09 GFLOPS
Application version: Rosetta v4.20 x86_64-pc-linux-gnu
Peak working set size: 1,010.59 MB
Peak swap size: 1,143.78 MB
Peak disk usage: 30.09 MB
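To put the gap between those two tasks in numbers, here is a rough sketch that works out credit per CPU-hour and the CPU-time to run-time ratio from the figures reported above. The short labels `rb_09_10` and `pdl1_graft` and the helper `to_hours` are illustrative names only, and the first task's CPU time is taken as 11 h 4 min 0 s because no seconds were listed.

```python
def to_hours(hours=0, minutes=0, seconds=0):
    """Convert an h/m/s duration to fractional hours."""
    return hours + minutes / 60 + seconds / 3600


# Figures copied from the two task reports; the dict keys are shortened labels.
tasks = {
    "rb_09_10": {
        "run_h": to_hours(12, 55, 41),
        "cpu_h": to_hours(11, 4, 0),  # seconds not reported, assumed 0
        "credit": 35.12,
    },
    "pdl1_graft": {
        "run_h": to_hours(11, 49, 45),
        "cpu_h": to_hours(9, 59, 46),
        "credit": 326.58,
    },
}

# Credit earned per hour of CPU time, and how much of the wall time was CPU time.
rates = {name: t["credit"] / t["cpu_h"] for name, t in tasks.items()}
cpu_share = {name: t["cpu_h"] / t["run_h"] for name, t in tasks.items()}
ratio = rates["pdl1_graft"] / rates["rb_09_10"]

for name in tasks:
    print(f"{name}: {rates[name]:.2f} credit/CPU-hour, CPU share {cpu_share[name]:.0%}")
print(f"pdl1_graft earned about {ratio:.1f}x more credit per CPU-hour")
```

Both tasks spent a similar share of their run time on the CPU, so the difference in credit is not explained by one of them being starved of processor time.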

Brian Nixon · Joined: 12 Apr 20 · Posts: 293 · Credit: 8,432,366 · RAC: 0
Message 99007 - Posted: 14 Sep 2020, 23:18:55 UTC - in response to Message 99006.
Rosetta tasks are almost never sent to more than one computer.
The low credit one looks like a bug which crops up occasionally: somehow the task ran more than once, and only the last (very short) run received credit, even though it was the first run that did almost all the work.

Mr P Hucker · Joined: 12 Aug 06 · Posts: 1602 · Credit: 13,010,866 · RAC: 338
Message 99009 - Posted: 15 Sep 2020, 10:35:23 UTC - in response to Message 99007.
Quoted: Rosetta tasks are almost never sent to more than one computer. The low credit one looks like a bug which crops up occasionally: somehow the task ran more than once, and only the last (very short) run received credit, even though it was the first run that did almost all the work.
Is this just a credit bug? Is the data from the big first run still saved?

Brian Nixon · Joined: 12 Apr 20 · Posts: 293 · Credit: 8,432,366 · RAC: 0
Message 99010 - Posted: 15 Sep 2020, 11:53:13 UTC - in response to Message 99009.
Good question. I don’t know whether the results from the second run are added to, or overwrite, those from the first.

Breno · Joined: 8 Apr 20 · Posts: 31 · Credit: 14,927,335 · RAC: 1,250
Message 99012 - Posted: 15 Sep 2020, 16:46:43 UTC
I don't really know if this is a persistent issue. I noticed today that some finished foldit WUs stopped at 45.78% while uploading their results. They allow re-attempts to send the results, but retrying manually gives a very slow upload even without any upload transfer limits. Anyway, just thought team R@h should know.

Aravah · Joined: 12 Apr 20 · Posts: 6 · Credit: 1,101,172 · RAC: 0
Message 99013 - Posted: 15 Sep 2020, 19:39:22 UTC - in response to Message 99006.
Thanks for your feedback. For clarification, the two jobs are, as far as I can tell, unrelated but have very different credit scores. I did not mean to imply they were first and second runs of the same task. I know some projects send out jobs to multiple computers; I did not know if Rosetta was one of these.

Aravah · Joined: 12 Apr 20 · Posts: 6 · Credit: 1,101,172 · RAC: 0
Message 99014 - Posted: 15 Sep 2020, 19:39:24 UTC - in response to Message 99006. Last modified: 15 Sep 2020, 19:40:31 UTC
tx

bormolino · Joined: 16 May 13 · Posts: 4 · Credit: 160,977 · RAC: 0
Message 99022 - Posted: 16 Sep 2020, 18:10:43 UTC
I'm still having issues with the graphics on Ubuntu 18.04. It shows "No shared mem".

Falconet · Joined: 9 Mar 09 · Posts: 355 · Credit: 1,669,337 · RAC: 396
Message 99061 - Posted: 20 Sep 2020, 15:43:12 UTC. Last modified: 20 Sep 2020, 15:57:29 UTC
Seeing some tasks with a large log with these lines:
AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS.
[ ERROR ]: Caught exception:
File: ......srcprotocolsmotif_graftingmoversMotifGraftMover.cc:537
For this scaffold there are not suitable scaffold grafts within your constrains
------------------------ Begin developer's backtrace -------------------------
BACKTRACE:
------------------------- End developer's backtrace --------------------------
AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS.
https://boinc.bakerlab.org/rosetta/result.php?resultid=1263012644
https://boinc.bakerlab.org/rosetta/result.php?resultid=1263011946
In those examples, one had the number of decoys at the end and the other one didn't. They are validating but I sure hope this isn't wasted electricity.
epcam_breaker_graft_v1_SAVE_ALL_OUT_IGNORE_THE_REST_2lt3jd5h_1009432_4_0
pdl1_graft_v1_SAVE_ALL_OUT_IGNORE_THE_REST_7es6gq8a_1009506_4_0
EDIT: On another PC
https://boinc.bakerlab.org/rosetta/result.php?resultid=1262076666
https://boinc.bakerlab.org/rosetta/result.php?resultid=1262076045
Seems limited to epcam_breaker and pdl1_graft tasks from what I can tell.

Detto · Joined: 10 Apr 20 · Posts: 2 · Credit: 788,565 · RAC: 0
Message 99062 - Posted: 20 Sep 2020, 17:46:33 UTC
For the 3rd time since April I only got 3 credits for a work unit:
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1130847248
Any insights?

Grant (SSSF) · Joined: 28 Mar 20 · Posts: 1925 · Credit: 18,534,891 · RAC: 0
Message 99063 - Posted: 20 Sep 2020, 18:52:22 UTC - in response to Message 99062.
Quoted: For the 3rd time since April I only got 3 credits for a work unit : https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1130847248 any insights?
Nope.
The system completed a Task of exactly the same type without issue. Is there only 1 instance of BOINC running on that system? The difference between CPU time and Runtime indicates the system is doing a fair bit of work other than processing BOINC Tasks, but it's nowhere near as big a difference as other systems that aren't having low Credit issues.
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<stderr_txt>
command: rosetta_4.20_x86_64-apple-darwin -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers.index -in:file:native 00001.pdb -silent_gz 1 -frag9 00001.200.9mers.index -out:file:silent default.out -ex1 1 -abinitio::rsd_wt_loop 0.5 -relax::default_repeats 5 -abinitio::use_filters false -abinitio::increase_cycles 10 -abinitio::rsd_wt_helix 0.5 -beta 1 -abinitio::rg_reweight 0.5 -in:file:boinc_wu_zip Norn_pssm_struct_profile_layered_design_less_IVYW_wt1_091_c1__0.05_0018_8ffb15d87a6b0ee88cff77a7acba3bea_BJH8LOZG_data.zip -out:file:silent default.out -silent_gz -mute all -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 2711465
Using database: database_357d5d93529_n_methyl/minirosetta_database
======================================================
DONE :: 1 starting structures 28683.4 cpu seconds
This process generated 133 decoys from 133 attempts
======================================================
BOINC :: WS_max 4.60468e+08
11:45:31 (22004): called boinc_finish(0)
</stderr_txt>
]]>
<core_client_version>7.16.6</core_client_version>
<![CDATA[
<stderr_txt>
command: rosetta_4.20_x86_64-apple-darwin -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers.index -in:file:native 00001.pdb -silent_gz 1 -frag9 00001.200.9mers.index -out:file:silent default.out -ex1 1 -abinitio::rsd_wt_loop 0.5 -relax::default_repeats 5 -abinitio::use_filters false -abinitio::increase_cycles 10 -abinitio::rsd_wt_helix 0.5 -beta 1 -abinitio::rg_reweight 0.5 -in:file:boinc_wu_zip Norn_pssm_plus_struct_profile_091_c1_barrel6_3_c0851511e59b6__0.05_0009_8c3dcb16fc078c91ae0a41d4b95a66fc_M0Z4KTXS_data.zip -out:file:silent default.out -silent_gz -mute all -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 2841497
Using database: database_357d5d93529_n_methyl/minirosetta_database
======================================================
DONE :: 1 starting structures 28729.4 cpu seconds
This process generated 130 decoys from 130 attempts
======================================================
BOINC :: WS_max 4.43654e+08
19:49:09 (1228): called boinc_finish(0)
======================================================
DONE :: 1 starting structures 28916.2 cpu seconds
This process generated 1 decoys from 1 attempts
======================================================
BOINC :: WS_max 2.25206e+08
19:58:00 (3178): called boinc_finish(0)
</stderr_txt>
]]>
It's the usual cause: the Task finished, and yet continued to run and produced one more Decoy from another starting structure. That processing time was added to the earlier processing time, but that final Decoy wasn't added to the previous Decoys produced, so Credit was granted based on that one Decoy, and none of the previous work.
Grant
Darwin NT
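The symptom described above (a second "DONE" block after the task had already finished) can be spotted mechanically in a saved stderr log. This is only a sketch: it assumes the summary line always has the form seen in the two logs above, the names `decoy_runs` and `looks_rerun` are hypothetical, and it says nothing about how the server actually computes credit.

```python
import re

# Matches the per-run summary line as it appears in the stderr output above.
DECOY_RE = re.compile(r"This process generated (\d+) decoys from (\d+) attempts")


def decoy_runs(stderr_text):
    """Return the decoy count reported by each completed run in the log."""
    return [int(m.group(1)) for m in DECOY_RE.finditer(stderr_text)]


def looks_rerun(stderr_text):
    """True when the log holds more than one run summary for a single task."""
    return len(decoy_runs(stderr_text)) > 1


# Excerpts of the two logs quoted above, trimmed to the relevant lines.
ONE_RUN = """DONE :: 1 starting structures 28683.4 cpu seconds
This process generated 133 decoys from 133 attempts
11:45:31 (22004): called boinc_finish(0)
"""

TWO_RUNS = """DONE :: 1 starting structures 28729.4 cpu seconds
This process generated 130 decoys from 130 attempts
19:49:09 (1228): called boinc_finish(0)
DONE :: 1 starting structures 28916.2 cpu seconds
This process generated 1 decoys from 1 attempts
19:58:00 (3178): called boinc_finish(0)
"""

for label, text in (("one run", ONE_RUN), ("two runs", TWO_RUNS)):
    print(label, decoy_runs(text), "rerun suspected:", looks_rerun(text))
```

In the second log the last run reports a single decoy, which fits the explanation that credit was based on that final run alone.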

Falconet · Joined: 9 Mar 09 · Posts: 355 · Credit: 1,669,337 · RAC: 396
Message 99064 - Posted: 20 Sep 2020, 18:56:41 UTC - in response to Message 99062. Last modified: 20 Sep 2020, 18:59:36 UTC
DELETED