Problems and Technical Issues with Rosetta@home

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,743,417
RAC: 15,187
Message 98994 - Posted: 13 Sep 2020, 18:02:21 UTC - in response to Message 98989.  
Last modified: 13 Sep 2020, 18:03:11 UTC

Is this expected?
Several users have reported fold_and_dock tasks trying (and usually eventually failing) to allocate tens of gigabytes of memory. If you have a vast amount of swap space they might be able to complete, but you’re probably as well just aborting them and doing something else instead.


Don't most people have way more than 10s of GB of swap space? Most people have their OS set to the default of using whatever is required. One of my machines has 2TB free, two have 1TB free, and the other three have about 200GB free. I assume those tasks would have a shot at running on my machines. I don't know if I've had any, and may not have noticed. I only notice if something isn't using the processor very well (the CPU time and wall time are drastically different in Boinctasks), or if it's crashing (the task changes colour and shows "computation error").
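
The check described here (CPU time falling far behind wall time) can be sketched in Python. This is only an illustration, not something BOINC or BoincTasks provides: the function names and the 0.5 threshold are assumptions made up for the example.

```python
def cpu_efficiency(cpu_seconds: float, wall_seconds: float) -> float:
    """Fraction of elapsed (wall) time the task actually spent on the CPU."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return cpu_seconds / wall_seconds


def looks_starved(cpu_seconds: float, wall_seconds: float,
                  threshold: float = 0.5) -> bool:
    """Flag a task whose CPU time is well below its wall time.

    The 0.5 default is an arbitrary assumption for illustration only.
    """
    return cpu_efficiency(cpu_seconds, wall_seconds) < threshold


# Hypothetical task: 1 hour of CPU time spread over 4 hours of wall time.
print(looks_starved(3600, 4 * 3600))
```

A task that is paging heavily would tend to show a low ratio like this, though a low ratio can also just mean the machine was busy with other work.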

Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 98996 - Posted: 13 Sep 2020, 19:21:12 UTC - in response to Message 98994.  
Last modified: 13 Sep 2020, 19:55:46 UTC

What’s not clear is whether those tasks legitimately need all the memory they’re allocating (in which case they should run to completion on a system with sufficient resources) – or whether it’s a bug, and the tasks will simply keep requesting more memory (however much the system makes available) until they fail.

I disable the paging file on every machine I operate. If that causes anything to fail to run, I need more RAM!

robertmiles
Joined: 16 Jun 08
Posts: 1235
Credit: 14,372,156
RAC: 211
Message 98997 - Posted: 13 Sep 2020, 19:41:11 UTC - in response to Message 98994.  

Is this expected?
Several users have reported fold_and_dock tasks trying (and usually eventually failing) to allocate tens of gigabytes of memory. If you have a vast amount of swap space they might be able to complete, but you’re probably as well just aborting them and doing something else instead.


Don't most people have way more than 10s of GB of swap space? Most people have their OS set to the default of using whatever is required. One of my machines has 2TB free, two have 1TB free, and the other three have about 200GB free. I assume those tasks would have a shot at running on my machines. I don't know if I've had any, and may not have noticed. I only notice if something isn't using the processor very well (the CPU time and wall time are drastically different in Boinctasks), or if it's crashing (the task changes colour and shows "computation error").

You appear to be assuming that the available swap space is all of the unused space on the hard drive. I doubt if this is correct.

However, I just tried checking my Windows 10 computer, and it appears that Windows 10 no longer uses anything called swap space.

I remember that some previous versions of Windows had a reserved area on the hard drive known as swap space.

Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 98998 - Posted: 13 Sep 2020, 19:55:30 UTC - in response to Message 98997.  
Last modified: 13 Sep 2020, 19:58:09 UTC

‘Swap space’ is the name for disk-space-as-memory in UNIX-family operating systems.

Windows calls it a ‘paging file’, and it’s still very much present in Windows 10.
(Control Panel > System > Advanced system settings > Advanced tab > Performance group > Advanced tab > Virtual memory group)
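
On the UNIX side, Linux reports swap in `/proc/meminfo` as `SwapTotal` and `SwapFree` lines, in kB. A minimal Python sketch for reading them follows; the function names are made up for this example, and the sample values are invented rather than taken from any machine in this thread.

```python
def parse_swap_kb(meminfo_text: str) -> dict:
    """Extract SwapTotal and SwapFree (in kB) from /proc/meminfo-style text."""
    wanted = {"SwapTotal", "SwapFree"}
    result = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in wanted:
            # The value is the first token after the colon, e.g. "2097148 kB".
            result[key] = int(rest.split()[0])
    return result


def read_swap_kb(path: str = "/proc/meminfo") -> dict:
    """Read the real file; Linux only."""
    with open(path, encoding="ascii") as handle:
        return parse_swap_kb(handle.read())


# Sample text in the /proc/meminfo layout (values are made up).
sample = (
    "MemTotal:       16315072 kB\n"
    "SwapTotal:       2097148 kB\n"
    "SwapFree:        2097148 kB\n"
)
print(parse_swap_kb(sample))
```

On a Linux host, `read_swap_kb()` returns the live figures; on Windows the equivalent information is in the Virtual memory dialog described above.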

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,743,417
RAC: 15,187
Message 98999 - Posted: 13 Sep 2020, 19:56:07 UTC - in response to Message 98996.  

What’s not clear is whether those tasks legitimately need all the memory they’re allocating (in which case they should run to completion on a system with sufficient resources) – or whether it’s a bug, and the tasks will simply keep requesting more memory (however much the system makes available) until they fail.

I disable the page file on every machine I operate. If that causes anything to fail to run, I need more RAM!


If you have an SSD, paging isn't as bad as it used to be with those rust spinners. Once paging started on those, the interface became so slow you couldn't tell whether your click was being handled slowly or hadn't registered, so you clicked again and ended up loading 5 copies, slowing it down even further. You spent so much time waiting for the computer that you had no time to go to work to earn money to buy more RAM.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,743,417
RAC: 15,187
Message 99000 - Posted: 13 Sep 2020, 19:58:08 UTC - in response to Message 98997.  

Is this expected?
Several users have reported fold_and_dock tasks trying (and usually eventually failing) to allocate tens of gigabytes of memory. If you have a vast amount of swap space they might be able to complete, but you’re probably as well just aborting them and doing something else instead.


Don't most people have way more than 10s of GB of swap space? Most people have their OS set to the default of using whatever is required. One of my machines has 2TB free, two have 1TB free, and the other three have about 200GB free. I assume those tasks would have a shot at running on my machines. I don't know if I've had any, and may not have noticed. I only notice if something isn't using the processor very well (the CPU time and wall time are drastically different in Boinctasks), or if it's crashing (the task changes colour and shows "computation error").

You appear to be assuming that the available swap space is all of the unused space on the hard drive. I doubt if this is correct.

However, I just tried checking my Windows 10 computer, and it appears that Windows 10 no longer uses anything called swap space.

I remember that some previous versions of Windows had a reserved area on the hard drive known as swap space.


On this Windows 10 computer (and all of them unless you change it), paging is automatic. It will increase it if it needs it, but doesn't reserve a chunk of disk for it.

robertmiles
Joined: 16 Jun 08
Posts: 1235
Credit: 14,372,156
RAC: 211
Message 99001 - Posted: 13 Sep 2020, 22:52:17 UTC - in response to Message 98998.  

‘Swap space’ is the name for disk-space-as-memory in UNIX-family operating systems.

Windows calls it a ‘paging file’, and it’s still very much present in Windows 10.
(Control Panel > System > Advanced system settings > Advanced tab > Performance group > Advanced tab > Virtual memory group)

Thanks.

Windows 10 made it hard to find the right control panel if you haven't used it lately!

Try Windows System / System and Security / System for the first few steps.

About 4.8 GB currently on my computer.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,743,417
RAC: 15,187
Message 99002 - Posted: 13 Sep 2020, 23:11:16 UTC - in response to Message 99001.  
Last modified: 13 Sep 2020, 23:12:02 UTC

‘Swap space’ is the name for disk-space-as-memory in UNIX-family operating systems.

Windows calls it a ‘paging file’, and it’s still very much present in Windows 10.
(Control Panel > System > Advanced system settings > Advanced tab > Performance group > Advanced tab > Virtual memory group)

Thanks.

Windows 10 made it hard to find the right control panel if you haven't used it lately!

Try Windows System / System and Security / System for the first few steps.

About 4.8 GB currently on my computer.


Not sure why Microsoft has split all the controls up and put them in stupid places that only they think make sense. I often have to use the search to find the simplest of things. I want the old control panel with a sensible list. It's usually actually quicker to google the problem.

That 4.8GB should automatically increase as required though. At least we never get "out of memory" messages like in very early Windows.

Aravah
Joined: 12 Apr 20
Posts: 6
Credit: 1,101,172
RAC: 0
Message 99006 - Posted: 14 Sep 2020, 22:49:50 UTC

Thanks to all who replied to my previous message. I aborted that task and all has returned to normal :) Except I notice some tasks are awarded very few credits for the same GFLOPS as other tasks that are given an order of magnitude more.
For example, just 35.12 credits for the first one below but 326.58 for the second. Is it to do with whether you're reporting the task result first or second?

Name rb_09_10_37482_36690__t000__1_C1_SAVE_ALL_OUT_IGNORE_THE_REST_1010270_821_0
Workunit 1127882972
Created 12 Sep 2020, 0:02:14 UTC
Sent 12 Sep 2020, 0:07:00 UTC
Report deadline 15 Sep 2020, 0:07:00 UTC
Received 14 Sep 2020, 4:25:44 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x00000000)
Computer ID 4466108
Run time 12 hours 55 min 41 sec
CPU time 11 hours 4 min
Validate state Valid
Credit 35.12
Device peak FLOPS 3.09 GFLOPS
Application version Rosetta v4.20
x86_64-pc-linux-gnu
Peak working set size 1,135.22 MB
Peak swap size 1,276.45 MB
Peak disk usage 29.31 MB

Name pdl1_graft_v1_SAVE_ALL_OUT_IGNORE_THE_REST_8ae9bz6m_1009438_2_0
Workunit 1127762369
Created 11 Sep 2020, 18:09:26 UTC
Sent 11 Sep 2020, 19:34:11 UTC
Report deadline 14 Sep 2020, 19:34:11 UTC
Received 14 Sep 2020, 2:58:16 UTC
Server state Over
Outcome Success
Client state Done
Exit status 0 (0x00000000)
Computer ID 4466108
Run time 11 hours 49 min 45 sec
CPU time 9 hours 59 min 46 sec
Validate state Valid
Credit 326.58
Device peak FLOPS 3.09 GFLOPS
Application version Rosetta v4.20
x86_64-pc-linux-gnu
Peak working set size 1,010.59 MB
Peak swap size 1,143.78 MB
Peak disk usage 30.09 MB
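
To put the two results on a common footing, the credit can be divided by the reported CPU time. The Python sketch below does that arithmetic with the figures from the two listings above; the helper names are made up for this example, and credit per CPU hour is just one convenient way to compare, not a measure the project itself defines.

```python
import re


def duration_to_seconds(text: str) -> int:
    """Convert a duration such as '9 hours 59 min 46 sec' to seconds."""
    units = {"hour": 3600, "min": 60, "sec": 1}
    total = 0
    # "hour" also matches the start of "hours", so both spellings work.
    for value, unit in re.findall(r"(\d+)\s*(hour|min|sec)", text):
        total += int(value) * units[unit]
    return total


def credit_per_cpu_hour(credit: float, cpu_time: str) -> float:
    """Credit granted per hour of CPU time."""
    return credit / (duration_to_seconds(cpu_time) / 3600)


# Figures copied from the two task listings above.
first = credit_per_cpu_hour(35.12, "11 hours 4 min")
second = credit_per_cpu_hour(326.58, "9 hours 59 min 46 sec")
print(f"{first:.2f} vs {second:.2f} credits per CPU hour")
```

The second task earns roughly ten times as much per CPU hour as the first, which matches the "order of magnitude" difference described in the post.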

Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 99007 - Posted: 14 Sep 2020, 23:18:55 UTC - in response to Message 99006.  

Rosetta tasks are almost never sent to more than one computer. The low credit one looks like a bug which crops up occasionally: somehow the task ran more than once, and only the last (very short) run received credit, even though it was the first run that did almost all the work.

Mr P Hucker
Joined: 12 Aug 06
Posts: 1600
Credit: 12,743,417
RAC: 15,187
Message 99009 - Posted: 15 Sep 2020, 10:35:23 UTC - in response to Message 99007.  

Rosetta tasks are almost never sent to more than one computer. The low credit one looks like a bug which crops up occasionally: somehow the task ran more than once, and only the last (very short) run received credit, even though it was the first run that did almost all the work.


Is this just a credit bug? Is the data from the big first run still saved?

Brian Nixon
Joined: 12 Apr 20
Posts: 293
Credit: 8,432,366
RAC: 0
Message 99010 - Posted: 15 Sep 2020, 11:53:13 UTC - in response to Message 99009.  

Good question. I don’t know whether the results from the second run are added to, or overwrite, those from the first.

Breno
Joined: 8 Apr 20
Posts: 30
Credit: 12,995,617
RAC: 1,024
Message 99012 - Posted: 15 Sep 2020, 16:46:43 UTC

I don't really know if this is a persistent issue, but I noticed today that some finished foldit WUs stalled at 45.78% while uploading their results. The client allows the upload to be retried, but retrying manually gives a very slow upload even without any upload transfer limits. Anyway, I just thought team R@h should know.

Aravah
Joined: 12 Apr 20
Posts: 6
Credit: 1,101,172
RAC: 0
Message 99013 - Posted: 15 Sep 2020, 19:39:22 UTC - in response to Message 99006.  

Thanks for your feedback.
For clarification, the two jobs are, as far as I can tell, unrelated, but have very different credit scores. I did not mean to imply they were first and second runs of the same task. I know some projects send out jobs to multiple computers; I did not know if Rosetta was one of these.

Aravah
Joined: 12 Apr 20
Posts: 6
Credit: 1,101,172
RAC: 0
Message 99014 - Posted: 15 Sep 2020, 19:39:24 UTC - in response to Message 99006.  
Last modified: 15 Sep 2020, 19:40:31 UTC

tx

bormolino
Joined: 16 May 13
Posts: 4
Credit: 160,977
RAC: 0
Message 99022 - Posted: 16 Sep 2020, 18:10:43 UTC

I'm still having issues with the graphics on Ubuntu 18.04. It shows "No shared mem".


Falconet
Joined: 9 Mar 09
Posts: 354
Credit: 1,302,201
RAC: 1,069
Message 99061 - Posted: 20 Sep 2020, 15:43:12 UTC
Last modified: 20 Sep 2020, 15:57:29 UTC

Seeing some tasks with a large log with these lines:

AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS.



[ ERROR ]: Caught exception:


File: ..\..\..\src\protocols\motif_grafting\movers\MotifGraftMover.cc:537
For this scaffold there are not suitable scaffold grafts within your constrains
------------------------ Begin developer's backtrace -------------------------
BACKTRACE:
------------------------- End developer's backtrace --------------------------


AN INTERNAL ERROR HAS OCCURED. PLEASE SEE THE CONTENTS OF ROSETTA_CRASH.log FOR DETAILS.



https://boinc.bakerlab.org/rosetta/result.php?resultid=1263012644
https://boinc.bakerlab.org/rosetta/result.php?resultid=1263011946

In those examples, one had the number of decoys at the end and the other one didn't.
They are validating but I sure hope this isn't wasted electricity.

epcam_breaker_graft_v1_SAVE_ALL_OUT_IGNORE_THE_REST_2lt3jd5h_1009432_4_0
pdl1_graft_v1_SAVE_ALL_OUT_IGNORE_THE_REST_7es6gq8a_1009506_4_0

EDIT: On another PC https://boinc.bakerlab.org/rosetta/result.php?resultid=1262076666
https://boinc.bakerlab.org/rosetta/result.php?resultid=1262076045

Seems limited to epcam_breaker and pdl1_graft tasks from what I can tell.

Detto
Joined: 10 Apr 20
Posts: 2
Credit: 788,565
RAC: 0
Message 99062 - Posted: 20 Sep 2020, 17:46:33 UTC

For the 3rd time since April I only got 3 credits for a work unit :

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1130847248

any insights?

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1766
Credit: 18,534,891
RAC: 65
Message 99063 - Posted: 20 Sep 2020, 18:52:22 UTC - in response to Message 99062.  

For the 3rd time since April I only got 3 credits for a work unit :

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1130847248

any insights?
Nope.
The system completed a Task of exactly the same type without issue.
Is there only 1 instance of BOINC running on that system?

The difference between CPU time and Runtime indicates the system is doing a fair bit of work other than processing BOINC Tasks, but it's nowhere near as big a difference as other systems that aren't having low Credit issues.


<core_client_version>7.16.11</core_client_version>
<![CDATA[
<stderr_txt>
command: rosetta_4.20_x86_64-apple-darwin -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers.index -in:file:native 00001.pdb -silent_gz 1 -frag9 00001.200.9mers.index -out:file:silent default.out -ex1 1 -abinitio::rsd_wt_loop 0.5 -relax::default_repeats 5 -abinitio::use_filters false -abinitio::increase_cycles 10 -abinitio::rsd_wt_helix 0.5 -beta 1 -abinitio::rg_reweight 0.5 -in:file:boinc_wu_zip Norn_pssm_struct_profile_layered_design_less_IVYW_wt1_091_c1__0.05_0018_8ffb15d87a6b0ee88cff77a7acba3bea_BJH8LOZG_data.zip -out:file:silent default.out -silent_gz -mute all -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 2711465
Using database: database_357d5d93529_n_methyl/minirosetta_database
======================================================
DONE ::     1 starting structures  28683.4 cpu seconds
This process generated    133 decoys from     133 attempts
======================================================
BOINC :: WS_max 4.60468e+08
11:45:31 (22004): called boinc_finish(0)

</stderr_txt>
]]>





<core_client_version>7.16.6</core_client_version>
<![CDATA[
<stderr_txt>
command: rosetta_4.20_x86_64-apple-darwin -abinitio::fastrelax 1 -ex2aro 1 -frag3 00001.200.3mers.index -in:file:native 00001.pdb -silent_gz 1 -frag9 00001.200.9mers.index -out:file:silent default.out -ex1 1 -abinitio::rsd_wt_loop 0.5 -relax::default_repeats 5 -abinitio::use_filters false -abinitio::increase_cycles 10 -abinitio::rsd_wt_helix 0.5 -beta 1 -abinitio::rg_reweight 0.5 -in:file:boinc_wu_zip Norn_pssm_plus_struct_profile_091_c1_barrel6_3_c0851511e59b6__0.05_0009_8c3dcb16fc078c91ae0a41d4b95a66fc_M0Z4KTXS_data.zip -out:file:silent default.out -silent_gz -mute all -nstruct 10000 -cpu_run_time 28800 -boinc:max_nstruct 20000 -checkpoint_interval 120 -database minirosetta_database -in::file::zip minirosetta_database.zip -boinc::watchdog -boinc::cpu_run_timeout 36000 -run::rng mt19937 -constant_seed -jran 2841497
Using database: database_357d5d93529_n_methyl/minirosetta_database
======================================================
DONE ::     1 starting structures  28729.4 cpu seconds
This process generated    130 decoys from     130 attempts
======================================================
BOINC :: WS_max 4.43654e+08
19:49:09 (1228): called boinc_finish(0)
======================================================
DONE ::     1 starting structures  28916.2 cpu seconds
This process generated      1 decoys from       1 attempts
======================================================
BOINC :: WS_max 2.25206e+08
19:58:00 (3178): called boinc_finish(0)

</stderr_txt>
]]>


It's the usual cause: the Task finished, yet continued to run and produced one more Decoy from another starting structure. That processing time was added to the earlier processing time, but that final Decoy wasn't added to the previous Decoys produced, so Credit was granted based on that one Decoy and none of the previous work.
Grant
Darwin NT
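
The pattern in the second stderr log above (two "This process generated" summaries in one task) can be spotted mechanically. The Python sketch below counts those summary lines; the function names are made up for this example, and treating a repeated summary as a sign of a repeated run is the interpretation given in this post, not a confirmed diagnosis from the project.

```python
import re
from typing import List

# Matches the summary line Rosetta prints at the end of a run.
DECOY_LINE = re.compile(
    r"This process generated\s+(\d+)\s+decoys from\s+(\d+)\s+attempts"
)


def decoys_per_run(stderr_text: str) -> List[int]:
    """Return the decoy count from each summary line, in order."""
    return [int(match.group(1)) for match in DECOY_LINE.finditer(stderr_text)]


def ran_more_than_once(stderr_text: str) -> bool:
    """True when the log contains more than one end-of-run summary."""
    return len(decoys_per_run(stderr_text)) > 1


# Excerpt from the second stderr log quoted above.
sample = """\
DONE ::     1 starting structures  28729.4 cpu seconds
This process generated    130 decoys from     130 attempts
BOINC :: WS_max 4.43654e+08
19:49:09 (1228): called boinc_finish(0)
DONE ::     1 starting structures  28916.2 cpu seconds
This process generated      1 decoys from       1 attempts
"""
print(decoys_per_run(sample))
```

For this excerpt the counts are 130 and then 1, so the last summary reports only the single late decoy.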

Falconet
Joined: 9 Mar 09
Posts: 354
Credit: 1,302,201
RAC: 1,069
Message 99064 - Posted: 20 Sep 2020, 18:56:41 UTC - in response to Message 99062.  
Last modified: 20 Sep 2020, 18:59:36 UTC

DELETED

©2025 University of Washington
https://www.bakerlab.org