Problems and Technical Issues with Rosetta@home

MStenholm
Joined: 18 Apr 20
Posts: 19
Credit: 27,931,768
RAC: 57,823
Message 109255 - Posted: 16 May 2024, 5:10:07 UTC - in response to Message 109250.  

You ran out of memory. Six jobs of 2.6 GB and you have 16 GB.
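
As a rough check of that arithmetic, here is a minimal Python sketch. The job count, per-job memory and installed RAM come from the post; the 90% usable share is only an assumed BOINC memory limit, and `memory_shortfall_gb` is a hypothetical helper name.

```python
# Rough memory-budget check for the numbers quoted above.
TASKS = 6                 # concurrent jobs (from the post)
GB_PER_TASK = 2.6         # memory per job in GB (from the post)
TOTAL_RAM_GB = 16.0       # installed RAM in GB (from the post)
ALLOWED_FRACTION = 0.90   # ASSUMPTION: share of RAM BOINC is allowed to use


def memory_shortfall_gb(tasks, gb_per_task, total_gb, allowed_fraction):
    """Return GB needed beyond the allowed budget (zero or less means the jobs fit)."""
    needed = tasks * gb_per_task
    budget = total_gb * allowed_fraction
    return needed - budget


needed_gb = TASKS * GB_PER_TASK
shortfall = memory_shortfall_gb(TASKS, GB_PER_TASK, TOTAL_RAM_GB, ALLOWED_FRACTION)
print(f"needed {needed_gb:.1f} GB, shortfall {shortfall:.1f} GB")
```

Under these assumptions the six jobs want about 15.6 GB against a 14.4 GB budget, so they are roughly 1.2 GB short before the operating system takes its own share.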
ID: 109255 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1758
Credit: 18,534,891
RAC: 473
Message 109256 - Posted: 16 May 2024, 6:41:47 UTC - in response to Message 109255.  
Last modified: 16 May 2024, 6:50:26 UTC

You ran out of memory. Six jobs of 2.6 GB and you have 16 GB.
That might do it.
I've got half that many cores/threads and twice that amount of RAM, and over the last couple of days, when I had mostly Rosetta_VS Tasks, there have been times I've had over 60% of my RAM in use.
Even without the 2GB + Tasks, there were plenty of others using 1-1.5GB.


But normally, if lack of RAM is an issue, the Tasks should have suspended with a "Waiting for memory" note. It shouldn't cause things to crash & burn.
Grant
Darwin NT
ID: 109256 · Rating: 0
Bryn Mawr
Joined: 26 Dec 18
Posts: 412
Credit: 12,535,740
RAC: 13,702
Message 109257 - Posted: 16 May 2024, 8:03:43 UTC - in response to Message 109255.  

You ran out of memory. Six jobs of 2.6 GB and you have 16 GB.


Ach, I thought I had 32gb.

I remember now, the 2 sticks wouldn't play with each other :-(

The other machine has 64gb, I'll update this one to match

Thanks
ID: 109257 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1758
Credit: 18,534,891
RAC: 473
Message 109268 - Posted: 22 May 2024, 9:13:28 UTC
Last modified: 22 May 2024, 9:21:47 UTC

Looks like the boinc-process server is having issues yet again: the Rosetta beta Validator & Assimilator are down (along with a few other processes). How far behind will the Validator get this time?
Presently 11,825 Workunits waiting for Validation.
Grant
Darwin NT
ID: 109268 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1758
Credit: 18,534,891
RAC: 473
Message 109269 - Posted: 22 May 2024, 10:09:59 UTC - in response to Message 109268.  

Looks like the boinc-process server is having issues yet again: the Rosetta beta Validator & Assimilator are down (along with a few other processes). How far behind will the Validator get this time?
Presently 11,825 Workunits waiting for Validation.
Backlog is now 20,000, but Validator now shows as running. Will have to wait a while to see if it actually is.
Grant
Darwin NT
ID: 109269 · Rating: 0
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2030
Credit: 10,081,426
RAC: 12,283
Message 109270 - Posted: 22 May 2024, 12:06:54 UTC - in response to Message 109269.  

Backlog is now 20,000, but Validator now shows as running. Will have to wait a while to see if it actually is.

It is now 0. The Validator queue is empty.
ID: 109270 · Rating: 0
Dr Who Fan
Joined: 28 May 06
Posts: 87
Credit: 278,104
RAC: 136
Message 109272 - Posted: 22 May 2024, 16:07:03 UTC

My wingmen and I are all seeing lots of errors on Android due to what appears to be a misconfigured Rosetta task:
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1395840251
[ ERROR ]: Caught exception:


File: src/core/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306
chi angle must be between -180 and 180: nan
 ------------------------ Begin developer's backtrace ------------------------- 
BACKTRACE:
 ------------------------- End developer's backtrace -------------------------- 


ID: 109272 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1758
Credit: 18,534,891
RAC: 473
Message 109273 - Posted: 23 May 2024, 6:17:05 UTC - in response to Message 109272.  

My wingmen and I are all seeing lots of errors on Android due to what appears to be a misconfigured Rosetta task:
https://boinc.bakerlab.org/rosetta/workunit.php?wuid=1395840251
[ ERROR ]: Caught exception:


File: src/core/pack/dunbrack/SingleResidueDunbrackLibrary.hh:306
chi angle must be between -180 and 180: nan
 ------------------------ Begin developer's backtrace ------------------------- 
BACKTRACE:
 ------------------------- End developer's backtrace -------------------------- 
Been a problem for years now.
Grant
Darwin NT
ID: 109273 · Rating: 0
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2030
Credit: 10,081,426
RAC: 12,283
Message 109275 - Posted: 23 May 2024, 8:40:54 UTC - in response to Message 109273.  

chi angle must be between -180 and 180: nan

Been a problem for years now.


A great classic!!
ID: 109275 · Rating: 0
dcs1955
Joined: 2 Dec 22
Posts: 13
Credit: 6,721,648
RAC: 14,682
Message 109277 - Posted: 23 May 2024, 16:20:17 UTC

Waiting for memory... For the past two weeks, one of my four core processes has been held up waiting for memory. It happens on two of my desktops with 16 GB of RAM. In over 8 years of crunching WCG and Rosetta, I have not had this happen. All the work is Rosetta Beta 6.04. Is this a known issue?
ID: 109277 · Rating: 0
kotenok2000
Joined: 22 Feb 11
Posts: 276
Credit: 523,512
RAC: 744
Message 109278 - Posted: 23 May 2024, 16:22:47 UTC
Last modified: 23 May 2024, 16:23:27 UTC

RosettaVS tasks use more memory than 8a_hal
ID: 109278 · Rating: 0
dcs1955
Joined: 2 Dec 22
Posts: 13
Credit: 6,721,648
RAC: 14,682
Message 109279 - Posted: 24 May 2024, 0:08:38 UTC - in response to Message 109278.  

Thanks.. Do you know if it is significantly more memory? Currently, 50% of my tasks are VS.

Two VS tasks are running (the others are 8a-e__hal). One of the 4 processes is using 1.8-2.2 GB; the others are using 100-300 MB.
ID: 109279 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1758
Credit: 18,534,891
RAC: 473
Message 109280 - Posted: 24 May 2024, 5:13:07 UTC - in response to Message 109279.  

Thanks.. Do you know if it is significantly more memory? Currently, 50% of my tasks are VS.

Two VS tasks are running (the others are 8a-e__hal). One of the 4 processes is using 1.8-2.2 GB; the others are using 100-300 MB.
You just answered your own question.
Generally they need between 500MB & 2.5GB, depending on the Task. 1-1.5GB tends to be more common.
Grant
Darwin NT
ID: 109280 · Rating: 0
dcs1955
Joined: 2 Dec 22
Posts: 13
Credit: 6,721,648
RAC: 14,682
Message 109285 - Posted: 25 May 2024, 3:38:39 UTC - in response to Message 109278.  
Last modified: 25 May 2024, 3:40:22 UTC

Thanks. I tweaked the computer preferences to raise the memory use percentage, something I have not needed to do before.
ID: 109285 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1758
Credit: 18,534,891
RAC: 473
Message 109286 - Posted: 25 May 2024, 3:58:14 UTC - in response to Message 109285.  

I've had mine set to
When computer is in use, use at most     95 %
When computer is not in use, use at most 95 %
without issues (with a fair bit more RAM per core/thread than your 8GB RAM systems).
Grant
Darwin NT
ID: 109286 · Rating: 0
dcs1955
Joined: 2 Dec 22
Posts: 13
Credit: 6,721,648
RAC: 14,682
Message 109289 - Posted: 25 May 2024, 7:34:24 UTC - in response to Message 109286.  

I wimped out and stopped at 90%. :)
ID: 109289 · Rating: 0
äxl
Joined: 30 Dec 08
Posts: 11
Credit: 497,080
RAC: 0
Message 109291 - Posted: 26 May 2024, 7:44:10 UTC

Rosetta Beta 6.05
I've had to put RAM usage to 25% for now since it would crash my PC. (Could be faulty modules.)
I even aborted 3 of 4 WUs since they would stay in RAM and I don't think I could have finished them anyway.
The one I kept is still at 24% and it says Elapsed Time ~5h, Remaining Time ~6h, Deadline is in ~6h.

It's running through ScienceUnited so here are the WUs if someone cares:
RosettaVS_SAVE_ALL_OUT_NOJRAN_UBA5_3H8V_fulldb_IGNORE_THE_REST_WwaHIZ_1_3192_2978231_3

The ones I stopped:
RosettaVS_SAVE_ALL_OUT_NOJRAN_UBA5_3H8V_fulldb_IGNORE_THE_REST_WwaHIZ_3_1857_2978234_3_0
RosettaVS_SAVE_ALL_OUT_NOJRAN_UBA5_3H8V_fulldb_IGNORE_THE_REST_WwaHIZ_6_7045_2978237_3_0
RosettaVS_SAVE_ALL_OUT_NOJRAN_UBA5_3H8V_fulldb_IGNORE_THE_REST_WwaHIZ_4_6655_2978235_3_0

It's an old computer:
https://scienceunited.org/su_hosts.php?action=detail&host_id=87101
Running BOINC because:
1) I'm using 100% green energy (no certificates or other non-sense)
2) My computer runs mostly anyway (due to BT and other non-sense)
3) To help
ID: 109291 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1758
Credit: 18,534,891
RAC: 473
Message 109292 - Posted: 26 May 2024, 10:34:08 UTC - in response to Message 109291.  

The fact that it's through Science United makes it impossible to see what's going on (we can't view your computer without being logged in to your account there), and will probably affect what you're able to do about it.

I've had to put RAM usage to 25% for now since it would crash my PC. (Could be faulty modules.)
I even aborted 3 of 4 WUs since they would stay in RAM and I don't think I could have finished them anyway.
The one I kept is still at 24% and it says Elapsed Time ~5h, Remaining Time ~6h, Deadline is in ~6h.

Under Preferences, Computing preferences, make sure Memory, "Leave non-GPU tasks in memory while suspended" is not selected. When running more than one project, no cache is best. Less chance of deadline issues.
Preferences, Computing Preferences, Other,
Store at least            0.1 days of work
Store up to an additional 0.01 days of work
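
For anyone who prefers a file over the Manager dialog, the settings above can be sketched as a local override file. This is only an illustration: the tag names are assumed to match those used in BOINC's global_prefs_override.xml and should be checked against the client documentation, and `build_override` is a hypothetical helper. With Science United in the picture, some preferences may be managed elsewhere, so this may not apply as-is.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: tag names as used in BOINC's global_prefs_override.xml;
# verify against your client's documentation before relying on them.
SETTINGS = {
    "leave_apps_in_memory": "0",         # do not keep suspended tasks in RAM
    "work_buf_min_days": "0.1",          # "Store at least" days of work
    "work_buf_additional_days": "0.01",  # "Store up to an additional" days of work
}


def build_override(settings):
    """Return override file contents as an XML string (hypothetical helper)."""
    lines = ["<global_preferences>"]
    for tag, value in settings.items():
        lines.append(f"   <{tag}>{value}</{tag}>")
    lines.append("</global_preferences>")
    return "\n".join(lines) + "\n"


override_xml = build_override(SETTINGS)
ET.fromstring(override_xml)  # raises ParseError if the text is not well-formed XML
print(override_xml)
```

The printed text would be saved as global_prefs_override.xml in the BOINC data directory, and the client has to re-read its preferences (or be restarted) before the values take effect.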

Run Memtest on the system to see if there is an issue with the memory. Most likely it's a lack of memory on the system, as most of the RosettaVS_ and Rosetta 4.20 Tasks need plenty of RAM: 500MB to 2.5GB (1-1.5GB tends to be most common). And reducing the amount of memory that BOINC can use will just make things worse.
Luckily, there have been very few of those Tasks released in the last 24hrs or so.

Also check your completed Valid Tasks and compare the Run time to the CPU time- if there's more than a few minutes difference, it means you're using your system a bit. If there's 30min or so then you're using it a lot.
Hours+, you or something else on the computer is making a huge use of your CPU's time.
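
That rule of thumb can be written down as a small check. This is a sketch only: the 5-minute cut-off for "a few minutes" and the 60-minute cut-off for "hours" are assumptions, the example times are made up, and `classify_gap` is a hypothetical name.

```python
from datetime import timedelta


def classify_gap(run_time, cpu_time):
    """Classify how much the machine was busy with other work during a Task."""
    gap_min = (run_time - cpu_time).total_seconds() / 60
    if gap_min >= 60:   # ASSUMPTION: "hours+" starts at one hour
        return "huge use of CPU time by something else"
    if gap_min >= 30:   # "30min or so"
        return "system used a lot"
    if gap_min > 5:     # ASSUMPTION: "a few minutes" means up to 5
        return "system used a bit"
    return "system mostly idle"


# Made-up example: Run time 3h 10m, CPU time 2h 35m, so a 35 minute gap
example = classify_gap(timedelta(hours=3, minutes=10), timedelta(hours=2, minutes=35))
print(example)
```

The Run time and CPU time fields on a completed Task's page are the two inputs; the labels returned here are just paraphrases of the wording above.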
Grant
Darwin NT
ID: 109292 · Rating: 0
Jean-David Beyer
Joined: 2 Nov 05
Posts: 202
Credit: 6,881,503
RAC: 10,850
Message 109293 - Posted: 26 May 2024, 12:09:16 UTC - in response to Message 109292.  

Run Memtest on the system to see if there is an issue with the memory. Most likely it's a lack of memory on the system, as most of the RosettaVS_ and Rosetta 4.20 Tasks need plenty of RAM: 500MB to 2.5GB (1-1.5GB tends to be most common). And reducing the amount of memory that BOINC can use will just make things worse.
Luckily, there have been very few of those Tasks released in the last 24hrs or so.


I have not run memtest in years. Back when I had 8 GBytes of RAM and dual Intel Xeon processors, it took almost a day to run memtest. Now that this machine has 128 GBytes of RAM, it would probably take over a week to run it. This machine has 8 memory modules, and when I raised it from 64 GBytes to 128 GByte it was a little flakey, but it was pretty easy to find which module it was and the RAM supplier replaced it free of charge.

As far as RosettaVS tasks are concerned, I have only two of them waiting to start out of 22 tasks on the machine. At times, half of the tasks on my machine have been RosettaVS, and sometimes two of them have run at the same time. Right now, I have one Rosetta 4.20 Task waiting to run. The biggest tasks I have run have been CPDN like this one:

Task: 22317868
Name: oifs_43r3_bl_a4ck_2016092300_15_991_12212423_2
Workunit: 12212423
Created: 15 Apr 2023, 5:23:15 UTC
Sent: 15 Apr 2023, 5:24:02 UTC
Report deadline: 14 Jun 2023, 5:24:02 UTC
Received: 15 Apr 2023, 12:23:18 UTC
Server state: Over
Outcome: Success
Client state: Done
Exit status: 0 (0x00000000)
Computer ID: 1511241
Run time: 6 hours 18 min 49 sec
CPU time: 6 hours 13 min 2 sec
Validate state: Valid
Credit: 1,813.14
Device peak FLOPS: 6.06 GFLOPS
Application version: OpenIFS 43r3 Baroclinic Lifecycle v1.11 x86_64-pc-linux-gnu
Peak working set size: 5,592.19 MB
Peak swap size: 5,930.79 MB
Peak disk usage: 1,277.90 MB
ID: 109293 · Rating: 0
äxl
Joined: 30 Dec 08
Posts: 11
Credit: 497,080
RAC: 0
Message 109299 - Posted: 27 May 2024, 8:08:12 UTC - in response to Message 109292.  
Last modified: 27 May 2024, 8:12:08 UTC

Grant (SSSF) wrote:
The fact that it's through Science United makes it impossible to see what's going on (we can't view your computer without being logged in to your account there), and will probably affect what you're able to do about it.

Even I can't see much. I can't see completed WUs, for example.

Under Preferences, Computing preferences, make sure Memory, "Leave non-GPU tasks in memory while suspended" is not selected.

Yes, that helped.

When running more than one project, no cache is best. Less chance of deadline issues.
Preferences, Computing Preferences, Other,
Store at least            0.1 days of work
Store up to an additional 0.01 days of work

Yes, this is the default, isn't it?

Run Memtest on the system to see if there is an issue with the memory,

I've been running memtester on 1GB since yesterday. I don't think it covers much, but it's a start, I guess.

most likely it's a lack of memory on the system, as most of the RosettaVS_ and Rosetta 4.20 Tasks need plenty of RAM: 500MB to 2.5GB (1-1.5GB tends to be most common). And reducing the amount of memory that BOINC can use will just make things worse.
Luckily, there have been very few of those Tasks released in the last 24hrs or so.

I've finished the 1 WU ~3h before deadline. (I think the only thing that got me into trouble was that the system froze and then I didn't have time over the weekend.) But you're saying because I didn't do parts 2 to 4 it's bad for the project?

Also check your completed Valid Tasks and compare the Run time to the CPU time- if there's more than a few minutes difference, it means you're using your system a bit. If there's 30min or so then you're using it a lot.
Hours+, you or something else on the computer is making a huge use of your CPU's time.

I can at least check the running WUs. Are you saying that if the difference is too big I shouldn't crunch at all?


Jean-David Beyer wrote:
it was pretty easy to find which module it was

You mean by turning the computer off, pulling a module, turning the computer on, turning it off again etc.?
Running BOINC because:
1) I'm using 100% green energy (no certificates or other non-sense)
2) My computer runs mostly anyway (due to BT and other non-sense)
3) To help
ID: 109299 · Rating: 0

©2025 University of Washington
https://www.bakerlab.org