minirosetta 2.15

Message boards : Number crunching : minirosetta 2.15

Yifan Song
Volunteer moderator
Project developer
Project scientist
Joined: 26 May 09
Posts: 62
Credit: 7,322
RAC: 0
Message 67961 - Posted: 5 Oct 2010, 3:28:17 UTC

After David Kim and TJ looked into this, we did find a problem with large memory usage with the 2.15 version. I'll do a revert first thing tomorrow. (Too tired to get it started now :p)
ID: 67961 · Rating: 0
deesy58
Joined: 20 Apr 10
Posts: 75
Credit: 193,831
RAC: 0
Message 67964 - Posted: 5 Oct 2010, 7:16:24 UTC - in response to Message 67939.  

Greetings:

Unfortunately, just like everyone else who's been posting recently, I'm starting to have similar difficulty with the new version of Rosetta.

I'm using a Toshiba Satellite notebook computer with nearly a hundred gigabytes of memory.

Yet, I just now woke up to see the Rosetta screensaver graphic displaying a blank form, and frozen in place.

An alert on my task bar indicated I was out of virtual memory.

Regretfully, I'm going to have to disconnect from the Rosetta project.

Thank you.


I think you might have "nearly a hundred gigabytes" of hard disk capacity, but I seriously doubt that you have that much RAM (Random Access Memory). Virtual memory combines RAM with a paging file on the hard disk, so a shortage of virtual memory is often an indication that your hard disk has become nearly full. How much disk space is available on your machine? To find out, open your "Computer" icon, then right-click on "Local Disk (C:)" and select "Properties." This should tell you how much space is available on your hard disk. If you have insufficient space left, you can use the "Disk Cleanup" utility (carefully) to remove files that might no longer be needed. All of this assumes that you are using the Microsoft Windows operating system, of course, and it appears that you are.
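
For anyone who would rather script the disk check than click through Explorer, a minimal Python sketch follows. It assumes Python 3 is installed; the function name and the 10% "low space" threshold are made up for illustration and are not a Windows or BOINC rule.

```python
import os
import shutil

def free_space_report(path, warn_fraction=0.10):
    """Return (free_gb, total_gb, is_low) for the drive that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    gib = 1024 ** 3
    # Hypothetical rule of thumb: flag the drive when under 10% is free.
    is_low = usage.free < warn_fraction * usage.total
    return usage.free / gib, usage.total / gib, is_low

if __name__ == "__main__":
    # Checks the drive holding the current directory; pass "C:\\" explicitly on Windows.
    free_gb, total_gb, is_low = free_space_report(os.getcwd())
    print(f"{free_gb:.1f} GB free of {total_gb:.1f} GB" + ("  (low)" if is_low else ""))
```

This only reports free disk space. It says nothing about how large the Windows paging file is allowed to grow, which is configured separately.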

BTW, those "reverse slashes" you referred to in your original post are, indeed, normal in the Microsoft environment.

deesy
ID: 67964 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 67966 - Posted: 5 Oct 2010, 10:11:25 UTC - in response to Message 67961.  

After David Kim and TJ looked into this, we did find a problem with large memory usage with the 2.15 version. I'll do a revert first thing tomorrow. (Too tired to get it started now :p)



Take a look at this thread when looking at the RAM issues.
ID: 67966 · Rating: 0
P . P . L .
Joined: 20 Aug 06
Posts: 581
Credit: 4,865,274
RAC: 0
Message 67975 - Posted: 6 Oct 2010, 1:27:58 UTC

This one failed after 15sec.

https://boinc.bakerlab.org/rosetta/workunit.php?wuid=337685670

rb_10_04_377_958_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_22344_973_1


Unpacking data: ../../projects/boinc.bakerlab.org_rosetta/rb_10_04_377_958_rs_stg0_lrlxjcst_t000__casp9.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...

ERROR: Error in traceback: pointer doesn't go anywhere!

ERROR:: Exit from: src/core/sequence/Aligner.cc line: 79
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

</stderr_txt>
]]>

ID: 67975 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 67988 - Posted: 7 Oct 2010, 23:22:01 UTC

Here is a whole batch of errors.
Pretty sure one of these errors caused a Windows BSOD.
Wingman also died on these tasks.
No fancy URL links, just raw data

T0605_t2_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_22177_298_1
https://boinc.bakerlab.org/rosetta/result.php?resultid=367677827
Incorrect function. (0x1) - exit code 1 (0x1)
ERROR: Error in traceback: pointer doesn't go anywhere!
ERROR:: Exit from: ..\..\src\core\sequence\Aligner.cc line: 79
BOINC:: Error reading and gzipping output datafile: default.out


fix_disulf_v4_NMR_1eig_CONTROL__BOINC_abrelax.score12.fastrelax.v2_SAVE_ALL_OUT_22291_788_0
https://boinc.bakerlab.org/rosetta/result.php?resultid=367990336
Incorrect function. (0x1) - exit code 1 (0x1)
ERROR: rsd_type_list.size()
ERROR:: Exit from: ..\..\src\core\fragment\Frame.cc line: 62
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

fix_disulf_v4_NMR_1m12_CONTROL__BOINC_abrelax.score12.fastrelax.v2_SAVE_ALL_OUT_22291_788_0
https://boinc.bakerlab.org/rosetta/result.php?resultid=367990356
Incorrect function. (0x1) - exit code 1 (0x1)
ERROR: rsd_type_list.size()
ERROR:: Exit from: ..\..\src\core\fragment\Frame.cc line: 62
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish


mem_abinitio_bench_run01_A_BRD7_SAVE_ALL_OUT_IGNORE_THE_REST_22294_290_1
https://boinc.bakerlab.org/rosetta/result.php?resultid=367990547
Incorrect function. (0x1) - exit code 1 (0x1)
ERROR: Cannot open PDB file "input_BRD7BRD4.pdb"
ERROR:: Exit from: ..\..\src\core\io\pdb\pose_io.cc line: 182
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish


fix_disulf_v4_NMR_1xu6_DISULF__BOINC_abrelax.score12.fastrelax.v2_SAVE_ALL_OUT_22292_695_1
https://boinc.bakerlab.org/rosetta/result.php?resultid=367990548
Incorrect function. (0x1) - exit code 1 (0x1)
ERROR: rsd_type_list.size()
ERROR:: Exit from: ..\..\src\core\fragment\Frame.cc line: 62
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish


Ross2X3_SAVE_ALL_OUT_r006_010_22296_138_0
https://boinc.bakerlab.org/rosetta/result.php?resultid=368110199
- exit code -1073741819 (0xc0000005)
Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x004CF209 read attempt to address 0x20EEE6E5

Engaging BOINC Windows Runtime Debugger..

T0591_t3_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_22223_1177_1
https://boinc.bakerlab.org/rosetta/result.php?resultid=368248774
The system cannot find the path specified. (0x3) - exit code 3 (0x3)
A couple of "No heartbeat from client" error messages

That's a pretty big laundry list for 1 or 2 days.
ID: 67988 · Rating: 0
Mad_Max
Joined: 31 Dec 09
Posts: 209
Credit: 26,262,530
RAC: 19,111
Message 67996 - Posted: 9 Oct 2010, 2:29:06 UTC

I have had a few runtime errors and crashes to desktop from minirosetta_2.15_windows_intelx86.exe over the last few days (I never saw them with previous versions of minirosetta, only "standard" errors) and even one BSOD (I had forgotten about BSODs since the transition from Windows 98 to XP).

On this computer: https://boinc.bakerlab.org/rosetta/results.php?hostid=1252064
I'm not sure which job caused the BSOD; there is a whole bundle of bad ones. Some ended with a runtime error, one with the BSOD, and a few were killed by BOINC for exceeding the memory limit (memory use grew to ~1 GB quickly after start).

Like this:
08/10/2010 23:58:52 rosetta@home Aborting task rs_stg0_lrlx_t363__run1_SAVE_ALL_OUT_19372_805_0: exceeded memory limit 1353.20MB > 1223.80MB
09/10/2010 05:51:57 rosetta@home Aborting task lr5_combined_torsion_it01_run01_A_rlbd_256b_SAVE_ALL_OUT_IGNORE_THE_REST_DECOY_18669_2770_1: exceeded memory limit 1291.75MB > 1223.80MB
09/10/2010 05:53:25 rosetta@home Aborting task rs_stg0_lrlx_t311__run1_SAVE_ALL_OUT_19356_6624_1: exceeded memory limit 1292.80MB > 1223.80MB
09/10/2010 05:55:04 rosetta@home Aborting task lr5_combined_torsion_it01_run01_A_rlbd_1eyv_SAVE_ALL_OUT_IGNORE_THE_REST_DECOY_18669_2636_1: exceeded memory limit 1269.07MB > 1223.80MB
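
For anyone who wants to pull these aborts out of a longer client log, here is a rough Python sketch. It assumes the message format is exactly the one in the lines quoted above; the function and variable names are invented for the example, and other BOINC client versions may word the message differently.

```python
import re

# Matches the message format quoted in this post, e.g.
# "... Aborting task NAME: exceeded memory limit 1353.20MB > 1223.80MB"
ABORT_RE = re.compile(
    r"Aborting task (?P<task>\S+): exceeded memory limit "
    r"(?P<used>[\d.]+)MB > (?P<limit>[\d.]+)MB"
)

def memory_aborts(lines):
    """Yield (task, used_mb, limit_mb, over_mb) for each memory-limit abort."""
    for line in lines:
        m = ABORT_RE.search(line)
        if m:
            used, limit = float(m["used"]), float(m["limit"])
            yield m["task"], used, limit, round(used - limit, 2)

# Two of the lines from this post, used as sample input.
sample = [
    "08/10/2010 23:58:52 rosetta@home Aborting task rs_stg0_lrlx_t363__run1_SAVE_ALL_OUT_19372_805_0: exceeded memory limit 1353.20MB > 1223.80MB",
    "09/10/2010 05:53:25 rosetta@home Aborting task rs_stg0_lrlx_t311__run1_SAVE_ALL_OUT_19356_6624_1: exceeded memory limit 1292.80MB > 1223.80MB",
]

for task, used, limit, over in memory_aborts(sample):
    print(f"{task}: {used:.2f} MB used, {over:.2f} MB over the {limit:.2f} MB limit")
```

On the four aborts listed here, the tasks were between roughly 45 MB and 130 MB over the limit when the client killed them.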


P.S.
2.15 is the most problematic and buggy version of all that I've seen (since joining the project at the start of this year, including 5.98, 2.03, 2.05, 2.10, 2.11 and 2.14). There are a lot of complaints about this version from other members on my team's forum, too.
ID: 67996 · Rating: 0
Brian Priebe
Joined: 27 Nov 09
Posts: 16
Credit: 33,020,247
RAC: 0
Message 67997 - Posted: 9 Oct 2010, 5:25:48 UTC - in response to Message 67996.  

2.15 is the most problematic and buggy version of all that I've seen

Out of my most recent 100 WUs, 22% of them blew up on one or another of the errors already posted here. The "Unusual Termination" dialog box from MSVC seems to be becoming more frequent.
ID: 67997 · Rating: 0
Jim Martin
Joined: 9 Oct 05
Posts: 23
Credit: 1,443,682
RAC: 1,636
Message 67998 - Posted: 9 Oct 2010, 5:51:18 UTC

Two enclosures:
1) system boot info.
2) error report.

* * *
10/8/2010 11:29:55 PM Starting BOINC client version 6.10.58 for windows_intelx86
10/8/2010 11:29:55 PM log flags: file_xfer, sched_ops, task
10/8/2010 11:29:55 PM Libraries: libcurl/7.19.7 OpenSSL/0.9.8l zlib/1.2.3
10/8/2010 11:29:55 PM Data directory: C:\ProgramData\BOINC
10/8/2010 11:29:55 PM Running under account James
10/8/2010 11:29:56 PM Processor: 2 GenuineIntel Intel(R) Core(TM)2 CPU T7200 @ 2.00GHz [Family 6 Model 15 Stepping 6]
10/8/2010 11:29:56 PM Processor: 4.00 MB cache
10/8/2010 11:29:56 PM Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 nx lm vmx tm2 pbe
10/8/2010 11:29:56 PM OS: Microsoft Windows Vista: Business x86 Edition, Service Pack 2, (06.00.6002.00)
10/8/2010 11:29:56 PM Memory: 2.00 GB physical, 4.23 GB virtual
10/8/2010 11:29:56 PM Disk: 142.71 GB total, 89.40 GB free
10/8/2010 11:29:56 PM Local time is UTC -4 hours
10/8/2010 11:29:56 PM No usable GPUs found
10/8/2010 11:29:57 PM rosetta@home URL https://boinc.bakerlab.org/rosetta/; Computer ID 1324493; resource share 100
10/8/2010 11:29:57 PM climateprediction.net URL http://climateprediction.net/; Computer ID 819110; resource share 100
10/8/2010 11:29:57 PM Einstein@Home URL http://einstein.phys.uwm.edu/; Computer ID 1616831; resource share 50
10/8/2010 11:29:57 PM lhcathome URL http://lhcathome.cern.ch/lhcathome/; Computer ID 9825728; resource share 100
10/8/2010 11:29:57 PM Quake-Catcher Network URL http://qcn.stanford.edu/sensor/; Computer ID 9909; resource share 100
10/8/2010 11:29:57 PM SETI@home URL http://setiathome.berkeley.edu/; Computer ID 5490317; resource share 50
10/8/2010 11:29:57 PM Einstein@Home General prefs: from Einstein@Home (last modified 08-Jul-2010 11:24:21)
10/8/2010 11:29:57 PM Einstein@Home Computer location: home
10/8/2010 11:29:57 PM General prefs: using separate prefs for home
10/8/2010 11:29:57 PM Preferences:
10/8/2010 11:29:57 PM max memory usage when active: 1022.66MB
10/8/2010 11:29:57 PM max memory usage when idle: 1840.78MB
10/8/2010 11:30:10 PM max disk usage: 50.00GB
10/8/2010 11:30:10 PM don't use GPU while active
10/8/2010 11:30:10 PM (to change preferences, visit the web site of an attached project, or select Preferences in the Manager)
10/8/2010 11:30:10 PM Not using a proxy
10/8/2010 11:30:10 PM Quake-Catcher Network Restarting task qcnk_sc300_sta200_087854_0 using qcnsensor version 562
10/8/2010 11:30:40 PM rosetta@home Restarting task rs_stg0_lrlx_t363__run1_SAVE_ALL_OUT_19372_6575_0 using minirosetta version 215
10/8/2010 11:30:40 PM climateprediction.net Sending scheduler request: To send trickle-up message.
10/8/2010 11:30:40 PM climateprediction.net Not reporting or requesting tasks
10/8/2010 11:30:44 PM climateprediction.net Scheduler request completed
10/8/2010 11:30:44 PM climateprediction.net Message from server: Project is temporarily shut down for maintenance
10/8/2010 11:32:27 PM rosetta@home Restarting task mem_widd_run02_Mevn_A_2ksy_SAVE_ALL_OUT_IGNORE_THE_REST_22157_49662_0 using minirosetta version 215
* * *
BoincLogX - History

nr / date error_txt project_name domain_name user_total_credit user_expavg_credit CPU error

1596 2010.10.08 13:10:49 rosetta@home james-pc 183398.626579 169.348558 00:00:59 true
1st Entry:

result_name

rs_stg0_lrlx_t311_run1_SAVE_ALL_OUT_19356_2623_0_0

error_txt

The system cannot find the path specified. (0x3) - exit code 3 (0x3);[2010-10-8 12:32:52]::BOINC::Initializing ...

ok. [2010-10-8 12:32-52]


* * * * *

User note:

The above error/WU failure appeared to be generated immediately after a pop-up window,
"Microsoft Visual C++ Runtime Library", was closed. Prior to closing it, the system ran
slower than normal. After closing it, normal system speed resumed.

The failures occurred on three occasions, although only one is listed, above. BoincLogX information
was entered, by hand, by this user.

Also, approx. 1.29 GB mem was accessed, max.

* * * * *

2nd entry, with the following:

Rosetta Mini 2.15 mem_widd_run02_Mevn_A_2ksy_SAVE_ALL_OUT_IGNORE_THE_REST_22157_49662_0.

Used phys mem was in the approx. 51%-67% range. No other WUs were run, except QCN@home.

After approx. 23+% run-time, with mem. usage varying from an average of 465 MB, to a max of 1.29 GB,

a second Rosetta Mini 2.15 WU started, even though it had previously been placed in a "suspended" state:

rs_stg0_lrlx_t363__run1_SAVE_ALL_OUT_19372_6575_0.

This increased used phys mem to the 99% range, with mem usage of about 1.934 GB by ...Mem_widd...
and approx. 393.7 MB by rs_stg0...

The system, temporarily, locked up (cursor movement "froze"), until ...Mem_widd... was halted, by the
pgrm., and placed in a "Waiting for memory" mode.

* * * * *

Summary:

1) Rosetta Mini 2.15 WU memory requirements appear to necessitate restricting all other pgrms. from running,
to avoid maxing out memory.

2) Successful WU run not possible, because of the entry of a 2nd Rosetta WU (reason unk.).

Conclusion:

1) If possible, enable the user the option of allocating memory for Rosetta (ref. Garli, Lattice project).
This might free up memory for other projects, for those with multiple CPU's (my system has two);
Rosetta@home might lose some users, if this issue cannot be resolved. I had to drop Garli, for
this reason.

2) Again, if possible, enable the program to function within the allocated memory.

3) Determine the cause of the activation, from "suspended" state, of another Rosetta WU (If it had not
activated, the original WU pbly. would have successfully completed.).

* * *

Pardon the verbosity; hopefully, it will prove enlightening.

JM
ID: 67998 · Rating: 0
cleaner
Joined: 22 Aug 10
Posts: 6
Credit: 26,245
RAC: 0
Message 67999 - Posted: 9 Oct 2010, 5:54:56 UTC

I would have to agree with Mad Max about the bugginess of the 2.15 Rosetta. Before, it was mainly lack-of-memory issues, but now I am starting to get runtime errors. Hopefully they will soon come out with an updated version, or else revert to an earlier, more stable version, because this is getting to be a little ridiculous to me.
ID: 67999 · Rating: 0
bparker
Joined: 9 May 07
Posts: 1
Credit: 1,316,128
RAC: 4,278
Message 68000 - Posted: 9 Oct 2010, 14:09:51 UTC - in response to Message 67999.  
Last modified: 9 Oct 2010, 14:11:04 UTC

I would have to agree with Mad Max about the bugginess of the 2.15 Rosetta. Before, it was mainly lack-of-memory issues, but now I am starting to get runtime errors. Hopefully they will soon come out with an updated version, or else revert to an earlier, more stable version, because this is getting to be a little ridiculous to me.


I abandoned Rosetta 2.15 until the bugs are fixed and the version changes. All my other BOINC apps work fine except for this one, and it eventually locks up the computer if I leave it running long enough without rebooting. My Windows 7 machine runs it without problems. My XP machine chokes on it.
ID: 68000 · Rating: 0
Aidan & Liz Hopkins
Joined: 8 Jan 07
Posts: 1
Credit: 525,167
RAC: 0
Message 68003 - Posted: 9 Oct 2010, 20:02:36 UTC

I keep getting the following:

09/10/2010 20:43:39|rosetta@home|Task lr5_combined_torsion_it01_run01_A_rlbd_2hkv_SAVE_ALL_OUT_IGNORE_THE_REST_DECOY_18669_2172_1 exited with zero status but no 'finished' file
09/10/2010 20:43:39|rosetta@home|If this happens repeatedly you may need to reset the project.

It appears to be linked to a C++ Runtime Library error message, which has inconveniently vanished. After it had kept happening all afternoon I did 'reset', but it is still happening. Original FAQ instructions were to ignore any "exited with zero status but no 'finished' file" situation, but that was a long time ago, and this may be a different problem.

It appears to be doing its regular contact with the server, and nothing is being sent or received at present.

Do I just ignore it and assume the automated updates will resolve it?
ID: 68003 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2130
Credit: 41,424,155
RAC: 16,102
Message 68012 - Posted: 10 Oct 2010, 4:14:53 UTC
Last modified: 10 Oct 2010, 4:15:35 UTC

With all the other reports of problems I thought I'd check my own tasks:

W7-64bit Intel Core2Duo laptop 4 GB RAM - 1 error out of 36 tasks
CURATED_NMR_1k7b_CONTROL__BOINC_abrelax.score12.fastrelax.v4_SAVE_ALL_OUT_22308_425_0
<core_client_version>6.10.58</core_client_version>
<message>Incorrect function. (0x1) - exit code 1 (0x1)</message>
[...]
ERROR: Assertion failure: runtime_assert( ( begin + size - 1 ) <= pose.total_residue() );
ERROR:: Exit from: ..\..\src\protocols\abinitio\FragmentMover.cc line: 258
BOINC:: Error reading and gzipping output datafile: default.out


Vista64 AMD Phenom 9850 Quad Desktop 8 GB RAM - 2 errors out of 86 tasks
CURATED_NMR_1k7b_disulf__BOINC_abrelax.score12.fastrelax.v4_SAVE_ALL_OUT_22309_663_0

Errors exactly as above

T0591_t3_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_22223_1049_1
<core_client_version>6.10.58</core_client_version>
<message> - exit code -1073741819 (0xc0000005)</message>
[...]
- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00581B5C write attempt to address 0x00000024

Not too terrible, but I do have a decent amount of RAM to play with on each machine - possibly makes the difference.
ID: 68012 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2130
Credit: 41,424,155
RAC: 16,102
Message 68021 - Posted: 11 Oct 2010, 0:02:06 UTC

Spoke too soon:

lr5_combined_torsion_it01_run01_A_rlbd_1unp_SAVE_ALL_OUT_IGNORE_THE_REST_DECOY_18669_2260_0
lr5_combined_torsion_it01_run01_A_rlbd_1e6i_SAVE_ALL_OUT_IGNORE_THE_REST_DECOY_18669_2721_0
rs_stg0_lrlx_t363__run1_SAVE_ALL_OUT_19372_6878_0
All report the same:
<core_client_version>6.10.58</core_client_version>
<message> - exit code -529697949 (0xe06d7363)</message>
[...]
Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Out Of Memory (C++ Exception) (0xe06d7363) at address 0x76A4E124

Out of memory on a machine with 8 GB RAM? I doubt it.
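
A side note on reading these reports: the negative exit code and the hex value in parentheses are the same 32-bit number, shown once as a signed integer and once as unsigned hex. The short Python sketch below does the conversion. The lookup table holds only the two codes quoted in this thread, labelled the way the debugger output above labels them; it is not a complete list, and the function names are invented for the example.

```python
def to_unsigned32(exit_code):
    """Map a signed 32-bit exit code to its unsigned 32-bit value."""
    return exit_code & 0xFFFFFFFF

# Only the codes quoted in this thread; labels copied from the debugger output.
KNOWN = {
    0xC0000005: "Access Violation",
    0xE06D7363: "C++ Exception",
}

def describe(exit_code):
    """Format an exit code as hex plus a label, or 'unknown' if not in KNOWN."""
    u = to_unsigned32(exit_code)
    return f"0x{u:08x} {KNOWN.get(u, 'unknown')}"

print(describe(-1073741819))  # 0xc0000005 Access Violation
print(describe(-529697949))   # 0xe06d7363 C++ Exception
```

As for the 8 GB question: the executable named earlier in the thread is an intelx86 (32-bit) build, and a 32-bit Windows process has a much smaller address space than the installed RAM on a machine like this. That may be why a C++ out-of-memory exception can appear with plenty of RAM free, though nothing in the thread confirms it as the cause.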

One other:
rb_10_04_377_958_rs_stg0_lrlxjcst_t000__casp9_SAVE_ALL_OUT_22344_1087_1
<core_client_version>6.10.58</core_client_version>
<message>Incorrect function. (0x1) - exit code 1 (0x1)</message>
[...]
ERROR: Error in traceback: pointer doesn't go anywhere!

ERROR:: Exit from: ..\..\src\core\sequence\Aligner.cc line: 79
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

ID: 68021 · Rating: 0
Greg_BE
Joined: 30 May 06
Posts: 5691
Credit: 5,859,226
RAC: 0
Message 68034 - Posted: 11 Oct 2010, 18:36:16 UTC

ehhh...nuts to 2.15
aborting them and moving to 2.16
every few tasks come up with ERROR: Error in traceback: pointer doesn't go anywhere!

ERROR:: Exit from: ..\..\src\core\sequence\Aligner.cc line: 79
BOINC:: Error reading and gzipping output datafile: default.out
called boinc_finish

or some other rubbish
ID: 68034 · Rating: 0
wolfpat
Joined: 1 May 10
Posts: 4
Credit: 2,409,159
RAC: 2,119
Message 68111 - Posted: 16 Oct 2010, 17:07:16 UTC

I also had to abort all the 2.15 tasks. They would not run at all on my XP machine. The 2.16 tasks are running fine.
ID: 68111 · Rating: 0

©2024 University of Washington
https://www.bakerlab.org