Low Memory Allowance for BOINC can cause Rosetta Computation Errors

Message boards : Number crunching : Low Memory Allowance for BOINC can cause Rosetta Computation Errors

machspeed2200
Joined: 17 Jul 06
Posts: 1
Credit: 283,513
RAC: 0
Message 63904 - Posted: 1 Nov 2009, 5:14:34 UTC

I've recently started up BOINC on a Linux box with 512MB of physical RAM and 1GB of swap. I had the memory limit set to around 250MB but got a good 20 computation errors with Rosetta reporting "exceeded memory limit"!

To temporarily solve this I've upped this limit but it taxes my system like crazy!

CAN ROSETTA BE MADE TO BE TOLERANT TO LOW MEMORY LIMITS????
ID: 63904 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 63909 - Posted: 1 Nov 2009, 17:24:07 UTC

No, it cannot be changed. Rosetta distinguishes between "high" and "low" memory systems in an effort to send tasks that are known and expected to take more memory only to machines that are suited for them. However, your machine is at the lowest recommended level of memory, so you are already getting the low memory tasks.

It looks like you are describing your Linux box, and it reports only one CPU, so limiting the number of CPUs won't help.

I can only suggest that you allow BOINC around 90% of memory, but only when the machine is idle. You could also attach to another project that uses less memory; with your existing setting of 50% of memory while the machine is in use, the other project should run while you are using the machine, and both projects should run when it is idle. Also, check the box to leave applications "in memory" (really in the swap space) when preempted, because when the machine goes from idle to in use, you don't want to lose the work already done.
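To put rough numbers on this advice, here is a back-of-envelope sketch in Python. It assumes BOINC applies the percentage preferences to physical RAM, and it borrows the 180MB per-task peak that DJStarfox reports further down this thread; the function names are made up for illustration. Memory used by the OS and other applications is ignored, so treat the counts as optimistic (machspeed2200's errors suggest some tasks peak higher still).

```python
# Back-of-envelope BOINC memory budget for a 512 MB machine.
# Assumptions (illustrative only): the percentage preference applies to
# physical RAM, and a Rosetta task peaks at about 180 MB.

PHYSICAL_RAM_MB = 512   # machspeed2200's Linux box
TASK_PEAK_MB = 180      # per-task peak reported later in the thread


def budget_mb(ram_mb: int, percent: int) -> int:
    """Memory (MB, rounded down) that BOINC may use at a given percentage."""
    return ram_mb * percent // 100


def tasks_that_fit(ram_mb: int, percent: int, task_mb: int) -> int:
    """Whole tasks whose peaks fit in the budget at the same time."""
    return budget_mb(ram_mb, percent) // task_mb


if __name__ == "__main__":
    for label, percent in [("in use", 50), ("idle", 90)]:
        budget = budget_mb(PHYSICAL_RAM_MB, percent)
        fit = tasks_that_fit(PHYSICAL_RAM_MB, percent, TASK_PEAK_MB)
        print(f"{label}: {percent}% -> {budget} MB budget, {fit} task(s) fit")
```

At 50% the budget is 256 MB, room for one such task with little headroom; at 90% it is 460 MB. On a one-CPU box only one task runs at a time anyway, so the extra idle headroom mostly absorbs everything else on the system.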

You will most likely still see a little sluggishness in the first minute or so of using your machine after Rosetta has been running for a while. This is due to the memory pages being written out to the swap file as you load other applications.

Your other alternatives would be to expand the memory, or help the project by recruiting others to run Rosetta on machines with more memory.

I'm going to move this thread to the Number Crunching board where others that are seeing the same issue may search for it.
Rosetta Moderator: Mod.Sense
ID: 63909 · Rating: 0
Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 63913 - Posted: 2 Nov 2009, 1:53:02 UTC

I say just add 1GB of RAM to your PC. RAM is very cheap these days.
ID: 63913 · Rating: 0
DJStarfox
Joined: 19 Jul 07
Posts: 145
Credit: 1,250,162
RAC: 0
Message 63918 - Posted: 2 Nov 2009, 14:39:07 UTC - in response to Message 63909.  

Rosetta has a history of consuming more memory than it tells BOINC it uses. If your BOINC preferences limit the amount of memory in use, this has manifested as many tasks starting up but not freeing memory. I've had a virtual machine completely max out its 1 GB allocation and all of its swap, rendering the system unusable. All this with 5 tasks trying to spawn, even with the memory restrictions in place.

To machspeed2200: You need to install 1 GB of RAM (total) for Rosetta; it will thrash your system otherwise. Also, set "leave applications in memory" to OFF in your preferences. Just general advice, but it's good to disable any unused services in your OS. Whether it's Linux or Windows, they all have a few running.
ID: 63918 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 63921 - Posted: 2 Nov 2009, 15:43:22 UTC

DJ, I would say you are being a bit alarmist here. I'm sorry you had difficulties, and I'm glad you seem to have figured out how to set the preferences in a way that works for you. But I wanted to clarify, for future readers and perhaps for you, what I believe may have occurred on your machine...

What you are describing are really BOINC limitations, not specific to Rosetta. The way BOINC enforces the memory limit is by letting a task run up until the point that it exceeds the memory threshold you have set in the BOINC preferences. For Rosetta tasks, much of the memory is allocated within the first 5 minutes of execution. At the point where too much memory is used, the task is suspended waiting for memory, and BOINC looks for anything else it might be able to work on. Once suspended, the task is either kept in memory (that is, in your virtual memory) or removed from memory, depending on your preference for how to handle preempted tasks.
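The key point is that the limit is only checked after a task has already allocated its memory. Here is a minimal Python sketch of that check, assuming (purely for illustration) that a task's memory ramps up linearly over its first five minutes and that the client checks once a minute; neither the real check interval nor Rosetta's actual allocation pattern is given in this thread.

```python
from typing import Optional, Tuple

RAMP_MINUTES = 5  # assumed: most memory is allocated in the first ~5 minutes


def usage_at(final_mb: int, minute: int) -> int:
    """Assumed linear ramp from 0 MB up to final_mb over RAMP_MINUTES."""
    return final_mb * min(minute, RAMP_MINUTES) // RAMP_MINUTES


def first_violation(final_mb: int, limit_mb: int,
                    checks: int = 10) -> Optional[Tuple[int, int]]:
    """Return (minute, usage_mb) of the first once-a-minute check that finds
    the task over limit_mb, or None if it never exceeds the limit."""
    for minute in range(1, checks + 1):
        usage = usage_at(final_mb, minute)
        if usage > limit_mb:
            # This is the point where the task would be suspended "waiting
            # for memory" and the client would look for other work.
            return minute, usage
    return None


print(first_violation(300, 256))  # (5, 300)
print(first_violation(180, 256))  # None
```

A 300 MB task under a 256 MB limit runs for five minutes before it is caught, by which time its memory has already been allocated.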

BOINC remembers that the task was consuming x MB of memory, and therefore will not begin running it a second time until it has enough memory available (perhaps when the machine goes idle, or another task frees up some memory). But if you do not keep tasks in memory while preempted, I believe BOINC forgets how much memory was needed for the task. It might then begin running it, only to find that it exceeds your memory preference, and so continue in a loop through all of the tasks: starting each one, exceeding the memory limit, and then forgetting all about it.

It can be rather maddening that at a time when BOINC thinks memory is constrained, it starts and preempts more tasks than it would normally run, and thus potentially worsens the swap file and disk I/O problems that may already be occurring on a memory-constrained system.
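The loop described above can be modelled with a toy one-CPU scheduler in Python. This is a hypothetical sketch, not BOINC's scheduler: each round it starts the first task it believes will fit, and a task that turns out to exceed the limit is preempted and sent to the back of the queue. With `remember=True` the task's memory need is recorded; with `remember=False` it is forgotten, mirroring what Mod.Sense believes happens when preempted tasks are not kept in memory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    name: str
    need_mb: int                    # working set the task reaches once running
    known_mb: Optional[int] = None  # what the scheduler remembers, if anything


def count_starts(tasks: List[Task], limit_mb: int, remember: bool,
                 rounds: int = 20) -> int:
    """Count task start-ups on a one-CPU machine over up to `rounds` rounds."""
    queue = list(tasks)
    starts = 0
    for _ in range(rounds):
        # Pick the first task not already known to be too big.
        candidate = next((t for t in queue
                          if t.known_mb is None or t.known_mb <= limit_mb), None)
        if candidate is None:
            break  # nothing believed to fit: wait, e.g. for the machine to go idle
        starts += 1
        if candidate.need_mb <= limit_mb:
            break  # this task fits and keeps running
        # Exceeded the preference: preempt and move to the back of the queue.
        candidate.known_mb = candidate.need_mb if remember else None
        queue.remove(candidate)
        queue.append(candidate)
    return starts


def big_tasks() -> List[Task]:
    return [Task(name, 300) for name in ("A", "B", "C")]


print(count_starts(big_tasks(), 256, remember=True))   # 3
print(count_starts(big_tasks(), 256, remember=False))  # 20
```

Each start in the `remember=False` case is another allocation burst followed by a preemption, which is where the extra swap and disk I/O pressure comes from.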

So, it sounds like at the time you (DJ) ran what you are describing, your swap file was not large enough to hold all of the suspended tasks, and at the time, your setting must have been to keep preempted tasks in memory. It also sounds like your operating system did not handle the full swap file very well.

So, I hope this sheds some light on why DJ's experience causes them to suggest not keeping preempted tasks in memory, and yet my advice was the opposite. DJ's advice was to try and help you avoid a system failure due to filling the swap file. My advice was given assuming adequate swap file size, and more graceful OS handling of a filled swap file.
Rosetta Moderator: Mod.Sense
ID: 63921 · Rating: 0
DJStarfox
Joined: 19 Jul 07
Posts: 145
Credit: 1,250,162
RAC: 0
Message 63922 - Posted: 2 Nov 2009, 16:54:12 UTC - in response to Message 63921.  

mod.sense,

Actually, that sounds like exactly what happened. I only recovered the machine by logging in via SSH and creating an additional swapfile on the / partition. Problem is, BOINC then completely unloads several applications because the "use swap space" preference was set to 50%. Then, the restart/suspend loop you talked about occurs. I seriously thought about enforcing memory and CPU quotas in Linux for the boinc user, but that seems like such a drastic "hack" to get around a poorly thought-out design for a memory-bound scenario. Such an OS-level enforcement would surely cause Rosetta tasks to fail with errors. I ended up restricting BOINC to 1 CPU on a 1GB machine, and this avoids the whole issue.

In the meantime, I agree that machspeed should have a big swapfile, perhaps even 2GB, just to avoid issues. However, it sounds like the kernel is having to swap active pages frequently just to run one task. That would lead me to conclude that 512MB of memory is not sufficient for Rosetta to operate.

The Rosetta Beta 5.98 application tells BOINC that its rsc_memory_bound is 100,000,000 bytes. That seems to be its steady-state memory requirement, but tracking peak and shared-library memory usage shows that it really needs 180MB.
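For scale, here is a quick Python comparison of the two figures in this post. It assumes the 180MB measurement is in mebibytes, as Linux tools such as top report memory in binary units; if it is decimal megabytes, the ratio is 1.80 instead.

```python
ADVERTISED_BYTES = 100_000_000  # rsc_memory_bound reported for Rosetta Beta 5.98
MEASURED_PEAK_MIB = 180         # measured peak incl. shared libraries (assumed MiB)

MIB = 2 ** 20


def bytes_to_mib(n_bytes: int) -> float:
    """Convert a byte count to mebibytes."""
    return n_bytes / MIB


advertised_mib = bytes_to_mib(ADVERTISED_BYTES)
ratio = MEASURED_PEAK_MIB / advertised_mib
print(f"advertised: {advertised_mib:.1f} MiB")   # advertised: 95.4 MiB
print(f"measured / advertised: {ratio:.2f}")     # measured / advertised: 1.89
```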
ID: 63922 · Rating: 0
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 63923 - Posted: 2 Nov 2009, 18:06:37 UTC

DJ, the rsc_memory_bound that you mention is really not related to the issue you describe. It is used more to determine which machines are sent which WUs, which may explain why you feel it is being ignored.

So far as I know, the BOINC Manager periodically checks each running application's actual reported memory usage, totals this across all active tasks, and compares the total to your configured preferences. If you are over, I believe it suspends the task using the LEAST memory that still gets the total back under your preference (because a task using more memory will have a harder time finding periods when it can run).
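The suspension rule described above, as Mod.Sense understands it, can be sketched in Python as follows. The names are hypothetical, and the fallback when no single suspension is enough (suspend the largest task, then re-check) is an assumption added to make the sketch complete; the post does not say what BOINC does in that case.

```python
from typing import Dict, List


def tasks_to_suspend(usage_mb: Dict[str, int], limit_mb: int) -> List[str]:
    """Return the tasks to suspend, in order, until total usage <= limit_mb.

    Rule as described in the post: suspend the task using the LEAST memory
    that on its own brings the total back under the limit. Fallback
    (assumption): if no single task is enough, suspend the largest one and
    check again.
    """
    running = dict(usage_mb)
    suspended: List[str] = []
    while running and sum(running.values()) > limit_mb:
        total = sum(running.values())
        sufficient = [name for name, mb in running.items()
                      if total - mb <= limit_mb]
        if sufficient:
            pick = min(sufficient, key=lambda name: running[name])
        else:
            pick = max(running, key=lambda name: running[name])
        suspended.append(pick)
        del running[pick]
    return suspended


print(tasks_to_suspend({"A": 200, "B": 120, "C": 60}, 300))  # ['B']
print(tasks_to_suspend({"A": 200, "B": 120, "C": 60}, 150))  # ['A', 'C']
```

In the first call, suspending C alone would leave 320 MB, so B (120 MB) is the smallest single task that gets back under 300 MB; A would also work but uses more.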

If you can spare the time, it would be very helpful to ensure the BOINC folks have this on their "to do" list: the whole memory management issue, but specifically the loop that can occur. If you could verify that the loop condition still occurs on the current BOINC client release, and the problem is not already reported in the trac wiki, then get an issue opened. And point out that neither of your alternatives is very attractive: either you keep tasks in memory and fill your swap space, or you purge them from memory and get the loop. (I'm assuming that if you keep tasks in memory and have a swap file that can hold them all, BOINC "remembers" how much memory each task was using, as I described earlier, but that would be another point to test.)
Rosetta Moderator: Mod.Sense
ID: 63923 · Rating: 0
DJStarfox
Joined: 19 Jul 07
Posts: 145
Credit: 1,250,162
RAC: 0
Message 63938 - Posted: 3 Nov 2009, 16:07:23 UTC - in response to Message 63923.  
Last modified: 3 Nov 2009, 16:07:40 UTC

There is a very similar Trac ticket, but I don't think the developers really understand the severity of the "looping" condition.

http://boinc.berkeley.edu/trac/ticket/305

I don't have access to add any comments.
ID: 63938 · Rating: 0




©2025 University of Washington
https://www.bakerlab.org