Anyone Else having problems with too much work?


ChiTownDale

Joined: 10 Dec 05
Posts: 3
Credit: 57,428
RAC: 0
Message 53280 - Posted: 22 May 2008, 19:14:57 UTC

Since I re-installed BOINC and used the 64 bit version, Rosetta has been dumping work units on me like crazy. Instead of one or two units at a time I am getting a dozen or more work units at a time.
I have had to abort many work units so that my other BOINC projects get some CPU time.

This isn't only happening with Rosetta - it is also happening with SETI, World Community Grid, Spinhenge, Einstein, LHC and just about all of the projects I have under BOINC control.

Is this because of downloading the 64 bit version of BOINC (which doesn't seem to actually be a 64 bit application since VISTA still asks me if I want to run the BOINC application, which it should not do if it actually were a 64 bit application) or because I am running a dual processor?

These are the only two changes I have made since I was previously running and these projects downloaded one work unit at a time so I assume one of these two changes is what is causing this to occur.

Has anyone else experienced this?

Prior to this the only problem I have had with BOINC is that it sometimes wouldn't upload results from completed work units unless I manually updated the projects. So I would be suffering from too little work to do since I would have 4-5 projects sitting there with completed work units waiting to be uploaded so that new work units could then be downloaded.

So it looks like the pendulum has swung from one end (too little work) to the other (too much work).

Will there ever be a happy medium?

I don't want to be kicked out of these projects because I am aborting so many work units (though I only abort those that have not started being processed by my computer, with a few minor exceptions where only 5-6 seconds of CPU time has been expended).


Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 53282 - Posted: 22 May 2008, 20:06:32 UTC

No need to worry about being kicked out of a project. Some aborts are normal and the system is designed to accommodate.

You don't have to complete the upload of one piece of work in order to get another. But when you run several projects like that, BOINC tries to ensure you don't get more work than you can complete before the deadlines, so it is basically scheduling a period of time during which it will not be doing work for each project in turn.

From the sounds of it, you have just increased your cache size, which is either in your general preferences, or local preferences for the PC. The configuration screens have two entries, one for connect to internet every xxxx days, and the other for additional days of work. The default is to connect about every .1 days (i.e. 10 times per day). But if you changed it to 1 or more days, then this will cause more work to be brought down.

The other thing that can happen is that if the BOINC estimated runtimes get thrown off, it may start getting too much work. This can happen if a WU finishes much earlier than normal due to an error of some kind, or perhaps in Rosetta's case if you get a task that has especially long-running models, in which case it may end significantly earlier than normal to ensure it doesn't exceed your Rosetta preference for runtime per task.
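
To make that concrete, here is a rough Python sketch of how a shrinking runtime estimate inflates a work request. The smoothing rule and both function names are made up for illustration; this is not BOINC's actual estimator, just an assumed simple model in which each finished task pulls the estimate halfway toward its observed runtime.

```python
import math

def updated_estimate(current_hours, observed_hours, weight=0.5):
    """Blend the previous estimate with the latest observed runtime.
    Illustrative smoothing only; not BOINC's real formula."""
    return (1 - weight) * current_hours + weight * observed_hours

def tasks_to_request(buffer_hours, estimate_hours):
    """Tasks needed to fill the buffer at the current per-task estimate."""
    return math.ceil(buffer_hours / estimate_hours)

estimate = 1.0        # hours per task, as first believed
buffer_hours = 72.0   # assumed example: a 3-day buffer on one CPU
before = tasks_to_request(buffer_hours, estimate)

for observed in (0.1, 0.1, 0.1):   # three tasks that ended early
    estimate = updated_estimate(estimate, observed)

after = tasks_to_request(buffer_hours, estimate)
print(before, after)
```

Under these assumptions, a few early finishes are enough to make the same buffer look several times "emptier" than it really is, which is the effect described above.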

Since you mentioned all the projects are bringing down more work, the above probably covers it, but the other thing specific to Rosetta is the ability to set your preference for how long to run on each task. And it takes BOINC a while to adjust to such changes, so typically you want to make them gradually.
Rosetta Moderator: Mod.Sense
ChiTownDale

Joined: 10 Dec 05
Posts: 3
Credit: 57,428
RAC: 0
Message 53290 - Posted: 23 May 2008, 3:41:32 UTC - in response to Message 53282.  

No need to worry about being kicked out of a project. Some aborts are normal and the system is designed to accommodate.

From the sounds of it, you have just increased your cache size, which is either in your general preferences, or local preferences for the PC. The configuration screens have two entries, one for connect to internet every xxxx days, and the other for additional days of work. The default is to connect about every .1 days (i.e. 10 times per day). But if you changed it to 1 or more days, then this will cause more work to be brought down.



Well, I checked the connect time and the default reads as zero. So I don't know if that means connect every .1 days or what. It doesn't matter to me, since I don't know what that would do to BOINC as far as how much work it believes should be downloaded. I will fiddle around with it and see what might be the ideal amount of time between connections now that I am running a dual processor. I will start with .05 days since the Rosetta project units generally require less than an hour each to complete. If BOINC accepts that, then this ought to restrict Rosetta to a single work unit at a time. If that works out I can bump it up to the point where it supplies two work units at a time so there is no lag while awaiting a new work unit to be downloaded.

Also, I never changed my cache size, but perhaps VISTA computes available cache differently, particularly since I now have 6 GB of real memory available versus 2 GB before I upgraded to the dual processor.

I see that the "Additional Work Buffer" is defaulting to 3 days. That could also be causing this since with such small units of work (less than one hour each for Rosetta) BOINC would assume that I have plenty of room for a lot of work units, which I actually do have. I just don't want so many being worked upon at one time so that all different projects get their fair share of my CPU time. So I will fiddle with that figure as well, reducing it to 1.5 days and seeing what that does to the work stack. If it drops it too much I can slowly increase it until I reach a happy medium.
Thanks for the information and ideas on how to get my arms around this beast. Hopefully it will lead to more work units being completed for the benefit of all of these various projects.



The other thing that can happen is that if the BOINC estimated runtimes get thrown off, it may start getting too much work. This can happen if a WU finishes much earlier than normal due to an error of some kind, or perhaps in Rosetta's case if you get a task that has especially long-running models, in which case it may end significantly earlier than normal to ensure it doesn't exceed your Rosetta preference for runtime per task.



This is also probably true with the additional computing power I now have. As I said, most of my Rosetta work units have been less than an hour. Now they seem to be a bit larger, running 1.5 to 2 hours in length.
So perhaps after making these two changes I will just let BOINC grind away for a week or two before trying any more adjustments.

Thanks for the help...



Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 53298 - Posted: 23 May 2008, 13:31:03 UTC

Yes, so an additional work buffer of 3 days, and tasks that BOINC's experience shows take about an hour to complete... you probably got quite a pile of tasks there. Not to worry, BOINC balances with your other projects (as best it can) and only ordered 3 days of Rosetta work, not 3 days of your machine's time.
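
As a back-of-the-envelope check, the arithmetic can be sketched in Python. This assumes the buffer is filled in proportion to the number of CPUs and to the project's fraction of the resource shares; the function name and the 1/8 share are hypothetical, chosen only to show the scale of the numbers.

```python
import math

def tasks_in_buffer(buffer_days, task_hours, n_cpus=1, share_fraction=1.0):
    """Rough count of tasks needed to cover the buffer.

    Assumed model: hours to fill = days * 24 * CPUs * project's share.
    """
    hours_to_fill = buffer_days * 24 * n_cpus * share_fraction
    return math.ceil(hours_to_fill / task_hours)

# One CPU, Rosetta as the only project, one-hour tasks.
rosetta_only = tasks_in_buffer(3, 1.0)

# Two CPUs, Rosetta holding a hypothetical 1/8 of the resource shares.
shared = tasks_in_buffer(3, 1.0, n_cpus=2, share_fraction=0.125)

print(rosetta_only, shared)
```

Even with Rosetta holding only a small share, a 3-day buffer of roughly one-hour tasks works out to well over a dozen tasks, which matches what you are seeing.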

With such a short runtime preference, you will frequently find that Rosetta tasks take more than an hour. I suspect you hit a spurt of tasks that took less than an hour, and so BOINC felt it needed more tasks on hand to keep that 3 days of work buffered up.

Not to worry, BOINC takes care of getting the next work unit in plenty of time to have one ready before the last is completed. In your case, plenty of time probably includes some work being done for other projects in between Rosetta tasks. So, having no tasks on hand for any one project at any given point in time is nothing to worry about. Conversely, having all these Rosetta tasks isn't going to do any permanent harm to the other projects. BOINC tracks the time on each and balances it out to match your resource shares over time.
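
For a feel of how that balancing plays out, here is a toy Python simulation. It is only a sketch under a simplifying assumption (each hour goes to whichever project is furthest behind its entitled share), not BOINC's actual scheduler, and the project names and share values are just examples.

```python
def simulate(shares, hours):
    """Give each hour to the project furthest behind its entitled time."""
    total = sum(shares.values())
    used = {name: 0 for name in shares}
    for hour in range(1, hours + 1):
        # Deficit = time the project is entitled to so far minus time used.
        behind = max(shares, key=lambda n: shares[n] / total * hour - used[n])
        used[behind] += 1
    return used

equal = simulate({"Rosetta": 100, "SETI": 100}, 10)
weighted = simulate({"Rosetta": 300, "SETI": 100}, 8)
print(equal, weighted)
```

In this toy model the hours handed out converge on the share ratio, even though at any given moment one project may be idle while another runs.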
Rosetta Moderator: Mod.Sense




©2024 University of Washington
https://www.bakerlab.org