Job Queue

Message boards : Number crunching : Job Queue

Chris Holvenstot
Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 68276 - Posted: 30 Oct 2010, 23:26:18 UTC

Is there a simple way to have BOINC "forget" whatever parameters it uses in determining how many tasks to keep in the queue to satisfy the number of days of work to keep on hand?

I ask this in the hope that by starting over it might be a little more consistent.

For example, I have a few systems which, although my setting is 3 days, only keep about 4 or 5 tasks in the queue.

But I also have a few systems which keep about 80 tasks in the queue.

All are quad- or hex-core Phenom II processors except for the Xeon in my Mac Pro.
mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,178,626
RAC: 3,201
Message 68291 - Posted: 31 Oct 2010, 10:07:18 UTC - in response to Message 68276.  


I make the adjustments in the BOINC Manager, which affects only that PC, so I can fine-tune how many tasks I get. You should also open the BOINC Manager on that PC, go to the Projects tab, highlight Rosetta, and click Properties on the left. All the way at the bottom is a "Duration correction factor" followed by a number. What is the number? The higher it is, the longer BOINC thinks a task will take, and therefore the fewer tasks that machine will get. Each machine works out for itself how many units to fetch, with BOINC doing all the calculations on its own.
Chris Holvenstot
Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 68380 - Posted: 4 Nov 2010, 11:50:06 UTC

OK - as much as it pains me to admit you are right, I'll concede on this one Biscuit Boy. There is definitely the correlation you described between the "correction factor" and the number of jobs in the work queue.

However, I thought that these numbers were smoothed and adjusted on a regular basis - with the exception of the occasional reboot for patches these systems have been running 24/7 for the past few months with no changes in hardware.

How do you get these numbers to more closely match the capacity of the hardware?

Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 68388 - Posted: 4 Nov 2010, 16:33:29 UTC

It is one of those things where constant adjustment is not always ideal. If a few tasks in a row fail for some reason, the DCF can get thrown off by thinking tasks finish in less time. If a few tasks hit long-running models, DCF can get thrown off by thinking tasks take a long time to run (and therefore it doesn't take as many to keep the machine busy for an X hour buffer).

Correct me if I'm wrong, but I've always assumed that I can observe the side effect of DCF being off by looking at the estimated time to completion of a task that has not been started yet. If that is not in line with my runtime preference, then I know work fetch will be off, and that DCF is not reflecting the machine's ability to complete the work. It will adjust itself again over time, but at the moment, it's off.
Rosetta Moderator: Mod.Sense
Chris Holvenstot
Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 68401 - Posted: 4 Nov 2010, 23:18:02 UTC

Mod.Sense - thanks for the response - you are partially right - but I don't think we have hit the mark yet.

Let's examine "popeye" - my desktop system which also crunches numbers when I type real slow. CPU ID 127776

Jobs sitting in the "Ready to start" state show a projected run time of about 7:33

My preferences are set to 4 hour execution time.

I have requested three days worth of jobs to be kept in the local queue.

If you look at the output of a typical task - 37692446 - you will see my run_time_preference set to 14400 - or four hours if you prefer.

The output of the tasks shows it completing in 14276 seconds - right on schedule.

This is a 3.4 GHz quad core system. It typically has right around 4 or 5 jobs in the "Ready to start" state.

For Rosetta it has a DCF of 1.8885
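
One way to sanity-check that correction factor: the 7:33 projection for unstarted jobs is close to what you get by scaling the 4-hour runtime preference by the duration correction factor. The Python sketch below assumes the estimate is simply the preference multiplied by the factor, which is a simplification of whatever the BOINC client actually does. The function names are made up for illustration.

```python
def projected_runtime_seconds(runtime_preference_s: float, dcf: float) -> float:
    """Assumed model: estimated run time = runtime preference x DCF."""
    return runtime_preference_s * dcf


def format_hms(seconds: float) -> str:
    """Format a duration in seconds as H:MM:SS."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


# Values quoted in the post: run_time_preference 14400 s, correction factor 1.8885
estimate = projected_runtime_seconds(14400, 1.8885)
print(format_hms(estimate))
```

Under that assumption the result lands within a few seconds of the 7:33 shown for the "Ready to start" jobs, which suggests the inflated estimate comes from the correction factor and not from the tasks themselves.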

Logically speaking, on this system (a quad core with a four-hour run time and a 3-day work queue) I should have about 72 jobs in the queue, give or take.

(4 cores x 6 four-hour tasks per day x 3 days)

Even with a projected run time of 7.5 hours it does not even cover a half day of work.
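
The arithmetic above can be written out as a small Python sketch. It assumes every core runs one task at a time around the clock and that the client fills the buffer purely from its per-task time estimate; the real work fetch logic has more inputs than this, and the function name is hypothetical.

```python
def tasks_for_buffer(cores: int, buffer_days: float, est_task_hours: float) -> float:
    """Tasks needed to keep all cores busy for the requested buffer."""
    core_hours = cores * buffer_days * 24
    return core_hours / est_task_hours


# Nominal case from the post: 4 cores, 3 days, 4-hour tasks
print(tasks_for_buffer(4, 3, 4.0))

# Same buffer, but using the inflated estimate of roughly 7.55 hours per task
print(round(tasks_for_buffer(4, 3, 7.55), 1))
```

Even with the inflated estimate this model calls for several dozen queued tasks, so an estimate that is off by a factor of two cannot by itself explain a queue of only 4 or 5.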

The only adjustments I have made to the system have been via the Rosetta web site - during the last round of server problems I set my jobs to run at 12 hours just to stay "employed" - they were returned to 4 hours when the servers started functioning normally again.

I have never done a manual edit of any Rosetta / BOINC file.

The under-populated work queue has been an issue for months.

I have the 6.10.56 version of BOINC for Linux running.

This is only a problem on a few of my systems and is no big deal as long as the Rosetta servers continue to run pretty much error free.

Thanks for any suggestions





Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 68403 - Posted: 5 Nov 2010, 0:48:49 UTC

All I can offer is that I know they are still tinkering with the work fetch rules in the BOINC core client. Hopefully they get the additional buffer feature working better soon. Sometimes it seems to me everything is working well and then they make a bunch of changes, and then all it seems to understand is deadline, and being completely out of work.
Rosetta Moderator: Mod.Sense
Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 68405 - Posted: 5 Nov 2010, 0:52:53 UTC - in response to Message 68276.  


Oh my god, you are over the 4 million mark already?!
Chris Holvenstot
Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 68407 - Posted: 5 Nov 2010, 1:42:55 UTC

OK - I tore into this a little and I think I see what is going on. I don't understand how things got that way, but on each of my systems with a really short work queue, if you open the file:

global_prefs_override.xml

you see the tag:

"work_buf_additional_days" set to 0.25

On the systems which are functioning normally this value is set to the expected 3.0 days.

I am not sure why this file is not being updated - the permissions are set to 644 and the owner is the same user that BOINC runs under.

Despite the fact that I change my preferences from time to time using the website, the last time this file was updated was back in early August. Although it is an "override" dataset, it is unclear to me how this value would be manually set - I haven't spotted the menu option for it yet which likely means it is looking me right in the eye.

I backed up the file on one system and used vi to manually set the value to 2 - now in the wait and see mode.
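
For anyone wanting to check this value on several machines without opening each file by hand, a minimal Python sketch follows. The tag name comes from the post above. The sample XML layout (a global_preferences root and the work_buf_min_days sibling) is an assumption about how the override file is laid out, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative stand-in for the file contents; the layout is assumed, not copied from a real host.
SAMPLE = """<global_preferences>
    <work_buf_min_days>0.1</work_buf_min_days>
    <work_buf_additional_days>0.25</work_buf_additional_days>
</global_preferences>"""


def read_additional_buffer_days(xml_text: str):
    """Return work_buf_additional_days as a float, or None if the tag is absent."""
    root = ET.fromstring(xml_text)
    node = root.find(".//work_buf_additional_days")
    if node is None or node.text is None:
        return None
    return float(node.text.strip())


print(read_additional_buffer_days(SAMPLE))
```

On a real host you would pass in the text of global_prefs_override.xml from the BOINC data directory in place of SAMPLE. The sketch only reads the value; it does not write the file back.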




Chris Holvenstot
Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 68408 - Posted: 5 Nov 2010, 1:47:43 UTC

Chile Man said:

Oh my god, you are over the 4 million mark already?!


To be honest, I had not even noticed that until you mentioned it - medical treatments and work have kept me so busy of late that I have been running on auto pilot.

Haven't even really had time to pull "Biscuit Boy's" chain - which is always recreational.

Chris Holvenstot
Joined: 2 May 10
Posts: 220
Credit: 9,106,918
RAC: 0
Message 68424 - Posted: 6 Nov 2010, 2:23:30 UTC

Transient said ...

It is in the "Network Usage" tab 3rd or 4th line on the right


That was the solution. I'm not sure how those ever got set down to 0.25 days but now all my systems are sitting fat, dumb, and happy.

Thanks to all who made suggestions.

mikey
Joined: 5 Jan 06
Posts: 1895
Credit: 9,178,626
RAC: 3,201
Message 68426 - Posted: 6 Nov 2010, 11:39:23 UTC - in response to Message 68408.  


Hey, hey, HEY!!! Be nice or I will send my staph infection to you!!! Actually I am over it now, but for a while they thought I had MRSA, which is just an old staph infection that happens to be resistant to an antibiotic! In the end it was not MRSA, but after 7 days of IVs every 12 hours and then 15 days of oral antibiotics, 2 different kinds for 5 of those days, they say I am good to go now! Between seemingly passing myself several times going back and forth to the doctor's office for IVs and getting my PCs set up with a new (for me) backup process, I too have been on auto pilot for a few weeks!!

I built a Windows Home Server, but somewhere around the time I switched from Comcast to Verizon Fios I lost access to the backups on it. Backups would be happening but I could not restore anything, and then I accidentally deleted a whole hard drive's worth of data, a 500 GB hard drive's worth of data!!! After a month of trying to get it back I gave up, removed all the hard drives from the machine except the C: drive, and built another machine for them and the backups! It still runs as a Server and crunches, but it no longer does ANY backups at all! I need the Server part because of the number of PCs I have here, 11 currently running. Somebody needs to be in charge, and if a Server is on the system it automatically takes on that role.

Anyway, along with that I needed to separate out my wife's PC in the backup process. She takes a ton of digital pictures and won't delete anything! So I bought two 2 TB drives and I will set them up as RAID 1 so I will have a backup of her backup. I have not bought the enclosure yet, but I have one in mind that is fast and local. In the end her machine will have 1.5 TB of storage and two 2 TB drives for backups. The other two 1 TB drives, which came out of the Server, went into the other machine I built, and all other machines will be backed up to them. About half are done already, but the drives that have a lot of my data on them are still not done yet. When I did one PC it took 36 hours over my network, so it is a time investment and my RAC takes a hit when I do it!

I am glad you got your job queue back up and going like you like it! I prefer a shorter one: I have the initial cache set at 0.10 days and the additional set at 0.25 days, except on 2 PCs where it is set to 0.75 days. I weather most outages okay but do run out of work occasionally!
Michael Gould
Joined: 3 Feb 10
Posts: 39
Credit: 15,438,423
RAC: 4,427
Message 68450 - Posted: 8 Nov 2010, 5:58:22 UTC - in response to Message 68410.  
Last modified: 8 Nov 2010, 6:00:58 UTC

Although it is an "override" dataset, it is unclear to me how this value would be manually set - I haven't spotted the menu option for it yet which likely means it is looking me right in the eye.

It is in the "Network Usage" tab 3rd or 4th line on the right.


Aha!! That's why I wasn't getting the amount of buffer I wanted! Man, the things you can learn lurking here! Thank you, Transient. And thanks, Holvenstot, for asking the question.



©2024 University of Washington
https://www.bakerlab.org