partial completion 'waiting to run'

Questions and Answers : Windows : partial completion 'waiting to run'

amgthis

Joined: 25 Mar 06
Posts: 81
Credit: 203,879,282
RAC: 0
Message 68766 - Posted: 6 Dec 2010, 20:51:45 UTC

I sometimes notice work units that are partially (sometimes almost completely) finished but shown as 'waiting to run' while other work units have been started. I try to cache several days' worth of work, since I've run out many times in the past when the project was down. I don't understand why these units stop in the middle while others are started and finished and then new work is started, yet the 'waiting to run' units just sit. Some are around 95% complete, and they sit until they expire because the work was not completed by the deadline. Does anyone know why this occasionally happens? The BOINC Manager version doesn't seem to matter; it happens with both new and old versions. Why does a partially finished WU that shows 'waiting to run' never restart before a brand-new WU starts?
ID: 68766 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
amgthis

Joined: 25 Mar 06
Posts: 81
Credit: 203,879,282
RAC: 0
Message 68767 - Posted: 6 Dec 2010, 20:52:27 UTC

Win XP with Intel quad-core CPUs.
ID: 68767 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 68774 - Posted: 7 Dec 2010, 4:45:21 UTC

My best guess is that this is a memory issue. What can happen, especially on a many-core machine, is that a task reaches a point or a model that requires more memory than the rest of its execution did. The combination of all 4 tasks running at the same time then exceeds your preference for how much memory BOINC should use, and the task goes to a status of "waiting for memory"... and BOINC seems to make a note that it was using xxx MB of memory when it got deferred to the waiting status.

And so it starts a new task, hoping it might run with less memory and often it can. At no point during the execution of the new task does the memory requirement of the 3 other tasks go low enough for this one that's waiting to run. And so BOINC continues to wait on that one.

Then you reboot your computer, or restart BOINC, and it knows how much memory that task needs, and it doesn't start it, for the same reasons that existed when the machine was powered down. At this point, I believe I'm correct in saying it will show a "waiting to run" status rather than the previous "waiting for memory" status. The reason for this might be that it only shows the waiting for memory status when this run of BOINC has actually kicked the task out due to the preference on how much memory to use for BOINC tasks. Since it hasn't run the task yet this time, it shows the status differently. But the underlying fact is that BOINC knows how much memory needs to be free for that task to run, and that is now why it is waiting.

Most people allow a higher percentage of memory to be used by BOINC when the machine is idle. And so often such tasks will be picked up and run when the machine is not in use and BOINC is allowed more memory. But BOINC strives to preserve as much completed work as possible as well, and so it probably wouldn't transition back to that task until another task reaches a checkpoint. So I wouldn't expect it to instantly pick it up when the machine is idle for the configured number of minutes.

BOINC is trying to meet your preferences. One presumably is to use all 4 CPUs, and another is for BOINC to live within your preference for memory usage.
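To make the mechanism described above concrete, here is a toy model in Python. It is only a sketch of the idea (fill CPU slots while staying under a memory budget), not the BOINC client's actual scheduler. The `Task` class, the `pick_tasks` function, and all of the numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    working_set_mb: int  # memory the task was last seen using (assumed known)


def pick_tasks(tasks, n_cpus, ram_budget_mb):
    """Fill CPU slots in list order, skipping any task that would
    push the running total over the memory budget."""
    running, waiting, used = [], [], 0
    for task in tasks:
        fits = used + task.working_set_mb <= ram_budget_mb
        if len(running) < n_cpus and fits:
            running.append(task.name)
            used += task.working_set_mb
        else:
            waiting.append(task.name)
    return running, waiting


# Hypothetical numbers: 4 CPUs and a 1500 MB memory budget for BOINC.
queue = [
    Task("A", 400),
    Task("B", 400),
    Task("C", 400),
    Task("nearly_done", 700),  # grew late in its run, then was deferred
    Task("fresh", 250),        # new task with a small footprint so far
]
running, waiting = pick_tasks(queue, n_cpus=4, ram_budget_mb=1500)
print("running:", running)
print("waiting:", waiting)
```

In this sketch the nearly finished task stays in the waiting list because its recorded footprint never fits alongside the other three, while the small new task takes the fourth slot. Raising `ram_budget_mb` or lowering `n_cpus` changes the outcome.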

What happens when the task approaches deadline? It sounds like it does eventually run... when it does run again, do you find you only have 3 active tasks? Or perhaps a fourth that is just getting started and is not using much memory yet?

How much memory does your machine have? How much is BOINC configured to use? (check the messages as BOINC starts).

If memory does prove to be the issue, there are only a few ways for it to run any differently than it already is:
1) Get more memory, or allow BOINC to use a higher percentage of memory (which may make your machine a bit sluggish, but try it and see. You can always set it back).
2) Keep the BOINC % of memory the same for when active, but allow BOINC to use more when idle. This can make it take a moment to wake up after you've been away for a while. Generally not a big deal, just don't panic fearing a blue screen of death. But it should give enough for the task to run when you are away.
3) Limit BOINC to using 3 (or fewer) CPUs. This would reduce the amount of memory BOINC needs to run, but also reduce your throughput.
4) Manually suspend tasks until your pesky one runs.
5) Don't worry about it. BOINC will "git 'er done" when the deadline approaches. No tinkering required.
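
Options 1 to 3 in the list above correspond to settings in BOINC's computing preferences, which can also be set locally in a `global_prefs_override.xml` file in the BOINC data directory. The Python sketch below only builds the XML text. The tag names are the ones I understand the client to use, so please check them against the BOINC documentation for your version. The helper name `build_override` and the percentages are illustrative assumptions, not recommendations.

```python
import xml.etree.ElementTree as ET


def build_override(max_ncpus_pct, ram_busy_pct, ram_idle_pct):
    """Return preference-override XML as a string (nothing is written to disk)."""
    root = ET.Element("global_preferences")
    settings = (
        ("max_ncpus_pct", max_ncpus_pct),         # share of CPUs BOINC may use
        ("ram_max_used_busy_pct", ram_busy_pct),  # memory limit while the PC is in use
        ("ram_max_used_idle_pct", ram_idle_pct),  # memory limit while the PC is idle
    )
    for tag, value in settings:
        ET.SubElement(root, tag).text = f"{value:.1f}"
    return ET.tostring(root, encoding="unicode")


# Example values only: 3 of 4 CPUs, 50% of RAM when busy, 90% when idle.
xml_text = build_override(75, 50, 90)
print(xml_text)
```

The same settings are available in the BOINC Manager preferences dialog, which is the safer route if you are unsure about editing files by hand.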
Rosetta Moderator: Mod.Sense
ID: 68774 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
amgthis

Joined: 25 Mar 06
Posts: 81
Credit: 203,879,282
RAC: 0
Message 68783 - Posted: 7 Dec 2010, 22:48:06 UTC - in response to Message 68774.  

Mod.Sense, first, thanks for taking the time to write such a detailed response.
I believe what you are saying makes complete sense for my situation. I just installed my first i7 Bloomfield CPU, and while it's a quad core, I was a little surprised to see it running 8 tasks right off the bat. Apparently the i7's threading capabilities account for that. The box has 4 GB of RAM, but under Windows XP it's only using 3. I just checked another box with a Q9550 quad that has done the same thing, with 2 WUs now waiting to run. Same deal: XP, 4 GB of RAM (3 usable), etc. I think I just hit some bigger projects that pegged my RAM.

My preferences are set to use 100% of all memory, page file, etc. on my boxes.

Everything else you write appears to be what I've seen. I'll recheck system log messages also to see if this is started by a 'waiting for memory' issue that morphs into the 'waiting to run' as you write.

Thanks again and best of the Holidays to you and everyone associated with Rosetta@home.

/amgthis
ID: 68783 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 68786 - Posted: 8 Dec 2010, 7:02:50 UTC

Unless you are doing something to keep a log of task status, I don't know how you will be able to see if tasks are going to "waiting for memory" status. Normally no such message is issued to the log.

Because your memory allowed is no higher when the machine is idle, the task never has a window of opportunity to sneak back in because there's never sufficient memory. Setting to allow 100% is perhaps unwise. You need to leave a place for the operating system to live.

With 8 tasks and 3 GB, those machines are really overcommitted. You may want to review your disk I/O and page fault rates. It is possible the machines are thrashing, with the tasks all pushing each other out of memory to disk. This can lead to excessive wear on your disk drive and to reduced throughput overall. The tasks essentially expend more resources fighting each other than getting useful work done.

Suggest you limit BOINC to 4 CPUs, ok maybe 6 if you do little else on the machine. This should be sufficient for the tasks to run unimpaired by memory swapping, thus potentially improving throughput and reducing wear on the drive. Should also afford enough memory to run things more in deadline order as you were expecting.
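
A rough way to see why fewer tasks helps is to divide the usable memory by the number of running tasks. The Python sketch below does just that. The 512 MB set aside for the operating system is a guess for illustration, and `mb_per_task` is a made-up helper name. Actual Rosetta task footprints vary and are not taken from this thread.

```python
def mb_per_task(usable_ram_mb, n_tasks, os_reserve_mb=512):
    """Average memory available to each task after reserving some for the OS.

    os_reserve_mb is an assumed figure, not a measured one.
    """
    return (usable_ram_mb - os_reserve_mb) / n_tasks


# 3 GB usable, as on the 32-bit XP machines discussed above.
for n in (8, 6, 4):
    print(f"{n} tasks: about {mb_per_task(3072, n):.0f} MB each")
```

Under these assumptions, 8 tasks get about 320 MB each while 4 tasks get about 640 MB each, which is why cutting the CPU count leaves more room for a task that hits a memory-hungry stretch.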

Interesting experiment. Check the RAC on your two machines now. Set one to 5 CPUs and the other to 6 CPUs and see which maintains the higher RAC between the two. And how they compare to the RAC running (trying to run) with 8 CPUs. Keep in mind it takes about 2 weeks for such changes to be fully reflected in the RAC.
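
As a rough guide to how long to wait before comparing, the Python sketch below assumes that RAC behaves like an exponentially decaying average with a half-life of about one week. That half-life is my assumption here rather than something stated in this thread, and `fraction_reflected` is a made-up helper name.

```python
def fraction_reflected(days, half_life_days=7.0):
    """Fraction of a step change in daily credit that shows up in the
    average after `days`, for an exponentially decaying average."""
    return 1.0 - 0.5 ** (days / half_life_days)


for d in (7, 14, 28):
    print(f"after {d} days: {fraction_reflected(d):.0%} of the change is visible")
```

Under that assumption, about three quarters of a change is visible after two weeks, so comparing the two machines much earlier would mostly reflect their old settings.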

When you are ready, making use of the remaining 1 GB of memory on the machine would obviously be ideal. As you know, doing so will require moving to 64-bit Windows. Even then, 4 GB for 8 virtual cores is still on the low side for memory.
Rosetta Moderator: Mod.Sense
ID: 68786 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
amgthis

Joined: 25 Mar 06
Posts: 81
Credit: 203,879,282
RAC: 0
Message 68851 - Posted: 21 Dec 2010, 17:27:12 UTC

Mod Sense, thanks again. More experimentation is needed. I have only one i7 CPU, but I can tweak both ways for a couple of weeks and watch what happens. I'm hoping to install 64-bit Windows 7 if I can get past some BIOS issues. I did watch while BOINC 'orphaned' several of my nearly complete WUs as time expired and they were still 'waiting to run'. So that answered one question I had: BOINC will let a WU expire past its due date and start newer WUs with later deadlines if you have memory issues like I do.

More testing is in order.

Merry Christmas everyone!
ID: 68851 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Mikg

Joined: 13 Mar 09
Posts: 3
Credit: 317,208
RAC: 0
Message 71440 - Posted: 19 Oct 2011, 18:42:13 UTC - in response to Message 68851.  

I have a Dell T7500 with dual 6-core Xeons (HT enabled) for a total of 24 threads available (BOINC is set to use only 85%).

I'm seeing this same behavior, and I have 48 GB of RAM. I will check how much I am allocating to BOINC when I get home and adjust from there.

Sounds like I'm close to running out of headroom trying to do too much all at once (GPUGrid is also running on two GPUs at the same time).

Mike
ID: 71440 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
amgthis

Joined: 25 Mar 06
Posts: 81
Credit: 203,879,282
RAC: 0
Message 71630 - Posted: 22 Nov 2011, 17:30:14 UTC - in response to Message 71440.  

I've since put together another quad-core box, this time with the Intel Sandy Bridge 2500 and 16 GB of RAM. I have BOINC Manager set to use 100% of memory and plenty of disk space. I'm also now running the Debian 'squeeze' release on this box. Unfortunately, BOINC running Rosetta still exhibits this behavior of abandoning partially completed WUs and starting work with *later* expiry dates. This has resulted in lots of work dying on the vine and expiring prior to completion. I've also set my extra work buffer as high as 9 days at times because I've run out of work. Now I've lowered it to 4-5 days, since it never really gathers enough work for all cores for all the days you set anyhow. Plus, I didn't want it starting even more work before finishing others already in progress. BTW, right now Rosetta is the only project for this manager to try to manage (6.10.58 from the Debian stable tree).

With all cores running at 100% 24/7 with no restrictions, my free memory is over 9 GB. No swap is being used.

It seems the manager really isn't all that great at queuing work so as to consistently avoid letting good work go to waste by not being returned on time.


ID: 71630 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 71632 - Posted: 23 Nov 2011, 17:45:42 UTC

Yes, all of those sound like BOINC issues, not specific to Rosetta. The BOINC client decides how much work to keep on hand, which task to run next, and how best to manage your backlog of tasks.

Have you tried more than one version of BOINC?
Rosetta Moderator: Mod.Sense
ID: 71632 · Rating: 0 · rate: Rate + / Rate - Report as offensive    Reply Quote




©2025 University of Washington
https://www.bakerlab.org