Purge function

Message boards : Number crunching : Purge function

dcdc
Joined: 3 Nov 05
Posts: 1832
Credit: 119,821,902
RAC: 15,180
Message 40679 - Posted: 10 May 2007, 22:15:05 UTC
Last modified: 10 May 2007, 22:17:44 UTC

I posted a request for a purge function over on the BOINC forum - I've requested it before and I think I recall FluffyChicken doing the same at some point, possibly on Rom Walton's blog...

Anyway, Nicholas has posted back there saying it's already implemented in BOINC! I think this would be a really useful addition to R@H for two reasons:

1. Reduced processing of expired tasks
2. Ability to purge jobs that are causing problems after their release

There must be thousands of computers out there running tasks that have expired. I know some of these results will still be of use, but I'm sure in most cases the cycles could be put to better use. There are lots of computers out there that are off for a while, then used heavily for a while, and off again - for example students who go through cycles of doing assignments. My mum went away for a week recently and when the computer was switched back on after her return, it would have run expired tasks for two days if I hadn't emailed her and asked her to remove the expired jobs...

So to summarise, I think it'd boost the useful throughput of the project, and give a very useful tool to reduce the chances of fatal tasks being crunched.

Any comments/thoughts?

cheers
Danny

Edit: Here's a good example - my friend is on holiday and when he gets back his computer is going to run all of these expired jobs before it gets on to some useful work!

dcdc
Message 40710 - Posted: 11 May 2007, 14:35:30 UTC
Last modified: 11 May 2007, 14:37:37 UTC

Thanks to Rattledagger on the BOINC forum:

This is handled by 2 flags:
1. result_abort
2. result_abort_if_not_started

As the names imply, #1 aborts the result immediately. It is intended for cases where the WU has been cancelled by the project or has errored out, meaning the result won't be used by the project and the user won't get any credit for it, so it's a waste of time to continue.

#2 aborts a result only if the client has never started crunching it; even if you've crunched for only 1 second, the client won't abort the result. It is intended for cases where the WU already has a "canonical result", meaning the result won't be used by the project and there's no point in starting to crunch it. But since the user can still get credit for the result, a started result isn't cancelled, as there's no way to know whether only a few seconds or a long time is left to crunch...
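To make the difference between the two flags concrete, here is a minimal sketch of the decision as described above. This is not BOINC client code: the names `Task`, `AbortFlag` and `should_abort` are made up for illustration, and "started" is assumed to mean that any CPU time at all has been spent on the result.

```python
from dataclasses import dataclass
from enum import Enum


class AbortFlag(Enum):
    # Flag names taken from the post; the enum itself is hypothetical.
    RESULT_ABORT = "result_abort"
    RESULT_ABORT_IF_NOT_STARTED = "result_abort_if_not_started"


@dataclass
class Task:
    name: str
    cpu_seconds: float  # CPU time already spent on this result (assumed field)


def should_abort(task: Task, flag: AbortFlag) -> bool:
    """Return True if the task would be dropped when the server sends `flag`."""
    if flag is AbortFlag.RESULT_ABORT:
        # WU cancelled or errored out: no credit is possible, so always abort.
        return True
    # Canonical result already exists: abort only if crunching never started,
    # because a started result can still earn credit.
    return task.cpu_seconds == 0
```

For example, a task with even 1 second of CPU time is kept under `result_abort_if_not_started` but dropped under `result_abort`.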


Client-side, #1 has been included for nearly a year, since v5.5.1. #2 was added at the same time, but due to a small bug it did not work before v5.8.17, which means Windows users need v5.9.xx.

Server-side, nothing was done until WCG had a batch of bad WUs in March(?) 2007; they programmed this part and got it incorporated into the general BOINC code in April, with some bug fixes added later...

For a project to use this, it needs to enable "send_result_abort" in its config file.
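As a rough sketch of the server-side switch, the snippet below checks whether a project config has the option turned on. It assumes the option is written as a `<send_result_abort>` element holding 0 or 1 somewhere inside the config file; the surrounding `<boinc><config>` layout in the example and the helper name `abort_enabled` are assumptions for illustration, not something confirmed in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical config fragment; only the option name comes from the post.
EXAMPLE_CONFIG = """
<boinc>
  <config>
    <send_result_abort>1</send_result_abort>
  </config>
</boinc>
"""


def abort_enabled(config_xml: str) -> bool:
    """Return True if the config text contains <send_result_abort>1</send_result_abort>."""
    root = ET.fromstring(config_xml)
    node = root.find(".//send_result_abort")  # search at any depth below the root
    return node is not None and (node.text or "").strip() == "1"
```

A config with the element missing or set to 0 is treated as disabled here.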


I think it would be very wise to implement this before a batch of bad tasks is released rather than waiting until it happens, as it's almost inevitable that it will happen at some point.

dcdc
Message 40912 - Posted: 13 May 2007, 23:00:20 UTC

Quote from Sekerob:

WCG implemented it and created some of the server-side code. Recently it sent an instruction to BOINC clients for a bad batch. It only works on >= 5.8 clients. I think D@H adopted their code. Theoretically, servers could send an instruction to clients where work was deemed 'Too Late' or 'No Reply', but the thing is, the client has to initiate a server contact. I have not heard whether the abort feature is being used for those cases. It helps efficiency and lessens the frustration when one inadvertently gets completed and zero credit is awarded.

It looks like the majority of the code is already available. Any comments?...

FluffyChicken
Joined: 1 Nov 05
Posts: 1260
Credit: 369,635
RAC: 0
Message 41016 - Posted: 15 May 2007, 15:31:20 UTC - in response to Message 40912.  

Quote from Sekerob:

WCG implemented it and created some of the server-side code. Recently it sent an instruction to BOINC clients for a bad batch. It only works on >= 5.8 clients. I think D@H adopted their code. Theoretically, servers could send an instruction to clients where work was deemed 'Too Late' or 'No Reply', but the thing is, the client has to initiate a server contact. I have not heard whether the abort feature is being used for those cases. It helps efficiency and lessens the frustration when one inadvertently gets completed and zero credit is awarded.

It looks like the majority of the code is already available. Any comments?...


bumpalicious
Team mauisun.org

dcdc
Message 41168 - Posted: 19 May 2007, 22:19:44 UTC

bump!

There's a 2.4GHz C2D here that's going to be crunching nothing but expired WUs!

Surely adding this function would be equivalent to gaining a few thousand computers???

dcdc
Message 41525 - Posted: 27 May 2007, 12:55:52 UTC

The cancelled gp04 jobs are another reason for the purge function! There are probably still thousands of those jobs in the wild, some of which will cause problems for people!

dcdc
Message 41613 - Posted: 29 May 2007, 20:51:22 UTC

1.16GB VM for a stalled Rosetta thread! That's a record for me. I've lost 11hrs of crunching on this machine and 22hrs on another, and now I've lost a remote cruncher permanently because one of these jobs was hogging memory! If the purge function had been activated, these jobs could have been removed from people's queues before they caused problems.

It's inevitable that a batch of jobs is going to be released that causes major problems, and being able to purge these after they've been downloaded to the clients could make a huge difference.

On top of that, it'll give the project a big boost in processing power - far fewer expired jobs would be run.

Can someone please have a look at implementing this, or just comment on it - the code is already in BOINC!

Danny

Matt3223
Joined: 15 Dec 05
Posts: 10
Credit: 58,569
RAC: 0
Message 41614 - Posted: 29 May 2007, 21:20:45 UTC
Last modified: 29 May 2007, 21:27:11 UTC

Sounds like something worth at least looking into!

I'll watch this thread to see when a reply shows up.

[edit] Seems the purge idea has been requested as far back as 2005! Apparently the priority isn't all that high...

I remember using FaD's purge function quite a bit. It definitely made me feel like I wasn't wasting time crunching unnecessary units.[/edit]

dcdc
Message 41615 - Posted: 29 May 2007, 21:35:33 UTC - in response to Message 41614.  

I remember using FaD's purge function quite a bit. It definitely made me feel like I wasn't wasting time crunching unnecessary units.

yeah - imo it was one of the best additions they made at FaD ;)

It's inevitable that it's gonna be needed at some point or the project risks losing lots of computers to either annoyed users or stuck jobs. (It's not a threat - I won't leave - just a concern!) :D

dcdc
Message 42141 - Posted: 13 Jun 2007, 16:13:20 UTC

bump ;)

Lots of CPU time being wasted - I've got another machine crunching expired jobs here.



©2024 University of Washington
https://www.bakerlab.org