Message boards : Number crunching : Why are my 'Remaining' time estimates so far off?
Author | Message |
---|---|
scott Joined: 18 Aug 19 Posts: 4 Credit: 2,238,450 RAC: 0 |
Hi Everyone! I have been running BOINC on my desktop, which has a Ryzen 2600 (6 cores/12 threads @ 3.66 GHz) and 16 GB of memory. When I start a task, BOINC estimates that it will take 8 hours to complete, but it ends up taking about 24 hours. Why is the estimate so far off of the actual time required? I have linked a screenshot of my BOINC window and the properties of one of the tasks. https://imgur.com/a/vrkpngV Thank you for your help! |
manalog Joined: 8 Apr 15 Posts: 24 Credit: 233,155 RAC: 0 |
Which target run time did you set in your preferences? The default is 8 hours, but you may have changed it to one day. The other possibility is that your computer is taking 24 hours just for the first decoy, but that is very unlikely (particularly given your host's performance). |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1677 Credit: 17,754,927 RAC: 22,839 |
scott wrote: "Why is the estimate so far off of the actual time required?" If your system is busy doing things other than processing BOINC work, or you have "Use at most xx % of CPU time" set to anything less than 100%, it will take more than 8 hours of Run time to accumulate 8 hours of CPU time. A heavily overcommitted system can take 24 hours (or longer) to complete one 8-hour Task. However, there is no sign of any recently completed work in your Task list, so it's not possible to tell whether that is the case here. scott wrote: "I have linked a screenshot of my BOINC window and the properties of one of the tasks." It shows a group of Tasks that have just started processing (3 min elapsed) with 8 hours to go, which is what I would expect. Grant Darwin NT |
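As a rough illustration of the point above, the Python sketch below models Run time as the CPU time a Task needs divided by the fraction of a thread it actually gets. The model and the function `runtime_hours` are hypothetical simplifications: they treat the "Use at most xx % of CPU time" setting and competing load as plain multipliers, which is not exactly how the BOINC client schedules work.

```python
def runtime_hours(cpu_hours: float, cpu_time_pct: float = 100.0,
                  share_left_by_other_work: float = 1.0) -> float:
    """Hypothetical model: wall-clock hours needed to accumulate `cpu_hours` of CPU time.

    cpu_time_pct: the "Use at most xx % of CPU time" setting (0 < pct <= 100).
    share_left_by_other_work: fraction of a thread left over by other programs (0..1].
    """
    if not 0 < cpu_time_pct <= 100:
        raise ValueError("cpu_time_pct must be in (0, 100]")
    if not 0 < share_left_by_other_work <= 1:
        raise ValueError("share_left_by_other_work must be in (0, 1]")
    effective_fraction = (cpu_time_pct / 100.0) * share_left_by_other_work
    return cpu_hours / effective_fraction


# An 8-hour Task on an otherwise idle machine with CPU time at 100%.
print(runtime_hours(8))                     # 8.0
# The same Task with "Use at most 80% of CPU time".
print(runtime_hours(8, cpu_time_pct=80))    # 10.0
# Overcommitted: 80% throttle and only a third of a thread left for BOINC.
print(round(runtime_hours(8, 80, 1 / 3), 1))  # 30.0
```

With both factors at their defaults, CPU time and Run time agree; any reduction in either factor stretches the Run time proportionally in this model.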
CIA Joined: 3 May 07 Posts: 100 Credit: 21,059,812 RAC: 0 |
Just as a heads up, you are running an older version of BOINC. I suggest you download the current version. https://boinc.berkeley.edu/download_all.php |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1677 Credit: 17,754,927 RAC: 22,839 |
OK, looking at the results, you aren't using the default Target CPU time of 8 hours; you've set it to 24 hours. The Estimated times (as I understand it) were now meant to be determined by your Target CPU time. However, it would appear that they are set to the default CPU time (8 hours). As you process and return work, those Estimated times will eventually end up matching the actual processing time, but it will take a while. As it is, there are signs of heavy usage of that system: the CPU time and Run time differ by about an hour and a quarter (Run time 1 days 1 hours 12 min 51 sec, CPU time 1 days 0 hours 0 min 2 sec). On a lightly used system I would expect a difference of maybe 5 min over 24 hrs of processing. Grant Darwin NT |
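The gap Grant describes is simple arithmetic on the two figures above. This Python sketch converts them to seconds with a hypothetical helper, `to_seconds`; the 5-minutes-per-24-hours baseline is his rule of thumb, not a BOINC threshold.

```python
from datetime import timedelta


def to_seconds(days=0, hours=0, minutes=0, seconds=0):
    """Convert a BOINC-style 'd h min sec' reading into whole seconds."""
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


run_time = to_seconds(days=1, hours=1, minutes=12, seconds=51)
cpu_time = to_seconds(days=1, seconds=2)
gap = run_time - cpu_time

print("Gap:", timedelta(seconds=gap))                    # Gap: 1:12:49
print(f"Share of Run time not spent crunching: {gap / run_time:.1%}")  # 4.8%

# Grant's rule of thumb for a lightly used system: about 5 min per 24 h.
baseline = to_seconds(minutes=5) / to_seconds(days=1)
print(f"Lightly used baseline: {baseline:.1%}")           # 0.3%
```

Roughly 4.8% of the wall-clock time going elsewhere, against about 0.3% on a lightly used machine, is what points to heavy usage.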
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Grant (SSSF) wrote: "However, it would appear that they are set to the default CPU time (8 hours). As you process and return work, those Estimated times will eventually end up matching the actual processing time, but it will take a while." Unfortunately, the corrections no longer work. All of my Ryzen 3000s are set for 18-hour work units, but show 8-hour estimates. It has been that way for weeks, if not months. (And I use a config.xml to speed up the corrections too.) |
Brian Nixon Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
As far as I can see, all tasks are delivered to all clients (regardless of preferences) with an estimated 80 000 GFLOPs of work to perform and a command-line option to run for 8 hours: `<workunit> <rsc_fpops_est>80000000000000.000000</rsc_fpops_est> <command_line>… -cpu_run_time 28800 …</command_line> </workunit>` The associated application is declared as achieving about 2.78 GFLOPS: `<app_version> <app_name>rosetta</app_name> <version_num>420</version_num> <flops>2777777777.777778</flops> </app_version>` The application must also see the project preference for target run time and, if it is set, override the command-line parameter with it. But BOINC does not know about that, so it will initially estimate that each task will take 80 000 ÷ 2.78 ≈ 28 800 seconds, which is 8 hours. As tasks complete, BOINC compares the elapsed time with the initial estimate and calculates a correction factor. Over time, this factor gets adjusted so that future estimates should be closer to the actual run time. You can see the adjustments by enabling `dcf_debug` in your Event Log options, and see the factor (for your own machines) on the Computer Details page online. So with target run time set to 24 hours, the correction factor should end up at 3 and the estimate should match the target. (Lots of ‘should’s here, I know; maybe it doesn’t actually work like this, or at all…) |
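Brian's arithmetic can be checked directly. The sketch below divides the quoted `rsc_fpops_est` by the app version's `flops` value; the constants are copied from his post, while the function name is illustrative and the sketch deliberately ignores any correction factor the client might apply afterwards.

```python
RSC_FPOPS_EST = 80_000_000_000_000.0  # <rsc_fpops_est> from the quoted workunit
APP_FLOPS = 2_777_777_777.777778      # <flops> from the quoted rosetta app_version


def initial_estimate_seconds(fpops_est: float, flops: float) -> float:
    """Uncorrected runtime estimate: amount of work divided by declared speed."""
    return fpops_est / flops


est = initial_estimate_seconds(RSC_FPOPS_EST, APP_FLOPS)
print(round(est), "s =", round(est / 3600, 2), "h")  # 28800 s = 8.0 h
```

The result lines up exactly with `-cpu_run_time 28800`, which is why every task starts out showing 8 hours whatever the target run time is.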
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Brian Nixon wrote: "As tasks complete, BOINC compares the elapsed time with the initial estimate and calculates a correction factor. Over time, this factor gets adjusted so that future estimates should be closer to the actual run time. You can see the adjustments by enabling `dcf_debug` in your Event Log options, and see the factor (for your own machines) on the Computer Details page online. So with target run time set to 24 hours, the correction factor should end up at 3 and the estimate should match the target. (Lots of ‘should’s here, I know; maybe it doesn’t actually work like this, or at all…)" The estimate corrections stopped working with Rosetta 4.20. I don't know if it is a bug or a feature, or even whether they know about it. Communication is less than ideal. |
scott Joined: 18 Aug 19 Posts: 4 Credit: 2,238,450 RAC: 0 |
Yeah, fairly decent usage. I also have it set to use 100% of the CPUs 80% of the time, and to stop when non-BOINC usage is over 75%, which doesn't happen too often. Grant (SSSF) wrote: "OK, looking at the results, you aren't using the default Target CPU time of 8 hours; you've set it to 24 hours. The Estimated times (as I understand it) were now meant to be determined by your Target CPU time." Oh OK, thank you! I went into the settings and changed it. CIA wrote: "Just as a heads up, you are running an older version of BOINC. I suggest you download the current version. https://boinc.berkeley.edu/download_all.php" Thanks, I will upgrade after these tasks finish! My current ones are showing 11 hours elapsed and 16 remaining, so the estimate might be a bit more accurate. We'll see with the next batch! Thanks for the help, everyone! |
Brian Nixon Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
Jim1348 wrote: "(And I use a config.xml to speed up the corrections too.)" Could you give more detail about that? I wonder whether something here is interacting badly with BOINC’s attempts to determine the correction factor automatically. |
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Brian Nixon wrote: "Could you give more detail about that?" It has always worked thus far. Maybe something has changed in BOINC? `<cc_config> <options> <rec_half_life_days>1.000000</rec_half_life_days> <use_all_gpus>1</use_all_gpus> <allow_multiple_clients>1</allow_multiple_clients> <allow_remote_gui_rpc>1</allow_remote_gui_rpc> <max_file_xfers_per_project>4</max_file_xfers_per_project> <max_file_xfers>4</max_file_xfers> </options> </cc_config>` |
Brian Nixon Joined: 12 Apr 20 Posts: 293 Credit: 8,432,366 RAC: 0 |
Thanks. I don’t see anything there that would affect the correction factor. Digging deeper, I think the problem is that the project is configured with `<dont_use_dcf/>`, which prevents the duration correction factor (DCF) from being used at all… Getting that fixed will require a project admin’s attention. I’ve mentioned it in the Ralph forum thread where the change was announced. |
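The effect of `<dont_use_dcf/>` can be sketched with a toy duration-correction-factor update in Python. The asymmetric rule below (jump straight to the observed ratio when a task overruns its estimate by more than 10%, otherwise move 10% of the way toward it) is modelled on the general approach of the BOINC client, but it is a simplification and the constants should be treated as assumptions; the class `ProjectEstimator` and its methods are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProjectEstimator:
    """Hypothetical per-project runtime estimator with a duration correction factor."""
    dont_use_dcf: bool = False
    dcf: float = 1.0

    def estimate(self, uncorrected_s: float) -> float:
        # With the DCF disabled, the uncorrected estimate is used unchanged.
        return uncorrected_s if self.dont_use_dcf else uncorrected_s * self.dcf

    def task_finished(self, uncorrected_s: float, elapsed_s: float) -> None:
        if self.dont_use_dcf:
            return  # the factor is never updated, so estimates never improve
        raw_ratio = elapsed_s / uncorrected_s
        adj_ratio = elapsed_s / self.estimate(uncorrected_s)
        if adj_ratio > 1.1:
            # Underestimating is worse than overestimating: raise the factor at once.
            self.dcf = raw_ratio
        else:
            # Otherwise move only 10% of the way toward the observed ratio.
            self.dcf = 0.9 * self.dcf + 0.1 * raw_ratio


EIGHT_H, TWENTY_FOUR_H = 8 * 3600, 24 * 3600
final_estimates_h = {}
for flag in (False, True):
    project = ProjectEstimator(dont_use_dcf=flag)
    for _ in range(5):  # five Tasks announced as 8 hours that actually run 24 hours
        project.task_finished(EIGHT_H, TWENTY_FOUR_H)
    final_estimates_h[flag] = project.estimate(EIGHT_H) / 3600
    print(f"dont_use_dcf={flag}: estimate {final_estimates_h[flag]:.0f} h")
# dont_use_dcf=False: estimate 24 h
# dont_use_dcf=True: estimate 8 h
```

With the factor enabled, one 24-hour Task is enough to move it to 3, as Brian predicted; with it disabled, the estimate stays pinned at 8 hours no matter how many Tasks complete.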
Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0 |
Brian Nixon wrote: "Getting that fixed will require a project admin’s attention. I’ve mentioned it in the Ralph forum thread where the change was announced." Very good. They did it for a reason, but I hope it can be changed. You have saved us the trouble of asking. Thanks. |
Daedalus Joined: 1 Aug 08 Posts: 39 Credit: 10,103,850 RAC: 425 |
I use a 4-hour target time, and all the WUs show a time to completion of 8 hours, but they complete in 4.5 hours. |
mikey Joined: 5 Jan 06 Posts: 1895 Credit: 9,154,260 RAC: 4,107 |
Daedalus wrote: "I use a 4-hour target time, and all the WUs show a time to completion of 8 hours, but they complete in 4.5 hours." Do you split time with other BOINC projects? |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1677 Credit: 17,754,927 RAC: 22,839 |
From memory, BOINC stopped using DCF a long time ago, when CreditNew came out. Rosetta uses its own Credit mechanism (which I believe reverts to CreditNew under certain circumstances). Brian Nixon wrote: "As far as I can see, all tasks are delivered to all clients (regardless of preferences) with an estimated 80 000 GFLOPs of work to perform and a command-line option to run for 8 hours:" On all other projects, the rsc_fpops_est value is used for Estimated completion time and Credit calculations. Due to the way Rosetta works (a fixed run time, rather than the time needed to finish a given amount of data), it has a modified Credit and Estimated completion time mechanism. Brian Nixon wrote: "As far as I can see, all tasks are delivered to all clients (regardless of preferences) with an estimated 80 000 GFLOPs of work to perform and a command-line option to run for 8 hours:" That `<command_line>… -cpu_run_time 28800 …</command_line>` would explain the fixed completion time estimates. My understanding was that this value was meant to be supplied by the user's Target CPU time setting. The BOINC Manager uses that setting to determine how much work to request, and I would expect the Scheduler makes use of it to determine how much work to actually send out (along with Max tasks per day, etc.). So that value is available to be given to each Task as it is sent out to different hosts. At least the present system does stop most people from getting way more work than they can handle when new work types/applications come out. The only ones now affected will be those with a larger than default Target CPU time, and at least they won't get hugely more than they can process, just a bit more. Grant Darwin NT |
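Grant's suggestion, that the Scheduler pass each host's Target CPU time through to the Task, can be sketched in Python. This is hypothetical: the function `workunit_fields`, and the idea of scaling `rsc_fpops_est` together with `-cpu_run_time`, only illustrate how the initial estimate could track the preference; it is not what Rosetta's server does. The real command line carries other options (the "…" in the quoted XML) that are omitted here, and `APP_FLOPS` is the `flops` value quoted earlier in the thread.

```python
APP_FLOPS = 2_777_777_777.777778  # <flops> of the rosetta app_version quoted in the thread
DEFAULT_TARGET_H = 8


def workunit_fields(target_hours=None):
    """Hypothetical: derive per-host workunit fields from the user's Target CPU time."""
    hours = target_hours or DEFAULT_TARGET_H
    run_time_s = int(hours * 3600)
    return {
        "command_line": f"-cpu_run_time {run_time_s}",
        # Scale the work estimate so that fpops_est / flops equals the target run time.
        "rsc_fpops_est": run_time_s * APP_FLOPS,
    }


for target in (None, 4, 24):
    wu = workunit_fields(target)
    est_h = wu["rsc_fpops_est"] / APP_FLOPS / 3600
    print(wu["command_line"], f"-> initial estimate {est_h:.1f} h")
# -cpu_run_time 28800 -> initial estimate 8.0 h
# -cpu_run_time 14400 -> initial estimate 4.0 h
# -cpu_run_time 86400 -> initial estimate 24.0 h
```

Keeping the two fields in proportion is the design point: whatever run time the Task is told to use, the client's first estimate would already agree with it, so no correction factor would be needed.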
Grant (SSSF) Joined: 28 Mar 20 Posts: 1677 Credit: 17,754,927 RAC: 22,839 |
Daedalus wrote: "I use a 4-hour target time, and all the WUs show a time to completion of 8 hours, but they complete in 4.5 hours." The CPU is doing things other than just crunching BOINC work; that's why there is a discrepancy between CPU time and Run time. Your system: Run time 4 hours 17 min 44 sec, CPU time 3 hours 57 min 49 sec. My lightly used system: Run time 7 hours 55 min 24 sec, CPU time 7 hours 52 min 27 sec. My dedicated cruncher: Run time 7 hours 58 min 13 sec, CPU time 7 hours 57 min 30 sec. Grant Darwin NT |
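Grant's three examples can be summarised as CPU efficiency, the ratio of CPU time to Run time. The Python parser below assumes the "N hours N min N sec" wording shown on the task pages (plus an optional "N days" part); `parse_boinc_duration` is a small illustrative helper, not part of BOINC.

```python
import re

UNITS = {"days": 86400, "hours": 3600, "min": 60, "sec": 1}
PATTERN = re.compile(r"(\d+)\s*(days|hours|min|sec)")


def parse_boinc_duration(text: str) -> int:
    """Parse a reading such as '4 hours 17 min 44 sec' into seconds."""
    return sum(int(n) * UNITS[unit] for n, unit in PATTERN.findall(text))


samples = {
    "Daedalus's system": ("4 hours 17 min 44 sec", "3 hours 57 min 49 sec"),
    "Lightly used system": ("7 hours 55 min 24 sec", "7 hours 52 min 27 sec"),
    "Dedicated cruncher": ("7 hours 58 min 13 sec", "7 hours 57 min 30 sec"),
}

for name, (run, cpu) in samples.items():
    run_s, cpu_s = parse_boinc_duration(run), parse_boinc_duration(cpu)
    print(f"{name}: CPU/Run = {cpu_s / run_s:.1%}")
# Daedalus's system: CPU/Run = 92.3%
# Lightly used system: CPU/Run = 99.4%
# Dedicated cruncher: CPU/Run = 99.9%
```

Losing almost 8% of wall-clock time to other work accounts for roughly 20 minutes of the run on a 4-hour Task.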
scott Joined: 18 Aug 19 Posts: 4 Credit: 2,238,450 RAC: 0 |
Well, after updating to the newest version and changing the time setting in my profile to 8 hours, it still appears to be doing the same thing. After updating, it also downloaded more tasks than it is currently running. I usually have it stop between sets of tasks so they stay on the same schedule, and it never downloaded extra tasks before the update. They aren't any smaller, so I'm not sure why they were downloaded, but the amount is probably based on the incorrect 8-hour estimate. I can get these done before the deadline; I'm just curious why it happened after updating. Picture of current tasks: https://imgur.com/a/7AJutph |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1677 Credit: 17,754,927 RAC: 22,839 |
scott wrote: "Well, after updating to the newest version and changing the time setting in my profile to 8 hours, it still appears to be doing the same thing." Since you haven't changed anything else, of course it is. You have a certain number of cores and threads available, your system is on for a certain number of hours each day, it is able to process BOINC work for a certain percentage of that time, you have your cache set to a certain size, and you have your Resource share between projects set to a certain ratio. All the Manager is doing is meeting your settings. If you only want it to download enough work to process, and no more, then set your cache to zero. The larger your cache, the longer your work takes to process, the more projects you run, the more settings you change, and so on, the longer it will take for your Resource share settings to be honoured. scott wrote: "I also have it set to use 100% of the CPUs 80% of the time, and to stop when non-BOINC usage is over 75%, which doesn't happen too often." That's part of your Task processing time problem. Setting "Use at most xx % of CPU time" to anything less than 100% means Tasks will take much longer than they should. If you have a problem with your system cooling, then limit the number of cores/threads in use, but keep "Use at most xx % of CPU time" at 100%. And I personally don't bother with "Suspend when non-BOINC CPU usage is above --- %" at all. Rosetta (like most other BOINC projects) runs at idle priority. If something else starts running, it gets as much CPU time as it needs, so there is no need to stop BOINC from processing; that happens anyway. And other than badly behaved web page scripts, very few general-use programmes require even that much CPU time. If you have a programme that really needs the CPU time when it's running (e.g. rendering), then you can use the Exclusive applications option in the BOINC Manager to suspend BOINC when that application starts up and resume when it's done. Grant Darwin NT |
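The link between the fixed 8-hour estimate and the extra downloads can be illustrated with a simplified cache-filling calculation in Python. This is not BOINC's actual work-fetch algorithm, which also weighs resource share, on-fraction, active-fraction and other projects; `tasks_to_fill_cache` is hypothetical, and the 1-day cache is an assumed value (12 threads and 80% come from scott's posts).

```python
import math


def tasks_to_fill_cache(cache_days, n_threads, est_task_hours, usable_fraction=1.0):
    """Hypothetical: number of tasks needed to fill a work cache of `cache_days`.

    usable_fraction approximates how much of the time BOINC can actually crunch.
    """
    thread_hours = cache_days * 24 * n_threads * usable_fraction
    return math.ceil(thread_hours / est_task_hours)


# 12 threads, an assumed 1-day cache, BOINC allowed to run 80% of the time.
by_estimate = tasks_to_fill_cache(1, 12, est_task_hours=8, usable_fraction=0.8)
by_reality = tasks_to_fill_cache(1, 12, est_task_hours=24, usable_fraction=0.8)
print(by_estimate, by_reality)  # 29 10
```

Because each Task is believed to need 8 hours while it really runs for 24, the same cache setting pulls in about three times as many Tasks as can be finished in that window.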
Daedalus Joined: 1 Aug 08 Posts: 39 Credit: 10,103,850 RAC: 425 |
Daedalus wrote: "I use a 4-hour target time, and all the WUs show a time to completion of 8 hours, but they complete in 4.5 hours." I used to split time with other projects, but it was too much of a hassle, so my main rig works on folding now. Neither of my two computers is a pure crunching box; I use them for everyday work. I even used to game on my main rig before the COVID crisis. |
©2024 University of Washington
https://www.bakerlab.org