Task swamping on multi-project host: guidance requested

Author	Message
Viktor Joined: 7 Jul 08 Posts: 5 Credit: 3,715,672 RAC: 0	Message 101381 - Posted: 20 Apr 2021, 1:57:04 UTC
Howdy all,
I have a Linux machine running BOINC 24/7. I run Milkyway@home on 1 core/1 GPU, Einstein@home on 1 core/1 GPU, and Rosetta@home on 4 CPU cores. To accomplish this I have my Rosetta app_config set to:
<project_max_concurrent>4</project_max_concurrent>
This works great, except that as soon as I accept tasks Rosetta@home feels the need to give me 1000 tasks which are due in 5 minutes. (Exaggeration, but not by much.) If I turn my cache down to .01 - .01, which seems to be the overall preferred "fix" after much Google action, my GPU projects starve due to lack of cache.
Ideas?
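For readers who have not seen the whole file: the single line quoted above normally sits inside an <app_config> element in a file named app_config.xml in that project's folder. The sketch below is only an illustration of that wrapper. It writes to a scratch directory, because the location of the BOINC data directory is not given in the thread and is treated here as unknown.

```shell
#!/bin/sh
# Sketch only: the file is written to a scratch directory, not to a live
# BOINC project folder (that path varies by install and is not in the thread).
scratch_dir=$(mktemp -d)

# project_max_concurrent caps how many tasks of this project RUN at once.
cat > "$scratch_dir/app_config.xml" <<'EOF'
<app_config>
  <project_max_concurrent>4</project_max_concurrent>
</app_config>
EOF

cat "$scratch_dir/app_config.xml"
```

Note that, going by the experience described in this thread, the setting limits how many tasks run at the same time but did not stop the client from downloading far more work than it could finish.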

Grant (SSSF) Joined: 28 Mar 20 Posts: 1939 Credit: 18,534,891 RAC: 0	Message 101384 - Posted: 20 Apr 2021, 6:56:48 UTC - in response to Message 101381.
> Howdy all,
> I have a Linux machine running BOINC 24/7. I run Milkyway@home on 1 core/1 GPU, Einstein@home on 1 core/1 GPU, and Rosetta@home on 4 CPU cores. To accomplish this I have my Rosetta app_config set to:
> <project_max_concurrent>4</project_max_concurrent>
> This works great, except that as soon as I accept tasks Rosetta@home feels the need to give me 1000 tasks which are due in 5 minutes. (Exaggeration, but not by much.) If I turn my cache down to .01 - .01, which seems to be the overall preferred "fix" after much Google action, my GPU projects starve due to lack of cache.
> Ideas?
Don't use project_max_concurrent.
With the number of cores/threads limited for Rosetta, the system will struggle to do enough work to meet your Resource share settings, as the GPU projects will always outperform CPU-only Rosetta. So in order to do enough Rosetta work to catch up with the GPU projects, it will need to stop doing GPU work. Give Rosetta more cores and threads, and the GPUs can continue to crunch without getting way ahead of Rosetta for work done.
Ideally, use an app_config.xml file to reserve a CPU core/thread to support each of your GPUs (if needed), but allow all projects to use all available CPU cores/threads that aren't being used to support a GPU.
With more than one project, no cache is best, as it will allow your Resource share settings to be met in a matter of days (or weeks) and not months (possibly many months).
As long as the Estimated completion time for any Rosetta Tasks you get is around 8 hours, and Rosetta can use all the available CPU cores/threads (other than the 2 reserved to support the GPUs), with no cache things should settle down within 24 hours.
We did have a batch of work that was erroring out in a matter of seconds, and a couple of other batches that could error out after only an hour or 2, but they have been cleared up, so things should settle down now.
Grant
Darwin NT

Jim1348 Joined: 19 Jan 06 Posts: 881 Credit: 52,257,545 RAC: 0	Message 101394 - Posted: 20 Apr 2021, 14:31:40 UTC - in response to Message 101381.
> This works great, except that as soon as I accept tasks Rosetta@home feels the need to give me 1000 tasks which are due in 5 minutes. (Exaggeration, but not by much.) If I turn my cache down to .01 - .01, which seems to be the overall preferred "fix" after much Google action, my GPU projects starve due to lack of cache.
Recent (in the last couple of years) versions of BOINC have a strange problem due to a change in the scheduler, where they randomly go berserk and download too many work units. I have posted on it in a number of forums. It will eventually correct itself, but in the meantime you can do some of the other fixes.

mikey Joined: 5 Jan 06 Posts: 1900 Credit: 12,902,147 RAC: 1	Message 101407 - Posted: 20 Apr 2021, 21:42:34 UTC - in response to Message 101394.
>> This works great, except that as soon as I accept tasks Rosetta@home feels the need to give me 1000 tasks which are due in 5 minutes. (Exaggeration, but not by much.) If I turn my cache down to .01 - .01, which seems to be the overall preferred "fix" after much Google action, my GPU projects starve due to lack of cache.
> Recent (in the last couple of years) versions of BOINC have a strange problem due to a change in the scheduler, where they randomly go berserk and download too many work units. I have posted on it in a number of forums. It will eventually correct itself, but in the meantime you can do some of the other fixes.
AND it's important to remember that aborting unwanted tasks is an okay thing to do!!! JUST because you got sent a bazillion tasks doesn't mean you have to actually try to finish them; abort the unwanted ones.

Viktor Joined: 7 Jul 08 Posts: 5 Credit: 3,715,672 RAC: 0	Message 101424 - Posted: 21 Apr 2021, 18:20:48 UTC - in response to Message 101407.
Thank you guys for your thoughtful replies. I will tinker with the settings and see if I can get the desired behavior out of my setup. I like the second plan proposed. I do not want my GPUs idle and I need to hold back 2 cores for other non-BOINC work.

Viktor Joined: 7 Jul 08 Posts: 5 Credit: 3,715,672 RAC: 0	Message 101445 - Posted: 22 Apr 2021, 13:59:28 UTC - in response to Message 101424.
Thanks again to all who helped. Asking for aid and then not giving updates is a dickish move.... thus:
* Gutted all my controls via app_config on projects
* Changed my prefs to use a max of 75% of cores
* Kept my cc_config GPU exclusions to force certain GPU apps onto certain GPUs
* Verified .5 day cache with .01 additional
Updated all projects and kickstarted it. Rosetta took 5 cores, GPU projects 1 each, 2 total.
* Changed my prefs to use a max of 74% of cores because inclusive programmer math. Oops.
Updated all projects and kickstarted it. Rosetta took 4 cores, GPU projects 1 each. Rosetta has 4 tasks waiting in reserve, which is perfect.
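The preference values listed above could be expressed as a local override file. The post does not say whether the settings were made on the project website or locally, so treat this as one possible way to do it, sketched under that assumption. The element names are the standard BOINC global preference names; the scratch directory and the shell variable names are purely illustrative.

```shell
#!/bin/sh
# Sketch only: builds the file in a scratch directory for inspection.
scratch_dir=$(mktemp -d)

cpu_pct=74        # "use at most N% of the CPUs" (second attempt in the post)
cache_days=0.5    # "store at least" this many days of work
extra_days=0.01   # "store up to an additional" this many days of work

cat > "$scratch_dir/global_prefs_override.xml" <<EOF
<global_preferences>
  <max_ncpus_pct>$cpu_pct</max_ncpus_pct>
  <work_buf_min_days>$cache_days</work_buf_min_days>
  <work_buf_additional_days>$extra_days</work_buf_additional_days>
</global_preferences>
EOF

cat "$scratch_dir/global_prefs_override.xml"

# On a live host the file would live in the BOINC data directory and could
# be reloaded without a restart (not executed here):
#   boinccmd --read_global_prefs_override
```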

Viktor Joined: 7 Jul 08 Posts: 5 Credit: 3,715,672 RAC: 0	Message 101513 - Posted: 26 Apr 2021, 0:13:58 UTC - in response to Message 101446.
> Please don't be so vague, undoing app_config settings is not trivial.
Sure thing and warning heeded. I was checking project status and found a private message from a user in 2020 offering help with the number of errors my client was throwing. I did a deep dive into how my Rosetta progress was going and noticed the flood of tasks, etc. mentioned in my initial post. I disallowed new Rosetta tasks a week ago and let the existing ones run through to avoid giving the project any headaches. After I had no tasks left I posted on the forum and received help.
Per recommendations I removed the 1 line present in my app_config, which was there to limit the concurrent tasks. As BOINC does not like a blank app_config, I deleted it from all projects, as I had only created them to help balance Rosetta vs the GPU projects. I issued the command to update the projects via "boinccmd --project (url of project) update". I restarted the BOINC service, which was when I ran into the 6 vs 7 problem. See below.
> At that point the event log will show you how many CPUs that translates to. Likely the correct six. I've seen BOINC schedule one CPU more than configured when in panic mode but that shouldn't be the case here with only 10 tasks in progress and nearly full time left. Is that what the Manager showed you? Again, that may not be reality. Without different configuration I'd expect 1 core total scheduled for the GPU tasks and the remaining 5 of 6 for CPU tasks. Real usage will rather have been 2+5, more than you wanted. But if the Manager displayed just that in this case it was coincidence.
I agree that in theory BOINC with 75% volunteered on an 8 core CPU should = 6 cores. With that allocation Rosetta wanted to run 5 processes and my other two GPU projects wanted to run 2 total, resulting in 7 total used. 6 =/= 7.
My amateur assumption was that I had run into a "counts from 0" issue. My solution was to volunteer 74%, which is confirmed as 5 cores via journalctl. 74% "cpus" volunteered on an 8 core is 5.92, so it makes no sense to me that this would produce my desired result:
    viktor@bender:~$ ps -u boinc
        PID TTY          TIME CMD
      84619 ?        00:00:16 boinc
      84673 ?        00:11:47 rosetta_4.20_x8
      84676 ?        00:11:42 rosetta_4.20_x8
      84678 ?        00:11:37 rosetta_4.20_x8
      84681 ?        00:11:32 rosetta_4.20_x8
      84746 ?        00:09:00 hsgamma_FGRPB1G
      84812 ?        00:00:40 milkyway_1.46_x
with BOINC reporting:
    max CPUs used: 5
As to what the event manager thinks, I can't help you. I could try to fire up a GUI, but I can gather what info I need from logs/ps/nvidia-smi/etc.
> Either set 75% and configure the GPU projects to schedule 1 CPU and 1 GPU per task. That way up to 2 CPUs will be scheduled for (usually) 2 GPU tasks and the remaining 4-6 for CPU tasks.
Ok, so it sounds like regardless of my current real life situation being what I am looking for, I came to it via an incorrect way. I am 100% down to keep working until it is done right. I will work with cpu_usage in the gpu_versions section for the GPU projects.
I know I sound like a broken record, but thanks. The replies take me ~15 minutes to type out, and those who are providing aid are doing so of their own free will. Much easier to click "next thread".
I will report back when I bump my GPU apps to cpu_usage of 1 and see if Rosetta takes the other 4 seats.
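The 75% versus 74% result is what you would get if the client simply drops the fractional part when turning the percentage into a CPU count. That truncation is an assumption here (the thread does not quote the client's code), but it matches the "max CPUs used: 5" log line reported for 74% on 8 cores. The helper name usable_cpus is made up for this illustration.

```shell
#!/bin/sh
# usable_cpus CORES PERCENT
# Prints the whole number of CPUs left after truncating the fraction.
# Truncation is assumed; it agrees with the log line quoted in the post.
usable_cpus() {
    echo $(( $1 * $2 / 100 ))
}

usable_cpus 8 75   # 8 * 0.75 = 6.00 -> prints 6
usable_cpus 8 74   # 8 * 0.74 = 5.92 -> prints 5
```

So there is no off-by-one in the percentage itself: 75% really is 6 CPUs, and dropping to 74% only worked by shrinking the budget to 5.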

Viktor Joined: 7 Jul 08 Posts: 5 Credit: 3,715,672 RAC: 0	Message 101515 - Posted: 26 Apr 2021, 0:43:44 UTC - in response to Message 101513. Last modified: 26 Apr 2021, 0:54:49 UTC
>> Either set 75% and configure the GPU projects to schedule 1 CPU and 1 GPU per task. That way up to 2 CPUs will be scheduled for (usually) 2 GPU tasks and the remaining 4-6 for CPU tasks.
> Ok, so it sounds like regardless of my current real life situation being what I am looking for, I came to it via an incorrect way. I am 100% down to keep working until it is done right. I will work with cpu_usage in the gpu_versions section for the GPU projects.
> I know I sound like a broken record, but thanks. The replies take me ~15 minutes to type out, and those who are providing aid are doing so of their own free will. Much easier to click "next thread".
> I will report back when I bump my GPU apps to cpu_usage of 1 and see if Rosetta takes the other 4 seats.
Well, that did it. Forcing the GPU projects to eat 1 core each, 2 total, and volunteering 75% of cores has resulted in 4 Rosetta tasks, 1 Milkyway, 1 Einstein. In hindsight this makes sense: if the GPU projects were each being budgeted only a fraction of a CPU core, the math works out.
Will report back in a week or so.
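A minimal sketch of the kind of app_config.xml this describes, one per GPU project, followed by the core arithmetic for the 8-core host. The application name example_gpu_app is a placeholder, not a real app name: the exact names used by Milkyway@home and Einstein@home are not given in the thread. The file is written to a scratch directory rather than a live project folder.

```shell
#!/bin/sh
# Sketch only: writes to a scratch directory, not a live BOINC project folder.
scratch_dir=$(mktemp -d)

# Placeholder: replace with the project's real application name.
app_name="example_gpu_app"

# Budget one full CPU core and one full GPU for each task of this app.
cat > "$scratch_dir/app_config.xml" <<EOF
<app_config>
  <app>
    <name>$app_name</name>
    <gpu_versions>
      <gpu_usage>1</gpu_usage>
      <cpu_usage>1</cpu_usage>
    </gpu_versions>
  </app>
</app_config>
EOF

cat "$scratch_dir/app_config.xml"

# Core budget for the host in the thread (8 cores, 75% volunteered):
cpu_budget=$(( 8 * 75 / 100 ))   # 6 CPUs available to BOINC
gpu_reserved=$(( 2 * 1 ))        # 2 GPU tasks, 1 CPU budgeted for each
echo "CPU tasks: $(( cpu_budget - gpu_reserved ))"   # prints "CPU tasks: 4"
```

With a whole core budgeted per GPU task, the 6 volunteered CPUs split into 2 for the GPU tasks and 4 for Rosetta, which is the 4 + 1 + 1 layout reported in the post.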