Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1751 Credit: 18,534,891 RAC: 857 |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1751 Credit: 18,534,891 RAC: 857 |
Nope. When running more than one project, no cache is best. Less chance of deadline issues. No idea what the defaults are, but it certainly isn't those values- if it were, people wouldn't have nearly as many problems as they do.

If you are running Windows 7 or later, you can just use its Memory Diagnostic tool. It doesn't work the memory as hard, so it doesn't take as long, but it will still show up dodgy memory. If it's only borderline it may not pick up a problem; in that case use the F1 key to change the default test options, which will take longer.

"Run Memtest on the system to see if there is an issue with the memory."
"I'm running memtester on 1GB since yesterday. I don't think it covers much but it's a start, I guess. But you're saying because I didn't do parts 2 to 4 it's bad for the project?"

Sorry, but I've got no idea what it is you're asking about there. No, just that you should find out what else is chewing up your CPU time. Also check your completed Valid Tasks and compare the Run time to the CPU time- if there's more than a few minutes difference, it means you're using your system a bit. If there's 30min or so then you're using it a lot. Taking 3hrs 10min to do 3hrs of work isn't an issue. But if you're taking 9hrs+ to do only 3hrs worth of work, it's really something you should look into.

Jean-David Beyer wrote: "it was pretty easy to find which module it was"
"You mean by turning the computer off, pulling a module, turning the computer on, turning it off again etc.?"

Nope. Turn the computer off, remove all but one module, then power back up & test that module. If it's faulty- job done. If not, power down, pull that module, fit another one. Test again. Etc, etc. On systems with huge amounts of RAM & multiple modules, testing one at a time lets you do other things in between testing modules if you have to- otherwise all you can do is start the test & then wait for it to finish, hours (or days) later.
Grant Darwin NT |
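To make the Run time vs CPU time check above concrete, here is an illustrative helper (not part of BOINC; the function name and thresholds are my own, taken loosely from the rough figures in the post):

```python
# Classify how heavily a machine is used by non-BOINC work, based on the gap
# between a task's wall-clock Run time and its CPU time (both in hours).
# Thresholds are rough, matching the post's "a few minutes" / "30 min or so".
def usage_from_task(run_time_h, cpu_time_h):
    gap = run_time_h - cpu_time_h
    if gap < 0.25:       # only a few minutes' difference
        return "lightly used"
    if gap < 1.0:        # around 30 min difference
        return "used a fair bit"
    return "heavily used - worth looking into"

print(usage_from_task(3 + 10 / 60, 3.0))  # 3 hrs 10 min for 3 hrs of work -> lightly used
print(usage_from_task(9.0, 3.0))          # 9 hrs for 3 hrs of work -> heavily used
```

The same check works from the task list on the website: both times are shown per task, so the gap is just a subtraction.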
Jean-David Beyer Send message Joined: 2 Nov 05 Posts: 201 Credit: 6,765,644 RAC: 6,588 |
Jean-David Beyer wrote: I could do it a little more efficiently than that. It ran with 4 modules for many months. The problem occurred when I added 4 new modules. So I took out all 4 new modules and the problem went away. I put in two of the new modules and still no problems. I moved those two new modules to the other two memory slots (was it a slot problem or a module problem?) and it still worked. So I put another new module in and it still worked, so it was probably the last new module. And so it proved. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 2025 Credit: 9,943,884 RAC: 6,777 |
Some daemons are down...so, no validation |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2183 Credit: 41,726,991 RAC: 6,784 |
Hello. I'm back :) Now, where were we...

"I have to tell you, I'm absolutely amazed that you think Boinc scheduling being wrong by 50% one way or 2-300% the other way for the bulk of the time a task is processing - and 100% of the time it's sitting waiting in the cache - is no kind of problem"
"And I am absolutely amazed & astounded you would think something that at no stage have I ever said or suggested."

Well, it started when I talked about reducing target runtime from 12 to 8hrs, thereby reducing wallclock time of running tasks by 7-11hrs each and reducing wallclock time of all cached tasks by 7-11hrs each as well, and you said "all it would do is reduce tasks tripping into panic mode", as if that wasn't the pragmatic solution. Taking 14-22hrs out of runtime goes a long way - in all likelihood all the way - to prevent deadlines being missed and panic mode being tripped without changing anything else, while running all the projects the user wanted to run. By saying "all it would do..." you're saying it's not a solution, when it might be the entire solution to missed deadlines and panic mode.

It is a problem for Scheduling: the entire problem is one of scheduling and the failure to meet deadlines. Everything else you talk about is purely academic and entirely irrelevant to the user if deadlines are met and all CPU time is maximised for projects important to the user. Which they are. It's an issue for the computer to solve, which it's perfectly capable of, and which it will always do for itself better than any attempt to micromanage it with other settings. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2183 Credit: 41,726,991 RAC: 6,784 |
"But for anyone else that's been reading these posts..."

Well, that's obviously not true. If you limit the number of cores available to Boinc, your Boinc processing will be limited to the maximum number of cores you've allocated, while your unallocated cores won't be used for Boinc and may or may not be fully utilised, depending on what else is going on.

Use all your cores all the time. Your computer will decide millions of times per second what it should do with its capability better than any human ever will. Do not listen to the man behind the curtain. |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2183 Credit: 41,726,991 RAC: 6,784 |
Some daemons are down... so, no validation They were, probably for about 16hrs today. I think it's now fixed. 175k backlog when I looked earlier, now below 100k. Edit: Server status page says there's still a 96k backlog, but it's not a live figure. Checking all my hosts, all tasks have been validated for each of them. |
Bryn Mawr Send message Joined: 26 Dec 18 Posts: 411 Credit: 12,359,416 RAC: 3,742 |
"But for anyone else that's been reading these posts..."

One thing I've noticed is that my Ryzens appear to be power limited. The TDP is 65W and the PPT comes out at 88W, so with all 24 cores running each core is getting about 3.67W, but with only 20 cores running each core gets about 4.4W, and the power draw is still 88W with the cores running at a higher frequency.

Also, running 23 cores allows the OS to have its two pennies' worth without having to swap out the data for a running WU and then swap it back in again; the WUs can run closer to 100% than the 97/98% they get when running 24 cores.

That being said, I always run at 24 cores and let the computer sort itself out. |
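The per-core numbers above follow directly from dividing the package power limit by the number of active cores. A quick sketch, assuming (as the post does) that the 88W package power figure is the binding limit rather than any per-core cap:

```python
# Per-core power budget when a CPU is limited by its package power (PPT).
# 88 W is the figure quoted in the post for a 65 W TDP Ryzen part.
PPT_WATTS = 88.0

for active_cores in (24, 23, 20):
    per_core = PPT_WATTS / active_cores
    print(f"{active_cores} active cores -> {per_core:.2f} W per core")
```

Fewer active cores means more watts per core, which is why the remaining cores can clock higher at the same total draw.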
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1751 Credit: 18,534,891 RAC: 857 |
"Taking 14-22hrs out of runtime goes a long way - in all likelihood all the way - to prevent deadlines being missed and panic mode being tripped without changing anything else, while running all the projects the user wanted to run."

Which is exactly what my advice does- it reduces the Runtime for each and every Task, for all projects that the person does. It doesn't just do it for one Project, but for all of them. As I said before & I will say again- your suggestion addresses the symptom, mine fixes what is actually causing the problem. Most people prefer to fix the problem. If you're OK with just fixing the symptom, then so be it.

"But for anyone else that's been reading these posts... Well, that's obviously not true."

It's obviously true if you actually understand what is going on. If you don't understand, then it's not going to be obvious.

One final attempt to point out the obvious- people doing GPU processing have known this for over a decade. If a GPU application requires a CPU core/thread to keep it fed, then by giving up the output of that core/thread (running the CPU application) to support each Task running on the GPU, you can get 10-20 times more work done from the GPU, and you don't reduce your CPU output by a large amount, because the tasks aren't all fighting for CPU processing time that just isn't available. If you don't reserve that core/thread, your GPU output is way less than it could be, and your CPU output takes a massive dive as well.

For the CPU- needing only 3 hours to do 3 hours worth of work means you will do way more work than if it takes you 12 hours to do 3 hours worth of work- it's that simple. By losing the output of 1 thread, you end up doing 4 times the amount of work on each of the remaining cores/threads.

So the amount of work done each day by not using all the cores/threads is many, many times greater than the amount of work done if you try to use all cores/threads for BOINC work on a system that is also doing large amounts of other CPU intensive work. So the statement you claim is not true, is true & factual, as evidenced by the output of thousands of computers over many years of crunching.
Grant Darwin NT |
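The throughput argument above can be put as a back-of-the-envelope calculation. This is a hedged sketch using the thread's own numbers (3 hours of CPU work per task, a 4x wall-clock slowdown when over-committed, and an assumed 8-thread machine for illustration):

```python
# Daily BOINC task throughput: all threads crunching on an over-committed
# box, vs reserving one thread for the competing non-BOINC workload.
THREADS = 8               # assumed core/thread count, illustration only
CPU_HOURS_PER_TASK = 3    # from the thread's example
SLOWDOWN = 4              # 12 h wall time for 3 h of CPU work
HOURS_PER_DAY = 24

# Every thread runs BOINC, but each task takes 4x as long in wall time
overcommitted = THREADS * HOURS_PER_DAY / (SLOWDOWN * CPU_HOURS_PER_TASK)

# One thread reserved for the other workload; the rest run at full speed
reserved = (THREADS - 1) * HOURS_PER_DAY / CPU_HOURS_PER_TASK

print(overcommitted)  # 16.0 tasks/day
print(reserved)       # 56.0 tasks/day
```

Under these assumptions, giving up one thread's output more than triples daily throughput, which is the point being made.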
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1751 Credit: 18,534,891 RAC: 857 |
"Also, running 23 cores allows the OS to have its two pennies' worth without having to swap out the data for a running WU and then swap it back in again; the WUs can run closer to 100% than the 97/98% they get when running 24 cores."

Unfortunately this will just muddy the waters for those that don't understand the issue of an over-committed system. What I've been talking about is systems that are doing a lot of non-BOINC work while BOINC is running. Hence the massive difference between Run time & CPU time- for the person that started all this with Denis it was 4 times as long. 4 times...

For a system that is lightly used, or just a dedicated cruncher, using all cores & threads all the time will result in the greatest amount of work being done each day. But if it's heavily used for other things, reserving a thread or 2 will result in a massive increase in output of BOINC work, even with the loss of BOINC output from that thread/those threads.
Grant Darwin NT |
Bill F Send message Joined: 29 Jan 08 Posts: 49 Credit: 1,656,004 RAC: 1,491 |
You two guys are having too much fun to be doing this by yourselves. Since we have lots of users with different configurations, I figured that I would help muddy the waters a little more. If you are running Windows 10 or newer, some GPUs can do more of their own scheduling without as much CPU involvement... please see below; this is mostly old news.

-----------------------------------

On Windows 10 version 2004, Hardware-accelerated GPU scheduling was added if your video card supported it and you had NVIDIA version 451.48 or AMD version 20.5.1 Beta (or newer) installed. The theory being: offload from the CPU the GPU scheduling that the GPU can do for itself. You can Google "Hardware accelerated GPU scheduling"

https://www.howtogeek.com/756935/how-to-enable-hardware-accelerated-gpu-scheduling-in-windows-11/

Have Fun
Bill F |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1751 Credit: 18,534,891 RAC: 857 |
"You two guys are having too much fun to be doing this by yourselves. Since we have lots of users with different configurations, I figured that I would help muddy the waters a little more."

Yep, bringing up something that is only tangentially relevant to the discussion at best certainly doesn't help in the slightest. Here we are talking about scheduling work between different BOINC applications & sharing time with non-BOINC applications. The link you posted is about Operating System scheduling. The first line of that article-

"Windows 10 and Windows 11 come with an advanced setting, called Hardware-Accelerated GPU Scheduling, which can boost gaming and video performance using your PC's GPU."

It's there to boost your video card's performance. If you're playing a CPU-limited game while running BOINC work in the background, it might provide some very slight benefit, if any. Limiting the number of cores/threads BOINC can use so it doesn't compete with the game would provide much, much more benefit. Using the BOINC settings to suspend BOINC while gaming would be better still.

The whole idea behind BOINC was to make use of unused CPU resources, not to try to use them even while they're being used heavily by other applications. As I have been pointing out over & over again in this discussion, trying to do so results in BOINC not actually getting much work done. And as any gamer would tell you, you don't want anything else running in the background while gaming, as it will impact your gaming experience (no matter how many cores/threads you have, although it probably wouldn't be that much of an issue with Threadrippers & greater; but then people get worked up over the difference between 200 frames per second and 203 frames per second...).
Grant Darwin NT |
Sid Celery Send message Joined: 11 Feb 08 Posts: 2183 Credit: 41,726,991 RAC: 6,784 |
"Taking 14-22hrs out of runtime goes a long way - in all likelihood all the way - to prevent deadlines being missed and panic mode being tripped without changing anything else, while running all the projects the user wanted to run."
"Which is exactly what my advice does- it reduces the Runtime for each and every Task, for all projects that the person does. It doesn't just do it for one Project, but for all of them."

In this case, you're assuming the problem when there's no evidence of it being the one you describe based on the symptom. A while back you rightly pointed out the symptom was one of scheduling. Solve the scheduling issue where Rosetta knowingly misleads Boinc, in the way I described - the end.

If you were to take a look at adrianxw's Rosetta tasks (where tbf he only seems to be running Rosetta tasks atm, which mislead Boinc in the other direction), you'd see no deadlines are being missed any more, so Panic mode won't arise, let alone deadlines be missed; nor will they be if his tasks are all Beta ones.

You also ignore the fact that, with all cores usable by Boinc tasks, the Folding tasks are <additional> to those tasks. I don't know how many Folding tasks run at a time - I assume it's one - so the inefficiency you see in Boinc tasks is entirely taken up by a 9th task running at normal priority on an 8-core machine. How does that pan out? Neither of us knows for sure, but I'm going to suggest that almost all of that "inefficiency" disappears with the processing of a 9th task on an 8-core machine.

"It's obviously true if you actually understand what is going on."
"But for anyone else that's been reading these posts... Well, that's obviously not true."

Having cut off quoting the critical part of what I wrote about how the cores unutilised by Boinc are made use of, this isn't a statement on what I wrote but on what I explicitly didn't write, so worthless.

"One final attempt to point out the obvious-"

On a different situation no-one's talking about... irrelevant. When someone raises that as their issue, bring it up again. Then it might have a point. |
[VENETO] boboviz Send message Joined: 1 Dec 05 Posts: 2025 Credit: 9,943,884 RAC: 6,777 |
Still the same error, again and again:

ERROR: Error in protocols::cyclic_peptide_predict::SimpleCycpepPredictApplication::set_up_n_to_c_cyclization_mover() function: residue 1 does not have a LOWER_CONNECT. |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1751 Credit: 18,534,891 RAC: 857 |
...time to end it. I have tried my best to help you understand, but every point you make shows that you still don't understand what is happening, so it really is time for me to give up once and for all. Grant Darwin NT |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1751 Credit: 18,534,891 RAC: 857 |
Server Status showing all green, but there's a backlog of 12,640 Tasks waiting on Validation at present. Grant Darwin NT |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1751 Credit: 18,534,891 RAC: 857 |
"Server Status showing all green, but there's a backlog of 12,640 Tasks waiting on Validation at present."

Now up to 24,128, and the Server Status is showing several processes on boinc-process not running. Seems to be nothing but recurring issues with that server lately.
Grant Darwin NT |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1751 Credit: 18,534,891 RAC: 857 |
"Server Status showing all green, but there's a backlog of 12,640 Tasks waiting on Validation at present."
"Now up to 24,128, and the Server Status showing several processes on boinc-process not running."

Now all processes on boinc-process are down, and Waiting for Validation is up to 35,496. Maybe it's gone down in sympathy with the ralph server over on Ralph. It's been down for 4-5 days now.
Grant Darwin NT |
kotenok2000 Send message Joined: 22 Feb 11 Posts: 276 Credit: 513,050 RAC: 161 |
Everything is running as of 5 Jun 2024, 10:16:46 UTC |
Grant (SSSF) Send message Joined: 28 Mar 20 Posts: 1751 Credit: 18,534,891 RAC: 857 |
"Everything is running as of 5 Jun 2024, 10:16:46 UTC"

10 minutes earlier everything on boinc-processes was dead. And the same with the ralph server at Ralph- it's showing life again as well.

BTW- check the date/time stamp- that's for the Task application data. The server status data is this one- "Remote daemon status as of 5 Jun 2024, 10:45:06 UTC". It would be good if these things were updated more often.
Grant Darwin NT |
kotenok2000 Send message Joined: 22 Feb 11 Posts: 276 Credit: 513,050 RAC: 161 |
They probably rebooted it. |
©2025 University of Washington
https://www.bakerlab.org