Problems and Technical Issues with Rosetta@home

Message boards : Number crunching : Problems and Technical Issues with Rosetta@home

kotenok2000
Joined: 22 Feb 11
Posts: 276
Credit: 523,512
RAC: 610
Message 109147 - Posted: 22 Apr 2024, 21:37:26 UTC - in response to Message 109145.  

I hope they will still get points when script runs, because each task would still generate unique data.
ID: 109147
Sid Celery
Joined: 11 Feb 08
Posts: 2198
Credit: 41,930,465
RAC: 17,266
Message 109148 - Posted: 22 Apr 2024, 22:01:50 UTC - in response to Message 109134.  

I'd like to comment.

I see a problem, a problem that I should not be seeing. I try to make headway to resolve it, so ask. The result of asking each time is the same, basically, the BOINC folk tell me the problem is Folding, the Folding folk tell me it is not.

I have set no new tasks at both. I would seem to face a choice, I can support one or the other. Both are important to me.

I understand the issue better now.
Irrespective of fault, it seems like all Boinc projects are having problems coexisting with Folding@home, evidenced by Grant's comment
And the same issue is happening with your other projects.
Asteroids - 2hrs runtime, 1hr CPU time.
SiDock - 31.5hrs runtime, 27hrs 40min CPU time.
Denis - 3hrs 40min runtime, 1hr CPU time.

This is only a problem to the extent that tasks miss deadlines, which is what you have, so check these settings in turn:

1. Ensure "Use at most xx % of CPU time" is set to 100% for all Boinc tasks.
2. You may think Rosetta is set to 8hrs, but every one of your tasks runs to 43,200secs of CPU time, which is 12hrs. So go to your account online and within rosetta@home preferences reaffirm "Target CPU run time" is set explicitly to 8hrs and Update Preferences. Rosetta certainly thinks it's set to 12hrs.
3. If you still can't complete tasks within the deadline, reduce your cache size in Boinc, so you don't download too many tasks to complete before deadline.
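For anyone who prefers a file to the GUI, points 1 and 3 can also be set in BOINC's global_prefs_override.xml in the data directory (a hedged sketch - the values are illustrative, and the client needs to be told to re-read its preferences before they take effect):

```xml
<!-- global_prefs_override.xml: local overrides for the web preferences.
     Illustrative values only - adjust to taste. -->
<global_preferences>
    <cpu_usage_limit>100</cpu_usage_limit>            <!-- point 1: 100% of CPU time -->
    <work_buf_min_days>0.5</work_buf_min_days>        <!-- point 3: smaller cache -->
    <work_buf_additional_days>0.1</work_buf_additional_days>
</global_preferences>
```

Note point 2 (Rosetta's "Target CPU run time") is a project web preference, so it can only be changed in your rosetta@home account online.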

I think Point 2 will be the solution.
Rosetta is a bit weird when non-default runtimes are set.
Tasks are all downloaded as if they're 8hr tasks, but only when a task gets close to that runtime does Boinc adjust the remaining time up toward 12hrs.
So tasks run 4hrs longer, yet Boinc still projects the size of the rest of the cache as if each task will be 8hrs again.
It's been programmed to <not> adjust based on past history. I forget why, but I do recall when it was deliberately made to work that way.
ID: 109148
Sid Celery
Joined: 11 Feb 08
Posts: 2198
Credit: 41,930,465
RAC: 17,266
Message 109149 - Posted: 22 Apr 2024, 22:30:45 UTC - in response to Message 109133.  

This will solve the <entirety> of your problems, while (coincidentally) massively increasing your contribution to <all> the projects you run within your preferred settings.
He's running Folding@home as well.
He asked about this issue 4 years ago and ignored all advice on how to fix it. He asked about it again a month or so back, and once again refused to take any advice on how to resolve it.
He just likes to whinge about things he's not prepared to do anything about - i.e. look in Task Manager to see exactly which processes are using CPU time, then limit the number of cores/threads BOINC can use so it's not impacted by those used by Folding.

Ta, I didn't pick up the Folding@home involvement - that explains part of it.
But I do think it's the Target CPU time aspect that's tipping things over the edge - partly because I'm set to 12hr tasks too and it is a bit weird, but I run a small enough cache and only two projects so it never affects me.
The part about using 12hr tasks not changing the projected runtime of the rest of the Rosetta cache is something that was brought in... about 4 years ago.
I'm pretty sure that's not a coincidence.

Which means the tipping point <is> a Rosetta issue after all
ID: 109149
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1758
Credit: 18,534,891
RAC: 388
Message 109150 - Posted: 23 Apr 2024, 6:59:02 UTC - in response to Message 109134.  
Last modified: 23 Apr 2024, 7:13:33 UTC

I'd like to comment.

I see a problem, a problem that I should not be seeing. I try to make headway to resolve it, so ask.
No you don't, you just ignore what you are told as to how to fix it. Twice now.


The result of asking each time is the same, basically, the BOINC folk tell me the problem is Folding, the Folding folk tell me it is not.
And since it is occurring with a BOINC project- actually all of your BOINC projects, not just this one- might it be somewhat obvious that those of us here doing BOINC work might have some idea of what is actually going on? While those at Folding- unless they do BOINC work as well- won't have the slightest idea of what you are complaining to them about?
And if you had paid the slightest bit of attention to the responses I gave you previously, you would understand what the problem is & how to fix it.


I have set no new tasks at both. I would seem to face a choice, I can support one or the other. Both are important to me.
The third option would be to fix it so that both can co-exist, hundreds (if not a thousand +) of other people have done so.

Twice I have told you what the problem is. Twice I have told you how you could fix it.
And twice you have completely ignored everything you were told that would allow you to sort it out.

So, yeah, not doing either of them is probably the best option for you.
Grant
Darwin NT
ID: 109150
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1758
Credit: 18,534,891
RAC: 388
Message 109152 - Posted: 23 Apr 2024, 7:23:57 UTC - in response to Message 109149.  

Which means the tipping point <is> a Rosetta issue after all
Nope.
If it took 12 hours to do 12 hours of work, there'd be no problem.
But because it takes 24hrs to do 12hrs work, it's a big problem. Even set to 8 hours, it would still take 16hrs, so still Panic mode.
Make it so the CPU isn't over committed, and all would be OK.

His problem is purely down to it taking 2-4 times longer than it should to process any BOINC Tasks, because the CPU is also processing Folding work on the same CPU cores/threads- X cores/threads trying to process X+1 or X+2 applications (that are using 100% of each core/thread) is always going to cause problems. As long as the number of applications being run is equal to or less than the number of cores/threads, all will be well- so limiting the number of cores/threads available to BOINC so Folding has as many as it needs (1, 2, 4 or however many that is) would sort it out.

Of course if "Use at most xx % of CPU time" is anything other than 100%, that would just add to the issues of doing Folding on the same cores/threads as BOINC work (as would any GPU Tasks from BOINC projects that require 1 core/thread per GPU Task being run to support it, and that too can be resolved, although it's more difficult than it needs to be).
Grant
Darwin NT
ID: 109152
robertmiles
Joined: 16 Jun 08
Posts: 1235
Credit: 14,372,156
RAC: 1,319
Message 109153 - Posted: 23 Apr 2024, 12:29:01 UTC

I remember from when I was running Folding@Home also that Folding@Home expects to use entire CPU cores, not just the available threads in that CPU core. An easy way to handle this is to start the Folding@Home program at least a full minute before starting any BOINC program.
ID: 109153
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2030
Credit: 10,081,714
RAC: 12,131
Message 109154 - Posted: 23 Apr 2024, 12:39:09 UTC - in response to Message 109145.  

I've got 15 tasks returned after deadline and they've all validated and credited.
I have a further 6 awaiting validation.

Just checking further, the tasks I returned after deadline have been reissued to other users 10 minutes before I returned them.
One of the reissues has been cancelled by the Server. The others haven't.


Good for you.
I have a lot of "cancelled by the server"
ID: 109154
rilian
Joined: 16 Jun 07
Posts: 28
Credit: 3,344,955
RAC: 13,924
Message 109155 - Posted: 23 Apr 2024, 16:01:27 UTC - in response to Message 109154.  
Last modified: 23 Apr 2024, 16:01:53 UTC


I have a lot of "cancelled by the server"


Same here, I lost a hundred tasks :(
i crunch for Ukraine. Join our team forums about Rosetta@home
ID: 109155
Jean-David Beyer
Joined: 2 Nov 05
Posts: 202
Credit: 6,883,028
RAC: 10,853
Message 109157 - Posted: 23 Apr 2024, 17:36:12 UTC - in response to Message 109154.  

I've got 15 tasks returned after deadline and they've all validated and credited.
I have a further 6 awaiting validation.

Just checking further, the tasks I returned after deadline have been reissued to other users 10 minutes before I returned them.
One of the reissues has been cancelled by the Server. The others haven't.


Good for you.
I have a lot of "cancelled by the server"


So do I (although mostly they run OK). There seems to be something wrong with the server. It sends out a task, and before the first host returns its result or times out, it sends the same one to me. Then the first user returns the result, and mine gets cancelled. Just plain sloppy.
ID: 109157
Sid Celery
Joined: 11 Feb 08
Posts: 2198
Credit: 41,930,465
RAC: 17,266
Message 109159 - Posted: 24 Apr 2024, 0:31:42 UTC - in response to Message 109152.  

Which means the tipping point <is> a Rosetta issue after all
Nope.
If it took 12 hours to do 12 hours of work, there'd be no problem.
But because it takes 24hrs to do 12hrs work, it's a big problem. Even set to 8 hours, it would still take 16hrs, so still Panic mode.
Make it so the CPU isn't over committed, and all would be OK.

His problem is purely down to it taking 2-4 times longer than it should to process any BOINC Tasks, because the CPU is also processing Folding work on the same CPU cores/threads- X cores/threads trying to process X+1 or X+2 applications (that are using 100% of each core/thread) is always going to cause problems. As long as the number of applications being run is equal to or less than the number of cores/threads, all will be well- so limiting the number of cores/threads available to BOINC so Folding has as many as it needs (1, 2, 4 or however many that is) would sort it out.

Of course if "Use at most xx % of CPU time" is anything other than 100%, that would just add to the issues of doing Folding on the same cores/threads as BOINC work (as would any GPU Tasks from BOINC projects that require 1 core/thread per GPU Task being run to support it, and that too can be resolved, although it's more difficult than it needs to be).

I don't completely agree.
It's not just that a 12hr task (which Rosetta shows to Boinc as 8hrs for the bulk of its run) is taking 20-32hrs to complete; it's that the next tasks in the cache also show as 8hrs to Boinc but will take 20-32hrs too.
Changing the target runtime back to 8hrs, even with the folding@home contention, will take 7-11hrs out of the running tasks and a further 7-11hrs out of the cached tasks.
14-22hrs less processing time to complete tasks will make a huge difference to whether Panic mode arises. I'd guess <all> the difference.
This is only an issue if the cache is set above a day. It can be made to work by ensuring Rosetta tasks only run for the time Adrian already thought they were set to (8hrs rather than the 12hrs they actually run for).

It can certainly be solved your way, but that gets a bit fiddly imo and doesn't resolve the confusion Rosetta runtime introduces.
I'd rather my solution if I were him too, especially if RAM and disk space don't come into the equation.
And we already know Adrian didn't like your solution, so let's see what he thinks of my alternative. It's entirely up to him.
ID: 109159
Sid Celery
Joined: 11 Feb 08
Posts: 2198
Credit: 41,930,465
RAC: 17,266
Message 109160 - Posted: 24 Apr 2024, 0:42:12 UTC - in response to Message 109154.  

I've got 15 tasks returned after deadline and they've all validated and credited.
I have a further 6 awaiting validation.

Just checking further, the tasks I returned after deadline have been reissued to other users 10 minutes before I returned them.
One of the reissues has been cancelled by the Server. The others haven't.

Good for you.
I have a lot of "cancelled by the server"

It was a very early call - in the first few hours.
In the end I had 13 cancelled by the server, none of which had started to run.
However, I did have 1 task that ran to completion, but came up with a validate error because the previous host reported it late.
On balance, it could've been a lot worse on a 16-thread machine. I'll live with it.
ID: 109160
Sid Celery
Joined: 11 Feb 08
Posts: 2198
Credit: 41,930,465
RAC: 17,266
Message 109161 - Posted: 24 Apr 2024, 0:49:35 UTC - in response to Message 109157.  

I've got 15 tasks returned after deadline and they've all validated and credited.
I have a further 6 awaiting validation.

Just checking further, the tasks I returned after deadline have been reissued to other users 10 minutes before I returned them.
One of the reissues has been cancelled by the Server. The others haven't.

Good for you.
I have a lot of "cancelled by the server"

So do I (although mostly they run OK). There seems to be something wrong with the server. It sends out a task, and before it returns its result or times out it sends the same one to me. Then the first user returns the result, and mine gets cancelled. Just plain sloppy.

It's a consequence of the whole site being down.
It seems like, once the site came back up, it timed-out tasks that missed deadline straight away and reissued them, but the host didn't re-poll the server to report them completed until its timer ran out - which could've been 4-5hrs after the site came back up.
It's just unfortunate.
ID: 109161
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1758
Credit: 18,534,891
RAC: 388
Message 109162 - Posted: 24 Apr 2024, 6:28:59 UTC - in response to Message 109159.  

Changing the target runtime back to 8hrs, even with the folding@home contention, will take 7-11hrs out of the running tasks and a further 7-11hrs out of the cached tasks.
14-22hrs less processing time to complete tasks will make a huge difference to whether Panic mode arises.
All that does is stop Panic mode from occurring most of the time- there will still be times where it does occur (because of all the other projects all taking longer to complete their Tasks than they expect to as well).
Stopping the sharing of cores & threads will fix the actual problem, not just the symptoms.


It can certainly be solved your way, but that gets a bit fiddly imo and doesn't resolve the confusion Rosetta runtime introduces.
How is it fiddly?
I'm changing one value, and fixing the cause of the problem (over committed CPU).
You're changing one value, and fixing the symptom (Panic mode occurring).

In both cases, only one value needs to be changed.
Although it does require some thought to fix the problem, to determine what % "Use at most..." should be set to.
87% leaves 1 core/thread free for non-BOINC work (7/8=0.875).
75% leaves 2 cores/threads free for non-BOINC work (6/8=0.75).

Not really a big effort required IMHO.
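Grant's two percentages fall out of a simple floor calculation; a minimal sketch (the function name is mine, not a BOINC API):

```python
import math

def cpu_pct(total_threads: int, reserved: int) -> int:
    """Value for BOINC's "Use at most xx % of the CPUs" setting,
    rounded down so the reserved threads stay free for other work."""
    return math.floor((total_threads - reserved) / total_threads * 100)

print(cpu_pct(8, 1))  # 87 - leaves 1 of 8 threads free
print(cpu_pct(8, 2))  # 75 - leaves 2 of 8 threads free
```

Rounding down is deliberate: 7/8 is 87.5%, and Grant quotes 87%, not 88%.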
Grant
Darwin NT
ID: 109162
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1758
Credit: 18,534,891
RAC: 388
Message 109169 - Posted: 25 Apr 2024, 5:14:48 UTC
Last modified: 25 Apr 2024, 5:24:56 UTC

And once again we've got problems.
The Validators & Assimilators are down, so the backlog of that work continues to pile up. And if it backs up enough, then the disks end up full & things crash and fall over all over again.


Edit- looks like they're all on the one server- boinc-process
Grant
Darwin NT
ID: 109169
Sid Celery
Joined: 11 Feb 08
Posts: 2198
Credit: 41,930,465
RAC: 17,266
Message 109172 - Posted: 25 Apr 2024, 10:00:17 UTC - in response to Message 109162.  

Changing the target runtime back to 8hrs, even with the folding@home contention, will take 7-11hrs out of the running tasks and a further 7-11hrs out of the cached tasks.
14-22hrs less processing time to complete tasks will make a huge difference to whether Panic mode arises.
All that does is stop Panic mode from occurring most of the time- there will still be times where it does occur (because of all the other projects all taking longer to complete their Tasks than they expect to as well).

Given Panic mode means Boinc realises tasks can't be completed within deadline, preventing Panic mode from occurring is the entire solution.
It may not be pretty in that processes are sharing cores, but imo no-one in their right mind cares which bit of a process of which task runs at what time as long as 1) the CPUs are being fully utilised and 2) tasks complete within deadline without further manual intervention.
Missing deadlines has all sorts of consequences both sides of the server divide. Meeting deadlines has none.

For some reason I now want to quote Mr Micawber from Charles Dickens' David Copperfield:
“Annual income twenty pounds, annual expenditure nineteen pounds, nineteen and six, result happiness.
Annual income twenty pounds, annual expenditure twenty pounds ought and six, result misery”
Point being, the detail isn't relevant as long as you succeed.

Stopping the sharing of cores & threads will fix the actual problem, not just the symptoms.

First, I dispute the sharing of cores & threads is a) a problem and b) one that needs fixing.
Second, that it's any business of the user as long as the computer doesn't crash and completes its work successfully and within the envelope of time allowed.
If the user is happy for more tasks to be running simultaneously, outside of their individual planned time, but still within the overall deadline, that's entirely up to them.
Your alternative being a smaller number of tasks run for each project, but with a core/thread dedicated to them, which is fine but will fall flat when there's a lack of task availability.
It's a choice. I recognise it, but I wouldn't personally opt for your one either.
ID: 109172
Sid Celery
Joined: 11 Feb 08
Posts: 2198
Credit: 41,930,465
RAC: 17,266
Message 109173 - Posted: 25 Apr 2024, 10:18:19 UTC

I was going to edit the last post, but decided it's worth a new message.

I notice adrianxw hasn't reappeared here to comment, so I looked at his tasks and he's taken Rosetta off "no new tasks".
I believe he's now set Target Run Time to the default. Not to 8hrs explicitly, but the default. That is "Not Selected".

However, his completed tasks now run for ~10,800secs rather than 43,200secs, taking ~15,000secs rather than ~112,000secs.
This will definitely provide a solution for him imo. Fine.

At some point somewhere - and quite recently - Rosetta's default appears to have changed to 3hrs, meaning tasks get completed and used up far more quickly than intended.
And I'm not sure about this, but I think Boinc is forced to assume and schedule Rosetta tasks to run for 8hrs, which is now not right.

Can people check what they have set up?
Is it 8 hrs or "Not Selected"?
Do tasks run for 8hrs, or 3hrs when set to "Not Selected"?
I believe it's the latter.
What does Boinc assume runtime will be at download?

Something's gone wrong imo.
ID: 109173
kotenok2000
Joined: 22 Feb 11
Posts: 276
Credit: 523,512
RAC: 610
Message 109174 - Posted: 25 Apr 2024, 10:39:06 UTC

It sets 8 hours for app version 4.20, and 3 hours for 6.05.
ID: 109174
Link
Joined: 4 May 07
Posts: 356
Credit: 382,349
RAC: 0
Message 109175 - Posted: 25 Apr 2024, 10:45:50 UTC - in response to Message 109172.  

Given panic-mode means Boinc realises tasks can't be completed within deadline, preventing Panic mode occurring is the entire solution.
Eliminating the reason for the panic mode is the entire solution, everything else is a workaround, which might fail as soon as something changes (new WU type, new project, whatever) or even before.


It may not be pretty in that processes are sharing cores, but imo no-one in their right mind cares which bit of a process of which task runs at what time as long as 1) the CPUs are being fully utilised and 2) tasks complete within deadline without further manual intervention.
It's not just "not pretty" - heavily overcommitting the system might slow down overall production; in particular, with hyperthreading CPUs many people leave 1-2 threads for non-BOINC stuff.
ID: 109175
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1758
Credit: 18,534,891
RAC: 388
Message 109176 - Posted: 25 Apr 2024, 11:07:33 UTC - in response to Message 109172.  

Given panic-mode means Boinc realises tasks can't be completed within deadline, preventing Panic mode occurring is the entire solution.
No, Panic mode doesn't mean they won't be completed. It means there is a high risk of their not being completed if not processed immediately.
Which fixing the overcommitted CPU does resolve.


It may not be pretty in that processes are sharing cores, but imo no-one in their right mind cares which bit of a process of which task runs at what time as long as 1) the CPUs are being fully utilised and 2) tasks complete within deadline without further manual intervention.
No one in their right mind would think taking 12 hours to do 6 hours of work is good (that's double the time required - on another project it's taking them 4 times as long).


Stopping the sharing of cores & threads will fix the actual problem, not just the symptoms.
First, I dispute the sharing of cores & threads is a) a problem and b) one that needs fixing.
You may dispute that, but it doesn't make it any less true.
And it needs fixing because the poster keeps complaining about it. If they don't complain about it, then no it doesn't need fixing.


If the user is happy for more tasks to be running simultaneously, outside of their individual planned time, but still within the overall deadline, that's entirely up to them.
Yep.
But in this case it may cause problems with deadlines, resulting in Panic mode, which the poster has an issue with, so it is an issue that should be addressed.
Why fix the symptom, when fixing the problem would result in more work being done- even with less cores/threads available to BOINC, the amount of work done for BOINC would be almost triple what it presently is.


Your alternative being a smaller number of tasks run for each project, but with a core/thread dedicated to them, which is fine but will fall flat when there's a lack of task availability.
Why would you think that?????
All my setting does is stop 9 things, or 10 things or more from trying to run on 8 cores/threads at the same time. It does not in any way stop cores/threads from being used by different projects at the same time. What it does stop is BOINC from trying to use cores/threads that are being heavily used by non BOINC applications.
If there are 10 projects with work, or only one, all available cores/threads will be used.
Grant
Darwin NT
ID: 109176
tazzduke
Joined: 2 Jul 09
Posts: 2
Credit: 1,234,765
RAC: 0
Message 109177 - Posted: 25 Apr 2024, 11:25:45 UTC - in response to Message 109176.  

Greetings,

Well, I have 3 systems running at the moment, all using the default location in preferences with target CPU run time set to 2hrs.

Ryzen 5700x (#1) 8c/16t - only using 8c, cpu times are averaging 3hrs (Win11)

Ryzen 5700x (#2) 8c/16t - only using 8c, cpu times are averaging 2hrs (Win11)

Dual Xeon E5-2470v2 (20c/40t) - only using 8c, CPU times are averaging 2hrs (Linux Mint 21.3), with LHC also using some cores.

I have set work fetch preferences to 0.1 days & 0.1 days, which keeps a small amount of workunits in cache on each machine; it's how I like it.

But as Grant has already mentioned, the validators are still offline, and the pendings are growing.

I have also fine-tuned the core usage on these machines with app_config files in each project, because sometimes I am running various other projects at the same time - again, my preference only.
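For reference, a minimal app_config.xml along those lines might look like this (a sketch - the app name "rosetta" and the limit of 6 are illustrative assumptions; check the project's actual app names in client_state.xml):

```xml
<!-- app_config.xml in the project's directory under the BOINC data directory.
     Limits how many tasks of this app run at once; illustrative values only. -->
<app_config>
    <app>
        <name>rosetta</name>
        <max_concurrent>6</max_concurrent>
    </app>
</app_config>
```

The client picks it up on restart, or when told to re-read config files from BOINC Manager.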

When pushing hard on some projects and I start using the hyperthreads, I still, as a rule, leave 2 threads in reserve for each CPU, for the OS and GPU to use - again, my preference only.

Hope you have a good day :-)
ID: 109177



©2025 University of Washington
https://www.bakerlab.org