Problems and Technical Issues with Rosetta@home

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2025
Credit: 9,943,884
RAC: 6,777
Message 109429 - Posted: 3 Jul 2024, 12:21:46 UTC - in response to Message 109428.  

boinc-process server has died, again.


Once a week, approximately....

rilian
Joined: 16 Jun 07
Posts: 28
Credit: 3,169,006
RAC: 4,335
Message 109430 - Posted: 3 Jul 2024, 17:24:15 UTC - in response to Message 109429.  

While there are no Rosetta tasks, you can crunch some Ralph nvidia GPU tasks (1000 available at this moment https://ralph.bakerlab.org/server_status.php) and help accelerate release of GPU app to Rosetta!
i crunch for Ukraine. Join our team forums about Rosetta@home

Sid Celery
Joined: 11 Feb 08
Posts: 2183
Credit: 41,726,991
RAC: 6,784
Message 109431 - Posted: 3 Jul 2024, 18:16:28 UTC - in response to Message 109428.  

boinc-process server has died, again.

I didn't notice again and, now I look, it's back.
Maybe I should look more often.
Or you should look less often...

The last of my Rosetta tasks are running now, showing the benefit of ensuring all my runtimes are at least 8hrs rather than the 3hr mistake Rosetta Beta tasks are set to

Sid Celery
Joined: 11 Feb 08
Posts: 2183
Credit: 41,726,991
RAC: 6,784
Message 109432 - Posted: 3 Jul 2024, 18:36:22 UTC - in response to Message 109430.  

While there are no Rosetta tasks, you can crunch some Ralph nvidia GPU tasks (1000 available at this moment https://ralph.bakerlab.org/server_status.php) and help accelerate release of GPU app to Rosetta!

I just did.
And then remembered the minimum 5Gb (6Gb) req't for RAM on my Video Card, which only has 4Gb... <sigh>

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2025
Credit: 9,943,884
RAC: 6,777
Message 109433 - Posted: 3 Jul 2024, 18:56:42 UTC - in response to Message 109432.  

And then remembered the minimum 5Gb (6Gb) req't for RAM on my Video Card, which only has 4Gb... <sigh>


I also have a 4gb gpu....and it's AMD :-(

rilian
Joined: 16 Jun 07
Posts: 28
Credit: 3,169,006
RAC: 4,335
Message 109434 - Posted: 4 Jul 2024, 15:34:06 UTC - in response to Message 109432.  

While there are no Rosetta tasks, you can crunch some Ralph nvidia GPU tasks (1000 available at this moment https://ralph.bakerlab.org/server_status.php) and help accelerate release of GPU app to Rosetta!

I just did.
And then remembered the minimum 5Gb (6Gb) req't for RAM on my Video Card, which only has 4Gb... <sigh>

prev batch required 6gb, current batch 5gb, who knows maybe next batch will be 4gb :) so keep the project active :)
i crunch for Ukraine. Join our team forums about Rosetta@home

Sid Celery
Joined: 11 Feb 08
Posts: 2183
Credit: 41,726,991
RAC: 6,784
Message 109435 - Posted: 7 Jul 2024, 19:02:39 UTC

New tasks came down about an hour ago

Bryn Mawr
Joined: 26 Dec 18
Posts: 411
Credit: 12,359,416
RAC: 3,742
Message 109440 - Posted: 8 Jul 2024, 18:00:14 UTC - in response to Message 109435.  

New tasks came down about an hour ago


Sadly, still with the lower connect error

Landjunge
Joined: 15 Jan 08
Posts: 1
Credit: 11,470,328
RAC: 31
Message 109441 - Posted: 8 Jul 2024, 21:01:29 UTC - in response to Message 109434.  
Last modified: 8 Jul 2024, 21:02:03 UTC

While there are no Rosetta tasks, you can crunch some Ralph nvidia GPU tasks (1000 available at this moment https://ralph.bakerlab.org/server_status.php) and help accelerate release of GPU app to Rosetta!

I just did.
And then remembered the minimum 5Gb (6Gb) req't for RAM on my Video Card, which only has 4Gb... <sigh>

prev batch required 6gb, current batch 5gb, who knows maybe next batch will be 4gb :) so keep the project active :)


i had no problem running two ralph's in parallel on a 8gb rtx3070.
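
For anyone wondering whether their card qualifies, the memory figures quoted above can be put side by side. A small sketch using only the numbers from this thread (6 GB for the previous batch, 5 GB for the current one); the function name is made up for illustration, and this is not how BOINC itself decides eligibility:

```python
# Stated minimum video memory per Ralph GPU batch, in GB, as quoted in the thread.
BATCH_MIN_GB = {"previous": 6, "current": 5}

def eligible_batches(card_gb: float) -> list[str]:
    """Return the batches whose stated minimum the card meets (hypothetical helper)."""
    return [name for name, need in BATCH_MIN_GB.items() if card_gb >= need]

print(eligible_batches(4))  # 4 GB card: []
print(eligible_batches(8))  # 8 GB card: ['previous', 'current']
```

Two tasks running side by side on an 8 GB card would suggest each task actually used less than the stated 5 GB minimum, though nothing in the thread measures it.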

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2025
Credit: 9,943,884
RAC: 6,777
Message 109442 - Posted: 9 Jul 2024, 4:51:27 UTC - in response to Message 109440.  

Sadly, still with the lower connect error


+1

Sid Celery
Joined: 11 Feb 08
Posts: 2183
Credit: 41,726,991
RAC: 6,784
Message 109444 - Posted: 9 Jul 2024, 7:45:12 UTC - in response to Message 109442.  
Last modified: 9 Jul 2024, 7:49:06 UTC

Sadly, still with the lower connect error

+1

I've had one.
CPU runtime 2 seconds
Even clicking reply, typing +1, then clicking send takes more time, let alone the time taken checking if I had any
I can't bring myself to care, let alone mention it

In the meantime, the whole site went down for a few hours, during which Boinc decided to bring down 21 WCG tasks I didn't really want to have in my cache. I consider them a waste of time, even if they will keep my PC occupied

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2025
Credit: 9,943,884
RAC: 6,777
Message 109445 - Posted: 9 Jul 2024, 8:15:41 UTC - in response to Message 109444.  
Last modified: 9 Jul 2024, 8:16:16 UTC

Even clicking reply, typing +1, then clicking send takes more time, let alone the time taken checking if I had any
I can't bring myself to care, let alone mention it


Is there a remote hope that someone on the team will read the forum, sooner or later, and find a solution for an old bug??
A hope, remote hope...

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2025
Credit: 9,943,884
RAC: 6,777
Message 109447 - Posted: 10 Jul 2024, 5:18:45 UTC - in response to Message 109444.  

Even clicking reply, typing +1, then clicking send takes more time, let alone the time taken checking if I had any
I can't bring myself to care, let alone mention it


And when you have over 30 wus bugged in the last 5 hrs, what do you do?

Sid Celery
Joined: 11 Feb 08
Posts: 2183
Credit: 41,726,991
RAC: 6,784
Message 109448 - Posted: 11 Jul 2024, 0:53:33 UTC - in response to Message 109445.  

Even clicking reply, typing +1, then clicking send takes more time, let alone the time taken checking if I had any
I can't bring myself to care, let alone mention it

Is there a remote hope that someone on the team will read the forum, sooner or later, and find a solution for an old bug??
A hope, remote hope...

After a few years now, I think we can be certain the answer is a firm no.

I was taken by a reply I had (in the days when I was being replied to - also years ago) when a much higher proportion of tasks were getting rejected. Rather than delete the offending tasks, the advice was to let them run and error out: they only ran for 15-20secs of CPU time, so even 30 tasks would only be 500-600secs of CPU time (actually core time, so divide by the number of cores for actual seconds of CPU time), and that was several orders of magnitude less work than coding some way of deleting them before they went out. During which exercise, a lot of good tasks would be taken out at the same time, so it was counterproductive in a multitude of ways.
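
As a rough illustration of that arithmetic (a sketch only): 30 tasks at 15-20 secs each, with the result divided across cores. The 8-core host is an assumed example, not a figure from the thread, and the function names are made up.

```python
# Rough cost of letting short-lived failing tasks run until they error out.
# Task count and per-task runtimes come from the post; the core count is an assumption.

def wasted_core_seconds(num_tasks: int, secs_per_task: float) -> float:
    """Total core time consumed by tasks that error out early."""
    return num_tasks * secs_per_task

def elapsed_seconds(core_seconds: float, num_cores: int) -> float:
    """Approximate elapsed time if the tasks are spread evenly across cores."""
    return core_seconds / num_cores

low = wasted_core_seconds(30, 15)   # 450 core-seconds
high = wasted_core_seconds(30, 20)  # 600 core-seconds
cores = 8                           # hypothetical host

print(f"Core time lost: {low:.0f}-{high:.0f} s")
print(f"Elapsed on {cores} cores: {elapsed_seconds(low, cores):.0f}-{elapsed_seconds(high, cores):.0f} s")
```

Either way it is minutes at most, against tasks that normally run for hours.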

And that's what happened.

The same applies here. No-one in their right mind would do any different.

The only real problem is the amount of time wasted complaining about it.

Tbh, I think it's exactly the same reason why <I> stopped getting replies. A complete waste of time and effort.
So, if you feel bad about my reply here, take a moment to think about my situation...

Meanwhile, boinc-process server is down again - no validation going on right now - 200k waiting in the queue

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1751
Credit: 18,534,891
RAC: 857
Message 109449 - Posted: 11 Jul 2024, 8:41:18 UTC - in response to Message 109448.  

Meanwhile, boinc-process server is down again - no validation going on right now - 200k waiting in the queue
Almost 300k now.
Grant
Darwin NT

Grant (SSSF)
Joined: 28 Mar 20
Posts: 1751
Credit: 18,534,891
RAC: 857
Message 109450 - Posted: 11 Jul 2024, 18:11:04 UTC - in response to Message 109449.  

Almost 300k now.
Almost 400k now.
Grant
Darwin NT

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2025
Credit: 9,943,884
RAC: 6,777
Message 109451 - Posted: 11 Jul 2024, 18:41:07 UTC - in response to Message 109448.  

So, if you feel bad about my reply here, take a moment to think about my situation...


I don't feel bad about your reply, I'm sorry for your pessimism.
I continue to think that if software is bugged, it's a good thing to advise the developers.

Don't they read it? Too bad for them.

Sid Celery
Joined: 11 Feb 08
Posts: 2183
Credit: 41,726,991
RAC: 6,784
Message 109452 - Posted: 12 Jul 2024, 0:39:04 UTC - in response to Message 109450.  

Almost 300k now.
Almost 400k now.

I think it went to almost 500k, but I took a look at 20:35 UK time just as parts of boinc-process came back online and after a refresh it was all back
A glance now (01:38 UK time) and it shows 266k, so it's coming down slowly
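
For a feel of how fast the validation queue grew while boinc-process was down, here is a back-of-envelope sketch. It uses the approximate figures and post times from this thread (200k, "almost 300k", "almost 400k"), rounded to the minute, so the results are indicative at best.

```python
from datetime import datetime

# (UTC post time, approximate queue size) as reported in the thread.
observations = [
    (datetime(2024, 7, 11, 0, 53), 200_000),
    (datetime(2024, 7, 11, 8, 41), 300_000),
    (datetime(2024, 7, 11, 18, 11), 400_000),
]

def rates_per_hour(points):
    """Change in queue size per hour between consecutive observations."""
    rates = []
    for (t0, q0), (t1, q1) in zip(points, points[1:]):
        hours = (t1 - t0).total_seconds() / 3600
        rates.append((q1 - q0) / hours)
    return rates

for rate in rates_per_hour(observations):
    print(f"{rate:,.0f} results/hour")
```

On those numbers the backlog was growing by something like 10-13k results an hour.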

Sid Celery
Joined: 11 Feb 08
Posts: 2183
Credit: 41,726,991
RAC: 6,784
Message 109453 - Posted: 12 Jul 2024, 1:02:44 UTC - in response to Message 109451.  

So, if you feel bad about my reply here, take a moment to think about my situation...

I don't feel bad about your reply, I'm sorry for your pessimism.
I continue to think that if software is bugged, it's a good thing to advise the developers.

Don't they read it? Too bad for them.

I'm not a coder of any kind, but the impression I get is that it's an error-trapping issue rather than a bug (you could say that's the same thing, I accept).
The impression I get (but may be very wrong) is that tasks are seeded randomly, but nothing double-checks whether the random seed is out of bounds so it can be re-seeded, and the task errors out as a result.
It's a <perfect>, even if ugly, solution.
It happens so rarely and with such little consequence (wasted CPU time is approx zero) that it's not worth the effort to correct among a batch of somewhere around a million tasks.
The rest give them the results they need.
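
To make the re-seeding guess concrete, here is what "check the seed and draw again" could look like. This is purely hypothetical: the bounds, the function names and the retry limit are invented for illustration and say nothing about how Rosetta actually seeds its tasks.

```python
import random

# Hypothetical valid range; the thread does not say what the real bounds are.
SEED_MIN = 1
SEED_MAX = 2**31 - 1

def is_valid_seed(seed: int) -> bool:
    """True if the seed falls inside the (assumed) accepted range."""
    return SEED_MIN <= seed <= SEED_MAX

def draw_seed(rng: random.Random, max_attempts: int = 10) -> int:
    """Draw a seed, drawing again whenever it lands outside the accepted range."""
    for _ in range(max_attempts):
        # The draw range is deliberately one wider at each end than the valid
        # range, so 0 and 2**31 are possible out-of-bounds results.
        candidate = rng.randint(0, 2**31)
        if is_valid_seed(candidate):
            return candidate
    raise ValueError("no valid seed found")

seed = draw_seed(random.Random())
print(is_valid_seed(seed))  # True
```

Without the check, the rare out-of-bounds draw simply fails a couple of seconds in, which matches what we see.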

It may offend from a user pov, but I think from a researcher pov it's neither here nor there.
It's very likely they <do> know. It just doesn't matter.
And, as always, we're here for the project's needs. They don't exist for ours.

The tail has never wagged the dog at this project - unlike many other projects.
That's been made very clear to me. It's not pessimism on my part, but realism.
I don't need to be told twice, even if others need to be told ten or twenty times and still not take the hint.
I know that sounds harsh, but I don't know how else to say it.

Sid Celery
Joined: 11 Feb 08
Posts: 2183
Credit: 41,726,991
RAC: 6,784
Message 109454 - Posted: 13 Jul 2024, 12:33:58 UTC - in response to Message 109452.  

Almost 300k now.
Almost 400k now.

I think it went to almost 500k, but I took a look at 20:35 UK time just as parts of boinc-process came back online and after a refresh it was all back
A glance now (01:38 UK time) and it shows 266k, so it's coming down slowly

I'm just in the final stages of clearing down all the excess WCG tasks Boinc brought down from the previous Rosetta outage and we're out of Rosetta tasks again.
So frustrating...



©2025 University of Washington
https://www.bakerlab.org