Problems and Technical Issues with Rosetta@home

Sid Celery
Joined: 11 Feb 08
Posts: 2185
Credit: 41,726,991
RAC: 6,784
Message 109389 - Posted: 19 Jun 2024, 21:01:17 UTC - in response to Message 109387.  

Ooh, 360k tasks. We live to fight another day (or two)

Turned into 3+ days, but we're out again.

While I know most people will have finished up their outstanding tasks already, I managed to sneak 4 extra returned tasks today and now discover that the validators running under boinc-process are down again.
Better now than at other times, I guess
ID: 109389 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1751
Credit: 18,534,891
RAC: 857
Message 109390 - Posted: 20 Jun 2024, 6:15:28 UTC
Last modified: 20 Jun 2024, 6:15:54 UTC

That boinc-process server has developed a habit of regularly falling over; it was well past due for another crash.
Grant
Darwin NT
ID: 109390 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2185
Credit: 41,726,991
RAC: 6,784
Message 109391 - Posted: 20 Jun 2024, 7:51:27 UTC - in response to Message 109389.  

Ooh, 360k tasks. We live to fight another day (or two)

Turned into 3+ days, but we're out again.

While I know most people will have finished up their outstanding tasks already, I managed to sneak 4 extra returned tasks today and now discover that the validators running under boinc-process are down again.
Better now than at other times, I guess

Or maybe not better now as 660k tasks newly available
ID: 109391 · Rating: 0
[VENETO] boboviz
Joined: 1 Dec 05
Posts: 2025
Credit: 9,943,884
RAC: 6,777
Message 109396 - Posted: 20 Jun 2024, 20:10:55 UTC - in response to Message 109391.  

Or maybe not better now as 660k tasks newly available


0 wus and a lot of daemons are down....
ID: 109396 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2185
Credit: 41,726,991
RAC: 6,784
Message 109397 - Posted: 20 Jun 2024, 23:19:40 UTC - in response to Message 109396.  
Last modified: 20 Jun 2024, 23:26:14 UTC

Or maybe not better now as 660k tasks newly available

0 wus and a lot of daemons are down....

Yup. I would've expected 660k to last at least 2 days, but I'm not sure it lasted much more than 15hrs, unless tasks got pulled.
Front page figures are borked, on top of the boinc-process server being borked

Edit: Actually, I'm now thinking tasks did get pulled.

Unvalidated tasks were about 20k before the new batch arrived, now 160k
In progress tasks were about 30k, now 112k
That implies about 222k tasks were grabbed (140k + 82k)

But the front page is locked at 7am with 660k queued, so roughly 440k have gone missing, presumed pulled
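
The arithmetic in the edit above can be checked with a short sketch. It assumes, as the post does, that every task sent out since the batch arrived now shows up as either in progress or unvalidated (plausible while the validators are down). The variable and function names are illustrative only.

```python
# Figures quoted in the post above, in tasks (all approximate).
UNVALIDATED_BEFORE, UNVALIDATED_NOW = 20_000, 160_000
IN_PROGRESS_BEFORE, IN_PROGRESS_NOW = 30_000, 112_000
QUEUED_ON_FRONT_PAGE = 660_000  # front page figure, frozen at about 7am


def tasks_grabbed(unval_before, unval_now, prog_before, prog_now):
    """Tasks sent to hosts = growth in unvalidated + growth in in-progress."""
    return (unval_now - unval_before) + (prog_now - prog_before)


grabbed = tasks_grabbed(
    UNVALIDATED_BEFORE, UNVALIDATED_NOW, IN_PROGRESS_BEFORE, IN_PROGRESS_NOW
)
# Tasks that were queued but never show up as sent: presumed pulled.
unaccounted = QUEUED_ON_FRONT_PAGE - grabbed

print(grabbed)      # 222000
print(unaccounted)  # 438000, i.e. roughly the 440k quoted above
```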
ID: 109397 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2185
Credit: 41,726,991
RAC: 6,784
Message 109400 - Posted: 21 Jun 2024, 9:20:01 UTC - in response to Message 109397.  

Or maybe not better now as 660k tasks newly available

0 wus and a lot of daemons are down...

Yup. I would've expected 660k to last at least 2 days, but I'm not sure it lasted much more than 15hrs, unless tasks got pulled.
Front page figures are borked, on top of the boinc-process server being borked

Still the same - now nudged

Edit while posting: the site went down and came back 5 mins later. No apparent change yet, but there might be shortly (fingers crossed)
ID: 109400 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1751
Credit: 18,534,891
RAC: 857
Message 109401 - Posted: 21 Jun 2024, 9:51:54 UTC
Last modified: 21 Jun 2024, 9:53:20 UTC

boinc-process server still dead, front page Server Status numbers still not updated (Last update, 07:04 UTC, yesterday).
Grant
Darwin NT
ID: 109401 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2185
Credit: 41,726,991
RAC: 6,784
Message 109403 - Posted: 21 Jun 2024, 11:48:26 UTC - in response to Message 109401.  

boinc-process server still dead, front page Server Status numbers still not updated (Last update, 07:04 UTC, yesterday).

Add it to the very long list of things I'm completely wrong about... <sigh>
I've asked. We wait.
ID: 109403 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1751
Credit: 18,534,891
RAC: 857
Message 109404 - Posted: 21 Jun 2024, 22:54:26 UTC

Just heard the fans in my system wind up.
Checked BOINC and, lo and behold, Rosetta has work again.


Now if they could just get that boinc-process server that's been dead for a while now up and running again then all would be good.
Grant
Darwin NT
ID: 109404 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2185
Credit: 41,726,991
RAC: 6,784
Message 109405 - Posted: 21 Jun 2024, 23:00:01 UTC - in response to Message 109404.  

Just heard the fans in my system wind up.
Checked BOINC and, lo and behold, Rosetta has work again.

Now if they could just get that boinc-process server that's been dead for a while now up and running again then all would be good.

Both you and this PC were ahead of me.
The rest, still just as you say.

In a way, knowing if there are tasks or not, and whether they give credit or not, or how long they'll last, isn't massively different
ID: 109405 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1751
Credit: 18,534,891
RAC: 857
Message 109406 - Posted: 22 Jun 2024, 1:32:44 UTC

Server Status on the front page is yet to update, but all the servers on the Server Status page are now green and work is still flowing.
Grant
Darwin NT
ID: 109406 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2185
Credit: 41,726,991
RAC: 6,784
Message 109410 - Posted: 22 Jun 2024, 11:33:54 UTC - in response to Message 109406.  

And now everything is finally back. Currently

As of 22 Jun 2024, 11:02:26 UTC [ Scheduler running ]
Total queued jobs: 1,336,930
In progress: 153,424
Successes last 24h: 91,239

and

Tasks ready to send: 4,785
Tasks in progress: 153,988
Workunits waiting for validation: 0
Workunits waiting for assimilation: 0
ID: 109410 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1751
Credit: 18,534,891
RAC: 857
Message 109411 - Posted: 22 Jun 2024, 21:06:58 UTC - in response to Message 109410.  

And now everything is finally back. Currently

As of 22 Jun 2024, 11:02:26 UTC [ Scheduler running ]
Total queued jobs: 1,336,930
In progress: 153,424
Successes last 24h: 91,239

At last!
And plenty of work as well.

Now things just need to stop falling over in the first place.
Grant
Darwin NT
ID: 109411 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2185
Credit: 41,726,991
RAC: 6,784
Message 109412 - Posted: 23 Jun 2024, 16:01:19 UTC - in response to Message 109411.  

Now things just need to stop falling over in the first place.

Yes, but I'd also remind everyone of my view:
Rosetta Beta 6.04 tasks wrongly default to 3hrs CPU runtime, while Rosetta v4.20 tasks rightly default to 8hrs.

So set the Rosetta@home Target CPU Runtime explicitly to 8hrs, rather than leaving it at 'not selected', so that CPU runtime matches what BOINC is told to assume.

Do more work, get more credits, BOINC schedules more correctly and sooner, and batches of tasks issued by Rosetta last longer. Rosetta tasks run out less often. <Everyone> wins.

The alternative is what we have now - no new tasks. Everyone loses.

The more people make this change, the better for everyone, whether that boinc-process server goes down or not
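
As a rough illustration of why a longer target runtime makes a batch last longer, here is a minimal sketch. It assumes each core holds one task for the full target CPU runtime and that the number of active cores stays constant. `ACTIVE_CORES` is a made-up figure, and the 660k batch size is borrowed from earlier in the thread. Under those assumptions the ratio between the two cases does not depend on the core count.

```python
def batch_duration_hours(tasks_queued, target_runtime_hours, active_cores):
    """Hours for a batch to drain if every core runs one task at a time
    for exactly the target CPU runtime (a simplifying assumption)."""
    tasks_per_hour = active_cores / target_runtime_hours
    return tasks_queued / tasks_per_hour


ACTIVE_CORES = 50_000  # hypothetical; not a figure from the project
BATCH = 660_000        # batch size mentioned earlier in the thread

at_3h = batch_duration_hours(BATCH, 3, ACTIVE_CORES)  # 3hr default
at_8h = batch_duration_hours(BATCH, 8, ACTIVE_CORES)  # 8hr target
ratio = at_8h / at_3h  # equals 8/3 whatever ACTIVE_CORES is

print(f"3hr runtime: batch lasts about {at_3h:.1f} hours")
print(f"8hr runtime: batch lasts about {at_8h:.1f} hours")
print(f"ratio: {ratio:.2f}x")
```

In this simplified model, a host that moves from 3hrs to 8hrs draws tasks at three eighths of its previous rate, which is the sense in which batches last longer.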
ID: 109412 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1751
Credit: 18,534,891
RAC: 857
Message 109416 - Posted: 26 Jun 2024, 8:50:24 UTC

boinc-process server is dead again, Validation backlog continues to grow.
Grant
Darwin NT
ID: 109416 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1751
Credit: 18,534,891
RAC: 857
Message 109418 - Posted: 26 Jun 2024, 10:06:36 UTC - in response to Message 109416.  

boinc-process server is dead again, Validation backlog continues to grow.

And it's back again.
Grant
Darwin NT
ID: 109418 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2185
Credit: 41,726,991
RAC: 6,784
Message 109419 - Posted: 26 Jun 2024, 23:18:31 UTC - in response to Message 109418.  
Last modified: 26 Jun 2024, 23:26:47 UTC

boinc-process server is dead again, Validation backlog continues to grow.
And it's back again.

This is getting like my home-life...
"I've lost my xyz"
"You could at least help to look"
"Oh, there it is"
Me: "What was that you said?"

If I play dumb long enough before paying any attention, most things right themselves on their own

Edit: I just reached 40,000,000 on Rosetta
Edit2: And 100,000,000 for my team across all projects
ID: 109419 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2185
Credit: 41,726,991
RAC: 6,784
Message 109425 - Posted: 2 Jul 2024, 1:35:11 UTC - in response to Message 109412.  

Now things just need to stop falling over in the first place.

Yes, but I'd also remind everyone of my view:
Rosetta Beta 6.04 tasks wrongly default to 3hrs CPU runtime, while Rosetta v4.20 tasks rightly default to 8hrs.

So set the Rosetta@home Target CPU Runtime explicitly to 8hrs, rather than leaving it at 'not selected', so that CPU runtime matches what BOINC is told to assume.

Do more work, get more credits, BOINC schedules more correctly and sooner, and batches of tasks issued by Rosetta last longer. Rosetta tasks run out less often. <Everyone> wins.

The alternative is what we have now - no new tasks. Everyone loses.

The more people make this change, the better for everyone, whether that boinc-process server goes down or not

Queued jobs down to 153k 3hrs ago, so another shout out for this.
I'm estimating we only have another 12-13hrs of tasks unless more get queued up.
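
For what it's worth, the 12-13hr estimate above implies a drain rate that can be backed out in a few lines. This is only a sketch: it assumes tasks leave the queue at a constant rate and that nothing new is queued, and the function names are illustrative.

```python
def implied_drain_rate(queued_tasks, hours_remaining):
    """Tasks per hour leaving the queue, assuming a constant rate."""
    return queued_tasks / hours_remaining


def hours_left(queued_tasks, tasks_per_hour):
    """Hours until the queue is empty at a constant drain rate."""
    return queued_tasks / tasks_per_hour


QUEUED = 153_000  # queued jobs quoted in the post above

slow = implied_drain_rate(QUEUED, 13)  # lower bound of the implied rate
fast = implied_drain_rate(QUEUED, 12)  # upper bound of the implied rate

print(f"implied drain rate: {slow:,.0f} to {fast:,.0f} tasks per hour")
```

The same `hours_left` helper can be reused with a fresh queue figure from the front page to see whether the estimate is holding.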
ID: 109425 · Rating: 0
Sid Celery
Joined: 11 Feb 08
Posts: 2185
Credit: 41,726,991
RAC: 6,784
Message 109426 - Posted: 2 Jul 2024, 20:29:38 UTC - in response to Message 109425.  

Queued jobs down to 153k 3hrs ago, so another shout out for this.
I'm estimating we only have another 12-13hrs of tasks unless more get queued up.

I think we had a few extra Rosetta 4.20 tasks, but not many, and we're out now anyway
Fingers crossed for another batch
ID: 109426 · Rating: 0
Grant (SSSF)
Joined: 28 Mar 20
Posts: 1751
Credit: 18,534,891
RAC: 857
Message 109428 - Posted: 3 Jul 2024, 10:16:45 UTC

boinc-process server has died, again.
Grant
Darwin NT
ID: 109428 · Rating: 0



©2025 University of Washington
https://www.bakerlab.org