Problems and Technical Issues with Rosetta@home

Sid Celery

Joined: 11 Feb 08
Posts: 2183
Credit: 41,726,991
RAC: 6,784
Message 109458 - Posted: 16 Jul 2024, 6:58:01 UTC - in response to Message 109454.  

I'm just in the final stages of clearing down all the excess WCG tasks Boinc brought down from the previous Rosetta outage and we're out of Rosetta tasks again.
So frustrating...

A relatively small number of tasks available - showing 230k on the front page 3hrs ago. Hopefully part of more, but may not be.
It's something
ID: 109458
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 2025
Credit: 9,943,884
RAC: 6,777
Message 109459 - Posted: 16 Jul 2024, 10:53:31 UTC - in response to Message 109458.  
Last modified: 16 Jul 2024, 10:53:42 UTC

A relatively small number of tasks available - showing 230k on the front page 3hrs ago. Hopefully part of more, but may not be.
It's something


And there seem to be new kinds of simulation: "testmpnn_hallucinated" and "testmpnn_diffusion"
ID: 109459
Sid Celery

Joined: 11 Feb 08
Posts: 2183
Credit: 41,726,991
RAC: 6,784
Message 109460 - Posted: 16 Jul 2024, 12:19:37 UTC - in response to Message 109459.  

A relatively small number of tasks available - showing 230k on the front page 3hrs ago. Hopefully part of more, but may not be.
It's something

And there seem to be new kinds of simulation: "testmpnn_hallucinated" and "testmpnn_diffusion"

Yup - wonder what that's all about.
A few more tasks becoming available too - still not a great amount. Showing 475k an hour ago on the front page.
Every little bit helps
ID: 109460
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 2025
Credit: 9,943,884
RAC: 6,777
Message 109461 - Posted: 16 Jul 2024, 12:45:26 UTC - in response to Message 109460.  

And there seem to be new kinds of simulation: "testmpnn_hallucinated" and "testmpnn_diffusion"

Yup - wonder what that's all about.


Maybe related to "message-passing neural networks" (mpnn), like this
ID: 109461
kotenok2000

Joined: 22 Feb 11
Posts: 276
Credit: 513,050
RAC: 161
Message 109462 - Posted: 16 Jul 2024, 12:53:13 UTC

Graphics work with these tasks.
ID: 109462
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 2025
Credit: 9,943,884
RAC: 6,777
Message 109463 - Posted: 17 Jul 2024, 9:17:49 UTC - in response to Message 109452.  

I think it went to almost 500k, but I took a look at 20:35 UK time just as parts of boinc-process came back online and after a refresh it was all back
A glance now (01:38 UK time) and it shows 266k, so it's coming down slowly


Now the servers are green, but there are over 18k WUs pending validation. Increasing.
ID: 109463
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 2025
Credit: 9,943,884
RAC: 6,777
Message 109464 - Posted: 17 Jul 2024, 9:18:27 UTC - in response to Message 109462.  

Graphics work with these tasks.


And the WUs also seem OK, no errors despite the name "test"....
ID: 109464
Sid Celery

Joined: 11 Feb 08
Posts: 2183
Credit: 41,726,991
RAC: 6,784
Message 109465 - Posted: 17 Jul 2024, 12:31:42 UTC - in response to Message 109461.  

And there seem to be new kinds of simulation: "testmpnn_hallucinated" and "testmpnn_diffusion"

Yup - wonder what that's all about.

Maybe related to "message-passing neural networks" (mpnn), like this

Very likely. Thanks for the link - looks like good work.
ID: 109465
Sid Celery

Joined: 11 Feb 08
Posts: 2183
Credit: 41,726,991
RAC: 6,784
Message 109466 - Posted: 17 Jul 2024, 12:34:28 UTC - in response to Message 109463.  

I think it went to almost 500k, but I took a look at 20:35 UK time just as parts of boinc-process came back online and after a refresh it was all back
A glance now (01:38 UK time) and it shows 266k, so it's coming down slowly

Now the servers are green, but there are over 18k WUs pending validation. Increasing.

Now pink - boinc-process is down again and 56k awaiting validation.
And not too many tasks left to come down either.
We continue to be very hand-to-mouth atm
ID: 109466
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1751
Credit: 18,534,891
RAC: 857
Message 109468 - Posted: 18 Jul 2024, 9:48:02 UTC
Last modified: 18 Jul 2024, 9:49:50 UTC

Some more work would be nice.
It's been freezing the last few mornings here, and the system has been keeping the lounge room almost comfortable.

But now it's out of work, and tomorrow morning if more work doesn't come along, it'll be almost as cold inside as it is outside (or an upgraded version over at Ralph & some new work there would be nice- either this or that, or even both would be nice).
Grant
Darwin NT
ID: 109468
G.L.I.S.

Joined: 25 Dec 08
Posts: 26
Credit: 2,450,252
RAC: 2,483
Message 109469 - Posted: 18 Jul 2024, 9:55:19 UTC

Still... 'completed awaiting validation'...
More credits gone, along with electricity and time?
ID: 109469
Sid Celery

Joined: 11 Feb 08
Posts: 2183
Credit: 41,726,991
RAC: 6,784
Message 109470 - Posted: 18 Jul 2024, 10:28:17 UTC - in response to Message 109468.  

Some more work would be nice.
It's been freezing the last few mornings here, and the system has been keeping the lounge room almost comfortable.

But now it's out of work, and tomorrow morning if more work doesn't come along, it'll be almost as cold inside as it is outside (or an upgraded version over at Ralph & some new work there would be nice- either this or that, or even both would be nice).

Your post made me look for the first time exactly where Darwin is and, after checking on a weather site, discover your winter is still 2-4C higher than this English summer.
My sympathies are therefore quite limited, as well as thinking it's a rather inefficient way to heat the house.

While I understand and accept your reasoning for keeping a tight cache, I can only repeat my advice to change from setting a default runtime at Rosetta, which turns out to be only 3hrs, to making it explicitly 8hrs to match what Boinc thinks it is (at the point of download anyway). Not only would you get an extra 5hrs work, you would reduce your churn through tasks by almost two-thirds, marginally extending how long each batch of tasks will last, which is valuable when we see each batch run out before further tasks become available.

To emphasise the difference between me and you, I keep a 0.5 plus 0.1 cache and set a 12hr runtime.
So when I have 4-5hrs of tasks remaining, I already have 16 tasks (8C16T) queued up and another 16 can come down, which works out at 28-29hrs of work when Rosetta runs out.
At an 8hr runtime, this would still be 20-21hrs.
As compared to your maximum of 3hrs work while trying to gobble up tasks only at the last minute.
The difference is huge for one host and, the more people who make the runtime change I suggest, the longer batches of tasks would last and the fewer periods there would be without any on the whole site.
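For anyone who wants to check the numbers, here is a rough sketch of the arithmetic above. It assumes every thread finishes its current task and then runs each queued batch back to back at the full target runtime; the function names are made up for illustration and are not part of Boinc or Rosetta.

```python
def buffered_hours(remaining_hours, batches, runtime_hours):
    """Hours of work left per thread: time remaining on the running
    task plus each queued batch run back to back (an assumption)."""
    return remaining_hours + batches * runtime_hours


def churn_reduction(old_runtime_hours, new_runtime_hours):
    """Fraction by which the number of tasks used per thread per day
    drops when the target runtime is lengthened."""
    return 1 - old_runtime_hours / new_runtime_hours


# 4-5hrs remaining, one batch queued and one more able to come down
for runtime in (12, 8):
    low = buffered_hours(4, 2, runtime)
    high = buffered_hours(5, 2, runtime)
    print(f"{runtime}hr runtime: {low}-{high}hrs of work")

# Moving from the 3hr default to an explicit 8hr runtime
print(f"3hr -> 8hr: {churn_reduction(3, 8):.1%} fewer tasks per day")
```

That gives 28-29hrs at a 12hr runtime, 20-21hrs at 8hrs, and a 62.5% drop in tasks used per day, which is the "almost two-thirds" figure.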

This is why I keep repeating myself. Everyone should do both themselves and everyone else a favour, imo.
ID: 109470
Sid Celery

Joined: 11 Feb 08
Posts: 2183
Credit: 41,726,991
RAC: 6,784
Message 109471 - Posted: 18 Jul 2024, 10:33:11 UTC - in response to Message 109469.  

Still... 'completed awaiting validation'...
More credits gone, along with electricity and time?

All credits do get caught up once the server is restarted. No time or energy lost.
Just a hiccup in when they get awarded which might take a day or two at most (but might also be just a few hours)
ID: 109471
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 2025
Credit: 9,943,884
RAC: 6,777
Message 109472 - Posted: 19 Jul 2024, 6:39:57 UTC - in response to Message 109471.  

Just a hiccup in when they get awarded which might take a day or two at most (but might also be just a few hours)


Well, not so few hours...
ID: 109472
Sid Celery

Joined: 11 Feb 08
Posts: 2183
Credit: 41,726,991
RAC: 6,784
Message 109473 - Posted: 20 Jul 2024, 1:02:48 UTC - in response to Message 109472.  

Just a hiccup in when they get awarded which might take a day or two at most (but might also be just a few hours)

Well, not so few hours...

I looked earlier today. I think it came back about 10hrs after your post, so between 1 & 2 days.
I think I saw the whole site go down (again) a few hours before too.
Everything seems so fragile.
No new tasks yet, but I've picked up a few resends through the day
ID: 109473
Ace Casino

Joined: 16 Jul 07
Posts: 18
Credit: 14,827,983
RAC: 17,320
Message 109475 - Posted: 21 Jul 2024, 19:01:59 UTC - in response to Message 109468.  

Try putting another shrimp on the barbie.
ID: 109475
Tom M

Joined: 20 Jun 17
Posts: 98
Credit: 17,749,090
RAC: 42,645
Message 109477 - Posted: 23 Jul 2024, 18:17:18 UTC - in response to Message 109475.  

Try putting another shrimp on the barbie.


And wind the Ken up?
Help, my tagline is missing..... Help, my tagline is......... Help, m........ Hel.....
ID: 109477
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1751
Credit: 18,534,891
RAC: 857
Message 109478 - Posted: 24 Jul 2024, 11:49:22 UTC

I wonder if they've got data centre issues?

Server Status page shows, well, next to nothing (although Ralph still shows everything's ok).
Grant
Darwin NT
ID: 109478
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 2025
Credit: 9,943,884
RAC: 6,777
Message 109479 - Posted: 24 Jul 2024, 14:12:59 UTC - in response to Message 109478.  

I wonder if they've got data centre issues?
Server Status page shows, well, next to nothing (although Ralph still shows everything's ok).


Well, it's summer
The servers went on vacation
ID: 109479
Sid Celery

Joined: 11 Feb 08
Posts: 2183
Credit: 41,726,991
RAC: 6,784
Message 109481 - Posted: 24 Jul 2024, 23:10:31 UTC - in response to Message 109478.  

'Project down for maintenance' messages being issued for over 24hrs
While all tasks are pretty much completed this sounds like the best time
Servers being randomly up and down over a considerable period, it does need a thorough going over
Let's hope they find and resolve everything...
...and have a whole bunch of tasks waiting for us on completion

I can dream
ID: 109481



©2025 University of Washington
https://www.bakerlab.org