Problems and Technical Issues with Rosetta@home

Sid Celery

Joined: 11 Feb 08
Posts: 2166
Credit: 41,629,484
RAC: 5,494
Message 109960 - Posted: 2 Nov 2024, 2:54:03 UTC - in response to Message 109959.  

Boinc-process lives
Till the next time.

It'd be nice if they got the main page Server Status info updating again, but if it's one or the other then it's better having the Validators running while there is work available.

Definitely. Waiting to validate has edged fractionally down to 662,008 on the Server Status page, but I'm definitely seeing more tasks than that validated - out of order for some reason but they all count.
I've got a full cache, but I'm manually polling anyway to see my credits going up each time.
These are our salad days (hours anyway)
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1743
Credit: 18,534,891
RAC: 3,108
Message 109961 - Posted: 2 Nov 2024, 5:21:00 UTC
Last modified: 2 Nov 2024, 5:21:20 UTC

And the main page Server Status is updating again.

The Validators are validating, but they're seriously struggling- the backlog isn't getting any bigger, but it's not getting any less either.
Hopefully they'll start putting a dent in the backlog over the next few hours. Once that happens it shouldn't take long to then clear the backlog; but at present all they're doing is treading water.
Grant
Darwin NT
Sid Celery

Joined: 11 Feb 08
Posts: 2166
Credit: 41,629,484
RAC: 5,494
Message 109962 - Posted: 2 Nov 2024, 7:35:55 UTC - in response to Message 109961.  

The Validators are validating, but they're seriously struggling- the backlog isn't getting any bigger, but it's not getting any less either.
Hopefully they'll start putting a dent in the backlog over the next few hours. Once that happens it shouldn't take long to then clear the backlog; but at present all they're doing is treading water.

This is true. Several hours later, the 663k backlog is now 659k.
But my team's unvalidated tasks are up from 120 to 132.
It seemed a fair few were being validated at the start, but now not many more have been since.
If it takes 2 or 3 days to notice the entire server is down I'm not convinced anyone will notice at all that the validation backlog is barely reducing.
It may take until new tasks run out or, more likely, for boinc-process to fail again, take another few days, then get re-restarted to improve matters...
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1743
Credit: 18,534,891
RAC: 3,108
Message 109963 - Posted: 2 Nov 2024, 10:03:11 UTC - in response to Message 109962.  

The Validators are validating, but they're seriously struggling- the backlog isn't getting any bigger, but it's not getting any less either.
Hopefully they'll start putting a dent in the backlog over the next few hours. Once that happens it shouldn't take long to then clear the backlog; but at present all they're doing is treading water.

This is true. Several hours later, the 663k backlog is now 659k.
Now it's up to 683k.

I'm hoping that it's just a case of a messy crash of the server, and it's just re-building/verifying its storage. In which case it could take a day or so to complete, during which performance is significantly degraded. And once done, the backlog will clear like it usually does in an hour or 2.
Or there is something still seriously wrong and the backlog will continue to grow slowly until the current batch of work runs out & the work being returned tapers off (or the server just crashes yet again, and the backlog climbs rapidly like before).
Grant
Darwin NT
Sid Celery

Joined: 11 Feb 08
Posts: 2166
Credit: 41,629,484
RAC: 5,494
Message 109965 - Posted: 2 Nov 2024, 12:24:07 UTC - in response to Message 109963.  

The Validators are validating, but they're seriously struggling- the backlog isn't getting any bigger, but it's not getting any less either.
Hopefully they'll start putting a dent in the backlog over the next few hours. Once that happens it shouldn't take long to then clear the backlog; but at present all they're doing is treading water.

This is true. Several hours later, the 663k backlog is now 659k.
Now it's up to 683k.

I'm hoping that it's just a case of a messy crash of the server, and it's just re-building/verifying its storage. In which case it could take a day or so to complete, during which performance is significantly degraded. And once done, the backlog will clear like it usually does in an hour or 2.
Or there is something still seriously wrong and the backlog will continue to grow slowly until the current batch of work runs out & the work being returned tapers off (or the server just crashes yet again, and the backlog climbs rapidly like before).

While we're guessing, I now note that when I've uploaded completed tasks I'm not seeing any change in credits so, despite what the server status page shows, the continuing buildup to 699k is because validation has stopped altogether, not just slowed.
While all servers show green/running I don't know what other trigger there'll be so someone notices, because it isn't even noticed when they're all red.
We could be waiting some while.

So, the new prediction game is: what will the validation backlog peak at? 1m? 1.2m? 1.5m?
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 2014
Credit: 9,842,442
RAC: 4,053
Message 109967 - Posted: 2 Nov 2024, 17:55:44 UTC - in response to Message 109965.  

While all servers show green/running I don't know what other trigger there'll be so someone notices, because it isn't even noticed when they're all red.
We could be waiting some while.

So, the new prediction game is: what will the validation backlog peak at? 1m? 1.2m? 1.5m?


Maybe a solution is to stop the wus generator and stop download/upload until the validation queue is clear...
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1743
Credit: 18,534,891
RAC: 3,108
Message 109968 - Posted: 2 Nov 2024, 19:51:15 UTC
Last modified: 2 Nov 2024, 20:27:25 UTC

Over the last 90 min, the Validator backlog has dropped by over 100k. Looks like it's dropping by around 35k per hour (when the Validators were down completely, the rate of increase was roughly 12k per hour).
It's taken 16 hours since the Validators were restarted, but we're starting to get some significant falls in the backlog- and looking at my systems' pendings, they've actually started to drop too.
*fingers crossed*
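
For anyone who wants to play with the numbers, here is a minimal sketch of the arithmetic behind those rates. It assumes both rates stay constant, which they clearly don't in practice, and the 600k starting backlog is a hypothetical figure for illustration, not a reading from the Server Status page.

```python
def hours_to_clear(backlog, net_drain_per_hour):
    """Hours until the backlog reaches zero at a constant net drain rate."""
    if net_drain_per_hour <= 0:
        raise ValueError("backlog is not shrinking")
    return backlog / net_drain_per_hour


# Figures quoted in this post; treating them as constant is an assumption.
NET_DRAIN = 35_000  # observed net fall per hour with the Validators running
INFLOW = 12_000     # observed growth per hour with the Validators down

# If returned work keeps arriving at the same rate, the Validators must be
# getting through roughly inflow + net drain per hour.
validator_throughput = NET_DRAIN + INFLOW

# Hypothetical starting backlog, for illustration only.
example_backlog = 600_000

print(validator_throughput)                                  # 47000
print(round(hours_to_clear(example_backlog, NET_DRAIN), 1))  # 17.1
```

So at roughly 35k per hour net, a backlog of that size would take most of a day to clear, provided nothing falls over again in the meantime.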



Maybe a solution is to stop the wus generator and stop download/upload until the validation queue is clear...
If you stop the return of completed work, you get a massive surge of returned results awaiting Validation when it's re-enabled (instead of 10k per hour you're looking at 100k or more per hour), and if the Validators are still not working properly, you get an instant backlog & log jam.
Stopping new work from being sent would be the most effective method- as caches clear then the amount returned per hour tapers off. When work is re-enabled, the returned per hour gradually builds up again. No sudden massive surge.
Grant
Darwin NT
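
A rough sketch of the difference between the two approaches, under some loud assumptions: the 10k per hour return rate is the ballpark figure from the post above, the 10 hour pause and 24 hour cache are hypothetical, and the linear taper is just an assumed shape for how client caches drain. The function names are made up for this illustration.

```python
def returns_when_uploads_paused(rate_per_hour, pause_hours):
    """Results held on clients during the pause all arrive once uploads resume."""
    return rate_per_hour * pause_hours


def returns_when_new_work_stopped(rate_per_hour, cache_hours, hour):
    """Hourly returns taper to zero as client caches drain (linear taper assumed)."""
    remaining_fraction = max(0.0, 1.0 - hour / cache_hours)
    return rate_per_hour * remaining_fraction


# Pausing uploads for 10 hours at 10k results per hour: one big surge.
print(returns_when_uploads_paused(10_000, 10))        # 100000

# Stopping new work instead, with a hypothetical 24 hour cache:
print(returns_when_new_work_stopped(10_000, 24, 0))   # 10000.0
print(returns_when_new_work_stopped(10_000, 24, 12))  # 5000.0
print(returns_when_new_work_stopped(10_000, 24, 30))  # 0.0
```

The first case dumps everything on the Validators at once; the second never asks them for more than the normal hourly rate.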
OffDutyTaoist

Joined: 10 Oct 06
Posts: 3
Credit: 1,998,088
RAC: 24
Message 109969 - Posted: 2 Nov 2024, 22:52:27 UTC

My Pixel 6 is having issues again with Rosetta v4.20 arm-android-linux-gnu.

rb_10_30_639032_632668_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_04_05_2997716_402_1

rb_10_30_639032_632668_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_04_10_2997716_399_1

rb_10_30_639032_632668_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_07_12_2997716_399_1

All were running at the same time, got up to ~0.319% then started resetting my phone. I paused all of them and tried running them individually, and all three would do the same thing when run separately. I ended up having to abort all of them and suspend the project unless someone has an idea.
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1743
Credit: 18,534,891
RAC: 3,108
Message 109970 - Posted: 2 Nov 2024, 23:14:07 UTC - in response to Message 109969.  

You've managed to complete 2 of those Tasks on that phone, but it's taking 10.5 hours to do 7.5 hours of work, which indicates that the phone is busy doing other things while it's trying to process the Rosetta Tasks.
It's possible the phone is overheating, although it should just throttle & not restart.
Other than setting the phone to run only when it's not doing other things, or only while on the charger (although doing that you would have to change the Target CPU time to 4 hours or less to make sure to return them before the deadline), I'd say Rosetta just isn't a suitable project for that device.
Grant
Darwin NT
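
To put numbers on that, a quick sketch using the figures above (10.5 hours of wall-clock time for 7.5 hours of CPU time). It assumes the phone stays about as busy as it was for those two Tasks, which may not hold; the function names are just for illustration.

```python
def cpu_efficiency(cpu_hours, wall_hours):
    """Fraction of wall-clock time the Task actually spent on the CPU."""
    return cpu_hours / wall_hours


def expected_wall_hours(target_cpu_hours, efficiency):
    """Wall-clock time needed to reach a Target CPU time at a given efficiency."""
    return target_cpu_hours / efficiency


eff = cpu_efficiency(7.5, 10.5)
print(round(eff, 2))                          # 0.71
print(round(expected_wall_hours(4, eff), 1))  # 5.6
```

So with a 4 hour Target CPU time, each Task would need something like 5.6 hours of the phone being on and allowed to compute.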
Sid Celery

Joined: 11 Feb 08
Posts: 2166
Credit: 41,629,484
RAC: 5,494
Message 109971 - Posted: 3 Nov 2024, 2:40:32 UTC - in response to Message 109968.  

Over the last 90 min, the Validator backlog has dropped by over 100k. Looks like it's dropping by around 35k per hour (when the Validators were down completely, the rate of increase was roughly 12k per hour).
It's taken 16 hours since the Validators were restarted, but we're starting to get some significant falls in the backlog- and looking at my systems' pendings, they've actually started to drop too.
*fingers crossed*

I looked a few hours ago and my 132 pending had dropped to 80 and now I've arrived home it's already further down to just 31.
Backlog down to 370k so it's all looking good now. My fears from yesterday have largely been allayed.
Sid Celery

Joined: 11 Feb 08
Posts: 2166
Credit: 41,629,484
RAC: 5,494
Message 109972 - Posted: 3 Nov 2024, 11:23:57 UTC - in response to Message 109971.  

Over the last 90 min, the Validator backlog has dropped by over 100k. Looks like it's dropping by around 35k per hour (when the Validators were down completely, the rate of increase was roughly 12k per hour).
It's taken 16 hours since the Validators were restarted, but we're starting to get some significant falls in the backlog- and looking at my systems' pendings, they've actually started to drop too.
*fingers crossed*

I looked a few hours ago and my 132 pending had dropped to 80 and now I've arrived home it's already further down to just 31.
Backlog down to 370k so it's all looking good now. My fears from yesterday have largely been allayed.

Err... backlog to validate - nil
Sid Celery

Joined: 11 Feb 08
Posts: 2166
Credit: 41,629,484
RAC: 5,494
Message 109974 - Posted: 4 Nov 2024, 0:28:08 UTC - in response to Message 109972.  

Over the last 90 min, the Validator backlog has dropped by over 100k. Looks like it's dropping by around 35k per hour (when the Validators were down completely, the rate of increase was roughly 12k per hour).
It's taken 16 hours since the Validators were restarted, but we're starting to get some significant falls in the backlog- and looking at my systems' pendings, they've actually started to drop too.
*fingers crossed*

I looked a few hours ago and my 132 pending had dropped to 80 and now I've arrived home it's already further down to just 31.
Backlog down to 370k so it's all looking good now. My fears from yesterday have largely been allayed.

Err... backlog to validate - nil

Not quite sure what's happening atm. The validation backlog is up at 10k, but I don't think it's stopped working - just not quite keeping up for some reason.
The weirdness continues
Grant (SSSF)

Joined: 28 Mar 20
Posts: 1743
Credit: 18,534,891
RAC: 3,108
Message 109975 - Posted: 4 Nov 2024, 7:01:21 UTC - in response to Message 109974.  
Last modified: 4 Nov 2024, 7:03:42 UTC

Not quite sure what's happening atm. The validation backlog is up at 10k, but I don't think it's stopped working - just not quite keeping up for some reason.
The weirdness continues
26k now.
The server has had issues for months now. I'm wondering if this is a symptom of those issues as they progressively worsen?

Someone there really needs to take a close look at the system logs to see just what is going on- WTF does the server keep crashing? And why is it now having so much trouble Validating work?
I'm thinking it's time for it to be replaced- a decade and a half is a very long time in computer hardware development.
Grant
Darwin NT
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 2014
Credit: 9,842,442
RAC: 4,053
Message 109976 - Posted: 4 Nov 2024, 8:21:28 UTC - in response to Message 109975.  

Someone there really needs to take a close look at the system logs to see just what is going on- WTF does the server keep crashing? And why is it now having so much trouble Validating work? I'm thinking it's time for it to be replaced- a decade and a half is a very long time in computer hardware development.


As I said a long time ago, we don't know if the server page is updated.
If not, the hw and (above all) the os/sw are very old.
Ubuntu 16....
Sid Celery

Joined: 11 Feb 08
Posts: 2166
Credit: 41,629,484
RAC: 5,494
Message 109978 - Posted: 4 Nov 2024, 17:42:44 UTC - in response to Message 109976.  
Last modified: 4 Nov 2024, 17:47:55 UTC

Someone there really needs to take a close look at the system logs to see just what is going on- WTF does the server keep crashing? And why is it now having so much trouble Validating work? I'm thinking it's time for it to be replaced- a decade and a half is a very long time in computer hardware development.

As I said a long time ago, we don't know if the server page is updated.
If not, the hw and (above all) the os/sw are very old.
Ubuntu 16....

Just being 'old' isn't the worst thing in the world.
Being old and having failure issues every few weeks is a sign that if you don't fix this stuff, it's going to fail altogether.
Which will inevitably result in someone asking whether they can afford the time and trouble to update it all or whether they should go in another direction entirely.
I'm not sure how convinced I am they'll update the hw & sw to continue here tbh

In the meantime, I think all tasks have just run out, so we'll soon see if the validation backlog (currently 59k) will start to edge back down again

Edit: Just checked and no-one in my team has <any> tasks pending validation. Am I just lucky? Or is the backlog not real?
Sid Celery

Joined: 11 Feb 08
Posts: 2166
Credit: 41,629,484
RAC: 5,494
Message 109979 - Posted: 4 Nov 2024, 21:27:06 UTC - in response to Message 109978.  

In the meantime, I think all tasks have just run out, so we'll soon see if the validation backlog (currently 59k) will start to edge back down again

Edit: Just checked and no-one in my team has <any> tasks pending validation. Am I just lucky? Or is the backlog not real?

Well, that changed quick.
Validation backlog back down to nil and 700k tasks have popped up
We live to crunch another day
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 2014
Credit: 9,842,442
RAC: 4,053
Message 109980 - Posted: 5 Nov 2024, 8:14:45 UTC - in response to Message 109978.  

Just being 'old' isn't the worst thing in the world.

Not for servers constantly exposed to the internet.
Security fixes, bug fixes, and support are fundamental (if you care about the project).
There is also the performance factor: do you see the difference between a recent file system (ZFS 2.5) and an old one (0.7 - if true)?


Which will inevitably result in someone asking whether they can afford the time and trouble to update it all or whether they should go in another direction entirely.
I'm not sure how convinced I am they'll update the hw & sw to continue here tbh

+1
[VENETO] boboviz

Joined: 1 Dec 05
Posts: 2014
Credit: 9,842,442
RAC: 4,053
Message 109981 - Posted: 5 Nov 2024, 9:47:15 UTC - in response to Message 109979.  

Validation backlog back down to nil and 700k tasks have popped up
We live to crunch another day


And another day with over 46k wus pending validation... :-(
Sid Celery

Joined: 11 Feb 08
Posts: 2166
Credit: 41,629,484
RAC: 5,494
Message 109982 - Posted: 5 Nov 2024, 17:13:06 UTC - in response to Message 109981.  

Validation backlog back down to nil and 700k tasks have popped up
We live to crunch another day

And another day with over 46k wus pending validation... :-(

Yes, and now 88k
But I just looked through my team's tasks again and it's the same as a few days ago.
A high figure showing on the server status page, but none of my team have <any> tasks awaiting validation.

Is this 2 coincidences in a row? I'm certainly confused.
Bryn Mawr

Joined: 26 Dec 18
Posts: 408
Credit: 12,313,481
RAC: 1,316
Message 109983 - Posted: 5 Nov 2024, 19:19:48 UTC - in response to Message 109982.  

Validation backlog back down to nil and 700k tasks have popped up
We live to crunch another day

And another day with over 46k wus pending validation... :-(

Yes, and now 88k
But I just looked through my team's tasks again and it's the same as a few days ago.
A high figure showing on the server status page, but none of my team have <any> tasks awaiting validation.

Is this 2 coincidences in a row? I'm certainly confused.


You are the lucky one.

The problem appears to have started for me at 02:00 GMT; for the next hour I had about 50% pending, and since then I've only had 5 validated out of nearly 100 completed.



©2025 University of Washington
https://www.bakerlab.org