Message boards : Number crunching : Problems and Technical Issues with Rosetta@home
Author | Message |
---|---|
Sid Celery Joined: 11 Feb 08 Posts: 2166 Credit: 41,629,484 RAC: 5,494 |
Boinc-process lives. Till the next time. Definitely. Waiting to validate has edged fractionally down to 662,008 on the Server Status page, but I'm definitely seeing more tasks than that validated - out of order for some reason but they all count. I've got a full cache, but I'm manually polling anyway to see my credits going up each time. These are our salad days (hours anyway). |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1743 Credit: 18,534,891 RAC: 3,788 |
And the main page Server Status is updating again. The Validators are validating, but they're seriously struggling- the backlog isn't getting any bigger, but it's not getting any less either. Hopefully they'll start putting a dent in the backlog over the next few hours. Once that happens it shouldn't take long to then clear the backlog; but at present all they're doing is treading water. Grant Darwin NT |
Sid Celery Joined: 11 Feb 08 Posts: 2166 Credit: 41,629,484 RAC: 5,494 |
The Validators are validating, but they're seriously struggling- the backlog isn't getting any bigger, but it's not getting any less either. This is true. Several hours later, the 663k backlog is now 659k. But my team's unvalidated tasks are up from 120 to 132. It seemed a fair few were being validated at the start, but now not many more have been since. If it takes 2 or 3 days to notice the entire server is down I'm not convinced anyone will notice at all that the validation backlog is barely reducing. It may take until new tasks run out or, more likely, for boinc-process to fail again, take another few days, then get re-restarted to improve matters... |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1743 Credit: 18,534,891 RAC: 3,788 |
Now it's up to 683k. The Validators are validating, but they're seriously struggling- the backlog isn't getting any bigger, but it's not getting any less either. I'm hoping that it's just a case of a messy crash of the server, and it's just re-building/verifying its storage. In which case it could take a day or so to complete, during which performance is significantly degraded. And once done, the backlog will clear like it usually does in an hour or 2. Or there is something still seriously wrong and the backlog will continue to grow slowly until the current batch of work runs out & the work being returned tapers off (or the server just crashes yet again, and the backlog climbs rapidly like before). Grant Darwin NT |
Sid Celery Joined: 11 Feb 08 Posts: 2166 Credit: 41,629,484 RAC: 5,494 |
Now it's up to 683k. The Validators are validating, but they're seriously struggling- the backlog isn't getting any bigger, but it's not getting any less either. While we're guessing, I now note that when I've uploaded completed tasks I'm not seeing any change in credits so, despite what the server status page shows, the continuing buildup to 699k is because validation has stopped altogether, not just slowed. While all servers show green/running I don't know what other trigger there'll be so someone notices, because it isn't even noticed when they're all red. We could be waiting some while. So, the new prediction game is: what will the validation backlog peak at? 1m? 1.2m? 1.5m? |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 2014 Credit: 9,834,623 RAC: 3,398 |
While all servers show green/running I don't know what other trigger there'll be so someone notices, because it isn't even noticed when they're all red. Maybe a solution is to stop the wus generator and stop download/upload until the validation queue is clear... |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1743 Credit: 18,534,891 RAC: 3,788 |
Over the last 90 min, the Validator backlog has dropped by over 100k. Looks like it's dropping by around 35k per hour (when the Validators were down completely, the rate of increase was roughly 12k per hour). It's taken 16 hours since the Validators were restarted, but we're starting to get some significant falls in the backlog- and looking at my systems' pendings, they've actually started to drop too. *fingers crossed* Maybe a solution is to stop the wus generator and stop download/upload until the validation queue is clear... If you stop the return of completed work, you get a massive surge of returned results awaiting Validation when it's re-enabled (instead of 10k per hour you're looking at 100k or more per hour), and if the Validators are still not working properly, you get an instant backlog & log jam. Stopping new work from being sent would be the most effective method- as caches clear, the amount returned per hour tapers off. When work is re-enabled, the returned per hour gradually builds up again. No sudden massive surge. Grant Darwin NT |
OffDutyTaoist Joined: 10 Oct 06 Posts: 3 Credit: 1,998,088 RAC: 29 |
My Pixel 6 is having issues again with Rosetta v4.20 arm-android-linux-gnu. rb_10_30_639032_632668_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_04_05_2997716_402_1 rb_10_30_639032_632668_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_04_10_2997716_399_1 rb_10_30_639032_632668_ab_t000__robetta_cstwt_5.0_FT_IGNORE_THE_REST_07_12_2997716_399_1 All were running at the same time, got up to ~0.319% then started resetting my phone. I paused all of them and tried running them individually, and all three would do the same thing when run separately. I ended up having to abort all of them and suspend the project unless someone has an idea. |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1743 Credit: 18,534,891 RAC: 3,788 |
You've managed to complete 2 of those Tasks on that phone, but it's taking 10.5 hours to do 7.5 hours of work, which indicates that the phone is busy doing other things while it's trying to process the Rosetta Tasks. It's possible the phone is overheating, although it should just throttle & not restart. You could set the phone to run only when it's not doing other things, or only while on the charger (although doing that you would have to change the Target CPU time to 4 hours or less to make sure to return them before the deadline); otherwise I'd say Rosetta just isn't a suitable project for that device. Grant Darwin NT |
Sid Celery Joined: 11 Feb 08 Posts: 2166 Credit: 41,629,484 RAC: 5,494 |
Over the last 90 min, the Validator backlog has dropped by over 100k. Looks like it's dropping by around 35k per hour (when the Validators were down completely, the rate of increase was roughly 12k per hour). I looked a few hours ago and my 132 pending had dropped to 80 and now I've arrived home it's already further down to just 31. Backlog down to 370k so it's all looking good now. My fears from yesterday have largely been allayed. |
Sid Celery Joined: 11 Feb 08 Posts: 2166 Credit: 41,629,484 RAC: 5,494 |
Over the last 90 min, the Validator backlog has dropped by over 100k. Looks like it's dropping by around 35k per hour (when the Validators were down completely, the rate of increase was roughly 12k per hour). Err... backlog to validate - nil |
Sid Celery Joined: 11 Feb 08 Posts: 2166 Credit: 41,629,484 RAC: 5,494 |
Over the last 90 min, the Validator backlog has dropped by over 100k. Looks like it's dropping by around 35k per hour (when the Validators were down completely, the rate of increase was roughly 12k per hour). Not quite sure what's happening atm, but the validation backlog is up at 10k, but I don't think it's stopped working - just not quite keeping up for some reason. The weirdness continues |
Grant (SSSF) Joined: 28 Mar 20 Posts: 1743 Credit: 18,534,891 RAC: 3,788 |
Not quite sure what's happening atm, but the validation backlog is up at 10k, but I don't think it's stopped working - just not quite keeping up for some reason. 26k now. The server has had issues for months now. I'm wondering if this is a symptom of those issues as they progressively worsen? Someone there really needs to take a close look at the system logs to see just what is going on- WTF does the server keep crashing? And why is it now having so much trouble Validating work? I'm thinking it's time for it to be replaced- a decade and a half is a very long time in computer hardware development. Grant Darwin NT |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 2014 Credit: 9,834,623 RAC: 3,398 |
Someone there really needs to take a close look at the system logs to see just what is going on- WTF does the server keep crashing? And why is it now having so much trouble Validating work? I'm thinking it's time for it to be replaced- a decade and a half is a very long time in computer hardware development. As I said a long time ago, we don't know if the server page is updated. If not, the hw and (above all) the os/sw are very old. Ubuntu 16.... |
Sid Celery Joined: 11 Feb 08 Posts: 2166 Credit: 41,629,484 RAC: 5,494 |
Someone there really needs to take a close look at the system logs to see just what is going on- WTF does the server keep crashing? And why is it now having so much trouble Validating work? I'm thinking it's time for it to be replaced- a decade and a half is a very long time in computer hardware development. Just being 'old' isn't the worst thing in the world. Being old and having failure issues every few weeks is a sign that if you don't fix this stuff, it's going to fail altogether. Which will inevitably result in someone asking whether they can afford the time and trouble to update it all or whether they should go in another direction entirely. I'm not sure how convinced I am they'll update the hw & sw to continue here tbh. In the meantime, I think all tasks have just run out, so we'll soon see if the validation backlog (currently 59k) will start to edge back down again. Edit: Just checked and no-one in my team has <any> tasks pending validation. Am I just lucky? Or is the backlog not real? |
Sid Celery Joined: 11 Feb 08 Posts: 2166 Credit: 41,629,484 RAC: 5,494 |
In the meantime, I think all tasks have just run out, so we'll soon see if the validation backlog (currently 59k) will start to edge back down again. Well, that changed quick. Validation backlog back down to nil and 700k tasks have popped up. We live to crunch another day. |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 2014 Credit: 9,834,623 RAC: 3,398 |
Just being 'old' isn't the worst thing in the world. Not for servers constantly exposed to the internet. Security fixes, bug fixes and support are fundamental (if you care about the project). There is also the performance factor: do you see the difference between a recent file system (ZFS 2.5) and an old one (0.7 - if true)? Which will inevitably result in someone asking whether they can afford the time and trouble to update it all or whether they should go in another direction entirely. +1 |
[VENETO] boboviz Joined: 1 Dec 05 Posts: 2014 Credit: 9,834,623 RAC: 3,398 |
Validation backlog back down to nil and 700k tasks have popped up And another day with over 46k wus pending validation... :-( |
Sid Celery Joined: 11 Feb 08 Posts: 2166 Credit: 41,629,484 RAC: 5,494 |
Validation backlog back down to nil and 700k tasks have popped up. Yes, and now 88k. But I just looked through my team's tasks again and it's the same as a few days ago. A high figure showing on the server status page, but none of my team have <any> tasks awaiting validation. Is this 2 coincidences in a row? I'm certainly confused. |
Bryn Mawr Joined: 26 Dec 18 Posts: 407 Credit: 12,313,481 RAC: 1,604 |
Validation backlog back down to nil and 700k tasks have popped up. You are the lucky one. The problem appears to have started for me at 02:00 GMT; for the next hour I had about 50% pending, and since then I've only had 5 validated out of nearly 100 completed. |
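
For anyone wanting to sanity-check the backlog arithmetic from earlier in the thread, here is a minimal Python sketch. It uses only figures quoted in the posts (roughly 12k per hour growth while the Validators were down, roughly 35k per hour net drop once they recovered, and a 370k backlog at one point). The function names are made up for illustration, and the sketch assumes both rates stay constant, which the thread itself suggests they do not.

```python
def validator_throughput(net_drop_per_hour, arrival_per_hour):
    """Implied validation rate: the net drop plus the results still arriving."""
    return net_drop_per_hour + arrival_per_hour


def hours_to_clear(backlog, net_drop_per_hour):
    """Hours until the backlog reaches zero at a constant net drop rate."""
    if net_drop_per_hour <= 0:
        raise ValueError("backlog is not shrinking, so it never clears")
    return backlog / net_drop_per_hour


# Figures quoted in the thread (approximate, read off the Server Status page).
ARRIVAL_PER_HOUR = 12_000   # backlog growth while the Validators were down
NET_DROP_PER_HOUR = 35_000  # observed net drop once they were working again
BACKLOG = 370_000           # backlog reported partway through the recovery

print(validator_throughput(NET_DROP_PER_HOUR, ARRIVAL_PER_HOUR))  # 47000
print(round(hours_to_clear(BACKLOG, NET_DROP_PER_HOUR), 1))       # 10.6
```

On those assumptions the Validators were handling about 47k results per hour, and a 370k backlog would take a little under 11 hours to clear. The backlog was reported as nil sooner than that would suggest, so the real drain rate evidently varied.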
©2025 University of Washington
https://www.bakerlab.org