What's been up with R@H?

dcdc
Joined: 3 Nov 05
Posts: 1832
Credit: 119,688,048
RAC: 10,544
Message 56038 - Posted: 26 Sep 2008, 18:07:04 UTC

Any news? Seems it was offline for the last 17hrs or so...
rochester new york
Joined: 2 Jul 06
Posts: 2842
Credit: 2,020,043
RAC: 0
Message 56040 - Posted: 26 Sep 2008, 18:22:56 UTC - in response to Message 56038.  


I thought they might have gone out of business... at least it looks like it's running.

retheridge
Joined: 13 Aug 08
Posts: 1
Credit: 362,493
RAC: 0
Message 56041 - Posted: 26 Sep 2008, 18:23:36 UTC - in response to Message 56038.  

I don't have the answer, but I can verify I've had the same problem. My results have all since uploaded, but I'm still waiting to download workunits -- my status message says "Communication deferred xx:xx:xx", and it repeats once the countdown completes. I suppose there are a lot of computers waiting in line, but it seems like a long time to have computers sitting idle when we could be processing the next amazing breakthrough.
Mod.Sense
Volunteer moderator
Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 56042 - Posted: 26 Sep 2008, 18:29:52 UTC
Last modified: 26 Sep 2008, 18:30:10 UTC

I can confirm the project was down (by my own observation), and presume that the scheduler is very busy right now. My hosts have been requesting work, but sometimes the scheduler responds with no new work and only confirms completed results. It will take a couple of hours for things to get back to normal. Work is now flowing, but perhaps somewhat sporadically.
Rosetta Moderator: Mod.Sense
Keith E. Laidig
Volunteer moderator
Project developer
Joined: 1 Jul 05
Posts: 154
Credit: 117,189,961
RAC: 0
Message 56043 - Posted: 26 Sep 2008, 19:09:03 UTC - in response to Message 56042.  

The back-end fileserver experienced a kernel panic while updating the filesystem journal. There is a tremendous amount of I/O on this old machine and sometimes it doesn't keep up well.

We're implementing a new back-end filesystem for R@H - along the lines of the SAN that was attempted last year - and plan to move R@H over in the next few weeks. The amount of data that exists and the rapidity with which your clients update it make the transfer of data very challenging indeed...


BarryAZ
Joined: 27 Dec 05
Posts: 153
Credit: 30,843,285
RAC: 0
Message 56044 - Posted: 26 Sep 2008, 20:02:24 UTC - in response to Message 56043.  

Ah -- OK -- thanks for the detail -- I was beginning to think there was some sort of gentlemen's understanding regarding news updates here. Glad to hear an explanation. Note, this is the sort of thing which might have a home on the home page.


rochester new york
Joined: 2 Jul 06
Posts: 2842
Credit: 2,020,043
RAC: 0
Message 56045 - Posted: 26 Sep 2008, 20:13:25 UTC - in response to Message 56044.  
Last modified: 26 Sep 2008, 20:14:51 UTC

Maybe we could get some kind of e-mail alert to let us know Rosetta is down but not out... new people might not hang around long enough if they don't think the system is ever going to come back up, and we need new crunchers.


fjpod
Joined: 9 Nov 07
Posts: 17
Credit: 2,201,029
RAC: 0
Message 56074 - Posted: 28 Sep 2008, 19:00:15 UTC

I run R@H on about 10 computers. All but one are working fine. The problem one wasn't reporting finished WUs. Communications were automatically deferred for 24 hours. Been going on for about 2 days, so after trying one last update, I reset the project which didn't work either. I finally deleted the project and went to re-attach and it's telling me R@H is temporarily unavailable. Why on only one of my computers? All the others are receiving and reporting work.
Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 56075 - Posted: 28 Sep 2008, 19:19:31 UTC - in response to Message 56074.  

My PCs are working quite fine...
Path7
Joined: 25 Aug 07
Posts: 128
Credit: 61,751
RAC: 0
Message 56076 - Posted: 28 Sep 2008, 20:42:15 UTC - in response to Message 56074.  


Hello fjpod,

After reading your post, my first thought is that some security software (a firewall?) is preventing BOINC from accessing the Internet.
Are you able to run another project on that one computer?

Good luck,
Path7.
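
One way to try the firewall idea above on the affected machine is a quick reachability check from a script. This is only a minimal sketch, not anything BOINC itself provides: the function name `can_reach` and the `connect` parameter are made up for illustration, and it assumes the project servers accept plain TCP connections on port 80.

```python
import socket


def can_reach(host, port=80, timeout=5.0, connect=socket.create_connection):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    `connect` is a hypothetical hook (defaulting to the standard library call)
    so the check can be exercised without touching the network.
    """
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError:
        # Refused connections, timeouts, and DNS failures all end up here.
        return False
    conn.close()
    return True
```

Calling `can_reach(...)` with the host name of each attached project, on both the problem computer and a working one, would show whether that machine can open outbound connections at all. A passing result does not rule out a firewall, since some security software blocks individual programs rather than destinations.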

R.L. Casey
Joined: 7 Jun 06
Posts: 91
Credit: 2,728,885
RAC: 0
Message 56079 - Posted: 29 Sep 2008, 12:29:32 UTC - in response to Message 56074.  

fjpod, which one of your computers is having the problem? Thanks!



©2024 University of Washington
https://www.bakerlab.org