RAC dropping, BOINC dropping comms

Message boards : Number crunching : RAC dropping, BOINC dropping comms

Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 31266 - Posted: 16 Nov 2006, 20:46:26 UTC

I've noticed lately, more than ever, that on my Windows PCs BOINC seems to drop communications with the running tasks. I've had days where all three of my PCs show nothing on any BOINC tab, with no tasks running in Task Manager. I have to end and restart BOINC, and then things fire up normally. But until I restart it, the PC just sits idle (hence my dropping RAC).

I've noticed it consistently happens when downloads are in progress. I have scheduled hours when BOINC is allowed to use the network, so when I start it up again it is typically during the hours when network use is NOT allowed, and I can see one file left in the Transfers tab.

I enabled the network one day and actually caught it dropping the running tasks. The tasks ran for about 30 seconds more and then ended themselves.

Once it has dropped, the BOINC title bar shows nothing in the ()s, and all the tabs are blank: no WUs, no projects, no messages.

I am seeing this on both BOINC 5.4.11 and 5.4.9.

This is clearly a BOINC problem. Has anyone heard of it elsewhere? Or can anyone suggest Windows changes to resolve it?
Add this signature to your EMail:
Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might!
https://boinc.bakerlab.org/rosetta/
Christian Diepold
Joined: 23 Sep 05
Posts: 37
Credit: 300,225
RAC: 0
Message 31373 - Posted: 18 Nov 2006, 12:39:49 UTC

The same problem happens to me as well. I noticed it first with Leiden Classical, and it only happened after I updated my machines from 5.4.9 to 5.4.11.

Take a look here.
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 31379 - Posted: 18 Nov 2006, 16:18:02 UTC

Yep! That's it!

Additional details in my case... I ran successfully on 5.4.9 for many months without seeing this occur (although about a year ago I was seeing a similar problem and thought a newer BOINC version had resolved it). Then, in the past 3 weeks or so, it started hitting all three of my Windows single-user installations on a very regular basis (I use the network only at night, and it was happening almost every night). There always seems to be ONE file left to download at the point of failure. Because it ran well for months, I do not suspect a firewall issue.

I installed BOINC 5.4.11 on one PC in hopes of correcting the problem, and allowed the new version through my firewall. It seems to have the same problem.

I run only Rosetta and Ralph on these machines.

I have one machine with full-time network access (although it often loses its connection, either from ISP drops or Windows TCP problems; I haven't figured out which. It is a satellite ISP). The problem seems to occur less frequently on that machine.

I'm guessing the heavy crunchers don't see the problem because they tend to install as a service.

Does anyone know the details of how to enable some tracing of the BOINC file transfers?
larry1186
Joined: 18 Apr 06
Posts: 7
Credit: 329,257
RAC: 0
Message 31580 - Posted: 22 Nov 2006, 18:39:30 UTC

I'm not sure how detailed it can get, but maybe a cc_config.xml file could shed some light on the subject. I've noticed the same disconnection from localhost as well. Since you say it's tied to network activity, I wonder whether limiting BOINC to one file transfer at a time, or limiting the file transfer rates, would have an effect.
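A minimal sketch of what such a cc_config.xml could contain, generated with Python's standard library so the structure is easy to check. The element names (file_xfer_debug and http_debug under <log_flags>, max_file_xfers and max_file_xfers_per_project under <options>) are taken from later BOINC client documentation and are assumptions here; check that your client version (5.4.x in this thread) actually honours them before relying on the output.

```python
import xml.etree.ElementTree as ET


def build_cc_config(max_file_xfers=1, max_file_xfers_per_project=1, trace_transfers=True):
    """Build a cc_config tree that caps simultaneous transfers and,
    optionally, turns on extra file-transfer logging.

    Element names are assumed from later BOINC docs; older clients may
    ignore some of them.
    """
    root = ET.Element("cc_config")

    if trace_transfers:
        log_flags = ET.SubElement(root, "log_flags")
        # Extra messages about each file transfer and the HTTP traffic behind it.
        for flag in ("file_xfer_debug", "http_debug"):
            ET.SubElement(log_flags, flag).text = "1"

    options = ET.SubElement(root, "options")
    # Limit concurrent transfers overall and per project.
    ET.SubElement(options, "max_file_xfers").text = str(max_file_xfers)
    ET.SubElement(options, "max_file_xfers_per_project").text = str(max_file_xfers_per_project)
    return root


def to_xml(root):
    """Serialise the tree to a string."""
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_xml(build_cc_config()))
```

The printed XML would be saved as cc_config.xml in the BOINC data directory, and BOINC restarted so the client reads it; any extra transfer messages should then show up alongside the usual ones in the Messages tab.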
Don't get distracted by shiny objects.
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 31590 - Posted: 22 Nov 2006, 23:42:45 UTC

Definitely worth a try. We will see if it survives the Thanksgiving long weekend here.

I assert that file transfer is involved, because when I start BOINC back up, it's always during the time of day when BOINC is not allowed network access, and every time I see one file left in the Transfers tab and one WU still in a status of transferring. So I conclude that it's crashing (losing contact with localhost) at night, when network activity is allowed. And given that each R@H WU downloads several files, it seems a bit odd that there is ALWAYS exactly one file left to transfer.
Astro
Joined: 2 Oct 05
Posts: 987
Credit: 500,253
RAC: 0
Message 31596 - Posted: 23 Nov 2006, 5:09:15 UTC

Gyumaou
Joined: 20 Nov 06
Posts: 2
Credit: 4,086
RAC: 0
Message 31701 - Posted: 27 Nov 2006, 5:57:16 UTC

I had the same problem...
Following Feet1st's advice, I changed the "connect to network about every .... days" setting to 1 day, and my client somehow managed to survive the night. However, I did not suspend network activity but left it available all the time, so my client was normally downloading more WUs throughout the night. So maybe this problem is not about downloading WUs...
I had always had "connect to network about every" set to 0.1 days with full network activity allowed, and nothing of this sort happened while I was actually using the PC, only when I was away (at night, at school).
Well, I'll watch for a couple more nights to see whether this was just a fluke or things have really improved, and let you guys know soon.
Team TMR
Joined: 2 Nov 05
Posts: 21
Credit: 1,583,679
RAC: 0
Message 31703 - Posted: 27 Nov 2006, 8:32:38 UTC
Last modified: 27 Nov 2006, 8:42:12 UTC

Is the boinc.exe task still running when this happens?
Marky-UK
Joined: 1 Nov 05
Posts: 73
Credit: 1,689,495
RAC: 0
Message 31704 - Posted: 27 Nov 2006, 8:36:38 UTC

I've seen the same problem too, with the BOINC client crashing when downloading files (usually in the middle of the night - typical!).

I've seen the same crashes with 5.4.11, 5.6.5, 5.7.2 and 5.7.4 too. The crashes all started in the first week of November.
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 31712 - Posted: 27 Nov 2006, 15:29:42 UTC
Last modified: 27 Nov 2006, 15:43:37 UTC

First week of November... yes, that's about when it hit me as well, with no change to the BOINC version. I just got back to two machines after the long weekend. These two are set to allow network connections only at night, and all my machines have a 1.2-day queue.

The changed cc_config.xml (to download one file at a time) did not seem to help on 5.4.11.

Yes, the BOINC Manager is still active, but when you bring it up, nothing appears in any of the tabs, and the title bar does NOT show "localhost" in the ()s. <edit> I should point out that the Rosetta threads end after about 30 seconds; I was active on the machine once and caught a failure. <end edit>

Whoops, I guess I set the machine where I changed the cc_config to use the network all the time... and sure enough, true to the pattern, when I brought BOINC down and back up, it downloaded one last file, as shown in the Messages tab. There always seems to be one last small file left to download at the point of failure. If network activity is allowed when BOINC comes back up, the download often completes before you can even see it in the Transfers tab. It is NOT one of the large files of the WU. I've seen BOINC restart in this state every time (I'm up to about a dozen observed failures in November).
Marky-UK
Joined: 1 Nov 05
Posts: 73
Credit: 1,689,495
RAC: 0
Message 31713 - Posted: 27 Nov 2006, 15:36:56 UTC - in response to Message 31712.  

Yes the BOINC Manager is still active, but when you bring it up, nothing appears in any of the tabs, and the title bar does NOT show "localhost" in the ()s.

The BOINC Manager (boincmgr.exe) starts a separate task, boinc.exe. In my case, it's boinc.exe that crashes, leaving BOINC Manager (or BoincView) nothing to talk to.
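One way to tell from outside whether boinc.exe is still alive is to see whether anything is still listening on the local port the Manager connects to. A minimal sketch, assuming the client's default GUI RPC port of 31416 on localhost (the function name is hypothetical). It only checks that a TCP listener exists; it does not speak the GUI RPC protocol, so it cannot distinguish a healthy client from a hung one.

```python
import socket

# Assumed default port on which the BOINC client accepts GUI RPC connections.
GUI_RPC_PORT = 31416


def client_is_listening(host="127.0.0.1", port=GUI_RPC_PORT, timeout=2.0):
    """Return True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused or timed out: nothing is listening there.
        return False


if __name__ == "__main__":
    state = "listening" if client_is_listening() else "NOT listening (client likely gone)"
    print(f"localhost:{GUI_RPC_PORT} is {state}")
```

If the Manager shows blank tabs and this reports nothing listening, that points at the client process having exited rather than at the Manager.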

At least I'm not the only one seeing this. I started a thread over on the BOINC Message Boards too, but my thread was about boinc crashing rather than not being able to connect to localhost. I now realise this is the same problem, just seen in different ways.

I've no idea what happened in early November. It was a full week before Microsoft's November "Patch Tuesday" that I started seeing the problems, so it wasn't anything from Windows Update that triggered it.
River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 31735 - Posted: 27 Nov 2006, 21:18:13 UTC - in response to Message 31713.  
Last modified: 27 Nov 2006, 21:22:01 UTC

The BOINC Manager (boincmgr.exe) starts a separate task, boinc.exe. In my case, it's boinc.exe that crashes, leaving BOINC Manager (or BoincView) nothing to talk to.


Well observed.

boinc.exe is the client, which actually does the control of the projects, etc.

It can be run with or without the BoincManager - but on win boxes when you choose standalone mode then BoincManager is set to start boinc.exe - and in that case you can't stop the manager without stopping boinc.

In this configuration, the only time the Manager thinks of starting the client is when the Manager itself is loaded. If the client fails for any reason, you therefore need to exit the Manager and start it again in order to get boinc to restart.

When you install as a service, then boinc.exe runs without the manager - and if you use BoincView to look at things you never need the Manager at all.

for geeks:

It is also possible to run boinc.exe on its own by clicking on its icon in the BOINC folder. If you do that, it runs in a command window that displays all the messages we are more used to seeing on the Messages tab. BoincView is quite happy to talk to it in this state. The danger with this is that it is too easy to stop boinc running by closing the window in a thoughtless moment, so I do not recommend this as a way to run for extended periods. Nor does BOINC, which is why there is no install option to do this...

R~~
Marky-UK
Joined: 1 Nov 05
Posts: 73
Credit: 1,689,495
RAC: 0
Message 31739 - Posted: 27 Nov 2006, 21:40:38 UTC

Yep... I only ever run BOINC as a service anyway. It was only today that I realised the reported problems with BOINC Manager losing its connection to localhost were the same problem I've been having with the boinc.exe process terminating.

Still no closer to finding the cause/cure though. Oddly, just about everyone I've seen so far with this problem on the various message boards is running Rosetta.
Nothing But Idle Time
Joined: 28 Sep 05
Posts: 209
Credit: 139,545
RAC: 0
Message 31760 - Posted: 28 Nov 2006, 12:28:55 UTC
Last modified: 28 Nov 2006, 12:34:58 UTC

Is the termination of boinc.exe accompanied by a consistent abend code and dump? When I get the blank mgr tabs I also get this:

BOINC Windows Runtime Debugger Version 5.4.9
*** UNHANDLED EXCEPTION ****
Reason: Access Violation (0xc0000005) at address 0x0033B014
read attempt to address 0x00000008

Dump Timestamp : 11/23/06 23:46:30
Dump Timestamp : 11/22/06 15:47:48
Dump Timestamp : 11/10/06 00:16:06
Dump Timestamp : 11/09/06 16:09:28
Dump Timestamp : 11/03/06 15:26:57
Dump Timestamp : 09/15/06 11:06:13
Dump Timestamp : 09/12/06 01:25:23
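The dump timestamps above can be tallied to see whether they bunch up after the start of November, as reported earlier in the thread. A minimal sketch that parses those lines (the MM/DD/YY HH:MM:SS format is inferred from the listing; the function names are hypothetical) and reports crashes per month and the days between consecutive crashes.

```python
from collections import Counter
from datetime import datetime

# The seven "Dump Timestamp" lines from the Runtime Debugger output above.
DUMP_LINES = """\
Dump Timestamp : 11/23/06 23:46:30
Dump Timestamp : 11/22/06 15:47:48
Dump Timestamp : 11/10/06 00:16:06
Dump Timestamp : 11/09/06 16:09:28
Dump Timestamp : 11/03/06 15:26:57
Dump Timestamp : 09/15/06 11:06:13
Dump Timestamp : 09/12/06 01:25:23
"""


def parse_dumps(text):
    """Return the crash times found in the text, oldest first."""
    times = []
    for line in text.splitlines():
        if line.startswith("Dump Timestamp :"):
            stamp = line.split(":", 1)[1].strip()  # e.g. "11/23/06 23:46:30"
            times.append(datetime.strptime(stamp, "%m/%d/%y %H:%M:%S"))
    return sorted(times)


def crashes_per_month(times):
    """Count crashes per calendar month, keyed "YYYY-MM"."""
    return Counter(t.strftime("%Y-%m") for t in times)


def gaps_in_days(times):
    """Days (rounded to 0.1) between consecutive crashes."""
    return [round((b - a).total_seconds() / 86400, 1) for a, b in zip(times, times[1:])]


if __name__ == "__main__":
    times = parse_dumps(DUMP_LINES)
    print(crashes_per_month(times))
    print(gaps_in_days(times))
```

For the seven timestamps listed, this counts five crashes in November 2006 and two in September 2006, with the longest quiet spell falling between mid-September and 3 November.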
Marky-UK
Joined: 1 Nov 05
Posts: 73
Credit: 1,689,495
RAC: 0
Message 31761 - Posted: 28 Nov 2006, 12:35:58 UTC

In my case, yes: the code address is always the same (for the same BOINC version).

My thread over on the BOINC Message Board is here and I listed some code addresses there.

The address I get from 5.4.11 is the same one you got from 5.4.9 (0x0033B014).
Nothing But Idle Time
Joined: 28 Sep 05
Posts: 209
Credit: 139,545
RAC: 0
Message 31763 - Posted: 28 Nov 2006, 13:04:26 UTC
Last modified: 28 Nov 2006, 13:10:42 UTC

The only time I was able to observe a crash was when the Manager was requesting work and reporting a task. The crash happened shortly thereafter, but I'm not sure whether it was during the download or after. I'm only guessing, but could there be interference from AV software scanning the downloaded files while a task is starting? edit: I've only seen this phenomenon with the Rosetta project; so far as I know it has never happened with Einstein.
dcdc
Joined: 3 Nov 05
Posts: 1832
Credit: 119,688,048
RAC: 10,544
Message 31764 - Posted: 28 Nov 2006, 14:03:07 UTC - in response to Message 31735.  
Last modified: 28 Nov 2006, 14:04:48 UTC

This is off-topic but might be useful for some.
boinc.exe is the client, which actually does the control of the projects, etc.

It can be run with or without the BoincManager - but on win boxes when you choose standalone mode then BoincManager is set to start boinc.exe - and in that case you can't stop the manager without stopping boinc.

Unless you kill boincmgr.exe in Task Manager, which leaves boinc happily running ;)


for geeks:

It is also possible to run boinc.exe on its own by clicking on its icon in the BOINC folder. If you do that, it runs in a command window that displays all the messages we are more used to seeing on the Messages tab. BoincView is quite happy to talk to it in this state. The danger with this is that it is too easy to stop boinc running by closing the window in a thoughtless moment, so I do not recommend this as a way to run for extended periods. Nor does BOINC, which is why there is no install option to do this...

R~~

Another option is to start boinc.exe as a scheduled task at computer startup. You can then run it under a specific account if you're paranoid about security, under the system account if you want the screensaver, or under your normal user account. This is useful if you can't or don't want to install as a service for whatever reason, but also don't want boincmgr running, either to reduce the memory footprint or just because you don't want people playing with it.
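A sketch of creating such a task with Windows' built-in schtasks.exe, wrapped in Python so the arguments are easy to inspect. The install path and task name are hypothetical placeholders. /sc onstart runs the task when the computer starts, and /ru picks the account (SYSTEM here; a named user account also needs its password, which schtasks takes via /rp). The sketch does not set a working directory, so check that the client finds its usual data files when started this way.

```python
# Hypothetical install path and task name; adjust both for your machine.
BOINC_EXE = r"C:\Program Files\BOINC\boinc.exe"
TASK_NAME = "BOINC client"


def schtasks_command(exe=BOINC_EXE, task_name=TASK_NAME, run_as="SYSTEM"):
    """Return the schtasks.exe arguments that start `exe` at system startup."""
    return [
        "schtasks", "/create",
        "/tn", task_name,   # name shown in the Scheduled Tasks list
        "/tr", f'"{exe}"',  # program to run, quoted because the path contains a space
        "/sc", "onstart",   # trigger: when the computer starts
        "/ru", run_as,      # account to run under: SYSTEM or a user name
    ]


if __name__ == "__main__":
    cmd = schtasks_command()
    print(cmd)
    # On Windows, from an administrator prompt, the task could then be created with:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Passing run_as="SYSTEM" corresponds to the screensaver case above; a dedicated low-privilege account corresponds to the security-conscious case.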

Danny
Feet1st
Joined: 30 Dec 05
Posts: 1755
Credit: 4,690,520
RAC: 0
Message 31814 - Posted: 29 Nov 2006, 15:18:47 UTC
Last modified: 29 Nov 2006, 15:21:09 UTC

I just got another drop and confirmed that boincmgr.exe is still running, but, just as suspected, boinc.exe is NOT running when the BOINC Manager shows blank tabs, and the Rosetta threads are no longer running.

[Edit] Interesting: this is the machine I set to download files one at a time, and THIS time I've got 5 files in my transfers list upon restart, rather than the usual single file left.
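The state described above (boincmgr.exe present, boinc.exe gone) can be checked with Windows' tasklist command. A minimal sketch that parses the output of `tasklist /FO CSV /NH`; the function names and state labels are made up for this example, and restarting the client is deliberately left out.

```python
import csv
import io
import subprocess

# Hypothetical labels for the situations seen in this thread.
CLIENT_OK = "client running"
CLIENT_DIED = "manager running, client gone"
NOTHING = "neither running"


def running_images(tasklist_csv):
    """Return lower-cased image names from `tasklist /FO CSV /NH` output."""
    rows = csv.reader(io.StringIO(tasklist_csv))
    return {row[0].lower() for row in rows if row}


def boinc_state(tasklist_csv):
    """Classify BOINC's state from one snapshot of the process list."""
    images = running_images(tasklist_csv)
    if "boinc.exe" in images:
        return CLIENT_OK
    if "boincmgr.exe" in images:
        return CLIENT_DIED
    return NOTHING


def snapshot():
    """Run tasklist (Windows only) and return its CSV output."""
    result = subprocess.run(["tasklist", "/FO", "CSV", "/NH"],
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    print(boinc_state(snapshot()))
```

Run periodically (for example from a scheduled task), a check like this could flag a dead client before a whole night of crunching is lost.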
Nothing But Idle Time
Joined: 28 Sep 05
Posts: 209
Credit: 139,545
RAC: 0
Message 31840 - Posted: 29 Nov 2006, 23:46:12 UTC

The unexpected termination of boinc.exe is more than an annoyance. On one occasion the client died about 20 minutes after I had gone to bed, and by morning I had lost 7 hours of hyper-threaded processing. So if one is not checking BOINC regularly, one's computer may sit idle for hours or days (a 3-day weekend?).
sslickerson
Joined: 14 Oct 05
Posts: 101
Credit: 578,497
RAC: 0
Message 31841 - Posted: 30 Nov 2006, 2:04:37 UTC

I currently have 2 WUs running that started approx. 10 minutes apart. I've been checking in on them for a bit over 5 hours now, and I have noticed that instead of being about 10-15 minutes apart (as they started) they are now 47 minutes apart. This seems a bit strange to me, especially as it has been going on for several weeks now. I could understand a few minutes' separation over 24 hours due to other processes taking cycles, but 50 minutes...?

--Tim






©2024 University of Washington
https://www.bakerlab.org