Message boards : Number crunching : RAC dropping, BOINC dropping comms
Author | Message |
---|---|
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
I've noticed lately, more than ever, that BOINC on my Windows PCs drops communication with the running tasks. I've had days where all three of my PCs show blank BOINC tabs, with no tasks running in Task Manager. I have to end and restart BOINC, and then things fire up normally; until I restart it, the machine just sits idle (hence my dropping RAC). It consistently seems to happen when downloads are in progress. I have scheduled hours for when BOINC is allowed to use the network, so when I restart it, it is typically during the hours when network use is NOT allowed, and I can see one file left in the Transfers tab. I enabled the network one day and actually caught it dropping the running tasks: the tasks ran for about 30 more seconds and then ended. Once it has dropped, the BOINC title bar shows nothing in the ()s, and all the tabs are blank: no WUs, no projects, no messages. I am seeing this on both BOINC 5.4.11 and 5.4.9. This is clearly a BOINC problem. Has anyone heard of it elsewhere, or can anyone suggest Windows changes to resolve it? Add this signature to your EMail: Running Microsoft's "System Idle Process" will never help cure cancer, AIDS nor Alzheimer's. But running Rosetta@home just might! https://boinc.bakerlab.org/rosetta/ |
Christian Diepold Joined: 23 Sep 05 Posts: 37 Credit: 300,225 RAC: 0 |
The same problem happens to me as well. I noticed it first with Leiden Classical, and it only happened after I updated my machines from 5.4.9 to 5.4.11. Take a look here. |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
Yep! That's it! Additional details in my case: I ran successfully on 5.4.9 for many months without seeing this occur (although about a year ago I was seeing a similar problem and thought a newer BOINC version had resolved it). Then, in the past 3 weeks or so, it started hitting all three of my Windows single-user installations on a very regular basis (I use the network only at night, and it was happening almost every night). It always seems to have ONE file left to download at the point of failure. Because it ran well for months, I do not suspect a firewall issue. I installed BOINC 5.4.11 on one PC in hopes of correcting the problem, and updated my firewall to allow the new version. It seems to have the same problem. I run only Rosetta and Ralph on these machines. I have one machine with full-time network access (although it often loses its connection due to ISP drops or Windows TCP problems; I haven't figured out which. It is a satellite ISP). The problem seems to occur less frequently on that machine. I'm guessing the heavy crunchers don't see the problem because they tend to install BOINC as a service. Does anyone know how to enable some tracing of the BOINC file transfers? |
larry1186 Joined: 18 Apr 06 Posts: 7 Credit: 329,257 RAC: 0 |
I'm not sure how detailed it can get, but maybe a cc_config.xml file could shed some light on the subject. I've noticed the same disconnection from localhost as well. Since you say it's tied to network activity, I wonder whether limiting BOINC to one file transfer at a time, or limiting the file transfer rates, would have an effect. Don't get distracted by shiny objects. |
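Larry's suggestion, together with Feet1st's question about tracing file transfers, could look something like the following. This is a minimal sketch that writes the XML with Python's standard library. The element names (`max_file_xfers`, `max_file_xfers_per_project`, and the `file_xfer`, `file_xfer_debug`, `http_debug` log flags) come from later BOINC client documentation; whether a 5.4.x client honours all of them is an assumption to verify. The output would be saved as cc_config.xml in the BOINC data directory and read when the client restarts.

```python
import xml.etree.ElementTree as ET


def build_cc_config(max_file_xfers: int = 1,
                    max_file_xfers_per_project: int = 1,
                    trace_transfers: bool = True) -> str:
    """Return cc_config.xml text that limits concurrent file transfers.

    Element names follow later BOINC client documentation; support in
    the 5.4.x clients discussed in this thread is an assumption.
    """
    root = ET.Element("cc_config")
    if trace_transfers:
        log_flags = ET.SubElement(root, "log_flags")
        # Extra logging for file transfers and the HTTP layer beneath them.
        for flag in ("file_xfer", "file_xfer_debug", "http_debug"):
            ET.SubElement(log_flags, flag).text = "1"
    options = ET.SubElement(root, "options")
    # Allow only one transfer overall and one per project at a time.
    ET.SubElement(options, "max_file_xfers").text = str(max_file_xfers)
    ET.SubElement(options, "max_file_xfers_per_project").text = str(
        max_file_xfers_per_project)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_cc_config())
```

The debug flags make the message log much noisier, so they are worth switching off (`trace_transfers=False`) once a failure has been captured.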
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
Definitely worth a try. We will see if it survives the Thanksgiving long weekend here. I assert that file transfer is related because when I start BOINC back up, it's always during the time of day when network access is not allowed to BOINC. And every time, I see one file left in the File Transfers tab and one WU still in a status of transferring. So I conclude that it's crashing (losing contact with localhost) at night, when network activity is allowed. And given that each R@H WU downloads several files, it seems a bit odd that there is ALWAYS one file left to transfer. |
Astro Joined: 2 Oct 05 Posts: 987 Credit: 500,253 RAC: 0 |
|
Gyumaou Joined: 20 Nov 06 Posts: 2 Credit: 4,086 RAC: 0 |
I had the same problem. Following Feet1st's advice, I changed the "connect to network about every ... days" setting to 1 day, and my client managed to survive the night somehow. However, I did not suspend network activity but left it available all the time, so my client would normally have been downloading more WUs all through the night. So maybe this problem is not about downloading WUs... I had always had "connect to" set to 0.1 days with full network activity allowed, and nothing of this sort happened when I was actually using the PC, only when I was away (night, school). I'll watch it for a couple more nights to see whether this was just a fluke or things have really improved, and let you guys know soon. |
Team TMR Joined: 2 Nov 05 Posts: 21 Credit: 1,583,679 RAC: 0 |
Is the boinc.exe task still running when this happens? |
Marky-UK Joined: 1 Nov 05 Posts: 73 Credit: 1,689,495 RAC: 0 |
I've seen the same problem too, with the BOINC client crashing when downloading files (usually in the middle of the night - typical!). I've seen the same crashes with 5.4.11, 5.6.5, 5.7.2 and 5.7.4 too. The crashes all started in the first week of November. |
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
First week of November... yes, that's about when it hit me as well, with no change to the BOINC version. I just got back to two machines after the long weekend. These two are set to allow network connections only at night, and all my machines have a 1.2-day queue. The changed cc_config.xml (to download one file at a time) did not seem to help on 5.4.11. Yes, the BOINC Manager is still active, but when you bring it up, nothing appears in any of the tabs, and the title bar does NOT show "localhost" in the ()s. <edit> I should point out that the Rosetta threads end after about 30 seconds. I was active on the machine once and caught a failure. <end edit> Whoops, it seems the machine where I changed the cc_config.xml was set to allow network access all the time... and sure enough, true to the pattern, when I brought BOINC down and back up, it downloaded one last file, as shown in the Messages tab. There always seems to be one last small file left to download at the point of failure. If network activity is allowed when BOINC comes back up, the download often completes before you can even see it in the Transfers tab. It is NOT one of the large files of the WU. I've seen BOINC restart in this state every time (I'm up to about a dozen observed failures in November). |
Marky-UK Joined: 1 Nov 05 Posts: 73 Credit: 1,689,495 RAC: 0 |
Feet1st wrote: "Yes the BOINC Manager is still active, but when you bring it up, nothing appears in any of the tabs, and the title bar does NOT show 'localhost' in the ()s." The BOINC Manager (boincmgr.exe) starts a separate task, boinc.exe. In my case, it's boinc.exe that crashes, leaving BOINC Manager (or BoincView) nothing to talk to. At least I'm not the only one seeing this. I started a thread over on the BOINC Message Boards too, but my thread was about boinc crashing rather than not being able to connect to localhost. I now realise this is the same problem, just seen in different ways. I've no idea what happened in early November. It was a full week before Microsoft's November "Patch Tuesday" that I started seeing the problems, so it wasn't anything from Windows Update that triggered it. |
River~~ Joined: 15 Dec 05 Posts: 761 Credit: 285,578 RAC: 0 |
Marky-UK wrote: "The BOINC Manager (boincmgr.exe) starts a separate task, boinc.exe. In my case, it's boinc.exe that crashes, leaving BOINC Manager (or BoincView) nothing to talk to." Well observed. boinc.exe is the client, which actually does the control of the projects, etc. It can be run with or without BOINC Manager, but on Windows boxes, when you choose standalone mode, BOINC Manager is set to start boinc.exe, and in that case you can't stop the Manager without stopping boinc. In this configuration, the only time the Manager thinks of starting the client is when the Manager itself is loaded. If the client fails for any reason, you therefore need to exit the Manager and start it again to get boinc restarted. When you install as a service, boinc.exe runs without the Manager, and if you use BoincView to look at things, you never need the Manager at all. For geeks: it is also possible to run boinc.exe on its own by clicking on its icon in the BOINC folder. If you do that, it runs in a command window that displays all the messages we are more used to seeing on the Messages tab. BoincView is quite happy to talk to it in this state. The danger is that it is too easy to stop boinc by closing the window in a thoughtless moment, so I do not recommend this as a way to run for extended periods. Nor does BOINC, which is why there is no install option to do this... R~~ |
Marky-UK Joined: 1 Nov 05 Posts: 73 Credit: 1,689,495 RAC: 0 |
Yep... I only ever run BOINC as a service anyway. It was only today I realised that the reported problems with BOINC Manager losing its connection to localhost were the same problem I've been having with the boinc.exe process terminating. Still no closer to finding the cause or cure, though. Oddly, just about everyone I've seen so far with this problem on the various message boards is running Rosetta. |
Nothing But Idle Time Joined: 28 Sep 05 Posts: 209 Credit: 139,545 RAC: 0 |
Is the termination of boinc.exe accompanied by a consistent abend code and dump? When I get the blank mgr tabs I also get this:
BOINC Windows Runtime Debugger Version 5.4.9
*** UNHANDLED EXCEPTION ****
Reason: Access Violation (0xc0000005) at address 0x0033B014 read attempt to address 0x00000008
Dump Timestamp : 11/23/06 23:46:30
Dump Timestamp : 11/22/06 15:47:48
Dump Timestamp : 11/10/06 00:16:06
Dump Timestamp : 11/09/06 16:09:28
Dump Timestamp : 11/03/06 15:26:57
Dump Timestamp : 09/15/06 11:06:13
Dump Timestamp : 09/12/06 01:25:23 |
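To compare crashes across machines and BOINC versions by their code address, the debugger text can be summarised mechanically. This is a minimal sketch assuming the dump layout matches the excerpt above; the field labels in the regular expressions are copied from it, and nothing else about the dump format is assumed. It pulls out the debugger version, the exception reason and code, the faulting address, and the sorted crash timestamps.

```python
import re
from datetime import datetime

VERSION_RE = re.compile(r"Runtime Debugger Version\s+(?P<version>\d+(?:\.\d+)*)")
REASON_RE = re.compile(
    r"Reason:\s*(?P<reason>.+?)\s*\((?P<code>0x[0-9A-Fa-f]+)\)\s*"
    r"at address\s+(?P<address>0x[0-9A-Fa-f]+)"
)
STAMP_RE = re.compile(
    r"Dump Timestamp\s*:\s*(?P<stamp>\d\d/\d\d/\d\d \d\d:\d\d:\d\d)")


def summarize_dump(text: str) -> dict:
    """Extract version, exception details and crash times from a dump."""
    version = VERSION_RE.search(text)
    reason = REASON_RE.search(text)
    # Timestamps in the dump are MM/DD/YY HH:MM:SS.
    stamps = sorted(
        datetime.strptime(m.group("stamp"), "%m/%d/%y %H:%M:%S")
        for m in STAMP_RE.finditer(text)
    )
    return {
        "version": version.group("version") if version else None,
        "reason": reason.group("reason") if reason else None,
        "code": reason.group("code") if reason else None,
        "address": reason.group("address") if reason else None,
        "timestamps": stamps,
    }


SAMPLE = """BOINC Windows Runtime Debugger Version 5.4.9
*** UNHANDLED EXCEPTION ****
Reason: Access Violation (0xc0000005) at address 0x0033B014 read attempt to address 0x00000008
Dump Timestamp : 11/23/06 23:46:30
Dump Timestamp : 11/22/06 15:47:48
Dump Timestamp : 11/10/06 00:16:06
Dump Timestamp : 11/09/06 16:09:28
Dump Timestamp : 11/03/06 15:26:57
Dump Timestamp : 09/15/06 11:06:13
Dump Timestamp : 09/12/06 01:25:23
"""

if __name__ == "__main__":
    summary = summarize_dump(SAMPLE)
    print(summary["version"], summary["reason"], summary["address"])
    for stamp in summary["timestamps"]:
        print(stamp.isoformat(sep=" "))
```

Sorting the timestamps makes clusters easy to spot, for example the five crashes in November against two in September in this sample.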
Marky-UK Joined: 1 Nov 05 Posts: 73 Credit: 1,689,495 RAC: 0 |
In my case, yes: the code address is always the same (for the same BOINC version). My thread over on the BOINC Message Board is here, and I listed some code addresses there. The address I get from 5.4.11 is the same one you got from 5.4.9 (0x0033B014). |
Nothing But Idle Time Joined: 28 Sep 05 Posts: 209 Credit: 139,545 RAC: 0 |
The only time I was able to observe a crash was when the Manager was requesting work and reporting a task. The crash happened shortly thereafter, but I'm not sure whether it was during the download or after. I'm only guessing, but could there be interference from AV software scanning the downloaded files while a task is starting? Edit: I've only seen this phenomenon in association with the Rosetta project; it has never happened with Einstein so far as I know. |
dcdc Joined: 3 Nov 05 Posts: 1832 Credit: 119,688,048 RAC: 10,544 |
This is off-topic but might be useful for some. River~~ wrote: "boinc.exe is the client, which actually does the control of the projects, etc." Unless you kill boincmgr.exe in Task Manager, which leaves boinc happily running ;)
Another option is to start boinc.exe as a scheduled task at computer startup. You then have the option of running it under a specific account if you're paranoid about security, under the system account if you want the screensaver, or under your normal user account. This is useful if you can't or don't want to install as a service for whatever reason, but also don't want boincmgr running, whether to reduce the memory footprint or just because you don't want people playing with it. Danny |
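Danny's scheduled-task approach could be set up with the built-in Windows `schtasks` command. This sketch only builds that command and runs it on Windows; the install path and task name are hypothetical, and the post does not say how the client finds its data directory when launched this way, so check that before relying on it. Running under `SYSTEM` needs no password; a named user account would also need one supplied via `/RP`.

```python
import subprocess
import sys

# Hypothetical default install path; adjust to wherever boinc.exe lives.
DEFAULT_BOINC_EXE = r"C:\Program Files\BOINC\boinc.exe"


def onstart_task_command(boinc_exe: str = DEFAULT_BOINC_EXE,
                         task_name: str = "BOINC client",
                         run_as: str = "SYSTEM") -> list:
    """Build a schtasks call that starts boinc.exe at system startup."""
    return [
        "schtasks", "/Create",
        "/TN", task_name,
        # Quote the path so "Program Files" stays a single token.
        "/TR", f'"{boinc_exe}"',
        "/SC", "ONSTART",
        "/RU", run_as,
    ]


if __name__ == "__main__" and sys.platform == "win32":
    # Creating an ONSTART task normally requires an elevated prompt.
    subprocess.run(onstart_task_command(), check=True)
```

Building the argument list separately from running it makes it easy to print and inspect the command before creating the task.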
Feet1st Joined: 30 Dec 05 Posts: 1755 Credit: 4,690,520 RAC: 0 |
I just got another drop and confirmed that boincmgr.exe is still running but, just as suspected, boinc.exe is NOT running when BOINC Manager shows blank tabs, and the Rosetta threads are no longer running. [Edit] Interesting: this is the machine I set to download files one at a time, and THIS time I've got 5 files in my transfers list upon restart, rather than the usual single file. |
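Feet1st's check gives a simple test for the failure state: boincmgr.exe present, boinc.exe absent. A hypothetical watchdog could poll for that using the Windows `tasklist` command in CSV mode. This sketch only detects the condition; what to do next (for example exiting and restarting BOINC Manager, as River~~ describes) is left out. It assumes the standalone setup where the Manager starts the client; on a service install there is no boincmgr.exe to look for.

```python
import csv
import io
import subprocess
import sys


def running_images(tasklist_csv: str) -> frozenset:
    """Lower-cased image names from `tasklist /FO CSV /NH` output."""
    # The first CSV column is the image name; csv handles the quoted
    # memory column ("12,345 K") that contains a comma.
    return frozenset(
        row[0].lower() for row in csv.reader(io.StringIO(tasklist_csv)) if row
    )


def client_dropped(tasklist_csv: str) -> bool:
    """True in the failure state: Manager still up, client process gone."""
    names = running_images(tasklist_csv)
    return "boincmgr.exe" in names and "boinc.exe" not in names


def take_snapshot() -> str:
    """Run the Windows tasklist command and return its CSV output."""
    result = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__" and sys.platform == "win32":
    if client_dropped(take_snapshot()):
        print("boinc.exe has exited while BOINC Manager is still running")
```

Run from a scheduler every few minutes, this would turn a silent overnight drop into something noticed within one polling interval.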
Nothing But Idle Time Joined: 28 Sep 05 Posts: 209 Credit: 139,545 RAC: 0 |
The unexpected termination of boinc.exe is more than an annoyance. On one occasion the client died about 20 minutes after I had gone to bed, and by morning I had lost 7 hours of hyper-threaded processing. So if one is not checking BOINC regularly, there is a chance that one's computer may sit idle for hours or days (a 3-day weekend?). |
sslickerson Joined: 14 Oct 05 Posts: 101 Credit: 578,497 RAC: 0 |
I currently have 2 WUs running that started approx. 10 minutes apart. I've been checking in on them for a bit over 5 hours now, and I have noticed that instead of being about 10-15 minutes apart (as they started), they are now 47 minutes apart. This seems a bit strange to me, and it has been going on now for several weeks. I can see a few minutes' separation over 24 hours due to other processes taking cycles, but 50 minutes...? --Tim |
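Tim's numbers can be turned into a rough rate. Assuming the separation he quotes is in accumulated CPU time, and that all of the extra gap built up during his roughly 5-hour observation window (if it accrued since the tasks started, the rate is lower), the lagging task fell behind by about 37 minutes in 300, roughly 12% of wall-clock time. For comparison, "a few minutes" a day (taken here as 5, an assumption) is about 0.35%.

```python
def lag_fraction(initial_gap_min: float, current_gap_min: float,
                 elapsed_min: float) -> float:
    """Extra separation accumulated per minute of wall-clock time.

    Assumes the gap is measured in accumulated CPU time, so growth
    beyond the starting offset means one task received less CPU.
    """
    if elapsed_min <= 0:
        raise ValueError("elapsed_min must be positive")
    return (current_gap_min - initial_gap_min) / elapsed_min


# Figures from the post: ~10 min apart at the start, 47 min apart now,
# after "a bit over 5 hours" (300 min used as a round figure).
observed = lag_fraction(10, 47, 5 * 60)
# Assumed baseline: "a few minutes" taken as 5 minutes over 24 hours.
baseline = lag_fraction(0, 5, 24 * 60)

print(f"observed drift: {observed:.1%} of wall-clock time")
print(f"'a few minutes a day': {baseline:.2%}")
```

A gap growing this fast would point to one task consistently getting a noticeably smaller share of the CPU, rather than occasional interference from other processes.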
©2024 University of Washington
https://www.bakerlab.org