Message boards : Number crunching : SETI infected by Rosetta?
Author | Message |
---|---|
Dave Mickey Send message Joined: 29 Dec 07 Posts: 33 Credit: 4,136,957 RAC: 0 |
I have one of my older machines sharing mostly Rosetta and some SETI - running the 5.8.8 client so that I can stay on NT4 (which means the SETI Optimized app). For about the past 10 days, this machine has errored every SETI unit. Error details (I guess that's stderr.out) contain interleaved blocks of messages from the SETI app AND from the Rosetta app - see any unit at http://setiathome.berkeley.edu/results.php?hostid=2528837 Yes, the file has messages from both apps in it. Now sometime back (a few months) there were some Rosetta units that would refuse to suspend when BOINC told them to, resulting in the cpu sharing between a SETI unit and a RAH unit, even tho BOINC believed that RAH was suspended - thought that had gone away with recent RAH versions, but haven't really looked for it. Anybody have ideas (other than this is old HW and old SW - I understand that might be the cause)? How does Rosetta write output into what SETI thinks is his output file? Did something change in the last 10 days to invalidate my configuration in an absolute-nothing-works basis? (SETI, that is - Rosetta still works) (posting to both projects) Dave |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Unreal! I take it then that this was on the NT host for Rosetta? I see you encountered the ultimate long-running task there. Ran for 35hrs. Have any other Rosetta tasks apparently had this occur? Rosetta Moderator: Mod.Sense |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,178,827 RAC: 3,166 |
I have one of my older machines sharing mostly Rosetta and some SETI - running the 5.8.8 client so that I can stay on NT4 (which means the SETI Optimized app). You might try upgrading to a newer version of Boinc, you are still using version 5.8.8 and I, for example, am using 6.4.5. That is a lot of updates, I tried checking what is the absolute newest version but it won't pop up right now. |
dcdc Send message Joined: 3 Nov 05 Posts: 1832 Credit: 119,673,616 RAC: 11,118 |
I have one of my older machines sharing mostly Rosetta and some SETI - running the 5.8.8 client so that I can stay on NT4 (which means the SETI Optimized app). You might try upgrading to a newer version of Boinc, you are still using version 5.8.8 and I, for example, am using 6.4.5. That is a lot of updates, I tried checking what is the absolute newest version but it won't pop up right now. newer versions aren't NT compatible ;) I guess you could run a newer OS (i.e. a lightweight linux) in a virtual machine? There's some extra overhead but it'll mean you can keep crunching on that machine... Dotch's linux pack might be worth investigating too: https://boinc.bakerlab.org/rosetta/forum_thread.php?id=4675 |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
The only way I can see the output getting commingled like this is if both SETI and Rosetta were assigned the same slot. I've never seen a BOINC bug like that, but if there were, then a newer version might help. But I'm thinking you were recently copying around stuff in your BOINC data path between machines. And this basically corrupted the status in a way that two projects were assigned the same slot, at the same time. I don't think you'll see it happen again. Once the tasks in that slot are completed, things should be back to normal. Rosetta Moderator: Mod.Sense |
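To make the shared-slot theory concrete: BOINC runs each task in its own numbered slot directory, and the app's diagnostic output is collected in a file there. The following is a minimal Python sketch, not BOINC's actual code. The app labels, the messages, and the `slots/0/stderr.txt` layout inside a temporary directory are assumptions for illustration. It shows that two apps appending to the same slot file by turns leave alternating blocks from both apps, much like the result pages described above.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def append_block(stderr_path: Path, app_name: str, lines: list[str]) -> None:
    """Append one block of diagnostic lines, tagged with a hypothetical app label."""
    with stderr_path.open("a", encoding="utf-8") as handle:
        for line in lines:
            handle.write(f"[{app_name}] {line}\n")


def simulate_shared_slot() -> list[str]:
    """Let two hypothetical apps take turns writing to one slot's stderr file."""
    with tempfile.TemporaryDirectory() as data_dir:
        slot = Path(data_dir) / "slots" / "0"  # assumed layout, for illustration
        slot.mkdir(parents=True)
        stderr_path = slot / "stderr.txt"
        # The scheduler alternates between the apps, so their blocks alternate too.
        append_block(stderr_path, "seti", ["starting work unit"])
        append_block(stderr_path, "rosetta", ["starting model 1"])
        append_block(stderr_path, "seti", ["checkpoint written"])
        append_block(stderr_path, "rosetta", ["model 1 done"])
        return stderr_path.read_text(encoding="utf-8").splitlines()


def apps_seen(lines: list[str]) -> set[str]:
    """Collect the app labels that appear at the start of each line."""
    return {line[1:line.index("]")] for line in lines if line.startswith("[")}


if __name__ == "__main__":
    for line in simulate_shared_slot():
        print(line)
```

If each app had its own slot directory, each file would contain a single label only, which is the normal case.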
LizzieBarry Send message Joined: 25 Feb 08 Posts: 76 Credit: 201,862 RAC: 0 |
I see you encountered the ultimate long-running task there. Ran for 35hrs. You should've looked down that list some more. Task 220279092 ran for 42hrs, apparently successfully (until you examine the detail) for a measly 7 credits. Anybody have ideas (other than this is old HW and old SW - I understand that might be the cause)? How does Rosetta write output into what SETI thinks is his output file? Did something change in the last 10 days to invalidate my configuration in an absolute-nothing-works basis? (SETI, that is - Rosetta still works) I'm not inclined to blame the Rosetta suspension problems from a few months ago for the issues of a few days ago. Too much time passed before issues arose. It looks more like some kind of file corruption, either on the HD or just the Boinc installation. I take the point about wanting to run the optimised client for Seti, but at the moment it's just delivering error results quicker, which doesn't help anyone. Better to upgrade to a far newer, very stable version, like Mikey suggests. Run chkdsk /f /v /r - this is a full scan of files and hard disk sectors, used and unused, which will take a very long time. If that comes up clear, set No New Tasks in Boinc Manager, complete (or abort) all running tasks, report them, detach from each project. Download the latest Boinc for your machine and install, reattach to your projects and allow new tasks. Hopefully this will clear whatever corruption there is in your installation. Sounds a lot, I know, but clearly something's gone very wrong with what you accept is old HW & SW. No point hanging onto a failing past when something better's available. [Edit: Just seen the comment about NT compatibility. Point taken. Can the old 5.8.8 version be reinstalled then? Maybe detaching and reattaching will be good enough after all?] |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,178,827 RAC: 3,166 |
I have one of my older machines sharing mostly Rosetta and some SETI - running the 5.8.8 client so that I can stay on NT4 (which means the SETI Optimized app). Yeah if NT is only there because the user has a license for it, then by all means switch to a Linux, or whatever, OS. If the user HAS to stay with NT then he may just end up choosing one project and sticking with it and lets others crunch for the other project. ALL users are helpful and needed but only if they can contribute. Currently this user is having trouble doing that. I HOPE he picks Rosetta!!!! |
FoldingSolutions Send message Joined: 2 Apr 06 Posts: 129 Credit: 3,506,690 RAC: 0 |
Unreal! It might help to look at the computer that was actually running this "35 hour" unit! I didn't know they even bothered making operating systems for these :D |
The_Bad_Penguin Send message Joined: 5 Jun 06 Posts: 2751 Credit: 4,271,025 RAC: 0 |
I'm never one to discourage a donor, but you really have to wonder if it's worth the electrical costs for what that's able to contribute... |
Dave Mickey Send message Joined: 29 Dec 07 Posts: 33 Credit: 4,136,957 RAC: 0 |
Thanks >You might try upgrading to a newer version of Boinc Can't as long as I stay with NT >I guess you could run a newer OS (i.e. a lightweight linux) Might if that becomes the only way forward - would like to fool with linux at some point - never got the nerve >Can the old 5.8.8 version be reinstalled then? It's still there - in this whole episode, nothing has been changed, upgraded or downgraded - I'm just examining the evidence to decide what to do. >NT is only there because the user has a license for it, Have a couple of them because they *gave* them away when you bought a MS compiler like VC++ or VJ++ (which I happened to get real cheap in the Student edition) >But I'm thinking you were recently copying around stuff No, no copying of anything going on here - never really touch that machine. >worth the electrical costs It's debatable, I thought I'd hang on to it until something (like absolute SW incompatibility) forces my hand. In general though, NT runs forever, and this machine might have been up for 6 months or a year straight, with no changes. It just sits in the corner and hums. So this problem can only really be HW, or something about Rosetta apps, since nothing else changes as frequently as Rosetta apps. Something drastically changed 10 days ago. Right now my thought is to run down the caches, blow it out, maybe find some reliability tests to run, chkdsk, then start up again, one app at a time. Dave |
Mod.Sense Volunteer moderator Send message Joined: 22 Aug 06 Posts: 4018 Credit: 0 RAC: 0 |
Dave, clearly you've thought this through pretty well on your own. We always try not to presume when reading a post, so while you may be rolling your eyes at some of the posts, it's just part of the process of learning what you have and have not done, or considered or observed etc. Anyway... I can only tell you I've never seen anything like that before. I'm not the world's expert, but I have been around BOINC, and the boards for a very long time. So... I tend to lean towards the suggestion posted previously about the BOINC reinstall and reattach approach. If the problem is a corrupted file or disk sector, this will clear it up. I can't think a hardware problem could be so... predictable. I mean both tasks continued to run etc. Have you had it happen again?? Would be good to note exactly which SETI and which Rosetta task were in progress at the time. Rosetta Moderator: Mod.Sense |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,178,827 RAC: 3,166 |
Dave, clearly you've thought this through pretty well on your own. We always try not to presume when reading a post, so while you may be rolling your eyes at some of the posts, it's just part of the process of learning what you have and have not done, or considered or observed etc. I found at least 1 of the units still on his list: https://boinc.bakerlab.org/rosetta/result.php?resultid=221448595 about half way down it says "pikes Pulses Triplets Gaussians Flops 0 0 0 0 15990568271187 Optimized SETI@Home Enhanced application Optimizers: Ben Herndon, Josef Segur, Alex Kan, Simon Zadra Version: Windows MMX 32-bit based on S@H V5.15 'Noo? No - Ni!' Revision: R-2.4V|PX|FFT:IPP_MMX|Ben-Joe Speed: 1 x 332 MHz Features: MMX" And here is the other unit: https://boinc.bakerlab.org/rosetta/result.php?resultid=220768136 same message, same place. There is more to the Seti stuff embedded than what I quoted. I wonder if Boinc is going too fast for the pc to keep up? It says 332 MHz is the speed of the machine and since Boinc swaps every 60 minutes, I wonder if it is getting swamped with the data and is writing the saved stuff in the wrong place. Personally I would run just one project on this machine. Pick one that doesn't take too long and stick with it. |
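The check mikey did by eye, looking for SETI output inside a Rosetta result, could be sketched in Python as below. This is only an illustration under stated assumptions. The SETI marker string is the banner quoted in the post above, while the Rosetta marker is a hypothetical placeholder, since no Rosetta output is quoted in this thread. Both function names are made up for the example.

```python
from __future__ import annotations

# Marker strings per app. The SETI banner is quoted in the post above; the
# Rosetta marker is a hypothetical placeholder, not taken from real output.
MARKERS = {
    "seti": "Optimized SETI@Home Enhanced application",
    "rosetta": "ROSETTA-PLACEHOLDER-MARKER",
}


def apps_in_output(stderr_text: str, markers: dict[str, str] = MARKERS) -> set[str]:
    """Return the names of all apps whose marker string appears in the text."""
    return {app for app, marker in markers.items() if marker in stderr_text}


def is_commingled(stderr_text: str, expected_app: str) -> bool:
    """True when output from any app other than the expected one is present."""
    return bool(apps_in_output(stderr_text) - {expected_app})


if __name__ == "__main__":
    sample = (
        "ROSETTA-PLACEHOLDER-MARKER\n"
        "Optimized SETI@Home Enhanced application\n"
    )
    print(sorted(apps_in_output(sample)))
    print(is_commingled(sample, "rosetta"))
```

Running such a check over the saved text of each result in both directions would show whether every affected task has output from the other project.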
Dave Mickey Send message Joined: 29 Dec 07 Posts: 33 Credit: 4,136,957 RAC: 0 |
Ahhh, mikey, very nice catch. Each of them is fooling with the other, not just rah messing with sah...!! (at least in terms of output files). Running a memtest86 now, and then we'll check the file system. Get rid of the dust bunnies, and then start up again one at a time. Seeing that the file I/O is confused in both directions really starts to point me towards flaky hw and or file system. Just from instinct, the hard disk on this sys would be suspect #1 - if so, maybe a CDROM based linux would be a cool way to go. Generally, do those install most of what they need to run in memory, or do they have to go to the CD drive a lot? (the CD drive on this thing is as old as the rest of it, so it wouldn't be good to beat on it) @Mod.Sense - I have no problem with any of the helpful posters, hope it didn't seem that way - I just wanted to aggregate those thoughts to combat thread bloat. Dave |
Dagorath Send message Joined: 20 Apr 06 Posts: 32 Credit: 29,176 RAC: 0 |
Just from instinct, the hard disk on this sys would be suspect #1 - if so, maybe a CDROM based linux would be a cool way to go. The trouble is BOINC and the project applications running under BOINC need to save data to the disk. If your hard disk is indeed toast and you remove it then all you have left is the CD ROM. Since it's ROM, BOINC can't save data to it. But no sense worrying about that until you run chkdsk and/or other diagnostics on your HD and see if it's toast. If your HD is bad then watch your local newspapers for people giving away an old computer for free. Or ask friends. You'll be amazed at how many people have old computers sitting around just waiting to get tossed out or recycled. Scavenge an HD from one of those, test it, install Linux on it if it's good. Another alternative is to install Linux on a USB memory stick. Dotsch recently released Dotsch/UX, a Linux + BOINC combo that installs everything you would need including BOINC. The only trouble I can foresee is that USB memory sticks might need USB 2.0 but the computer might have only USB 1.1 which may not work. Or it might work but just not as fast as it would with USB 2.0. For more information have a look at http://www.dotsch.de/Dotsch_UX. |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,178,827 RAC: 3,166 |
Ahhh, mikey, very nice catch. Each of them is fooling with the other, not just rah messing with sah...!! (at least in terms of output files). The cd-rom based systems write stuff to memory when it needs to save something. I have never found them that helpful and am always leery in case the machine goes down. That may just be me, I have personally downloaded Dotsch's new system and will try it. I will not try it today though, so don't be looking for a review. As has been said before if this system is only running because you have a license for it, Ubuntu Linux is an easy way to get around that. I have detailed my own trials and tribulations with Linux here http://malariacontrol.net/forum_thread.php?id=642 The short version is it is fairly easy and it is very stable and requires no input or monitoring on my part for my 2 Linux machines to pump out the units. One machine is a dual core Intel with HT on each core, making the OS think it is a quad core, and it keeps up with the true quad core AMD machine in credit output!! Linux is easy, free and just sits there and runs. |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,178,827 RAC: 3,166 |
The cd-rom based systems write stuff to memory when it needs to save something. I have never found them that helpful and am always leery in case the machine goes down. That may just be me, I have personally downloaded Dotsch's new system and will try it. I will not try it today though, so don't be looking for a review. Okay I did it just for you guys...I installed Dotsch's system on a pc that used to run Rosetta just fine thru Windows. The cd runs fine, it boots right up into Ubuntu and the Boinc icon is right there in front of you. Nice job!!! You click on the icon and it asks if you want to join a project, you say yes and the list of projects comes up, except there is nothing on the list, it is blank. Not a problem I came over here got the info and put it in, it attached just fine and then came the preferences. I did not have enough free disk space, it was running in ram only, so no downloading of anything. I needed 44 more ?b free space. I don't remember if that was mb or gb, but I only have 1 gb of ram in that machine and it is not enough. If it is mb then another gig of ram would do the trick, if it is gb then there is no way you can run Rosetta thru Dotsch's system. I am guessing it is mb, because gb would mean a TON of usage of the hard drive and I don't see that normally. All in all it was a good setup, it is clean, fast and other than having no list of projects to choose from, easy to use. And as I intimated you can get that info from each project on your own really easily. Ram memory is the only problem I see and as long as you have 2 gig of it in the machine, there shouldn't be any problem going diskless with Dotsch's system. |
Dagorath Send message Joined: 20 Apr 06 Posts: 32 Credit: 29,176 RAC: 0 |
Mikey, thanks for volunteering to be the crashtest dummy :) It sounds like Dotsch/UX created a RAM disk then BOINC attempted to download the Rosetta files to it but it was too small. I haven't tried Dotsch/UX yet but it sounds to me like you need a 1 or 2 GB USB memory stick (a thumb drive) plugged into a USB port if you want to configure Dotsch/UX to be "diskless". I gather there are install options that will install the OS and BOINC to the thumb drive so that you can boot from the thumb drive rather than the CD. After boot, the thumb drive is used to store BOINC's data directory. At least that's what I gather from looking at the info at Dotsch/UX website but again... I've never actually tried it. P.S. As for the project list being empty, it sounds like Dotsch forgot to include an all_projects.xml file. That's the file that stores the names and URLs of the projects in the project list. BOINC client updates all_projects.xml about once a week so eventually you would get one. |
mikey Send message Joined: 5 Jan 06 Posts: 1895 Credit: 9,178,827 RAC: 3,166 |
Mikey, thanks for volunteering to be the crashtest dummy :) Thanks...I feel like a dummy sometimes. There were no options when installing, although since I was doing other things at the same time, I may have missed the screen and it just used the defaults and moved on. I did not try putting in a thumb drive, I may have to fire the machine back up and try that. It looks like a standard Ubuntu install but since I am not a Linux guy I can't say for sure. I do have Ubuntu loaded onto 2 machines and they run just fine with no interventions on my part at all. |
Dotsch Send message Joined: 12 Feb 06 Posts: 111 Credit: 241,803 RAC: 0 |
Thank you very much, Mikey, for testing my distribution and for your report. If I understood your report correctly, you used the BOINC client directly from the CD-ROM without installing to a USB stick or HDD? At the moment this is not really a good way to run it, because your data is not written to any media, so it cannot be recovered after a power failure and you have to save the data manually. I recommend that you create a USB stick or install the OS to a hard disk to get persistent BOINC data. Eric Meyers had some very good ideas for an improvement that makes a "live CD" mode possible, which should handle this. I think this is a very good idea and I will develop and implement it in the next releases. The idea is that users can boot from CD, insert a USB stick or connect a network share, and the BOINC client data would be written permanently or periodically to the USB stick. Edit: You're both right. Dotsch/UX is based on Ubuntu and from the user's point of view it's usable like Ubuntu, and it has the tools to set up a BOINC diskless/USB/hard disk system as easily as possible. |
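The periodic write to a USB stick described in the post above might look roughly like the Python sketch below. It is a sketch under assumptions, not how Dotsch/UX does or will do it. The function names and both directory arguments are hypothetical, and `shutil.copytree(..., dirs_exist_ok=True)` needs Python 3.8 or newer. Note that this simple copy never deletes files that have disappeared from the source, and copying while the client is mid-write could capture an inconsistent state.

```python
import shutil
import time
from pathlib import Path


def sync_once(data_dir: Path, backup_dir: Path) -> int:
    """Copy the data directory onto the backup location; return the file count there."""
    # dirs_exist_ok lets repeated runs overwrite an existing backup tree.
    shutil.copytree(data_dir, backup_dir, dirs_exist_ok=True)
    return sum(1 for path in backup_dir.rglob("*") if path.is_file())


def sync_periodically(data_dir: Path, backup_dir: Path,
                      interval_seconds: float, rounds: int) -> int:
    """Run sync_once a fixed number of times, sleeping between rounds."""
    file_count = 0
    for round_index in range(rounds):
        file_count = sync_once(data_dir, backup_dir)
        if round_index + 1 < rounds:
            time.sleep(interval_seconds)
    return file_count
```

A call such as `sync_periodically(Path("/path/to/boinc-data"), Path("/path/to/usb/boinc-data"), 600, 6)` would copy every ten minutes for an hour; both paths here are placeholders. A real setup would more likely loop until shutdown and pause the client, or copy only checkpoint files, during each pass.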
Dotsch Send message Joined: 12 Feb 06 Posts: 111 Credit: 241,803 RAC: 0 |
P.S. As for the project list being empty, it sounds like Dotsch forgot to include an all_projects.xml file. That's the file that stores the names and URLs of the projects in the project list. BOINC client updates all_projects.xml about once a week so eventually you would get one. The all_projects.xml would normally be downloaded automatically from the BOINC servers. If it was not downloaded, possibly the network was not configured. By default the installation uses DHCP until you configure it manually. |
©2024 University of Washington
https://www.bakerlab.org