boincmgr with rosetta downloaded lots of data and when I rebooted it seemed to start over

Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1762
Credit: 18,534,891
RAC: 176
Message 96649 - Posted: 20 May 2020, 8:55:22 UTC
Last modified: 20 May 2020, 8:56:56 UTC

Do you have another account here? Because there is no sign of any computers on the account you used to post here, let alone them having got any work, or produced any errors.

When you first join a project (any project) there will be a lot of downloading as not only do you have to get Tasks to process, you also need to get the applications to process them. Different types of Tasks will also require different support files.
However, once all of these files have been downloaded, the actual data files downloaded for processing are generally only a few hundred kB, although the result files being sent back can be as much as 30MB (sometimes more), but usually a lot less.

If file transfers tend to be sticky (it says it's uploading/downloading but nothing is actually happening), in the BOINC Manager (Advanced view), Activity, select "Suspend network activity", then re-select "Network activity based on preferences".
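For anyone working without the graphical Manager, the same suspend-then-resume toggle can be sketched from a script. This is only a sketch: it assumes `boinccmd` is installed, is on the PATH, and is allowed to talk to the local client. The helper names are made up for illustration.

```python
import shutil
import subprocess

# boinccmd network modes: "never" suspends transfers, "auto" follows preferences.
SUSPEND = ["boinccmd", "--set_network_mode", "never"]
RESUME = ["boinccmd", "--set_network_mode", "auto"]


def network_toggle_commands():
    """Return the two commands, in order, that mimic the Manager toggle."""
    return [SUSPEND, RESUME]


def kick_transfers(run=subprocess.run):
    """Run the toggle if boinccmd is available; return the commands executed."""
    if shutil.which("boinccmd") is None:
        return []  # no BOINC command-line tool on this machine, so do nothing
    executed = []
    for cmd in network_toggle_commands():
        run(cmd, check=True)
        executed.append(cmd)
    return executed
```

Calling `kick_transfers()` on the machine running the client should have the same effect as the two menu selections, provided the assumptions above hold.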


It may also be necessary to use a proxy server to work around the problems with your net connection.
Grant
Darwin NT
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 96662 - Posted: 20 May 2020, 15:05:46 UTC

The BOINC Manager will take care of retrying downloads that get interrupted. BOINC also has settings where you can limit bandwidth usage if you like. "Avg. work done" is averaged over the last 10 days, and during most of those, it sounds like you did no work because you were not attached to the project, so it is a fairly meaningless number just hours after you sign up.
Rosetta Moderator: Mod.Sense
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1762
Credit: 18,534,891
RAC: 176
Message 96686 - Posted: 21 May 2020, 7:56:23 UTC - in response to Message 96684.  
Last modified: 21 May 2020, 7:58:47 UTC

Should I be concerned?
Yes, you need to figure out how many accounts you have, and what they are.
The account you are posting here with has no computers doing any work at all (as the linked Account page shows). You need to log in to the project using the name & email address that you used to attach the computer that is presently processing work to Rosetta.
Grant
Darwin NT
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1762
Credit: 18,534,891
RAC: 176
Message 96691 - Posted: 21 May 2020, 10:48:01 UTC - in response to Message 96689.  

Then I will reinstall boinc and try starting over with a fresh account.
Or just attach to the project with the account you are using here.
Grant
Darwin NT
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1762
Credit: 18,534,891
RAC: 176
Message 96718 - Posted: 22 May 2020, 7:58:21 UTC - in response to Message 96713.  

What am I doing wrong?
No idea.
I have only ever used the graphical Manager. I left the command line behind a very long time ago.


The other option would be, instead of posting here using this present account, to log off from the site and log back on using the account the computer is using (that actually makes more sense, as the other account will have all the history of the work the computer has done, whereas this account doesn't have any processing history *slaps self*).
Grant
Darwin NT
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1762
Credit: 18,534,891
RAC: 176
Message 96739 - Posted: 22 May 2020, 23:05:52 UTC
Last modified: 22 May 2020, 23:07:21 UTC

OK, from what you have posted previously,

$ boinccmd --get_state
======== Projects ========
1) -----------
   name: Rosetta@home
   master URL: https://boinc.bakerlab.org/rosetta/
   user_name: Macuilxochitl
   team_name: 
   resource share: 100.000000
   user_total_credit: 26253.309237
   user_expavg_credit: 560.492141
   host_total_credit: 6023.143093
   host_expavg_credit: 560.492141

GUI URL:
   name: Your tasks
   description: View the last week or so of computational work
   URL: https://boinc.bakerlab.org/rosetta/results.php?userid=283434
   jobs succeeded: 17
   jobs failed: 58
   elapsed time: 585442.120870
   cross-project ID: b234b0bee793944832bb02a56190d855
So work is being done, and it is earning Credit for that computer on that account.

The user ID for that account is
283434

The user ID for the account you are posting with here is
2157465


So it looks like you've had an account for quite some time, for some reason you then created a new account- but your computer is still on the old account. But since you have logged in here with the new account, you can't see the computer. And when you go to check out your account using the BOINC Manager on the computer, you can't- because you are not logged in on that account.
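The comparison above (the user ID the computer reports versus the ID of the account used on the forum) can be sketched in a few lines. This is just an illustration: the snippet is a trimmed copy of the `boinccmd --get_state` text quoted above, and the function and variable names are made up for the example.

```python
import re
from urllib.parse import urlparse, parse_qs

# A trimmed copy of the output quoted in the post above.
STATE_SNIPPET = """\
   user_name: Macuilxochitl
   URL: https://boinc.bakerlab.org/rosetta/results.php?userid=283434
   jobs succeeded: 17
   jobs failed: 58
"""


def summarize_state(text):
    """Pull the user ID and job counts out of `boinccmd --get_state` text."""
    url = re.search(r"URL:\s*(\S+results\.php\S*)", text)
    userid = None
    if url:
        userid = int(parse_qs(urlparse(url.group(1)).query)["userid"][0])
    ok = int(re.search(r"jobs succeeded:\s*(\d+)", text).group(1))
    failed = int(re.search(r"jobs failed:\s*(\d+)", text).group(1))
    return {"userid": userid, "succeeded": ok, "failed": failed}


summary = summarize_state(STATE_SNIPPET)
FORUM_ACCOUNT_ID = 2157465  # ID of the account used to post in this thread
same_account = summary["userid"] == FORUM_ACCOUNT_ID
```

Here `same_account` comes out false, which is the mismatch described above: the computer is attached to one account while the forum login belongs to another.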

If you click on "Log out" at the top right hand corner of this page, that will log you out from this web site using the new account.
If you then follow Step 2 below, that should allow you to log back in using your original account, the one that has the computer on it.
Forgot your account info?

You might want to triple check everything before doing it (I just got up & I'm still not quite awake yet; it's been a looooong and tiring week), but it should get you logged in to this website using the account that your computer is on.
Grant
Darwin NT
Macuilxochitl

Joined: 11 Oct 08
Posts: 13
Credit: 134,700
RAC: 0
Message 96741 - Posted: 23 May 2020, 2:01:56 UTC - in response to Message 96739.  

Alright, thanks, that seemed to work. I'm not sure why I have 2 accounts. When I set up BOINC on this box I tried to log in with the username and password I had on record from 12 years ago, but it rejected my password. I asked to reset my password, but for some reason that seemed to create a second account. I'm not sure what happened, but now everything seems ducky. At least I have some confidence that my computer work is being used.

I just ordered 3 case fans, so we'll see if I can begin to really do some crunching.
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1762
Credit: 18,534,891
RAC: 176
Message 96742 - Posted: 23 May 2020, 2:26:21 UTC

Glad you got that sorted.
Now you can start hunting down what is going on with the system- it's putting out a lot of errors.
Grant
Darwin NT
Macuilxochitl

Joined: 11 Oct 08
Posts: 13
Credit: 134,700
RAC: 0
Message 96743 - Posted: 23 May 2020, 5:19:43 UTC - in response to Message 96742.  

I see the errors in the manager, but I wouldn't know where to look for the source. Maybe it is because I am using the proprietary Nvidia (linux) graphics driver? I guess I could try some memtest.

I'm not seeing anything in the GUI Event log that catches my attention. If I were going to guess, I'd say maybe the errors I was getting while trying to transfer data to and fro over my marginal internet connection are to blame.

I took a cursory look in my home directory but didn't see any log file to examine. Can you suggest where I might look?
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1762
Credit: 18,534,891
RAC: 176
Message 96744 - Posted: 23 May 2020, 6:07:18 UTC - in response to Message 96743.  
Last modified: 23 May 2020, 6:09:43 UTC

Can you suggest where I might look?
Unfortunately Linux error messages seem to be on par with old DOS ones- next to useless.
It's very unlikely to be related to the video driver (possible, but very unlikely). And the internet issues are also not likely to be the cause- the files were downloaded OK. While several Tasks crashed & burned straight away, others started processing and then crashed. But it is a possibility.
One or 2 of the errors appear to be related to the Tasks themselves- there are issues with some of the Work Units, but all of the others are dying only on your system.


A quick search shows that "process got signal 11" errors are either a problem with the programme (yet others aren't having the issues you are), or it's a hardware problem.
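As a small aside on what "signal 11" means: on Linux it is SIGSEGV, a segmentation fault, i.e. the process touched memory it was not allowed to. The lookup can be done with Python's standard `signal` module; the helper name below is just for illustration.

```python
import signal


def signal_name(number):
    """Translate a signal number from a Task's error text into its symbolic name."""
    return signal.Signals(number).name


print(signal_name(11))
```

The name alone does not say whether the software or the hardware is at fault, which is why the checks suggested in this post are still needed.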

Since you've got your account sorted out, and hopefully the internet issues sorted out, the usual suggested fix is to Reset project (on the BOINC Manager Projects tab).
It clears out all of your local files (data and application) for the project, re-downloads new copies, then downloads new Tasks to process. Given you have had internet issues, it is possible a file or two is corrupted & is responsible for your high error count.
This should eliminate the Rosetta software/libraries/databases being at fault.
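For command-line users, a rough equivalent of the Reset project button can be sketched as below. It assumes `boinccmd` is installed and permitted to control the local client, and it uses the master URL shown in the `boinccmd --get_state` output earlier in this thread. The function names and the `dry_run` flag are made up for the example.

```python
import shutil
import subprocess

# Master URL as shown in the boinccmd output earlier in this thread.
ROSETTA_URL = "https://boinc.bakerlab.org/rosetta/"


def reset_command(project_url=ROSETTA_URL):
    """Build the boinccmd invocation that resets one project."""
    return ["boinccmd", "--project", project_url, "reset"]


def reset_project(project_url=ROSETTA_URL, dry_run=True):
    """Return the command; only execute it when dry_run is False and boinccmd exists."""
    cmd = reset_command(project_url)
    if dry_run or shutil.which("boinccmd") is None:
        return cmd
    subprocess.run(cmd, check=True)
    return cmd
```

The default is a dry run on purpose, since a reset discards any Tasks in progress; letting running work finish first is the safer order.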

If the problems still occur after doing that, then it's a case of testing the RAM and making sure the CPU isn't overheating, possibly even turning off hyperthreading to see how things go. With that number of cores & threads, your present system RAM will result in some memory issues, as some Tasks can require as much as 3GB of RAM. You generally need to allow for 1.3GB of RAM per core/thread in use to avoid running into memory limitation issues. But that shouldn't result in the errors that you are seeing.
It's also worth checking the rails of your power supply- if the voltages are dropping under load, that can also result in "process got signal 11" errors.
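The RAM rule of thumb above is easy to turn into arithmetic. The sketch below uses only the two figures quoted in this post (1.3GB per thread in use, up to 3GB for a heavy Task). The 12-thread, 16GB machine in the usage note is an illustrative assumption, and the function names are made up.

```python
GB_PER_THREAD = 1.3       # rule-of-thumb allowance per core/thread in use
WORST_CASE_TASK_GB = 3.0  # upper figure quoted for a single heavy Task


def ram_needed_gb(threads, gb_per_thread=GB_PER_THREAD):
    """Estimated RAM needed to keep `threads` Tasks running at once."""
    return threads * gb_per_thread


def max_threads(installed_gb, gb_per_thread=GB_PER_THREAD):
    """How many Tasks fit in `installed_gb` at the given per-Task allowance."""
    return int(installed_gb // gb_per_thread)
```

For a hypothetical 12-thread machine with 16GB, `ram_needed_gb(12)` is about 15.6GB, which leaves almost nothing for the operating system, and `max_threads(16, WORST_CASE_TASK_GB)` gives only 5 if every Task happened to be a heavy one.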
Grant
Darwin NT
Macuilxochitl

Joined: 11 Oct 08
Posts: 13
Credit: 134,700
RAC: 0
Message 96775 - Posted: 25 May 2020, 6:10:19 UTC - in response to Message 96744.  

Well, I let my work unit finish and then Reset the project, as suggested. It has been cranking for 3.5 hours and the 'tasks failed' has remained at 82, so maybe that did the trick. I'll keep an eye on it. It could be the rocky upload may have been to blame. I stuck a USB wifi dongle on the machine and used my neighbor's much faster internet connection to download my work, which went quickly without any interruptions, so maybe it was cleaner.

I've got some fans on the way, maybe they will reduce my temps from ~70C, though that doesn't seem excessive. I don't think the RAM is a limiting factor, I've never gone past 10 GB out of 16, but I guess I could stick in another 8 GB at some point if it becomes an issue. I really hope the power supply is not an issue, those suckers are expensive at the moment. Also, maybe a source of error is that my Geforce 730 GT is a refurb I got for $20. But I get the impression that the GPU isn't that important for Rosetta. Still, Nvidia is a crummy choice for Linux, and I get screen weirdness way too often. I wish I could find a cheap AMD processor and use the open source driver, but unfortunately, desktop display adapters are largely a thing of the past. Folks that are not gaming use onboard graphics, which is actually faster than this damn card anyway, and it is hard to find a decent video card for < $80 or so.
Macuilxochitl

Joined: 11 Oct 08
Posts: 13
Credit: 134,700
RAC: 0
Message 96776 - Posted: 25 May 2020, 6:15:22 UTC - in response to Message 96775.  

Correction: AMD video card, not AMD processor.
Macuilxochitl

Joined: 11 Oct 08
Posts: 13
Credit: 134,700
RAC: 0
Message 96801 - Posted: 27 May 2020, 1:05:51 UTC - in response to Message 96776.  

Man, this simultaneously sucks and blows, not unlike the two case fans I just installed to make boinc work better.

Errors are ongoing. For a minute I thought I was through that, but I've gone from 82 to 115 since I reset the project, without increasing my completed units from 26. And now it looks like my communication with rosetta is mucked up again, the Transfers tab shows my Download is pending. Specifically: "Download: pending (project backoff: 00:30...." This is using a wifi dongle and my neighbor's much faster wire, speedtest says: Download: 42.18 Mbit/s

Oh well, at least my new blue LED fan is pretty. Guess I'll try some memtest before I give up on the project. Ah, the download just restarted and mostly went comfortably until it got to the last item in the Transfers tab, then it stopped again, but after a minute it retried and now I'm cranking again. My temps went right back up to 73C despite the new fans, but it looks like I'm using a greater percent of my CPU according to htop. Ah crap, while I was typing my CPU use dropped off again and now I'm getting: Status: Communication deferred... Oh well. I'll report back after running memtest.
Macuilxochitl

Joined: 11 Oct 08
Posts: 13
Credit: 134,700
RAC: 0
Message 96802 - Posted: 27 May 2020, 1:16:42 UTC - in response to Message 96801.  

https://imgur.com/ViVw0CA

Oy.
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 96810 - Posted: 27 May 2020, 13:11:07 UTC
Last modified: 27 May 2020, 13:11:34 UTC

Looks like you are getting download errors:

<core_client_version>7.16.6</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
  <file_name>database_357d5d93529_n_methyl.zip</file_name>
  <error_code>-120 (RSA key check failed for file)</error_code>
  <error_message>signature verification failed</error_message>
</file_xfer_error>
</message>
]]>


Perhaps you have an anti-virus that is blocking the zip file from downloading?
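If there are many failed Tasks to go through, the file name and error code can be pulled out of this kind of error text with a short script. The sketch below works on a copy of the fragment quoted above and uses a regular expression rather than an XML parser, on the assumption that the full Task output is not a single well-formed XML document. The function name is made up.

```python
import re

# Copy of the transfer-error fragment quoted above.
ERROR_TEXT = """\
<file_xfer_error>
  <file_name>database_357d5d93529_n_methyl.zip</file_name>
  <error_code>-120 (RSA key check failed for file)</error_code>
  <error_message>signature verification failed</error_message>
</file_xfer_error>
"""


def parse_transfer_errors(text):
    """Return (file name, numeric code, reason) for each transfer error found."""
    pattern = re.compile(
        r"<file_name>(?P<name>.*?)</file_name>\s*"
        r"<error_code>(?P<code>-?\d+)\s*\((?P<reason>.*?)\)</error_code>",
        re.S,
    )
    return [(m["name"], int(m["code"]), m["reason"]) for m in pattern.finditer(text)]


errors = parse_transfer_errors(ERROR_TEXT)
```

Grouping the results by file name would show whether one download keeps failing its signature check or whether the failures are spread across many files.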
Rosetta Moderator: Mod.Sense
Macuilxochitl

Joined: 11 Oct 08
Posts: 13
Credit: 134,700
RAC: 0
Message 96818 - Posted: 28 May 2020, 0:38:56 UTC - in response to Message 96810.  

No, I'm using Linux, I don't use AV.

But I guess I may have learned why I was getting so many failed units, if I haven't figured out my DL issues.

I hadn't been able to run memtest because even though it was installed it wasn't one of my Ubuntu grub choices, I'm not sure why. Maybe I installed the system in UEFI mode, I don't know if that is a factor.
But I booted from a live Debian image and was able to run memtest, and it started kicking up errors pretty quickly.

I have a G.SKILL Ripjaws V Series 16GB 288-Pin DDR4 SDRAM DDR4 3200 stick and was running it at its XMP-2 profile, which is the rated speed of the RAM. So I set the RAM speed to 2133 MHz, the lowest speed, and it passed memtest. I've been running it at that speed for 5 hours and have gotten no further errors. Also, for some reason I was using a little of my swap partition even though I always had plenty of reserve memory. Now I'm using 8 of 16GB of RAM, but no swap at all.

After I'm finished using the machine for the day I'll reset the memory to its XMP 1 profile (which is probably ~2933 MHz or so) and run some memtest on it. If it is stable maybe I'll try pushing it up just a little bit. I'm not sure how much memory speed affects BOINC crunching speed.

I'm just relieved that it doesn't look like my PSU is at fault, that would have been expensive to fix.
Macuilxochitl

Joined: 11 Oct 08
Posts: 13
Credit: 134,700
RAC: 0
Message 96851 - Posted: 29 May 2020, 23:58:46 UTC - in response to Message 96818.  

I dialed the memory timings down from 3200MHz to 2933MHz, which seems like the maximum I can squeeze out of this stick and still pass a round of memtest. But I'm still getting a few errors. Over maybe 10 hours I've gone from 133 total errors to 136.

How bad is that? Are errors to be expected or do any errors indicate a serious issue? Maybe I should dial the RAM down to 2800 or try to RMA the stick?
Profile Grant (SSSF)

Joined: 28 Mar 20
Posts: 1762
Credit: 18,534,891
RAC: 176
Message 96857 - Posted: 30 May 2020, 0:55:26 UTC - in response to Message 96851.  
Last modified: 30 May 2020, 0:59:50 UTC

How bad is that?
Extremely bad. You should not get any errors. However there will be some tasks that are cancelled by the project that will be classed as an error, and there will be some tasks that do error out.
Actual computation errors (unless there is a batch of bad Work Units) should be 2% or less of your Total Task number.
So you should have no more than 2 Computation errors for that system.

Some of your errors are related to the download issues, but the others are computation related and show memory problems (or data corruption).
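The 2% guideline above amounts to a simple budget. A minimal sketch, with made-up function names and integer arithmetic to avoid rounding surprises:

```python
def error_budget(total_tasks, percent=2):
    """Largest number of computation errors that stays within `percent` of all Tasks."""
    return total_tasks * percent // 100


def within_budget(computation_errors, total_tasks, percent=2):
    """True when the error count is at or below the budget."""
    return computation_errors <= error_budget(total_tasks, percent)
```

For 100 Tasks, `error_budget(100)` is 2. Taking the 136 errors reported above together with, purely as an assumption, the 26 completed Tasks mentioned earlier in the thread, `within_budget(136, 162)` is false by a wide margin. Not all of those 136 are computation errors, since some come from the download problems, so this overstates things somewhat.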



Maybe I should dial the RAM down to 2800 or try to RMA the stick?
You need to revert your CPU & memory clocks and voltages to stock values. Computation Errors show that the overclock is not stable.
Grant
Darwin NT
Sid Celery

Joined: 11 Feb 08
Posts: 2207
Credit: 42,137,986
RAC: 21,421
Message 96877 - Posted: 30 May 2020, 14:32:08 UTC - in response to Message 96851.  
Last modified: 30 May 2020, 14:39:40 UTC

I dialed the memory timings down from 3200MHz to 2933MHz, which seems like the maximum I can squeeze out of this stick and still pass a round of memtest. But I'm still getting a few errors. Over maybe 10 hours I've gone from 133 total errors to 136.

How bad is that? Are errors to be expected or do any errors indicate a serious issue? Maybe I should dial the RAM down to 2800 or try to RMA the stick?

I didn't understand the relevance of this earlier in the thread, so I didn't want to interfere, but I just looked up your CPU and it says it can't access RAM faster than 2667.
Obviously it has been, with errors, but it sounds like a good idea to step it down until it's fully successful. There's often a margin, so 2800 is worth a try next.
RMA might be tricky if it's only failing at a speed you already know your CPU can't handle in the first place.

The other thing is you have a 6-core/12-thread processor with 16GB of memory. With the project's RAM demands recently, you'll struggle to run more cores without more. You mentioned you could add another 8GB - that sounds like a good idea too.
Also, ensure you have the latest BIOS. Some updates improve the stability of higher speed RAM.
Macuilxochitl

Joined: 11 Oct 08
Posts: 13
Credit: 134,700
RAC: 0
Message 97263 - Posted: 6 Jun 2020, 21:32:38 UTC - in response to Message 96877.  

My $85 Ryzen 5 1600 AF can handle higher RAM speeds even though AMD rather conservatively says that its rated speed is only 2667MHz. From what I've read the motherboard is more of a constraint than the CPU, at least up to about 3200MHz. My motherboard's QVL list mentions many kits that have been tested to run substantially faster than 2667MHz. https://www.asrock.com/mb/AMD/B450M%20Pro4/index.us.asp#Memory Of the 290 RAM kits that ASRock tested, 61 of them were rated at 3000 or better and none of them tested as running slower than 2933MHz, and all of the 29 3200MHz kits apparently ran at their rated speeds, and the 7 tested '2933MHz' sets also tested running at their rated speeds. I do so love playing with spreadsheets!

I reclocked my memory down to 2800MHz and have not gotten any additional errors over the last few days, running maybe 6-8 hours a day, so I guess that is where I'll stay. I am a bit disappointed that my memory does so much worse than all the other relatively fast sticks tested by ASRock, but probably it won't hurt my folding unduly.

I'm not about to overclock my CPU, with the stock AMD processor fan I'm hitting rather high temps (80C) even at the rated default clock speed of 3200MHz (max burst speed is apparently 3700MHz without overclocking, but I've never seen the processor go faster than 3500MHz). On hot days I even reduce the CPU limits in BOINC preferences to keep the machine from overheating. Given my unimpressive performance I apparently wasn't too lucky in the hardware lottery, but what the heck, I built the system for about $300 and tax, if you don't count the case and power supply I recycled from an old Athlon XP 1700+ build.

I'm only using 9GB of RAM now, and have never seen it go over 10GB on this (or any) machine, so I have 5.6GB in the bank, but if I ever see the RAM usage go over 12GB I'll order another stick, RAM prices seem to be falling at the moment after climbing for a few months.