CPU Optimization, GPU utilization: so sad!

Michael G.R.

Joined: 11 Nov 05
Posts: 264
Credit: 11,247,510
RAC: 0
Message 60555 - Posted: 8 Apr 2009, 16:38:32 UTC
Last modified: 8 Apr 2009, 16:39:48 UTC

Hi Mod.Sense,

Indeed, I am not expecting the kind of improvements with Rosetta@home that Einstein@home saw. I'm always being conservative and estimating a 20-40% improvement, not an 800% one.

I do understand that cleaning up and rewriting the algorithms without any SSEx work can make a big difference, and I'm all for that too. I just think SSEx is another potentially powerful tool that should probably be tried (or at least if it's not applicable at all, I'd like to know why).

The executive summary of what I'm saying is:

1) SSEx extensions are not cutting edge (esp. SSE and SSE2). They've been in CPUs for many years, so legacy support shouldn't be a deal-breaker.

2) The Rosetta code changes a lot, but there are probably parts of it that are static.

F.ex., maybe the part of the code that decides where to search in 'protein shape space' is being tweaked all the time to make it 'smarter', but maybe the part that calculates the energy of one particular 3D protein conformation stays the same (because the laws of physics don't change). Maybe that 'energy calculator' is what eats up the most cycles because of the complex floating point math, and maybe it could use SSE2 for a big speedup.

3) It might be possible to validate a change to a single sub-module more easily than many optimizations all over the code.

f.ex. You plug a certain data-set through the current 'energy calculator' and look at the results. Then plug the same data-set through an optimized SSE2 'energy calculator' and see if the results are the same. If they are over a very large data-set, this means you might be able to plug this new faster module into the big Rosetta@home code edifice without changing the rest. (I'm speculating here, but if R@H is modular, this might be possible).
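
A minimal sketch of that kind of side-by-side check, written in Python for readability rather than in the project's own language. Everything in it is an assumption made for illustration: `pair_term` is a toy stand-in and not Rosetta's energy function, and `energy_lanes` only imitates what an SSE2 loop might do by accumulating into four partial sums. Because reordering a floating-point sum can change the last few digits, the comparison uses a relative tolerance rather than exact equality.

```python
import math
import random


def pair_term(p, q):
    """Toy pair energy (hypothetical): a softened inverse-square term."""
    r2 = sum((a - b) ** 2 for a, b in zip(p, q))
    return 1.0 / (1.0 + r2)


def energy_reference(coords):
    """Reference version: one running total, pairs visited in order."""
    total = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            total += pair_term(coords[i], coords[j])
    return total


def energy_lanes(coords, lanes=4):
    """'Optimized' stand-in: same terms, accumulated in `lanes` partial sums."""
    partial = [0.0] * lanes
    k = 0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            partial[k % lanes] += pair_term(coords[i], coords[j])
            k += 1
    return sum(partial)


def validate(trials=50, atoms=30, rel_tol=1e-9, seed=1):
    """Run both versions over random conformations and compare within rel_tol."""
    rng = random.Random(seed)
    for _ in range(trials):
        coords = [tuple(rng.uniform(0.0, 10.0) for _ in range(3))
                  for _ in range(atoms)]
        if not math.isclose(energy_reference(coords), energy_lanes(coords),
                            rel_tol=rel_tol):
            return False
    return True
```

If `validate()` came back False for some data-set, that would be the cue to look closer before swapping the module in. How tight `rel_tol` needs to be is a question for the project scientists, not something this sketch can answer.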
ID: 60555 · Rating: 0
mikey

Joined: 5 Jan 06
Posts: 1895
Credit: 9,179,826
RAC: 3,209
Message 60567 - Posted: 9 Apr 2009, 11:13:03 UTC - in response to Message 60555.  

At my work we were just sent a new program by the State to do our work. It was a new version of an old program; they had modified one portion of it to better show what they wanted in these lean budget times. They only changed one small part of the program, then ran the tests, and all was fine, so they sent it out with a 2 week deadline for the various localities to install it. Problem is, the code they changed didn't JUST affect that small section, it affected several other parts too.

Where this is going is that there is no 'standard' for how to write a program. Yes, there are language standards, but each individual programmer has his or her own style, whatever is easiest for them. That also means that changing a program someone else wrote is extremely difficult and time consuming, because you have to follow each and every little line to see what else it affects. Some programmers like the modular approach, like you ask about. Others think the old adage of 'why write twice when you only have to write once' is best. That all means that you have a very difficult time optimizing someone else's coding!

I have a friend that used to be a programmer for a bank. When the ATMs went down she always had to go down, or in some rare cases go online, and write new code to fix them. She and her fellow programmers did not fix what was there, they ALWAYS added to it. So she just added lines and rem'd out the old lines. That made for a million lines of code in no time and no hope of it ever being smaller! That alone made the ATMs do things slower!!!! In the end the bank was bought out by another bank and the new bank's programming was used in the ATMs. The bank she worked at was First American, a HUGE bank. If they couldn't afford a rewrite, Rosie may never, without some strong volunteer support.
ID: 60567 · Rating: 0
Michael G.R.

Joined: 11 Nov 05
Posts: 264
Credit: 11,247,510
RAC: 0
Message 60574 - Posted: 9 Apr 2009, 14:28:12 UTC

No need for a 10th example. I get it. Changing code can have unintended effects. But the Rosetta team is changing code all the time, and you would expect they know how to do it and how to test things to make sure they work (internally, and then on Ralph). All your examples about how changing code can create problems also apply to non-SSE changes.

SSE isn't black magic, it's not a little-used thing that is undocumented. It's been around for a decade and is in many other scientific programs.

I'm not saying I know something they don't, I'm not saying I know in advance it'll work. I'm just saying they should look into it if they haven't because the potential benefits could be big for the science of the project.
ID: 60574 · Rating: 0
mikey

Joined: 5 Jan 06
Posts: 1895
Credit: 9,179,826
RAC: 3,209
Message 60602 - Posted: 11 Apr 2009, 8:05:20 UTC - in response to Message 60574.  

And I don't think anyone is disagreeing with you, we are all just trying to point out the reasons why the Project may not just jump in with both feet supporting your idea. I think if the Project had a volunteer willing to do the work right now, it would already be started. But coders can make a TON of money, my friend was making almost 1/4 million a year, and there is no time frame as to how long it will take. It takes as long as it takes. Paying for that in BOINC projects is usually not an option. Finding a volunteer coder that the project trusts is another issue that must be dealt with. What if the code screws things up so bad they have to just dump their work and start over, can you imagine the bad rap they will get? It is not an easy decision, but I do agree probably a worthwhile one if it works.
ID: 60602 · Rating: 0
Nothing But Idle Time

Joined: 28 Sep 05
Posts: 209
Credit: 139,545
RAC: 0
Message 60693 - Posted: 17 Apr 2009, 13:28:42 UTC

I haven't posted for a long time and suddenly feel the urge to interject my two cents worth. This is my opinion only and is not meant to criticize anyone specifically.

First, optimizing the code is not a new topic and arises over and over. The project team has stated in previous threads that they did not expect any real benefit from SSE/2 optimizations versus just leaving the code as is. I accept their decision whether it is accurate or not. Why can't the public accept this?

People need to differentiate between 1) using a static analytic methodology (referred to in this thread as the "algorithm") to discover protein conformations and thereby medical treatments and 2) testing the algorithm's effectiveness while constantly improving on its predictive behavior. A static algorithm likely can be expected to be optimized but a constantly changing algorithm IMO is probably not worth optimizing since the optimization could be lost in the next round of improvement.

Second, all the pleading and begging and posturing is based on an assumption that optimizing the Rosetta code will translate directly into saving countless lives sooner. This is a big leap for cause and effect, it's altruistic, wishful thinking not based on known quantities. No good scientific pursuit should be conducted with this kind of thinking. And apparently the project staff does not have the resources of time, money and expertise to do anything other than what they are now. Unless the staff is a bunch of idiots, if there was any substantive gain to be made from optimization one has to believe they would pursue it in their own best interest. We clients reserve the right to offer suggestions but if they are not accepted I don't think we should dwell on it ad nauseam.
ID: 60693 · Rating: 0
Michael G.R.

Joined: 11 Nov 05
Posts: 264
Credit: 11,247,510
RAC: 0
Message 60695 - Posted: 17 Apr 2009, 15:44:14 UTC - in response to Message 60693.  
Last modified: 17 Apr 2009, 15:51:41 UTC

First, optimizing the code is not a new topic and arises over and over. The project team has stated in previous threads that they did not expect any real benefit from SSE/2 optimizations versus just leaving the code as is. I accept their decision whether it is accurate or not. Why can't the public accept this?


Could you please show me where the project said that? It is possible that I missed it (or forgot it?), and other people in this thread seem to have missed it too, because you are the first to mention that the project has said that they didn't expect any "real benefit" from SSE/SSE2.

People need to differentiate between 1) using a static analytic methodology (referred to in this thread as the "algorithm") to discover protein conformations and thereby medical treatments and 2) testing the algorithm's effectiveness while constantly improving on its predictive behavior. A static algorithm likely can be expected to be optimized but a constantly changing algorithm IMO is probably not worth optimizing since the optimization could be lost in the next round of improvement.


Isn't it what I did above? I said that the part which is constantly changing probably can't be optimized easily, but that maybe parts of the code (which could plausibly be some of the most CPU intensive parts of the molecular dynamics engine) might be static and worth optimizing.

Second, all the pleading and begging and posturing is based on an assumption that optimizing the Rosetta code will translate directly into saving countless lives sooner. This is a big leap for cause and effect, it's altruistic, wishful thinking not based on known quantities. No good scientific pursuit should be conducted with this kind of thinking.


I disagree. A lot of scientific pursuit - specifically those that deal more directly with medical science - should be conducted with the ultimate goal of helping people in mind, and with urgency to match. If you're trying to understand how neutron stars work, take your time or not, people won't die. But if you're working on a cure for cancer, it does make a difference if you don't grab opportunities to speed up your research.

In fact, I'd love to hear your reasoning on this... How can predicting the shapes (and thus functions) of proteins and designing new ones computationally NOT save countless lives down the road (directly or indirectly, because other techniques will build on this)? It would be a HUGE fundamental breakthrough. And if that is correct, it then follows that getting "there" (wherever that is) sooner will help more people, in the same way that if antibiotics had been discovered a few years sooner than they were, people who died in the meantime could have been saved.

If they're possible, SSE optimizations would only be a small part of "getting there", of course. But they would help in the following way: With the same amount of CPUs, you could run X# more models than you otherwise could in a fixed amount of time, thus on average increasing the precision of your predictions. If that difference is high enough, this could have a bigger impact than some other more direct changes to the algorithms, afaict.

And apparently the project staff does not have the resources of time, money and expertise to do anything other than what they are now. Unless the staff is a bunch of idiots, if there was any substantive gain to be made from optimization one has to believe they would pursue it in their own best interest. We clients reserve the right to offer suggestions but if they are not accepted I don't think we should dwell on it ad nauseam.


If the project has indeed looked at this seriously and rejected it, they obviously have done a terrible job of communicating it to the volunteers who make Rosetta@home possible, because most of us here (including the moderator) don't seem to be aware of this statement, which would be easy to point to.

Things change. Not that long ago the Rosetta code was in Fortran (iirc), and a few years before that, a decent ratio of total teraFLOPS probably came from non-SSEx CPUs. Maybe with the new code, new algorithms and new CPUs, these optimizations that weren't possible before would now make more sense? Or maybe not. I just don't know.

But there's ZERO harm in having passionate project volunteers speculate about what might help the project, and if the project people never come here and tell us what's what, there's a good chance that people will keep speculating in good faith.
ID: 60695 · Rating: 0
Paul D. Buck

Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 60696 - Posted: 17 Apr 2009, 16:36:12 UTC

I have zero plans to search some 4 years of forum posts to find the definitive statement that you demand. But almost from the day that Rosetta opened this has been a topic. I was one of the early joiners of the project (id: 269) and back then was one of the more active posters and debaters (and on several projects).

As others have stated, this issue has been raised, discussed, and like it or not it is not a likely source of significant gain for the likely pain.

If you really want to help, add Ralph to your project list and help debug 1.62 Mini ...

Passion is good. Interest is good. Beaten up dead horses just stink ...
ID: 60696 · Rating: 0
Michael G.R.

Joined: 11 Nov 05
Posts: 264
Credit: 11,247,510
RAC: 0
Message 60702 - Posted: 17 Apr 2009, 19:32:38 UTC
Last modified: 17 Apr 2009, 19:38:26 UTC

If the post you refer to is 4 years old, this horse is quite alive. A lot has changed since then.

Fact is, the people who say "it won't work! forget it!" don't seem to know any more than the people who say "it could work!", which is why it would be very helpful for the project to say something about it. If people keep asking the same question, it's obviously because they haven't had an answer.
ID: 60702 · Rating: 0
Paul D. Buck

Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 60705 - Posted: 17 Apr 2009, 21:40:03 UTC - in response to Message 60702.  

If the post you refer to is 4 years old, this horse is quite alive. A lot has changed since then.

Not necessarily. The fact that the code has changed does not mean that the fundamentals of the code's organization have changed as well. If the fundamental structure is the same, then the same reasons that indicated 4 years ago that the vectorization instructions would not speed up the algorithm mean they won't speed up the new incarnation either.

You cling to the myth that the vectorizing instructions are a solution to all speed problems, or that their magic will speed up all programs. This is not the case. There are some problems and some algorithms that are not amenable to vectorization. Fundamentally, if these instructions would have a significant impact then this application would be a superb candidate for CUDA. The project has stated that it isn't ...
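
To make the distinction concrete, here is a small sketch in Python (chosen for readability, not because the project uses it). The function names are made up for illustration and nothing here is taken from the Rosetta source; whether its hot loops look more like the first case or the second is exactly the open question. `scale_in_lanes` handles four independent elements per step, the way one SSE register holds four single-precision floats, while `running_recurrence` needs the previous result before it can compute the next one, so it does not split across lanes in any direct way.

```python
def scale_all(xs, k):
    """Each output depends only on its own input: iterations are independent."""
    return [k * x for x in xs]


def scale_in_lanes(xs, k, lanes=4):
    """Same result, computed `lanes` elements per step (imitating one SIMD op)."""
    out = []
    for start in range(0, len(xs), lanes):
        chunk = xs[start:start + lanes]
        out.extend(k * x for x in chunk)  # one "vector" multiply per chunk
    return out


def running_recurrence(xs, a):
    """y[i] = a * y[i-1] + x[i]: every step waits on the previous result."""
    y = 0.0
    out = []
    for x in xs:
        y = a * y + x
        out.append(y)
    return out


data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(scale_in_lanes(data, 2.0) == scale_all(data, 2.0))  # True
print(running_recurrence([1.0, 1.0, 1.0], 0.5))           # [1.0, 1.5, 1.75]
```

Restructuring can sometimes rescue the second kind of loop, but that is a rewrite of the algorithm rather than a compiler switch.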

Four years ago they stated that the experiments with the instruction set enhancements did not give significant improvements or introduced issues that I discussed in other posts.

Fact is, the people who say "it won't work! forget it!" don't seem to know any more than the people who say "it could work!", which is why it would be very helpful for the project to say something about it. If people keep asking the same question, it's obviously because they haven't had an answer.

Again, this is something that I used to do for a living, that is computer systems engineering. At one point in my life I spent 12-18 hours a day documenting BOINC and all things related, including Rosetta@Home. I have studied computers and owned them since 1975, have a BS in Computer Science and a masters in Software Engineering. So, yes, maybe I am conceited, but I think I do know a little bit about this subject ...

Yes Virginia there is a Santa Claus, but this horse is dead ...
ID: 60705 · Rating: 0
Michael G.R.

Joined: 11 Nov 05
Posts: 264
Credit: 11,247,510
RAC: 0
Message 60707 - Posted: 17 Apr 2009, 22:06:43 UTC - in response to Message 60705.  

Not necessarily. The fact that the code has changed does not mean that the fundamentals of the code's organization have changed as well. If the fundamental structure is the same, then the same reasons that indicated 4 years ago that the vectorization instructions would not speed up the algorithm mean they won't speed up the new incarnation either.


That's very possible. But maybe it's not the case. All I'm saying is that if the last time we heard about it was 4 years ago - and it didn't seem to be that clear since people have asked about it ever since - then an update on the situation might not be too much to ask, since it seems to be a recurrent question that interests lots of the volunteers.

You cling to the myth that the vectorizing instructions are a solution to all speed problems, or that their magic will speed up all programs. This is not the case. There are some problems and some algorithms that are not amenable to vectorization. Fundamentally, if these instructions would have a significant impact then this application would be a superb candidate for CUDA. The project has stated that it isn't ...


Don't tell me what I cling or don't cling to, please.

All I've said here is that IF it could work, the benefits could be XYZ, and that there are some reasons that aren't good enough to be dealbreakers IMO (legacy CPUs, etc).

As for CUDA and the PS3/XBOX, etc.. It was my understanding that the big problem there was that this would require a separate code base to build and maintain, that Rosetta@home uses too much RAM, and that those platforms aren't mature (and that until recently the code was messy and Fortran). SSEx, if it can be used, could be implemented in a single code base, and it's quite mature.

Four years ago they stated that the experiments with the instruction set enhancements did not give significant improvements or introduced issues that I discussed in other posts.


All I've been asking is to know if it could work or not. I'd love to see this 4-year-old post to see what it contains, but apparently it's lost.

Again, this is something that I used to do for a living, that is computer systems engineering. At one point in my life I spent 12-18 hours a day documenting BOINC and all things related, including Rosetta@Home. I have studied computers and owned them since 1975, have a BS in Computer Science and a masters in Software Engineering. So, yes, maybe I am conceited, but I think I do know a little bit about this subject ...


I didn't say that nobody knew anything about coding or BOINC, it just seems that in this thread all that everybody can do is either ask questions (and not get answers), or say that 4 years ago someone said something about it. It's all so unspecific as to be unsatisfying as an answer to this question, IMHO.

Is there no way to get to the bottom of this, so that in all future instances we can point people to the answer when they ask the question again?

People here seem to want to build strawmen, as if I'm rooting for SSEx like it's a sports team, and so they have to root for non-SSEx so this becomes a binary adversarial thing. But I don't care about petty debates, I'm just REALLY curious to know if this could help improve the science of Rosetta because I happen to care a lot about this project (as you can see from my number of credits) and I want what's best for it. If SSEx can't work, so be it, let's move on. I just haven't heard anything convincing on that front yet and I'd love for someone to shine some real light on this...
ID: 60707 · Rating: 0
Mod.Sense
Volunteer moderator

Joined: 22 Aug 06
Posts: 4018
Credit: 0
RAC: 0
Message 60708 - Posted: 17 Apr 2009, 23:49:25 UTC
Last modified: 18 Apr 2009, 0:01:37 UTC

There are literally scores of threads on the topic.
http://www.google.com/search?hl=en&q=sse3+optimize+site%3Abakerlab.org&btnG=Google+Search&aq=f&oq=

So the question becomes how much can be said about it?

Dr. Baker talked about making the entire application public in 2005.
https://boinc.bakerlab.org/rosetta/forum_thread.php?id=349&nowrap=true#2771. He posted that the highest level optimizations available by the compilers used on each platform are used.

Mats made several posts about the subject in the thread on 64bit optimizations and on cache effects.

Who? posted a lot of information on various performance ideas.

2007 started with a thread on CPU optimization

Discussion on QX6700 or XEONs turned into a discussion on SSSE3.

I've posted about it enough times I can't even find my own posts on the related subjects. There are many optimization techniques, many potential computation platforms, many target operating systems, and many hardware combinations that have all been discussed numerous times in the past. Follow the Google link above and start a thread that indexes all of the discussions on the subject and in the course of doing so you will find references to posts on Ralph, to posts from project scientists, to posts from industry experts, and posts from people that have done optimizations for other BOINC projects, and posts from other enthusiasts that have studied it and decided not to further pursue it.

Here are some other keywords of interest:
Optimize
64bit
GPU
PS3
CUDA
SSE
SSE2
SSE3
SSE4
processing
efficiency
bottleneck
faster
improvements
virtual memory


...and many of the discussions degraded into credit battles and were deleted due to the resulting ...um... "disregard for the posting requirements".
Rosetta Moderator: Mod.Sense
ID: 60708 · Rating: 0
Paul D. Buck

Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 60715 - Posted: 18 Apr 2009, 3:53:48 UTC

@Mod.Sense

Thank you for the research.

@Michael,

At one point I had resolved not to address this again ... alas, the weakness of the mind.

The idea that everything changes in 2 years or 4 years or even 10 years in computers is another myth. When I used to teach I pointed out to my students in any number of classes that there was little new under the sun. Want to know what is in store for desktop computers, look to the history of the "Mainframe" and the supercomputer ... whatever was used in those machines has made it into desktops ... we are now at the stage where we are "bolting-on" vector processors ... aka GPUs ... nothing new under the sun.

The same is true of coding and computer languages. And algorithms. If the algorithm that underpins Rosetta was not amenable to vectorization 4 years ago, it is not any more amenable now. And that is the point that you refuse to acknowledge. Yes, SSE and all its antecedents make some things faster. But not all things.

If you don't want to give people the impression that you cling to things, then don't do so ...

You have been told, and now have historical links to the past discussions since you do not seem to trust those of us who have taken the time to discuss this with you...

The question is not whether the technology of SSE is mature, and that has never been an argument against its use by anyone here that I can recall. I know I never said it was not mature enough for consideration. It will, however, complicate the development process and likely not give significant performance gains.

People can ask questions and get answers. And several people, myself included, have tried to answer yours. With you popping back with "but I know it can work".

And, the answer of no is only unsatisfying to the person that is not open to alternatives. The question under your question is how to get from where we are to somewhere else the fastest. Your answer, against all explanations is that SSE and enhancements like that could save the day. But that is not the only way to get more rapidly from here to there.

If you look long enough and hard enough I am sure you can find where I posted the sea story where a program I was working on only failed when compiled with "optimizations" on ... others I have worked on returned invalid results when "optimized" ... we can spend hours talking about that though it would be easier on me if we did it on Skype than here as this typing can be hard for me when I am not doing that well ... but it is usually boring arcane drivel if you are not interested in minutia ...
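
One common way for this to happen (a general illustration, not a reconstruction of the programs described above) is that floating-point addition is not associative, so any optimization that changes the order of a sum can change the answer. The sketch below is plain Python with a made-up function name; the values are chosen so the effect is visible in double precision.

```python
import math


def sum_in_order(values):
    """Add values strictly left to right, one running total."""
    total = 0.0
    for v in values:
        total += v
    return total


original = [1e16, 1.0, -1e16]   # the 1.0 is absorbed by 1e16, then cancelled away
reordered = [1e16, -1e16, 1.0]  # same numbers, different order

print(sum_in_order(original))   # 0.0
print(sum_in_order(reordered))  # 1.0
print(math.fsum(original))      # 1.0, the correctly rounded sum
```

Neither ordering is a compiler bug; the two results simply round at different points, which is why a reference-versus-optimized comparison usually needs an agreed tolerance.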

Mod.Sense did provide a laundry list of links so you can go look at them so that you can see the answers there are almost certainly what we have been giving here ...

I am not rooting for not using SSE, I am only interested in getting from here to there, and to do that I offer my computers, and my time on the boards answering questions as best as I can. IF you really and truly want to help Rosetta get from here faster, then sign up for Ralph. The work is intermittent, and you have to do extra to log errors and tell them of the bad things that happen ... but there is where you can really make a difference ...

Oh, and I never measure a person by their credits for the simple reason that I know how I got so many and it has little to do with me ... it has to do with buying more computers. There is that wonderful parable about this very topic. Of course, I do not disrespect people just because they contribute either ... I just pay it no mind because crunching for projects is easy ...
ID: 60715 · Rating: 0
Nothing But Idle Time

Joined: 28 Sep 05
Posts: 209
Credit: 139,545
RAC: 0
Message 60716 - Posted: 18 Apr 2009, 6:53:37 UTC

To Michael G.R. -- Nobody here wants to denigrate your ideas, we can only provide history as we remember it. If you need answers to questions that nobody in the fora can provide you should plead directly to David Kim and company.
ID: 60716 · Rating: 0
Michael G.R.

Joined: 11 Nov 05
Posts: 264
Credit: 11,247,510
RAC: 0
Message 60720 - Posted: 18 Apr 2009, 18:20:52 UTC
Last modified: 18 Apr 2009, 18:26:51 UTC

Mod. Sense:

Thanks for the links! I'll check those out.

Paul:

We seem to be going in circles, so it's probably best to leave it at that.

Idle Time:

That's what I was starting to think of doing. Thanks for the suggestion.

To everybody:

I appreciate your time. Thanks.
ID: 60720 · Rating: 0



©2024 University of Washington
https://www.bakerlab.org