Rosetta@home using AVX / AVX2 ?

Message boards : Number crunching : Rosetta@home using AVX / AVX2 ?

h
Joined: 30 Nov 08
Posts: 1
Credit: 51,212
RAC: 0
Message 79422 - Posted: 19 Jan 2016, 7:14:30 UTC

This is my first post here.

The prevailing opinion is that Rosetta's code can't be open-sourced simply because it competes with other such software, mostly proprietary: others would exploit its openness in well-known ways, or, the other way round, opening it could expose stolen parts, or just ideas, covered by patents the project does not own.

Money leads the way, and we are just poor volunteers.

Because it all runs for free, Rosetta's developers don't care about efficiency.

A simple look at the executable shows that it is merely labelled x64 (64-bit); in reality it is just 32-bit, as some volunteers have already mentioned.

I also want to raise a point about the behavior of the watchdog timer in that application (3.65):

No heartbeat from core client for 30 sec - exiting

This message causes World Community Grid's Clean Energy Project 2 to restart its application and discard the elapsed time, for example after 12 hours of wasted electricity. I quit that project; it is just not fair. I must say that I am not in it for points, badges and other virtual goodies for Pavlov's dogs, but if a project is inefficient, just tell people that this is how it is and nothing can be done. Which I doubt.
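The watchdog behaviour described above can be sketched generically: the worker remembers when it last saw a heartbeat and gives up once the gap exceeds the timeout (30 s in the log line quoted), and a restarted task keeps its elapsed time only if that time was checkpointed. This is a hypothetical illustration; `HeartbeatWatchdog`, `Checkpoint` and `resumed_elapsed` are made-up names, not the BOINC API and not code from either project.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical sketch, NOT the BOINC API: a worker-side watchdog that
// remembers the last heartbeat and reports expiry once the gap is too long.
class HeartbeatWatchdog {
public:
    using Clock = std::chrono::steady_clock;

    HeartbeatWatchdog(std::chrono::seconds timeout, Clock::time_point now)
        : timeout_(timeout), last_beat_(now) {
        assert(timeout.count() > 0);
    }

    // Call whenever a heartbeat arrives from the client.
    void beat(Clock::time_point now) { last_beat_ = now; }

    // True once no heartbeat has been seen for longer than the timeout;
    // a real worker would then exit, as in "No heartbeat ... - exiting".
    bool expired(Clock::time_point now) const { return now - last_beat_ > timeout_; }

private:
    std::chrono::seconds timeout_;
    Clock::time_point last_beat_;
};

// Hypothetical checkpoint record: the elapsed time saved to disk.
struct Checkpoint {
    double elapsed_seconds;
};

// After a restart, elapsed time survives only if a checkpoint was written;
// without one the task starts again from zero.
inline double resumed_elapsed(const Checkpoint* last_checkpoint) {
    return last_checkpoint != nullptr ? last_checkpoint->elapsed_seconds : 0.0;
}
```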

Here, at least, the elapsed time is kept correctly, which is fair play.
But this in no way means that the time is spent efficiently. The volunteers' processors may just be producing a huge mass of random numbers rather than useful results.

So what? You can always switch to SETI and wait for a close encounter of the third kind.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,555,377
RAC: 6,312
Message 79424 - Posted: 20 Jan 2016, 9:19:17 UTC - in response to Message 79422.  

> Because it all runs for free, Rosetta's developers don't care about efficiency.


Thanks to the threads and discussions about optimization, I'm now convinced that they lack adequate resources (and, perhaps, the skills) to optimize it.
So, yes, open-source code may be a solution.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,555,377
RAC: 6,312
Message 79611 - Posted: 24 Feb 2016, 9:10:39 UTC - in response to Message 79422.  

> Rosetta's developers don't care about efficiency.
> A simple look at the executable shows that it is merely labelled x64 (64-bit); in reality it is just 32-bit, as some volunteers have already mentioned.
> This message causes World Community Grid's Clean Energy Project 2 to restart its application and discard the elapsed time, for example after 12 hours of wasted electricity.


This is the point.
The admins say that the computational power is "enough" and that they are not sure optimizations of the code (64-bit, SSEx, etc.) would benefit the project.
But they are using OUR electricity, and they have to use it as well as they can.
If rjs5 says that a simple 64-bit recompilation gives a 10-15% gain, they have to consider this change seriously.
I think it's a matter of respect for the volunteers.

dcdc
Joined: 3 Nov 05
Posts: 1831
Credit: 119,526,853
RAC: 5,737
Message 79613 - Posted: 24 Feb 2016, 9:52:31 UTC - in response to Message 79422.  
Last modified: 24 Feb 2016, 9:54:26 UTC

> This is my first post here.
>
> The prevailing opinion is that Rosetta's code can't be open-sourced simply because it competes with other such software, mostly proprietary: others would exploit its openness in well-known ways, or, the other way round, opening it could expose stolen parts, or just ideas, covered by patents the project does not own.
>
> Money leads the way, and we are just poor volunteers.

That's not the reason: it's not open source because it is a valuable asset that is sold commercially and provides an income stream. Keeping it closed probably also helps with controlling the codebase, since they control who can contribute to the software.


> A simple look at the executable shows that it is merely labelled x64 (64-bit); in reality it is just 32-bit, as some volunteers have already mentioned.

That's because BOINC requires a 64-bit version for 64-bit platforms, so the 32-bit version is in a wrapper.
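Whether a binary is really 64-bit, and which SIMD extensions it was built for, is decided at compile time, so one way to check is to have the program report it. A minimal sketch, assuming GCC or Clang (the `__AVX2__`, `__AVX__` and `__SSE2__` macros are predefined by those compilers when the matching instruction set is enabled); `build_target_summary` is a hypothetical helper, not part of Rosetta or BOINC.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: reports what this translation unit was compiled for,
// based on pointer width and GCC/Clang predefined macros. Every x86-64 build
// can rely on SSE2, because SSE2 is part of the 64-bit x86 baseline.
inline std::string build_target_summary() {
    std::string summary = (sizeof(void*) == 8) ? "64-bit" : "32-bit";
#if defined(__AVX2__)
    summary += ", AVX2 enabled";
#elif defined(__AVX__)
    summary += ", AVX enabled";
#elif defined(__SSE2__)
    summary += ", SSE2 enabled";
#else
    summary += ", no SSE2/AVX macros defined";
#endif
    return summary;
}
```

Building the same file with `-m32`, with default 64-bit flags and with `-mavx2` should typically give three different strings, which is a cheap way to confirm that a "recompile for x64/SSE/AVX" actually took effect.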

sgaboinc
Joined: 2 Apr 14
Posts: 282
Credit: 208,966
RAC: 0
Message 79618 - Posted: 24 Feb 2016, 14:31:14 UTC
Last modified: 24 Feb 2016, 14:33:42 UTC

As discussed in this thread:
CERN Engineer Details AMD Zen Processor Confirming 32 Core Implementation, SMT
https://boinc.bakerlab.org/forum_thread.php?id=6790

I'm thinking that CPU manufacturers are increasingly taking 'shortcuts': they simply deliver more 'cores' and push all the hard work of performance optimization onto software developers, who must use very specific and very limited processor features such as CUDA/OpenCL that require vectorised processing on very simplified cores.

It used to be that the top CPU manufacturers aimed to deliver better-performing CPUs (deeper and better instruction-level parallelism, more intelligent out-of-order execution, etc.), but this stance has changed so drastically that manufacturers now simply build *more, simpler cores* that provide very limited, specialised functionality (e.g. vector processing).

Many of the high-end ones are championing 'special' vector processing, e.g. OpenCL/CUDA/HSA; these notably include AMD and Nvidia.

Little effort is spent even attempting 'deeper and better instruction-level parallelism, more intelligent out-of-order execution etc.', as it requires *much more* effort on the part of CPU designers and manufacturers.

That said, all this vector processing / SIMD / OpenCL / CUDA / HSA / AVX etc. is not necessarily 'more efficient'; it takes a huge amount of power to run, in particular on high-end GPUs. And the manufacturers simply shift the responsibility for optimization onto software/application developers, while they get away with selling more 'cores'.
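To make 'vector processing' concrete: SIMD extensions such as AVX apply one instruction to several independent values at once. The minimal sketch below uses the Intel intrinsics `_mm256_loadu_pd`, `_mm256_add_pd` and `_mm256_storeu_pd` from `<immintrin.h>` when the compiler has AVX enabled (e.g. `-mavx`), and a plain scalar loop otherwise; `add_arrays` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Element-wise sum of two equally sized arrays. With AVX enabled, the main
// loop handles 4 doubles per instruction; the scalar loop covers the
// leftover tail, or everything when the build has no AVX.
inline std::vector<double> add_arrays(const std::vector<double>& a,
                                      const std::vector<double>& b) {
    assert(a.size() == b.size());
    std::vector<double> out(a.size());
    std::size_t i = 0;
#if defined(__AVX__)
    for (; i + 4 <= a.size(); i += 4) {
        __m256d va = _mm256_loadu_pd(&a[i]);  // unaligned load of 4 doubles
        __m256d vb = _mm256_loadu_pd(&b[i]);
        _mm256_storeu_pd(&out[i], _mm256_add_pd(va, vb));
    }
#endif
    for (; i < a.size(); ++i) out[i] = a[i] + b[i];  // scalar path / tail
    return out;
}
```

Compilers can often generate the same AVX instructions from the plain scalar loop on their own (auto-vectorization) when given flags such as `-O3 -mavx`, which is part of why a simple recompilation can help without touching the source.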

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,555,377
RAC: 6,312
Message 79621 - Posted: 24 Feb 2016, 17:21:11 UTC - in response to Message 79618.  

> Little effort is spent even attempting 'deeper and better instruction-level parallelism, more intelligent out-of-order execution etc.', as it requires *much more* effort on the part of CPU designers and manufacturers....
>
> That said, all this vector processing / SIMD / OpenCL / CUDA / HSA / AVX etc. is not necessarily 'more efficient'; it takes a huge amount of power to run, in particular on high-end GPUs.


You are thinking of the future, with ARM cores inside x86 CPUs or FPGA technology inside Xeon processors.
But SSEx extensions exist NOW and run on modern entry-level CPUs.
We are not talking about high-end GPUs (we understand that GPU code for Rosetta is impossible and OpenCL/CUDA is a "dream"), but CPUs could be used to the max!!

sgaboinc
Joined: 2 Apr 14
Posts: 282
Credit: 208,966
RAC: 0
Message 79622 - Posted: 25 Feb 2016, 0:44:58 UTC - in response to Message 79621.  
Last modified: 25 Feb 2016, 1:11:54 UTC



> You are thinking of the future, with ARM cores inside x86 CPUs or FPGA technology inside Xeon processors.
> But SSEx extensions exist NOW and run on modern entry-level CPUs.
> We are not talking about high-end GPUs (we understand that GPU code for Rosetta is impossible and OpenCL/CUDA is a "dream"), but CPUs could be used to the max!!


A recent processor, used in the now rather hotly discussed Intel Compute Stick:
http://www.engadget.com/2016/01/22/intel-compute-stick-2016-review/

did away with AVX: yup, it only goes up to SSE4.1/4.2, just more cores & 64 bits
http://ark.intel.com/products/87383/Intel-Atom-x5-Z8300-Processor-2M-Cache-up-to-1_84-GHz

and that's one of the latest models available today.

I wouldn't be surprised at all if Intel adopted a similar approach and introduced GPU-style 'co-processors' that use, say, OpenCL vectorised processing, i.e. thousands of simplified 'vector cores' (that do basic maths) but won't handle general programs.

Along with AMD, Nvidia and the rest, they would claim that their approach can achieve teraflops or petaflops on the GPU, but only for very basic, highly limited compute that addresses very specific use cases.

It is useless to have 100,000 vector processors/cores if the job at hand cannot be vectorized because of dependencies within the algorithm or code: it can then only run on 1 of those 100,000 cores, or in the worst case it can't run at all because of the limited functionality of those vector processors.

A simple function such as
f(x) = f(f(x-1))

would defeat any attempt to parallelize it, as each result depends on the output of a previous iteration.
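The recurrence as written, f(x) = f(f(x-1)), has no base case, so it would never terminate; a well-defined example of the same problem is any loop in which step n needs the result of step n-1. The sketch below contrasts such a loop with one whose iterations are independent; `square_all`, `iterate` and the step function 0.5 * x + 1 are arbitrary, hypothetical illustrations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Independent iterations: out[i] depends only on in[i], so a compiler can
// vectorize the loop and many elements can be computed at the same time.
inline std::vector<double> square_all(const std::vector<double>& in) {
    std::vector<double> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = in[i] * in[i];
    return out;
}

// Loop-carried dependency: step n needs the result of step n-1, so the
// steps must run one after another no matter how many cores are available.
inline double iterate(double x0, int steps) {
    assert(steps >= 0);
    double x = x0;
    for (int n = 0; n < steps; ++n) x = 0.5 * x + 1.0;  // arbitrary example step
    return x;
}
```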

sgaboinc
Joined: 2 Apr 14
Posts: 282
Credit: 208,966
RAC: 0
Message 79625 - Posted: 25 Feb 2016, 2:50:40 UTC

> I wouldn't be surprised at all if Intel adopted a similar approach and introduced GPU-style 'co-processors' that use, say, OpenCL vectorised processing, i.e. thousands of simplified 'vector cores' (that do basic maths) but won't handle general programs.


Actually, you don't really need to wait for that; the future is here today:

http://www.intel.com/content/www/us/en/processors/xeon/xeon-phi-detail.html
http://www.intel.com/content/www/us/en/high-performance-computing/high-performance-xeon-phi-coprocessor-brief.html
http://spectrum.ieee.org/semiconductors/processors/what-intels-xeon-phi-coprocessor-means-for-the-future-of-supercomputing
https://en.wikipedia.org/wiki/Xeon_Phi

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,555,377
RAC: 6,312
Message 79626 - Posted: 25 Feb 2016, 8:21:03 UTC - in response to Message 79625.  

> Actually, you don't really need to wait for that; the future is here today:
>
> http://www.intel.com/content/www/us/en/processors/xeon/xeon-phi-detail.html


Yeah, Phi is an incredible co-processor, but I think that if the admins wanted to use it, they would have to rewrite a large part of the code.
I'm talking about adding support for, for example, x64 and SSEx (with a SIMPLE recompilation of the source) and seeing what happens: testing this new app extensively on Ralph, debugging it, etc.
The first tests, last year, showed some improvements....

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,555,377
RAC: 6,312
Message 79627 - Posted: 25 Feb 2016, 8:46:33 UTC - in response to Message 79622.  

> A simple function such as
> f(x) = f(f(x-1))
>
> would defeat any attempt to parallelize it, as each result depends on the output of a previous iteration.


We know the problems with parallelizing the code, and we know that it's (almost) impossible in Rosetta.
We are discussing "little" optimizations.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,555,377
RAC: 6,312
Message 79835 - Posted: 2 Apr 2016, 18:10:46 UTC - in response to Message 77856.  

Dr. Merkwürdigliebe
Joined: 5 Dec 10
Posts: 81
Credit: 2,657,273
RAC: 0
Message 80432 - Posted: 25 Jul 2016, 21:18:45 UTC

Just curious... is there any progress worth speaking of? Any decision making? Any kind of code refactoring or optimization for the worst kludges?

I'm pretty sure everyone is sick of this being brought up again and again, just as I am sick and tired of waiting for a simple, definite answer from the people who are calling the shots...

Answer A: "We're working at it and here are the preliminary results..."
Answer B: "No can do."

Not that I'm thinking about leaving rosetta@home but I'm thinking about "emotional disinvestment".

There is a link in the navbar that says "Community". Let's face it, there is no such thing.

That's an 'A' for scientific effort and an 'F' for community work... just close the forum and set up a bug tracker.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,555,377
RAC: 6,312
Message 80433 - Posted: 26 Jul 2016, 8:18:09 UTC - in response to Message 80432.  

> Just curious... is there any progress worth speaking of? Any decision making? Any kind of code refactoring or optimization for the worst kludges?

I think that if we "see something", we'll see it at the end of CASP.

> Answer A: "We're working at it and here are the preliminary results..."
> Answer B: "No can do."

Answer C: "We don't care"


rjs5
Joined: 22 Nov 10
Posts: 273
Credit: 23,001,518
RAC: 6,291
Message 80449 - Posted: 30 Jul 2016, 15:30:44 UTC - in response to Message 80433.  
Last modified: 30 Jul 2016, 15:32:50 UTC

>> Just curious... is there any progress worth speaking of? Any decision making? Any kind of code refactoring or optimization for the worst kludges?
>
> I think that if we "see something", we'll see it at the end of CASP.
>
>> Answer A: "We're working at it and here are the preliminary results..."
>> Answer B: "No can do."
>
> Answer C: "We don't care"



I think it is:

Answer D: The project leadership is pushing new algorithm development while the server infrastructure is creaking like a 4-story mobile home.
https://d.justpo.st/media/images/2013/07/66f81a0a59d1786af2e10027746e2873.jpg

They should carefully evaluate the role/responsibilities of the top "project manager" first. I suspect there is some confusion about role, responsibilities and goals.

-------------

If they do not stabilize the server infrastructure, Rosetta could collapse under the weight of its own success. Then compute throughput is going to drop to zero... regardless of how good their CASP development has been. 8-)

The last time I looked at their server hardware configuration (assuming that their description was relatively current), it looked like they would have disk I/O bottlenecks on their server and memory size problems on their client network machines.

I saw that KRYPTON indicated there is some activity addressing the aging equipment. Last time I looked at it, I guessed that something like $50k in disk/memory upgrades would make a difference.


--------------


As to AVX/AVX2?

David hooked me up with 2 developers.

June 13th:

Developer "F", commenting on my recommendation for homogeneous coordinates:

"Storing 3D Cartesian coordinates as homogeneous coordinates is well-established practice. For example, Eigen::Geometry uses homogeneous coordinates in geometric expressions to support SIMD parallelism."

"Without profiling data I'd be very skeptical of claims of performance improvement in the range he's suggesting. I'd want to see an oprofile run showing that this vector arithmetic is producing hot instructions before undertaking any major refactoring. I'd be opposed to changes that broadly affect the codebase outside of the numeric namespace; it would be much better to arrive at a solution that offers a simple typedef to replace xyzVector<Real> that offers a SIMD-compatible implementation."

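The single-typedef approach Developer "F" describes can be sketched as follows. The sketch is a simplified stand-in, assuming a 4-lane padded layout: `Vec3`, `Vec4`, `make_vector` and the `USE_PADDED_VECTOR` flag are hypothetical names, not Rosetta's numeric namespace. In strict homogeneous coordinates a point carries w = 1 and a direction w = 0; here w is simply kept at 0 as padding, so the extra lane never changes sums, differences or dot products.

```cpp
#include <cassert>

// Hypothetical stand-in for a 3-component vector class such as xyzVector<Real>
// (the real class has a far larger interface).
template <typename Real>
struct Vec3 {
    Real x, y, z;
};

// Padded layout: a fourth component w, kept at 0, makes the struct
// 4 * sizeof(Real) bytes (32 bytes for double, one 256-bit AVX register)
// and aligns it so whole-vector loads can use aligned instructions.
template <typename Real>
struct alignas(4 * sizeof(Real)) Vec4 {
    Real x, y, z, w;
};

// The single switch point: numeric code written against `Vector` compiles
// unchanged with either layout. USE_PADDED_VECTOR is a hypothetical flag.
#ifdef USE_PADDED_VECTOR
using Vector = Vec4<double>;
#else
using Vector = Vec3<double>;
#endif

// Aggregate initialisation with three values works for both layouts;
// in Vec4 the omitted w member is value-initialised to 0.
inline Vector make_vector(double x, double y, double z) { return Vector{x, y, z}; }

static_assert(sizeof(Vec3<double>) == 3 * sizeof(double), "three packed doubles");
static_assert(sizeof(Vec4<double>) == 32, "four doubles fill one 256-bit register");
static_assert(alignof(Vec4<double>) == 32, "aligned for 256-bit aligned loads");
```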


---- I gave them the VTune profiles, which showed the hot instruction sequences, and hand-modified those instruction sequences to show how they shrank when using AVX. I am looking for a C++ programmer to help me with the "template" modifications. Nothing more from Developer "F".
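Why padding to four lanes can shrink hot instruction sequences is easy to illustrate with a squared-distance calculation, a typical kind of inner-loop operation on atom coordinates. This is not rjs5's actual modification; it is a sketch assuming a hypothetical 32-byte-aligned `Point4` type and AVX intrinsics from `<immintrin.h>`, with a scalar fallback when the code is built without AVX (e.g. without `-mavx`).

```cpp
#include <cassert>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical padded point: lanes 0..2 hold x, y, z and lane 3 is padding
// that must stay 0. The 32-byte alignment allows aligned 256-bit loads.
struct alignas(32) Point4 {
    double v[4];
};

// Squared distance between two padded points.
inline double distance_squared(const Point4& a, const Point4& b) {
    assert(a.v[3] == 0.0 && b.v[3] == 0.0);  // padding lane must be zero
#if defined(__AVX__)
    // With AVX: one aligned load per point, then a single subtract and a
    // single multiply cover all four lanes at once.
    __m256d d = _mm256_sub_pd(_mm256_load_pd(a.v), _mm256_load_pd(b.v));
    __m256d sq = _mm256_mul_pd(d, d);
    alignas(32) double lanes[4];
    _mm256_store_pd(lanes, sq);
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);  // lane 3 adds 0
#else
    // Scalar fallback: three subtracts, three multiplies, two adds.
    const double dx = a.v[0] - b.v[0];
    const double dy = a.v[1] - b.v[1];
    const double dz = a.v[2] - b.v[2];
    return dx * dx + dy * dy + dz * dz;
#endif
}
```

The final horizontal sum still goes through memory; real code would more likely process several distances per call, but the sketch shows three subtracts and three multiplies collapsing into one of each.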



Developer "L": after I replied to David with: "If you do find an interested developer, .... grumble, grumble, grumble, ...." Developer "L" replied ...

"I am in fact very interested in vectorization and would like to chat with you about it soon; I'm currently swamped with a few deadlines and projects, but anticipate that I'll have quite a bit more free time soon."

---- I have not heard back from Developer "L" and probably need to ping him.



--------------
I am now retired and have been decompressing. I have fixed all my family's, friends' and neighbors' computers, so maybe it is time to revisit the Rosetta vector changes.

Chilean
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 80450 - Posted: 30 Jul 2016, 18:43:20 UTC

Awesome, rjs5. Awesome work.

Sid Celery
Joined: 11 Feb 08
Posts: 2121
Credit: 41,179,074
RAC: 11,480
Message 80451 - Posted: 31 Jul 2016, 3:23:27 UTC - in response to Message 80449.  

> They should carefully evaluate the role/responsibilities of the top "project manager" first. I suspect there is some confusion about role, responsibilities and goals.
>
> -------------
>
> If they do not stabilize the server infrastructure, Rosetta could collapse under the weight of its own success. Then compute throughput is going to drop to zero... regardless of how good their CASP development has been. 8-)
>
> --------------
>
> I am now retired and have been decompressing. I have fixed all my family's, friends' and neighbors' computers, so maybe it is time to revisit the Rosetta vector changes.

I'm no coder (far from it) but I've worked with a few, good and less good. A good one is worth their weight in gold. If anyone can wangle an on-site visit for a couple of days, they should commit to it. Keep plugging away.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,555,377
RAC: 6,312
Message 80452 - Posted: 31 Jul 2016, 18:30:08 UTC - in response to Message 80449.  
Last modified: 31 Jul 2016, 19:08:36 UTC

> Answer D: The project leadership is pushing new algorithm development while the server infrastructure is creaking like a 4-story mobile home.
> https://d.justpo.st/media/images/2013/07/66f81a0a59d1786af2e10027746e2873.jpg

:-O

> I saw that KRYPTON indicated there is some activity addressing the aging equipment. Last time I looked at it, I guessed that something like $50k in disk/memory upgrades would make a difference.

Waiting for info about donations/crowdfunding.

> "Without profiling data I'd be very skeptical of claims of performance improvement in the range he's suggesting. I'd want to see an oprofile run showing that this vector arithmetic is producing hot instructions before undertaking any major refactoring.

I don't understand "F". He wants to see results BEFORE introducing modifications??

> I am now retired and have been decompressing. I have fixed all my family's, friends' and neighbors' computers, so maybe it is time to revisit the Rosetta vector changes.

Family is the most important thing, I think.

[VENETO] boboviz
Joined: 1 Dec 05
Posts: 1994
Credit: 9,555,377
RAC: 6,312
Message 80453 - Posted: 31 Jul 2016, 19:08:09 UTC - in response to Message 80450.  

> Awesome, rjs5. Awesome work.


+1

P.S. This thread was opened in Oct 2014; I hope we see "something new" before 2020 :-P

Dr. Merkwürdigliebe
Joined: 5 Dec 10
Posts: 81
Credit: 2,657,273
RAC: 0
Message 80454 - Posted: 31 Jul 2016, 20:00:45 UTC - in response to Message 80453.  

>> Awesome, rjs5. Awesome work.
>
> +1
>
> P.S. This thread was opened in Oct 2014; I hope we see "something new" before 2020 :-P

OMG...Tempus fugit

I could have sworn it has been only a few months.

Probable cause for the delay: The "NIH syndrome" or "We have always done it that way!"

rjs5
Joined: 22 Nov 10
Posts: 273
Credit: 23,001,518
RAC: 6,291
Message 80459 - Posted: 1 Aug 2016, 20:53:05 UTC - in response to Message 80452.  
Last modified: 1 Aug 2016, 20:59:01 UTC

>> "Without profiling data I'd be very skeptical of claims of performance improvement in the range he's suggesting. I'd want to see an oprofile run showing that this vector arithmetic is producing hot instructions before undertaking any major refactoring.
>
> I don't understand "F". He wants to see results BEFORE introducing modifications??



I think that Developer "F" was talking about needing real data for a major rewrite ... "major refactoring". I think that "F" agrees with me about "homogeneous coordinates" being a sensible change. There are MANY things that can be done to significantly improve performance without a major rewrite.


The first change I talked about was introducing "homogeneous coordinates". This is very nice because it does not "really" change the "project code". You can introduce the C++ TEMPLATE typedef changes, recompile, and you should get the EXACT SAME ANSWER with the new compile options.
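The "EXACT SAME ANSWER" property holds as long as the generic code performs the same floating-point operations in the same order for both layouts. A minimal sketch, assuming hypothetical `Vec3`/`Vec4` stand-ins: a function template written once against `x`, `y`, `z` members is instantiated for both, and the padding lane is never read. The caveat is that hand-vectorized reductions (such as a horizontal sum across lanes) or flags like `-ffast-math` may reorder additions, and then results can differ in the last bits.

```cpp
#include <cassert>

// Minimal hypothetical stand-ins for a 3-lane and a padded 4-lane vector.
template <typename Real> struct Vec3 { Real x, y, z; };
template <typename Real> struct alignas(4 * sizeof(Real)) Vec4 { Real x, y, z, w; };

// Written once against "any type with x, y, z members", the way numeric
// code would be written against a typedef'd vector type.
template <typename V>
double dot(const V& a, const V& b) {
    // Same operations in the same order for both layouts; the padding lane
    // is never read, so both instantiations compute the same result.
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// A derived quantity built on dot() inherits the same property.
template <typename V>
double length_squared(const V& a) {
    return dot(a, a);
}
```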


The second place where substantial improvement can be accomplished with little effort is by upgrading the server to steer optimized applications to target crunchers. Build optimized apps and target machine capabilities.
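Steering each machine to the build it can run needs a decision based on the CPU's features. BOINC's plan classes can make this choice on the server, from the CPU features the client reports; the sketch below shows only the decision logic, on the client side, using the GCC/Clang x86 builtins `__builtin_cpu_init` and `__builtin_cpu_supports`. The function `pick_app_variant` and its variant labels are hypothetical.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: choose which optimized build a machine should run,
// from the CPU features it reports at run time. The labels "avx2", "avx",
// "sse2" and "baseline" are made-up variant names, not BOINC plan classes.
inline std::string pick_app_variant() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    // __builtin_cpu_init() is only strictly needed when this runs before
    // static constructors, but calling it here is harmless.
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) return "avx2";
    if (__builtin_cpu_supports("avx")) return "avx";
    if (__builtin_cpu_supports("sse2")) return "sse2";
#endif
    return "baseline";  // non-x86, other compilers, or very old CPUs
}
```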


8-)



©2024 University of Washington
https://www.bakerlab.org