Any plans for a rosetta cuda client

Message boards : Number crunching : Any plans for a rosetta cuda client



Profile Paul D. Buck

Send message
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 59115 - Posted: 29 Jan 2009, 0:00:39 UTC - in response to Message 59113.  

my limited understanding is that one of the primary reasons there may be difficulties in porting Rosie to PS3 (and possibly, gpu?) is the limited amount of available ram...

but i could also see where fp concerns could arise.


You want a list? :)

* I/O bandwidth ... how to keep the high speed beast fed from main memory/disk
* Memory capacity ... how big of a problem can be worked (as you said)
* FP accuracy in the sense of single vs. double precision
* FP accuracy in the sense of how closely calculations on the GPU match external calculations (this may or may not be important)
* Suitable math operations are available (I have not looked at the API so do not know the extent of the math package)
* Ability to segregate the GPU processing for the screen separate from the model

"Noise" at the end of a FP number during math operations can slowly consume the accuracy of the model to the point that you only have a few usable digits. I worked on a math model that had issues with this... you could watch the accuracy of the results deteriorate to the point where we only had 6 digits of reliable accuracy... proven with tedious calculations with a pocket calculator ...

The issue is that FP simply has no representations for most numbers in the covered number space. For example, the 4381 mainframe I used at college could not represent the value 0.5 because its hexadecimal-base FP was unable to represent this number. My experience was that as you moved up and down the number line the decision to round up or down was inconsistent for this very reason ...

I can't recall specific places, but for example 2.5 might be representable but 3.5 might not ... yet in both cases you want to round up ... yet, detecting that "edge" was impossible ...

The same problem exists in the binary based FP we use, the numbers that cannot be represented are just different.

Effectively we trade accuracy of representation for a dynamic band of numbers that is "good enough" ... but the problem remains: what to do when you add two numbers that can be represented but the result is a number that cannot be represented exactly? This is one reason that most FP accelerators at the CPU level do the math using internal 80-bit representations, and the output is truncated when reported. And this is also the reason you can see variations in the output when slight changes are made to the order of the operations, where in one case the intermediate results are carried forward internally instead of being reported and then reloaded ...
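A tiny Python example of that order-of-operations sensitivity (my illustration, not from the post): summing the same three values in two different groupings produces two different 64-bit results, because the rounding happens at different intermediate steps:

```python
# Floating-point addition is not associative: regrouping changes which
# intermediate results get rounded, so the final bit patterns differ.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```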

But I digress ...
ID: 59115
The_Bad_Penguin
Avatar

Send message
Joined: 5 Jun 06
Posts: 2751
Credit: 4,271,025
RAC: 0
Message 59117 - Posted: 29 Jan 2009, 1:16:09 UTC

umm, yeah, i was just gonna say the same thing :)

good points, i just don't know what F@H does. IIRC, early (90nm?) Cell BE's had single precision on FP, and the newer (65nm?) Cell BE's have double precision. But, afaik, both are currently crunching the same wu's over there...
ID: 59117
Profile Paul D. Buck

Send message
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 59125 - Posted: 29 Jan 2009, 6:42:52 UTC - in response to Message 59117.  

umm, yeah, i was just gonna say the same thing :)

good points, i just don't know what F@H does. IIRC, early (90nm?) Cell BE's had single precision on FP, and the newer (65nm?) Cell BE's have double precision. But, afaik, both are currently crunching the same wu's over there...


The Cray supercomputers, which specialized in vector processing, had no divide instruction and an extraordinarily lousy FP number system ... mostly to increase speed ... yet they were widely used. If you know how bad the number system is, you can compensate in your modeling.

I am just glad that GPU Grid finally has their CPU use under control ... now it is where I think it should have been all along (below 1%), which may mean slightly lower GPU Grid throughput but should greatly increase my total processing ... it will go up more when I add another GPU card ...

Many GPUs (I am not sure about the current generations) drew heavily on the type of processing that the Cray computers used to excel at ...

Interesting days and happy times ... one guy at Milky Way is working on a version for ATI cards which makes for an interesting dilemma ....
ID: 59125
Profile Chilean
Avatar

Send message
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 59131 - Posted: 29 Jan 2009, 12:43:01 UTC - in response to Message 59115.  

my limited understanding is that one of the primary reasons there may be difficulties in porting Rosie to PS3 (and possibly, gpu?) is the limited amount of available ram...

but i could also see where fp concerns could arise.


You want a list? :)

* I/O bandwidth ... how to keep the high speed beast fed from main memory/disk
* Memory capacity ... how big of a problem can be worked (as you said)
* FP accuracy in the sense of single vs. double precision
* FP accuracy in the sense of how closely calculations on the GPU match external calculations (this may or may not be important)
* Suitable math operations are available (I have not looked at the API so do not know the extent of the math package)
* Ability to segregate the GPU processing for the screen separate from the model

"Noise" at the end of a FP number during math operations can slowly consume the accuracy of the model to the point that you only have a few usable digits. I worked on a math model that had issues with this... you could watch the accuracy of the results deteriorate to the point where we only had 6 digits of reliable accuracy... proven with tedious calculations with a pocket calculator ...

The issue is that FP simply has no representations for most numbers in the covered number space. For example, the 4381 mainframe I used at college could not represent the value 0.5 because its hexadecimal-base FP was unable to represent this number. My experience was that as you moved up and down the number line the decision to round up or down was inconsistent for this very reason ...

I can't recall specific places, but for example 2.5 might be representable but 3.5 might not ... yet in both cases you want to round up ... yet, detecting that "edge" was impossible ...

The same problem exists in the binary based FP we use, the numbers that cannot be represented are just different.

Effectively we trade accuracy of representation for a dynamic band of numbers that is "good enough" ... but the problem remains: what to do when you add two numbers that can be represented but the result is a number that cannot be represented exactly? This is one reason that most FP accelerators at the CPU level do the math using internal 80-bit representations, and the output is truncated when reported. And this is also the reason you can see variations in the output when slight changes are made to the order of the operations, where in one case the intermediate results are carried forward internally instead of being reported and then reloaded ...

But I digress ...



ID: 59131
J Langley

Send message
Joined: 21 Feb 07
Posts: 2
Credit: 2,874
RAC: 0
Message 59169 - Posted: 29 Jan 2009, 22:21:19 UTC - in response to Message 59115.  

The same problem exists in the binary based FP we use, the numbers that cannot be represented are just different.


Which is why IBM added a decimal floating point unit to their POWER6 processors.

Obviously this is no panacea, since 1/3 cannot be represented exactly in base 10 either. But it's a pity there don't seem to be many BOINC projects with Linux-on-POWER science apps as well as Linux on x86, especially since POWER6 runs at > 4 GHz.

Okay there are far more x86 Linux systems than POWER Linux, but I imagine some of the top crunchers might be interested in a Linux on POWER setup if only there were science apps to run.
ID: 59169
Profile Paul D. Buck

Send message
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 59175 - Posted: 30 Jan 2009, 0:57:15 UTC

@Chilean

Um, confused?

Try this Wiki article on FP ...

But, what I mean is that when you look at the binary representations of decimal numbers ... the change of a single binary digit covers some distance on the base ten number line.

so

......|........|........|.......|......

Where the "pipe" symbol "|" shows places where successive increases of one binary digit allow us to represent the decimal number "correctly", or exactly ... all those numbers in between, shown in my simple example, are decimal numbers that cannot be represented, because the values there fall in between successively higher binary values.

Essentially there are not enough bits to exactly record all decimal numbers ... so we "fudge" and pick the closest binary representation ... and thus, when I said that you can add two numbers and get a result that is not representable, I meant this:

......|........|........|.......|......
...........^

If the actual result is a value at the caret, we cannot actually represent it in binary ... so we pick the binary value to the left or right and call it close enough ... and it usually is ...
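You can actually inspect the "close enough" value that gets picked. A short Python sketch (my illustration): the literal 0.1 is stored as the nearest representable 64-bit double, and converting that double to Decimal reveals exactly what was stored:

```python
from decimal import Decimal

# Decimal(float) shows the exact value the binary double actually holds,
# i.e. the representable neighbour that was picked for the literal 0.1.
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625
```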

BUT, this does mean that there is always error in our calculations ... OVER TIME, especially in very iterative applications (like Rosetta) these errors can add up in ways that, ahem, invalidate the model ...

Does that help?

If it doesn't ask questions ... I got lots of time ... and little to do ...

ID: 59175
Profile Chilean
Avatar

Send message
Joined: 16 Oct 05
Posts: 711
Credit: 26,694,507
RAC: 0
Message 59183 - Posted: 30 Jan 2009, 11:59:05 UTC - in response to Message 59175.  

@Chilean

Um, confused?

Try this Wiki article on FP ...

But, what I mean is that when you look at the binary representations of decimal numbers ... the change of a single binary digit covers some distance on the base ten number line.

so

......|........|........|.......|......

Where the "pipe" symbol "|" shows places where successive increases of one binary digit allow us to represent the decimal number "correctly", or exactly ... all those numbers in between, shown in my simple example, are decimal numbers that cannot be represented, because the values there fall in between successively higher binary values.

Essentially there are not enough bits to exactly record all decimal numbers ... so we "fudge" and pick the closest binary representation ... and thus, when I said that you can add two numbers and get a result that is not representable, I meant this:

......|........|........|.......|......
...........^

If the actual result is a value at the caret, we cannot actually represent it in binary ... so we pick the binary value to the left or right and call it close enough ... and it usually is ...

BUT, this does mean that there is always error in our calculations ... OVER TIME, especially in very iterative applications (like Rosetta) these errors can add up in ways that, ahem, invalidate the model ...

Does that help?

If it doesn't ask questions ... I got lots of time ... and little to do ...



And what makes it impossible for it to "land" or choose the exact number?
ID: 59183
Dagorath

Send message
Joined: 20 Apr 06
Posts: 32
Credit: 29,176
RAC: 0
Message 59188 - Posted: 30 Jan 2009, 15:12:39 UTC - in response to Message 59183.  
Last modified: 30 Jan 2009, 15:14:25 UTC

Chilean wrote:
And what makes it impossible for it to "land" or choose the exact number?


In a nutshell, there aren't enough bits in a computer to represent the fractional portion of every imaginable number. Same applies to the whole number portion.

Remember that in spite of the fact that computers work very fast, they have limitations that we humans do not have. When we want to write the number pi, for example, to 100 decimal places, we just go ahead and do it, because we have the option to use as much space on our paper as we need. If we run out of room on one sheet we just continue on another sheet and staple the two together. It isn't that way with computers; they don't have as much "space" as they want on their "paper". Their "paper" is their physical RAM and the registers inside their CPU. RAM and registers consist of a finite number of memory locations. So, when you design a computer, you are forced by finite RAM and finite registers to limit the fractional portion of numbers to a finite number of bits. By limiting the fractional portion to, for example, 64 bits, you lose the ability to represent all the fractional parts that require 65 or more bits.

OK, but you know that computers have calculated the value of pi to more than 100 decimal places. Well, it turns out that in spite of the finite number of bits in the hardware, programmers can do a few tricks in the software to sort of (but not really) give computers more bits than they actually have. Unfortunately, using those tricks slows down the computation speed and the more "pretend" bits you give them the slower things go.
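Those software tricks are exactly what arbitrary-precision libraries do. A quick Python sketch (my example, using the standard decimal module): asking for 50 significant digits of 1/3 works fine, it is just far slower than hardware floating point:

```python
from decimal import Decimal, getcontext

# Software-emulated precision: 50 significant digits instead of the
# ~16 a hardware double provides. As wide as you ask for, but slower.
getcontext().prec = 50
print(Decimal(1) / Decimal(3))
# 0.33333333333333333333333333333333333333333333333333
```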

That's a very quick and simple answer to your question, as simple as I can get without going into long and complicated examples involving binary arithmetic.
BOINC FAQ Service
Official BOINC wiki
Installing BOINC on Linux
ID: 59188
Profile Paul D. Buck

Send message
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 59190 - Posted: 30 Jan 2009, 19:53:44 UTC - in response to Message 59183.  

And what makes it impossible for it to "land" or choose the exact number?


Though certainly not impossible, it is unlikely ... as Dagorath said, there are only a finite number of bits and an infinite number of values to be represented.

It is like the paradox of the Mandelbrot set, where a finite, contained space holds an object whose border has infinite length ... the deeper you drill down on the edge, the more detail and the more edge you find ...

Since I did not have an infinite width to post in ... I only put in a few dots between values. There are actually an infinite number of values between each pair of representable values.
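The gap between neighbouring representable values is even directly measurable. A short Python sketch (my addition; math.nextafter requires Python 3.9+): it returns the very next 64-bit double above 1.0, and every real number between the two is unrepresentable:

```python
import math

# The next representable double above 1.0; every real number between
# the two has no 64-bit representation and must round to one of them.
neighbour = math.nextafter(1.0, 2.0)
gap = neighbour - 1.0

print(gap)               # 2.220446049250313e-16  (i.e. 2 ** -52)
print(gap == 2.0 ** -52) # True
```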

Now, do you want to get into the issues of different sizes of infinite sets? :)

There is something about cans of worms ... we need a bigger can now ...
ID: 59190
mikey
Avatar

Send message
Joined: 5 Jan 06
Posts: 1896
Credit: 9,954,441
RAC: 28,792
Message 59199 - Posted: 31 Jan 2009, 12:00:55 UTC - in response to Message 59188.  

Chilean wrote:
And what makes it impossible for it to "land" or choose the exact number?


In a nutshell, there aren't enough bits in a computer to represent the fractional portion of every imaginable number. Same applies to the whole number portion.

Remember that in spite of the fact that computers work very fast, they have limitations that we humans do not have. When we want to write the number pi, for example, to 100 decimal places, we just go ahead and do it, because we have the option to use as much space on our paper as we need. If we run out of room on one sheet we just continue on another sheet and staple the two together. It isn't that way with computers; they don't have as much "space" as they want on their "paper". Their "paper" is their physical RAM and the registers inside their CPU. RAM and registers consist of a finite number of memory locations. So, when you design a computer, you are forced by finite RAM and finite registers to limit the fractional portion of numbers to a finite number of bits. By limiting the fractional portion to, for example, 64 bits, you lose the ability to represent all the fractional parts that require 65 or more bits.

OK, but you know that computers have calculated the value of pi to more than 100 decimal places. Well, it turns out that in spite of the finite number of bits in the hardware, programmers can do a few tricks in the software to sort of (but not really) give computers more bits than they actually have. Unfortunately, using those tricks slows down the computation speed and the more "pretend" bits you give them the slower things go.

That's a very quick and simple answer to your question, as simple as I can get without going into long and complicated examples involving binary arithmetic.


So is a 64 bit OS better than a 32 bit OS at this? I know most of us use 32 bit OS's but maybe IF, and yes I wrote it big, a 64 bit OS can handle more, they could begin porting it to that and wait for us users to catch up.
ID: 59199
Profile Paul D. Buck

Send message
Joined: 17 Sep 05
Posts: 815
Credit: 1,812,737
RAC: 0
Message 59202 - Posted: 31 Jan 2009, 16:07:34 UTC - in response to Message 59199.  

So is a 64 bit OS better than a 32 bit OS at this? I know most of us use 32 bit OS's but maybe IF, and yes I wrote it big, a 64 bit OS can handle more, they could begin porting it to that and wait for us users to catch up.


No ...

The edge that a 64 bit OS gives you is that it allows the direct addressing of a much larger memory space. That is all ...

As far as numbers go, that is set by the programming language: single precision values are 32 bits wide, giving an accuracy of 6-8 digits (usually), while double precision is 64 bits and gives 10-16 digits of precision (both counted in decimal digits of the represented number).
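A small Python check of those digit counts (my sketch, using struct to squeeze a value through single precision): round-tripping 0.1 through a 32-bit float keeps only about 7 good decimal digits:

```python
import struct

x = 0.1
# Pack as a 32-bit float and unpack again: the extra precision of the
# 64-bit double is discarded, leaving roughly 7 good decimal digits.
x32 = struct.unpack('f', struct.pack('f', x))[0]

print(x32)       # 0.10000000149011612
print(x32 == x)  # False
```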

Where a 64-bit OS really gives an advantage is in large database applications, where direct addressing of the data space is accommodated ...
ID: 59202




©2025 University of Washington
https://www.bakerlab.org