Discussion of the new credit system (2)

Ananas

Joined: 1 Jan 06
Posts: 232
Credit: 752,471
RAC: 0
Message 26312 - Posted: 7 Sep 2006, 22:17:20 UTC

Dr. = Developer ?
ID: 26312 · Rating: 0
dcdc
Joined: 3 Nov 05
Posts: 1832
Credit: 119,688,048
RAC: 10,544
Message 26313 - Posted: 7 Sep 2006, 22:22:43 UTC - in response to Message 26295.  
Last modified: 7 Sep 2006, 22:30:36 UTC

I know what type of benchmarks Power Macs can get, because I reviewed Power Mac credits under the old system, and I know what type of credits they were getting. I know the applications do not use their potential efficiently.

The problem is, as David Kim mentioned, that it's much easier for someone to tweak or change a compiler to optimise the code in a benchmark, which is a very small, simple application, than in a complex program such as Rosetta. It might simply not be possible with the Rosetta code. For example, a benchmark could be made to get an incredible FPU score on the Cell (co)processor used in the PS3, but getting the Rosetta code to run efficiently on it is another matter, partly due to its tiny cache. If there is a compiler the lab can use to optimise Rosetta to make better use of the PPC architecture, I'm sure they will use it. It's certainly not as straightforward as optimising a benchmark that just counts integer/FPU performance on a very limited scale (and quite possibly not very accurately).

Because of this, you can't assume that because PPC-based Macs were getting benchmarks similar to x86 (Intel/AMD) chips with some optimised BOINC clients, the same is possible with the Rosetta code. If anyone is willing to try, I believe the lab have said they'll make a version of Rosetta available for testing. The other problem with PPC is that it has been discontinued (at least in Macs...), so there is much less incentive to devote resources to optimising the code for it.

I think I do not need to recommend a compiler for the Power Mac. The fact is the developers know what needs to be used. They have talked about the "optimizer" that could solve the problem. They know.

Where have you read this?

cheers
Danny
ID: 26313 · Rating: 0
David E K
Volunteer moderator
Project administrator
Project developer
Project scientist

Joined: 1 Jul 05
Posts: 1018
Credit: 4,334,829
RAC: 0
Message 26314 - Posted: 7 Sep 2006, 22:33:26 UTC

Jose,

I agree. It would be nice to optimize for Mac PPC, but it is not trivial and there are no AltiVec people in the lab. In an ideal world, we'd have Rosetta optimized for all the platforms we support, at both the code and the compiler level. We do our best with the resources we have. For example, at our recent annual Rosetta meeting we were lucky to have a breakout session where Ross Walker from the San Diego Supercomputer Center (SDSC) talked about code optimization. It was difficult enough to transition over to Windows before the start of the project last year (VS2005 helped, because optimization with the previous version was buggy).

If Apple hadn't decided to go with Intel, I would have pressed harder for PPC optimization.
ID: 26314 · Rating: 0
dcdc
Joined: 3 Nov 05
Posts: 1832
Credit: 119,688,048
RAC: 10,544
Message 26315 - Posted: 7 Sep 2006, 22:35:42 UTC - in response to Message 26310.  
Last modified: 7 Sep 2006, 22:36:45 UTC

To be honest with you, I don't think the Linux issue has been resolved. As long as the perception persists that the current credit system still undervalues performance under Linux, the issue is there. Perception is many times more powerful than reality, and that is why I would like to see reality and perception be one and the same.

I wish I knew how to put an end to that. That is why I would like to see a complete statistical analysis of the issue.

I posted here what I think is the info we need to be able to see what the optimal configurations are with regard to CPU and OS. It'd be useful to have an accurate list showing the performance of different configs (the main factor being the CPU, I expect). It'd be a big help to those buying new crunchers, as you could then make an informed decision, for example whether to go for a Core or an X2, and how worthwhile things like cache and RAM are.

However, as I posted above, finding out that an OS or hardware config isn't running the code as quickly as we'd like, and making it run faster are two very different things!
ID: 26315 · Rating: 0
Ananas

Joined: 1 Jan 06
Posts: 232
Credit: 752,471
RAC: 0
Message 26316 - Posted: 7 Sep 2006, 22:46:49 UTC
Last modified: 7 Sep 2006, 22:49:36 UTC

Define a reference workunit that doesn't run too long, post the parameters required to run that WU on Rosetta for maybe 2 hours, and ask people to send the results to you (for validation) together with the BIOS, hardware and software information you need, plus the real runtime. We had that in other DC projects and many people sent results.

If it is possible to force Rosetta to use a specific start value instead of the random seed, this option should of course be used.
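A rough sketch of such a collection harness (the command line and the --seed flag below are hypothetical placeholders; Rosetta's real invocation isn't given here):

[code]
# Hypothetical harness for the reference-WU idea above. The command and
# the --seed option are assumptions, not Rosetta's actual interface.
import platform, subprocess, time

REFERENCE_CMD = ["./rosetta_reference_wu", "--seed", "42"]  # hypothetical

start = time.time()
subprocess.run(REFERENCE_CMD, check=True)  # run the fixed reference WU
runtime = time.time() - start

report = {
    "cpu": platform.processor(),
    "os": f"{platform.system()} {platform.release()}",
    "machine": platform.machine(),
    "runtime_seconds": round(runtime, 1),
}
print(report)  # users would send this back along with the result file
[/code]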
ID: 26316 · Rating: 0
Whl.

Joined: 29 Dec 05
Posts: 203
Credit: 275,802
RAC: 0
Message 26317 - Posted: 7 Sep 2006, 22:49:28 UTC - in response to Message 26301.  


P.S. Sorry, it seems anything I say or post stirs up some posters here.

One thing that really annoys me about your posts is the size of those GIF files.
It is a real pain in the arse scrolling all over the place to read everybody else's posts.

ID: 26317 · Rating: 0
dgnuff
Joined: 1 Nov 05
Posts: 350
Credit: 24,773,605
RAC: 0
Message 26319 - Posted: 8 Sep 2006, 0:06:47 UTC - in response to Message 26270.  
Last modified: 8 Sep 2006, 0:16:30 UTC

-- Deleted -- Mats already addressed the issue far better than me.


ID: 26319 · Rating: 0
Ingleside

Joined: 25 Sep 05
Posts: 107
Credit: 1,514,472
RAC: 0
Message 26327 - Posted: 8 Sep 2006, 1:40:28 UTC - in response to Message 26310.  

I believe the optimization that is required, and is known to solve the Mac issue, should be implemented.

To be honest with you, I don't think the Linux issue has been resolved. As long as the perception persists that the current credit system still undervalues performance under Linux, the issue is there. Perception is many times more powerful than reality, and that is why I would like to see reality and perception be one and the same.

I wish I knew how to put an end to that. That is why I would like to see a complete statistical analysis of the issue.

Just in case: I do not do Linux (too complicated for me) and I do not belong to the Mac cult. :) I just don't like small-sample statistics and conclusions based on small-sample statistics. I don't like the use of "it seems to have been solved" in lieu of "it has been solved". The compliance auditor that still lurks in me is trying to get answers.

Alas, it seems that my search for answers in an attempt to find solutions irritates some people. Worse, there are some people who do not understand why, having left the project, I am still trying to look for answers. That is too complicated to answer here.

Self-exile, even though justified, is a weird state of being. Suffice to say I cared about this project, and I still do.

That said, I think I should stop bothering this thread and let all those who still want to look for answers, and for ways to make the system fair and attractive to every kind and type of cruncher (alas, something it is not now), keep doing their work.

Pax

Well, I'm by no means a good statistician, but let's still play a little with the numbers...

Now, as I've already posted, if 10% are trying to cheat with artificially inflated claims, you can set up a table like this:

Overclaim - increase in average granted credit per model:
5x - 40%
4x - 30%
3x - 20%
2x - 10%
1.5x - 5%
1.1x - 1%


So, since Linux only ever underclaims, let's expand the table a little. Going by BoincSynergy, there are 22,632 Linux/Mac computers in Rosetta, which is 12.7% of the total. Note, I have no idea how many of those computers are actually active, but let's still use 12.7%. (A small sketch after the second table reproduces the arithmetic behind both tables.)

Underclaim - decrease in average granted credit per model:
10% - 1.27%
20% - 2.54%
30% - 3.81%
40% - 5.08%
50% - 6.35%
60% - 7.62%
70% - 8.89%
80% - 10.16%
90% - 11.43%
100% - 12.7%
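A minimal sketch of that arithmetic, assuming granted credit is a straight average of claims, so a fraction of hosts mis-claiming by some factor shifts the average by fraction * (factor - 1):

[code]
def average_shift(fraction, claim_factor):
    """Relative change in the average claim when `fraction` of hosts
    claim `claim_factor` times the honest value (1.0 = honest)."""
    return fraction * (claim_factor - 1.0)

# 10% of hosts overclaiming 5x -> +40% on the average (first table)
print(f"{average_shift(0.10, 5.0):+.0%}")

# 12.7% of hosts (the Linux/Mac share) claiming zero -> -12.7%
print(f"{average_shift(0.127, 0.0):+.1%}")

# More realistically, Linux/Mac claiming half the Windows level -> -6.35%
print(f"{average_shift(0.127, 0.5):+.2%}")
[/code]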

Meaning, even if all Linux/Mac users claimed zero credit for all their work, they would only lower the average granted credit by 12.7%. Now, I'm not sure how much more Windows claims than Linux/Mac, but I would guess less than 2x, meaning the influence is less than 6.35%.

With some crunchers running "optimized" clients pushing the average granted credit up, and unoptimized Linux/Mac clients pulling it down, do they cancel each other out? Possibly, but I can't guarantee it.


Anyway, since the new credit system grants the average of all results returned for a specific WU type, the only real chance of getting a significant boost from a high claim is to be one of the first to return. In practice this would mean running with a 0.001-day cache size and a 1-hour run preference. A Linux/Mac user can of course also try this, but if they're unlucky and come in #1, they'll get much less credit than if they're #2 to return...

In practice, apart from being the lucky/unlucky #1 to return, the granted credit will quickly average out. So there shouldn't be any significant (yes, still unspecific) difference between platforms.

That Macs are really slow at crunching is a different problem, and isn't due to the BOINC benchmark.


But, to be a little more specific at the end: remember, if all Windows users have returned all their WUs, and by some unlucky stroke of fate all Linux/Mac users return their results afterwards, the first Linux/Mac result will be granted the same as the average for all the Windows users, while the last Linux/Mac result returned will, at the absolute worst, get 12.7% less than the Windows average. But, remembering the table, that is if all Linux/Mac users claimed zero credit; more realistically, the Windows benchmark is less than 2x higher, meaning the absolute worst case is 6.35% lower for the last result.

The other way around, with all Linux/Mac results returned before any Windows results, would be much worse, since the last Windows user would get roughly 2x (again, I'm not sure how much higher the Windows benchmark is), but I wouldn't expect that to happen, given the users trying to get their credit boost at the start...


In any case, delaying crediting until 1000 results or so are in should remove any large startup spikes...
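As a toy model of the first-returner effect and of why delayed crediting removes the spike (a sketch under the simplifying assumption that each result is granted the running average of all claims returned so far; this is not the project's actual validator code):

[code]
import random

def granted_credits(claims):
    """Grant each result the mean of all claims returned so far (a
    simplified stand-in for the per-WU-type averaging described above)."""
    granted, total = [], 0.0
    for i, claim in enumerate(claims, start=1):
        total += claim
        granted.append(total / i)
    return granted

random.seed(1)
# Honest claims worth ~20 credits each; one 5x overclaimer returns first.
claims = [100.0] + [random.gauss(20, 2) for _ in range(999)]
g = granted_credits(claims)
print(g[0])    # the overclaimer's own result: granted 100.0 (the spike)
print(g[1])    # result #2 is already pulled down toward ~60
print(g[-1])   # by result #1000 the grant has settled near 20

# Delaying crediting until all 1000 results are in would grant everyone
# something close to g[-1], removing the startup spike entirely.
[/code]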

"I make so many mistakes. But then just think of all the mistakes I don't make, although I might."
ID: 26327 · Rating: 1
SekeRob

Joined: 7 Sep 06
Posts: 35
Credit: 19,984
RAC: 0
Message 26371 - Posted: 8 Sep 2006, 14:21:01 UTC - in response to Message 24684.  

Status report

August 23rd

The new credit system went live.

August 24th, 11:23 UTC

Currently, results returned are not being granted credit but are set to "pending". This is because the validator stopped working; it has nothing to do with withholding credits for any reason.

[edited because initial assumptions were wrong]


I came over specially to crunch a few and see for myself how the new credit system works..... well, first impressions are lasting impressions..... you must have nailed it right on the head..... I'm getting credit for my stock machine, on stock WOS, with my stock BOINC 5.6.0, and the claim worked out 0.8% lower than what you computed the work was worth.... totally aligned with the BOINC credit principles. Love it.

ciao

Coelum Non Animum Mutant, Qui Trans Mare Currunt ("They change their sky, not their soul, who run across the sea")
ID: 26371 · Rating: 0
Mod.DE
Volunteer moderator

Joined: 23 Aug 06
Posts: 78
Credit: 0
RAC: 0
Message 26376 - Posted: 8 Sep 2006, 15:20:42 UTC - in response to Message 26371.  

Hi Sekerob,

Thanks for your nice words and encouragement. I have moved your post to the discussion thread, since the sticky thread is not to be used for discussions. I hope you don't mind.
I am a forum moderator! Am I?
ID: 26376 · Rating: 0
Mats Petersson

Joined: 29 Sep 05
Posts: 225
Credit: 951,788
RAC: 0
Message 26380 - Posted: 8 Sep 2006, 17:01:01 UTC

I suppose I should explain that "noticeable" in my post some ten or so posts ago is equivalent to "not greatly different", or "+/- 10%". In a post in the "How much credit per hour is possible?" thread I showed my measurements of credit per hour per GHz as around 6.0 - 6.7 or some such. There is about a 10-12% difference between these, but that's from a relatively small set of samples, so statistically they aren't the best of numbers. I haven't got my statistics spreadsheet available here (I'm in California, not in England where my other machine happens to be), so I can't give more detailed information at this point.

But the overall general result I have seen is that (with the new credit system) the performance per core per clock frequency is similar enough that you can't say Windows or Linux is significantly different. As tralala pointed out (and I have in another post), Linux benchmarks are quite different from the Windows ones, but the Rosetta code is pretty similar between Linux and Windows, so the performance difference will be small.

--
Mats
ID: 26380 · Rating: 0
Jose

Joined: 28 Mar 06
Posts: 820
Credit: 48,297
RAC: 0
Message 26381 - Posted: 8 Sep 2006, 18:18:11 UTC - in response to Message 26380.  

I suppose I should explain that "noticeable" in my post some ten or so posts ago is equivalent to "not greatly different", or "+/- 10%". In a post in the "How much credit per hour is possible?" thread I showed my measurements of credit per hour per GHz as around 6.0 - 6.7 or some such. There is about a 10-12% difference between these, but that's from a relatively small set of samples, so statistically they aren't the best of numbers. I haven't got my statistics spreadsheet available here (I'm in California, not in England where my other machine happens to be), so I can't give more detailed information at this point.

But the overall general result I have seen is that (with the new credit system) the performance per core per clock frequency is similar enough that you can't say Windows or Linux is significantly different. As tralala pointed out (and I have in another post), Linux benchmarks are quite different from the Windows ones, but the Rosetta code is pretty similar between Linux and Windows, so the performance difference will be small.

--
Mats


10% is twice what is considered the level for statistical significance. An undercount of 12% basically means one in eight doesn't get counted. If those are your results, they are in no way minimal. That is a level of undercounting and underrepresentation that is not acceptable.
ID: 26381 · Rating: 0
casio7131

Joined: 10 Oct 05
Posts: 35
Credit: 149,748
RAC: 0
Message 26419 - Posted: 9 Sep 2006, 4:12:53 UTC - in response to Message 26381.  

10% is twice what is considered the level for statistical significance. An undercount of 12% basically means one in eight doesn't get counted. If those are your results, they are in no way minimal. That is a level of undercounting and underrepresentation that is not acceptable.


I don't think the 10% that Mats is talking about is a significance level (in the sense of a statistical test); it's the difference in credit achieved between Windows and Linux.
ID: 26419 · Rating: 0
Mats Petersson

Joined: 29 Sep 05
Posts: 225
Credit: 951,788
RAC: 0
Message 26424 - Posted: 9 Sep 2006, 5:46:31 UTC - in response to Message 26419.  

10% is twice what is considered the level for statistical significance. An undercount of 12% basically means one in eight doesn't get counted. If those are your results, they are in no way minimal. That is a level of undercounting and underrepresentation that is not acceptable.


I don't think the 10% that Mats is talking about is a significance level (in the sense of a statistical test); it's the difference in credit achieved between Windows and Linux.


There's a ten percent (or so) difference between the highest and the lowest average among my machines. If I average those numbers themselves, the spread is +/- 5% (or so). I'm currently working from memory (as described in the previous post). I have four Linux machines and two Windows machines, one of which is a laptop. None of my machines has exactly the same configuration when it comes to processor type and socket.

My fastest machine (per clock speed) is a Linux machine, so Windows certainly doesn't get a HIGHER result. In fact, I think the Windows machine is actually the slowest (but it's also a Socket 754 processor, which none of the others are - I can't say whether that's part of the reason for its lower credit, whether the Windows version is simply slower, or whether that machine just isn't working as fast for some other reason...)
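For concreteness, a small sketch of the credit-per-hour-per-GHz comparison described here (the numbers are made up to land in the 6.0 - 6.7 range mentioned above; they are not Mats's actual data):

[code]
# Illustrative numbers only, chosen to match the range discussed above.
hosts = {
    # name: (granted_credit, cpu_hours, clock_ghz)
    "linux-1":   (160.8, 12.0, 2.0),
    "linux-2":   (150.0, 10.0, 2.4),
    "windows-1": (129.6,  9.0, 2.4),
}

rates = {name: credit / hours / ghz
         for name, (credit, hours, ghz) in hosts.items()}
for name, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rate:.2f} credits/hour/GHz")

lo, hi = min(rates.values()), max(rates.values())
print(f"spread: {(hi - lo) / lo:.0%}")  # ~12%, the kind of gap discussed
[/code]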

--
Mats

ID: 26424 · Rating: 0
Bad_Wolf

Joined: 31 Jul 06
Posts: 4
Credit: 191,553
RAC: 0
Message 29435 - Posted: 16 Oct 2006, 6:58:30 UTC
Last modified: 16 Oct 2006, 7:19:23 UTC

Just my 2 cents opinion:

If real speed is the problem, why not add a little 10-second benchmark before initialization? That way, the WU's results and times would provide a realistic basis for calculating the work done and the points to grant.

[edit]
Another way could be an average speed for every single class of CPU.
For each host you have the CPU used and the BOINC benchmark result... it shouldn't be difficult to calculate such an average...
[/edit]
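A sketch of that per-class averaging (the field names below are invented for illustration; BOINC's real host table differs):

[code]
from collections import defaultdict
from statistics import mean

# Invented records standing in for the project's host database.
hosts = [
    {"cpu_model": "AMD Athlon 64 X2 3800+", "fpops_benchmark": 1.9e9},
    {"cpu_model": "AMD Athlon 64 X2 3800+", "fpops_benchmark": 2.1e9},
    {"cpu_model": "Intel Pentium M 1.60GHz", "fpops_benchmark": 1.4e9},
]

by_model = defaultdict(list)
for host in hosts:
    by_model[host["cpu_model"]].append(host["fpops_benchmark"])

# With Rosetta's very large host table, each class average would rest
# on many samples, which is the point made above.
class_average = {model: mean(scores) for model, scores in by_model.items()}
print(class_average)
[/code]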
ID: 29435 · Rating: 0
Mats Petersson

Joined: 29 Sep 05
Posts: 225
Credit: 951,788
RAC: 0
Message 29462 - Posted: 16 Oct 2006, 13:58:43 UTC - in response to Message 29435.  

Just my 2 cents opinion:

If real speed is the problem, why not add a little 10-second benchmark before initialization? That way, the WU's results and times would provide a realistic basis for calculating the work done and the points to grant.

[edit]
Another way could be an average speed for every single class of CPU.
For each host you have the CPU used and the BOINC benchmark result... it shouldn't be difficult to calculate such an average...
[/edit]



Except that it's hard to determine all the necessary parameters from the information available to the application. For example, an Athlon 3800+ may be a single- or dual-core model, running at 2.4 or 2.0 GHz - the dual core would therefore be around 20% slower per core.

It's possible to find out what cache size the processor has, but finding out how fast the memory is, and how much effect the speed of the memory has, is much harder [as that partly depends on what else is going on in the machine at the same time].

Running Rosetta for 10 seconds, without majorly changing how Rosetta works, would not achieve anything useful, because it wouldn't finish working out a single model (decoy) of a protein in that time - not even enough to figure out how long it would take, I would think.

--
Mats
ID: 29462 · Rating: 0
Bad_Wolf

Joined: 31 Jul 06
Posts: 4
Credit: 191,553
RAC: 0
Message 29473 - Posted: 16 Oct 2006, 18:46:13 UTC - in response to Message 29462.  


Except that it's hard to determine all the necessary parameters from the information available to the application. For example, an Athlon 3800+ may be a single- or dual-core model, running at 2.4 or 2.0 GHz - the dual core would therefore be around 20% slower per core.

It's possible to find out what cache size the processor has, but finding out how fast the memory is, and how much effect the speed of the memory has, is much harder [as that partly depends on what else is going on in the machine at the same time].


The host data include the number of CPUs installed, and with a big (because it's BIG) number of hosts in the database, the average probably wouldn't be far from reality.


Running Rosetta for 10 seconds, without majorly changing how Rosetta works, would not achieve anything useful, because it wouldn't finish working out a single model (decoy) of a protein in that time - not even enough to figure out how long it would take, I would think.

--
Mats


Maybe I didn't explain myself; sorry, English is my second language.
I meant to ADD a benchmark (maybe a simple loop increasing a variable for 10 secs or less) before starting to crunch the data.
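Taken literally, that loop might look like the sketch below (in Python the timer call would dominate such a loop, so a real benchmark would check the clock far less often; this is only to make the idea concrete):

[code]
import time

def tiny_benchmark(seconds=10.0):
    """Count how many increments fit into a fixed wall-clock window."""
    count = 0
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        count += 1
    return count / seconds  # increments per second

print(f"{tiny_benchmark(1.0):.3e} iterations/second")
[/code]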

BadWolf
ID: 29473 · Rating: 0
Mats Petersson

Joined: 29 Sep 05
Posts: 225
Credit: 951,788
RAC: 0
Message 29511 - Posted: 17 Oct 2006, 12:33:53 UTC - in response to Message 29473.  


Except that it's hard to determine all the necessary parameters from the information available to the application. For example, an Athlon 3800+ may be a single- or dual-core model, running at 2.4 or 2.0 GHz - the dual core would therefore be around 20% slower per core.

It's possible to find out what cache size the processor has, but finding out how fast the memory is, and how much effect the speed of the memory has, is much harder [as that partly depends on what else is going on in the machine at the same time].


The host data include the number of CPUs installed, and with a big (because it's BIG) number of hosts in the database, the average probably wouldn't be far from reality.


Yes, but each machine will have a different setup for memory and for how well that memory feeds data to the CPU, which is hard to measure. The CPU performance on its own is already being measured, and that is the basis of the current scoring system.

There are also other factors: if the system is getting hot, or is low on power (in a laptop), it may reduce the speed of the processor, which means the calculation takes longer...


Running Rosetta for 10 seconds, without majorly changing how Rosetta works, would not achieve anything useful, because it wouldn't finish working out a single model (decoy) of a protein in that time - not even enough to figure out how long it would take, I would think.

--
Mats


Maybe I didn't explain myself; sorry, English is my second language.
I meant to ADD a benchmark (maybe a simple loop increasing a variable for 10 secs or less) before starting to crunch the data.


BadWolf

And that's how it works today - there is a benchmark to measure integer and floating-point performance, and then the machine is left to do the real task of calculating Rosetta. This however has two potential problems:
1. There are different "clients" that calculate the benchmark results differently, including people who use an "optimized" client, which gives results that aren't quite comparable to the actual calculation capacity of the processor.
2. There's no measurement of overall system performance, just a tiny benchmark (Dhrystone for integers, Whetstone for floating point) which fits nicely in the cache of just about any processor available today (anything with more than about 16KB of L1 cache will hold it entirely) - so processors with small caches get exactly the same result as those with large ones - but in reality, a large cache will be better than a small one (a small experiment below illustrates this).
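A rough way to see that cache point (a sketch; Python's boxed integers blur the effect compared to C, but the trend still shows): do a fixed number of scattered memory accesses over working sets of increasing size, and watch the time jump once the set no longer fits in cache.

[code]
import time

def time_accesses(n_elems, n_accesses=2_000_000):
    """Time a fixed number of scattered reads over an n_elems working set."""
    data = list(range(n_elems))
    # Stride with a large prime step so accesses jump around rather than
    # walking linearly (linear walks are easy for prefetchers).
    idx, total, step = 0, 0, 7919
    start = time.perf_counter()
    for _ in range(n_accesses):
        total += data[idx]
        idx = (idx + step) % n_elems
    return time.perf_counter() - start

for n in (1_000, 100_000, 4_000_000):  # a few KB up to ~100+ MB
    print(f"{n:>9,} elements: {time_accesses(n):.2f} s for the same work")
[/code]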

The current system, I think, although it may not be ideal, is a close approximation of "pay for the amount of work done".

--
Mats

ID: 29511 · Rating: 0
River~~
Joined: 15 Dec 05
Posts: 761
Credit: 285,578
RAC: 0
Message 29516 - Posted: 17 Oct 2006, 14:56:39 UTC - in response to Message 29511.  
Last modified: 17 Oct 2006, 15:08:05 UTC

... so processors with small caches get exactly the same result as those with large ones - but in reality, a large cache will be better than a small one...


And don't overlook the length of the floating point pipeline.

Two CPUs may score the same float speed on the benchmark because the benchmark's data is predictable, and therefore the pipeline runs efficiently.

Suppose both processors come to a 1 GHz float speed (it makes the sums nice), and one has a three-stage pipe and the other a five-stage pipe.

The first CPU actually takes 3 ns to do a float, and gets its throughput by having three on the go at once. The second takes 5 ns per float, but has 5 on the go at once.

The snag comes when which number to calculate next depends on the result of the last crunch. The first CPU's pipeline stalls for 2 ns, the second for 4 ns.
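That arithmetic as a tiny throughput model (a sketch, not a CPU simulator: with the pipe full, both chips retire one float per ns, and each result-dependent operation is assumed to stall for depth - 1 ns):

[code]
def avg_ns_per_float(pipe_depth, dependent_fraction):
    """Average time per float when some fraction of operations must wait
    for the previous result to drain from the pipe."""
    stall_ns = pipe_depth - 1  # 2 ns for a 3-stage pipe, 4 ns for 5-stage
    return 1.0 + dependent_fraction * stall_ns

for depth in (3, 5):
    for frac in (0.0, 0.1, 0.3):
        print(f"{depth}-stage pipe, {frac:.0%} dependent ops: "
              f"{avg_ns_per_float(depth, frac):.1f} ns/float")
[/code]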

This can also happen if the data are needed in a weird order (e.g. FFT tends to do better the shorter the pipe - an important point if you want to crunch on Einstein and perhaps on SETI).

If I remember rightly, a Pentium M has a shorter pipe than a Pentium 4. If so, an M will do better than a 4 at the same benchmarked float speed, and this advantage will increase the more often the floating-point results are used to make decisions in the code.

So on two critical aspects of floating-point performance, benchmarks measure what the chip can do at its best (no cache stalls, no pipe stalls). That is further than you'd hope from being a measure of what the same chip does under real conditions - and on a project like Rosetta those real conditions may be very different between different kinds of WU, seeing as the project experiments with different strategies.

It is worse still.

We have issues of different pipes and caches. But then, if it is a dual-core chip, do the cores share the cache, have their own separate caches, or what? If separate, how do the cache controllers deal with the case where both caches try to access the off-chip memory at once? All these variables, and we haven't even started asking about different motherboards yet...

For all these reasons benchmarks are very crude.

It does seem to me that running a selection of similar tasks on a random selection of boxes taken from the real user pool is less crude, especially with a large enough sample.

River~~

ID: 29516 · Rating: 0
Seventh Serenity
Joined: 30 Nov 05
Posts: 18
Credit: 87,811
RAC: 0
Message 29586 - Posted: 18 Oct 2006, 15:24:31 UTC

I've just switched back to Rosetta@Home from WCG because of the unfairness with credit on Linux systems. I'm more in it for the science of course, but since Rosetta@Home is still partly focused on HIV/AIDS, I'll be running R@H until WCG gets its fixed credit system in place.
"In the beginning the universe was created. This made a lot of people very angry and is widely considered as a bad move." - The Hitchhiker's Guide to the Galaxy
ID: 29586 · Rating: 0