Changing the Computing Landscape Forever - Good article on GP-GPU and the future of the market.

ToTTenTranz

Power Member
I have always really liked the articles on this site. I usually agree 100% with the opinions and arguments they present, and the articles stand out for the impartiality and clarity with which they are written.
Recommended.


PenStarSys said:
Changing the Computing Landscape Forever


Why GP-GPU May Hurt Intel

I realize the sub-title of this article may be a bit extreme, but we are on the cusp of a change in computing that many people may not realize is here. It is only now that we are seeing larger announcements and greater talk about General Purpose GPUs and how they will affect the future not only of desktop applications, but of high performance computing itself. While we may be a generation away from seeing products that will fulfill the promise of GP-GPU, the groundwork is being laid by many competitors to take advantage of this exciting new world.


What is GP-GPU

The modern DX9 based GPU is a floating point monster that can be used as a powerful stream processor. For example, the X1900 (R580) features 48 pixel shader pipelines that can handle 32 bit operations. Now, this is not fully IEEE 754 compliant, but it is still 32 bit floating point. With specific programming, this floating point potential can be tapped when properly applied. This does not mean that Windows can be run on a GPU, as it is most definitely not X86 based! A current GPU can be viewed as a very strong tool in specific instances. For better or for worse, the primary functionality of a GPU is still graphics processing, and the architecture was designed to do just that from the very beginning.
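
To make the "stream processor" idea concrete, here is a minimal sketch in CUDA, NVIDIA's GP-GPU toolkit for the G80 generation discussed below; in 2006 the same pattern would have been expressed through pixel shaders or BrookGPU. The kernel name, array names and sizes are illustrative, not from the article: the point is simply that the same small floating point operation is applied independently to every element of a large stream.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread applies the same operation to one element of the stream:
// out[i] = a * x[i] + y[i]  (a classic SAXPY-style stream kernel).
__global__ void saxpy(int n, float a, const float *x, const float *y, float *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;              // one million elements
    size_t bytes = n * sizeof(float);

    float *x, *y, *out;
    cudaMallocManaged(&x, bytes);       // unified memory keeps the sketch short
    cudaMallocManaged(&y, bytes);
    cudaMallocManaged(&out, bytes);

    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Launch enough 256-thread blocks to cover the whole stream.
    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    saxpy<<<blocks, threads>>>(n, 3.0f, x, y, out);
    cudaDeviceSynchronize();

    printf("out[0] = %f\n", out[0]);    // expect 5.0

    cudaFree(x); cudaFree(y); cudaFree(out);
    return 0;
}
```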

Most of the GP-GPU software people are of the opinion that ATI hardware is faster than NVIDIA hardware when it comes to GP-GPU, and I am not one to deny that. Looking down from 10,000 feet, we see that overall shader performance does favor ATI, and their branching unit (which can be very, very handy in GP-GPU applications) is more advanced than what NVIDIA offers in the 7x00 series. Add to that the programmable memory controller and Ringbus architecture, and we can see that in such a general purpose setting the ATI hardware is the better candidate. This does not mean the hardware is perfect, though.
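
As a hedged illustration of why a capable branching unit matters for general purpose work, the sketch below (again in CUDA, with made-up names and thresholds) shows a data-dependent branch inside a stream kernel. On hardware with good dynamic branching, elements that fail the test can skip the expensive path rather than every element paying for both sides of the branch.

```cuda
// Hedged sketch: a data-dependent branch inside a stream kernel.
// Hardware with an efficient branching unit lets threads whose
// condition is false bypass the costly loop entirely.
__global__ void shade(const float *height, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    if (height[i] > 0.5f) {
        // expensive path: only taken by some elements of the stream
        float acc = 0.0f;
        for (int k = 0; k < 64; ++k)
            acc += sinf(height[i] * k);
        out[i] = acc;
    } else {
        // cheap path
        out[i] = 0.0f;
    }
}
```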

Both ATI and NVIDIA know the potential of GP-GPU, and over the past few years both companies have brought in people to help achieve higher performance and greater compatibility with their hardware for GP-GPU purposes. ATI may be ahead in the game right now, but the truth is both companies are working very hard to include greater GP functionality in their upcoming products. Both R600 and G80 should offer far greater performance and functionality than previous architectures.

So if the stakes are so high, why are we really only hearing about GP-GPU now? There appear to be several reasons. It wasn’t until DX9 GPUs hit the market that people started to think of these products as floating point machines. ATI’s first generation could only do FP24, which is not terribly useful for most scientific applications. NVIDIA’s FX series had FP32, but performance was not great when utilizing it, and the overall architecture was too constrained for general purpose use. When the GeForce 6 series came out, it offered enough features and performance for work to really start in GP-GPU, plus the addition of branching (which is a key feature for general purpose programming). ATI’s X1000 series was later released, and with its advanced memory management and branching performance, programmers had a lot more flexibility (and a lot more interest) to develop software that would take advantage of such hardware. It also helped that the installed base of computers with SM 3.0 functionality finally became significant.
The software and hardware have finally become mature enough for this functionality to be exposed. The announced applications are only scratching the surface when it comes to potential for GP-GPU, and this will have a tremendous impact on all future High Performance Computing applications, not to mention what will trickle down to the desktop.




GP-GPU and Why AMD Bought ATI

The reasons for AMD acquiring ATI are many, and a lot of them have already been covered. AMD will be able to leverage ATI’s chipset business to augment its CPU business, get its foot into the mobile and TV markets, and the added variety of products ATI offers will bring balance to AMD’s portfolio. While these reasons are valid, I think the one overriding reason is in fact the potential of GP-GPU, and how such technology, leveraged in the right way, could upset the entire industry. We merely have to look at two of AMD’s latest technology announcements to see where this is going.

The first is Torrenza. AMD is opening up the cache-coherent HyperTransport bus to 3rd parties who wish to create “co-processors” that will work hand in hand with Opteron and accelerate computing or special functionality. This will be a dedicated, high speed link that can have its own local memory. IBM, Cray, Fujitsu, and Sun have all signed on to provide parts that will fit in the socket or HTX slot. This of course holds great potential for GPUs. It tightly connects CPUs and GPUs, and while PCI-E will be the connection of choice in the mainstream market for years to come, Torrenza offers the option of this type of graphics integration. When GP-GPU comes into play, this allows a highly complex GPU to be placed beside an Opteron and provide massive floating point performance, all without a large board with expensive local memory and power circuitry.

The second and more important announcement is that AMD will release a CPU in 2008 with the graphics portion integrated on the die. At face value this appears to be a cost optimized product that would address the value marketplace. That appears correct, but the true potential of this integrated portion is akin to the first floating point unit integrated into an X86 CPU 20 years ago. AMD will market this as a budget SKU, but we can also bet that there will be a “performance enhanced Opteron” which will use the integrated graphics functionality as a dedicated, high powered, floating point co-processor that can be utilized by a large number of applications that WILL be ready for it in 2008.

This future processor could very well be the next great product for High Performance Computing. The idea of a quad core Opteron processor with an integrated graphics portion containing anywhere from four to sixteen dedicated floating point units that can be used in stream processing applications has made people around the industry stand up and take notice. With all of the other system level advantages AMD currently has over Intel (HyperTransport, integrated memory controller, the Torrenza initiative, etc.), the ability to leverage the tremendous floating point power of up to 16 shader pipelines for scientific, rendering, and desktop applications is simply stunning. Remember as well that these shader units will be at least DX10 capable, so they will not be divided into pixel and vertex pipelines.
When we also realize that these products will be running in the 3 GHz range, the potential throughput of such a co-processor is astounding. Since we are also seeing high quality physics effects in upcoming games that utilize the floating point potential of GPUs and standalone physics processors, the GPU portion of the upcoming AMD CPU may not in fact be merely a low end part. The graphics portion can still be used for specific functions even when paired with a standalone GPU. Having a physics co-processor directly on the CPU can bring a far greater level of performance and immersion.
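
To make the physics-co-processor idea concrete, below is a hedged sketch of the sort of per-particle update such an on-die stream unit could run. The kernel, particle layout and time step are hypothetical and only illustrate how game physics decomposes into thousands of independent floating point operations, exactly the workload a shader-style co-processor is built for.

```cuda
#include <cuda_runtime.h>

// One particle: position and velocity in 3D, packed as float4 for alignment
// (the .w components are unused padding in this sketch).
struct Particle {
    float4 pos;
    float4 vel;
};

// Each thread advances one particle by one Euler step under constant gravity.
// Thousands of these run in parallel, which is the "stream" pattern.
__global__ void integrate(Particle *p, int n, float dt)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    const float g = -9.81f;                 // gravity on the y axis

    p[i].vel.y += g * dt;                   // update velocity
    p[i].pos.x += p[i].vel.x * dt;          // update position
    p[i].pos.y += p[i].vel.y * dt;
    p[i].pos.z += p[i].vel.z * dt;
}

// Host-side launch: cover all particles with 256-thread blocks.
void step_particles(Particle *d_particles, int n, float dt)
{
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    integrate<<<blocks, threads>>>(d_particles, n, dt);
}
```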






Where Intel Sits

ATI and NVIDIA are well aware of the potential of GP-GPU, and both companies are investing a lot of man-hours and money into making this “hobby” into a viable industry. AMD obviously saw the importance of such a product, as they are acquiring ATI. Intel has been slow getting into the game. Only within the past few months has Intel been really trying to bring in boatloads of engineers with an understanding of graphics. It is not that Intel wants to compete in the discrete graphics market, but it fears being left behind by AMD/ATI and NVIDIA when it comes to High Performance Computing.

While Intel has designed integrated graphics, it is far from being considered high performance even for the integrated market. Both ATI and NVIDIA have products that are upwards of four times as fast in 3D graphics as the Intel products, and the latest generation of Intel graphics shows the same type of performance deficit. While it is a unified architecture, it still does not look competitive. Up until recently, integrated graphics has been a bullet point for Intel, something it could use to retain marketshare by providing an “all-in-one” solution for business clients and standard users. Now that the potential of GP-GPU has reared its head, Intel has finally taken note. Intel will have a standalone graphics chip within two years, but it increasingly looks very late to the party.

By the time Intel releases its standalone product, ATI/AMD could potentially have a product that is two generations beyond what Intel will be able to offer at that time. NVIDIA will also be well ahead. While Intel will hire a lot of people, and entice others from ATI and NVIDIA, there is no substitute for the overall experience that the current graphics companies have, not to mention a solid foundation of products, IP, and distinct architectures built up over the years that can be leveraged in future products. Intel is playing catchup, and it may not have a good competing product in the standalone graphics market until 2010. This leaves AMD/ATI and NVIDIA three years to place their technology into every single nook and cranny they can find that demands high performance floating point computing.

Another log thrown upon the fire is Intel’s latest investment in Imagination Technologies, the creator of the PowerVR architecture. This company does have extensive experience in 3D graphics, as well as programmable graphics parts. While Intel has licensed technology from Imagination, we are now expecting to see a deeper collaboration between the two. It certainly seems that Intel may have found a partner to help it catch up to the others in this arena if internal work does not proceed fast enough.
Intel has also opened up its front side bus architecture to 3rd parties, but the technical challenges of integrating such a product are well above those of designing a HyperTransport 3.0 compliant part. Also of note are the performance and scalability disadvantages of the GTL+ bus compared to HyperTransport. There is definitely interest in using Intel’s bus, but it is more a matter of economics than technological superiority (far more servers use Intel products than AMD at this time). Intel is working on its own answer to HyperTransport (CSI), but it is not expected before 2009 at the earliest.








Why GP-GPU is Important

The world has an insatiable need for high performance computing, and stream computing in particular. Medical research, seismic research, economic forecasts, weather modeling, content creation, simulations, rendering, consumer applications, high energy physics, and a host of other potential applications are waiting for a power efficient, yet performance oriented product that will meet their needs (or at least some of their wants). AMD hopes to attract these types of customers.

While Intel has the fastest X86 processor at this time with the Core 2 architecture, there is no roadmap for including an integrated graphics portion that can handle GP-GPU. If Intel really gets moving, we can perhaps expect one in 2010. Because Intel does not have the experience or product IP that ATI does, AMD could have a distinct advantage. Because AMD is also not standing still in its CPU business, we can expect to see far more competitive parts coming out on the X86 side as well. A fast, quad core X86-64 processor combined with a four to sixteen unit GP-GPU integrated on the same die, consuming between 65 and 100 watts of power, has the HPC crowd anxiously awaiting its arrival.

Why is this so exciting? Think of a 1U or half-U server integrating 4 quad core CPUs, each with a powerful integrated GP-GPU at its disposal. The computational power of such a box is, at this point in time, unheard of for its size and power consumption. Then take into account potential 1U-and-up servers utilizing these quad core products, each with a dedicated co-processor from another partner doing other specialized functions above and beyond what the CPU/GPU combination can do. We are talking about the potential of a single rack performing at the same computational level as a 10,000 CPU supercomputer, and all of this within three years.

Now we see why Intel is very interested in getting graphics people on board: it realizes it is about three years behind the competition. Intel has most definitely talked to NVIDIA about either acquiring or licensing its technology, but knowing Jen-Hsun, he has his sights set far above being merely a graphics supplier. We can also see the first glimmerings of why NVIDIA has been hiring CPU designers and engineers for the past year. While NVIDIA will not be putting out an X86 based processor anytime soon, I would not bet against such a move in 10 years time.







AMD Ascendant

This past September I still had my doubts about the acquisition of ATI by AMD, but after reading about ATI’s GP-GPU announcement, and the potential of that type of computing, I feel that the management of AMD has a good idea of where the industry is heading. AMD may not have the best CPU product at this time, but it is working to radically shake up the industry with this move. The steps AMD has taken in the past ten years have pretty much made it a contender, and if it is able to act on its roadmap then there is no reason why AMD may not in fact be the performance dominant X86 provider in three years’ time.

There are two potential weaknesses that could derail AMD. The first is being able to supply enough parts by 2010 to explode its marketshare. AMD is taking steps to make sure it could be in a position to take upwards of 50% of the market by 2010, by converting Fab 30 into Fab 38 and building the new fab planned for New York State. Many analysts considered AMD insane to take on so much debt in so little time, but if AMD is to capitalize on its potential then it has to have the production to back it up. The second potential problem is successfully integrating ATI’s and AMD’s design teams to create this fully integrated product. There are many hurdles before them on the design side, but if they can successfully complete such a part, then the company could have a tremendous product to bring to market.

It appears that AMD is poised to make a significant impact from 2008 through 2010 that could potentially make the first three years of the Opteron and Athlon 64 seem insignificant. If AMD can execute on this roadmap, and if my predictions of the importance of GP-GPU do in fact come true, AMD will be in a position far superior to where Intel will be. AMD has already broken through most of the big barriers it has encountered in its past, and it is no longer considered a “second best” competitor to Intel. AMD now services every major OEM in the world, and with its “big tent” philosophy, many of these OEMs see greater flexibility in providing customers with specialized products. Companies such as IBM and Sun will enhance their AMD based servers with co-processors that will only be available in their own products. This potential differentiation will be very good for the market, and it will allow many competitors to add something different and new to the mix.
When Core 2 was released, I was pretty certain that AMD would not get the chance to trump Intel again. Now, I am not so sure. While AMD’s CPU designs will most likely be near or equal in performance to Intel’s latest and greatest, the surrounding architecture could very well catapult AMD beyond what Intel can offer. Intel is full of very bright people, and they have apparently caught onto the trend and are doing their best to catch up, but it appears as though AMD could have a multi-year lead in getting to this very lucrative and important marketplace first. If my predictions come true, then AMD could very well have a single core design that serves the low end with integrated graphics as well as the high end of HPC with a quad core processor and a high performance co-processor on die. The implications of this vision are simply stunning for AMD and the industry as a whole.






Editor's Note - October 3, 2006

There is one thing that I totally skipped over in this editorial, and that was Intel's IDF showcase of its 80 core processor on a single die. Intel is definitely not sitting still when it comes to stream computing, but we are not exactly sure what form its effort will take at the end of the day. Intel's 80 core accelerator is definitely exciting, especially when combined with its high speed memory "sandwich". Though each core is simple, every core has a tremendous amount of bandwidth and a decent amount of dedicated cache.

I also had a very good discussion with David Kanter at Realworld Tech, and while he thought my premise was interesting, he wasn't terribly in tune with my conclusions. His first thought was that the graphics portion of the upcoming 2008 product from AMD would be clocked much lower than the CPU, and may not be the stream computing powerhouse I had presented it as. Also, he believes Intel has been working on higher end graphics for far longer than people realize. It is also not a slam dunk to think that integrating ATI and AMD will be straightforward and easy, and getting their design teams together to create something as complex as a CPU with an integrated GPU may not bear fruit anytime soon.

This is very much a speculative article, and I am sure that I will be joining in discussions at a tech site near you.


Source: http://www.penstarsys.com/editor/company/amd/gpgpu/index.html
 
Very good article that shows exactly what has already been talked about.

From the purchase of ATI, to the great general purpose capability of ATI's GPUs, to the Torrenza platform.
All of this is being planned; none of it is happening by accident.

Right now it is AMD that has to be careful not to let Intel leave it behind, but in the future Intel will need some very good cards up its sleeve to withstand the pressure AMD is going to exert and keep innovating.
 
Vector computing is nothing new. Some mainframes from 20 years ago already had vector processors, and so did most of Fujitsu's Japanese supercomputers. The difference is that back then they were put there deliberately to "crunch numbers", whereas now the point is to harness sophisticated GPUs for something more than real-time 3D rendering. It really was a shame to see so much "waste". :D The introduction of "physics" into games has also helped drive stream computing. Finally, we just have to thank the thriving electronic games industry for giving us this kind of computing power at bargain prices.
 
Up until the editor's note it seemed as though AMD had everything figured out and Intel was a disaster. You wouldn't even get the impression that Intel currently dominates across the board with Conroe and Xeon.

Integrating the GPU into the CPU seems possible; what does not seem viable to me is cooling it all together. With "standard" coolers we see around 50°C on CPUs and around 75°C on ATI X19xx cards.
I don't see how they are going to put it all together. Actually, I do: they will have to offer limited GPU solutions.

Nobody here has mentioned the Cell. Two or three years from now, no one knows where the Cell will stand. With the PS3 coming out next month, the two or three years we are waiting for this change are also the two or three years available to program for and keep developing the Cell. Adding or removing one of its co-processors in favor of a GPU is not far-fetched. In fact, the Cell concept seems very good and will raise fewer cooling problems.
I really cannot see how you would fit a CPU with integrated graphics at today's performance level into a 1U box, as suggested in the middle of the article.

Finally, Intel's showing of the 80-core processor is very interesting and is also forgotten until the end of the article. In the end, I have my doubts that a solution with a separate CPU and GPU, possibly with a few more co-processors for specific things like what we have now with the AGEIA PhysX, wouldn't do the trick better. Particularly because of the heat dissipated.
 