Lucid's Hydra Engine graphics chip - The future of SLI and Crossfire

cL!cK

Power Member
PcPerspective said:
What is the HYDRA Engine?

At its most basic level the HYDRA Engine is an attempt to build a completely GPU-independent graphics scaling technology - imagine having NVIDIA graphics cards from the GeForce 6600 to the GTX 280 working together with little to no software overhead and nearly linear performance scaling. HYDRA uses both software and hardware designed by Lucid to improve gaming performance in a way that is seamless to the application and the graphics cards themselves, and uses dedicated hardware logic to balance graphics information between the CPU and GPUs.

Why does Lucid feel the traditional methods that NVIDIA and AMD/ATI have been implementing are not up to the challenge? The two primary multi-GPU rendering modes that both companies use are split frame rendering and alternate frame rendering. Lucid contends that both have significant pitfalls that its HYDRA Engine technology can correct. For split frame rendering the downside is the need for all GPUs to replicate ALL the texture and geometry data, so the memory bandwidth and geometry shader limitations of a single GPU remain. For alternate frame rendering the drawback is the latency introduced by alternating frames between X GPUs and the latency required for inter-frame dependency resolution.

How it Works

HYDRA is dedicated silicon whose sole purpose is scaling GPUs. Though there is no graphics processing logic on the HYDRA chip, what the chip can do is redistribute graphics workloads across multiple GPUs in real time. The HYDRA technology also includes a unique software driver that sits between the DirectX architecture and the GPU vendor driver.

The distribution engine, as it is called, is responsible for reading the information passed from the game or application to DirectX before it gets to the NVIDIA or AMD drivers. There the engine breaks up the various blocks of information into "tasks" - a task is a specific job that HYDRA defines that can be passed to any of the 2-4 GPUs in the system. A task might be something like a specific lighting effect, a post-processing run, a specific model being drawn, etc. The company founders on hand at the meeting were a little vague about the algorithms that decide how, and what parts, of the DirectX data are going to be defined as "tasks" - it is obvious that this is part of the magic that gives HYDRA its power; it is with these task definitions that the hardware logic can efficiently distribute the workload across many GPUs.
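(Aside, not from the PC Perspective article: Lucid has not published how tasks are defined, but as a rough illustration, grouping intercepted draw calls into self-contained units might look something like the C++ sketch below. Every type, field and function name here is invented for illustration only.)

Code:
// Hypothetical illustration of the "task" idea described above: an interception
// layer captures the application's draw calls before they reach the vendor
// driver and groups related ones into self-contained tasks that can be sent to
// any GPU. The grouping rule (same render target) is a guess, not Lucid's
// actual algorithm.
#include <cstdint>
#include <vector>

struct DrawCall {
    uint32_t stateHash;      // hash of the render state / shaders bound
    uint32_t renderTarget;   // surface this call writes to
    uint32_t geometryId;     // reference to vertex/index data
};

struct Task {
    uint32_t renderTarget;   // every call in the task writes to this target
    std::vector<DrawCall> calls;
};

// Group consecutive draw calls that share a render target into one task,
// so each task can be executed independently on any GPU in the system.
std::vector<Task> buildTasks(const std::vector<DrawCall>& frame) {
    std::vector<Task> tasks;
    for (const DrawCall& dc : frame) {
        if (tasks.empty() || tasks.back().renderTarget != dc.renderTarget)
            tasks.push_back(Task{dc.renderTarget, {}});
        tasks.back().calls.push_back(dc);
    }
    return tasks;
}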

Once the tasks have been created, they are then sent over the PCI Express bus to the HYDRA chip where they are VERY quickly processed and split among the 2 to 4 GPUs. The HYDRA Engine passes off these tasks to the GPUs, awaits the result and return of finished data or pixels, and is then responsible for passing that information on to one of the GPUs for final output to a monitor. At the outset, this doesn't sound that much different from what NVIDIA and AMD already do with AFR and SFR rendering modes, but after seeing the HYDRA technology at work it is obviously something very different.
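(Again just an illustrative aside, not part of the article: a naive version of that dispatch-and-composite flow could look like this, with round-robin standing in for whatever scheduling the chip actually does. All names are hypothetical.)

Code:
// Naive sketch of the flow described above: tasks are handed to the GPUs,
// results are awaited, and the finished pixels are passed to the display GPU
// for final output. Round-robin is a placeholder for Lucid's real scheduler.
#include <cstddef>
#include <future>
#include <vector>

struct Task   { int id; };
struct Pixels { std::vector<unsigned> data; };

// Stand-in for "execute this task on GPU `gpu` and return the finished pixels".
Pixels runOnGpu(int gpu, Task t) { return Pixels{{unsigned(t.id), unsigned(gpu)}}; }

// Stand-in for "composite the partial results on the display GPU and present".
void compositeAndPresent(int displayGpu, const std::vector<Pixels>& parts) {
    (void)displayGpu; (void)parts;
}

void renderFrame(const std::vector<Task>& tasks, int gpuCount, int displayGpu) {
    std::vector<std::future<Pixels>> inFlight;
    for (std::size_t i = 0; i < tasks.size(); ++i) {
        // round-robin split of tasks across the available GPUs
        int gpu = static_cast<int>(i % static_cast<std::size_t>(gpuCount));
        inFlight.push_back(std::async(std::launch::async, runOnGpu, gpu, tasks[i]));
    }
    std::vector<Pixels> parts;
    for (auto& f : inFlight) parts.push_back(f.get());     // await each result
    compositeAndPresent(displayGpu, parts);                // final output on one GPU
}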

By essentially intercepting the DirectX calls from the game to the graphics cards, the HYDRA Engine is able to intelligently break up the rendering workload rather than just "brute-forcing" alternate frames or split frames as both GPU vendors do today in SLI and CrossFire. And according to Lucid, all of this is done with virtually no CPU overhead and no added latency compared to standard single-GPU rendering.

To accompany this ability to intelligently divide up the graphics workload, Lucid is offering up scaling between GPUs of any KIND within a brand (only ATI with ATI, NVIDIA with NVIDIA) and the ability to load balance GPUs based on performance and other criteria. The load balancing is based on a couple of key data points: pre-existing knowledge from the Lucid team about the GPU in question and the "response time" of the GPU when being sent data from the HYDRA Engine chip. The HYDRA driver will actually recognize the GPUs in a system and will estimate how much processing power each holds, but will then fine-tune that estimate based on the real-time performance of the GPU in action. If a GPU is sent a "task" to perform and the return time on it is slower than expected, the HYDRA Engine will back off slightly and send more "tasks" to the less-loaded GPUs. All of this is updated on the fly, in real time, as the game is running.
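(One more illustrative aside: the response-time-based balancing described above could be sketched roughly like this, assuming a running per-GPU speed estimate that is refined as tasks return. The smoothing factor and all names are assumptions, not Lucid's algorithm.)

Code:
// Minimal sketch of response-time-based balancing: keep a speed estimate per
// GPU, send each new task to the GPU expected to finish it soonest, and refine
// the estimate from measured completion times so slower-than-expected GPUs
// automatically receive fewer tasks.
#include <cstddef>
#include <utility>
#include <vector>

struct GpuState {
    double estimatedSpeed;   // initial estimate from known GPU model
    double pendingWork;      // work units queued but not yet returned
};

class LoadBalancer {
public:
    explicit LoadBalancer(std::vector<GpuState> gpus) : gpus_(std::move(gpus)) {}

    // Pick the GPU expected to finish a new task soonest.
    std::size_t pickGpu(double taskCost) {
        std::size_t best = 0;
        double bestTime = 1e300;
        for (std::size_t i = 0; i < gpus_.size(); ++i) {
            double t = (gpus_[i].pendingWork + taskCost) / gpus_[i].estimatedSpeed;
            if (t < bestTime) { bestTime = t; best = i; }
        }
        gpus_[best].pendingWork += taskCost;
        return best;
    }

    // When a task returns, refine that GPU's speed estimate from the measured
    // time; a slow return lowers the estimate, steering future tasks elsewhere.
    void onTaskDone(std::size_t gpu, double taskCost, double measuredSeconds) {
        gpus_[gpu].pendingWork -= taskCost;
        double observedSpeed = taskCost / measuredSeconds;
        gpus_[gpu].estimatedSpeed =
            0.9 * gpus_[gpu].estimatedSpeed + 0.1 * observedSpeed;  // smoothed update
    }

private:
    std::vector<GpuState> gpus_;
};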

HYDRA Engine Hardware Implementation

From a purely hardware perspective, the HYDRA chip takes in a single PCIe x16 connection and outputs two full PCIe 2.0 x16 connections. Depending on the partner's implementation method, that could connect to two GPUs or split into four x8 PCIe 2.0 connections for four GPUs. What might you find the HYDRA chip on in the future? There are two likely scenarios for potential designs: on a motherboard or on a graphics board.

On a motherboard, including a HYDRA Engine chip would allow ANY chipset to support BOTH SLI and CrossFire technology since it is completely chipset independent and doesn't require SLI or CrossFire licensing. That would enable said motherboard to offer 2-4 GPU scaling with NVIDIA or AMD graphics cards - a VERY compelling solution but also likely an expensive one.

Pic 1 - Diagram of an implementation on a motherboard

The HYDRA technology would also likely find its way onto custom-design graphics boards in place of the standard PCIe bridge - à la the Radeon HD 4870 X2. Lucid is claiming nearly linear scaling on up to 4 GPUs, compared to 50-70% with SLI or CrossFire, so a board vendor could really make a top-performing part and stand out from the crowd, or potentially build one with slower chips for a new price-performance option.

As for the chip itself, obviously Lucid is being very close-lipped about it. The chip runs very cool and draws just about 5 watts of power. Inside the chip you will find a small RISC processor and the custom (secret sauce) logic behind the algorithm powering the HYDRA Engine. The production chip was JUST finished yesterday and will be sampling to partners soon - though they wouldn't indicate WHO those partners were.

SOURCE

Pic 2 and Pic 3 - Alternate rendering, each graphics card's workload shown on separate monitors

I know the post is huge, but it can't be condensed much if we want to get an accurate idea of the possibilities this brings to the graphics card market.. Honestly, I don't know how ATI or Nvidia haven't thought of this yet.

For now it only works with DX9, but by the end of this year it should already work with DX10.1.

Implementations start in early 2009.
 
On paper it looks great!
Let's hope Lucid doesn't get swallowed up in the meantime and actually gets something worthwhile out the door...
 
If this is as revolutionary as it's made out to be, I find it hard to believe the two graphics giants wouldn't want to buy the technology and keep it for themselves.
I ended up not quite understanding whether there is already any review of a prototype so we can verify some of the claims made in this text.
 
Let me see if I've got this... I have a 6600GT with SLI support; can I buy another, different card with SLI support and get the two working together?
 
That would be a good way to use those crappy PCI cards gathering dust. Too bad current motherboards no longer come with many PCI slots. :p
 
If this is as revolutionary as it's made out to be, I find it hard to believe the two graphics giants wouldn't want to buy the technology and keep it for themselves.
I ended up not quite understanding whether there is already any review of a prototype so we can verify some of the claims made in this text.

Exactly... it would be really annoying if one of the big players bought it. Ideally the company wouldn't let itself be bought and would develop the technology itself, but I think Intel already has a hand in this... from what I've read it is one of the company's major shareholders.

As for the existence of prototypes..

The production chip was JUST finished yesterday and will be sampling to partners soon - though they wouldn't indicate WHO those partners were.
As if to say: they already exist, but for now we haven't distributed them.

The full article is two pages long; I only posted the essentials and even so it's a huge wall of text.. >_>

I recommend that everyone who is really interested read the original article.

EDIT: One detail - in the demo they gave the PC Perspective guys, at one point they were playing Crysis at over 60 fps on one screen while the other had an active web browser. Apparently they could just as well have been playing a 1080p movie and the performance impact on Crysis would have been practically nil.

This just reinforces an idea I've had for a while: a hardware chip coordinating the division of tasks between two cards / GPUs is much more efficient than doing it in software, not least because of the latencies.
 

A professor of mine once told me:

anything that is done in software can be done in hardware, and faster...
 
http://www.techreport.com/articles.x/15367

Assuming the Hydra 100 does work as advertised, the big questions now are "How does it really perform?" and "Who will make use of it?" As for the first question, we got a demo of Crysis running at 1920x1200 at the highest quality levels available in DirectX 9. The test system was using a pair of GeForce 9800 GTX cards, and performance ranged between 40 and 60 FPS on the game's built-in frame rate counter. The game played very, very smoothly, and I didn't perceive any latency between mouse inputs and on-screen responses.
 
There are quite a few interesting posts in the comments on that article; I'll paste the ones I think are most relevant here:
Slaimus said:
From looking at their patent ( http://www.google.com/patents?id=TE6AAAAAEBAJ ) it looks like instead of splitting up the screen by pixels, they do it by depth (BSP tree?).

By that logic, I can see these advantages compared to SLI/CF:

- The chip looks to be able to divide up geometry by depth, so not all of it has to be duplicated, and T&L tasks can also be split up.
- There is a speed-up of Z lookups since there are fewer depth levels
- Fewer pixels have to be drawn since there is an early Z-test

And these disadvantages:

- Pixel shader effects that read back framebuffer values may not work properly if the needed pixels are in the other graphics card's memory.
- Edge pixels may not blend perfectly
- Much of the Z logic is already present in existing graphics cards, and would be wasted.
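(Not part of the quoted comment: a toy C++ sketch of the depth-based split Slaimus speculates about, under the assumption that each object is assigned to a GPU by its depth range. Objects straddling the split go to both GPUs, which is exactly where the edge-blending issues he mentions would appear. This is illustrative only, not a confirmed Lucid mechanism.)

Code:
// Toy sketch of splitting a scene between two GPUs by depth rather than by
// screen area: each GPU only needs the geometry for its own depth range.
#include <utility>
#include <vector>

struct SceneObject {
    float minDepth;   // nearest point of the object's bounding volume
    float maxDepth;   // farthest point
    int   id;
};

// Assign each object to the near or far GPU based on a split plane; objects
// that straddle the plane are sent to both GPUs.
std::pair<std::vector<SceneObject>, std::vector<SceneObject>>
splitByDepth(const std::vector<SceneObject>& scene, float splitDepth) {
    std::vector<SceneObject> nearGpu, farGpu;
    for (const SceneObject& obj : scene) {
        if (obj.maxDepth <= splitDepth)      nearGpu.push_back(obj);
        else if (obj.minDepth >= splitDepth) farGpu.push_back(obj);
        else { nearGpu.push_back(obj); farGpu.push_back(obj); }
    }
    return {nearGpu, farGpu};
}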
axeman said:
If we compare this idea to multiple CPU cores, this seems more like running two instances of one program, or two separate programs, which is easy, compared to running one program that uses multiple threads to accomplish one task, which is much harder.

Of course all types of load balancing are happening in a modern system, but AFAIK, software that starts out with multi-threaded logic is much more likely to be able to use 100% of available execution resources than software that tries to split a workload later on. Lucid seems to have caught onto the idea that if the workload is split at an earlier stage, better and more _consistent_ results can be achieved.

That's why this is probably a better solution than trying to "split" what boils down to hardware instructions once the API calls are already made, which is what SLI drivers need to do; it's better to make multiple API calls to separate devices via the driver, and then combine the results.

But if this really does work, how long until something similar becomes part of the API itself? I mean, if the API calls can be intercepted, and re-interpreted into two work loads, it makes sense that the API should be able to deal with multiple GPUs in the first place.

With the explosion of multi-core CPUs, general purpose software is (or should be) starting to be designed with this in mind. Up to now, since 3D graphics is so easily parallelizable (is that a word?) in hardware, no one has really thought about multiple GPUs as anything other than a niche market.

But sooner or later, as transistor density and therefore die size keep growing (NVIDIA was rumoured to already be facing yield issues because of die size), the GPU computing market will have to face the "multicore" reality more seriously...

AMD is probably already on the right path, developing smaller, cheaper cores and slapping them on the same board, which they can still sell relatively cheaply. One issue here is the extra memory resources needed, but memory, even graphics memory, continues to both increase in density and fall in price (for now)... Which turned out to be the better design, Netburst or Core 2?

But I digress; Lucid's approach might be GPU agnostic, but it is not API agnostic. The biggest challenge Lucid probably faces is API changes, which in the case of Microsoft are not altogether infrequent, although this could very well slow down as the API becomes more flexible, allowing for greater and greater flexibility by virtue of shader programs. The hardware and driver/software of their solution have to be able to deal with every single possible API call without exception. I'm not a developer, but it seems that this will get very complicated. I guess the advantage is that if the RISC chip that powers this is powerful enough, and flexible enough, the software end of it can be tweaked to do whatever it needs to do.

Besides this, this technology is a serious threat to SLI; motherboard manufacturers may well choose to put a Hydra chip in their products rather than having to pay for Nvidia's licence..

As I mentioned before, Intel is one of the company's main backers.. I'm starting to wonder to what extent Lucid has something to do with Intel also claiming that Larrabee's performance scales linearly.
 
@Blastarr: following up on what we were talking about in the other topic about Larrabee...

http://www.fudzilla.com/index.php?option=com_content&task=view&id=11450&Itemid=1

Sources close to Intel have told us that Intel is working together with Lucid on some secret project. If Lucid can pull off its promised close-to-linear performance out of two chips, more GPUs could mean even more performance, virtually linear so to say.

Last week, when we met the company's President and VP of Business Development, Mr. Offir Remez, he confirmed to us that more chips mean more, near-linear performance. They don't have to stop at three or four GPUs like Nvidia and ATI do; they can surpass this number.

With this in mind, imagine how much good this chip could do for Larrabee - we've already reported that Larrabee has 50 to 80 cores. With the help of Lucid's Hydra technology there might be room for some sort of collaboration, but if Lucid is so good, someone will buy it out sooner rather than later.

In case Intel doesn't need Lucid internally, which we do believe will be the case, Intel might use the Lucid chip on its motherboards simply to stab both ATI and Nvidia, and their Crossfire and SLI, in the back. Once Larrabee comes out, there will always be the possibility of putting two Larrabee cards together and getting more performance.
 
ELSA Japan and LucidLogix to Introduce High Performance Computing Products
ELSA Japan, a leading computer graphics solution provider and Lucidlogix (Kfar Netter, Israel, CEO - Moshe Steiner) announce an agreement to deploy Lucid’s HYDRA based chip in ELSA Japan High Performance products.

The companies have teamed up to transform high performance computing in the Japanese marketplace. For the first time, a product based on Lucid’s HYDRA technology will be used in a new line of ELSA Japan high performance systems for the HPC, broadcast and medical markets.




At the end of March 2009, the new solution will allow ELSA to provide a cost-effective solution based on multiple GPUs from any vendor. The first products will feature dual and quad GPU configurations.

By combining Lucid’s component with Elsa Japan’s PCI-Express end-point device and remote graphics offering, a fully scalable and flexible system can be achieved for the first time at affordable price points.

“We are pleased to partner with ELSA, which has the reputation for providing leading performance computing solutions to the Japanese market. ELSA’s selection of Lucid products for graphics and high performance computing platforms demonstrates our commitment to deliver a unique and powerful parallel processing architecture,” said Offir Remez, President of Lucid. “HYDRA technology will allow ELSA to combine multiple GPUs on one device, for efficient, high performance in compute intensive, large scale visualization scenarios.”

"Partnership with Lucid is very important for our customers who require high performance computer. We can provide scalable performance and configurable solutions to break through the performance barrier.” said Jun Nagai, president, ELSA Japan Inc.

Could this finally be it?
Maybe the arrival of this technology in desktops is closer now; we'll see.
 
Something is starting to appear. That one is a configuration aimed at "raw computing power".

Notice the 4 GT200s (each has only one 6-pin power connector) with no display outputs (so CUDA-only cards). Of course, 4 "GTX 295"-type variants would be better :D

Of course, this depends on another PC in order to work. That big port must be an external PCIe x16 or x32 connector :P
 
This ELSA solution is the same as the Tesla S1070, but instead of taking up 1U it takes up 4U (judging by the pictures).
The advantage is that you can use retail cards, which makes it cheaper.
Even so, nVidia's solution looks more elegant and takes up far less space.
 
And you're forgetting that the board uses the ATX standard, so any ordinary case will do, not just a rackmount.

These cards should be cheaper than the equivalent desktop graphics cards; the production cost is much lower (no display outputs and no NVIO chip for the video signals).
 

I don't see how being ATX is an advantage, because:
http://www.nvidia.com/object/tesla_supercomputer_wtb.html

There's no shortage of solutions with 4 of these cards in desktops, and unlike this one they don't need two boxes (computer + graphics cards).
Unless the price beats nVidia's offerings, but since they use Tesla cards and need a separate board + case, I don't see how.
On top of that, ELSA isn't an AMD partner, otherwise this could have been a product competing with nVidia.
 