Intel Roadmap

Nemesis11

screenshot10cb.png


screenshot25hq.png


screenshot38uo.png



About Conroe:

In order to have any inkling of what Conroe will offer, we need to take a step back for a minute. The last truly new architecture that Intel introduced was the IA-64/EPIC platform for Itanium (although depending on how you look at it, some would say that NetBurst actually came after IA-64). Prior to that, Intel had the P6 architecture, which was preceded by P5 (Pentium), 486, 386, etc. all the way back to the first parts Intel made. At present there are three major architectures that are all in production at Intel: P6 (Pentium Pro/II/III now evolved to Pentium M), NetBurst (Pentium 4 and derivatives), and IA-64/EPIC used in Itanium processors. P6 isn't actually the real name of the architecture for Pentium M, of course - Intel has never come forth with an official name. While Pentium M does use an extension of P6, the Banias and Dothan cores really change things quite a bit. We'll talk about how in a moment, but we'll refer to the architecture as P6-M for the remainder of this article. When we say P6-M, we mean Banias, Dothan, and Yonah. Let's take a quick look at the benefits and problems of each architecture before we talk about Conroe.

Prescott is used on the recent Pentium 4 and Celeron chips and has a 31 stage pipeline, coupled to a separate 8 stage fetch/decode front end. (Earlier Northwood and Willamette cores use a 20 stage pipeline with the 8 stage front end.) Together the total pipeline length comes in at 39 stages - over twice the length of the current AMD K8 pipeline. In fact, the next longest pipelines outside of Intel aren't even out yet: Cell and Xenon are both around 21 stages long. The benefits of a long pipeline are in raw clock speeds. It's no surprise that NetBurst is the only architecture currently shipping at speeds greater than 3 GHz, and Cell and Xenon are slated to join that "elite" group of processors in the future.

While a lengthy pipeline allows for high clock speeds, it also introduces inefficiencies in cases where a branch prediction misses. When that occurs, everything following the branch instruction has to be cleared from the CPU pipeline and execution begins again - a penalty of as much as 30 cycles in the case of Prescott. (Of course, it could be even longer if there's a cache miss and main memory needs to be accessed, but that delay would occur with or without the branch miss so we'll ignore it.) In order to avoid the full penalty of a branch misprediction (39 cycles), Intel decoupled the fetch/decode unit from the main pipeline and turned the L1 cache into a "trace cache" where instructions are stored in decoded form. The trace cache is actually a very interesting concept and certainly helped improve performance. It basically allows many instructions to skip 1/4 to 1/3 of the standard pipeline. While Intel no longer holds the performance crown, it wasn't until the launch of the K8 that Intel really lost the lead.
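A rough way to see why the flush penalty matters is to fold it into an average cycles-per-instruction figure. The sketch below is a back-of-the-envelope model, not something taken from the article: the function name, the 20% branch share, the 5% miss rate and the 12-cycle comparison penalty are all assumed for illustration. Only the 30-cycle Prescott figure comes from the text above.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Average cycles per instruction once misprediction flushes are added.

    Every mispredicted branch costs `penalty_cycles` extra cycles, so the
    average overhead per instruction is the product of the three factors.
    """
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Assumed workload: 1 in 5 instructions is a branch, 5% of them mispredict.
BRANCH_FRACTION = 0.20
MISPREDICT_RATE = 0.05

# 30 cycles is the Prescott penalty quoted in the text;
# 12 cycles is an assumed penalty for a much shorter pipeline.
long_pipeline = effective_cpi(1.0, BRANCH_FRACTION, MISPREDICT_RATE, 30)
short_pipeline = effective_cpi(1.0, BRANCH_FRACTION, MISPREDICT_RATE, 12)

print(f"long pipeline:  {long_pipeline:.2f} cycles per instruction")
print(f"short pipeline: {short_pipeline:.2f} cycles per instruction")
```

With these assumed numbers the long pipeline needs about 1.30 cycles per instruction against 1.12 for the short one, so it has to clock roughly 16% higher just to break even.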

In terms of internal functioning of the NetBurst pipeline, each clock cycle at most three traces (instructions decoded into micro-ops) can be issued from the trace cache to the queues within the main pipeline. The NetBurst queues (schedulers) can then dispatch up to six micro-ops per cycle, but there are restrictions and in many cases there are execution slots that can't be filled on any given cycle. Based on the number of traces issued per clock, most would call NetBurst a three-wide issue design. That makes NetBurst the same as AMD's K7/K8 cores as well as the P6/P6-M cores in terms of issue rate. Purely from a theoretical standpoint, NetBurst could execute 3 instructions per clock, multiplied by the clock speed to give the final performance. Nothing ever reaches the theoretical performance of course - if it did, then NetBurst would still be over 35% faster than any other architecture, given its high clock speeds. Branch misses, cache misses, instruction dependencies, etc. all serve to reduce the theoretical performance offered.
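The theoretical-peak arithmetic above can be sketched as follows. The article does not say which clock speeds stand behind the "over 35%" figure; the 3.8 GHz and 2.8 GHz values below are assumptions (roughly the fastest Pentium 4 and Athlon 64 parts at the time), and the function name is made up for the example.

```python
def peak_instructions_per_second(issue_width, clock_ghz):
    """Theoretical peak: instructions issued per clock times clocks per second."""
    return issue_width * clock_ghz * 1e9


# Both designs are three-wide per the text; the clock speeds are assumed.
netburst_peak = peak_instructions_per_second(issue_width=3, clock_ghz=3.8)
k8_peak = peak_instructions_per_second(issue_width=3, clock_ghz=2.8)

# With equal issue width, the advantage reduces to the clock speed ratio.
advantage = netburst_peak / k8_peak - 1

print(f"NetBurst peak: {netburst_peak / 1e9:.1f} billion instructions/s")
print(f"K8 peak:       {k8_peak / 1e9:.1f} billion instructions/s")
print(f"Theoretical NetBurst advantage: {advantage:.1%}")
```

Under those assumed clocks the gap works out to a little under 36%, which is consistent with the "over 35%" in the text. Real throughput falls well short of either peak for the reasons listed above.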

Moving on to the Pentium M core, you can find out some of the details of what was changed in our Dothan investigation from last year. The basic idea is to take the P6 core and add some of the latest technologies to the design. To recap the earlier article, the Pentium M has several major design features. First, it goes with a more moderate pipeline length: longer than P6 to allow higher clock speeds, but shorter than NetBurst. (Intel isn't saying more than that, though guesstimates would put the length around 14 to 17 stages.) Next, Intel added micro-ops fusion to the core, which helps some instructions move through the core faster and avoids delays associated with out-of-order cores. Micro-ops fusion in essence eliminates dependency problems on certain instructions, since they are "fused" together. The core also has a dedicated stack manager that helps improve memory access efficiency as well as lower power use. Better branch prediction is another major improvement relative to the P6 design - take something like the branch prediction of NetBurst and put it on the P6 core and that's a rough description of what was done. Branch prediction is one of the features of an architecture that generally makes all code run a bit faster, and it once again reduces inefficiencies. The number of execution units remains the same as in P6, which means there's less wasted power on idle parts of the chip, while the faster system bus of NetBurst helps to keep the processor fed with data. Finally, power saving features were added to the cache, allowing the CPU to only fully power up small areas of the L2 cache for each cache access. The end result is a processor that has certain limitations but ends up achieving a very high performance per Watt rating, which is important for a mobile part. As we've shown in several articles, Pentium M makes for an attractive laptop processor but still can't compete with desktop parts in certain tasks.

Moving on to the final architecture, we come to IA-64/EPIC. While similar in some ways to VLIW (Very Long Instruction Word) architectures of the past, Intel worked to overcome some of the problems with VLIW (specifically the need to recompile code for every processor update) and called their new approach EPIC: "Explicitly Parallel Instruction Computing". In contrast to the P6, NetBurst, K7, and K8 architectures that can issue up to three instructions per cycle, the current Itanium 2 chips can issue eight instructions per clock. From a purely theoretical standpoint, the fastest Itanium 2 running at 1.6 GHz actually has more computational power than any other Intel chip. Throw in dual core designs with HyperThreading - HyperThreading that actually works much better than NetBurst HTT due to the wide design of EPIC - and each chip not only has the potential to issue eight instructions per clock, but it should actually come relatively close to that number. Another difference between Itanium and the other designs is that large amounts of cache are present in order to keep the pipelines fed with data. Current models ship with up to 9MB of L3 cache, while future parts like the Montecito will have 24MB of L3 cache (and a transistor count of 1.7 billion transistors - about eight times the transistor count of the Pentium D Smithfield core)!

Of course, with the wide issue rate of Itanium 2 (the original Itanium "only" had a 6-wide issue rate), you need a lot of execution units. NetBurst has 7 execution units in Prescott: two simple integer units (which can function as 4 integer units if you count the double pumped design), a complex integer unit, two FP/SIMD units, and dedicated memory load and store units. If you want to count the simple integer units as 2 each, you could make a stretch and say NetBurst has nine execution units. AMD's K7 and K8 both have nine execution units as well, only they go for a less customized approach and instead have three each of the integer, FP/SIMD, and memory units. Each of AMD's units is fully functional, unlike the "simple" and "complex" integer units in NetBurst. In contrast to these architectures, the current Itanium 2 chips have six ALUs (Arithmetic Logic Units), three BRUs (Branch Units), two FPUs, one SIMD, two load units, and two store units - call it 16 functional units if you prefer, though the specialization of some of them makes it slightly less than that. While Itanium 2 is very wide, the length of the pipeline is only 8 stages - less than any other modern x86 processor by a significant amount. That certainly plays a role in the reduced clock speeds, but like Athlon 64, lower clock speeds with a more efficient architecture can outperform long pipelines in many instances. In order to extract all of the potential performance from Itanium, however, a lot of work needs to be done during code compilation. This is the Achilles' heel of VLIW designs where processor updates require the code to be recompiled, and while EPIC doesn't require that you recompile the code, newer compiler optimizations can improve performance significantly.
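To keep the unit counts above straight, here is a small tally. The numbers are exactly the ones quoted in the paragraph; the dictionary names and the grouping are just one way of laying them out.

```python
# Execution units as listed in the text, one dictionary per design.
prescott_units = {
    "simple integer": 2,   # double pumped, so arguably worth 2 each
    "complex integer": 1,
    "FP/SIMD": 2,
    "load": 1,
    "store": 1,
}
k8_units = {"integer": 3, "FP/SIMD": 3, "memory": 3}
itanium2_units = {"ALU": 6, "branch": 3, "FPU": 2, "SIMD": 1, "load": 2, "store": 2}

prescott_total = sum(prescott_units.values())
# The "stretch" count: each double-pumped simple integer unit counted twice.
prescott_stretch_total = prescott_total + prescott_units["simple integer"]
k8_total = sum(k8_units.values())
itanium2_total = sum(itanium2_units.values())

print(f"Prescott:  {prescott_total} units ({prescott_stretch_total} with the stretch)")
print(f"K7/K8:     {k8_total} units")
print(f"Itanium 2: {itanium2_total} units")
```

This reproduces the 7 (or 9), 9 and 16 figures from the text. A raw count says nothing about how often the units are actually kept busy.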

All that talk about other Intel architectures (as well as some of AMD), and yet we still haven't said exactly what Conroe is. The simple truth is that no one other than Intel and people under strict NDA really know for sure what the Conroe architecture will entail. There is a point to all of this discussion of previous architectures, though. While we've really only skimmed the surface of the designs, hopefully you can see how wildly different each architecture is from the others. NetBurst is long and narrow, EPIC is short and wide, and P6-M is a medium length pipeline that is narrower than either of the others but requires less power. The high clock speeds and resultant power levels have created problems for NetBurst, but there are still cases where it substantially outperforms P6-M. Itanium is still a better solution for certain types of big business work (databases in particular) than any of the other Intel architectures. While all three architectures have their strong points, none of them qualify as a universally superior solution. Having fallen behind AMD performance in many areas, we seriously doubt that Intel wants to create a design that merely aims at being "faster than AMD in most areas." Whether or not they can succeed is of course a question for the future.

If we don our speculation hats for a minute, we'd say that Conroe will return to more typical pipeline lengths and also reduce the maximum clock speed of the processors based off it relative to NetBurst. A 20 pipeline stage design, give or take, seems to be reasonable - we heard a few people at WinHEC suggest that NetBurst was hubris in terms of pipeline lengths, and that 20 or fewer stages is where all foreseeable pipelines - Intel and otherwise - are heading. The concept of a trace cache also seems to have merit, so some variant of that concept could show up in Conroe - micro-ops fusion plus a trace cache larger than that of NetBurst sounds interesting to us at least, though we're not at all sure it's feasible. Along with the shorter, more efficient pipeline, Conroe could also look into going to a wider issue rate. Some people have argued (rather convincingly) that x86 code is not conducive to issuing more than 3 instructions per clock without expending significant die resources, however, and current designs rarely manage issuing three instructions per clock anyway. A better solution could be to simply add more execution units, branch prediction, prefetch logic, etc. to ensure that the core can actually reach the maximum issue rate more frequently. Taking something like Pentium M and adding more FP/SIMD computational power isn't too much of a stretch (though that seems to be where Yonah is already heading).

If any of these ideas make the final design of Conroe, it's basically just an educated guess. The main point right now is that new architectures from Intel are not a frequent occurrence, so we expect it to be substantially different than P6/P6-M, NetBurst, and EPIC. Depending on how much collaboration there is between the various CPU design teams, we could see many elements of all three architectures or we could see a design largely derived from one or two of the others. If you consider that the Northwood to Prescott changes were pretty significant and Intel still didn't dub Prescott a new architecture, Conroe (and derivatives) ought to be a more significant redesign than going from 20 to 31 pipeline stages, adding EM64T, and changing cache sizes. Chances are that by the time we know more, we'll be under NDA as well until the official launch, so consider this our last chance at some enthusiast speculation.

http://www.anandtech.com/printarticle.aspx?i=2492

More about Conroe:

Intel promised us in earlier conversations that its engineers are working on reducing the power consumption of its desktop processors. But it was unclear when this will happen. Documents seen by Tom's Hardware Guide now indicate that the new processor architecture code-named "Conroe" and scheduled for the second half of 2006 will deliver on this promise. If we believe our sources, then Intel is targeting a power consumption of about 60 to 70 watts per processor - or 30 to 35 watts per core.

According to an updated roadmap, Conroe will carry over the 900 sequence number that will be introduced with Presler, the 65 nm version of the current Pentium D 800. Conroe will launch in 940, 950 and likely 960 versions, with clock speeds that have not been specified yet. However, we know that Conroe will fit in the LGA775 package of the Pentium D 800 and 900, support Intel's virtualization technology, and will not offer Hyper-Threading capability. The processor will be available in two versions - with 2 MByte and 4 MByte of L2 Cache.

http://www.tomshardware.com/hardnews/20050808_190135.html

Intel to unveil Pentium 4 successor at IDF

Santa Clara (CA) - Intel confirmed that it will showcase its "next generation" desktop processor architecture at the upcoming Intel Developer Forum (IDF) Fall. Code-named Conroe, the new processors promise significantly lower power consumption than the current NetBurst architecture that is used in the Pentium 4 and Pentium D 800.

http://www.tomshardware.com/hardnews/20050812_023337.html

Conroe will be presented on August 23 at IDF.
The second half of 2006 is going to be interesting, with Intel's Conroe and AMD's socket/DDR2 change.
 
Question... when the chart shows a clock of 3.8, that's for both cores, right? In other words, 1.9 + 1.9?

Will Presler already be based on the Pentium M (aka a modified P3)? If so, maybe it'll be worth buying Intel again! :) And since I don't expect to upgrade before then...
 
greven said:
Question... when the chart shows a clock of 3.8, that's for both cores, right? In other words, 1.9 + 1.9?

Will Presler already be based on the Pentium M (aka a modified P3)? If so, maybe it'll be worth buying Intel again! :) And since I don't expect to upgrade before then...

Presler is still based on the Pentium 4. Conroe should be a totally new core.

The 3.8 speed is 3.8 GHz for both cores.
 
The 3.8 in the chart is Cedar Mill, which is single core.
For Presler (dual core), they suggest that versions above 3.4 will still come out.

Conroe should be announced within days at IDF.
 
Presler looks nice, although it's probably like the P4 6xx versus the 5xx, a cache bump...

I hope Conroe is Dothan-based :001:

That puts more pressure on AMD :D Apparently the giant is still breathing... lol

What will the "Cedar Mill" core bring that's new? Just that "Virtualization" technology??

Is this technology for running two OSes at the same time?
 
Very interesting news.
It looks like the 23rd is the big day for us to learn a few more details.
I think I'll postpone the move to dual core :).

Yes, with virtualization you'll be able to run two different OSes. It's going to be good. :)
 
Presler is the continuation of the 8XX, not of the 5XX/6XX. Presler is two Cedar Mills "glued" together.

presler2rg.jpg


The differences in Presler/Cedar Mill are the 65 nm process, VT and, in Presler's case, more L2 (2 x 2 MB).
There may be other differences. We'll have to wait for IDF.
 
I still don't consider these Intel chips to be dual-core.
You simply can't call a chip dual-core when it uses two separate cores that only communicate with each other through an external Front Side Bus on the motherboard.
It's a misleading solution, thrown together in a hurry, with performance to match (low).
It's the same as saying the GeForce NV45 is a new chip even though it still has the HSI bridge (in the same packaging, just like these two CPUs).

A dual-core chip either has both cores on-die, unified and "talking" to each other internally, or it isn't a dual core.
 
The low performance of these Intel dual cores is due more to the fact that the Pentium 4s are "weak" than to their communicating with each other through the FSB. After all, there isn't enough data flowing between the two to overload the FSB to the point of making it a major performance limiter.
AMD's solution is more effective, but that isn't the key to the X2's performance advantage. That advantage is due more to the excellence of the K8s inside them.
 
Winjer said:
The low performance of these Intel dual cores is due more to the fact that the Pentium 4s are "weak" than to their communicating with each other through the FSB. After all, there isn't enough data flowing between the two to overload the FSB to the point of making it a major performance limiter.
AMD's solution is more effective, but that isn't the key to the X2's performance advantage. That advantage is due more to the excellence of the K8s inside them.

bingo, we have a winner :)
 
The problem isn't data flow or bandwidth, but latency.
You can't compare an external bus, shared by two cores, memory, motherboard, etc., with two cores that share information at the level of the cores themselves, at their own speed.
 
I think we have the usual problem here: the 2 main processor companies for the home market are at war and new products keep appearing, but low prices are nowhere to be seen. I mean the kind of prices that might come from a project like the X2 3800+, where the aim is to cut production cost, even if a new core has to be created, to get a more attractive price for the end user (the average Joe)...

I think it's utopian to say this will happen any time soon on Intel's side, as the roadmap shows. Let's see whether the trend catches on at AMD: attractive prices, good quality and a strong name are the ingredients for success... You still see a lot of people buying P4s just because they're called P4, when there are better and cheaper AMDs, but that's another story and I'm already rambling...
 
It looks interesting to me, at least the 60 to 70 W part.
It was about time efficiency was valued over (useless) brute force.
 
Towards Yonah Extreme Edition?

It has long been known that engineering and marketing very often pull in opposite directions. But while marketing often tries to move faster than the engineering by putting forward very "borderline" claims, this time the reverse could happen. Yonah (the dual-core 65 nm Pentium M) is supposed to ship in Q1 '06 at a maximum frequency of roughly 2.1/2.2 GHz, as laid out in the marketing roadmaps that have already done the rounds on the Web. However, we have obtained several matching pieces of information on Intel's technical capabilities regarding Yonah.

We know that Yonah is ready and already in mass production, and that the latest stepping is designed to run without problems at 2.5 GHz. In short, if marketing follows, a 2.5 GHz Yonah could launch much sooner than planned. The main risk (not very likely given the on-paper performance) would be that Yonah overshadows Conroe, which will itself be announced at... Come on, let's keep a little suspense ;-)

http://www.x86-secret.com/?option=newsd&nid=907

Conroe to launch at 2.93 GHz

We recently learned a bit more about the launch frequencies of Intel's future new-generation CPUs. Succeeding the current NetBurst architecture as well as the Dothan and Yonah Pentium M, the next generation of processors comes in Desktop (Conroe), Mobile (Merom) and Server (Woodcrest) versions. Inspired by short-pipeline architectures like the Pentium M, these new dual-core CPUs, due in about a year, will have a unified L2 cache of 2 MB or 4 MB depending on the version.

As for frequency, the first CPU of this generation should debut at 2.93 GHz. As for the FSB, it will be an FSB1066 (266 MHz real) as standard. That said, our Albano-Croatian sources seem unanimous in saying that the Mobile variant, Merom, will also be fully pin-to-pin compatible with the Napa platform (Yonah) and will therefore use an FSB667. On the other hand, a little bird tells me that an XE version of Conroe with an FSB1333 (333 MHz) is also very likely... Time will tell.
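The FSB ratings and "real" clocks quoted above line up if you assume Intel's usual quad-pumped bus (four transfers per clock) with a 64-bit data path. Neither assumption is stated in the article, and the function names below are made up for this sketch.

```python
TRANSFERS_PER_CLOCK = 4  # assumption: quad-pumped front side bus
BUS_WIDTH_BYTES = 8      # assumption: 64-bit wide data path


def base_clock_mhz(fsb_rating):
    """Real bus clock behind a marketing rating such as 1066.

    Integer division drops the fraction, matching the truncated
    figures quoted in the text (266 and 333 MHz).
    """
    return fsb_rating // TRANSFERS_PER_CLOCK


def peak_bandwidth_mb_s(fsb_rating):
    """Peak transfer rate in MB/s: transfers per second times bytes per transfer."""
    return fsb_rating * BUS_WIDTH_BYTES


for rating in (667, 1066, 1333):
    print(f"FSB{rating}: {base_clock_mhz(rating)} MHz real, "
          f"{peak_bandwidth_mb_s(rating)} MB/s peak")
```

On those assumptions FSB1066 corresponds to a 266 MHz clock and about 8.5 GB/s peak, and the rumoured FSB1333 to 333 MHz and about 10.7 GB/s.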

EDIT: Since we at x86 are very lazy and don't want to write a whole new news item for a single piece of information, even an exclusive one: Napa and Yonah will be announced on January 5, at the opening of CES in Las Vegas. Bam.

http://www.x86-secret.com/?option=newsd&nid=909

Bring on Conroe.
 