Hammer Preview

D1S0RD3R · 13 September 2002

The K8 processor, named Hammer afterwards, is actually similar to the existing Athlon XP, but at the same time there are striking differences. We also got to know the name of the architecture AMD implemented in Hammer. It's "x86-64", that is, 64-bit x86 (by analogy with x86-32). In this article we'll try to make it clear what x86-64 Hammer actually is and how it differs from Athlon, Pentium 4 and Itanium.
But first, what do we actually need 64-bit processors for? The answer is simple: present-day applications have started giving computers extremely resource-hungry tasks.
A 32-bit processor addresses memory with 32-bit addresses, so the maximum memory capacity in modern x86 systems is limited to 2^32 = 4,294,967,296 bytes = 4GB. It should be mentioned that Xeons can emulate 36-bit addressing, that is, address up to 64GB, but tricks like that lead to worse performance. Moreover, the maximum memory size an application can use is still limited to the same 4GB. That was one of the reasons people started thinking about building 64-bit processors.
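The arithmetic behind these limits is easy to check. Below is a minimal sketch; the helper name addressable_bytes is ours, chosen for illustration only:

```python
# Size of the address space reachable with a given address width.
GIB = 2 ** 30  # bytes in one gigabyte (binary)

def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses for an address of the given width."""
    return 2 ** address_bits

flat_32 = addressable_bytes(32)  # ordinary 32-bit addressing
ext_36 = addressable_bytes(36)   # 36-bit extended addressing mentioned for Xeons

print(flat_32, "bytes =", flat_32 // GIB, "GB")  # 4294967296 bytes = 4 GB
print(ext_36 // GIB, "GB")                       # 64 GB
```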

Basic Principles of x86-64 Architecture
x86-64 is the 64bit architecture AMD developed for its Hammer processor family. In contrast to the 64-bit IA64 architecture used in Intel Itanium processors, x86-64 is based on the existing x86-32 architecture. It means the x86-64 based processor can run all the existing 32-bit applications without any difficulty. There are quite a lot of them now, you know, and they cost a lot of money. And applications like that can be run without any performance losses, unlike in case of Intel Itanium where the x86-32 instructions have to be emulated. So we don't have to wait until the developers recompile their products for the new platform in order to start using Hammer systems. On the contrary, the new AMD processor has all the advantages of its predecessors, but adds a few extra possibilities to them, that can be employed afterwards.
To support both 32bit and 64bit code and registers, the x86-64 architecture allows the processor to work in two modes: Long Mode with two sub-modes (64bit and Compatibility modes) and Legacy Mode. You can read in the table what these modes are meant for:

So,

64bit mode features the support of:
64bit virtual addresses;
8 new 64bit general-purpose registers;
GPRs extension to 64bit (including the "old" EAX, EBX and so on);
64-bit instruction pointer;
New relative instruction pointer (RIP) method of data-addressing;
Continuous address space with a single space for instructions, data and the stack.
That is, the 64bit mode is the 64-bitness as it is.
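To illustrate the RIP-relative addressing mentioned in the list: the effective address is the address of the next instruction plus a signed 32-bit displacement stored in the instruction. The sketch below only models that arithmetic; the function name and the example addresses are hypothetical.

```python
MASK_64 = 2 ** 64 - 1

def rip_relative_target(instr_addr: int, instr_len: int, disp32: int) -> int:
    """Effective address = address of the NEXT instruction + signed 32-bit displacement."""
    if not -2 ** 31 <= disp32 < 2 ** 31:
        raise ValueError("displacement must fit in a signed 32-bit field")
    next_rip = instr_addr + instr_len
    return (next_rip + disp32) & MASK_64  # wrap around at 64 bits

# Hypothetical 7-byte instruction at 0x400000 whose data lies 0x1000 bytes past its end.
print(hex(rip_relative_target(0x400000, 7, 0x1000)))  # 0x401007

# Loading the same code 0x100000 bytes higher moves the target by exactly the same amount,
# which is why this method suits position-independent code.
print(hex(rip_relative_target(0x500000, 7, 0x1000)))  # 0x501007
```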
Compatibility mode provides binary compatibility of existing 16- and 32-bit applications with a 64-bit operating system. It is enabled on a per-code-segment basis. But unlike 64-bit mode, here segmentation works as usual, using protected-mode semantics. The running application sees the processor as an ordinary x86 CPU in protected mode. However, the operating system uses the 64-bit Long Mode mechanisms for address translation, interrupt and exception handling, and system data structures.
In addition to Long mode, x86-64 supports Legacy mode, which provides binary compatibility with 16- and 32-bit operating systems. That is, in Legacy mode the processor functions as an ordinary 32-bit x86 CPU. None of the 64-bit instructions are involved here. The mode provides full compatibility with all existing x86 architectures. It includes the support of segmented memory and the 32bit GPRs and instruction
But despite the external similarity, the processor core of the 8th generation CPU has undergone certain changes. To cut it short, we would like to mention the following:

Level 1 cache remained the same. Its size is 128KB: 64KB for data and 64KB for instructions.
The maximum size of level 2 cache, which the core can address, is lowered from 8MB to 1MB because of the Hammer architecture. Anyway, though Athlon processors could in theory support 8MB L2 cache, they never really did so. Hammer is intended to come to the server market, so, in case a larger than 1MB cache memory is necessary, AMD is going to use level 3 cache;
The processor pipeline is 2 stages longer. This will allow Hammer to work at higher clock rates than Athlon does;
Hammer will feature an improved branch prediction unit;
8th generation processors will also feature larger translation lookaside buffers (TLB).
Let's dwell a bit more on some of the innovations. First of all, note that the L2 cache in Hammer remained 16-way set-associative, as in Athlon. The L2 cache also remained exclusive. Anyway, AMD claims that Hammer's cache was developed independently of the same unit in Athlon XP. So the similarity doesn't necessarily mean that they'll have the same efficiency. For example, there are hopes for a long-awaited increase in L2 cache bus width. The 64-bit bus used in the Athlon CPU family looks somewhat outdated, and we tend to believe that this bus in Hammer will be at least 256 bits wide, as in Pentium 4.

By the way, Pentium 4 showed to the entire world the importance of a long pipeline. The longer pipeline in comparison to all the predecessors and competitors allowed Pentium 4 to work at incredible core frequencies, unattainable by processors of other architectures. But the performance of Pentium 4 working at the same core frequency appeared much lower than that of the competitors, as high clock frequencies do not necessarily imply that more instructions are processed per clock. Unavoidable mistakes in branch predictions make Pentium 4 empty its gigantic pipeline and stay idle until it is full again. All this leads to lower performance.

Fresh Idea: Memory Controller in the CPU

One of the main innovations in Hammer is a memory controller integrated into the processor core. The same approach was used by Transmeta in its Crusoe solution. AMD decided to take the idea a bit further. The main advantage of a built-in memory controller over the ordinary one, which is placed in the North Bridge of the chipset, is that it works at the processor core clock and, as a result, has lower latency. And the higher the processor frequency, the lower the latency.
One more advantage of the integrated controller is that AMD won't depend on chipset makers when it comes to working with memory. There were cases when a poor memory controller in the chipset greatly limited overall system performance. The manufacturers even had to release revisions to avoid memory problems (remember the KT266 case). Moreover, the data will no longer be transferred over the processor bus, so there'll be one "bottleneck" less.

The Hammer memory controller will work with DDR memory of the PC1600/2100/2700 standards and will be 64- or 128-bit wide. It means that either one or two memory channels can be involved. And as AMD decided to promote Hammer in the server market, the ECC memory support looks quite natural.
AMD claims that its memory controller will support "future memory standards", too. This seems to mean that as soon as DDR II comes out (next year), followed by other memory standards (such as DDR III), the memory controller will be modified accordingly.
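As a rough sketch of what the 64-bit and 128-bit options mean for peak bandwidth, we can take the number in each PCxxxx name as the nominal peak rate of one 64-bit channel in MB/sec and double it for two channels. These are theoretical peaks only; the table and function names below are ours.

```python
# Nominal peak bandwidth of ONE 64-bit DDR channel, in MB/sec (the number in the module name).
DDR_PEAK_MB_S = {"PC1600": 1600, "PC2100": 2100, "PC2700": 2700}

def peak_bandwidth_mb_s(module: str, bus_width_bits: int) -> int:
    """Theoretical peak bandwidth for a 64-bit (one channel) or 128-bit (two channel) controller."""
    if bus_width_bits not in (64, 128):
        raise ValueError("the article mentions only 64- and 128-bit configurations")
    channels = bus_width_bits // 64
    return DDR_PEAK_MB_S[module] * channels

for module in DDR_PEAK_MB_S:
    print(module, peak_bandwidth_mb_s(module, 64), peak_bandwidth_mb_s(module, 128))
```

With PC2700 this gives 2700MB/sec for one channel and 5400MB/sec for two.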

One More Innovation: HyperTransport

HyperTransport (formerly LDT, Lightning Data Transport) is a high-speed "point-to-point" data transfer bus developed by AMD and first implemented by NVIDIA in its nForce chipset to connect the North and South Bridges. Saying that HyperTransport is widely used in Hammer systems is the least you can say about it. HyperTransport in Hammer systems means much more.

This bus is used to connect the processor and the chipset, different parts of the chipset developed by AMD for Hammer, and different processors in multiprocessor systems (see below) by means of additional HyperTransport controllers built into the processor. To cut it short: everywhere… Why? What's so good about HyperTransport? Well, it has a lot going for it: high speed, low latency, simple design (few wires). The maximum data-transfer rate provided by HyperTransport is 6400MB/sec one way. It can be easily changed by setting the width to 2, 4, 8, 16 or 32 bits and the frequency to 400, 600, 800, 1000, 1200 or 1600MHz, thus getting the necessary data-transfer rate (from 100 to 6400MB/sec in each direction). For example, to connect processors in multiprocessor Hammer systems HyperTransport will provide 3.2GB/sec each way.
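The quoted range follows from the link width and the transfer rate. Here is a minimal sketch, assuming the frequencies listed above are per-pin transfer rates in millions of transfers per second (the function name is ours):

```python
WIDTHS_BITS = (2, 4, 8, 16, 32)
RATES_MHZ = (400, 600, 800, 1000, 1200, 1600)

def ht_bandwidth_mb_s(width_bits: int, rate_mhz: int) -> int:
    """One-way bandwidth in MB/sec: (bits per transfer / 8) * million transfers per second."""
    return width_bits * rate_mhz // 8

print(ht_bandwidth_mb_s(2, 400))    # narrowest, slowest link: 100
print(ht_bandwidth_mb_s(32, 1600))  # widest, fastest link: 6400
print(ht_bandwidth_mb_s(16, 1600))  # 3200, i.e. the 3.2GB/sec quoted for CPU-to-CPU links
```

A 16-bit link at the top rate is one combination that yields the 3.2GB/sec figure; the article does not say which width the inter-processor links actually use.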

(Sorry about this, but where it says "solo" on the mobo, don't you see Tomázio's symbol??????)

Brothers in Arms: ClawHammer and SledgeHammer

Right now AMD is planning to produce two Hammer modifications: ClawHammer and SledgeHammer. The first one is intended for desktop PCs and low-end dual-processor servers. It'll be shipped under the well-known Athlon brand, possibly with some suffix, such as Pro, Ultra or 64. The latter is the server version of Hammer targeted at two-, four- and eight-way servers. The official name of SledgeHammer is already known. It's Opteron.

As you see, the ClawHammer core is even smaller than today's 0.18-micron Athlon XP (129sq.mm). It will allow AMD to lower manufacturing costs and put acceptable prices on the new processors. Unofficial sources say that at launch ClawHammer will cost about $400 and mainboards for it about $200. That's not too expensive for the high-performance sector, where these processors are expected to fit. Compare it with the price of top Pentium 4 models: $500-600. Well, it's not surprising, as Pentium 4 has a larger die size of 131sq.mm. Even if we take into consideration the fact that Intel uses 300mm wafers and AMD 200mm ones, the manufacturing cost of future AMD processors won't be higher than that of Pentium 4.
It should be mentioned that ClawHammer was cut down significantly compared to SledgeHammer in order to reduce manufacturing costs. One of the most disappointing things is the L2 cache, cut to a quarter: the ClawHammer cache size won't even reach the level of the modern Pentium 4. It seems that a high-performance memory subsystem could make up for the smaller cache, especially since the Hammer processor is equipped with an integrated DRAM controller. But the controller used in ClawHammer only supports one DDR SDRAM channel with a maximum bandwidth of just 2.7GB/sec.
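To put the wafer remark into numbers, here is a very rough sketch that simply divides wafer area by die area. It ignores edge losses, scribe lines, yield and the different cost of the wafers themselves, so it is only an upper bound on die count and says nothing definite about cost. The 104sq.mm ClawHammer figure is the one quoted further on in the article.

```python
import math

def gross_dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Upper bound on dies per wafer: wafer area / die area, no edge or yield losses."""
    wafer_area = math.pi * (wafer_diameter_mm / 2) ** 2
    return int(wafer_area // die_area_mm2)

clawhammer = gross_dies_per_wafer(200, 104)  # AMD: 200mm wafers, 104sq.mm die
pentium4 = gross_dies_per_wafer(300, 131)    # Intel: 300mm wafers, 131sq.mm die

print(clawhammer, pentium4)  # 302 539
```

So Intel would get more dies from each wafer; whether its cost per die is lower depends on wafer cost and yield, which the article does not give.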

One more thing to mention is the different sockets ClawHammer and SledgeHammer are going to use. It seems that this way AMD wants to keep customers from giving in to the temptation of using cheaper CPUs instead of their more expensive brothers. For instance, Athlon XP is widely used in dual-processor systems where Athlon MP is supposed to be. This will never be the case anymore, said AMD. So, they are planning to promote three sockets at a time at the beginning of 2003. Socket940 is for the server and workstation market, Socket754 is for desktops and low-end servers, and Socket A, which AMD is going to support throughout 2003, is for Value computers.

Multiprocessor Hammer Systems: Something to Wonder at

As we have mentioned before, every Hammer processor will feature two or three HyperTransport controllers. This number of buses is more than enough to ensure proper connection with the chipset. So, what do we need the other buses for? To build multiprocessor systems! The key issue about building multi-processor systems with Hammer CPUs is the use of the same HyperTransport bus.

This way, the implementation of a dual-CPU (or four- or eight-CPU) configuration doesn't require any support from the chipset. And as HyperTransport is quite easy to lay out on the mainboard, dual-processor Hammer systems shouldn't be expensive and will have a green light to enter the desktop market.

There's an interesting question, though. Every Hammer has its own memory controller (MCT) with DDR SDRAM connected to it. What happens to the memory in a multi-CPU system? The thing is that every CPU in a system like that will be able to access other processors' memory besides its own. The access goes over the same HyperTransport bus. AMD claims its bandwidth of 3.2GB/sec each way is more than enough to transfer data within the multi-CPU system. As a result, the memory turns into a single block, as in ordinary SMP systems. As every SledgeHammer can use up to 8 modules of 2GB each, the maximum memory capacity in an 8-processor system could reach 128GB (!). By the way, there won't be any problems with addressing it, as every CPU can address 1TB (1024GB) of memory (Hammer uses 40-bit physical and 48-bit virtual addressing).
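These capacity figures can be checked with a short sketch (the names are ours, for illustration only):

```python
GIB = 2 ** 30  # bytes in one gigabyte (binary)

def max_system_memory_gb(cpus: int, modules_per_cpu: int, module_gb: int) -> int:
    """Total installed memory when every CPU's controller is fully populated."""
    return cpus * modules_per_cpu * module_gb

def address_space_gb(address_bits: int) -> int:
    """Size of the address space for a given address width, in GB."""
    return 2 ** address_bits // GIB

print(max_system_memory_gb(8, 8, 2))  # 128 GB in an 8-way SledgeHammer system
print(address_space_gb(40))           # 1024 GB = 1TB of physical address space
print(address_space_gb(48) // 1024)   # 256 TB of virtual address space
```

The 128GB maximum therefore uses only one eighth of the 40-bit physical address space.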

Performance: First Estimates

We shouldn't hope to see a drastic performance growth by the 8th generation processors. Don't forget that Hammer has the same architecture as Athlon. Anyway, by preliminary estimates, Hammer will run ordinary 32bit applications (the ones that we have now) about 25% faster than Athlon XP working at the same core clock frequency. The integrated MCT will contribute 20% of the performance boost and the improvements of the core - 5%.

Applications recompiled for x86-64 will run about 10% faster on Hammer thanks to the extra registers and code structure changes. This will be the case without any code optimization: just due to recompilation. The SSE2 support may also add a few points.
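Putting these estimates together gives a rough picture. The sketch below assumes that the 10% gain from recompilation applies on top of the 25% gain for 32-bit code, so the two factors multiply; the article does not state this explicitly, and the names are ours.

```python
# Preliminary estimates quoted in the article, relative to an Athlon XP at the same clock.
MCT_GAIN = 0.20        # integrated memory controller
CORE_GAIN = 0.05       # core improvements
RECOMPILE_GAIN = 0.10  # recompilation for x86-64

# The article adds the first two contributions: 20% + 5% = 25%.
speedup_32bit = 1 + MCT_GAIN + CORE_GAIN

# ASSUMPTION: the recompilation gain multiplies the 32-bit speedup.
speedup_64bit = speedup_32bit * (1 + RECOMPILE_GAIN)

print(round(speedup_32bit, 3))  # 1.25
print(round(speedup_64bit, 3))  # 1.375
```

Under that assumption a recompiled application would run roughly 37% faster than on an Athlon XP at the same clock, before any SSE2 gains.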

Of course, the above-mentioned estimates of the performance growth are preliminary and depend on the application. We all remember Athlon XP, whose architectural innovations (not radical at all) allowed it to perform in some applications about 1.5 times faster than a Thunderbird at the same core clock. But in most applications the growth was not significant: about 5-7%. The same thing may happen to Hammer: the size of the performance gain will depend on the specific application. The highest performance increase is expected from apps that make copious use of memory and switch very often between threads. Why? The answer is very simple: the changes in Hammer compared with Athlon XP were clearly intended for tasks like that.

Some time ago on the Web there appeared the first benchmarks of a Hammer sample working at 800MHz. We wrote about it in the news, look here for details. Now I'll mention the conclusions made:

ClawHammer L1 cache works slower than that in Athlon XP (not much slower);
L2 cache is a little faster (remember AMD promising to improve the cache?);
The Quake3 test showed that ClawHammer runs this application as fast as a Pentium 4 (Willamette) running at twice the core frequency. The built-in MCT must have helped a lot here.
Note, that testing was performed with ordinary 32bit applications under 32bit Windows XP OS, that is, without the recompilation for x86-64. So, ClawHammer still has reserves to boost the performance even more.

Plans

So, 8th generation CPUs are going to make their first appearance as ClawHammer by the end of the year. The first CPUs of the family will work at a 1.6GHz core frequency and will have a rating of over 3000+. As we mentioned above, the processors will be sold under the Athlon brand with, possibly, some suffix added to the familiar name. But we shouldn't get too excited about large numbers of ClawHammer CPUs coming to the market at the end of the year. Sources say that AMD will ship only a small quantity of its new 8th generation CPUs this year. Mass ClawHammer shipments are only going to start after the New Year.

Then in the first half of 2003 mass shipments of Opteron will take place. By that time ClawHammer will have reached the 4000+ rating. In the second half of the year the transition to 0.09-micron technology will be underway and AMD will roll out the ClawHammer-S based processor. It's supposed to be a ClawHammer redesigned for the new manufacturing technology. As a result, the ClawHammer-S die size will be 64sq.mm against the predecessor's 104sq.mm. The first ClawHammer-S will probably have a 4400+ rating. The third quarter will witness the 0.09-micron Hammer turning mobile. The first 8th generation mobile CPU will be rated about 3000+.

Conclusion: Intel's in Danger!

And now let's try to compare Hammer with the competing products from Intel. These are the Pentium 4 processor as a rival to ClawHammer and Xeon as a rival to Opteron. We'll leave aside the future Prescott CPU core from Intel, as there's little information about it so far. Intel is rumored to be about to add x86-64 instruction support to Prescott (the Yamhill technology) as well as other innovations, whose effect is hard to predict.

So let's get back to Pentium 4. What advantages does it have over Athlon? Look:

Work at much higher core clock resulting in better performance;
High-speed bus (533MHz) and memory (up to RDRAM PC1066) subsystems;
SSE2 instructions set;
Larger L2 cache.
So what has Hammer got to say to this? Firstly, Hammer will basically work at higher frequencies than Athlon XP, thus getting closer to Pentium 4 (by the time Hammer hits the market, Pentium 4 working frequency will have reached 2.8-3GHz, according to today's information). As for performance, Hammer is going to bring the laurels back to AMD in the high-end CPU sector.
Full article

Dominion · 13 September 2002

Hmm, I had read somewhere that the Hammer's bus would be something like 800MHz. Not sure, correct me if I'm wrong.
I know that, going by the plans I saw back then, Hammer would launch with a much faster bus than the competition (Intel).

MEIA · 13 September 2002

I've been reading the article, and it seems to me that the advantage the P4 has over the AMD XP, which I think is the SSE2 technology, will disappear with Hammer. Now Intel really has to watch out with its Prescott.
Another thing I found curious is the core size. It will be approximately 104sq.mm. From what they say, that will reduce the manufacturing cost, but what I say is: if this beast runs as hot as the AMD XP, how are we going to get the heat away from it?
Now it remains to be seen how much overclocking you can get out of it and when it arrives in Portugal, to see whether the old AMD XPs drop in price.

I certainly won't be buying this generation of CPUs as soon as it comes out. I'll wait for things to cool down, and towards the end of 2003, when the "2nd generation of Hammers" comes out, maybe I'll buy one.

D1S0RD3R · 13 September 2002

Let's wait and see!

Tecnoboy · 13 September 2002

"(by the time Hammer hits the market, Pentium 4 working frequency will have reached 2.8-3GHz, according to the today's information)." -> If that were so, we would already have this new processor here, since 2.8GHz Pentium 4s already exist. And will these new AMD processors still use the QuantiSpeed tactic? Or will AMD finally restart the MHz war (or, in our current times, the GHz war!)?

oc_nightmare · 13 September 2002

Hammer, when are you arriving?

Korben_Dallas · 13 September 2002

Originally posted by Tecnoboy
"(by the time Hammer hits the market, Pentium 4 working frequency will have reached 2.8-3GHz, according to the today's information)." -> If that were so, we would already have this new processor here, since 2.8GHz Pentium 4s already exist. And will these new AMD processors still use the QuantiSpeed tactic? Or will AMD finally restart the MHz war (or, in our current times, the GHz war!)?

I think that, commercially speaking, they should join the MHz war... the masses like big numbers, and the bigger the number the better (even if they don't know what a GHz is) :rolleyes:

It would be fun to see a comparison between a ClawHammer at 3GHz and a PIV at 3GHz

Strakata · 13 September 2002

And here's one to ponder:

I made a call to my friend working for AMD and got the following info regarding Hammer delay:

(1) Hammer's layout needs a redesign. The old design cannot reach competitive speed at introduction. Need to add one extra layer of metal, just like Tbred-B. A new layout revision takes time.

(2) SOI process is not as stable as expected. The yield is still poor and volume production is still a very challenging task. AMD's process people are improving the SOI process but this is not easy. Even IBM still has problems with SOI but IBM can afford to have a low yield with high-end Power4 unlike AMD where cost is a major concern. The nice thing is AMD already has a non-SOI backup plan in case SOI doesn't deliver on the promise.

(3) Integrated memory controller has proven to be a two-edged sword. It does improve performance by 10-20% compared with the same frequency CPU without independent memory controller. However, the integrated memory controller makes the CPU less scalable (in terms of frequency). Also the current design has some stability issues when performing at high frequency, but a new revision is in progress and hopefully can fix these issues.

(4) realistically Hammer will not become widely available until early or mid Q2. If non-SOI version is required then it will be further delayed until late Q2.

The truth is that AMD already showed both the Claw and the SledgeHammer running several weeks ago. But it's also true that they were running at sub-GHz speeds.

That quote is worth what it's worth. It remains to be seen how much.

Raptor · 13 September 2002

Things are chronically bad for everyone... still, it seems it's not so bad for Intel after all... bring on Barton to liven up the market and give AMD the "encouragement" (€€€€€€) to develop the Hammers at its leisure!
