ARM processors for servers


The Indian SoC is based on the V1 (Zeus) cores; the 96-core part is dual socket.

[Slides from the "Aum HPC Processor" presentation]


It also includes a sort of comparison/preview against the current "super CPUs":

[Comparison slides from the same presentation]

https://amritmahotsav.negd.in/presentation/day5/Aum HPC Processor.pdf


There is another presentation on the topic of HPC in India that includes a roadmap:

[Roadmap slide from the "Exa-scale Vision of India" presentation]

https://amritmahotsav.negd.in/presentation/day5/Exa-scale Vision of India.pdf

which indicates that they are also developing their own accelerator, embedded CPUs based on RISC-V cores, and a partnership with Intel IFS in addition to TSMC.

Apparently this AUM will have a successor, AUM II, with the inevitable "AI" reference (based on the announced V2?), but for around 2028 the indication is that the SoC will, apparently, be RISC-V based 🤔
 

AmpereOne with 192 Cores 128x PCIe Gen5 Lanes and DDR5 in 2023


Today Ampere discussed its new server processor roadmap, including the new AmpereOne processors in its annual update.
The AmpereOne no longer uses stock Arm Neoverse cores. Instead, it is a custom core designed by Ampere. It still uses the Arm ISA, but it is Ampere’s core. It also has a host of new features like DDR5 and PCIe Gen5.
[Slide: AmpereOne announcement overview]

Ampere is increasing the core count. AmpereOne will cover 136-192 cores, leaving Altra/Altra Max to cover core counts of 128 and below. The cache is doubled. PCIe Gen5 doubles the bandwidth. We asked: Ampere still has 128 total lanes in dual-socket configurations, but the focus now is really on single socket, since that is how its chips are being deployed. That being said, we saw a dual-socket design at OCP Summit 2022.
Ampere also has a higher TDP in this generation and adds new features like Bfloat16, confidential computing, nested virtualization, and more.
[Slide: Altra to AmpereOne product lineup]

Here is a bit more on the cores. The L1 cache has changed, and each core now gets 2MB of private L2 cache. That should help performance significantly. Ampere's architecture is aligned to its go-to-market: the company is focused on cloud instances, where each core having its own private cache, and not sharing resources, is important.
[Slide: AmpereOne cloud-native core]

Ampere’s design also has a sea of cores at the center, then memory and I/O in chiplets on the edge. One can think of Ampere’s design as almost like a reverse EPYC Rome, Milan, Genoa.
Ampere’s list of partners is growing. Notably absent are Dell and Lenovo. We have reviewed systems from companies like Wiwynn, GPU systems from Gigabyte, and even edge GPU systems from Supermicro.
[Slide: AmpereOne partner ecosystem]

https://www.servethehome.com/ampereone-with-192-cores-128x-pcie-gen5-lanes-and-ddr5-in-2023-arm/



Ampere Gets Out In Front Of X86 With 192-Core "Siryn" AmpereOne


This is precisely the idea that Ampere Computing was founded upon way back in early 2018. After five long years of development, the company is bringing a super-dense chip to market based on its own Arm core design, which we have dubbed A1; 192 of these cores are being plunked down into a single chiplet inside the now-shipping "Siryn" AmpereOne processor that we and the market have been anticipating for years.
[Ampere Computing CPU roadmap (2022, augmented)]

As expected, the Siryn compute engine is employing a chiplet design, just like the Graviton3 from AWS, and just like Graviton3, the Siryn design puts all of the cores on one die and puts memory and I/O controllers on separate dies that wrap around it. The core complex of the Siryn chips is etched using 5 nanometer processes from Taiwan Semiconductor Manufacturing Co, which is a mature enough process to get reasonable yield all the way up to 192 cores. That said, the Siryn compute engine looks to have a sweet spot at 160 cores, which is the level at which Ampere Computing is running a bunch of its early comparative benchmark tests.

The number 192 is significant because it sits exactly halfway between 128 and 256. Jeff Wittich, chief product officer at Ampere Computing, proudly showed off the AmpereOne complex below, but would not tell us the A1 core grid array dimensions.
[Photo: Jeff Wittich showing the AmpereOne package]

The SKU stack and pricing are not yet being made available to the public.

Neither is the exact configuration of those memory and I/O dies, but we do know that the I/O dies are separate from the memory dies, so these can scale independently from each other and from the cores on the compute chiplet. The initial AmpereOne chip has eight DDR5 memory channels, which can support two DIMMs per channel running at 4.8 GHz.
The A1 core in the Siryn design uses a mix of Armv8 and Armv9 instructions, which makes sense given the timing of the chip. The Siryn core is single-threaded, keeping to Ampere Computing's philosophy of deterministic performance and absolute isolation for its cores, forswearing simultaneous multithreading to achieve these goals.

Each A1 core has 64 KB of L1 data cache and a fairly skinny 16 KB of L1 instruction cache. Each A1 core has a private 2 MB L2 cache, which is twice as fat as the L2 caches on the Neoverse N1 and N1+ cores used in the prior Altra and Altra Max CPUs. That is 384 MB of L2 cache, which is pretty beefy. The Siryn complex has an additional 64 MB system level cache, which is not really an L3 cache proper front ending the cores but more of a backend cache that hangs off the memory, as Wittich put it. (That presumably means that this memory is spread across the memory controllers, as IBM has done in certain Power and System z designs.)
The A1 core has a pair of 128-bit vector units like the N1 and N1+ cores in the Altra and Altra Max did, supporting the same FP64, FP32, FP16, and INT8 data formats and processing precisions. But the pair of homegrown 128-bit vector units in the A1 core also supports BF16, the "Bfloat16" variant of 16-bit half-precision processing created at Google and commonly added to many CPUs, GPUs, and NNPs for AI training and inference.
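
As a back-of-envelope check on those figures, here is a minimal Python sketch of the aggregate L2 and the peak vector throughput per data type. The 3 GHz clock and the one-FMA-per-unit-per-cycle assumption are illustrative guesses, not disclosed numbers:

[code]
# Rough totals for a 192-core AmpereOne from the figures above.
# Clock speed and FMA-per-cycle are assumptions, purely for illustration.
CORES     = 192
L2_MB     = 2          # private L2 per core, MB
VEC_UNITS = 2          # 128-bit vector units per core
VEC_BITS  = 128
CLOCK_GHZ = 3.0        # assumed

print(f"Aggregate private L2: {CORES * L2_MB} MB")   # 384 MB, as stated

for fmt, bits in [("FP64", 64), ("FP32", 32), ("FP16/BF16", 16), ("INT8", 8)]:
    lanes = VEC_BITS // bits
    # assume one fused multiply-add (2 ops) per vector unit per cycle
    tops = CORES * VEC_UNITS * lanes * 2 * CLOCK_GHZ / 1000
    print(f"{fmt:>10}: {lanes:2d} lanes/unit -> ~{tops:.1f} peak Top/s")
[/code]
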
Here are some other new features in the Siryn core:

  • Memory and SLC QoS Enforcement: This is important as the grid of cores grows in size.
  • Nested Virtualization: This is a requirement for Internet service providers, who often want to run their cloud services as an overlay on top of one of the big clouds, as well as for enhancing security by putting a hypervisor inside of a VM, as Google famously does with its Google Cloud.
  • Fine Grained Power Management: We need more and more of this to keep the wattages low.
  • Advanced Droop Detection: Spots transient dips in the supply voltage so the chip can react before they cause errors.
  • Process Aging Monitors: This one is neat, and is an engineering specialty of Wittich. All silicon ages over time: the minimum voltage of the chip (Vmin) goes up, and the maximum frequency (Fmax) the chip can run at has to come down. But there is a way to give the transistors a little juice goose – like taking testosterone – so the Fmax stays up.
  • Secure Virtualization: Mechanisms for providing isolation for virtual machines in a multitenant environment.
  • Single-Key Memory Encryption: Important for machines located in places, like out at the edge, where enterprises don’t necessarily have absolute physical control of a server.
  • Memory Tagging: This one has been a big ask from customers for a long time, and something that IBM's Power and System z processors have had for a very long time. Memory tagging is like role-based access to memory locations for applications, so you can't just blast into main memory with a buffer overflow attack. X86 chips do not have tagged memory as yet, but if the hyperscalers and cloud builders are asking Ampere Computing to add it, then they are asking Intel and AMD to add it. (A toy sketch of the mechanism follows after the link below.)
https://www.nextplatform.com/2023/0...n-front-of-x86-with-192-core-siryn-ampereone/
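
Since memory tagging may be unfamiliar, here is a toy Python model of the idea, in the spirit of Arm's MTE rather than Ampere's actual (undisclosed) implementation: pointers carry a small tag in their unused top bits, each granule of memory carries a matching tag, and any access where the two disagree is trapped:

[code]
# Toy model of memory tagging -- an illustration of the concept only,
# not Ampere's actual design.
import random

GRANULE = 16
granule_tags = {}                      # granule index -> 4-bit tag

def tagged_alloc(base, size):
    tag = random.randrange(1, 16)      # random non-zero tag
    for g in range(base // GRANULE, (base + size + GRANULE - 1) // GRANULE):
        granule_tags[g] = tag
    return (tag << 60) | base          # tag rides in the pointer's top bits

def access(ptr, offset):
    tag, addr = ptr >> 60, (ptr & (1 << 60) - 1) + offset
    if granule_tags.get(addr // GRANULE) != tag:
        raise MemoryError(f"tag check failed at {addr:#x}")

buf = tagged_alloc(0x1000, 32)
access(buf, 31)                        # in bounds: passes
try:
    access(buf, 64)                    # linear overflow: tags mismatch
except MemoryError as e:
    print("trapped:", e)
[/code]
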
 
It is here where the comparisons get really unfair. In the Stable Diffusion test, AmpereOne is configured with 160 cores, 512GB of DDR5 and Linux kernel 6.1.10, while AMD’s Genoa is hamstrung with half the memory at 256GB and only populating 8 of its 12-channel memory capability, on an older 5.18.11 kernel. Furthermore, they only spawned 96 threads for Genoa, while they spawned 160 threads for AmpereOne. Genoa generally requires all 12 channels of memory and 192 threads (utilizing SMT) to achieve maximum performance.

The disingenuous settings don’t stop there. With the DLRM test, there was an additional difference in that AmpereOne used FP16 data formats while their AMD system was configured to use the FP32 data format. Higher precision data puts even greater memory pressure on these memory-bound AI workloads and hurts performance further.

This is really inexcusable as both processors support the reduced precision BF16 format, which could have been used.
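
To put rough numbers on both handicaps, a quick sketch, assuming DDR5-4800 on both machines and a hypothetical 100M-parameter model for the data-format point:

[code]
# Bandwidth handicap: populating only 8 of Genoa's 12 channels
# (DDR5-4800, 8 bytes per channel per transfer).
def bw_gbs(channels, mts=4800):
    return channels * mts * 8 / 1000

print(f"Genoa as tested (8ch) : {bw_gbs(8):.1f} GB/s")    # 307.2
print(f"Genoa fully populated : {bw_gbs(12):.1f} GB/s")   # 460.8, i.e. +50%

# Data-format handicap: FP32 moves twice the bytes of FP16/BF16 for the
# same tensors. The model size is hypothetical, just to show the ratio.
params = 100e6
print(f"FP32 weights: {params * 4 / 1e9:.1f} GB per pass")   # 0.4 GB
print(f"FP16 weights: {params * 2 / 1e9:.1f} GB per pass")   # 0.2 GB
[/code]
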
Curiously, AmpereOne is shown to pull far more power in AI workloads than SPEC integer rate. While 192-cores in SPEC pulled 434W system power, 160-cores in Stable Diffusion pulled 534W. The opposite is shown for Genoa, pulling 624W in SPEC but only 484W in Stable Diffusion. This would indicate a performance optimization deficit between these processors. It would also indicate that Genoa could be underutilized due to Ampere’s tricks.
https://www.semianalysis.com/p/sound-the-siryn-ampereone-192-core

This sort of thing is already quite bad in the consumer market; in the enterprise market it is not just bad, it is shooting yourself in the foot.


Reference board:
[Photos: AmpereOne reference board]


And a version with 12 memory channels is being prepared:
[Image: upcoming 12-memory-channel version]
 
The servethehome article, in its final part, also read between the lines when analysing the numbers in the slides and called attention to this, which is why I did not post them either.
 
Benchmarks of the Baikal-S, the Russian processor with 48 ARM A75 cores, 6 memory channels, 80 PCIe Gen4 lanes, 16 nm, 120W TDP.
The A75 is not exactly recent, and the comparison is made against a 20-core Skylake and a 16-core Zen 1.
Considering the cores involved, the manufacturing process, and the TDP, it is not bad.

[Baikal-S benchmark slides]


In a sense, summarizing the test results, the developers of Baikal-S admit that this chip is not a leader in price/performance terms. "However, this product can stand alongside the eminent competitors from the world's leading manufacturers," they add. "And given that the processor comes in a socketed form factor, that a board for a two-socket version is already ready, and that one for a four-socket version will appear in the future, the new product has every chance of attracting interest among Russian customers planning to upgrade their server and storage fleets under today's sanctions restrictions."
Continuing the sanctions topic, company representatives report that, despite difficulties beyond their control with mass production of the current Baikal-S, its developers are working on a new-generation Armv9 chip, the Baikal-S2, which should be released in the second or third quarter of 2025 and deliver five to six times the performance of the Baikal-S.

"It is planned that Baikal-S2 will be manufactured on a 6 nm process, will have 128 Arm Neoverse N2 cores at 3 GHz, and will support up to eight channels of DDR5 memory," the developers say.
https://www-cnews-ru.translate.goog...=ru&_x_tr_tl=en&_x_tr_hl=en-US&_x_tr_pto=wapp
 

Arm Neoverse V2 at Hot Chips 2023

At Hot Chips 2023 (HC35), Arm showed off the Neoverse V2 cores that are known to power NVIDIA Grace CPUs.
The Arm Neoverse V2 is part of the current generation of Neoverse solutions. Arm is working to provide reference cores for the data center and infrastructure markets. Neoverse V2 is more of a high-performance data center CPU core while N2 is more for infrastructure.
[Slides: Arm Neoverse V2 overview (HC35)]


Arm says these V2 changes combine for around a 13% increase in performance per core over V1. The numbers for each of the sections do not equal a 13% increase if added. That is because some changes impact others so the total is smaller than adding each individual area of improvement.
[Slide: Neoverse V2 per-core performance uplift]

Arm says that, shrinking from TSMC 7nm to 5nm, the new cores only use around 17% more power and are roughly the same area, despite doubling the L2 cache. It is interesting that Arm's slide above says V2 is 13% faster, while the slide below shows 16.7% more power.
[Slide: Neoverse V2 power and area]
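
Two quick arithmetic sketches on those claims. First, why per-area gains combine sub-additively when the changes attack overlapping stall time (the fractions below are invented, purely for illustration); second, what 13% more performance for ~17% more power means for per-core perf/W:

[code]
# Hypothetical: 40% of cycles stall on memory; two fixes each hide half of
# those stalls alone, but only 70% together, because they overlap.
stall, fix_a, fix_b, both = 0.40, 0.5, 0.5, 0.7

def gain(hidden_fraction):
    return 1 / (1 - stall * hidden_fraction) - 1

print(f"fix A alone : +{gain(fix_a):.1%}")                  # +25.0%
print(f"fix B alone : +{gain(fix_b):.1%}")                  # +25.0%
print(f"naive sum   : +{gain(fix_a) + gain(fix_b):.1%}")    # +50.0%
print(f"together    : +{gain(both):.1%}")                   # +38.9%

# Perf-per-watt on the quoted V2 figures: 13% faster for ~17% more power.
print(f"per-core perf/W vs V1: {1.13 / 1.17:.3f}x")         # ~0.966x
[/code]

So on these numbers, per-core perf/W actually dips slightly; the efficiency story rests on the process shrink and the extra cores per socket.
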



Here are the assumptions for the performance results:
[Slide: performance test assumptions]


Here is the platform summary.
[Slide: platform summary]

https://www.servethehome.com/arm-neoverse-v2-at-hot-chips-2023/


The N2 is also back, but in this case the "difference" is just how the IP is made available; this N2 CSS lets you license more than just the "cores".



Arm Neoverse CSS Makes Neoverse N2 Cores Drop-in at Hot Chips 2023

At Hot Chips 2023, Arm showed off a new way to implement Neoverse N2 cores. Instead of just licensing N2 core IP, Arm Neoverse Compute Subsystem or Neoverse CSS allows customers to buy larger IP blocks to drop into designs.
[Slides: Neoverse CSS introduction (HC35)]

Neoverse CSS is fully validated RTL, tuned and ready to implement in designs.
[Slide: CSS validated RTL]

The first Neoverse CSS product is the Neoverse CSS N2. This uses Arm’s scale-out Neoverse N2 cores and allows companies to pick clusters of cores and implement them in designs.
CSS N2 scales across 24-, 32-, and 64-core designs per chip. It has interfaces to connect to DDR5, LPDDR5, PCIe/CXL, and other types of IP.
[Slide: CSS N2 configurations]

Here is the block diagram. Arm is using CMN-700 here to tie the different components together. CSS is compliant with Arm standards out of the box, which makes sense.
[Slide: CSS N2 block diagram]

https://www.servethehome.com/arm-neoverse-css-makes-neoverse-n2-cores-drop-in-at-hot-chips-2023/
 
Arm is showing its integer performance. On a pre-briefing call, our Chief Analyst, Patrick, asked about the difference between the two estimated results, as the right chart is labeled "SPECrate" but neither is labeled as base or peak. Arm was not able to confirm this. Our best guess is that the left chart is base and the right chart is peak, but that is just a guess, since Arm was not able to confirm what it was showing.
It is amazing that a CPU company was unable to answer this.
:facepalm:
The Ampere One presentation had already been bad, but Arm itself not knowing what numbers it is presenting is "interesting"...
Regarding "Arm Neoverse CSS Makes Neoverse N2 Cores Drop-in at Hot Chips 2023": basically, they created a "bundle"?


By the way, a photo of the Ampere One in its socket, where the die is exposed (Come back, Athlon, all is forgiven!!!! :D):
[Photo: Ampere One in its socket, die exposed]
 
After Neoverse CSS, Arm Total Design... :n1qshok:


Harnessing the Power of the Ecosystem in the Era of Custom Silicon on Arm

Today we’re taking another step forward by bringing the wider semiconductor industry together to innovate around Arm’s foundational compute subsystems with Arm Total Design, an ecosystem committed to frictionless delivery of custom SoCs based on Neoverse CSS. It unites industry leaders – ASIC design houses, IP vendors, EDA tool providers, foundries, and firmware developers – to accelerate and simplify the development of Neoverse CSS-based systems. Partners within the Arm Total Design ecosystem will benefit from preferential access to Neoverse CSS, enabling them to innovate, drive rapid time to market, and lower the cost and friction of building custom silicon, for everyone.
Together we are delivering:
  • Pre-integrated, validated IP and EDA tools from partners like Cadence, Rambus, and Synopsys to help accelerate silicon design and the integration of things like memory, security, and peripherals
  • Design services from partners including ADTechnology, Alphawave Semi, Broadcom, Capgemini, Faraday, Socionext, and Sondrel, who are ready to support the ecosystem with expertise on Neoverse CSS, other Arm IP and methodology
  • Technology optimized for leading-edge process nodes and advanced packaging techniques from foundry partners, including Intel Foundry Services and TSMC
  • Commercial software and firmware support for Neoverse CSS from leading infrastructure firmware providers like AMI
In addition to unlocking greater accessibility to custom silicon, Neoverse CSS is also evolving to support emerging chiplet technology. By collaborating with Arm Total Design members and the broader ecosystem on AMBA CHI C2C, UCIe and other initiatives, Arm is facilitating industry-wide alignment on the fundamental interfaces and system architectures that will enable innovation around multi-die chiplet SoC designs. One great example is the multi-core CPU chiplet from Socionext, which adopts Neoverse CSS technology and is being developed on TSMC 2nm to target server CPUs, data center AI edge servers, and 5/6G infrastructure.

From hardware and software partners to foundries and leaders in EDA technology, the Arm Total Design ecosystem is bringing expertise from across the semiconductor design and manufacturing industry to accelerate the path to custom, workload-optimized silicon. Working together, we will ensure broad accessibility to performant, efficient solutions that will help meet the insatiable demand of an AI-accelerated future.
https://newsroom.arm.com/news/arm-total-design-ecosystem
 
So, after all...

Microsoft Azure delivers purpose-built cloud infrastructure in the era of AI


we’re introducing our first custom in-house central processing unit series, Azure Cobalt, built on Arm architecture for optimal performance or watt efficiency, powering common cloud workloads for the Microsoft Cloud. From in-house silicon to systems, Microsoft now optimizes and innovates at every layer in the infrastructure stack. Cobalt 100, the first generation in the series, is a 64-bit 128-core chip
https://azure.microsoft.com/en-us/b...-built-cloud-infrastructure-in-the-era-of-ai/


Arm Collaborates with Microsoft on Custom Silicon to Unlock Sustainable, AI-Driven Infrastructure

Microsoft is now announcing Azure Cobalt 100, the first generation in the Azure Cobalt CPU series, designed to tackle the biggest and most complex challenges the infrastructure will face from AI to sustainability. Azure Cobalt 100 fully uses the benefits delivered through the Neoverse CSS platform and a robust software ecosystem developing on Arm, allowing Microsoft more time to focus on adding unique innovation and optimization while saving significant development effort.

Through Neoverse CSS, Arm is making it easier than ever for the industry to innovate. For one of the largest hyperscalers in the world, this enabled a strategic decision to rethink what computing looks like, and to build and deploy their own custom-built silicon, powered by Arm Neoverse and the comprehensive technology platform we deliver.
https://newsroom.arm.com/news/microsoft-custom-silicon-on-arm


Servethehome says it is an N2, which makes sense; what remains to be seen is what, exactly, the "custom" part is.

Microsoft Azure Cobalt 100 128 Core Arm Neoverse N2 CPU Launched

[Image: Microsoft Azure Cobalt 100]

https://www.servethehome.com/microsoft-azure-cobalt-100-128-core-arm-neoverse-n2-cpu-launched/
 

AWS Adopts Arm V2 Cores For Expansive Graviton4 Server CPU

[Photo: Graviton4 announcement]

The Graviton4 package has 96 of the V2 cores on it, which is a 50 percent boost over the Graviton3 and Graviton3E, and the Graviton4 has twelve DDR5 memory controllers compared to the eight on Graviton3, with the speed of the DDR5 memory boosted by 16.7 percent to 5.6 GHz. Do the math across all of that, and the Graviton4 has 537.6 GB/sec of memory bandwidth per socket, which is 75 percent higher than the 307.2 GB/sec offered by the prior Graviton3 and Graviton3E processors.
Now, that could mean that AWS has implemented simultaneous multithreading (SMT) in the V2 cores, providing two threads per core, like X86 processors from Intel and AMD do and that some Arm chips have done in the past.

We don’t think so, and our comparative salient characteristics table below says there are 96 threads and not 192 threads per socket. We think it is 96 threads per socket and that the doubling of the L2 cache per core to 2 MB has had a dramatic effect on the performance of Java and database applications. You can get 3X the vCPUs by adding two-way SMT, but that would not give you 3X the memory. It would still only be 1.5X the memory compared to a Graviton3 chip.
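
The bandwidth arithmetic is easy to verify; a minimal check, assuming the standard 8 bytes per DDR5 channel per transfer:

[code]
# Verifying the Graviton3 -> Graviton4 memory bandwidth math
# (8 bytes per DDR5 channel per transfer).
def bw_gbs(channels, mts):
    return channels * mts * 8 / 1000

g3 = bw_gbs(8, 4800)      # Graviton3: 8 x DDR5-4800
g4 = bw_gbs(12, 5600)     # Graviton4: 12 x DDR5-5600
print(f"Graviton3: {g3:.1f} GB/s")        # 307.2
print(f"Graviton4: {g4:.1f} GB/s")        # 537.6
print(f"uplift   : +{g4 / g3 - 1:.0%}")   # +75%
[/code]
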
[Graviton4 chip shot]

So we think, looking at the chip shot above, that the Graviton4 is a two-chiplet package, with one chiplet rotated 180 degrees from the other. That is probably why the memory controller chiplets on the left and right of the central core complex on the package are offset from each other.

Here is how we think the Graviton4 stacks up to the prior generations of chips:
[Table: Graviton generations compared]

https://www.nextplatform.com/2023/11/28/aws-adopts-arm-v2-cores-for-expansive-graviton4-server-cpu/
 
The nVidia Grace+LPDDR5X, with the same core and the same manufacturing process, with only 72 cores instead of 96 but more cache and I/O, has a 250W TDP, and this Graviton4 has a 130W TDP? I find that figure hard to believe.
What would those 2 chips at the ends be? I/O?
Another point that puzzles me is the 2-socket support. It seems to go against the rest of the market. With the growth in recent years in cores, memory channels, and I/O, there are fewer and fewer cases that need more than 1 socket, not least because of the problems raised by using more than 1 socket.
 
I assume it is a matter of clocks; you even have the example there of the Graviton3 and 3E, where the clock difference from the first (2.6GHz) to the second (3.5GHz) raises the TDP from 100 to 240W.
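
A first-order CMOS power model makes that plausible: dynamic power scales roughly with C·V²·f, and hitting 3.5 GHz generally needs a higher voltage. The voltages below are assumed operating points, for illustration only; just the clocks and TDPs come from the Graviton3/3E example:

[code]
# First-order dynamic power scaling: P ~ C * V^2 * f.
f1, f2 = 2.6, 3.5        # GHz (Graviton3 vs Graviton3E)
v1, v2 = 0.85, 1.05      # V -- assumed operating points, illustrative only

scale = (f2 / f1) * (v2 / v1) ** 2
print(f"predicted dynamic power scale: {scale:.2f}x")    # ~2.05x
print(f"observed TDP scale           : {240 / 100:.2f}x")
# The remaining gap plausibly comes from leakage and the uncore
# running harder at the higher clock.
[/code]
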
 


Arm Neoverse N3 and V3 with CSS Launched

In the announcement, Arm focused on both the Neoverse V3 and Neoverse N3 cores, but it is also expanding its CSS solutions to the V3 line, after CSS launched with the N2 line.
[Slide: Neoverse N3 and V3 roadmap]

Something Arm did not discuss on the pre-briefing in detail is that there are now E3 cores. They just get a mention on this slide.

Arm also gave us the codenames for next-gen CSS V-series (Vega) and N-series (Ranger) platforms, and what will presumably be the Neoverse V4 “Adonis” and N4 “Dionysus” products.
[Slide: Neoverse roadmap with codenames]


Arm is putting a big effort into selling CSS solutions, since it gets to sell more IP. CSS N3 will scale from 8 up to 32 cores, and Arm can get the 32-core version down to a 40W TDP.
[Slide: CSS N3 specs]

Arm says that the new solution is up to 20% more efficient than its N2 core on a performance-per-watt basis. Arm did not provide end notes for this 20% claim.
[Slide: N3 20% perf/W improvement claim]


Arm Neoverse V3 and CSS V3

The Arm Neoverse CSS V3 is really interesting. First, the performance claim is a 50% increase in per-socket performance, but that does not account for power: it is a move from the smaller, efficiency-focused N-series core to the larger V-series core, without a power limit. This claim, however, did not have an end note on how it was measured.
[Slide: Neoverse V3 performance]

The Neoverse CSS V3 is 64 cores per cluster and up to 128 cores per socket and supports modern features like PCIe Gen5, CXL 3.0, and even HBM3. We do not know, for example, if the HBM3 support was used to get the 50% claim above because Arm did not say how that figure was reached.
[Slide: CSS V3 socket configurations]
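
For a sense of why HBM3 could move the needle on that 50% per-socket claim, here are generic bandwidth figures (JEDEC HBM3 at 6.4 Gb/s/pin over a 1024-bit stack, versus 12 channels of DDR5-5600); these are standard numbers, not Arm's configuration:

[code]
# One HBM3 stack vs. a large DDR5 setup, back of the envelope.
hbm3_stack = 6.4 * 1024 / 8          # Gb/s/pin x 1024 pins -> 819.2 GB/s
ddr5_12ch  = 12 * 5600 * 8 / 1000    # channels x MT/s x 8 B -> 537.6 GB/s

print(f"one HBM3 stack : {hbm3_stack:.1f} GB/s")
print(f"12ch DDR5-5600 : {ddr5_12ch:.1f} GB/s")
print(f"ratio          : {hbm3_stack / ddr5_12ch:.2f}x")   # ~1.5x
[/code]
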

One of the big features is that Arm says it can help provide an NVIDIA Grace Hopper-style compute platform for customers that have their own AI accelerator. Arm's goal is to make the CPU compute side easy with CSS V3, with up to 128 cores available against NVIDIA's 72-core Grace hemisphere.
[Slide: Neoverse CSS for AI]


Arm Neoverse V3 and N3 Performance​

Arm says that with core upgrades and software optimizations it can achieve big gains in things like xgboost.
[Slide: xgboost performance components]

In simulation, it can achieve more performance across the board. Here we notice that the gains are often in the 9-16% range going from Neoverse V2 to Neoverse V3, and 9-30% going from Neoverse N2 to Neoverse N3. The outlier is the work Arm put into xgboost, which is the AI data analytics case.
[Slide: V3/N3 performance claims]

Here is the generational comparison Arm is doing versus Intel and AMD.
[Slide: generational comparison vs Intel and AMD]

https://www.servethehome.com/arm-neoverse-n3-and-v3-with-css-launched/



Arm Neoverse Roadmap Brings CPU Designs, But No Big Fat GPU

[Arm Neoverse roadmap, 2024]

Here is what we do know. The CSS N3 package starts with a block of 32 N3 cores and has a pair of DDR5 memory controllers, a pair of I/O controllers, and optional die-to-die interconnects to create compute complexes; we expect two complexes to be glued together to create a socket, yielding 64 cores. Those N3 cores are built to the latest Armv9.2 specification.

The process technology was not announced for the N3 cores or the CSS N3 package, but we believe it will have options for 5 nanometers and 3 nanometers from TSMC and whatever analogs there are at Samsung and Intel.
https://www.nextplatform.com/2024/02/21/arm-neoverse-roadmap-brings-cpu-designs-but-no-big-fat-gpu/
 


In this presentation of these two cores, I don't understand what changed between this generation and the previous one for "Integer performance" and "Video Processing" to only gain around 10%, while AI data analytics, which I assume is AI inference, gains 84% and 196%.
The previous generation already supported bfloat16, INT8, etc.
What am I missing?
 