ARM server processors

Arm-based ***** Cloud T-Head Yitian 710 Crushes SPECrate2017_int_base


We have our first look at a next-generation Armv9 CPU supporting new features like PCIe Gen5 and DDR5. The T-Head Yitian 710 is ***** Cloud's Arm offering, expected to be available in September 2022. It now has an official SPEC CPU2017 integer score listed, and it is a monster result for this 128-core processor.
[Image: Yitian 710 SPECrate2017_int_base result]

Since the ***** T-Head Yitian 710 result was formally submitted, reviewed, and published in the official results list, we can only compare it against other officially published results. As a single-socket solution, the closest Ampere Altra result is the 80-core model in a Gigabyte server, which scores 301.
[Image: Gigabyte Ampere Altra Q80-33 SPECrate2017_int_base result]

That is 3.763/core. *****'s new generation is at 3.984/core, so it would likely land slightly above the range of an official Ampere Altra Max 128-core 1P system score.

As a reference point, an ASUS system with the 64-core AMD EPYC 7773X ("Milan-X") has a published result of 440, or 6.875/core, but with half as many cores and older-generation DDR4 (albeit with a larger L3 cache).
[Image: ASUS AMD EPYC 7773X SPECrate2017_int_base result]
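The per-core figures quoted in this thread are simple divisions of the total score by the core count. A quick sketch to double-check them; note the Yitian 710 total of ~510 is inferred here from the quoted 3.984/core × 128 cores, since the post only cites the per-core number:

```python
# (total SPECrate2017_int_base score, cores) for the systems discussed.
# The Yitian total of 510 is inferred from 3.984/core x 128, not quoted directly.
results = {
    "Ampere Altra Q80 (Gigabyte, 1P)": (301, 80),
    "T-Head Yitian 710 (1P)": (510, 128),
    "AMD EPYC 7773X (ASUS, 1P)": (440, 64),
}

for name, (score, cores) in results.items():
    print(f"{name}: {score / cores:.3f}/core")

# The per-core numbers quoted in the post, within rounding.
assert abs(301 / 80 - 3.763) < 0.001
assert abs(510 / 128 - 3.984) < 0.001
assert abs(440 / 64 - 6.875) < 0.001
```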


Final Words
It is very exciting to see that the ***** Cloud team was able to achieve solid numbers with its next-generation PCIe Gen5 and DDR5 chips. While these new T-Head Yitian 710 chips are hitting performance numbers ~16% higher than Milan-X, AMD Genoa's top-bin SKUs should offer an uplift well beyond 16% even in this benchmark. Also, while one may be quick to say that ***** will be faster than Ampere based on these results, Ampere's next generation, AmpereOne, uses a custom-designed core, so hopefully it will more than bridge the small performance gap between ***** Cloud's next generation and Ampere's 2020 generation.
https://www.servethehome.com/arm-based-*****-cloud-t-head-yitian-710-crushes-specrate2017_int_base/
 
One needs to be "careful" with this Yitian 710 result.
Code:
      L3:     64 MB I+D on chip per die (128 MB per chip)
It suggests the chip uses 2 dies, most likely doing the same thing as the Xeon 9200 and NVIDIA Grace, where in practice you have a dual-socket system inside a single chip. Besides the performance issues this raises, it is quite likely that dual-socket servers (256 cores) will not be possible with this chip. This makes the results less impressive, because the fairer comparison would be against dual-socket Xeon/EPYC systems.
Code:
      NUMA node0 CPU(s):   0-31
      NUMA node1 CPU(s):   32-63
      NUMA node2 CPU(s):   64-95
      NUMA node3 CPU(s):   96-127
Just out of curiosity, it appears (at least it was configured in the BIOS with 4 NUMA nodes) that each die is split into 2 groups of 32 cores. A bit like the 2 CCXs inside a CCD that existed in Zen and Zen 2.
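The `NUMA nodeX CPU(s)` lines above use the standard Linux CPU-list syntax. A small helper (hypothetical, not from the post) to expand those ranges and confirm the 4 × 32 = 128 core split:

```python
def expand_cpu_list(spec):
    """Expand a Linux CPU list like '0-31' or '0-3,8-11' into a list of ints."""
    cpus = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-"))
            cpus.extend(range(lo, hi + 1))
        else:
            cpus.append(int(part))
    return cpus

# The four NUMA nodes from the lscpu output above.
nodes = ["0-31", "32-63", "64-95", "96-127"]
sizes = [len(expand_cpu_list(n)) for n in nodes]
assert sizes == [32, 32, 32, 32]  # 2 groups of 32 cores per die
assert sum(sizes) == 128          # 128 cores total
```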
"Ali’s new generation is 3.984/core so it would likely be slightly above the range of an official Ampere Altra Max 128 core 1P system score."

"Just for some reference point, an ASUS AMD EPYC 7773X (“Milan-X”) CPU with 64 cores has published results of 440 or 6.875/core, but with half as many cores and older generation DDR4 (albeit with a larger L3 cache.)"
Since I doubt it is possible to have more than 1 socket with this CPU, this shows how far this ARM core still is from Zen 3, at least in this benchmark (which may not matter much to cloud companies).
A system with just 1 Zen 4 EPYC socket will almost certainly post a better total result, and the V-Cache version even more so.

Another detail that may or may not favor this Yitian 710: it is not known at what TDP this result was achieved, nor how much that processor costs.
 

The Forbidden Arm Server that is Banned in the US


We have a Huawei TaiShan server to show you. What makes this server even more interesting than its being banned in the US is that the CPUs it uses are Arm server CPUs from Huawei/HiSilicon. Specifically, this is a Huawei/HiSilicon Kunpeng 920 server, one of the first Armv8 server CPU platforms with up to 64 cores and PCIe Gen4 support.
[Image: Huawei TaiShan 200 2280 CPU and memory with shroud and Broadcom SAS]

We get two CPUs with large heatsinks and 16 DIMMs for each CPU. That means 8-channel memory and a total of 32 DIMMs.
[Image: Huawei TaiShan 200 2280 CPU and memory area]

The CPUs themselves are Huawei/HiSilicon Kunpeng 920 models, 48-core 2.6GHz parts. The Kunpeng 920 line scaled to 64 cores, but this is what we could get. As we get the system sorted, we will have more formal performance figures, but our 48-core models are roughly equivalent to 24-core Cascade Lake Xeon models in integer workloads. There is also more memory bandwidth and PCIe available in this platform.

Huawei Kunpeng 920 2x 48c Lscpu Output

[Image: Huawei Kunpeng 920 2x 48c lscpu output]


Here is the lshw output for the CPU, which we found to be a Kunpeng 920-4826, a 48-core 2.6GHz part:

[Image: lshw output screenshot]


[Image: Huawei/HiSilicon Kunpeng 920 next to Ampere Altra Max M128-30]

https://www.servethehome.com/the-forbidden-arm-server-that-is-banned-in-the-us/

 
"As we get the system sorted, we will have more formal performance figures, but our 48 core models are roughly equivalent to Cascade Lake 24 core Xeon models in integer workloads."
Cascade Lake being a 14 nm processor based on Skylake, this is not exactly spectacular. On top of that, FP performance is quite likely even worse.

[Image: Huawei TaiShan 200 2280 Huawei Hydra CPU]

Hydra is the interconnect used between the processors, and the board has 2 external connectors carrying this interconnect. That is not common, and it would be interesting to know what they are used for.

[Image: Huawei TaiShan 200 2280 4x Realtek RTL8211 OCP NIC 3.0 card]

The server has Broadcom SAS controllers, so it is rather strange to see it using gigabit NICs from Realtek. It is really very rare to see Realtek NICs in servers, even the cheapest ones, outside of the BMC management NICs.
The server has 25Gbit NICs, so these gigabit NICs would be considered "secondary". Even so, it is a very strange choice.
 

Qualcomm Is Plotting a Return to Server Market With New Chip

  • Amazon has agreed to take a look at company’s offerings
  • Qualcomm abandoned an earlier foray into market four years ago
Qualcomm Inc. is taking another run at the market for server processors, according to people familiar with its plans, betting it can tap a $28 billion industry and decrease its reliance on smartphones.

The company is seeking customers for a product stemming from last year’s purchase of chip startup Nuvia, according to the people, who asked not to be identified because the discussions are private. Amazon.com Inc.’s AWS business, one of the biggest server chip buyers, has agreed to take a look at Qualcomm’s offerings, they said.
https://www.bloomberg.com/news/arti...tting-a-return-to-server-market-with-new-chip

For several reasons, I think this had already been agreed when Nuvia was bought by Qualcomm, and it is only being announced now.
Those reasons are: Qualcomm gave up on this market when it agreed with shareholders to cut costs, back when the goal was to fend off the Broadcom takeover; and it would make no sense for Nuvia to accept being bought by Qualcomm just to build processors for the consumer market, when those people left Apple and founded Nuvia precisely to build a company making processors for the server market.

Let's see where this goes and whether other players follow (Broadcom, Marvell, AMD, Intel, etc.).
 

Azure Virtual Machines with Ampere Altra Arm–based processors—generally available


The new virtual machines will be generally available on September 1, and customers can now launch them in 10 Azure regions and multiple availability zones around the world. In addition, the Arm-based virtual machines can be included in Kubernetes clusters managed using Azure Kubernetes Service (AKS). This ability has been in preview and will be generally available over the coming weeks in all the regions that offer the new virtual machines.
We have been working with the open-source community and various independent software vendors (ISVs) to make several Linux OS distributions including Canonical Ubuntu, Red Hat Enterprise Linux, SUSE Enterprise Linux, CentOS, and Debian available on the new Arm-based Azure Virtual Machines. We will also add support for Alma Linux and Rocky Linux in the future.
The Azure Arm-based virtual machine families include:
  • Dpsv5 series, with up to 64 vCPUs and 4GiBs of memory per vCPU up to 208 GiBs,
  • Dplsv5 series, with up to 64 vCPUs and 2GiBs of memory per vCPU up to 128 GiBs, and
  • Epsv5 series, with up to 32 vCPUs and 8GiBs of memory per vCPU up to 208 GiBs.
All the new virtual machine sizes support up to 40 Gbps of networking bandwidth; Standard SSDs, Standard HDDs, Premium SSDs, and Ultra Disk Storage can be attached to the virtual machines. Dpdv5, Dpldv5, and Epdv5 virtual machine series also include fast local-SSD storage. Virtual Machine Scale Sets are also supported. Monitor your virtual machines and protect your data with Azure Monitor and Azure Backup.
The Ampere Altra Arm–based Azure virtual machines are now available in the US (West US 2, West Central US, Central US, East US, East US 2), Europe (West Europe, North Europe), Asia (East Asia, Southeast Asia), and Australia (Australia East) Azure regions. We plan to expand Azure regional availability after September 1.
https://azure.microsoft.com/en-us/b...tra-arm-based-processors-generally-available/
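The memory ceilings in the series list above follow a simple pattern: a per-vCPU ratio clamped by a series-level cap. A sketch of that reading (the tuples just restate the figures from the post; the cap interpretation of "up to X GiBs" is my assumption):

```python
# (max vCPUs, GiB per vCPU, series memory cap in GiB) from the post.
series = {
    "Dpsv5":  (64, 4, 208),
    "Dplsv5": (64, 2, 128),
    "Epsv5":  (32, 8, 208),
}

def max_memory(vcpus, gib_per_vcpu, cap):
    """Memory of the largest size: per-vCPU ratio, clamped at the series cap."""
    return min(vcpus * gib_per_vcpu, cap)

for name, (vcpus, ratio, cap) in series.items():
    print(f"{name}: up to {vcpus} vCPUs, {max_memory(vcpus, ratio, cap)} GiB")

assert max_memory(64, 4, 208) == 208   # Dpsv5: 256 GiB by ratio, capped at 208
assert max_memory(64, 2, 128) == 128   # Dplsv5: ratio and cap coincide
assert max_memory(32, 8, 208) == 208   # Epsv5: 256 GiB by ratio, capped at 208
```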
 
So the V1 will not be a "one off" after all. Arm has just updated its roadmap and, surprise... there will be a V2 and a successor, with the V2 debuting in the NVIDIA Grace.

Redefining the global computing infrastructure with next-generation Arm Neoverse platforms

News highlights:
  • Arm marks a new beginning for the world’s computing infrastructure with additions to its Arm Neoverse roadmap including Neoverse V2 (codenamed “Demeter”)
  • Neoverse V2 is the latest Arm core targeted at providing leadership per-thread performance for cloud, hyperscale and HPC workloads
  • Industry leaders across cloud, 5G, HPC and edge choose Arm Neoverse as the compute foundation for their next generation infrastructure solutions
We heard from our hyperscale and HPC customers that they needed to further push cloud workload performance without requiring more power and area. Our response is the Neoverse V2 Platform (“Demeter”) featuring our newest V-series core and the widely deployed Arm CMN-700 mesh interconnect. Neoverse V2 will deliver market-leading integer performance for cloud and HPC workloads and introduces several Armv9 architectural security enhancements.

[Image: 2022 Neoverse roadmap]

Today we already have multiple partners with designs based on Neoverse V2 in progress, one of which is NVIDIA, which is leveraging V2 as the compute foundation for their Grace datacenter CPU. Grace will combine the performance of V2 with the power efficiency of LPDDR5X memory to deliver 2x performance per watt over servers powered by traditional architectures.
https://www.arm.com/company/news/20...e-with-next-generation-arm-neoverse-platforms


Arm Neoverse V2 Cores Launched for NVIDIA Grace and CXL 2.0 PCIe Gen5 CPUs

The Arm Neoverse V2 platform is a new Armv9 core that is designed for higher performance than the N2 cores. Usually the N core launches, then the V core, so we had N1, V1, N2, and now V2.
[Image: Arm Neoverse Q3 2022, Neoverse V2 platform slide 1]

One of the other big changes is the new (up to) 2MB L2 cache. For cloud focused chips, L2 size is important. AMD is focused on L3 cache as we saw with AMD Milan-X. Adding more L2 cache provides more performance, but it also increases the size of a core.

Part of the Neoverse V2 platform is being able to add not just many cores, but also build a larger system. That means adding features like CXL 2.0 support, security features, and large caches/ memory with DDR5 and LPDDR5(X) support.
[Image: Arm Neoverse Q3 2022, Neoverse V2 platform slide 2]

The big launch customer with the Neoverse V2 is NVIDIA Grace. The “Grace CPU Superchip” is the two-chip solution in a single package design with an interconnect between the two 72-core Grace halves. Arm does not have a scale here, but the x86 2023 projections seem a bit conservative from what we have seen thus far. Arm declined to share detailed PPA benefits of V1 versus V2 at this launch.
[Image: Arm Neoverse Q3 2022, NVIDIA Grace on Neoverse V2]

https://www.servethehome.com/arm-ne...-for-nvidia-grace-and-cxl-2-0-pcie-gen5-cpus/


Arm Fills In Some Gaps – And Details – In Server Chip Roadmaps

Here is the high level updated Neoverse roadmap:
[Image: Arm Neoverse roadmap 2022]

Here is a more complete roadmap, showing three generations of V, N, and E core designs:
[Image: Arm Neoverse roadmap 2022, V/N/E core generations]

The Demeter V2 core is paired with DDR5 memory and PCI-Express 5.0 peripheral controllers and will support the CXL 2.0 coherent memory protocol for accelerators, which allows for memory pooling across servers as well. The V2 core definitely supports the Armv9-A architecture previewed in March 2021, which among other things supports the second generation Scalable Vector Extension (SVE2) vector math design, which has a quad of 128-bit vectors lashed together that supports INT8 and BF16 formats as well as the usual single-precision FP32 and double-precision FP64 floating point math.
So that mystery about Grace, and who would be the first to bring an SVE2 engine to market, is solved. It looks like it will be Nvidia, and that is probably no accident. Then again, we could see a preview of a Graviton4 chip from AWS at re:Invent 2022 based on the V2 core. . . . In fact, we expect just this. AWS could get into the field before Nvidia.
[Image: Arm Neoverse roadmap 2022, V2 core]

The V2 core has 64-bit virtual memory addressing but 48-bit physical addressing, which means a complex of these cores in a single socket can have up to 256 TB of physical memory attached to it. That seems like enough for now for traditional CPU use cases.
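The 256 TB figure follows directly from the 48-bit physical address width:

```python
# 48-bit physical addressing => 2^48 addressable bytes.
phys_bits = 48
addressable = 2 ** phys_bits
assert addressable == 256 * 2 ** 40   # 256 TiB (the "256 TB" quoted in the text)
print(addressable // 2 ** 40, "TiB")  # -> 256 TiB
```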
The V2 core will have 64 KB of instruction L1 cache, 64 KB of L1 data cache with error correction (that last bit is new), and the option to boost the L2 cache from the 1 MB of the V1 core design to 2 MB.
[Image: Arm Neoverse roadmap 2022, V2 mesh and I/O]

The CMN mesh that will be used with the V2 cores can span up to 256 cores and therefore up to 512 MB of L2 cache. That is 2X the cores and 4X the L2 memory of the V1 core, which is reasonable given that the CMN mesh has 4 TB/sec of aggregate bandwidth. The V2 platform supports DDR5 and low power DDR5 (LPDDR5) main memory, the latter of which is employed by Nvidia as main memory in the Grace CPU. The V2 platform supports PCI-Express 5.0 peripherals and can run the CXL 2.0 memory pooling protocol, but we will have to wait until PCI-Express 6.0 and the CXL 3.0 protocol to have memory sharing across CPUs linked by PCI-Express switching.
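The mesh totals quoted above are internally consistent: 256 cores at 2 MB of L2 each gives 512 MB, and against the V1-generation mesh (assumed here as 128 cores × 1 MB L2, per the "2X cores" and 1 MB → 2 MB statements) that is 2× the cores and 4× the aggregate L2:

```python
# V1-generation mesh figures are inferred from the article's "2X the cores"
# and the 1 MB -> 2 MB per-core L2 change; V2 figures are quoted directly.
v1_cores, v1_l2_mb = 128, 1
v2_cores, v2_l2_mb = 256, 2

v1_total = v1_cores * v1_l2_mb   # 128 MB aggregate L2
v2_total = v2_cores * v2_l2_mb   # 512 MB aggregate L2
assert v2_total == 512           # "up to 512 MB of L2 cache"
assert v2_cores == 2 * v1_cores  # "2X the cores"
assert v2_total == 4 * v1_total  # "4X the L2 memory"
```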
https://www.nextplatform.com/2022/09/14/arm-fills-in-some-gaps-and-details-in-server-chip-roadmaps/
 

Ampere Shows Next-gen AmpereOne DDR5 PCIe Gen5 Arm Server


At OCP Summit 2022, the company gave some clear indications of its next-generation Arm server CPU's capabilities. It says that the new platform supports DDR5, PCIe Gen5, and 2DPC (DIMMs per channel) operation. Given we will have the AMD Genoa launch this quarter (it is already showing up at OCP) and Intel Sapphire Rapids Xeon (launching Q1, but expect to see some benchmarks next week on STH) with DDR5 and PCIe Gen5 support, this is great to see. The one spec missing from this, aside from core counts, is CXL support.
[Image: Ampere Mt. Mitchell and other OCP contributions 2022]

Inside, we can see dual heatsinks following the trend of using copper heat pipes to move heat to larger fin areas away from the CPU socket itself. Designs like this are usually used for >250W TDPs, so we expect that AmpereOne will have more thermal headroom.
[Image: Ampere Mt. Mitchell for AmpereOne at OCP Summit 2022]

Ampere says AmpereOne is 2DPC, and we can see black and white DDR5 DIMM slots in Mt. Mitchell. Counting 2DPC slot pairs, we can see that there are eight memory channels per CPU in this system. That would equal Intel's Sapphire Rapids, while AMD EPYC Genoa has 50% more. The Genoa comparison is a little more nuanced than that, but we need to save that discussion for our launch coverage since we are under embargo on a detail there.
https://www.servethehome.com/wp-con...mpere-One-at-OCP-Summit-2022-copy-696x459.jpg
 
These new chips will obviously power new AWS EC2 instance types, starting with the logically dubbed Hpc7G. This new instance type will come in a variety of sizes, with up to 64 vCPUs and 128 GiB of memory.
https://techcrunch.com/2022/11/28/aws-launches-graviton3e-its-new-arm-based-chip-for-hpc-workloads/

Huh? A platform for the HPC market with a 128 GB RAM limit? You can get 128 GB of RAM in a desktop with a consumer processor.
Being a first version, it will serve mostly for customers to test the waters, but 128 GB is far too low.

From AMD:
Yes. So it's really people get confused on this point. They think it's an ARM versus x86 versus Risc-V, it's really all about the solution that you put together. The reason we had actually, as some of you recall, we had our road map when you go back eight, nine years ago, had both ARM and x86 and a road map and we defeatured the arm in our CPU road map because the ecosystem still had too far to go.

We could have made that -- we had a design approach that was going to make the custom arm design for AMD equally performant to the x86, but the ecosystem wasn't there. So we kept our focus on x86, and we said, let's watch the space in ARM. ARM is now developing more of a robust ecosystem.
But if someone has reasons that they want ARM, we have our custom group, that S3 group I described earlier, and we're happy to work with them to implement in our base solution. We're not married to an ISA. We're married to getting our customers the absolute best solution and delivering incredible value to them.
Versal is ARM-based, we're not changing that. Pensando, SmartNIC is ARM based, we're not changing that. Those are great examples because those are tailor-made applications that don't need that whole ecosystem. When you use a Xilinx device, when you use a SmartNIC device, you don't need that ecosystem of applications because it's a point application. It's a tailored application.
https://seekingalpha.com/article/45...s-fargo-6th-annual-2022-tmt-summit-transcript
 
The article is behind a paywall, but in the content they state that Google has 2 ARM processors in development at 5 nm, TSMC, for the server market, intended for internal use and for GCP.
  • "Maple". Uses IP from Marvell (ex-ThunderX?) and is already in production testing at TSMC.
  • "Cypress". Developed in-house by a Google team in Israel. Production testing in Q2 of this year.
Mass production in 2024.
The idea is to compete with AWS's ARM processors, lower costs, and depend less on external companies such as Intel and AMD.

https://www.theinformation.com/arti...-google-makes-progress-with-data-center-chips
 
Fujitsu has announced the successor to the A64FX, with few details. It is codenamed "Monaka".
Fujitsu's Arm-based A64FX processor may have driven the most powerful supercomputer in the world, but it looks like its successor will be a more general-purpose chip that will focus on energy efficiency.
In a presentation given by Vivek Mahajan, there was mention of an "Arm-based CPU for Next-gen DC" slated for 2028.

We asked the company for further details on this processor, and it told us the chip is currently code-named MONAKA and expected to be released in 2027.
However, at the same time the chip is forecast to provide "overwhelming energy efficiency" when compared with rival CPUs that will be available within the same timeframe, with Fujitsu indicating that MONAKA should have a 1.7x lead in application performance while offering 2x the performance per watt.
"The next-generation DC CPU (MONAKA) that we're developing will have a wider range of features and will prove more energy efficient," a Fujitsu spokesperson told us.

"The range of potential applications is wider than that of the A64FX, which has special characteristics (e.g., interconnects) specific to Fugaku," the spokesperson added.

These special characteristics of the A64FX include the 28Gbps Tofu-D interconnect, high-speed HBM2 stacked memory, and the 512-bit Scalable Vector Extensions. It is unlikely that these will feature in MONAKA, although Fujitsu has few specific details to share on the architecture at this point.
https://www.theregister.com/2023/03/06/fujitsus_a64fx_successor_will_be/

It is still quite far away.
 
There is also a desktop version. :)



Another quite interesting thing: it officially supports Windows. I think it is the first ARM computer with official Windows support that does not use a Qualcomm processor, and by far the one with the best performance.

 