ARM processors for servers

128 cores with 16 MB of L3: this must be one of the most unbalanced processors ever to hit the market.
If they built it, there must be customers who need it, but on many workloads performance will suffer quite a bit.
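To put the "unbalanced" point in numbers, here is a quick sketch of the per-core L3 arithmetic (the EPYC 7763 figures are added here for contrast; they are not from the post):

```python
# Per-core share of the shared L3 on the Yitian 710 (16 MB across 128 cores)
yitian_l3_kb = 16 * 1024 / 128
print(f"Yitian 710: {yitian_l3_kb:.0f} KB of L3 per core")   # 128 KB/core

# For contrast, a 64-core AMD EPYC 7763 carries 256 MB of L3:
epyc_l3_kb = 256 * 1024 / 64
print(f"EPYC 7763:  {epyc_l3_kb:.0f} KB of L3 per core")     # 4096 KB/core
```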


Anyway.......



Alibaba Yitian 710:
  • 128 ARM v9 cores
  • 60 billion transistors
  • 3.2 GHz
  • 8 DDR5 channels
  • 96 PCIe lanes
  • TSMC 5 nm
  • SPECint2017 score of 440 (equal to two Xeon Platinum 8362, 64 cores total at 2.80 GHz)
  • "the Yitian 710 is 20% faster and 50% more energy efficient than the current state-of-the-art ARM server CPUs" (whatever that is...)
https://www.tomshardware.com/news/*****-unveils-128-core-server-cpu
https://www.alibabacloud.com/press-...er-chips-to-optimize-cloud-computing-services
 
5 nm? Looks like HiSilicon's replacement has been found. Let's see how long it lasts :berlusca:

But as I understand it, this is in-house R&D for certain of Ali's own (VM) servers... just like in Amazon's case.
 
5 nm? Looks like HiSilicon's replacement has been found. Let's see how long it lasts :berlusca:
I don't think Alibaba has sanctions imposed on it. More curious is that they apparently have been deploying servers with these processors since July. In other words, they got access to TSMC's 5 nm very early.
The Yitian 710 server chip was taped out earlier this year and has been deployed in their cloud since July 2021.
Another curiosity: the die size is quite large.
Fabricated on a 5-nanometer leading-edge process, the Yitian 710 packs 60 billion transistors on a massive 628 mm² die.
https://fuse.wikichip.org/news/6413/*****-open-source-xuantie-risc-v-cores-introduces-in-house-armv9-server-chip/
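From the quoted figures alone, the average transistor density works out as follows (just the simple division; a large share of the die will be SRAM and IO, so this is an average, not a peak logic density):

```python
transistors = 60e9    # 60 billion transistors (quoted)
die_area_mm2 = 628    # die size in mm² (quoted)

# Average density in millions of transistors per mm²
density = transistors / die_area_mm2 / 1e6
print(f"~{density:.1f} MTr/mm^2")  # ≈ 95.5 MTr/mm²
```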
But as I understand it, this is in-house R&D for certain of Ali's own (VM) servers... just like in Amazon's case.
Yes, it's only meant to be used in-house.
 
That's exactly what I was finding strange. T-Head (Ali's semiconductor division) seems to have clearly taken the place of HiSilicon (Huawei), and its scale is surely much bigger too, all this for a "company" that is three years old.
Founded on September 19, 2018, T-Head Semiconductor Co., Ltd. is a wholly-owned semiconductor chip business entity of ***** Group. T-Head possesses the terminal-cloud integrated full stack product series such as AI chip, CPU Processor IP, etc., covering end-to-end chip design process.
https://www.t-head.cn/about

They're already on the 2nd generation of the Xuantie (RISC-V) cores, and they also already have their own AI chip, the Hanguang 800, at 12 nm; it shouldn't take long before they present a 900, it seems to me.
 

AVA Developer Platform offers 32 64-bit Arm cores, 32GB RAM, 10GbE for $5,450​


The AVA Developer Platform was announced together with the ADLink COM-HPC Ampere Altra server module for embedded applications, with up to 80 64-bit Arm cores, up to 768GB DDR4, 4x 10GbE, and 64x PCIe Gen4 lanes.
The AVA Developer Platform is not fitted with the top-end COM-HPC module, but still, with a 32-core COM-HPC Ampere Altra module fitted with 32 GB DDR4 memory, plus a 128 GB NVMe M.2 SSD, and an Intel Quad X710 10GbE LAN card, it still makes an impressive workstation for native Arm development. We did not know the price the last time, but now we do as the workstation is available for pre-order for $5,450.
https://www.cnx-software.com/2021/10/26/ava-developer-platform-32-64-bit-arm-cores-32gb-ram-10gbe/
 
One of the Chinese exascale systems, with a Phytium ARM-ISA chip


Three Chinese Exascale Systems Detailed at SC21: Two Operational and One Delayed​

Tianhe-3​

Tianhe-3 is based on a Phytium 2000+ FTP Arm chip plus a Matrix 2000+ MTP accelerator. The system was completed at the end of last month, according to ATIP research. It offers an estimated 1.7 exaflops peak performance and just over 1.3 exaflops on Linpack. “We were just told the day before yesterday that the first Linpack run gave them these numbers, 1.3 exaflops (HPL).”
https://www.hpcwire.com/2021/11/24/...iled-at-sc21-two-operational-and-one-delayed/
 

GIGABYTE Announces Arm Servers for Real-time Insights for Cloud and Edge​


today refreshed the GIGABYTE server portfolio with eight new Arm based servers for the Ampere Altra processor: G242-P33, G242-P34, R152-P31, R152-P32, R272-P31, R272-P32, R272-P33, and E252-P31
All the new servers support a single socket Ampere Altra processor with sixteen DIMMs, two M.2 (Gen4) slots, dual 1GbE LAN, a dedicated management port, and all PCIe Gen4 expansion slots.

[image: GIGABYTE product chart]

https://www.hpcwire.com/off-the-wir...rs-for-real-time-insights-for-cloud-and-edge/
 
@dblaster7 in practice, yes.
In theory, this can run anything whatsoever. Microsoft has internal Windows Server builds for ARM, and Azure had (still has?) ARM platforms.
Microsoft even showed servers, with the late Centriq 2400 and ThunderX2, that it used internally in Azure.
 
Announced during re:Invent, but only as a preview and without much detail

Amazon AWS Launches Graviton3 Arm Processors​


At the AWS re:Invent 2021 keynote, the new C7g instances were announced. Just to be clear, these are preview instances, not GA instances.

On the machine learning side, we get up to 3x better performance and support for bfloat16. We saw bfloat16 support on 2020’s 3rd Generation Intel Xeon Scalable Cooper Lake chips that Facebook uses. We expect this to be a popular feature in CPUs in 2022 when we expect Graviton to GA, albeit likely before Sapphire Rapids and Milan.
Amazon says that the new instances will support DDR5 as we would expect from a 2022 chip. The company is also using terminology around stack pointers and security that makes it sound like it is using an Arm Neoverse N2 core. We also get up to 30Gbps of network bandwidth on the new instances.
https://www.servethehome.com/amazon-aws-launches-graviton3-arm/
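As a side note on the bfloat16 mention above: bfloat16 is simply an IEEE-754 float32 with the mantissa cut down to 7 bits (the 8-bit exponent is kept), so a minimal software emulation is just a 16-bit truncation. This is a generic illustration of the format, not anything specific to Graviton3:

```python
import struct

def to_bfloat16_bits(x: float) -> int:
    """Truncate a float32 to bfloat16: keep sign + 8-bit exponent +
    top 7 mantissa bits (plain truncation, no rounding)."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16

def from_bfloat16_bits(b: int) -> float:
    """Expand bfloat16 back to float32 by zero-filling the low 16 bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x

v = 3.14159
bf = from_bfloat16_bits(to_bfloat16_bits(v))
print(v, "->", bf)  # precision drops to roughly 2-3 significant decimal digits
```

The appeal for ML workloads is that bfloat16 keeps the full float32 exponent range, so values rarely overflow or underflow when you halve the storage.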
 
I don't know where this guy got this from, because I haven't seen it anywhere else...


Amazon Graviton 3 Uses Chiplets & Advanced Packaging To Commoditize High Performance CPUs | The First PCIe 5.0 And DDR5 Server CPU​



While Amazon didn’t explicitly state the core type, SemiAnalysis can confirm that Amazon is using Arm’s Neoverse V1 core.

V1 had previously only racked up wins at the European, South Korean, and Indian domestic HPC efforts, so Amazon’s core choice here is quite interesting. Compared to N1 and N2, V1 is much wider. It offers double the FP execution units, but this comes at the cost of higher area.

They are stuffing 3 CPUs into an air-cooled server unit. Intel and AMD are approaching 350W-400W for the next generation CPUs; Amazon is targeting 1/3 to 1/4 of this number.

Networking costs as a percentage of server costs are ballooning as we move to the 400G and 800G era. Running individual networking cards per CPU is cost prohibitive. Merchant silicon is usually run at 1 CPU and very occasionally, 2 CPUs per NIC. The ratio with Graviton3 is flipped to 3 CPU slaves per NIC.

https://semianalysis.substack.com/p/amazon-graviton-3-uses-chiplets-and

On top of that, a custom SSD and NIC

Nitro extends from the custom hypervisor, a security chip, and the powerful Nitro networking cards. Amazon raced ahead of all SmartNIC and DPU efforts from merchant silicon providers and designed / implemented their own custom hardware stack. These NIC’s provide a huge security and operational efficiency advantage by allowing the separation of hypervisor and application layer.

Rather than having to dedicate CPU cores on each physical CPU to run the AWS management stack, Amazon can offload it onto their custom networking card. This frees up more cores to be rented directly to consumers per physical server. Amazon was able to extract this as an operational advantage over other cloud service providers and keep margin away from the likes of Intel.

Amazon receives a huge benefit in performance variation and cost by moving to a custom SSD controller. Cost is obvious; they now purchase raw NAND and get it packaged together with their controllers. AWS maintains control over their own supply chain and does not succumb to the highly variable controller ecosystem. SSD OEM margins are now in-house. AWS can also standardize controllers and performance characteristics across their datacenters.



EDIT: I think I've found the source


C7g (Graviton 3), C6g (Graviton 2); the rest are Intel



 
There are several interesting points in there, but before getting to them: nextplatform says it should be based on the N2, not the V1.
Rather than try to make the Graviton3 chip bigger with more cores or faster with more clock speed, what AWS did instead was make the cores themselves a lot wider, and to be very precise, it looks like AWS moved from the “Ares” Neoverse N1 core from Arm Holdings, used in the Graviton2, to the “Perseus” Neoverse N2 core with Graviton3.

There is some talk that it is using the “Zeus” V1 core, which has two 256-bit SVE vectors, but the diagrams we have seen only show a total of 256-bits of SVE, and the N2 core has a pair of 128-bit SVE units, so it looks to us like it is the N2 core. We are looking for confirmation from AWS right now. The V1 core was aimed more at HPC and AI workloads than traditional, general purpose compute work.
https://www.nextplatform.com/2021/12/02/aws-goes-wide-and-deep-with-graviton3-server-chip/

Now the interesting parts:
They separate the "core" die from the "IO" part, but what could be the reason for splitting the "IO" into 6 chips while keeping the "core" die monolithic?
It seems to me the intention must be to offer other SKUs/products with different numbers of PCIe lanes and memory channels.
3 servers in 1U plus the Nitro controller really makes you think it shouldn't go much above 100 W of TDP, and if that's the case, with 64 cores at 2.6 GHz, I have serious doubts that the core is the V1.
That "Nitro card" has looked very interesting to me ever since it appeared. The impression I get is that, besides the management part, it's also a sort of "hypervisor" implemented in hardware.
 
Well, all the articles do seem to point to the N2, but the N2-versus-V1 question comes from ARM's own presentation: in it, the V1 tops out at 96 cores and the N2 at 128.
https://www.anandtech.com/show/16073/arm-announces-neoverse-v1-n2
https://www.servethehome.com/arm-neoverse-v1-and-n2-roadmap-update/


Arm Neoverse V1 Platform: Unleashing a new performance tier for Arm-based computing​

The Neoverse V1 platform itself is also extremely flexible enabling multi-chiplet and multi-socket solutions with best-in class DDR5/HBM3 memory, PCIe5 IO and CXL2.0-attached memory or coherent accelerators.
[image: Arm Neoverse V1 platform diagram]

https://community.arm.com/arm-commu...se-v1-platform-a-new-performance-tier-for-arm

Looking at the left side of the image, this seems to line up exactly with that Graviton 3 image above



As for the remaining questions, we'll have to wait and see when AWS releases the information.
 
Indeed, almost everything matches the V1 diagram.

So I find the TDP "estimates" versus core count and clock strange. Nextplatform has a table with the data and estimates.
[image: nextplatform data/estimates table]


Using chiplets, if it has a 100 W TDP, that works out to 1.56 W per core. And this on a "big core".
For comparison, the AMD EPYC 7713 (225 W TDP, 2.0/3.6 GHz, 64 cores) averages 3.52 W.
OK, Amazon doesn't have SMT, probably doesn't have the same number of PCIe lanes, probably doesn't have caches as large, and it's made at 5 nm, but even so, it's well under half.
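The watts-per-core comparison above, spelled out (the 100 W Graviton3 TDP is the article's estimate, not an official number; the EPYC 7713 TDP is the official spec):

```python
graviton3_w_per_core = 100 / 64   # estimated 100 W TDP, 64 cores
epyc_7713_w_per_core = 225 / 64   # 225 W TDP, 64 cores

print(f"Graviton3 (estimate): {graviton3_w_per_core:.2f} W/core")      # 1.56
print(f"EPYC 7713:            {epyc_7713_w_per_core:.2f} W/core")      # 3.52
print(f"ratio: {epyc_7713_w_per_core / graviton3_w_per_core:.2f}x")    # 2.25x
```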



By the way, servethehome has an article on "Nitro".

Before:
[images from the servethehome article]

After:
[images from the servethehome article]


https://www.servethehome.com/aws-nitro-the-big-cloud-dpu-deployment-detailed/
 
The presentation is in Russian, but here it is. Baikal-S2:
[presentation slide]

128 N2 cores, 8 DDR5 channels, 192 PCIe Gen5 lanes, 6 nm.


Results for the Baikal-S, with 48 A75 cores:
[benchmark slides]



At the same time, and interestingly, they also announced that they will build RISC-V chips in partnership with Esperanto.
 
Nextplatform has gone back to the Graviton3; they have now corrected the core from N2 to V1, but curiously they keep the calculations made initially.

We looked at the SIMD units and originally thought the Graviton3 was based on the N2 core, not the V1 core. AWS has not confirmed what core is being used, but after re-reading our own coverage of the N2 and V1 cores from last April, where we talked about the pipelines on the two cores, it is clear that it is a modified V1 not a beefed up N2.

Original article:
[original table image]

Current article:
[updated table image]

https://www.nextplatform.com/2022/01/04/inside-amazons-graviton3-arm-server-processor/
 