ARM processors for servers

128 cores with 16 MB of L3: this must be one of the most unbalanced processors ever to hit the market.
If they built it, there must be customers who need it, but on many workloads performance will suffer quite a bit.
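To put the "unbalanced" point in numbers, here is a quick sketch of the per-core L3 arithmetic (the EPYC 7763 figures are added here for contrast; they are not from the post):

```python
# Per-core share of the shared L3 on the Yitian 710 (16 MB across 128 cores)
yitian_l3_kb = 16 * 1024 / 128
print(f"Yitian 710: {yitian_l3_kb:.0f} KB of L3 per core")   # 128 KB/core

# For contrast, a 64-core AMD EPYC 7763 carries 256 MB of L3:
epyc_l3_kb = 256 * 1024 / 64
print(f"EPYC 7763:  {epyc_l3_kb:.0f} KB of L3 per core")     # 4096 KB/core
```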


Anyway.......



Alibaba Yitian 710:
  • 128 ARM v9 cores
  • 60 billion transistors
  • 3.2 GHz
  • 8 DDR5 channels
  • 96 PCIe lanes
  • TSMC 5 nm
  • SPECint2017 score of 440 (equal to two Xeon Platinum 8362, 64 cores total at 2.80 GHz)
  • "the Yitian 710 is 20% faster and 50% more energy efficient than the current state-of-the-art ARM server CPUs" (whatever that is...)
https://www.tomshardware.com/news/*****-unveils-128-core-server-cpu
https://www.alibabacloud.com/press-...er-chips-to-optimize-cloud-computing-services
 
5 nm? Looks like HiSilicon's replacement has been found. Let's see how long it lasts :berlusca:

But as I understand it, this is in-house R&D for certain of Ali's own (VM) servers... just like in Amazon's case.
 
5 nm? Looks like HiSilicon's replacement has been found. Let's see how long it lasts :berlusca:
I don't think Alibaba has sanctions imposed on it. More curious is that they apparently have been deploying servers with these processors since July. In other words, they got access to TSMC's 5 nm very early.
The Yitian 710 server chip was taped out earlier this year and has been deployed in their cloud since July 2021.
Another curiosity: the die size is quite large.
Fabricated on a 5-nanometer leading-edge process, the Yitian 710 packs 60 billion transistors on a massive 628 mm² die.
https://fuse.wikichip.org/news/6413/*****-open-source-xuantie-risc-v-cores-introduces-in-house-armv9-server-chip/
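From the quoted figures alone, the average transistor density works out as follows (just the simple division; a large share of the die will be SRAM and IO, so this is an average, not a peak logic density):

```python
transistors = 60e9    # 60 billion transistors (quoted)
die_area_mm2 = 628    # die size in mm² (quoted)

# Average density in millions of transistors per mm²
density = transistors / die_area_mm2 / 1e6
print(f"~{density:.1f} MTr/mm^2")  # ≈ 95.5 MTr/mm²
```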
But as I understand it, this is in-house R&D for certain of Ali's own (VM) servers... just like in Amazon's case.
Yes, it's only meant to be used in-house.
 
That's exactly what I was finding strange. T-Head (Ali's semiconductor division) seems to have clearly taken the place of HiSilicon (Huawei), and its scale is surely much bigger too, all this for a "company" that is three years old.
Founded on September 19, 2018, T-Head Semiconductor Co., Ltd. is a wholly-owned semiconductor chip business entity of ***** Group. T-Head possesses the terminal-cloud integrated full stack product series such as AI chip, CPU Processor IP, etc., covering end-to-end chip design process.
https://www.t-head.cn/about

They're already on the 2nd generation of the Xuantie (RISC-V) cores, and they also already have their own AI chip, the Hanguang 800, at 12 nm; it shouldn't take long before they present a 900, it seems to me.
 

AVA Developer Platform offers 32 64-bit Arm cores, 32GB RAM, 10GbE for $5,450​


The AVA Developer Platform was announced together with the ADLink COM-HPC Ampere Altra server module for embedded applications, with up to 80 64-bit Arm cores, up to 768GB DDR4, 4x 10GbE, and 64x PCIe Gen4 lanes.
The AVA Developer Platform is not fitted with the top-end COM-HPC module, but still, with a 32-core COM-HPC Ampere Altra module fitted with 32 GB DDR4 memory, plus a 128 GB NVMe M.2 SSD, and an Intel Quad X710 10GbE LAN card, it still makes an impressive workstation for native Arm development. We did not know the price the last time, but now we do as the workstation is available for pre-order for $5,450.
https://www.cnx-software.com/2021/10/26/ava-developer-platform-32-64-bit-arm-cores-32gb-ram-10gbe/
 
One of the Chinese exascale systems, with a Phytium ARM-ISA chip


Three Chinese Exascale Systems Detailed at SC21: Two Operational and One Delayed​

Tianhe-3​

Tianhe-3 is based on a Phytium 2000+ FTP Arm chip plus a Matrix 2000+ MTP accelerator. The system was completed at the end of last month, according to ATIP research. It offers an estimated 1.7 exaflops peak performance and just over 1.3 exaflops on Linpack. “We were just told the day before yesterday that the first Linpack run gave them these numbers, 1.3 exaflops (HPL).”
https://www.hpcwire.com/2021/11/24/...iled-at-sc21-two-operational-and-one-delayed/
 

GIGABYTE Announces Arm Servers for Real-time Insights for Cloud and Edge​


today refreshed the GIGABYTE server portfolio with eight new Arm based servers for the Ampere Altra processor: G242-P33, G242-P34, R152-P31, R152-P32, R272-P31, R272-P32, R272-P33, and E252-P31
All the new servers support a single socket Ampere Altra processor with sixteen DIMMs, two M.2 (Gen4) slots, dual 1GbE LAN, a dedicated management port, and all PCIe Gen4 expansion slots.

[image: GIGABYTE product chart]

https://www.hpcwire.com/off-the-wir...rs-for-real-time-insights-for-cloud-and-edge/
 
@dblaster7 in practice, yes.
In theory, this can run anything whatsoever. Microsoft has internal Windows Server builds for ARM, and Azure had (still has?) ARM platforms.
Microsoft even showed servers, with the late Centriq 2400 and ThunderX2, that it used internally in Azure.
 
Announced during re:Invent, but only as a preview and without much detail

Amazon AWS Launches Graviton3 Arm Processors​


At the AWS re:Invent 2021 keynote, the new C7g instances were announced. Just to be clear, these are preview instances, not GA instances.

On the machine learning side, we get up to 3x better performance and support for bfloat16. We saw bfloat16 support on 2020’s 3rd Generation Intel Xeon Scalable Cooper Lake chips that Facebook uses. We expect this to be a popular feature in CPUs in 2022 when we expect Graviton to GA, albeit likely before Sapphire Rapids and Milan.
Amazon says that the new instances will support DDR5 as we would expect from a 2022 chip. The company is also using terminology around stack pointers and security that makes it sound like it is using an Arm Neoverse N2 core. We also get up to 30Gbps of network bandwidth on the new instances.
https://www.servethehome.com/amazon-aws-launches-graviton3-arm/
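As a side note on the bfloat16 mention above: bfloat16 is simply an IEEE-754 float32 with the mantissa cut down to 7 bits (the 8-bit exponent is kept), so a minimal software emulation is just a 16-bit truncation. This is a generic illustration of the format, not anything specific to Graviton3:

```python
import struct

def to_bfloat16_bits(x: float) -> int:
    """Truncate a float32 to bfloat16: keep sign + 8-bit exponent +
    top 7 mantissa bits (plain truncation, no rounding)."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16

def from_bfloat16_bits(b: int) -> float:
    """Expand bfloat16 back to float32 by zero-filling the low 16 bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x

v = 3.14159
bf = from_bfloat16_bits(to_bfloat16_bits(v))
print(v, "->", bf)  # precision drops to roughly 2-3 significant decimal digits
```

The appeal for ML workloads is that bfloat16 keeps the full float32 exponent range, so values rarely overflow or underflow when you halve the storage.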
 
I don't know where this guy got this from, because I haven't seen it anywhere else...


Amazon Graviton 3 Uses Chiplets & Advanced Packaging To Commoditize High Performance CPUs | The First PCIe 5.0 And DDR5 Server CPU​



While Amazon didn’t explicitly state the core type, SemiAnalysis can confirm that Amazon is using Arm’s Neoverse V1 core.

V1 had previously only racked up wins at the European, South Korean, and Indian domestic HPC efforts, so Amazon’s core choice here is quite interesting. Compared to N1 and N2, V1 is much wider. It offers double the FP execution units, but this comes at the cost of higher area.

They are stuffing 3 CPUs into an air-cooled server unit. Intel and AMD are approaching 350W-400W for the next generation CPUs; Amazon is targeting 1/3 to 1/4 of this number.

Networking costs as a percentage of server costs are ballooning as we move to the 400G and 800G era. Running individual networking cards per CPU is cost prohibitive. Merchant silicon is usually run at 1 CPU and very occasionally, 2 CPUs per NIC. The ratio with Graviton3 is flipped to 3 CPU slaves per NIC.

https://semianalysis.substack.com/p/amazon-graviton-3-uses-chiplets-and

On top of that, a custom SSD and NIC

Nitro extends from the custom hypervisor, a security chip, and the powerful Nitro networking cards. Amazon raced ahead of all SmartNIC and DPU efforts from merchant silicon providers and designed / implemented their own custom hardware stack. These NIC’s provide a huge security and operational efficiency advantage by allowing the separation of hypervisor and application layer.

Rather than having to dedicate CPU cores on each physical CPU to run the AWS management stack, Amazon can offload it onto their custom networking card. This frees up more cores to be rented directly to consumers per physical server. Amazon was able to extract this as an operational advantage over other cloud service providers and keep margin away from the likes of Intel.

Amazon receives a huge benefit in performance variation and cost by moving to a custom SSD controller. Cost is obvious; they now purchase raw NAND and get it packaged together with their controllers. AWS maintains control over their own supply chain and does not succumb to the highly variable controller ecosystem. SSD OEM margins are now in-house. AWS can also standardize controllers and performance characteristics across their datacenters.



EDIT: I think I've found the source


C7g (Graviton 3), C6g (Graviton 2); the rest are Intel



 
There are several interesting points in there, but before getting to them: nextplatform says it should be based on the N2, not the V1.
Rather than try to make the Graviton3 chip bigger with more cores or faster with more clock speed, what AWS did instead was make the cores themselves a lot wider, and to be very precise, it looks like AWS moved from the “Ares” Neoverse N1 core from Arm Holdings, used in the Graviton2, to the “Perseus” Neoverse N2 core with Graviton3.

There is some talk that it is using the “Zeus” V1 core, which has two 256-bit SVE vectors, but the diagrams we have seen only show a total of 256-bits of SVE, and the N2 core has a pair of 128-bit SVE units, so it looks to us like it is the N2 core. We are looking for confirmation from AWS right now. The V1 core was aimed more at HPC and AI workloads than traditional, general purpose compute work.
https://www.nextplatform.com/2021/12/02/aws-goes-wide-and-deep-with-graviton3-server-chip/

Now the interesting parts:
They separate the "core" die from the "IO" part, but what could be the reason for splitting the "IO" into 6 chips while keeping the "core" die monolithic?
It seems to me the intention must be to offer other SKUs/products with different numbers of PCIe lanes and memory channels.
3 servers in 1U plus the Nitro controller really makes you think it shouldn't go much above 100 W of TDP, and if that's the case, with 64 cores at 2.6 GHz, I have serious doubts that the core is the V1.
That "Nitro card" has looked very interesting to me ever since it appeared. The impression I get is that, besides the management part, it's also a sort of "hypervisor" implemented in hardware.
 
Well, all the articles do seem to point to the N2, but the N2-versus-V1 question comes from ARM's own presentation: in it, the V1 tops out at 96 cores and the N2 at 128.
https://www.anandtech.com/show/16073/arm-announces-neoverse-v1-n2
https://www.servethehome.com/arm-neoverse-v1-and-n2-roadmap-update/


Arm Neoverse V1 Platform: Unleashing a new performance tier for Arm-based computing​

The Neoverse V1 platform itself is also extremely flexible enabling multi-chiplet and multi-socket solutions with best-in class DDR5/HBM3 memory, PCIe5 IO and CXL2.0-attached memory or coherent accelerators.
[image: Arm Neoverse V1 platform diagram]

https://community.arm.com/arm-commu...se-v1-platform-a-new-performance-tier-for-arm

Looking at the left side of the image, this seems to line up exactly with that Graviton 3 image above



As for the remaining questions, we'll have to wait and see when AWS releases the information.
 
Indeed, almost everything matches the V1 diagram.

So I find the TDP "estimates" versus core count and clock strange. Nextplatform has a table with the data and estimates.
[image: nextplatform data/estimates table]


Using chiplets, if it has a 100 W TDP, that works out to 1.56 W per core. And this on a "big core".
For comparison, the AMD EPYC 7713 (225 W TDP, 2.0/3.6 GHz, 64 cores) averages 3.52 W.
OK, Amazon doesn't have SMT, probably doesn't have the same number of PCIe lanes, probably doesn't have caches as large, and it's made at 5 nm, but even so, it's well under half.
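The watts-per-core comparison above, spelled out (the 100 W Graviton3 TDP is the article's estimate, not an official number; the EPYC 7713 TDP is the official spec):

```python
graviton3_w_per_core = 100 / 64   # estimated 100 W TDP, 64 cores
epyc_7713_w_per_core = 225 / 64   # 225 W TDP, 64 cores

print(f"Graviton3 (estimate): {graviton3_w_per_core:.2f} W/core")      # 1.56
print(f"EPYC 7713:            {epyc_7713_w_per_core:.2f} W/core")      # 3.52
print(f"ratio: {epyc_7713_w_per_core / graviton3_w_per_core:.2f}x")    # 2.25x
```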



By the way, servethehome has an article on "Nitro".

Before:
[images from the servethehome article]

After:
[images from the servethehome article]


https://www.servethehome.com/aws-nitro-the-big-cloud-dpu-deployment-detailed/
 
The presentation is in Russian, but here it is. Baikal-S2:
[presentation slide]

128 N2 cores, 8 DDR5 channels, 192 PCIe Gen5 lanes, 6 nm.


Results for the Baikal-S, with 48 A75 cores:
[benchmark slides]



At the same time, and interestingly, they also announced that they will build RISC-V chips in partnership with Esperanto.
 
Nextplatform has gone back to the Graviton3; they have now corrected the core from N2 to V1, but curiously they keep the calculations made initially.

We looked at the SIMD units and originally thought the Graviton3 was based on the N2 core, not the V1 core. AWS has not confirmed what core is being used, but after re-reading our own coverage of the N2 and V1 cores from last April, where we talked about the pipelines on the two cores, it is clear that it is a modified V1 not a beefed up N2.

Original article:
[original table image]

Current article:
[updated table image]

https://www.nextplatform.com/2022/01/04/inside-amazons-graviton3-arm-server-processor/
 