AMD CDNA GPU Architecture: Dedicated GPUs for Data Centers

Well, the dies on these MI100s are already not exactly small, and with no manufacturing-process shrink available, and with the benefits of each shrink getting smaller and smaller, multi-die will be the only way left to scale.

But it remains to be seen exactly what kind of solution will be used.
AMD and its various partners (both TSMC and the packaging companies such as ASE, Amkor and SPIL) have their own solutions, and now the ones from Xilinx, which was already using them even before the AMD deal, can be added to the list.

The usual suspect has already published 2 of 3 batches of Xilinx patents, and a good part of them refer precisely to multi-chip/die stacking.

Here is Xilinx's long-awaited list of patents, highlighting the latest developments that may be interesting for future AMD projects. (1/3)
https://twitter.com/Underfox3/status/1330512321970659329

Here is Xilinx's long-awaited list of patents, highlighting the latest developments that may be interesting for future AMD projects. (2/3)
https://twitter.com/Underfox3/status/1332872721085165571
 
Don't tell me AMD is going to start gluing GPUs together, like it does with Zen.

That's practically a given at this point. nVidia itself showed a study a few years ago of a GPU with several chips in the same package. Intel will do it with Xe; Raja has already been showing them off on Twitter.
In AMD's case, the only questions left are when and how.
 
There were already some patents on the subject, but in the meantime these ones have been published.

I'm posting it here because it seems more likely to me that this will show up first in CDNA than in RDNA.

AMD patents GPU chiplet designs, a future of RDNA architecture?

A new patent, submitted to the US Patent Office on December 31, describes AMD's approach to a potential GPU chiplet design. The manufacturer outlines the problems with such a design and explains how they could be avoided in the future.

According to AMD, GPU designs have remained monolithic due to various implementation problems. The GPU programming model is inefficient at working with multiple GPUs (which also describes Crossfire configurations), as it is hard to distribute parallelism across multiple active dies in the system. It is also complex and expensive, design-wise, to synchronize memory contents across multiple GPU chiplets, AMD explains.
[Image: AMD-GPU-Chiplets-FIG1.png]

FIG. 1 is a block diagram illustrating a processing system employing high bandwidth passive crosslinks for coupling GPU chiplets in accordance with some embodiments.

[Image: AMD-GPU-Chiplets-FIG2.png]

FIG. 2 is a block diagram illustrating a sectional view of GPU chiplets and passive crosslinks in accordance with some embodiments.

[Image: AMD-GPU-Chiplets-FIG4.png]

FIG. 4 is a block diagram illustrating a floor plan view of a GPU chiplet in accordance with some embodiments.

[Image: AMD-GPU-Chiplets-FIG5.png]

FIG. 5 is a block diagram illustrating a processing system utilizing a four-chiplet configuration in accordance with some embodiments.

https://videocardz.com/newz/amd-patents-gpu-chiplet-designs-a-future-of-rdna-architecture
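To make the patent's programming-model complaint concrete, here is a minimal sketch (generic HIP host code with a made-up `scale` kernel; nothing here comes from the patent itself) of the explicit per-device bookkeeping today's multi-GPU model pushes onto the programmer. Each die shows up as a separate device with its own memory, so the host has to split the work and shuffle the data itself:

```cpp
#include <hip/hip_runtime.h>
#include <vector>

// Hypothetical kernel: scales one chunk of a vector in place.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    int devices = 0;
    hipGetDeviceCount(&devices);        // every GPU (or die) is a separate device
    if (devices < 1) return 1;

    const int n = 1 << 20;              // assumes n divides evenly, for brevity
    const int chunk = n / devices;      // the *host* code has to split the work
    std::vector<float> host(n, 1.0f);
    std::vector<float*> dev(devices);

    for (int d = 0; d < devices; ++d) {
        hipSetDevice(d);                // explicit device selection
        hipMalloc(&dev[d], chunk * sizeof(float));   // separate memory per device
        hipMemcpy(dev[d], host.data() + d * chunk,
                  chunk * sizeof(float), hipMemcpyHostToDevice);
        hipLaunchKernelGGL(scale, dim3((chunk + 255) / 256), dim3(256), 0, 0,
                           dev[d], chunk, 2.0f);
    }
    for (int d = 0; d < devices; ++d) { // and explicitly gather the results back
        hipSetDevice(d);
        hipMemcpy(host.data() + d * chunk, dev[d],
                  chunk * sizeof(float), hipMemcpyDeviceToHost);
        hipFree(dev[d]);
    }
    return 0;
}
```

Presumably that is the whole point of coupling the chiplets with the high-bandwidth passive crosslinks shown in the figures: the package keeps looking like one GPU to code like this.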
 
Mais uma "ajuda", quer à AMD como à Intel, o DoE (Departamento de Energia dos USA), atribui um fundo para acelerar a transição dos SW para os próximos Supercomputadores (que vão ser 2 da AMD+AMD e 1 Intel+Intel), pelo que todo o software que usava CUDA terá que passar ou para OpenCL ou o mais provável o SYCL.

DOE to Provide $12M for Research on Adapting Scientific Software for Exascale Era and Beyond
Today, the U.S. Department of Energy (DOE) announced plans to provide up to $12 million for research aimed at adapting scientific software to run on the coming generation of increasingly powerful supercomputers.
“The coming generation of supercomputers, as we move through and beyond the era of exascale computing, will bring a huge boost in capabilities for scientific investigation and discovery,” said Dr. Steve Binkley, Acting Director of DOE’s Office of Science. “Taking advantage of these capabilities will require adaptation to radically new computing architectures and programming environments. This research seeks to tackle these challenges in very systematic ways.”
The research will have two main areas of focus: (1) developing innovative approaches to updating scientific applications to adapt to the new parallel-programming environments of the coming generation of systems and (2) developing innovative methods of testing scientific applications to ensure that they function properly as they are adapted to the new systems and new features are added to the software.
https://www.hpcwire.com/off-the-wir...dapting-scientific-software-for-exascale-era/
 
Radeon "GFX90A" Added To LLVM As Next-Gen CDNA With Full-Rate FP64

GFX90A is another iteration of Vega/CDNA with GFX10 being the newer RDNA/RDNA2 graphics processors. The "Arcturus" support was under the GFX908 graphics name for what became the Radeon Instinct MI100 that launched last year.

With this new GFX90A, among the differences is that most FP64 instructions are now full-rate. AMD GPU FP64 arithmetic performance has tended to be half the rate of FP32 arithmetic, but for this next-gen CDNA it looks like full-rate FP64 will be a big-ticket feature.

The GFX90A target also adds a new "thread group split" (TgSplit) feature, additional matrix fused multiply add (MFMA) instructions, support for Data Parallel Primitives (DPP) extension, extended image intrinsics, and other changes from a quick look through the new code.
https://www.phoronix.com/scan.php?page=news_item&px=AMDGPU-LLVM-GFX90A
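For a sense of scale (these numbers are mine, not from the article): the MI100/gfx908 is rated at roughly 23.1 TFLOPS peak FP32 and 11.5 TFLOPS peak FP64, i.e.

FP64 peak ≈ FP32 peak × 1/2 ≈ 23.1 × 0.5 ≈ 11.5 TFLOPS.

With full-rate FP64, the same number of CUs at the same clocks would give FP64 peak ≈ FP32 peak, so double-precision throughput roughly doubles before counting any extra CUs or clock gains in the new chip.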

There should be another one coming out towards the end of the year.
 
I was just coming here to post that: two separate HP documents confirming the MI200, which is probably that "GFX90A" whose support was added to the drivers.

It's rather curious that the last image refers to Intel oneAPI licences/support.

The first one, meanwhile, is from a Cray (HPE) presentation:
https://indico.ph.ed.ac.uk/event/69/contributions/908/attachments/723/886/HPE-GABRIELE-PACIUCCI.pdf

To be fair, the Frontier announcement always mentioned a custom Epyc, although nobody had any idea what that meant:
Frontier will use a custom AMD Epyc processor based on a future generation of AMD’s Zen cores (beyond Rome and Milan).
https://www.hpcwire.com/2019/05/07/cray-amd-exascale-frontier-at-oak-ridge/

From the original press release:
Frontier will incorporate several novel technologies co-designed specifically to deliver a balanced scientific capability for the user community. The system will be composed of more than 100 Cray Shasta cabinets with high density compute blades powered by HPC and AI-optimized AMD EPYC processors and Radeon Instinct GPU accelerators purpose-built for the needs of exascale computing.
https://www.ornl.gov/news/us-depart...er-record-setting-frontier-supercomputer-ornl

In theory, both of them, Epyc and Instinct, could be "custom".

EDIT: as it turns out, in connection with the FOSDEM 2021 open source event, the presentation of the LUMI project (a European supercomputer, AMD + AMD based) also mentions that they will use Instinct, but that the cards will not be MI100s.

[Image: Lumi-00.png]

https://fosdem.org/2021/schedule/ev...slides/4710/getting_started_with_amd_gpus.pdf
 
The timing is a bit suspicious...

Sweden’s KTH Royal Institute of Technology Selects HPE to Build New Supercomputer for Academic, Industrial Research
The HPE Cray EX supercomputer will include HPE Slingshot for purpose-built HPC networking to address demands for higher speed and congestion control for data-intensive workloads. It will also feature next generation AMD EPYC processors and AMD Instinct GPU accelerators to improve efficiency and achieve the performance required to process and harness insights from computationally complex data.

HPE to deliver KTH’s Dardel in two phases

HPE will install the first phase of the supercomputer this summer. It will feature over 65,000 CPU cores and will be ready for research use in July 2021. The second phase of the installation will consist of GPUs, which will be installed later this year and be ready for use in January 2022.
https://www.hpcwire.com/off-the-wir...percomputer-for-academic-industrial-research/
 
AMPD Announces ‘Machine Learning Cloud’ Initiative Built Around AMD Instinct MI100 Accelerators
AMPD Ventures Inc. (“AMPD” or the “Company”) is pleased to announce a ‘Machine Learning Cloud’ initiative designed to cater to the requirements of academic institutions and companies in the artificial intelligence (“AI”), machine learning and deep learning sectors. The platform, featuring AMD Instinct accelerators along with the AMD ROCm open software platform, will initially be hosted at AMPD’s DC1 data centre in Vancouver, British Columbia, but is expected to expand into other territories over the coming months.
https://www.hpcwire.com/off-the-wir...built-around-amd-instinct-mi100-accelerators/
 
From a patch AMD posted on Friday:


Feifei Xu (11):
drm/amdgpu: simplify the sdma 4_x MGCG/MGLS logic.
drm/amdgpu: add sdma 4_x interrupts printing
drm/amdgpu: Add DID for aldebaran
drm/amdgpu:add smu mode1/2 support for aldebaran
drm/amdgpu:return true for mode1_reset_support on aldebaran
drm/amdgpu: correct vram_info for HBM2E
drm/amd/pm:add aldebaran support for getting bootup values
drm/amdgpu: update atom_firmware_info_v3_4 (v2)
drm/amdpgu: add ATOM_DGPU_VRAM_TYPE_HBM2E vram type
drm/amdgpu:disable XGMI TA unload for A+A aldebaran
drm/amdgpu: Use dev_info if VFCT table not valid
https://lists.freedesktop.org/archives/dri-devel/2021-March/300278.html

It looks like Aldebaran will use HBM2E.

HBM2E: The E Stands For Evolutionary

https://semiengineering.com/hbm2e-the-e-stands-for-evolutionary/
 
As I'd said before, these contracts with the American National Labs would come in very handy for a qualitative leap in the software, which was still one of AMD's Achilles' heels.


Argonne, ORNL Award Codeplay Contract to Strengthen SYCL Support for AMD GPUs

Argonne National Laboratory (Argonne) in collaboration with Oak Ridge National Laboratory (ORNL), has awarded Codeplay Software a contract implementing the oneAPI DPC++ compiler, an implementation of the SYCL open standard software, to support AMD GPU-based high-performance compute (HPC) supercomputers.

The Argonne Leadership Computing Facility (ALCF) is deploying an exascale supercomputer, Aurora, based on Intel GPUs with SYCL being one of the primary programming models. The Oak Ridge Leadership Computing Facility (OLCF) is deploying an exascale supercomputer, Frontier, which features AMD GPUs.
https://www.hpcwire.com/off-the-wir...ract-to-strengthen-sycl-support-for-amd-gpus/

Basically: Argonne, with the Aurora system (Intel CPU + GPU), and Oak Ridge, with the Frontier system (AMD CPU + GPU), have both hired the same company, Codeplay (a software house specialised in code optimisation), to implement SYCL, open-source software to replace CUDA.
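For anyone who hasn't looked at SYCL yet: like CUDA it's single-source C++, but vendor-neutral, so the same code can target the Intel GPUs in Aurora and, once Codeplay's work lands, the AMD GPUs in Frontier. A minimal sketch of what ported code tends to look like — plain SYCL 2020 vector addition, nothing specific to Codeplay's DPC++ toolchain:

```cpp
#include <sycl/sycl.hpp>
#include <vector>

int main() {
    const size_t n = 1 << 20;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);

    // The runtime picks whatever device is available: Intel, AMD or NVIDIA GPU, or the CPU.
    sycl::queue q{sycl::default_selector_v};

    {
        sycl::buffer<float> bufA(a.data(), sycl::range<1>(n));
        sycl::buffer<float> bufB(b.data(), sycl::range<1>(n));
        sycl::buffer<float> bufC(c.data(), sycl::range<1>(n));

        // Equivalent of a CUDA kernel launch: one work-item per element.
        q.submit([&](sycl::handler& h) {
            sycl::accessor A(bufA, h, sycl::read_only);
            sycl::accessor B(bufB, h, sycl::read_only);
            sycl::accessor C(bufC, h, sycl::write_only);
            h.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
                C[i] = A[i] + B[i];
            });
        });
    }   // buffers go out of scope here, so results are copied back into c

    return c[0] == 3.0f ? 0 : 1;
}
```

The buffer/accessor model replaces the explicit cudaMemcpy/hipMemcpy calls: the runtime moves the data when a kernel actually needs it.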
 
Hmmm....


From: Naveen Krishna Chatradhi <[email protected]>
To: [email protected], [email protected]
Cc: [email protected], [email protected], [email protected],
[email protected]
Subject: [PATCH 0/7] x86/edac/amd64: Add support for noncpu nodes
Date: Wed, 30 Jun 2021 20:58:21 +0530
This patchset does the following
1. Add support for northbridges on Aldebaran
* x86/amd_nb: Add Aldebaran device to PCI IDs
* x86/amd_nb: Add support for northbridges on Aldebaran
2. Add HBM memory type in EDAC
* EDAC/mc: Add new HBM2 memory type
3. Modifies the amd64_edac module to
a. Handle the UMCs on the noncpu nodes,
* EDAC/mce_amd: extract node id from InstanceHi in IPID
b. Enumerate HBM memory and add address translation
* EDAC/amd64: Enumerate memory on noncpu nodes
c. Address translation on Data Fabric version 3.5.
* EDAC/amd64: Add address translation support for DF3.5
* EDAC/amd64: Add fixed UMC to CS mapping

Aldebaran has 2 Dies (enumerated as a MCx, x= 8 ~ 15)
Each Die has 4 UMCs (enumerated as csrowx, x=0~3)
Each die has 2 root ports, with 4 misc port for each root.
Each UMC manages 8 UMC channels each connected to 2GB of HBM memory.
https://lore.kernel.org/lkml/[email protected]/

So: 4 UMCs × 8 channels × 2 GB HBM = 64 GB of HBM per die.

Since Aldebaran has 2 dies, that makes 128 GB of HBM.
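Just as a back-of-envelope check, with the topology numbers taken straight from the patch cover letter above:

```cpp
#include <cstdio>

int main() {
    // Topology as described in the patch cover letter
    const int dies             = 2;  // Aldebaran has 2 dies
    const int umcs_per_die     = 4;  // each die has 4 UMCs
    const int channels_per_umc = 8;  // each UMC manages 8 channels
    const int gb_per_channel   = 2;  // each channel is backed by 2 GB of HBM

    const int gb_per_die = umcs_per_die * channels_per_umc * gb_per_channel;
    std::printf("%d GB per die, %d GB total\n", gb_per_die, dies * gb_per_die);
    // prints: 64 GB per die, 128 GB total
    return 0;
}
```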


 
That's what I thought too, probably to manage the IF/PCI and memory side of things.

In any case, from the first page, something along these lines was already hinted at for the exascale systems: the 3rd generation of IF, which they called Infinity Architecture, would allow the 8 GPUs to be connected directly and communicate among themselves, and would also allow the interconnect to be attached directly to the GPU without needing to go through the CPU.
 