AMD - HSA and the evolution of the APU

As the articles above say, this is something built for a specific purpose: scientific computing and number crunching.

But what matters here is the concept itself.

- 2015
[Image: amd-exascale-fastforward.jpg]

First, the Fast Forward concept stacks up memory close to the CPU-GPU hybrid compute to get the high bandwidth that applications require and cannot get with GPUs linked to CPUs over PCI-Express peripheral buses. The hybrid compute chip will have a single virtual address space, thanks to AMD’s Heterogeneous Systems Architecture (HSA) electronics, which by the way will be extended to work across discrete Opteron CPUs and AMD GPUs and is not limited to on-die setups.
https://www.nextplatform.com/2015/07/30/future-systems-can-exascale-revive-amd/
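The single virtual address space is the part that matters most for programmers: the GPU can walk the same pointers the CPU just filled in, without explicit staging copies over the peripheral bus. A minimal sketch of the idea using today's HIP runtime (my own illustration, not from the article), assuming an HSA-capable system where coherent host allocations are directly visible to GPU kernels:

```cpp
// Sketch only: CPU and GPU operate on the same virtual address.
#include <hip/hip_runtime.h>
#include <cstdio>

__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;   // same address the CPU wrote to
}

int main() {
    const int n = 1 << 20;
    float* data = nullptr;
    // Coherent, host-resident allocation visible to both CPU and GPU.
    hipHostMalloc(reinterpret_cast<void**>(&data), n * sizeof(float),
                  hipHostMallocCoherent);
    for (int i = 0; i < n; ++i) data[i] = 1.0f;            // CPU writes
    hipLaunchKernelGGL(scale, dim3((n + 255) / 256), dim3(256), 0, 0,
                       data, n, 2.0f);                     // GPU updates in place
    hipDeviceSynchronize();
    printf("data[0] = %f\n", data[0]);                     // CPU reads back 2.0
    hipHostFree(data);
    return 0;
}
```

On an on-die APU or an IF-linked package the programming model stays the same; only the physical link underneath changes.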

- 2017 (previous post from 2017)
[Image: 21065703899l.png]

http://www.computermachines.org/joe/publications/pdfs/hpca2017_exascale_apu.pdf

AMD's approach with Zen is already known, and it has now been taken to another level with Zen 2 (the IO die + CPU core die split), but it is also known that AMD will keep the APU dies monolithic, which many find strange, since that layout would suggest an IO + CPU + GPU arrangement was possible, and perhaps even the dream of also including HBM.

Although there was no mention of an APU, what made me think of one was this image from the presentation
[Image: article-630x354.22c1d063.jpg]


The mention of the CPU-GPU connection via IF leads me to think precisely that the GPU dies will sit on the same interposer (probably an active one, à la Vega) as the CPU, and that it will also carry HBM (probably HBM3), rather than being a traditional discrete/dedicated CPU + GPU system over PCIe.

Of course, the mention of custom CPU and GPU instructions (hardware?) dedicated to HPC and AI at least seems to confirm that, on the GPU side, AMD intends to branch off a GPU line for this area (the Radeon Instinct line), something I had already commented on at the Vega 20 launch.

On the software side, and as I have been saying in this thread, the goal is to treat the CPU+GPU as a single thing, which would be easier with a monolithic die; but, as I mentioned, AMD has been adding support for dGPUs, and I assume that by eliminating PCIe and making IF the link between CPU and GPU the task becomes easier.
EDIT: Discrete GPU Code For AMDKFD, Radeon Compute Could Be Ready For Linux 4.17

Last but not least, AMD will not receive the full $600M, but it will work directly with the people at the US National Labs, and that is important from the software point of view.

EDIT: fittingly, AMD is releasing a ROCm update today

Radeon ROCm 2.4 Released With TensorFlow 2.0 Compatibility, Infinity Fabric Support
Equally exciting is initial support for AMD Infinity Fabric Link for connecting Radeon Instinct MI50/MI60 boards via this Infinity Fabric interconnect technology. Infinity Fabric will become more important moving forward and great to see Radeon ROCm positioning the initial enablement code into this release.
https://www.phoronix.com/scan.php?page=news_item&px=Radeon-ROCm-2.4-Released
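To make the IF Link part concrete, below is a hedged sketch of how peer-to-peer traffic between two Instinct boards is typically exercised from HIP (device ids and buffer size are illustrative); when the boards are bridged by Infinity Fabric Link, the runtime can route the peer copy over that bridge instead of PCIe:

```cpp
// Sketch only: direct GPU0 -> GPU1 copy without staging through host memory.
#include <hip/hip_runtime.h>
#include <cstdio>

int main() {
    const size_t bytes = 256 << 20;           // 256 MiB test buffer
    int canAccess = 0;
    hipDeviceCanAccessPeer(&canAccess, 0, 1); // can GPU 0 reach GPU 1 directly?
    printf("peer access 0 -> 1: %d\n", canAccess);

    void *src = nullptr, *dst = nullptr;
    hipSetDevice(0);
    hipMalloc(&src, bytes);
    if (canAccess) hipDeviceEnablePeerAccess(1, 0);

    hipSetDevice(1);
    hipMalloc(&dst, bytes);

    // Peer copy between the two devices; over an IF Link bridge this avoids
    // bouncing the data through the host.
    hipMemcpyPeer(dst, 1, src, 0, bytes);
    hipDeviceSynchronize();

    hipFree(dst);
    hipSetDevice(0);
    hipFree(src);
    return 0;
}
```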
 
Oak Ridge National Lab preparing the transition from Summit (IBM Power9 + Nvidia Tesla V100) to Frontier (AMD optimised CPU + custom Radeon Instinct)

[Image: ORNL.jpg]

https://www.olcf.ornl.gov/wp-content/uploads/2019/10/Roth-HIP-on-Summit-20191009.pdf

Apropos of Intel also launching a GPU for the "accelerator" market together with oneAPI, which adds to Khronos' SYCL, there is a guy at Heidelberg University who has been working on this; his most recent results:

#hipSYCL rocking on the big GPUs with the #BabelStream benchmark. Utterly in love with HBM2 :)
[Image: ECXKGa0XUAQIW65.png]

https://twitter.com/illuhad/status/1163558300199325697
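BabelStream is a memory-bandwidth benchmark, so those numbers are essentially a measure of how well hipSYCL feeds HBM2. A rough sketch of the kind of kernel it measures (the "triad", a[i] = b[i] + scalar * c[i]), written against the SYCL interface that hipSYCL implements; this is my own illustration, not code taken from BabelStream:

```cpp
#include <CL/sycl.hpp>
#include <vector>
namespace sycl = cl::sycl;

int main() {
    constexpr size_t n = 1 << 25;      // illustrative working-set size
    constexpr float scalar = 0.4f;
    std::vector<float> a(n), b(n, 1.0f), c(n, 2.0f);

    sycl::queue q{sycl::gpu_selector{}};   // target the GPU backend
    {
        sycl::buffer<float> A(a.data(), sycl::range<1>{n});
        sycl::buffer<float> B(b.data(), sycl::range<1>{n});
        sycl::buffer<float> C(c.data(), sycl::range<1>{n});
        q.submit([&](sycl::handler& h) {
            auto ra = A.get_access<sycl::access::mode::write>(h);
            auto rb = B.get_access<sycl::access::mode::read>(h);
            auto rc = C.get_access<sycl::access::mode::read>(h);
            h.parallel_for<class triad>(sycl::range<1>{n}, [=](sycl::id<1> i) {
                ra[i] = rb[i] + scalar * rc[i];   // streaming, HBM-bound kernel
            });
        });
    }   // buffers go out of scope here and sync back to the host vectors
    return 0;
}
```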
 
ROCm enabled by default in the open-source drivers (AMDGPU) for POWER9

AMDKFD/ROCm GPU Compute Can Work On POWER Systems Like Raptor's Talos II
With various Radeon driver bugs in the open-source stack having been worked out over time that affect the POWER architecture, it turns out the driver stack is good enough on POWER to even enable the AMDKFD (Kernel Fusion Driver) compute support -- which is the kernel component to the Radeon Open Compute (ROCm) stack that runs in user-space.

The last step is simply this patch to enable the AMD HSA/KFD option to appear on the ***** 64-bit architecture. With that patch against the latest kernel, that is all that's needed to get the AMD GPU compute code running on POWER9 systems like the libre hardware offerings by Raptor Computing Systems.

Raptor's Timothy Pearson mentioned this support has been verified to work on a Talos II with a Radeon RX Vega 64 graphics card, though the lower-end Blackbird should presumably be working fine as well. So unlike the NVIDIA proprietary driver with their POWER build, this combination would yield a fully open-source (sans the GPU firmware/microcode bits needed for Radeon GPUs) GPU-accelerated compute experience.
https://www.phoronix.com/scan.php?page=news_item&px=AMDKFD-Compute-*****
 
The State Of ROCm For HPC In Early 2021 With CUDA Porting Via HIP, Rewriting With OpenMP

Earlier this month at the virtual FOSDEM 2021 conference was an interesting presentation on how European developers are preparing for AMD-powered supercomputers and beginning to figure out the best approaches for converting existing NVIDIA CUDA GPU code to run on Radeon GPUs as well as whether writing new GPU-focused code with OpenMP device offload is worthwhile.

For converting CUDA code over for AMD GPU execution, the focus is obviously on using AMD's open-source HIP heterogeneous interface. Hipify-Clang is how source-based translations from CUDA can largely be achieved, or there is also Hipify-Perl for text-based search/replace when migrating from CUDA to HIP. With the HIP-based approaches they have been seeing good results, with roughly 2% overhead.
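The low overhead makes sense because the port is almost mechanical: HIP mirrors the CUDA runtime API nearly 1:1, so hipify mostly renames calls (cudaMalloc becomes hipMalloc, cudaMemcpy becomes hipMemcpy, and the <<<>>> launch becomes hipLaunchKernelGGL). A small hand-written illustration of what the hipified side of a CUDA-style SAXPY could look like (not code from the presentation):

```cpp
// Sketch only: the HIP version is a near line-for-line rename of the CUDA code.
#include <hip/hip_runtime.h>

__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

void run_saxpy(int n, float a, const float* hx, float* hy) {
    float *dx = nullptr, *dy = nullptr;
    hipMalloc(&dx, n * sizeof(float));                       // was cudaMalloc
    hipMalloc(&dy, n * sizeof(float));
    hipMemcpy(dx, hx, n * sizeof(float), hipMemcpyHostToDevice);  // was cudaMemcpy
    hipMemcpy(dy, hy, n * sizeof(float), hipMemcpyHostToDevice);
    hipLaunchKernelGGL(saxpy, dim3((n + 255) / 256), dim3(256), 0, 0,
                       n, a, dx, dy);                        // was kernel<<<...>>>()
    hipMemcpy(hy, dy, n * sizeof(float), hipMemcpyDeviceToHost);
    hipFree(dx);
    hipFree(dy);
}
```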

With Fortran code, Hipfort is needed as an interface library for the GPU kernels, and more manual porting is involved compared to the automatic translation. But in at least one test case they found their HIP version to be 30% faster, though part of that may also come down to compiler stack differences, as noted.
https://www.phoronix.com/scan.php?page=news_item&px=LUMI-Preparing-For-AMD-HPC
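The OpenMP device-offload route discussed in the same talk trades that mechanical port for portable directives. A minimal sketch of the same SAXPY expressed that way, assuming a compiler with OpenMP target offload support for AMD GPUs (illustrative only, not the LUMI code):

```cpp
// Sketch only: the loop runs on the GPU, with data movement via map clauses.
#include <cstdio>

void saxpy_omp(int n, float a, const float* x, float* y) {
    // Copy x/y to the device, run the loop there, copy y back.
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x = new float[n], *y = new float[n];
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }
    saxpy_omp(n, 0.5f, x, y);
    printf("y[0] = %f\n", y[0]);   // expect 2.5
    delete[] x;
    delete[] y;
    return 0;
}
```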
 