AMD - HSA and the evolution of the APU

- Hot Chips 2013: Phil Rogers (AMD), Ben Gaster (Qualcomm), Ian Bratt (ARM), and Ben Sander (AMD) presented on HSA, the HSA Memory Model, the HSA Queueing Model and HSAIL this last Sunday.
http://hsafoundation.com/hot-chips-2013-hsa-foundation-presented-deeper-detail-hsa-hsail/

- HSAIL: Write-Once-Run-Everywhere for Heterogeneous Systems
Ben Sander of AMD and Chien-Ping Lu of MediaTek, HSA Foundation working group leader for the HSA Programmer's Reference Manual, pen a nice article on HSAIL and HSA technology.
http://www.computer.org/portal/web/...once-run-everywhere-for-heterogeneous-systems

- HSAemu: a full-system simulator built from PQEMU to do full-system emulation of HSA, from our academic member Yeh-Ching Chung of National Tsing Hua University
http://www.slideshare.net/slideshow/embed_code/25697629#
 
HSA: the final chapter?


Today we are announcing another technology, also part of HSA, that further elevates the GPU to its rightful place inside the APU. That technology is called heterogeneous Queueing, or hQ.
hQ transforms the current processor ‘master slave’ architecture of the APU and turns it into ‘all processor equal’ type of design. Instead of being spoon-fed by the CPU, the GPU can spawn its own work items by placing tasks onto either the GPU or CPU queue to be dispatched immediately via low-latency user mode queuing.

1z63wcp.png



Reducing latency when performing work on the GPU, reducing software layers, more efficiently utilizing GPU opens up many more opportunities to accelerate application performance, reduce power consumption and improve portability. However, another major benefit of doing this allows use of the languages that programmers already use every day. Now that the GPU has the flexibility to create and dispatch its own work items, programming models become equivalent to familiar models available for the CPU. This in turn will enable greater number of apps to be able to take advantage of GPU compute performance. Removing the software layers that are vendor specific will allow the slogan ‘write once, use everywhere’ to finally come to life.
http://community.amd.com/community/...2013/10/21/hq-from-masterslave-to-masterpiece
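
To make the hQ idea a bit more concrete, here is a minimal Python sketch of the "any agent can enqueue work for any agent" model described in the quote. It is only a toy under stated assumptions: the Agent class, the run loop and the task names (gpu_kernel, cpu_finalize) are made up for illustration and are not part of any AMD or HSA API, and real hQ queues are hardware-visible user-mode queues, not Python deques.

```python
from collections import deque


class Agent:
    """A processor (CPU or GPU) with its own work queue (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()

    def enqueue(self, task, *args):
        # Any agent may place work directly on any queue,
        # without going through a "master" processor first.
        self.queue.append((task, args))


def run(agents):
    """Drain all queues round-robin until no agent has pending work."""
    log = []
    while any(agent.queue for agent in agents.values()):
        for agent in agents.values():
            if agent.queue:
                task, args = agent.queue.popleft()
                log.append((agent.name, task.__name__))
                task(agents, *args)
    return log


def gpu_kernel(agents, depth):
    # A GPU task that spawns more GPU work by itself,
    # then hands the final step over to the CPU queue.
    if depth > 0:
        agents["gpu"].enqueue(gpu_kernel, depth - 1)
    else:
        agents["cpu"].enqueue(cpu_finalize)


def cpu_finalize(agents):
    # Placeholder for serial work that suits the CPU better.
    pass


agents = {"cpu": Agent("cpu"), "gpu": Agent("gpu")}
agents["gpu"].enqueue(gpu_kernel, 2)
log = run(agents)
```

The point of the sketch is only the direction of the arrows: after the first dispatch, the CPU never has to feed the GPU again, and the GPU decides on its own when the CPU gets involved.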


AMD is finally talking about hQ or Heterogeneous Queuing, the final step in the Fusion integration of CPU and GPUs.
http://semiaccurate.com/2013/10/21/amd-makes-gpu-comute-reality-hq/


AMD's heterogeneous queuing aims to make CPU, GPU more equal partners
http://techreport.com/news/25545/amd-heterogeneous-queuing-aims-to-make-cpu-gpu-more-equal-partners
 
How is this put into practice? Through changes at the operating-system level? Do graphics card drivers go away, with the system talking to the GPU the way it talks to the CPU today?
 

In short, this is aimed at APUs and SoCs; just look at which companies were behind the creation of the HSA Foundation:

24fbmuw.png

http://hsafoundation.com/

and the goal is to create, at the hardware level, a "common base" that works regardless of the CPU architecture (ARM, x86), the GPU architecture (AMD's GCN, ARM's Mali, Imagination's PowerVR, etc.) or any other type of co-processor or accelerator, so that software can be developed to take advantage of it.

At the hardware level, the common base (that is, an HSA-compatible design), with the requirements listed on the left-hand side, will look something like this:

339oocl.jpg

http://share.csdn.net/#/detail/851
(see the last image in the 1st post; it corresponds to the 4th column, i.e. 2014 - APU "Carrizo")


At the software level, and to make things easier, they created a virtual ISA, HSAIL:

An intermediate language called HSAIL is helping to address some of the challenges. One of the benefits of HSAIL is its portability across multiple vendor products. Compilers that generate HSAIL can be assured that the resulting code will be able to run on a wide variety of target platforms. HSAIL also provides existing programming languages with an efficient parallel intermediate language that runs on a wide variety of hardware. This provides the underlying infrastructure and brings the benefits of heterogeneous computing to existing, popular programming models such as Java™, OpenMP™, C++, and more.
288wb2u.png

http://www.mediatek.com/_en/03_news/01-2_newsDetail.php?sn=1118

I believe this lets software benefit using only extensions to existing programming languages.


NOTE: and before anyone says that AMD has no CPU and is therefore going to use the GPU instead, and so on, let me note that this is not the goal. The goal is to use the most efficient processor (be it the CPU, the GPU or another one) for each computation, not only to minimize compute time but also because doing so ends up lowering power consumption.
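
A minimal sketch of that "pick the most efficient processor" idea, assuming a very simple cost model: each device has a fixed launch overhead, a throughput and a power draw, and the scheduler picks whichever device finishes the job with the least energy. The Device class, pick_device and all the numbers are hypothetical and only illustrate the trade-off; they are not measurements of any real CPU or GPU, nor how an HSA runtime actually decides.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    throughput: float       # work units per second (made-up figure)
    launch_overhead: float  # seconds spent before any work starts
    power: float            # watts drawn while busy

    def time(self, work):
        """Seconds needed to finish `work` units on this device."""
        return self.launch_overhead + work / self.throughput

    def energy(self, work):
        """Joules consumed to finish `work` units on this device."""
        return self.time(work) * self.power


def pick_device(devices, work):
    """Return the device that completes `work` with the least energy."""
    return min(devices, key=lambda d: d.energy(work))


devices = [
    Device("cpu", throughput=10.0, launch_overhead=0.0, power=20.0),
    Device("gpu", throughput=100.0, launch_overhead=0.5, power=40.0),
]

small_job = pick_device(devices, work=1)      # overhead dominates
large_job = pick_device(devices, work=1000)   # throughput dominates
```

With these made-up numbers the small job stays on the CPU and the large one goes to the GPU, which is the whole argument: neither processor is "the" answer, it depends on the work.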

In any case, more news should arrive within ~15 days.
 
Since the AMD thread was closed, and given that there was already information about HSA here, I'll post the update here.


- Everything You Always Wanted to Know About HSA - but were afraid to ask (PDF)


- Hot Chips 25 (2013), Tutorial 1 - August 25, 2013

HSA Overview
Phil Rogers, AMD

HSAIL Virtual Parallel ISA
Ben Sander, AMD

HSA Memory Model
Benedict Gaster, Qualcomm

HSA Queuing Model
Ian Bratt, ARM

http://www.youtube.com/watch?v=vxEyK32tc30

NOTE: although the conference took place in August, the video was only uploaded in December.
WARNING: the video covers the whole conference, about 3h30, so it is only for those with the patience.


- APU 13 - the slides for most of the presentations are already available at the following link
http://developer.amd.com/apu/home/sessions/

As for videos of the presentations, there are only short excerpts, except for the Oxide one.


Otherwise, you can keep following further information here:
http://developer.amd.com/
http://hsafoundation.com/
 
Interesting. But software will have to change a lot to adapt to this.
That said, anyone with an APU already notices a big change in speed after moving to Windows 8.1.
 
APU as an HSA "co-processor" for a GCN graphics card

What basically happens is that the Radeon R9 290X will render the graphics while the A10-7850K would handle the physics calculations utilizing HSA as a Co-Processor. This is similar to the NVIDIA PhysX technology where you can set a second GPU as a PhysX Co-Processor.

an increase in performance due to offloading of physics processing from the discrete GPUs over to the 12 Compute Units inside the A10-7850K.

g67s.jpg


http://wccftech.com/amd-catalyst-13...a-support-scheduled-january-kaveri-coprocess/
 
Update.

- HSA System Architecture, HSA Programmer's Reference Manual, HSA System Runtime Specifications 1.0 Provisional are Now Available

HSA Foundation recently ratified and released the three main HSA specifications:



    • HSA Platform System Architecture Specification: Defines the requirements for shared virtual memory, platform coherency, signaling, queuing mechanics and packet formats, context switching, and the HSA memory model.
    • HSA Programmer’s Reference Manual: Contains the HSAIL Virtual ISA and Programming Model, Compiler Writer’s Guide, and BRIG (the “HSAIL” compiler intermediate language) object format.
    • HSA Runtime Programmer’s Reference Manual: Defines the APIs in the HSA Runtime used for tasks such as initialization and device discovery, queue creation, and memory management.

These specifications are at the “1.0 Provisional” Level and are available from the HSA Foundation web site here (http://www.hsafoundation.com/standards/).

AMD is also supplying early implementation to test out capabilities of HSA
The project provides an initial implementation of the HSA specifications on the AMD “Kaveri” silicon, a pre-HSA Compatible part. The implementation includes a Linux kernel and associated kernel-level drivers, the HSA runtime, and the HSAIL finalizer. The project includes a reference LLVM-based compiler which generates HSAIL and can be extended to add additional languages that support HSAIL-based compute. The project also includes tools for assembling and disassembling HSAIL and for compiling OpenCL 2.0 kernels into HSAIL. Finally, the project includes an approachable runtime layer called “OKRA” designed to minimize the time required to get started with HSA. You can access these at https://github.com/HSAFoundation


Who should use this project?
The project is aimed at:





    • Compiler and language developers who want to add parallel acceleration to a high-level language.
    • Programmers who want to leverage features of HSA such as shared-virtual-memory, platform atomics, user-level queues, and signals.

http://www.hsafoundation.com/three-core-hsa-foundation-specification-are-available/
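
Since user-level queues and signals come up repeatedly in these specifications, here is a rough single-threaded Python model of how such a queue can work: a power-of-two ring buffer of packets, monotonically increasing write and read indices, a doorbell signal the producer updates to publish work, and a completion signal the consumer decrements when a packet finishes. This is a simplified sketch, assuming that general shape; the class and method names (Signal, UserModeQueue, submit, process) are invented for illustration and are not the HSA Runtime API, and a real implementation relies on atomics and hardware packet processors rather than a Python loop.

```python
class Signal:
    """A shared integer that producer and consumer can both read and update."""

    def __init__(self, value=0):
        self.value = value


class UserModeQueue:
    """Toy ring-buffer queue written to directly by the application."""

    def __init__(self, size):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.size = size
        self.ring = [None] * size
        self.write_index = 0         # next slot the producer will fill
        self.read_index = 0          # next slot the consumer will process
        self.doorbell = Signal(-1)   # index of the last published packet

    def submit(self, kernel, args, completion):
        """Producer side: write a packet, then ring the doorbell."""
        if self.write_index - self.read_index >= self.size:
            raise BufferError("queue full")
        index = self.write_index
        self.write_index += 1
        self.ring[index % self.size] = (kernel, args, completion)
        self.doorbell.value = index
        return index

    def process(self):
        """Consumer side: run every packet published via the doorbell."""
        while self.read_index <= self.doorbell.value:
            kernel, args, completion = self.ring[self.read_index % self.size]
            kernel(*args)
            self.read_index += 1
            completion.value -= 1    # tell waiters that one packet is done


results = []
done = Signal(2)                     # two packets outstanding
queue = UserModeQueue(4)
queue.submit(results.append, (1,), done)
queue.submit(results.append, (2,), done)
queue.process()
```

After process() returns, done.value has reached 0, which is what a waiting thread would poll or block on. No system call appears anywhere in the submit path, which is the property the "user-level" part refers to.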

Following on from what is said above, namely that the initial implementation is done on a Kaveri APU:

34tai9z.jpg

https://github.com/HSAFoundation/HSA-Docs-AMD/wiki/HSA-Platforms-&-Installation


For now it has been Penguin Computing, which AMD originally chose to supply a Llano-based system for evaluation at Sandia labs (this was before AMD bought SeaMicro), that currently has the Altus with Kaveri available.

Penguin Computing’s Cluster Collaboration with Advanced Micro Devices Makes Heterogeneous System Architecture Clustering a Reality
“We are making these machines immediately available for evaluation as a tremendous tool for software development,” said Phil Pokorny, Chief Technology Officer, Penguin Computing. “HSA is a reality and our technology is already in the hands of major U.S. labs. Penguin Computing’s extensive experience in APU cluster development and implementation is instrumental in this progress, in addition to close collaboration with AMD.”

Named Jäätikkö, or iceberg in Finnish, the cluster is currently being demonstrated at AMD’s SC14 booth #839 and combines 10 AMD APU compute nodes, plus head node based on Penguin’s Altus 2A30 development platform, with high performance ethernet using Penguin’s Arctica open ethernet switches.

“Initial feedback from early adopters reinforces our belief that this collaboration with Penguin Computing is an important step forward for the industry,” said Karl Freund, corporate vice president, Product Management and Market, Server Business Unit, AMD. “The potential of modern heterogeneous architectures is exciting, and collaborations such as these can result in significant steps forward in performance for a broad range of software applications.”

The oil and gas industry is an example of a customer segment that could experience significant benefits from this capability. With the oil and gas sector’s need for GPU parallel codes and single-precision, APU has almost a teraflop of single precision floating point performance.
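
For a rough sense of where a figure like "almost a teraflop" can come from, the arithmetic below computes theoretical peak single-precision throughput. The press release does not say which APU the nodes use, so every number here is an assumption: the values are the commonly quoted ones for a Kaveri part like the A10-7850K mentioned earlier in the thread (8 GPU compute units with 64 lanes each at 0.72 GHz, 4 CPU cores at 3.7 GHz doing 8 single-precision FLOPs per cycle), and peak_gflops is just a helper name for this sketch.

```python
def peak_gflops(units, lanes_per_unit, flops_per_lane_per_cycle, clock_ghz):
    """Theoretical peak = units * lanes * FLOPs per lane per cycle * clock (GHz)."""
    return units * lanes_per_unit * flops_per_lane_per_cycle * clock_ghz


# Assumed GPU side: 8 compute units, 64 lanes each,
# one fused multiply-add (2 FLOPs) per lane per cycle, 0.72 GHz.
gpu = peak_gflops(units=8, lanes_per_unit=64,
                  flops_per_lane_per_cycle=2, clock_ghz=0.72)

# Assumed CPU side: 4 cores, treated as 8 single-precision FLOPs
# per core per cycle, 3.7 GHz.
cpu = peak_gflops(units=4, lanes_per_unit=1,
                  flops_per_lane_per_cycle=8, clock_ghz=3.7)

total = gpu + cpu
print(f"GPU {gpu:.1f} + CPU {cpu:.1f} = {total:.1f} GFLOPS")
```

Under those assumptions the GPU contributes by far the larger share, and the total lands in the high hundreds of GFLOPS. These are peak numbers; real codes reach only a fraction of them.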

The link for the Altus cluster:
http://www.penguincomputing.com/products/rackmount-servers/altus/altus-2a30/#configure



Meanwhile, AMD is preparing to launch the first APU fully compliant with the HSA 1.0 specifications, Carrizo.

24dhkyc.jpg
 
HSA Foundation Launches New Era of Pervasive, Energy-Efficient Computing with HSA 1.0 Specification Release

HSA is a standardized platform design supported by more than 40 technology companies and 17 universities that unlocks the performance and power efficiency of the parallel computing engines found in most modern electronic devices. It allows developers to easily and efficiently apply the hardware resources in today’s complex systems-on-chip (SOCs).
...
The newly-approved specification comprises the key elements that improve the programmability of heterogeneous processors, the portability of programming code and interoperability across different vendor devices. These include:


    • The HSA System Architecture Specification, which defines how the hardware operates;
    • The HSA Programmers Reference Manual (PRM), which targets the software ecosystem, tool and compiler developers;
    • The HSA Runtime Specification, which defines how applications interact with HSA platforms.
http://www.hsafoundation.com/hsa-fo...computing-with-hsa-1-0-specification-release/


Meanwhile, there is a new patch for AMDKFD, adding multi-GPU support:

A patch published on Sunday for the new AMDKFD HSA kernel driver adds support for using more than one graphics card/driver.

The new patch by Oded Gabbay and Xihan Zhang, two of the AMD HSA Linux developers, adds support for handling multiple KGD (Kernel Graphics Driver) instances. With the current AMDKFD HSA kernel driver up through Linux 4.0, there's only support for one graphics driver instance.

Using this new less than 100 line kernel patch, the AMDKFD DRM HSA driver can now support multiple KGD instances whether they be two AMDGPU instances (the new R9 285, Carrizo, and Rx 300 series GPU driver), two Radeon DRM drivers (for supporting all current Radeon hardware), or a combination of AMDGPU and Radeon hardware.

This new patch for supporting multiple kernel graphics drivers only impacts this HSA driver and has nothing to do with CrossFire/OpenGL or other long sought after multi-GPU features. The patch can currently be found on dri-devel and will hopefully be merged for the Linux 4.1 kernel.
http://www.phoronix.com/scan.php?page=news_item&px=AMD-KFD-HSA-Multiple-KGD

As the article says, this is an open-source HSA driver and has nothing to do with the new open-source GPU driver, AMDGPU.
 
HSA Foundation Members Preview Plans for Heterogeneous Platforms

World's First Heterogeneous Systems Architecture Platforms Will Span Mobile Devices, Desktops, HPC Systems and Servers

"After the HSA's successful release of the v1.0 specification in March 2015, the organization went to work on developing conformance tests," said Dr. Jon Peddie of Jon Peddie Research. "Conformance testing is critical to a meaningful HSA certification, and now that is in place too. This firmly and permanently establishes the organization's place in the industry."

"ARM is actively developing CPU, GPU and interconnect IP with energy efficiency and full system coherency as guiding design principles while extending the system capabilities aligned with HSA coherency standards."

Imagination is planning a staged rollout of HSA across its processors starting in 2016. This includes MIPS I-class and P-class CPUs, PowerVR GPUs and HSA compliant fabric solutions. According to Peter McGuinness, director of multimedia technology marketing for Imagination, "Because it provides a consistent programming model and enables efficient execution on CPUs, GPUs and beyond, HSA is an important standard for future SoCs."

MediaTek is working with partners in developing HSA features on mobile SoCs. The company is already receiving interest in HSA from customers, and is on track to deliver HSA features in mobile SoC products in phases.
http://www.marketwired.com/press-re...m?hootPostID=656328dcc316d179667c33b391955405

Unsurprisingly, following its purchase of MIPS, Imagination announces that MIPS CPU cores will also be part of the HSA offering, joining x86 (AMD) and ARM, in addition to the various GPU offerings.
 
ARM, while presenting the new CoreLink CCI-550 and DMC-500,

vmvfq1.png


t8vsx1.png



ended up announcing its next-generation GPU, "Mimir", which will support Shared Virtual Memory and be cache coherent, making an HSA SoC possible

xnffwi.png


ARM explains that its still to-be-announced next-generation Mali IP codenamed "Mimir" will be fully cache-coherent and would be a perfect fit to take advantage of such a configuration (Current generation Midgard-based GPUs such as the T6-/7-/800 series are only I/O coherent). Fully coherent GPUs will be able to take advantage of shared virtual memory and new simplified programmers models provided by APIs such as OpenCL 2.0 and HSA.
http://anandtech.com/show/9743/arm-announces-new-cci550-and-dmc500

ARM out CCI-550 interconnect and Mimir GPU
http://semiaccurate.com/2015/10/27/arm-cci-550-interconnect-mimir-gpu/

New CoreLink IP ties in mobile GPU coherently
https://www.semiwiki.com/forum/cont...ently.html?s=89e50d67b9f82e85d5936f8371677683
 
SUSE has been working with AMD on integrating HSA into GCC; a developer announced this about a month ago:

Jambor explained in the message seeking approval to land the initial HSA support code, "I acknowledge that the submission comes quite late and that the class of OpenMP loops we can handle well is small, but nevertheless I would like to ask for review and eventual acceptance to trunk and GCC 6."

GCC 6 should be released in the first half of 2016.

The HSA libgomp plug-in that would be landing too is what finds the HSA devices on the system, finalizes the HSAIL code, and then executes it on HSA-capable GPUs. So far though it seems only a small amount of OpenMP code with GCC can be easily offloaded to GPUs using HSA. The GCC HSA back-end meanwhile is explained in more detail via this patch.
http://www.phoronix.com/scan.php?page=news_item&px=GCC-HSA-Mainlining

Meanwhile, AMD released a new version of CodeXL, a development tool for developers, with a beta of an HSAIL debugger:

Use the GPU Debugger to debug applications running on the Linux® HSA stack, as shown in Figure 2 below. Set breakpoints, step through HSAIL code and watch local variables, HSAIL registers and kernel arguments. This release is compatible with the Sep 2015 AMD release of the HSA runtime, available on GitHub here.

The HSAIL Debugger is a beta feature in this release.
http://developer.amd.com/community/blog/2015/11/26/aloha-amd-codexl-1-9/
 
Design and Analysis of an APU for Exascale Computing

This paper presents a vision for an architecture that can be used to construct exascale systems. We describe a conceptual Exascale Node Architecture (ENA), which is the computational building block for an exascale supercomputer. The ENA consists of an Exascale Heterogeneous Processor (EHP) coupled with an advanced memory system. The EHP provides a high-performance accelerated processing unit (CPU+GPU), in-package high-bandwidth 3D memory, and aggressive use of die-stacking and chiplet technologies to meet the requirements for exascale computing in a balanced manner.
http://www.computermachines.org/joe/publications/pdfs/hpca2017_exascale_apu.pdf
 
Will AMD be in a position to produce the mythical HPC APU that people have been talking about for years?

- Cray, AMD Tag Team On 1.5 Exaflops “Frontier” Supercomputer
The exact feeds and speeds of the AMD CPUs and GPUs that are at the heart of the system were not divulged, but Forrest Norrod, senior vice president and general manager of the Enterprise, Embedded, and Semi-Custom group at AMD, told The Next Platform what it isn’t, which is almost as useful. The CPU is a unique, custom device that is not based on the impending “Rome” second generation Epyc processor and it is not based on the future “Milan” follow-on, either, but is rather a custom CPU. Lisa Su, AMD’s chief executive officer, said that the processor used in the Frontier machine was “beyond Zen 2,” the core that is being used in the Rome chips. Norrod joked that when this custom Epyc chip is divulged, it will be named after an Italian city. . . . The Radeon Instinct GPU accelerators in Frontier are not derivative of the current “Vega” or “Navi” GPU designs, but a custom part. In both cases, the chips have had special instructions added to them for goosing the performance of both HPC and AI workloads, according to Su, but the exact nature of those enhancements is not being revealed.

The other secret sauce that AMD brought to bear in Frontier is an enhanced Infinity Fabric interconnect between the CPUs and the GPUs that will offer coherent memory access across the devices, much as IBM and Nvidia have done across the Power9 CPUs and Volta GPUs through NVLink interconnects.
https://www.nextplatform.com/2019/05/07/cray-amd-tag-team-on-1-5-exaflops-frontier-supercomputer/


- AMD Chips to Power Exascale System
A Frontier node will consist of a custom AMD Epyc CPU linked coherently to four upgraded Radeon GPUs over an enhanced version of the company’s Infinity fabric. The Epyc will use “future-generation” Zen cores and sport an updated microarchitecture packing new instructions for AI and supercomputing jobs.

The Radeon will use high-bandwidth memory, sport new compute cores, and enable mixed-precision operations for deep learning. AMD declined to say how many chips Frontier will use or what process the chips are made in, leaving analysts to speculate that it could be TSMC’s 7+-, 6-, or even 5-nm nodes.
https://www.eetimes.com/document.asp?doc_id=1334665&page_number=1


- Cray and AMD Win Big Contracts for 1.5 Exaflop Frontier Supercomputer
During the pre-brief, Dr. Lisa Su, AMD’s CEO outlined a number of advancements AMD will make for the 2021 system. First, the CPUs will be a future generation AMD EPYC product. Second, the GPUs will be a future Radeon Instinct product. Neither of these two announcements is completely surprising. What they do show in the context of Frontier, is that AMD has convinced Cray and the US DoE that in 2021 it will have the CPU and GPU platforms that are worth investment. With a $100M development contract and the fact that the largest supercomputer in the world will be using AMD in two years, this will do a lot to bolster AMD’s market perception and support.
https://www.servethehome.com/cray-and-amd-win-big-contracts-for-1-5-exaflop-frontier-supercomputer/


Most of the software ended up integrated into ROCm, and only recently has dGPU support been added to AMDKFD (kernel fusion driver).
 