Intel OneAPI

Dark Kaeser

Apparently the Intel Architecture Day wasn't just about hardware, and I only just noticed the buzz around a Phoronix news item...
"We (Intel) would like to request to add SYCL programming model support to LLVM/Clang project to facilitate collaboration on C++ single-source heterogeneous programming for accelerators like GPU, FPGA, DSP, etc. from different hardware and software vendors."
https://www.phoronix.com/scan.php?page=news_item&px=Intel-SYCL-For-LLVM-Clang

but apart from another Phoronix article from around that time
Intel Developing "oneAPI" For Optimized Code Across CPUs, GPUs, FPGAs & More

I couldn't find much else

Intel One API to Rule Them All Is Much Needed
A key tenet of the One API solution is unifying all of the hardware, SDKs, and middleware used to deliver optimized software experiences to users of Intel hardware, whether that is x86 CPUs, AI accelerators, GPUs, or FPGAs.
[Image: Intel One API, various libraries and SDKs]

The reason Intel is going the route of using a common API across products is that the company’s complexity would soon get out of hand. It already works on everything from drivers, to libraries and middleware, to application optimization throughout a number of product lines and categories. Instead of having to maintain very different APIs for different hardware, it can focus on developing software with a common API, and hiding hardware complexity at the back end.
[Image: Intel One API software stack]

[Image: Intel One API vision]

https://www.servethehome.com/intel-one-api-to-rule-them-all-is-much-needed/

Suddenly I had a flashback :whistle:


Note: I didn't watch the Intel Architecture Day, I was on vacation :biglaugh: so it passed me by.
 
Some context.

About a month earlier, Dave Airlie, a Red Hat developer, had proposed this at the Linux Plumbers Conference, albeit as an unofficial project

"Until now the clear market leader has been the CUDA stack from NVIDIA, which is a closed source solution that runs on Linux. Open source applications like tensorflow (AI/ML) rely on this closed stack to utilise GPUs for acceleration...This talk will discuss the possibility of creating a vendor neutral reference compute stack based around open source technologies and open source development models that could execute compute tasks across multiple vendor GPUs. Using SYCL/OpenCL/Vulkan and the open-source Mesa stack, as the basis for a future task that development of tools and features on top of as part of a desktop OS."



His talk focused not only on NVIDIA's closed-source Compute Unified Device Architecture (CUDA) but also on how AMD's ROCm/HIP compute stack is open-source but vendor-specific. There is also Intel's NEO compute driver project, which is OpenCL-based but likewise focused just on their own hardware.


https://www.phoronix.com/scan.php?page=news_item&px=Red-Hat-Plumbing-Compute-Stack

The slides from the presentation are available here.

Basically, as AMD had already realized when it launched GCN to get back into the professional market: more than raw hardware capability, what would make the difference was software and support, and in this case CUDA and its development tools had become the de facto standard. Koduri must know that better than anyone...

AMD itself had to do the OpenCL optimizations and then try to get software vendors to integrate them; in the end they somewhat "cut corners" by developing HIP, which "translates" CUDA.
 
But this is a general problem in this industry, and I'm happy to see them trying to go down this route. It remains to be seen whether adoption will be as broad as it needs to be. And anything that helps Vulkan gain wider adoption is something I support :D
 
For those who missed it... Cray had announced the first Intel exascale system back in March

- Intel To Take On OpenPower For Exascale Dominance With Aurora
combined with a future generation of Optane 3D XPoint persistent memory, will aim at reducing the major energy consumption of data movement, with the actual floating point and integer math performance coming from a future compute platform based on future Xeon SP CPUs working in tandem with the Xe discrete graphics cards that the new CPU and GPU architects at Intel revealed were under development back in December 2018,
...
We imagine that the new CXL interconnect that Intel revealed it was working on last week as well as its OneAPI alternative to CUDA for compiling and distributing applications (or portions of them) across different CPUs, GPUs, and FPGAs, will also be a prominent feature of this updated Intel HPC hardware and software stack
[Image: Argonne Aurora technology overview]

https://www.nextplatform.com/2019/0...openpower-for-exascale-dominance-with-aurora/

- First U.S. Exascale Supercomputer to Be a Cray Shasta System
The Argonne system, named Aurora, will be comprised of more than 200 Shasta cabinets, Cray’s unique software stack optimized for Intel architectures and the Cray Slingshot™ interconnect,
...
Aurora will be powered by a range of next generation Intel technologies including a future generation of the Intel Xeon™ Scalable processor, Intel’s Xe compute architecture, Intel® Optane™ DC Persistent Memory, and Intel’s One API software. Cray designed the Slingshot scalable interconnect to handle the complex processing and communication of HPC and AI applications to run on exascale systems. Slingshot is a complete rethinking of the interconnect for extreme scale and has added novel adaptive routing, quality-of-service, and congestion management features.
http://investors.cray.com/phoenix.zhtml?c=98390&p=irol-newsArticle&ID=2391628

- Intel, Cray Win U.S. Exascale Deal
https://www.eetimes.com/document.asp?doc_id=1334439


It's curious, though, that this Intel exascale system for the Argonne National Laboratory has more than 200 cabinets, while the one Cray announced for the Oak Ridge National Laboratory was announced with more than 100 cabinets.
I think AnandTech had a table with the comparison, but for reasons I don't understand the site won't load.
 
Intel Developing "Data Parallel C++" As Part Of OneAPI Initiative
We've known OpenCL would take a big role, along with their LLVM upstreaming effort around their SYCL compiler back-end. We expected Intel to use the SYCL single-source C++ programming standard from The Khronos Group as the basis for oneAPI, but now it seems they are going a bit beyond just targeting SYCL.

Data Parallel C++ (DPC++) is their "new direct programming language" aiming to be an open, cross-industry standard and based on C++ and incorporating SYCL.

Not much about Data Parallel C++ was publicly revealed today, just that it's coming, it will be open/cross-vendor, and is based on SYCL/C++.
https://www.phoronix.com/scan.php?page=news_item&px=Intel-OneAPI-Data-Parallel-Cpp
 
#oneAPI
@IntelDevTools
As compute-intense workloads become ever more diverse, so too must the architectures they run on. From cluster-rich HPC to AI and machine learning, achieving the highest performance from today’s data-centric applications requires development and deployment across a mix of compute engines—CPUs, GPUs, FPGAs, and specialized accelerators.

https://techdecoded.intel.io/topics/oneapi/#gs.jzvk1q

To address this, Intel has launched oneAPI, a unified, standards-based programming model to simplify cross-architecture development and improve efficiency and innovation.
 
Khronos Steps Towards Widespread Deployment of SYCL with Release of SYCL 2020 Provisional Specification

SYCL is a standard C++ based heterogeneous parallel programming framework for accelerating High Performance Computing (HPC), machine learning, embedded computing, and compute-intensive desktop applications on a wide range of processor architectures, including CPUs, GPUs, FPGAs, and AI processors.
SYCL 2020 is based on C++17 and includes new programming abstractions, such as unified shared memory, reductions, group algorithms, and sub-groups to enable high-performance applications across diverse hardware architectures.
At the Argonne National Laboratory, Exascale supercomputer systems using Intel chips are being built and new implementations seek to enable developers to easily scale C++ applications to accelerator clusters using SYCL. In Europe, the Cineca Supercomputing center is using the Celerity distributed runtime system, built on top of SYCL, to program the new Marconi100 cluster equipped with 3,920 GPUs and ranked #9 in the Top500 (June 2020).
https://www.khronos.org/news/press/khronos-releases-sycl-2020-provisional-specification


Khronos Releases SYCL 2020 Provisional Specification
The SYCL 2020 provisional specification is available today and is now based on C++17, whereas formerly SYCL had been based on C++11. SYCL 2020 is also bringing new programming abstractions like unified shared memory, group algorithms, sub-groups, and other features.

SYCL 2020 still supports OpenCL as the default back-end target but there continues to be new implementations for supporting SYCL on more accelerator targets (CPUs, OpenMP, CUDA, Radeon ROCm, etc) and other environments for heterogeneous programming. There is already early SYCL 2020 support coming to Intel's DPC++ compiler for oneAPI and Codeplay's ComputeCpp project.

https://www.phoronix.com/scan.php?page=news_item&px=SYCL-2020-Provisional-Spec

Evaluating the Performance of the hipSYCL toolchain for HPC kernels on NVIDIA V100 GPUs
This video was presented at the online version of IWOCL / SYCLcon 2020.
Authors: Brian Homerding and John Tramm (Argonne National Laboratory)

In this paper, we produce SYCL benchmarks and mini-apps whose performance on the NVIDIA Volta GPU is analyzed. We utilize the RAJA Performance Suite to evaluate the performance of the hipSYCL toolchain, followed by a more detailed investigation of the performance of two HPC mini-apps. We find that the SYCL kernels compiled directly to CUDA perform at a competitive level with their CUDA counterparts when comparing the straightforward implementations.
 
Intel oneAPI 1.0 Officially Released
At the center of oneAPI is Intel's Data Parallel C++ (DPC++) as the language built atop C++ and Khronos SYCL standards. Besides their LLVM/Clang-based DPC++ compiler toolchain also encompassing oneAPI are their many libraries from deep learning with oneDNN to oneMKL as their math kernel library to oneDAL for analytics, oneTBB for threading, and oneVPL for video processing, among other components.
While oneAPI is most often talked about for now in the context of Intel hardware given their product portfolio, there has already been work on bringing oneAPI/DPC++ to NVIDIA GPUs as third-party work being pursued by Codeplay in cooperation with Intel. In terms of CPU-based execution, Intel's oneAPI software libraries have been running fine on AMD CPUs (and to great performance in many instances!) as well as even seeing work to support POWER and ARM architectures with their software libraries.
https://www.phoronix.com/scan.php?page=article&item=intel-oneapi-10&num=1
 


Intel's oneAPI Is Coming To AMD Radeon GPUs

Intel and the Heidelberg University Computing Center are announcing today that they are establishing the "oneAPI Academic Center of Excellence." Great for academia, but what's more interesting to the masses is that, as part of that, Intel and the University of Heidelberg are working to add oneAPI support for AMD Radeon GPUs.

Presumably this oneAPI Radeon support is coming by building off the existing AMDGPU LLVM back-end, given that the SYCL/DPC++ support is LLVM-based, and thus just extending the AMDGPU LLVM back-end. Thanks to AMD's open-source driver stack and full-featured support, adding this oneAPI support should be more straightforward than the currently pursued NVIDIA approach of going through CUDA.
https://www.phoronix.com/scan.php?page=news_item&px=oneAPI-AMD-Radeon-GPUs
 
Intel Debuts oneAPI Gold and Provides More Details on GPU Roadmap

The biggest news from an HPC perspective was the introduction of oneAPI Gold, the first productized version of Intel's programming platform for the Xe GPU line. On the hardware side, Intel added detail to its plans for offering distinct versions of its GPUs and introduced a video streaming GPU solution.
[Image: Intel SC20 oneAPI overview]

[Image: Intel SC20 oneAPI Base Toolkit]

[Image: Intel SC20 oneAPI domain toolkits]

Intel also drew attention to the XPU/oneAPI expanding ecosystem (excerpt from Intel literature):

  • Argonne National Laboratory: Researchers at the U.S. Department of Energy’s Argonne National Laboratory are using Intel oneAPI Toolkits to test code performance and functionality using programming models that will be supported on Aurora. Aurora is set to be one of the nation’s first exascale systems and will be used to dramatically advance scientific research and discovery.
  • Codeplay builds oneAPI support: Codeplay Software announced the first release of its Data Parallel C++ (DPC++) compiler for Nvidia GPUs.
  • University of Illinois (UI): The Beckman Institute for Advanced Science and Technology at UI today announced a new oneAPI center of excellence (CoE). They are bringing the oneAPI programming model and the life sciences application NAMD to additional heterogeneous computing environments. NAMD, which simulates large biomolecular systems, is helping to tackle real-world challenges such as COVID-19.
  • Heidelberg University Computing Center (URZ): URZ announced it is establishing a oneAPI CoE focused on bringing oneAPI support to AMD GPUs.
  • Swedish e-Science Research Center (SeRC): Hosted at Stockholm University and the KTH Royal Institute of Technology, the SeRC’s oneAPI academic CoE is using oneAPI’s unified and heterogeneous programming model to accelerate research conducted with GROMACS, a widely used free and open-source application designed for molecular dynamics simulation.
https://www.hpcwire.com/2020/11/11/...old-and-provides-more-details-on-gpu-roadmap/
 
A 4x100 Gbit network card using an Intel Agilex FPGA, on which it will be possible to use oneAPI:

Bittware IA-840F:
[Image: BittWare IA-840F accelerator card]


FPGA:
[Image: Intel Agilex FPGA]


OneAPI:

“Intel Agilex FPGAs and cross platform tools including the oneAPI toolkit are leading the way to enable easier access to these newest FPGAs and their tremendous capabilities - including eASIC integration, HBM integration, BFLOAT16, optimized tensor compute blocks, Compute Express Link (CXL), and 112 Gbps transceiver data rates for high speed 1GHz compute and 400Gbps+ connectivity solutions”, said Patrick Dorsey, VP Product, Programmable Solutions Group at Intel. “The highly customizable and heterogeneous Agilex platform and oneAPI tools enable products like the new IA-840F accelerator card from BittWare to drive innovation from the edge to the cloud.”

That last bit is also an intriguing element of the new Bittware product: support for the oneAPI unified programming environment. OneAPI is Intel’s grand vision for a singular software platform for use across CPU, GPU, FPGA, and AI hardware – while the upper layer is built on a SYCL variant of Data Parallel C++ (DPC++), the libraries underneath will be optimized for the hardware, with a hardware abstraction layer shielding the programmer. The goals are admirable, and so far we’ve heard about oneAPI in the context of GPUs as it relates to Intel’s Xe graphics with our recent interview of Intel’s Lisa Pearce, but we’ve not heard much on the FPGA side. With Bittware making this announcement, it would appear that the FPGA angle is certainly well on its way as well. Alongside oneAPI support, the IA-840F comes with an HDL developer toolkit, including PCIe drivers, application example designs, and a board management controller. Based on the image of the IA-840F, it looks like the unit has three DDR memory slots, likely for different accelerators on the FPGA.

https://www.anandtech.com/show/16251/bittware-4x100g-fpga-card-uses-intel-10nm-agilex-and-oneapi
 
NERSC, ALCF, Codeplay Partner on SYCL for Next-generation Supercomputers
National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory (Berkeley Lab), in collaboration with the Argonne Leadership Computing Facility (ALCF) at Argonne National Laboratory, has signed a contract with Codeplay Software to enhance the LLVM SYCL GPU compiler capabilities for NVIDIA A100 GPUs.
Codeplay is a software company based in the U.K. that has a long history of developing compilers and tools for different hardware architectures. The company has been the lead implementor of SYCL compilers and a main contributor to the existing open source support for NVIDIA V100 GPUs through the DPC++ project.
“The ALCF is excited to see that Perlmutter will be supporting the SYCL programming model via DPC++,” said Kalyan Kumaran, director of technology at the ALCF. “As a key programming model for Argonne’s upcoming exascale system, SYCL and DPC++ will benefit the broader DOE community by providing portability of accelerator programming models across DOE computing facilities.”
“We are delighted to see the SYCL programming standard being embraced by the U.S. national labs and providing scientists developing accelerated C++ with a standardized software platform,” said Andrew Richards, founder and CEO of Codeplay Software. “Codeplay is a big believer in open standards and has worked extensively within Khronos to define and release SYCL 2020, which includes many new features such as memory handling for higher overall system performance.”
https://www.hpcwire.com/off-the-wir...r-on-sycl-for-next-generation-supercomputers/

What is this news doing here?

As noted in post #8, Nvidia has just lost one of its advantages: its CUDA development and programming tools.

Besides that, and this is why the news is here: Argonne is the lab receiving Aurora (Intel Sapphire Rapids + Ponte Vecchio), but as Argonne's director of technology said, this model will benefit other US National Labs and Department of Energy systems, namely Oak Ridge (Frontier) and Lawrence Livermore (El Capitan), both AMD Epyc + Instinct systems.

Basically, going back to post #8: this news corresponds to the "blue path" in the first image of that post, while the other two systems will, for now, be following the "red path" in that image.
The ideal goal would be the "purple path".
 
SYCL 2020 Launches with New Name, New Features, and High Ambition
Here’s snapshot of SYCL’s major new features:

  • Unified Shared Memory (USM) enables code with pointers to work naturally without buffers or accessors.
  • Parallel reductions add a built-in reduction operation to avoid boilerplate code and enable maximum performance on hardware with built-in reduction operation acceleration.
  • Work group and subgroup algorithms enable efficient parallel operations between work items.
  • Class template argument deduction (CTAD) and template deduction guides simplify class template instantiation.
  • Simplified use of Accessors with a built-in reduction operation, reducing boilerplate code and simplifying use of C++ software design patterns.
  • Expanded interoperability for efficient acceleration by diverse backend acceleration APIs.
  • Atomic operations are now closer to standard C++ atomics to enhance parallel programming freedom.
SYCL1-768x430.png


The most curious thing is that yet another "parallel" project has been added, this one targeting Intel CPUs and NEC "Vector Engines".
Where there used to be 4 SYCL implementations, there are now 5 :coolshad:

Intel is hardly alone. In fact, Wong argues the number of SYCL development efforts is one of the clearest measures of SYCL's growing traction. Xilinx has an effort, as does AMD (with the University of Heidelberg), and it's natural to wonder if those efforts could be merged if/when AMD's acquisition of Xilinx is completed. Wong doesn't think so. There's also neoSYCL, which is quite new and targets NEC and Intel processors. Wong packed a chart showing SYCL implementations. Take a moment to look at SYCL's growing family tree and then read Wong's comments.

SYCL2-768x429.png


“The SYCL implementations in development are now ballooning. Actually, we just put one in just in the last couple of weeks. Traditionally, there has always been Codeplay’s ComputeCpp. That’s the company I work for, which generates codes for any number of CPUs. GPUs have gone through OpenCL and SPIR-V that can work for Intel, AMD, Arm, Mali, IMG PowerVR, and the Renesas R-Car [devices]. But we also have one that goes through PTX to generate code for Nvidia’s GPUs,” said Wong.
“Then the big player that came in was Intel with their oneAPI. Inside oneAPI is a compiler called data parallel C++ (DPC++). They are doing that so they can generate code for Intel CPUs, GPUs, FPGAs, and I think in future for AI processors. They are using a Clang (compiler) implementation [and] so is Codeplay.
https://www.hpcwire.com/2021/02/09/sycl-2020-launches-new-name-new-features/

This neoSYCL is curious, though, because its target is Intel CPUs + NEC VEs, when most of the systems using the NEC VEs are... AMD EPYC :n1qshok:
 
CERN Uses DLBoost, oneAPI To Juice Inference Without Accuracy Loss

The researchers at CERN used the Intel Low Precision Optimization Tool, which is a new open-source Python library that supports automatic accuracy-driven tuning strategies. The tool helps to speed up deployment of low-precision inferencing solutions on popular DL frameworks including TensorFlow, PyTorch, MXNet, and so forth. In addition to the GitHub site, it is included in Intel AI Analytics Toolkit along with Intel optimized versions of TensorFlow, PyTorch, and pre-trained models to accelerate deep learning workflows.
Quantization, Intel DL Boost, And oneAPI
INT8 has broad support thanks to Intel Xeon SP processors, and it is also supported in Intel Xe GPUs. FPGAs can certainly support INT8 and other reduced precision formats. Quantization methods offer effective ways to use powerful hardware support in many forms.

The secret sauce underlying this work and making it even better: oneAPI makes Intel DL Boost and other acceleration easily available without locking applications in to a single vendor or device.

It is worth mentioning how oneAPI adds value to this type of work. Key parts of the tools used, including the acceleration tucked inside TensorFlow and Python, utilize libraries with oneAPI support. That means they are openly ready for heterogeneous systems instead of being specific to only one vendor or one product (e.g. GPU).

oneAPI is a cross-industry, open, standards-based unified programming model that delivers a common developer experience across accelerator architectures. Intel helped create oneAPI, and supports it with a range of open source compilers, libraries, and other tools. By programming to use INT8 via oneAPI, the kind of work done at CERN described in this article could be carried out using Intel Xe GPUs, FPGAs, or any other device supporting INT8 or other numerical formats for which they may quantize.
https://www.nextplatform.com/2021/0...api-to-juice-inference-without-accuracy-loss/



OSPRay Studio 0.6 Released For Intel's Open-Source Interactive Ray-Tracing Visualizer
OSPray Studio builds atop the existing OSPray ray-tracing engine and inter-connected oneAPI Rendering Toolkit components to offer an open-source scene graph application for interactive visualizations and ray-tracing based rendering.
OSPray Studio makes it easy to enjoy ray-traced, interactive, real-time rendering, whether for visualizations or for pursuing photorealistic rendering. OSPray Studio and the rest of the oneAPI Rendering Toolkit have been excellent open-source offerings.
https://www.phoronix.com/scan.php?page=news_item&px=OSPray-Studio-0.6
 
More and more programs supporting these capabilities are starting to appear

Intel's oneDNN 2.1 Released With NVIDIA GPU Support, Initial Alder Lake Optimizations
With oneAPI/oneDNN being open-source and being used on more than just Intel hardware, there are even more AArch64 (ARM 64-bit) enhancements in the 2.1 release. The oneDNN 2.1 release has various performance improvements with ArmCL. There is also now JIT support for AArch64 along with implementations for various primitives.

A new preview-level feature with oneDNN 2.1 is support for NVIDIA GPUs. The oneDNN library now supports NVIDIA GPU acceleration when using the proprietary driver stack with the cuDNN and cuBLAS libraries. Targeting the NVIDIA GPUs relies on using Intel's DPC++ Compiler.
https://www.phoronix.com/scan.php?page=news_item&px=Intel-oneDNN-2.1-Released
 

Preparing for Aurora: Porting a Computational Chemistry Code to Exascale Architectures

As the original NWChem code is some quarter-century old, the NWChemEx developers decided to rewrite the application from the ground up, with the ultimate goal of providing the framework for a next-generation molecular modeling package. The new package is capable of enabling chemistry research on a variety of leading-edge high-performance computing (HPC) systems. Prominent among these systems will be the forthcoming Aurora supercomputer, an exascale Intel-HPE machine to be housed at the Argonne Leadership Computing Facility (ALCF), a U.S. Department of Energy (DOE) Office of Science User Facility located at Argonne National Laboratory.
To this end, NWChemEx incorporates numerous modern software-engineering techniques for C++, while GPU compatibility and support have been planned since the project's initial stages, thereby orienting the code to the demands of exascale as a matter of constitution.
At the core of their work is NVIDIA-based development using the CUDA model.
Today the NWChemEx project encompasses programming models such as CUDA, HIP, and DPC++ in order to target various hardware accelerators. Moreover, the portability of DPC++ potentially makes it a portable programming model for future architectures. With DPC++, explicit control of memory management and data transfers can be scheduled between host and device. The NWChemEx project uses the newly introduced Unified Shared Memory (USM) feature from the SYCL 2020 standards.
For Intel hardware, the developers employ Intel’s DPC++ Compatibility Tool to port any existing optimized CUDA code and translate it to DPC++. The Compatibility Tool is sophisticated enough that it reliably determines apposite syntax in translating abstractions from CUDA to SYCL, greatly reducing the developers’ burden.
This two-step process—automated translation followed by manual finetuning—generates, from old CUDA code, performant DPC++ code that specifically targets Intel architectures.
https://www.hpcwire.com/off-the-wir...nal-chemistry-code-to-exascale-architectures/
 
And here it is

Intel Announces SYCLomatic For Open-Source Conversion Of CUDA Code To C++ SYCL
Intel today has lifted the embargo on SYCLomatic, their new open-source tool to help migrate code-bases targeting NVIDIA's CUDA so they can be re-purposed to target C++ and SYCL -- thereby being able to leverage Intel's graphics processors and jiving with their oneAPI goals.

https://www.phoronix.com/scan.php?page=news_item&px=Intel-SYCLomatic
 

To Cure Iron Anemia With SYCL, Intel Buys Codeplay

Intel doesn’t want to just create a rival to the CUDA programming model and library stack so it can better compete against Nvidia in the GPU compute market.
To do that, Intel is going to need some help, and to that end the company is snapping up the 80-strong team at Codeplay, one of the stewards of the SYCL programming model that was created in 2014, that is at the heart of Intel’s oneAPI cross-platform, cross-device programming effort, and that is a derivative of (or better still an integral of) the OpenCL programming framework that was created by Apple in 2009. Both SYCL and OpenCL are steered by Khronos Group.
Codeplay is one of the organizations that has been proving that you can balance the three Ps of programming for high performance systems – that would be productivity, performance, and portability – in such a way that you can achieve portability and still get performance and have reasonable productivity. To prove this point, the team at Codeplay created oneAPI SYCL compilers for AMD and Nvidia GPUs for three major US Department of Energy facilities – namely, Lawrence Berkeley National Laboratory, Argonne National Laboratory, and Oak Ridge National Laboratory. Codeplay has also written its own SYCL DNN neural network and SYCL BLAS linear algebra acceleration libraries that can run on AMD, Intel, and Nvidia GPUs, and has been involved in making the cuDNN and cuBLAS libraries that Nvidia created for the very heart of CUDA run within the oneAPI environment.
https://www.nextplatform.com/2022/06/01/to-cure-iron-anemia-with-sycl-intel-buys-codeplay/

The value of the deal was apparently not disclosed.
 