Processors: hardware curiosities

The SoC on the 2nd board seems to be the most interesting one. It carries the marking "VIA John":
cdewHqe.png


This "VIA John" seems to have been the 3rd generation of the "VIA CoreFusion" line: highly integrated, low-power x86 SoCs. However, it appears it was cancelled:
Little is known about the Corefusion John, other than a VIA roadmap PDF from 2005 found here. It was apparently planned as a successor to the Luke, and supposed planned features were to include DDR2 support, hardware WMV9 decoding and HD audio support. It is unknown if this part was ever released, but it likely would have used some variant of a VIA C7 core (since VIA lost licenses required to produce the C3 series.)
https://en.wikipedia.org/wiki/VIA_CoreFusion#John

Photo of the "VIA Mark" (1st generation):
3sq402y.png


OqG5r9Z.png


Photo of the "VIA Luke" (2nd generation):
0UVzZMf.png


cqwlq2q.png


TMRf391.png
 


ACES ‘Composable’ Supercomputer Gets Ready for Phase One Use

ACES will have a variety of nodes with a range of processors – CPUs, GPUs, FPGAs, specialized AI processors – that can be dynamically mixed and matched as needed for particular workflows.
ACES_NSF-600x305.png

ACES_Accelerators-600x286.png

ACES_SYS_Infrastructure-600x273.png

Next generation infrastructure will be dynamically configurable, he said. “In ACES, each server can dynamically pool CPUs or GPUs or storage from the resource pool using the composable fabric and software. Each server is dynamically configurable based on the workload of your research team. We plan to deploy the Dell Omnia software which will support both Slurm and Kubernetes schedulers and they will be integrated with the Liqid software and fabric.”
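The pooling idea above can be illustrated with a purely hypothetical toy sketch in Python. This is not the Liqid or Dell Omnia API; all class and parameter names here are made up to show the concept of composing a "server" from a shared device pool per workload and returning the devices afterwards.

```python
# Toy model of composable infrastructure (hypothetical API, not Liqid/Omnia):
# devices live in a shared pool; a "server" is composed per workload and
# the devices return to the pool when the job finishes.
class ResourcePool:
    def __init__(self, devices):
        # devices: dict mapping device type -> count available in the pool
        self.free = dict(devices)

    def compose(self, **request):
        """Reserve devices for one job; fail if the pool cannot satisfy it."""
        for kind, n in request.items():
            if self.free.get(kind, 0) < n:
                raise RuntimeError(f"pool exhausted: {kind}")
        for kind, n in request.items():
            self.free[kind] -= n
        return dict(request)  # the composed "server"

    def release(self, server):
        """Return a composed server's devices to the pool."""
        for kind, n in server.items():
            self.free[kind] += n

pool = ResourcePool({"cpu": 64, "gpu": 8, "nvme_tb": 24})
job = pool.compose(cpu=16, gpu=4)   # GPU-heavy workflow gets 4 GPUs
pool.release(job)                   # devices go back for the next workload
```

The point of the sketch is only the lifecycle: reserve from a common pool, run, release, so the same physical devices can serve CPU-heavy and GPU-heavy jobs in turn.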
ACES_Composable_Platform-600x302.png

ACES_DynamicResearchArchiteccture-600x335.png

Besides highlighting the different accelerator types available, he noted, “If you need large memory you can dynamically compose up to three terabytes of the memory and then you can run your application that needs [such] a large memory.”
ACES_Workflow_CPU.GPU_-600x303.png

ACES_WORKFLOWS_ACCELERATORS-600x304.png

About the ACES software environment, Liu said, “ACES will host all major HPC AI/ML software and frameworks. We will support the most widely used and most recent application software, with support for JupyterHub. We plan to offer Intel’s oneAPI as the cross-architecture programming framework for CPUs, GPUs and FPGAs. A user can use the same code through oneAPI to run on CPUs, GPUs, and FPGAs.

“We will support Slurm and Kubernetes. We will use Anaconda and EasyBuild to build your software, and we will provide Singularity and Charliecloud for container applications. On the system side, we will install the XSEDE software stack and also support HTCondor.” (see slide below)
ACES-SOFTWARE-ENVIRONMENT-600x312.png

https://www.hpcwire.com/2022/04/04/aces-composable-supercomputer-gets-ready-for-phase-one-use/

Castle-nevermind.gif
 
One more "processor" developed to order for a use in the scientific field

Oil And Gas Industry To Get Its Own Stencil Tensor Accelerator


Franz-Josef Pfreundt, team lead for next-generation architectures for the Fraunhofer Institute for Industrial Mathematics, has seen how compute engines fail in the market, and recalls IBM’s “Cell” processor, which launched in 2007 and which was used by the institute. IBM abandoned the chip, which paired a brawny Power4 core and eight specialized vector engines, in 2010. (The Cell chips famously did most of the compute in the petaflops-busting “Roadrunner” supercomputer at Los Alamos National Laboratory.)

In the wake of Cell’s demise, Pfreundt partnered with Jens Kruger at Fraunhofer and Marty Deneroff at Berkeley National Laboratory to develop the GreenWave chip, a processor aimed at reverse time migration (RTM) workloads common in the oil and gas industry and also at accelerating climate simulations. The initial GreenWave chip was based on Tensilica cores, and the 28 nanometer design included 700 cores, 32 chips per node, an in-order core with scratchpad memory and solid performance. The Greenwave chip also came out in 2013 – too early for the industry and, therefore, with no money to produce it.
In 2016, Pfreundt and Kruger embarked on creating a new chip, teaming with ETH Zurich to design an accelerator that could meet the performance and efficiency demands for HPC and machine learning workloads.

The result is the STX compute engine – that STX is short for stencil and tensor accelerator, which is part of the growing trend toward domain-specific processing – that tries to balance the conflicting demands for high power efficiency, easy programmability, and low costs. The STX chip was designed as part of the larger European Processor Initiative (EPI) that is driving the push for European independence in HPC and, eventually, exascale computing by relying more on EU-developed technologies.

stx-stencil-processing-unit.jpg

The STX is designed to execute “mass kernels on volume points,” Pfreundt says. “Every time you have a constant access pattern, which is any type of stencil with iterative calculation, that will work. One example is wind energy. In the EPI, one side is this Arm processor which is being developed. Then there are the Spanish guys doing a RISC-V vector processor, and we have this stencil processing unit, which in the first place is a specialized accelerator, but it will be general programming. This is the important point: it’s developed from a scientist’s view, so that makes programming easier and keeps the general programmability.”
The design includes a stencil processing unit, or SPU, a small VLIW architecture with some key parts, including the address generation unit.
stx-spu-core.jpg

“When you imagine this large stencil, you have to do a lot of address calculations to get the points in memory where you have to get the data from,” Pfreundt says. “This is done in hardware. Since we have a constant access pattern within the loop, that means we can even put the looping in hardware. You run your loop index in hardware. This makes programming a lot easier. Sure, there are some ideas how to organize the data better. But this is the main thing, and we have this scratchpad memory on the SPU as well.”

Having the address generation hardware reduces the overhead of chips that do so in software, Pfreundt says. It’s needed because users have a constant access pattern that isn’t modified and is always the same, so it makes sense to do it in hardware, similar to how it’s done in FPGAs.
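The "constant access pattern" idea can be made concrete with an illustrative Python sketch (not STX code): in a stencil, the neighbor offsets never change from one iteration to the next, so both the loop index and the fetched addresses can be generated mechanically, which is exactly the work the STX moves from software into its address generation unit and hardware loops.

```python
# Illustrative 1-D 3-point stencil (not STX code). The offsets are the
# same for every iteration -- a constant access pattern -- so the
# addresses src[i + o] and the loop over i are what STX computes in
# hardware instead of in software.
OFFSETS = (-1, 0, 1)          # constant neighbor pattern of the stencil
WEIGHTS = (0.25, 0.5, 0.25)   # smoothing weights

def stencil_step(src):
    """One iteration of the stencil over the interior points."""
    dst = list(src)
    for i in range(1, len(src) - 1):      # loop index: hardware on STX
        dst[i] = sum(w * src[i + o]       # src[i + o]: AGU-style address
                     for o, w in zip(OFFSETS, WEIGHTS))
    return dst

field = [0.0, 0.0, 1.0, 0.0, 0.0]
print(stencil_step(field))   # smoothed: [0.0, 0.25, 0.5, 0.25, 0.0]
```

RTM and many climate kernels are iterated 2-D/3-D versions of exactly this loop shape, which is why a fixed-pattern address generator pays off for them.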

There also are two floating point units (FPUs) that can be 32-bit or 64-bit and run four operations per cycle, as well as the TCDM scratchpad memory. There are four SPUs – with a RISC-V management core – per cluster, several clusters per microtile, and 32 microtiles per chiplet, along with scratchpad memory and a low-latency interconnect.
stx-microtile.jpg


As for coding, developers can write formulas as they would in C, C++, or Fortran, and the STX will work with that. That feeds into the push to make programming easier by avoiding large porting and tuning efforts, he says.

“The compiler is really an essential piece and we spent a lot of time on the compiler technology,” Pfreundt says. “The compiler and the architecture are developed together, so that the compiler guys really have an influence on how the chip looks, and the other way around. Therefore, the code itself is simple enough that the compiler can do the job. It’s called a very long instruction word architecture, but actually the instructions are very short. There’s no vectorization that needs to be done at all, which makes life a lot simpler for a compiler. We have OpenMP support. It’s just one OpenMP call and the code will compile. If we talk about the host, the host is a quad-core RISC-V core.”
stx-llvm-compiler.jpg


The design of the compiler enabled the developers to reach 80 percent of the theoretical peak performance, he says. It’s based on the LLVM architecture.

None of this was done on final hardware; instead, a software simulator was used to run tests of the SPU on a range of workloads, from RTM and fast Fourier transforms (FFT) to convolutional neural networks and average pooling.

“What we see is that you now have many simulations which are combined with machine learning,” Pfreundt says. “They say in the fluid dynamics model, you have a machine learning model that describes the turbulence, and so you very often have this combination now of machine learning with simulation, and so we can do both in an optimal way.”
stx-performance.jpg

Right now, the hardware setup is four SPUs per cluster and 128 clusters per chiplet, with clock speeds of 1.1 GHz, a 12 nanometer FinFET design, and 16 GB of HBM2e memory, with PCI-Express 5.0 and CXL 2.0 support, consuming 35 watts. It is manufactured by Globalfoundries, so it can be built in Europe, and Pfreundt says the STX chip drives the cost down to about a fifth of what a GPU costs.
https://www.nextplatform.com/2022/0...ry-to-get-its-own-stencil-tensor-accelerator/
 
It isn't recent and it was already known, but it is out of the ordinary: the Intel VCA2.
4JDgVGf.jpg

These are 3 computers on a single PCI-Express card.
They are 3 quad-core Xeon E3-1585L v5 CPUs, each with 2 SODIMMs. They are connected through a PCI-Express switch on the card itself.
There is no storage. The host sees each computer on this card over a network (running on top of PCI-Express), sends an operating system image (CentOS) over that network, and the OS runs from RAM.
It is used for video encoding with Quick Sync via the iGPU. It supports 44 simultaneous 1080p H.264 streams or 14 simultaneous 4K H.264 streams.

Video from der8auer:
 
@godevskii Yes, you're right: it uses a microcontroller (the Raspberry Pi RP2040) and lacks the usual interfaces of a computer. Calling it an SBM would have been more correct than SBC. :)
As for not running an OS: it does run an OS, just not a general-purpose one.

Note that it is possible to build a general-purpose computer out of a microcontroller.
 
Is it inside a Lego piece? :)

Continuing this wave of very small devices, I remembered "smart rings". Basically, a ring with a microcontroller, battery, and other components integrated. :)
9csWv4E.png


TLEulCX.png


HXGPLKE.png


A video with the teardown :)

That Infineon CY8C6336BZI is a microcontroller with an Arm Cortex-M4 processor.
 
Very nice. :)

Continuing on the theme of size, with something more common: the motherboard of the 2022 Dell XPS 13 laptop:
dBEveQL.png


And a comparison of the space it occupies between the 2021 version, on the left, and the 2022 version, on the right (the laptop itself is already small, being a 13-inch model):
uxDubWT.png

https://www.tomshardware.com/news/dell-xps-13-price-specs-release-date-2-in-1

That motherboard carries the CPU, the integrated graphics, the RAM (LPDDR5, package-on-package), the SSD, the wireless, etc.
The downside is that it isn't upgradable, and neither are other components outside the motherboard, such as the battery.
 