Processor: Hardware curiosities

Dark Kaeser · 26 August 2021

The Huge Payoff Of Extreme Co-Design In Molecular Dynamics

When money is really no object, and the budget negotiations involve taking a small slice of your personal net worth of $7.5 billion out of one pocket and putting it into another, and you have the technical chops to understand the complexities of molecular dynamics, and have a personal mission to cure disease, then you can build any damned supercomputer you want.

So that is what David Shaw, former computer science professor at Columbia University and quant for the hedge fund that bears his name, has done once again with the Anton 3 system.

The Anton 3 processors, their interconnect, and the complete Anton 3 system, which were delivered last September to DE Shaw Research, the scientific and computer research arm of the Shaw empire, were on display at the Hot Chips 33 conference this week.

The current Anton team is just shy of 40 people, who design, build, test, and program this “fire-breathing monster for molecular dynamics simulations,” as Adam Butts, the physical design lead at DE Shaw Research, called the Anton 3 system. When you see the benchmark test results that pit the Anton 3, etched in seven-nanometer processes at Taiwan Semiconductor Manufacturing Corp, against its predecessors and the best GPU-accelerated supercomputers in the world, you will see that Butts is not exaggerating.

With the Anton 3 chip, the compute elements were all concentrated down to a single core tile, each with two modified PPIMs and two GCs plus a new chip that calculated the force of chemical bonds. The GCs had 128KB of L2 cache each instead of a giant shared L2 cache, and this L2 cache segment fed directly into a PPIM block. A pair of GC/PPIM blocks were put on a tile (a logical one, not a physical one), and the tiles are connected in a mesh network using an on-chip router. Like this:

Here is the die shot of the Anton 3 chip, with zooms of the edge tile and the core tile:

There are two columns of twelve edge tiles, left and right of the chip, and there are 24 columns of core tiles that are twelve tiles high for a total of 288 core tiles. That’s 576 GCs and 576 PPIMs in total for the Anton 3, versus 48 GCs and 76 PPIMs for the Anton 2. For yield purposes, only 22 of the 24 columns of compute tiles are activated on the first stepping of the Anton chips, so there is a latent 8.3 per cent performance boost in the Anton 3 over current performance levels if yields on the seven-nanometer processes can be improved so the whole chip works.
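A quick check of the tile arithmetic in the paragraph above, as a rough sketch. It uses only the counts quoted there; the assumption that performance scales linearly with the number of active columns is not from the article. Under that assumption the quoted 8.3 per cent matches the share of columns that is switched off (2 of 24), while the gain measured against the 22 active columns would be closer to 9.1 per cent.

```python
# Tile arithmetic for the Anton 3 die, using only the counts quoted above.
CORE_COLUMNS = 24       # columns of core tiles on the die
TILES_PER_COLUMN = 12   # core tiles per column
GCS_PER_TILE = 2        # geometry cores per core tile
PPIMS_PER_TILE = 2      # PPIMs per core tile
ACTIVE_COLUMNS = 22     # columns enabled on the first stepping

core_tiles = CORE_COLUMNS * TILES_PER_COLUMN
total_gcs = core_tiles * GCS_PER_TILE
total_ppims = core_tiles * PPIMS_PER_TILE
active_tiles = ACTIVE_COLUMNS * TILES_PER_COLUMN

# Share of the compute columns that is currently switched off.
disabled_share = (CORE_COLUMNS - ACTIVE_COLUMNS) / CORE_COLUMNS

# Assumption (not from the article): throughput scales linearly with
# the number of active columns, so enabling all 24 gains 24/22 - 1.
gain_over_current = CORE_COLUMNS / ACTIVE_COLUMNS - 1

print(f"core tiles: {core_tiles}, GCs: {total_gcs}, PPIMs: {total_ppims}")
print(f"active core tiles on first stepping: {active_tiles}")
print(f"disabled share of columns: {disabled_share:.1%}")
print(f"gain relative to 22 active columns: {gain_over_current:.1%}")
```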

Here is what the Anton 3 node looks like:

The nodes are ganged up in a chassis with a backplane and interlinked in a 3D torus network using electrical signaling internal to the chassis.

And here is what that 3D torus interconnect looks like all cabled up:

The whole shebang is water cooled and can handle 500 watts of power per node using 65 degree Celsius water, weighing in at 100 kilowatts per rack and 400 kilowatts across the entire system. The Anton systems are not designed to scale beyond 512 nodes — at least not now.
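The power figures above can be cross-checked with a few lines of arithmetic. This is only a sketch: the article does not say how nodes are distributed across racks, so the even split of a maximum-size 512-node machine is an assumption, and the 500 watts is read here as a per-node ceiling rather than a measured draw.

```python
# Power budget check using the figures quoted above.
NODE_POWER_W = 500      # maximum power per node
RACK_POWER_KW = 100     # quoted per-rack figure
SYSTEM_POWER_KW = 400   # quoted whole-system figure
MAX_NODES = 512         # stated scaling limit

racks = SYSTEM_POWER_KW // RACK_POWER_KW

# Assumption (not from the article): a maximum-size machine is spread
# evenly across the racks.
nodes_per_rack = MAX_NODES // racks

node_power_per_rack_kw = nodes_per_rack * NODE_POWER_W / 1000
node_power_total_kw = MAX_NODES * NODE_POWER_W / 1000

print(f"racks implied by the quoted totals: {racks}")
print(f"nodes per rack (assumed even split): {nodes_per_rack}")
print(f"node power per rack: {node_power_per_rack_kw} kW of {RACK_POWER_KW} kW")
print(f"node power in total: {node_power_total_kw} kW of {SYSTEM_POWER_KW} kW")
```

On these assumptions the nodes themselves account for well under the quoted rack and system totals; what the remainder covers is not stated in the article.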

Add it all up, and Anton is still the molecular dynamics machine to beat. Check this chart out:

“On a chip-for-chip basis running the Desmond code, the Anton 3 chip is around 20X faster than the Nvidia A100,” explained Butts. “That is not to pick on Nvidia. The A100 is a tremendously capable compute engine. Rather, this illustrates the tremendous benefit we achieve from specialization. As a point of historical interest, a single chip Anton 3 supports almost the same maximum capacity as the Anton 1 supercomputer with 512 nodes. Moreover, it is actually faster over most of that range while consuming just one-fiftieth of the power.”

The molecular simulation that Butts showed of the interaction of molecules with the now-famous spike on the SARS-CoV-2 virus fits on a single Anton 3 node and has an order of magnitude better performance than an A100 GPU running the same code. This Anton 3 machine can simulate milliseconds of time over 100,000 atoms with just 64 nodes over the course of a work week.

https://www.nextplatform.com/2021/08/25/the-huge-payoff-of-extreme-co-design-in-molecular-dynamics/

When you can order a supercomputer made "to measure" to carry out one specific job

Nemesis11 · 1 September 2021

The machines that make chips are hardware too.

The $150 Million Machine Keeping Moore’s Law Alive
ASML’s next-generation extreme ultraviolet lithography machines achieve previously unattainable levels of precision, which means chips can keep shrinking for years to come.

The current generation of EUV machines are already, to put it bluntly, kind of bonkers. Each one is roughly the size of a bus and costs $150 million. It contains 100,000 parts and 2 kilometers of cabling. Shipping the components requires 40 freight containers, three cargo planes, and 20 trucks.

The finished component will be shipped to Veldhoven in the Netherlands by the end of 2021, and then added to the first prototype next-generation EUV machine by early 2022. The first chips made using the new systems may be minted by Intel, which has said it will get the first of them, expected by 2023. With smaller features than ever, and tens of billions of components each, the chips that the machine produces in coming years should be the fastest and most efficient in history.

But it has taken decades to iron out the engineering challenges. Generating EUV light is itself a big problem. ASML’s method involves directing high-power lasers at droplets of tin 50,000 times per second to generate high-intensity light. Lenses absorb EUV frequencies, so the system uses incredibly precise mirrors coated with special materials instead. Inside ASML’s machine, EUV light bounces off several mirrors before passing through the reticle, which moves with nanoscale precision to align the layers on the silicon.

“To tell you the truth, nobody actually wants to use EUV,” says David Kanter, a chip analyst with Real World Technologies. “It's a mere 20 years late and 10X over budget. But if you want to build very dense structures, it’s the only tool you’ve got.”

ASML’s current generation of EUV machines can create chips with a resolution of 13 nanometers. The next generation will use High-NA to craft features 8 nanometers in size.
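The jump from 13 to 8 nanometers is roughly what the Rayleigh criterion, resolution = k1 × wavelength / NA, predicts. A minimal sketch follows, with assumptions stated: the article gives neither the wavelength nor the numerical apertures, so the 13.5 nm EUV wavelength and the 0.33 and 0.55 NA values are supplied here from ASML's publicly stated figures, and the process factor k1 is simply backed out of the quoted 13 nm rather than taken from any source.

```python
def rayleigh_resolution_nm(k1: float, wavelength_nm: float, numerical_aperture: float) -> float:
    """Smallest printable feature according to the Rayleigh criterion."""
    return k1 * wavelength_nm / numerical_aperture


EUV_WAVELENGTH_NM = 13.5   # assumption: standard EUV wavelength, not in the article
NA_CURRENT = 0.33          # assumption: current-generation EUV optics
NA_HIGH = 0.55             # assumption: High-NA optics
QUOTED_CURRENT_NM = 13.0   # resolution quoted in the article

# Back out the process factor k1 that reproduces the quoted 13 nm,
# then hold it constant for the High-NA case.
k1 = QUOTED_CURRENT_NM * NA_CURRENT / EUV_WAVELENGTH_NM
high_na_nm = rayleigh_resolution_nm(k1, EUV_WAVELENGTH_NM, NA_HIGH)

print(f"implied k1: {k1:.3f}")
print(f"High-NA resolution at the same k1: {high_na_nm:.1f} nm")
```

With k1 held constant the result depends only on the ratio of the two apertures, which is why it lands close to the 8 nanometers quoted above.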

Lundstrom remembers visiting his first microchip conference in 1975. “There was this fellow named Gordon Moore giving a talk,” he recalls. “He was well known within the technical community, but nobody else knew him.”

“And I remember the talk that he gave,” Lundstrom adds. “He said, ‘We will soon be able to place 10,000 transistors on a chip.’ And he added, 'What could anyone possibly do with 10,000 transistors on a chip?’”

https://www.wired.com/story/asml-extreme-ultraviolet-lithography-chips-moores-law/

Nemesis11 · 4 September 2021

Dark Kaeser said:
The new member of IBM's Z line, successor to the Z15, was presented at Hot Chips 33; there is only an article in German, and nothing on the usual sites

Hot Chips 33: IBM Z Telum - 5+ GHz und viel (virtueller) Cache

Ian from AnandTech wrote an article and made a video where he says this could be the future of caches and that it could be very good for... gaming, given the amount of cache and especially the latency.

https://www.anandtech.com/show/16924/did-ibm-just-preview-the-future-of-caches

It would be interesting if a mainframe processor ended up "revolutionising" the gaming market.

That cache system makes a lot of sense on paper. The question now is whether it works better than "traditional" caches in practice.
If the other CPU companies weren't already testing this idea, they certainly will now.

Nemesis11 · 17 September 2021

Motherboard of a ZX Spectrum:

The QL was extremely good-looking:

RIP Sir Sinclair.

Camões · 17 September 2021

The ZX Spectrum was my second computer; it was a revolution after my ZX81

Dark Kaeser · 18 September 2021

In case any friend needs it, here it is

Cerebras Brings Its Wafer-Scale Engine AI System to the Cloud

Today, Cerebras and Cirrascale Cloud Services are launching the Cerebras Cloud @ Cirrascale platform, providing access to Cerebras’ CS-2 Wafer-Scale Engine (WSE) system through Cirrascale’s cloud service.

The physical CS-2 machine – sporting 850,000 AI optimized compute cores and weighing in at approximately 500 lbs – is installed in the Cirrascale datacenter in Santa Clara, Calif., but the service will be available around the world, opening up access to the CS-2 to anyone with an internet connection and $60,000 a week to spend training very large AI models.

https://www.hpcwire.com/2021/09/16/...gine-ai-system-is-now-available-in-the-cloud/

muddymind · 26 September 2021

Even at 60k/week I'm not sure they will see ROI any time soon.

Dark Kaeser · 28 September 2021

Honestly, they don't seem very worried to me. They could have raised a lot more $ in the funding rounds if they had wanted to, not least because the CEO is Andrew Feldman, who has already founded and sold several companies.

This deal is probably Cerebras "absorbing" the costs and making the platform available for "testing", without anyone having to deploy a system whose cost may be prohibitive.

They probably already have at least 2 systems in the American "National Labs", and I think at GSK and AstraZeneca as well; they will never have many customers.

Meanwhile, NEC and its Vector Engines have one more customer, Aramco

NEC Deutschland GmbH Announces Collaboration with Aramco Europe to Advance HPC Applications

NEC is contributing to the collaboration by deploying additional computational capacity in the form of an SX-Aurora TSUBASA system, which performs environment evaluations. At the heart of SX-Aurora TSUBASA lies NEC’s own vector technology, the NEC Vector Engine. The latest generation of PCIe-based Vector Engine card models provides 8 or 10 vector cores and 48GB of HBM2 memory at an extremely high memory bandwidth of 1.53 Terabyte/s.

https://www.hpcwire.com/off-the-wir...h-announces-collaboration-with-aramco-europe/

and the JAMSTEC ES-4 (Japan Agency for Marine-Earth Science and Technology Center - Earth Simulator 4)

JAMSTEC Goes Hybrid On Many Vectors With Earth Simulator 4 Supercomputer

operational in March 2021, includes hybrid compute capabilities that mix NEC “SX-Aurora TSUBASA” vector co-processors and Nvidia “Ampere” A100 GPU accelerators inside of host server nodes that employ AMD Epyc X86 server processors, all lashed together by high-speed Nvidia HDR 200Gb/sec InfiniBand networking.

With the ES4 system, JAMSTEC is going hybrid on a couple of different vectors (pun intended). Here are the complete feeds and speeds of the ES4 machine:

Here is a more simplified view of the ES4’s architecture:

https://www.nextplatform.com/2021/0...vectors-with-earth-simulator-4-supercomputer/

Nemesis11 · 2 October 2021

A bit of both extremes.

A handheld console:

https://imgur.com/98gXyz3

https://www.kickstarter.com/projects/kenburns/thumby-the-tiny-playable-keychain

And a server with 32 sockets:

https://www.nextplatform.com/2021/09/28/hpe-superdome-flex-the-other-big-iron-in-the-datacenter/

Dark Kaeser · 17 October 2021

Meanwhile in Russia

TSMC delivers first batch of Baikal BE-M1000 CPUs based on ARM Cortex-A57 cores

Baikal Electronics confirms they received the first batch of 5000 BE-M1000 CPUs from their foundry, TSMC. These are second-generation processors based on ARM architecture.

The BE-M1000 is manufactured using 28nm process technology

Baikal Electronics will distribute BE-M1000 CPUs mainly to state-owned companies, which come with government-approved software such as Linux Astra OS, Red OS, My Office, or VPNet SafeBoot. Additionally, BE also plans to distribute the CPUs to Russian system integrators such as iRU. The company will sell All-in-One systems and notebooks based on Baikal processors.

https://videocardz.com/newz/tsmc-de...l-be-m1000-cpus-based-on-arm-cortex-a57-cores

muddymind · 17 October 2021

Russia is shooting in every direction.

Elbrus-16S coming out, RISC SPARC with VLIW instructions
RISC-V solutions
And now these ARM chips

They are going after everything, and the software ecosystem is going to be hell with so many architectures at once.

Dark Kaeser · 17 October 2021

But since everything is done "under contract", it ends up being made to measure; this one is probably just a stopgap until RISC-V arrives.

muddymind · 17 October 2021

It's understandable that RISC-V is the end game for them, since they would not be hostage to possible sanctions, as happened with Huawei being barred from using ARM (although they remain dependent on TSMC...)

Dark Kaeser · 19 October 2021

Not exactly a curiosity; I mean, looking at the hardware, it does raise the question

Eni told us that out of the 1,522 nodes, there were 1,375 nodes equipped with two Epyc 7402 CPUs, which have 24 cores running at 2.8 GHz plus two Nvidia V100 accelerators; 125 nodes with a pair of Epyc 7402 CPUs and two Nvidia A100s; 22 nodes with two Epyc 7402 CPUs and two AMD MI100s. Additionally, there are 20 login nodes without GPUs.

https://www.nextplatform.com/2021/10/12/eni-chooses-utility-pricing-for-new-hpc4-supercomputer/
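To put numbers on that mix, here is a small tally of the node counts from the quote, as a sketch. It assumes every compute node has exactly the two CPUs and two accelerators described, and it leaves the 20 GPU-less login nodes out of the totals; the variable names are made up for illustration.

```python
# Tally of the compute partitions described in the quote above.
CORES_PER_CPU = 24
CPUS_PER_NODE = 2
GPUS_PER_NODE = 2
QUOTED_COMPUTE_NODES = 1522

# Nodes per accelerator type, as quoted.
partitions = {
    "Nvidia V100": 1375,
    "Nvidia A100": 125,
    "AMD MI100": 22,
}

compute_nodes = sum(partitions.values())
cpu_cores = compute_nodes * CPUS_PER_NODE * CORES_PER_CPU
gpus = {name: nodes * GPUS_PER_NODE for name, nodes in partitions.items()}
total_gpus = sum(gpus.values())

print(f"compute nodes: {compute_nodes} (quoted: {QUOTED_COMPUTE_NODES})")
print(f"CPU cores across compute nodes: {cpu_cores}")
for name, count in gpus.items():
    print(f"{name}: {count} accelerators ({count / total_gpus:.1%} of all GPUs)")
```

The three partitions do add up to the quoted 1,522 nodes, and the V100 nodes make up the overwhelming majority of the machine.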

Nemesis11 · 1 November 2021

Someone decided to put a Pentium on a 386 board.

I was wondering what would be the ultimate upgrade for my 386 motherboard. It has a 386 CPU soldered-in, an unpopulated 386 PGA socket and a socket for either 387 FPU or 486 PGA or (might take a Weitek as well – not quite sure) and even might have a soldered-in 486SX PQFP. Plenty of options…

But how about hacking a Pentium in?

However, looking at the Pentium Overdrive pinout the extra row of pins doesn’t seem to be at all essential. There is a number of extra power points and some signalling pins to support L1 cache coherency when using write-back. Nothing too much to worry about. Conveniently, the PODP doesn’t use 486 WB cache controls like DX4 or 5×86 does, so the WB/WT pin remains floating and no need to configure the board for P24T like you do it on normal 486 motherboards.

And believe it or not, it runs fine. The MR BIOS on my board is quite confused and reports the CPU to be 586SX, which I like.

Obviously, the performance is slower than newer 486 boards, the L1 runs in write-through mode, but it runs just fine.

I thought there must be a reason for the PODP having the extra power pins. Just to play it safely I configured maximum power on my ATX2AT device and set the frequency to 25MHz. However, it turns out that idling only @ 3.6A – not too far from ordinary 486DX.

https://dependency-injection.com/pentium-on-a-386-motherboard/

Very Nice.

JPgod · 3 November 2021

I don't know whether to put this in this thread or in the thread of things that leave us dumbfounded

A 2008 DELL vs a 2021 one, and it got quite a bit "worse"...

Obligatory meme:

MinisterOfSound · 4 November 2021

Hi all,

If you're going to bring up Dell, then I believe that guy has also been looking into Dell's shady practices, to put it mildly.

In my opinion Dell is in bad shape now, especially in Europe and Portugal, whether in support or sales.

On top of their line-up being almost entirely Intel-based, getting a server with Epyc is almost a miracle.

Regards.

Nemesis11 · 4 November 2021

Getting an Epyc server at a decent price is almost a miracle everywhere, and as for consumer-market desktops, it has been a race to the lowest price for many years. It really is scraping the bottom of the barrel.
But much of the blame lies with the market. It's the market that pushes for the lowest possible prices. Besides, does it make sense to build a "battle tank" when the computer's useful life is only 3 or 4 years?

Anyway... a computer with AMD chips, in the latest Teslas:

Specs:

Tesla Car Computer full specs:

The main processor is now an AMD Ryzen YE180FC3T4MFG (4 core 45-watt Ryzen Embedded) 512 KB L2 cache per core, 4 MB L3 cache.
The GPU is also an AMD Radeon marked 215-130000026, the closest guess is similar to a Radeon Pro W6600.
The wifi/BT Module is an LG Innotek ATC5CPC001
Cell Modem is a Quectel AG525R-GL
Gateway is still the venerable SPC5748GSMMJ6
DSP 1 is an ADSP-SC587W SHARC+ Dual Core DSP with ARM Cortex-A5
DSP 2 is an AD21584 SHARC+ Dual Core DSP with ARM Cortex-A5
Ethernet Switch is a Realtek RTL9068ABD

https://videocardz.com/newz/tesla-c...n-ryzen-embedded-apu-and-discrete-navi-23-gpu

JPgod · 4 November 2021

It's weird that they use an old APU instead of a Zen 2/3...

Miguel_Pereira · 4 November 2021

JPgod said:
It's weird that they use an old APU instead of a Zen 2/3...

It probably doesn't need anything very special CPU-wise, and this way it's cheaper.
