Intel Sapphire Rapids Processor (2022 – SR-SP and SR-X)

It's not the first time Facebook has skipped using all of a platform's available memory channels. With Sapphire Rapids, too, they are unlikely to use the SKUs with the highest core counts.

Meanwhile, installation of the "Crossroads" supercomputer has already begun; the first cluster will have 2,600 servers with Sapphire Rapids.

“We’re excited to be entering this new phase of supercomputing at the Lab,” said Los Alamos’ HPC Platforms Program Director Jim Lujan. “Early benchmarks indicate a four-times increase in speed over Trinity. All of the new efficiencies that are part of Tycho, and ultimately Crossroads, come together to reduce that crucial time to insight. Improving efficiencies in many areas for modeling and simulation is what this project is all about.”
In coming months, the Crossroads team will work to stabilize Tycho, calibrating the system’s 2,600 Sapphire Rapids nodes for maximum efficiency. Software will be installed and functionality testing will take place — all in time to ensure Tycho’s early use in the classified environment by the end of the year and full production status in March 2023.
https://discover.lanl.gov/news/1020-supercomputer-tycho

About those "4 times faster" figures: "Trinity" is a 2015 supercomputer that uses 16-core Haswells and 68-core Xeon Phis. "Crossroads" is CPU-only and will use 56-core Sapphire Rapids parts, later moving to the version with 64 GB of HBM2e integrated on the socket.
 

Hands-on Benchmarking with Intel Sapphire Rapids Xeon Accelerators


We are not going to get to show you everything. Intel has specifically only allowed us to show some of the acceleration performance of the new chips. Since it is going to be a few months until these officially launch, we have some significant guardrails on what we can publish.
As you can see, this is a fairly standard-looking 2U QCT machine. In the front, we have 24x 2.5″ bays. In the middle, we have a fan partition. Perhaps the big one here, though, is really the CPUs.
Dual-4th-Gen-Intel-Xeon-Scalable-QTC-Built-SDP-3.jpg

The process of installing chips in the new platform is very similar. The biggest difference is that the CPUs have heat pipes with extra heatsink area attached that have extra screws.
Dual-4th-Gen-Intel-Xeon-Scalable-QTC-Built-SDP-9.jpg

Dual-4th-Gen-Intel-Xeon-Scalable-QTC-Built-SDP-14.jpg

Intel’s headline feature is really the built-in accelerators. Intel has five accelerators:

  • Intel AMX (Advanced Matrix Extensions) – This provides bfloat16 and INT8 matrix math acceleration. Intel has said that in future chips, AMX will be extended to support FP16. The basic idea with AMX is that Intel is looking to move AI acceleration on-chip, at a 2x or higher clip versus the 3rd Generation Intel Xeon Scalable (Ice Lake). The bfloat16 support here is important; in the 3rd Gen that was a Cooper Lake feature. The new CPU supports avx_vnni as well as avx512_bf16. Intel's goal with this is not necessarily to completely remove the need for an NVIDIA H100 or other accelerator. It is simply to raise the bar for "good enough" AI inferencing on CPUs. Think of it this way: if you would otherwise install an NVIDIA T4, A2, or similar card alongside each CPU with the expectation that it would be only fractionally used, AMX makes that unnecessary.
  • Intel DSA (Data Streaming Accelerator) – This one accelerates data copy and transformation operations for workloads like NVMe/TCP. We are going to look at this one later, but it is really for speedups in things like VM migrations.

  • Intel IAA (In-Memory Analytics Accelerator) – This is to help databases use less memory and less memory bandwidth. To us, this is one where Intel will need to work with the database software folks, and then in a few months, we would expect major in-memory databases to support this.

  • Intel DLB (Dynamic Load Balancer) – DLB is a managed set of queues for moving data through cores. This is something that is often done in software, but Intel is bringing it into hardware for lower latency in applications such as packet processing.

  • Intel QAT (QuickAssist Technology) – We have covered QAT since 2013 on STH. It is finally going mainstream in CPUs. Get excited for this one. This is what we are going to focus most of our efforts on today.
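Of the five accelerators above, AMX shows up as CPU instruction-set support, so its presence can be checked from software. A minimal sketch, assuming Linux (it parses /proc/cpuinfo; the flag names are the ones the Linux kernel reports for these features):

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def check_accelerators(flags: set[str]) -> dict[str, bool]:
    # Flag names as the Linux kernel reports them for AMX and the
    # bfloat16/VNNI extensions mentioned above.
    wanted = ["amx_tile", "amx_bf16", "amx_int8", "avx512_bf16", "avx_vnni"]
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    info = Path("/proc/cpuinfo")
    text = info.read_text() if info.exists() else ""
    for name, present in check_accelerators(parse_cpu_flags(text)).items():
        print(f"{name}: {'yes' if present else 'no'}")
```

Note that DSA, IAA, DLB, and QAT are enumerated as PCIe devices rather than CPU flags, so they would show up via lspci or /sys/bus/dsa instead of /proc/cpuinfo.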

Final Words


At the end of the day, the accelerators we have shown today are new. Some software work needs to go into utilizing them. QAT is probably the most mature, yet even after almost a decade on the market it is one that some in-the-know users love while others never touch.
Overall, this was an interesting piece. On one hand, we could show you the accelerators. On the other hand, we have data and information on Sapphire Rapids we cannot show you. It feels strange since we have so much Genoa data as well at this point.
If you are grappling with “why acceleration?”, let us conclude with Intel’s thesis. For certain high-volume tasks, offloading to accelerators is more power efficient. It also unburdens general-purpose cores, freeing them to be utilized by VMs or for other tasks. Acceleration, as a data center trend, is happening. AI inference and training are great examples, as are modern NICs that offload much more than their predecessors. As chips get larger, the number of workloads running on them increases, and therefore accelerators start to make more sense as more applications can drive utilization.
https://www.servethehome.com/hands-on-with-intel-sapphire-rapids-xeon-accelerators-qct/
 
SemiAccurate has an article, partially paywalled, about the "DLCs" in Sapphire Rapids, and in the part that is visible they happily don't mince words.
Intel is offering ‘upgradeable’ features on their new Xeons, an idiotic move that will hurt the company. SemiAccurate can’t understand how something this stupid and self-destructive was allowed to go through.
Intel’s marketing is the single most detrimental thing the company is fighting at the moment; the problems it creates make the threat of AMD pale in comparison.
:rofl::clap:
You might recall the pricing of Cascade Lake which took Xeons from Broadwell’s $4000 to nearly $20,000. The most galling bits were the abusive memory tax, but the SKU proliferation was also egregious.
On Ice Lake-SP, Intel told OEMs that they would price MK-TME ‘security’ by the key against AMD’s faster, cheaper, and more functional CPUs which had SME/SEV included for free. This provoked such massive backlash that Intel had to pull the scheme. The fact that it wasn’t able to be debugged in time for launch and was pulled is an afterthought, the pitch, and the damage, had been done.
This time around Intel can’t extort for memory (AMD Epycs have higher capacity, are faster, and are cheaper) and can’t extort for MK-TME keys (it still doesn’t work in Sapphire Rapids), so they turned to SGX. You know, that technology that no one uses and no one wants because you have to recode your apps to work with it, not just recompile. The idea this time was to artificially fuse off SGX capacity and sell it back to customers by size. Those that had rewritten their apps for SGX were trapped, so opportunities to milk abounded. Customer enmity be damned, bonuses were in the offing!
Since these codes will likely come from Intel itself rather than the OEMs that sell the systems, it is high-margin profit. If you come from a financial background you might see the subtle problem: it cuts OEMs out of the picture and lowers their profits and margins. So this time around Intel is hell bent on pissing off OEMs and VARs too.
This plan is both self-destructive and abjectly stupid. We have gone through the self-destructive part above, albeit with only a few cherry-picked examples, so why do we say it is abjectly stupid? Other than the obvious part, the ‘upgrades’ Intel is offering on Sapphire Rapids are things that, well, no sane deployment can utilize. Sure on paper you can get advantages but once a system is purchased and deployed, SemiAccurate can’t think of a scenario that these ‘upgrades’ will benefit the purchaser. Do note that this is regardless of the likely abusive pricing, it just doesn’t make sense on the surface.
Intel marketing doesn’t seem to understand that AMD Epycs are faster, cheaper, and better in just about every way.
https://semiaccurate.com/2022/10/28/intels-sapphire-rapids-upgrades-are-a-bad-idea/

100% agreed.
My fear is that, this being Intel, the OEMs, VARs, and customers will swallow it quietly, this model will work, and it will spread to other market segments.
 
Dr. Ian Cutress was right in his year-end predictions that SPR would only arrive the following year, and even then not at the start... It is effectively arriving to compete with Genoa when it was meant to compete with Milan and Milan-X. The core-count deficit will be very hard to overcome.

QAT may help close the gap in some scenarios, but it is not a magic wand for every generic workload:
 

Intel Xeon Max CPU is the Sapphire Rapids HBM Line


According to the new disclosure, the Intel Xeon Max series of processors will only scale to 56 cores, not the 60-core non-HBM units that we tested recently. That acceleration will be a key part of the new generation. There will be 64GB of HBM2e onboard, and Intel will offer over 100MB of L3 cache, with up to 112.5MB. Another notable feature is that the new chips will scale up to 2 sockets even though the Sapphire Rapids line will scale to 4 and 8-socket systems.
Intel-Tech-at-SC22-Intel-Xeon-Max-CPU-Overview-696x391.jpg

Intel-Tech-at-SC22-Intel-Xeon-Max-CPU-HBM-Modes-696x391.jpg

Intel-Tech-at-SC22-Intel-Xeon-Max-CPU-Building-Blocks-696x391.jpg

Intel-Tech-at-SC22-Intel-Xeon-Max-CPU-P-Core-696x391.jpg

We expect the next generation of HPC-focused CPUs to be really interesting. AMD will have a large lead on core counts versus Xeon Max. AMD Genoa-X we expect to be well over 1GB of L3 cache while also offering up to 50% more DDR5 bandwidth. Xeon Max is using fast and relatively large on-package memory instead of growing caches. We will likely see some workloads that have random data sets too large for AMD’s caches look relatively spectacular on Xeon Max.

Intel told us that Xeon Max will use the full 4 compute tile design across all of its SKUs. We think this is required for four packages of HBM 2e. That means it will not use some of the lower-core counts Sapphire Rapids designs that are notably different from the larger chips.
https://www.servethehome.com/intel-xeon-max-cpu-is-the-sapphire-rapids-hbm-line/
 
The problem with QAT and the other accelerators in this Xeon is software integration. QAT (QuickAssist) launched almost 10 years ago, has been integrated into multiple cores, from business Atoms to Xeons, and there have been several dedicated QAT-only cards, yet its adoption is nothing extraordinary.
And if that is what happens with something already mature and "free" on some CPUs, I don't know whether the new accelerators (AMX, DSA, IAA), especially when you have to pay to add more units beyond what the SKU includes, will see much adoption anytime soon.
On top of all that, this isn't something you can test on a consumer desktop/laptop processor.
Nice photo :)
RlLxUbl.jpg
 
Now that SC22 is underway, there have been some presentations and there are already numbers, from Intel, for the Xeon Max.

intel-max-series-hpc-ai-bound.jpg

Addressing these distinctions across workloads means having different kinds of compute and memory in different mixes to optimize for different workloads. And that, in a nutshell, is why Intel has to have GPUs with HBM, CPUs with DRAM, and CPUs with HBM in its lineup.

Where The Memory Hits The Code


Intel rushed out its announcements about the Max Series CPU ahead of the AMD “Genoa” Epyc 9004 server chip launch last week, and that was so it could make more comparisons to the Milan-X 7773X chip instead of the top-bin Genoa part, which will be goosed with 3D V-Cache next year.

Here is a broader and more useful set of benchmarks that Echeruo shared. The first shows how the HBM variant of a 56-core Sapphire Rapids stacks up to an Ice Lake Xeon SP-8380 running HPL. HPL is not particularly bandwidth bound, of course, so the performance increase by moving to the Max Series CPU is only 1.4X.
intel-max-series-cpu-hpc-performance.jpg

This is less than you would expect from core counts and IPC combined; it is effectively just the core-count increase, since the AVX-512 units have not really changed all that much with the Golden Cove cores. But look at what happens on the High Performance Conjugate Gradients (HPCG) and Stream Triad benchmarks, both of which are heavily dependent on memory bandwidth. (Stream Triad is the test for memory bandwidth, in fact.) The performance increases are 3.8X and 5.3X, respectively.
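The 1.4X HPL figure lines up with the core-count ratio alone, which is easy to check (assuming the Ice Lake part named above is the 40-core Xeon Platinum 8380):

```python
# The HPL gain tracks core count, since per-core AVX-512 throughput is
# similar between Ice Lake and Golden Cove cores.
ice_lake_8380_cores = 40   # Xeon Platinum 8380
xeon_max_cores = 56        # 56-core Sapphire Rapids HBM part

ratio = xeon_max_cores / ice_lake_8380_cores
print(f"core-count ratio: {ratio:.1f}x")  # matches the reported 1.4X on HPL
```

By contrast, the 3.8X and 5.3X gains on HPCG and Stream Triad far exceed that ratio, which is the HBM bandwidth doing the work.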

And on the same tests, Intel will show 1.7X better performance on HPL compared to an AMD 7773X with 3D V-Cache, and 3.2X better performance on HPCG and 5X better performance on Stream Triad.


Here are some more general comparisons of the AMD 7773X versus the Max Series CPU on 19 different HPC and AI workloads:
intel-max-series-cpu-performance.jpg

Generally speaking, the HBM variant of Sapphire Rapids is going to offer somewhere around 2X the performance of an Ice Lake Xeon SP and a Milan Epyc 7003. Now all that Intel has to do is not make it cost 2X as much and then it might actually sell some.

The math we are itching to do is how many “Skylake” and “Cascade Lake” Xeon SPs can be replaced by a Sapphire Rapids HBM socket, and how much money will that save or allow to be plowed back into more compute capacity. If Intel plays this right, it can make some happy customers and gain some goodwill as well as some desperately needed HPC and AI revenues.
https://www.nextplatform.com/2022/1...eon-sps-plus-hbm-offer-big-performance-boost/


Honestly, I don't know to what extent "Genoa-X" (Zen 4 Epyc with 3D V-Cache) won't end up close to that ~1.5x performance mark; after that, it will always come down to price.
 

First Intel W790 Sapphire Rapids Workstation Motherboard Spotted

The Supermicro X13SWA-TF motherboard (listed at Atic.ca and spotted by @momomo_us) comes in an E-ATX form factor, which is used for workstations, desktops, and tower servers. The motherboard is priced at CAD$1290 ($965) with a discount if you pay cash, but there is no mention of its availability timeframe.
https://www.tomshardware.com/news/intel-w790-motherboard-spotted
 
q7MJ2yv.jpg


Quite pretty. I don't like the M.2 form factor on desktop; just 4 M.2 slots take up too much space on the board.
In the bottom-left corner it appears to have multiple SATA connectors (8?), but above those there seem to be other connectors I can't quite identify.
 
What I can't help noticing is that for 350 W the VRM circuitry is rather small, while typical Ryzen and LGA1700 boards have much larger circuits to deliver similar power.
 
Intel has launched Sapphire Rapids and either Intel knows something that is escaping most people, or it is baffling.
uFmjjBz.jpg


I see so many things wrong with this table.

I could understand this table if Intel still held 95%+ of the server market, AMD didn't exist in this market, and the various clouds weren't putting several ARM processors into production. They would be pushing prices hard and doing unnecessary segmentation, but that would be normal for a company with total dominance of a market.
In Intel's current position, it makes little sense.

The point Intel gives the most weight to in this launch is the "accelerators", yet it puts them behind a "paywall", and on many SKUs the "paywall" is "disabled" (they are literally refusing customers' money).
And it's not just the accelerators. Segmenting on details like RAM speed or SGX makes no sense at all these days.

The prices on many SKUs are not at all competitive. A few examples:
Intel Xeon Platinum 8480 - 56 cores - 2.0 GHz base - 3.0 GHz turbo - 105 MB L3 - 350 W - $10,710
AMD Epyc 9554 - 64 cores - 3.1 GHz base - 3.75 GHz turbo - 256 MB L3 - 360 W - $9,087

Intel Xeon Platinum 8462Y - 32 cores - 2.8 GHz base - 3.6 GHz turbo - 60 MB L3 - 300 W - $5,945
AMD Epyc 9354 - 32 cores - 3.25 GHz base - 3.8 GHz turbo - 256 MB L3 - 280 W - $3,420
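Putting those list prices on a per-core basis makes the gap clearer; this is just arithmetic on the figures quoted above:

```python
# Price per core from the list prices above: (list price USD, core count).
skus = {
    "Xeon Platinum 8480 (56c)":  (10710, 56),
    "Epyc 9554 (64c)":           (9087, 64),
    "Xeon Platinum 8462Y (32c)": (5945, 32),
    "Epyc 9354 (32c)":           (3420, 32),
}

for name, (price_usd, cores) in skus.items():
    print(f"{name}: ${price_usd / cores:.0f}/core")
```

That works out to roughly $191/core vs $142/core at the top of the stack, and $186/core vs $107/core at 32 cores.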

This CPU also has some odd "limitations".
For example, the PCIe Gen5 complex can be bifurcated into x16, x8, or x4 lanes, but if split into x2 or x1 it won't run at Gen5, only at Gen4.
Another example: CXL (a protocol created by Intel) works with all device types except "Type 3", the memory devices, which is exactly where the greatest interest in CXL has appeared and exactly the devices AMD supports on Epyc.
It gives the impression that some scenarios weren't validated, just so the processor/platform could launch now.
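That bifurcation restriction can be captured in a tiny check; a sketch based only on the behavior described above (widths of x4 and up keep Gen5, while x2 and x1 fall back to Gen4):

```python
# Maximum PCIe generation per bifurcation width on Sapphire Rapids,
# per the restriction described above.
def max_pcie_gen(lane_width: int) -> int:
    if lane_width in (16, 8, 4):
        return 5          # full Gen5 operation
    if lane_width in (2, 1):
        return 4          # narrow splits fall back to Gen4
    raise ValueError(f"unsupported bifurcation width: x{lane_width}")


for width in (16, 8, 4, 2, 1):
    print(f"x{width}: Gen{max_pcie_gen(width)}")
```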

Finally, even the New York Times has an article on this launch. Here are some quotes:

Inside Intel’s Delays in Delivering a Crucial New Microprocessor

The company grappled with missteps for years while developing a microprocessor code-named Sapphire Rapids. It comes out on Tuesday.
Engineers had worked for more than five years to develop a powerful new microprocessor to carry out computing chores in data centers and were confident they had finally gotten the product right. But signs of a potentially serious technical flaw surfaced during a regular morning meeting to discuss the project.
The launch of Sapphire Rapids wound up being pushed from mid-2022 to Tuesday, nearly two years later than once expected.
For Intel, the pressure is on. Along with falling demand for chips used in personal computers, the company faces stiff competition in the server chips that are its most profitable business. That issue has worried Wall Street, with Intel’s market value plunging more than $120 billion since Mr. Gelsinger took charge.
Sapphire Rapids began in 2015, with discussions among a small group of Intel engineers. The product was the company’s first attempt at a new approach in chip design.
The Sapphire Rapids team grappled with bugs, flaws caused by designer errors or manufacturing glitches that can cause a chip to make incorrect calculations, work slowly or stop functioning.
They were also affected by delays in the product’s manufacturing process.
Repeating that process led to missed deadlines. Ms. Nassif said Sapphire Rapids was designed to counter AMD’s Milan processor, which was introduced in March 2021. But it still wasn’t ready by that June, when Intel announced a delay until the next year to allow more validation.
Then came the discovery of the flaw last May. Ms. Rivera would not describe it in detail but said it had affected the processor’s performance. In June, she used an investor event to announce a delay of at least a quarter, which pushed Sapphire Rapids later than the launch of a competing AMD chip in November.
Ms. Rivera saw a series of lessons from the setbacks. One was simply that Intel packed too many innovations into Sapphire Rapids, rather than deliver a less ambitious product sooner.
She also concluded that the team should have spent more time on perfecting and testing its design using computer simulations.
She also determined that Intel had scheduled more products than its engineers and customers could easily handle. So she streamlined that product road map, including pushing back a successor to Sapphire Rapids to 2024 from 2023.
https://www.nytimes.com/2023/01/10/technology/intel-sapphire-rapids-microprocessor.html

If the successor they are referring to is Emerald Rapids, that is quite worrying.

On the positive side, some SKUs below 32 cores may be more or less interesting given their prices, TDPs, and monolithic design, and the "accelerators" and HBM SKUs will be interesting in some scenarios.
 