nVidia Pascal graphics cards (Mobile, TITAN Xp, GTX 1000 series)

Then again... basically, the problems of high power consumption and high temperatures at higher bandwidths remain, due to the size/bandwidth limits of GDDR5.
 
Room For Innovation On GDDR5: Higher Density, Faster Data Rates Coming With 'GDDR5X'


[Image: GDDR5X slide (gddr5x_w_600.png)]


8 Gb GDDR5 modules are available now for hardware manufacturers. Micron expects availability of products using GDDR5X to start hitting the market in the second half of 2016.
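For scale, GDDR5 and GDDR5X devices each present a 32-bit interface to the GPU, so the capacity arithmetic for a typical 256-bit card looks like this (a back-of-the-envelope note based on standard GDDR5 packaging, not taken from the article):

$$8\ \text{Gb} = 1\ \text{GB per chip},\qquad \frac{256\ \text{bit}}{32\ \text{bit/chip}} = 8\ \text{chips}\ \Rightarrow\ 8\ \text{GB of VRAM}$$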

http://www.tomshardware.com/news/micron-launches-8gb-gddr5-modules,29974.html
https://www.extremetech.com/gaming/...-doubling-gddr5x-for-next-generation-hardware
 
Then again... basically, the problems of high power consumption and high temperatures at higher bandwidths remain, due to the size/bandwidth limits of GDDR5.
But that is by far the most economical solution, so it serves entry-level products well. HBM is expensive and scarce due to low production volume, and the interposer raises the cost even further. Maybe in 2-3 years HBM will become ubiquitous across all tiers (futurology alert!).
 
Exactly, it's a prototype, so it's normal for it to be expensive.

I just hope they don't delay the launch and that they come out in Q2 of next year (April, or May at the latest)... I'm in need of an upgrade.
 
Let's see what comes out.

Only with Pascal or Volta will I retire my GTX 760, which still serves me very well. Preferably with HBM(2).

What I did like was an AMD card there with HBM that's almost half the price of the others!!
 
It's basically a sample, so it's normal for it to cost an arm and a leg; it's not mass production.
Reportedly it's more than double what the GM200 sample was.
Nvidia is testing a chip with the nomenclature JM601. It is a pretty expensive piece too, costing 73,917 INR, which equals approximately 1136 USD. Even after accounting for the fact that this is probably a prototype chip, this is a pretty huge figure. To put it into perspective, the GM200 prototype chip was only listed at ~30,000 INR.

Read more: http://wccftech.com/nvidia-graphics-chip-jm-601-gpu-spotted/#ixzz3pibmwByT
 
Or they're simply fed up with people snooping through the descriptions of the products they have in testing, and gave random names and prices to everything. It's hardly by paying a few more INR (I don't even know what currency that is) in customs that NVIDIA is going to run into problems. It's possibly even safer that way.
 
A few more INR, in that case, is more than double.


Meanwhile:
NVIDIA Pascal and Volta GPUs Now Supported By Latest GeForce 358.66 Drivers – Also Adds Preliminary Support For Vulkan API
HardwareLeak 4 hours ago by Hassan Mujtaba
Some surprising information has been found in NVIDIA’s latest GeForce drivers in regards to their upcoming Pascal and Volta GPUs. As we know, NVIDIA keeps on adding preliminary support for their new graphics cards in GeForce drivers and the recently launched GeForce 358.66 driver has exposed some new information about the next generation NVIDIA GPUs.



NVIDIA GeForce 358.66 Driver Adds Pascal GPU, Volta GPU and Vulkan API Support
Well, to start off this news, we already know that NVIDIA is hard at work preparing two new GPUs for launch in 2016 and 2017. In 2016, NVIDIA plans to launch their Pascal GPU architecture, which will make use of the latest FinFET process node and the HBM2 memory architecture. The Pascal GPU will be aimed at both consumer and HPC markets, with availability reported in the first half of 2016. The second GPU is Volta, which will enhance the architecture that Pascal brings in every possible way. While it is actually slated for a 2018 time frame for both the consumer and server markets, shipments will start in 2017 to two advanced supercomputers, Summit at Oak Ridge National Laboratory and Sierra at Lawrence Livermore National Laboratory, which will feature next generation Volta GPUs and IBM Power9 CPUs to deliver up to 300 PetaFlops of compute performance.

[Image: NVIDIA Pascal GPU roadmap]


Now, the surprising thing about the new GeForce 358.66 driver is that, while checking through the OpenCL runtime, the driver exposes two brand new compute capabilities for a few new CUDA-architecture GPUs. There are two unique ids, “-D__CUDA_ARCH__=600” for Pascal GPUs and “-D__CUDA_ARCH__=700” for Volta GPUs. Previously, NVIDIA has used “-D__CUDA_ARCH__=500” for their Maxwell GPUs, “-D__CUDA_ARCH__=300” for Kepler, “-D__CUDA_ARCH__=200/210” for Fermi and “-D__CUDA_ARCH__=100” for their first generation Tesla GPUs.

Volta:

  • -D__CUDA_ARCH__=700
Pascal:

  • -D__CUDA_ARCH__=600
  • -D__CUDA_ARCH__=610
  • -D__CUDA_ARCH__=620
Maxwell:

  • -D__CUDA_ARCH__=500
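For context, these -D__CUDA_ARCH__ defines correspond to the __CUDA_ARCH__ macro that nvcc sets while compiling device code, and programs can branch on it at compile time. Below is a minimal sketch of that mechanism; the 600/700 thresholds simply mirror the ids above, and this is illustrative code, not anything from NVIDIA's driver.

Code:
#include <cstdio>

// __CUDA_ARCH__ is defined by nvcc only during device compilation,
// e.g. 500 for sm_50 (Maxwell) or, per the leak above, 600 for Pascal.
__global__ void report_arch()
{
#if __CUDA_ARCH__ >= 700
    printf("Compiled for a Volta-class target (sm_70+)\n");
#elif __CUDA_ARCH__ >= 600
    printf("Compiled for a Pascal-class target (sm_60+)\n");
#else
    printf("Compiled for a pre-Pascal target\n");
#endif
}

int main()
{
    report_arch<<<1, 1>>>();   // one block, one thread is enough for a probe
    cudaDeviceSynchronize();   // wait so the device-side printf is flushed
    return 0;
}

Built with, say, nvcc -arch=sm_60, the middle branch is compiled in; the 600/610/620 and 700 ids in the driver suggest sm_60/61/62 and sm_70 targets were being prepared.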
Now, the added support is still in a preliminary phase and we shouldn't get excited just yet. In February 2014, we saw a similar leak through GeForce drivers which not only revealed the CUDA compute capabilities of the then-upcoming Maxwell cards but also their codenames. A few days after that leak, NVIDIA launched their first Maxwell GM107 based cards, and 8 months later introduced the high-end, GM204 powered GeForce 900 series cards. We now know that there are not one but three specific GPUs listed in the Pascal series with newly added compute capabilities, which would indicate that Pascal is pretty much through the testing and qualification phase; if NVIDIA plans to introduce the GPUs in 1H 2016, then we are looking at mass production in early Q1 2016. The other surprising bit is that Volta is also listed in the driver, the chip that is going to be featured in two supercomputers from 2017 onwards. This shows that Volta is currently in the engineering phase but is being tested at an even faster pace than Pascal, so that the top-end chips can make their way into Summit and Sierra before a public introduction in 2018.

Last month, at GTC Taiwan 2015, NVIDIA presented brief technical seminars on their GPUs and the applications built around them. During the main keynote, Marc Hamilton, Vice President of Solutions Architecture and Engineering at NVIDIA, talked about several new technologies that NVIDIA will be announcing in the coming years. Of course, Pascal was part of the keynote, and not only did he talk about the Pascal GPU, but one of the slides also showcased the updated Pascal GPU board with the actual chip mounted on the new form factor aimed at HPC servers.



What we know so far about the GP100 chip.

  • Pascal microarchitecture.
  • DirectX 12 feature level 12_1 or higher.
  • Successor to the GM200 GPU found in the GTX Titan X and GTX 980 Ti.
  • Built on the 16FF+ manufacturing process from TSMC.
  • Allegedly has a total of 17 billion transistors, more than twice that of GM200.
  • Taped out in June 2015.
  • Will feature four 4-Hi HBM2 stacks, for a total of 16GB of VRAM for the consumer variant and 32GB for the professional variant.
  • Features a 4096-bit memory interface.
  • Features NVLink, support for mixed-precision FP16 compute at twice the rate of FP32, and full FP64 support. 2016 release.
The Pascal board features the actual Pascal GPU core with four HBM2 stacks, which will provide up to 16 GB of VRAM on consumer parts and 32 GB on professional HPC solutions. The Pascal GPU looks very similar to the Fiji GPU, with a similar package design; the die seems slightly larger than Fiji's and could be anywhere around 500-600mm2. We cannot say for sure whether the chip shown on the board is the full GP100 or a lower-tier successor to the GM204, but given that NVIDIA aims its high-performance chips at the HPC market and that such board designs will act as a new form factor for workstations and servers, it is likely the full Pascal GPU. On the sides of the chip we can see the metallic heatspreader, while the VRMs/MOSFETs sit on both sides of the chip.



Now we know that NVIDIA has taped out Pascal chips, and we recently spotted a shipment of Pascal GPUs on their way to NVIDIA's testing facility straight from TSMC's fabs. There has been some questioning about whether the board we were shown back in 2014 will be an actual form factor, and NVIDIA has officially stated that alongside PCI-Express form factors, Pascal GPUs will be available on a Mezzanine board which is smaller than PCI-Express 3.0 PCBs. This specific PCB will come with the Mezzanine connector, which offers speeds of 15 GB/s and up to 40 GB/s, and will be available on select HPC servers and workstations that feature NVLINK support. Several of these boards can be stacked on top of each other to conserve space inside servers, while consumer PCs will stick with the PCI-Express form factor and full-length cards, as they remain the best solution for high-end gaming rigs and professional usage.

The last bit is that the drivers also add preliminary support for the next generation Vulkan API, which is meant to replace OpenGL. NVIDIA is a key partner in the Khronos Group, along with many others who are making the Vulkan API a reality, with performance optimizations across the board and broader OS/hardware support than DirectX 12. Following is the transcript from LaptopVideo2Go, which shows the newly added functions and extensions in the driver:

OpenGL runtime contains following extensions and functions:

VK_EXT_KHR_device_swapchain
VK_EXT_KHR_swapchain

vkCreateInstance
vkEnumerateInstanceExtensionProperties
vkGetDeviceProcAddr
vkGetInstanceProcAddr
vkGetProcAddressNV
Extensions are not recognized by GPUCapsViewer yet.
GLEW based apps fail to launch.

This driver comes with a new runtime “nv-vk32.dll”, which exposes following functions

vkAcquireNextImageKHR
vkCreateDevice
vkCreateSwapchainKHR
vkDestroySwapchainKHR
vkEnumerateDeviceExtensionProperties
vkGetDeviceProcAddr
vkGetPhysicalDeviceSurfaceSupportKHR
vkGetSurfaceFormatsKHR
vkGetSurfacePresentModesKHR
vkGetSurfacePropertiesKHR
vkGetSwapchainImagesKHR
vkQueuePresentKHR
vkCreateInstance
vkEnumerateInstanceExtensionProperties
vkGetPhysicalDeviceMemoryProperties
vkGetInstanceProcAddr
vkEnumeratePhysicalDevices
vkCreateImage
vkDestroyImage
vkAllocMemory
vkFreeMemory
vkBindImageMemory
vkGetImageMemoryRequirements
vkQueuePresentNV
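
These are standard Vulkan loader entry points. As a rough illustration of how an application touches the first of them, here is a minimal host-side sketch written against the released Vulkan 1.0 headers rather than the pre-release nv-vk32.dll runtime from this leak; the application name is a placeholder.

Code:
#include <vulkan/vulkan.h>
#include <cstdio>

int main()
{
    // Ask the loader how many instance extensions the installed driver exposes.
    uint32_t extCount = 0;
    vkEnumerateInstanceExtensionProperties(nullptr, &extCount, nullptr);
    printf("Instance extensions reported: %u\n", extCount);

    // Describe the application, then create a Vulkan instance.
    VkApplicationInfo app{};
    app.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
    app.pApplicationName = "vulkan-probe";   // placeholder name
    app.apiVersion = VK_API_VERSION_1_0;

    VkInstanceCreateInfo info{};
    info.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
    info.pApplicationInfo = &app;

    VkInstance instance = VK_NULL_HANDLE;
    if (vkCreateInstance(&info, nullptr, &instance) != VK_SUCCESS) {
        fprintf(stderr, "vkCreateInstance failed\n");
        return 1;
    }

    // The swapchain calls listed above (vkCreateSwapchainKHR and friends)
    // additionally require a device, a surface and the swapchain extension.
    vkDestroyInstance(instance, nullptr);
    return 0;
}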


Khronos Group announced their Vulkan API a few months ago, and it has been regarded as the successor to OpenGL. Vulkan aims to be bigger and better than what OpenGL once was. It is the only low level API that supports every single platform in existence. A big advantage of Vulkan over OpenGL is that it possesses a multi-core friendly architecture: where the OpenGL APIs did not allow generation of graphics commands in parallel with command execution, Vulkan happily allows multiple command buffers to be built in parallel. AMD, who put a lot of emphasis on the Mantle API in the past, may just leverage that work when the Vulkan API hits the market, since both share the same foundation, and Vulkan has cross-platform support (Windows 7/8/10, Linux, Android) along with cross-vendor support (NVIDIA, AMD, Intel, Qualcomm, Imagination Technologies, ARM, Samsung, Broadcom, Vivante).

Even Valve has said that Vulkan is the right way forward, as stated by Valve's Dan Ginsburg in his speech at SIGGRAPH 2015. Dan Ginsburg, who has taken care of porting the Source 2 engine to Vulkan, didn't tiptoe around the elephant in the room, Microsoft's DirectX 12. In fact, he openly said that Vulkan is the right way forward and that there is not much reason to create a DX12 backend when developers can use Khronos Group's API right away; here's a transcription of the most relevant parts:

Since hosting the first Vulkan face-to-face meeting last year, we’ve been really pleased with the progress of the API and we think it’s the right way forward for powering the next generation of high performance games.

Here’s why we think Vulkan is the future. Unless you are aggressive enough to be shipping a DX12 game this year, I would argue that there is really not much reason to ever create a DX12 back end for your game. And the reason for that is that Vulkan will cover you on Windows 10 on the same class of hardware, and so much more from all these other platforms and IHVs that we’ve heard from. Metal is single platform, single vendor; with Vulkan, we are gonna have support for not only Windows 10 but Windows 7 and Windows 8, we’re gonna have it on Android, and all of the IHVs are making great progress on drivers, so I think we’re going to see super rapid adoption. If you’re developing a game for next generation APIs, I think it’s clear that Vulkan is the best choice and we’re very pleased with the progress and the state of the API. We think it’s gonna power the next generation of games for years to come.



Moreover, we all know that Valve as a company has been trying to push OpenGL and Linux support in the last few years, in an effort to oppose Microsoft's near monopoly with Windows; however, they haven't had any real success so far, and presently there is no reason to believe Vulkan will suddenly turn the tide. Of course, the battle between the leading next generation APIs, DirectX 12, Metal and Vulkan, has just begun, but we can see who's already in pole position, and it's not Vulkan right now. Still, what gamers really care about is getting the promised performance boost, and that can be achieved through constant driver optimization and robust use of the next generation APIs that are now available on the market. AMD will definitely try to enhance their graphics performance with Vulkan, and NVIDIA is already focusing on extending their established lead with the new APIs.

GPU Family | AMD Arctic Islands | NVIDIA Pascal
GPU Name | AMD Greenland | NVIDIA GP100
GPU Process | TSMC 16nm FinFET | TSMC 16nm FinFET
GPU Transistors | 15-18 Billion | ~17 Billion
HBM Memory (Consumers) | 4-16 GB (SK Hynix) HBM2 | 2-16 GB (SK Hynix/Samsung) HBM2
HBM Memory (Dual-Chip Professional/HPC) | 32 GB (SK Hynix) HBM2 | 32 GB (SK Hynix/Samsung) HBM2
HBM2 Bandwidth | 1 TB/s (Peak) | 1 TB/s (Peak)
Graphics Architecture | GCN 2.0? (New ISA) | Next-CUDA (Compute Oriented)
Successor of (GPU) | Fiji (Radeon 300/Fury) | GM200 (Maxwell)

Read more: http://wccftech.com/nvidia-pascal-v...rivers-adds-support-vulkan-api/#ixzz3qXjE4ITn

http://wccftech.com/nvidia-pascal-volta-gpus-supported-geforce-drivers-adds-support-vulkan-api/
 
NVIDIA Pascal GPU’s Double Precision Performance Rated at Over 4 TFLOPs, 16nm FinFET Architecture Confirmed – Volta GPU Peaks at Over 7 TFLOPs, 1.2 TB/s HBM2
HardwareReport 17 hours ago by Hassan Mujtaba
At this year’s SC15, NVIDIA revealed and confirmed two major bits about their next generation Pascal GPUs. The information includes details regarding process design and peak compute performance, and they even shared the corresponding numbers for their Volta GPUs, which are expected to hit the market in 2018 (2017 for HPC). The details confirm the rumors we have been hearing for a few months that Pascal might be coming to market early next year.



NVIDIA’s Pascal and Volta GPUs Peak Compute Performance Revealed – Volta To Push Memory Bandwidth To 1.2 TB/s
For some time now, we have been hearing that NVIDIA's next generation Pascal GPUs will be based on a 16nm process. NVIDIA revealed, or should we say finally confirmed, during their SC15 conference that the chip is based on a 16nm FinFET process. NVIDIA didn't reveal the name of the semiconductor foundry, but it was confirmed that TSMC would be supplying the new GPUs. This might not be a significant bit as it has been known for months, and we know that NVIDIA's Pascal GP100 chip has already been taped out on TSMC's 16nm FinFET process. This means that we could see a launch of these chips as early as 1H 2016. A doubling of transistor density would put Pascal at somewhere around 16-17 billion transistors, since Maxwell already features 8 billion transistors on the flagship GM200 GPU core.



TSMC’s 16FF+ (FinFET Plus) technology can provide over 65 percent higher speed, around 2 times the density, or 70 percent less power than its 28HPM technology. Compared with 20SoC technology, 16FF+ provides an extra 40% higher speed and 60% power saving. By leveraging the experience of 20SoC technology, TSMC 16FF+ shares the same metal backend process in order to quickly improve yield and demonstrate process maturity for time-to-market value.

Nvidia decided to let TSMC mass produce the Pascal GPU, which is scheduled to be released next year, using its 16-nm FinFET production process. Some in the industry predicted that both Samsung and TSMC would mass produce the Pascal GPU, but the U.S. firm chose only the Taiwanese firm in the end. Since the two foundries have different 16-nm FinFET manufacturing processes, the U.S. tech company selected the world's largest foundry (TSMC) for product consistency. (This quote was originally posted at BusinessKorea; however, the article has since been removed for confidentiality reasons.)

What we know so far about the GP100 chip:



  • Pascal microarchitecture.
  • DirectX 12 feature level 12_1 or higher.
  • Successor to the GM200 GPU found in the GTX Titan X and GTX 980 Ti.
  • Built on the 16FF+ manufacturing process from TSMC.
  • Allegedly has a total of 17 billion transistors, more than twice that of GM200.
  • Taped out in June 2015.
  • Will feature four 4-Hi HBM2 stacks, for a total of 16GB of VRAM for the consumer variant and 32GB for the professional variant.
  • Features a 4096-bit memory interface.
  • Features NVLink, support for mixed-precision FP16 compute at twice the rate of FP32, and full FP64 support. 2016 release.


Back at GTC 2015, NVIDIA's CEO Jen-Hsun Huang talked about mixed precision, which lets users get twice the compute performance in FP16 workloads compared to FP32 by computing at 16-bit precision at twice the rate. Pascal allows more than just that: it is capable of FP16, FP32 and FP64 compute, and we have just learned the peak compute performance of Pascal in double precision workloads. With the Pascal GPU, NVIDIA will return to the HPC market with new Tesla products. Maxwell, although great in all other regards, was deprived of the necessary FP64 hardware and focused only on FP32 performance. This meant the chip was going to stay away from HPC markets while NVIDIA offered their year-old Kepler based cards as the only Tesla options. AMD, NVIDIA's only competitor in the HPC GPU department, took a similar approach with their Fiji GPU, which is an FP32-focused gaming part, while the Hawaii GPU serves the HPC space, offering double precision compute.
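As a concrete illustration of what "FP16 at twice the FP32 rate" looks like to a programmer, CUDA packs two 16-bit values into a half2 and operates on both lanes with a single instruction. The sketch below uses the cuda_fp16.h intrinsics, which exist from compute capability 5.3 onwards; whether GP100 exposes the feature in exactly this form is not confirmed by the article.

Code:
#include <cuda_fp16.h>
#include <cstdio>

// One __hfma2 performs two fused multiply-adds on packed FP16 lanes,
// which is the mechanism behind half precision running at 2x the FP32 rate.
__global__ void half2_fma_demo(float* out)
{
    __half2 a = __float2half2_rn(1.5f);   // 1.5 in both 16-bit lanes
    __half2 b = __float2half2_rn(2.0f);
    __half2 c = __float2half2_rn(0.25f);
    __half2 r = __hfma2(a, b, c);         // r = a * b + c, two lanes at once
    out[0] = __low2float(r);              // expect 3.25
    out[1] = __high2float(r);             // expect 3.25
}

int main()
{
    float* out = nullptr;
    cudaMallocManaged(&out, 2 * sizeof(float));
    half2_fma_demo<<<1, 1>>>(out);
    cudaDeviceSynchronize();
    printf("half2 FMA result: %.2f %.2f\n", out[0], out[1]);
    cudaFree(out);
    return 0;
}

Compile with something like nvcc -arch=sm_53 (or sm_60 once Pascal parts ship); on older targets the half-precision arithmetic intrinsics are unavailable.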



Spending a lot of energy in the computation units and dedicating a lot of energy to double precision arithmetic is great when you need it, but when you don't, there's a lot left on the table, such as an unnecessary power envelope that goes underutilized and reduces the efficiency of the overall system. If you can survive with single precision or even half precision, you can gain significant improvements in energy efficiency, and that is why mixed precision matters most, as told by Stephen W. Keckler, Senior Director of Architecture at NVIDIA.

Pascal is designed to be NVIDIA's greatest HPC offering, incorporating the new NVLINK standard and offering UVM (Unified Virtual Memory) addressing inside a heterogeneous node. The Pascal GPU will be the first to introduce NVLINK, the next generation unified virtual memory link with Gen 2.0 cache coherency features and 5 to 12 times the bandwidth of a regular PCIe connection. This will solve many of the bandwidth issues that high performance GPUs currently face.

The first technology we’ll announce today is an important invention called NVLink. It’s a chip-to-chip communication channel. The programming model is PCI Express, but it enables unified memory and moves 5-12 times faster than PCIe. “This is a big leap in solving this bottleneck,” Jen-Hsun says. via NVIDIA
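Taking the 16 GB/s PCI-E figure quoted later in this article as the baseline, the claimed 5 to 12 times speed-up would put NVLink roughly in this range (an illustration of the claim, not a confirmed specification):

$$5 \times 16\ \text{GB/s} = 80\ \text{GB/s} \quad\text{to}\quad 12 \times 16\ \text{GB/s} = 192\ \text{GB/s}$$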

According to official NVIDIA slides, we are looking at peak double precision compute performance of over 4 TFLOPs, along with 1 TB/s of HBM2 memory bandwidth, which will amount to 32 GB of VRAM on HPC parts. NVIDIA's current flagship Tesla K80 accelerator, which features two GK210 GPUs, has a peak performance rated at 2.91 TFLOPs when running at boost clocks and just a little over 2 TFLOPs at standard clock speeds. The single-chip, GK180 based Tesla K40 has double precision compute performance rated at 1.43 TFLOPs, and AMD's best single-chip FirePro card, the FirePro S9170 with 32 GB VRAM, has peak double precision (FP64) performance rated at 2.62 TFLOPs.

Built for double precision general matrix multiplication workloads, both the Kepler and Hawaii chips were designed for compute, and while their successors (Maxwell and Fiji) kept things pretty quiet on the FP64 end, they did come with better FP32 performance. On the compute side, Pascal is going to take the next incremental step, with double precision performance rated at over 4 TFLOPs, double what's offered by the last generation of FP64-enabled GPUs. As for single precision performance, we will see the Pascal GPUs breaking past the 10 TFLOPs barrier with ease.

NVIDIA also shared numbers for their Volta GPUs, which will be rated at over 7 TFLOPs of FP64 compute performance. This will be an incremental step towards building the multi-PFLOPs systems that will be integrated into supercomputers at Oak Ridge National Laboratory (Summit) and Lawrence Livermore National Laboratory (Sierra). Both machines are rated at over 100 PFLOPs of peak performance and will integrate several thousand nodes with over 40 TFLOPs of performance per node. While talking about exascale computing, NVIDIA's Chief Scientist and SVP of Research, Bill Dally, gave a detailed explanation of why energy efficiency is the main focus for HPC:

[Image: NVIDIA slide of a Pascal-process reference chip packed with FPUs]


So let me talk about the first gap, the energy efficiency gap. Now lots of people say, don't you need more efficient floating point units? That's completely wrong, it's not about the flops. If I wanted to build an exascale machine today, I could take the same process technology we are using to build our Pascal chip, a 16nm foundry process, on a die 10mm on a side, which is about a third the linear size and about a ninth the area of Pascal, so the Pascal chip is way bigger than this 1cm-on-a-side chip. If I pack it with floating point units (I drew one to scale, you wouldn't see it; that little red dot is a bit bigger than scale), a double precision fused multiply add (DFMA) unit is about 10 pJ/OP and can run at 2 GFLOPs. So if I fill this chip with floating point units and it consumed 200W, I get 20 TFLOPs on this one chip (100mm2 die). I put 50,000 of these inside racks, I have an exascale machine.

Of course, it's an exascale machine and it's completely worthless, because much like children or pets, floating point units are easy to get and hard to take care of. What's hard about floating point units is feeding them and taking care of what they produce, you know, the results. It's moving the data back and forth that's hard, not building the arithmetic unit. via NVIDIA@SC15 Conference
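For readers following the arithmetic in that quote, the figures are self-consistent (a back-of-the-envelope check using only the numbers Dally gives):

$$\frac{200\ \text{W}}{10\ \text{pJ/op}} = 2\times10^{13}\ \text{op/s} = 20\ \text{TFLOPs per chip},\qquad 50{,}000 \times 20\ \text{TFLOPs} = 10^{18}\ \text{FLOPs} = 1\ \text{exaFLOP}$$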

The talks detailed that an exascale system, to be deployed around 2023, will consist of several heterogeneous nodes made up of throughput-optimized cores (TOCs, i.e. GPUs) and latency-optimized cores (LOCs, i.e. CPUs), with tight communication between them, the memory and the caches to enable good programming models. The GPUs will do the bulk of the heavy lifting while the CPUs focus on sequential processing. The reasoning is that CPUs have great vector core performance, but when those vector cores aren't utilized, scalar mode turns out to be pretty useless for HPC workloads. The entire system will consist of large DRAM banks connected in a heterogeneous DRAM environment, which will help solve two crucial problems on current generation systems: first, exploiting all available bandwidth on the system/node, and second, maximizing locality for frequently accessed data.



CPUs waste a lot of their energy deciding what order to execute instructions in, which typically involves reordering instructions and renaming registers, and only a small fraction of the energy is used for the actual execution.

GPUs don’t care about the latency of an individual instruction; they push instructions through their pipelines as quickly as possible. They don’t have out-of-order execution or branch prediction and spend much more of the power budget on actual execution. Some systems today have half of their energy go to actual execution, as opposed to the very small share in past generations. The next generation of GPUs will be able to devote even more of that energy to executing instructions.

Explaining next generation GPU architectures and efficiency further, Stephen pointed out that HBM is a great memory architecture that will be implemented across the Pascal and Volta chips, but those chips top out at 1.2 TB/s of bandwidth (on the Volta GPU). Moving forward, there is a looming memory power crisis. HBM2 at 1.2 TB/s sure is great, but it adds 60W to the power envelope of a standard GPU; the current implementation of HBM1 on Fiji adds around 25W. Going beyond that, chips with in excess of 2 TB/s of bandwidth will push the overall power limit from bad to breaking point: a chip with 2.5 TB/s of second generation HBM would reach 120W for the memory architecture alone, and a 1.5 times more efficient HBM2 architecture delivering over 3 TB/s would still need 160W just to feed the memory.
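One way to read those numbers is as energy per bit moved. Using the figures quoted above plus Fiji's 512 GB/s of HBM1 bandwidth (the only number here not taken from the article), both generations land around the same cost per bit:

$$\frac{60\ \text{W}}{1.2\ \text{TB/s}\times 8\ \text{bit/B}} \approx 6.3\ \text{pJ/bit},\qquad \frac{25\ \text{W}}{512\ \text{GB/s}\times 8\ \text{bit/B}} \approx 6.1\ \text{pJ/bit}$$

At roughly 6 pJ/bit, a 2.5 TB/s memory system works out to about 125 W, in line with the 120 W figure above, which is why bandwidth growth alone pushes memory power toward the breaking point.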

To be clear, this is not the power of the whole chip, just the memory subsystem. Chips like these would normally be considered inefficient for the consumer and HPC sectors, but NVIDIA is trying to change that and is exploring new ways to solve the memory power crisis that lies ahead with HBM and higher bandwidths. In the near term, Pascal and Volta don't see a major consumption increase from HBM, but moving onward to 2020, when NVIDIA's next generation architecture is expected to arrive, we will probably see a new memory architecture introduced to meet the increased power needs.



We will be having more of these technical talks on upcoming GPU architectures as their launch approaches in 2016. To finish this post, NVIDIA confirmed that Pascal will be available in 2016 (as originally stated) on a choice of CPU platforms ranging from x86 and ARM64 to IBM Power. On the HPC front, NVIDIA will introduce NVLINK, while the consumer and server side will rely on PCI-E (16 GB/s) for chip-to-chip communication.

NVIDIA Pascal GPU Slides (GTC Taiwan 2015):

Next Generation FinFET Based GPUs Comparison (AMD/NVIDIA):

GPU Family | AMD Arctic Islands | NVIDIA Pascal
GPU Name | Next Generation GCN | NVIDIA GP100
GPU Process | GloFo 14nm FinFET | TSMC 16nm FinFET
GPU Transistors | 15-18 Billion | ~17 Billion
HBM Memory (Consumers) | Up to 16 GB (SK Hynix) HBM2 | Up to 16 GB (SK Hynix/Samsung) HBM2
HBM Memory (Dual-Chip Professional/HPC) | 32 GB (SK Hynix) HBM2 | 32 GB (SK Hynix/Samsung) HBM2
HBM2 Bandwidth | 1 TB/s (Peak) | 1 TB/s (Peak)
Graphics Architecture | GCN 2.0? (New ISA) | Next-CUDA (Compute Oriented)
Successor of (GPU) | Fiji (Radeon 300/Fury) | GM200 (Maxwell)


Read more: http://wccftech.com/nvidia-pascal-volta-gpus-sc15/#ixzz3sVB8vV7U


http://wccftech.com/nvidia-pascal-volta-gpus-sc15/
 
NVIDIA announces Drive PX 2, first 16nm FinFET Pascal powered mobile supercomputer



NVIDIA announces its first FinFET based device called Drive PX 2.

NVIDIA Drive PX 2 kicks off Pascal architecture
What’s Drive PX 2? It’s a supercomputer designed for self-driving automobiles, and the first device powered by the Pascal architecture. Not exactly what we, graphics card enthusiasts, were waiting for.

The device is powered by two Tegra SoCs and two Pascal GPUs. In total we have 12 CPU cores and two Pascal GPUs that together offer 8 TFLOPS of raw single-precision computing performance. That's more than a TITAN X. The GPUs are built on a 16nm FinFET fabrication process, and it doesn't look like they use High-Bandwidth Memory, but GDDR5 instead. The chips used in Drive PX 2 are most likely mid-range solutions that might never reach the purely gaming-oriented market.

NVIDIA has not shared any details about the Pascal architecture itself. The exact specifications of those Pascal GPUs are still a mystery. We are just told that the whole computer offers 8 TFLOPs of performance, and that includes both the GPUs and the SoCs, so it's impossible to extrapolate numbers for the GPUs alone.
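For context on the TITAN X comparison (the card's specifications here are from memory, not from the article): GM200 in the TITAN X has 3072 CUDA cores at roughly 1.07 GHz boost, each doing 2 FLOPs per cycle via FMA, so

$$3072 \times 1.07\ \text{GHz} \times 2\ \text{FLOP/cycle} \approx 6.6\ \text{TFLOPS (FP32)}$$

which is why 8 TFLOPS for the whole DRIVE PX 2 (two Tegras plus two Pascal GPUs) edges past a single TITAN X.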

To everyone’s surprise, Drive PX 2 is a liquid-cooled device. Obviously it was impossible to cool 250W in such a small format (the size of a tablet) with conventional cooling methods, but the decision to use liquid cooling with the first Pascal GPUs is something I definitely did not expect.

Hopefully we will hear more about Pascal GPUs in the coming months.







[Image: GPU side vs. Tegra side of the DRIVE PX 2 board]

PRESS RELEASE
Accelerating the race to autonomous cars, NVIDIA (NASDAQ: NVDA) today launched NVIDIA DRIVE™ PX 2 — the world’s most powerful engine for in-vehicle artificial intelligence.

NVIDIA DRIVE PX 2 allows the automotive industry to use artificial intelligence to tackle the complexities inherent in autonomous driving. It utilizes deep learning on NVIDIA’s most advanced GPUs for 360-degree situational awareness around the car, to determine precisely where the car is and to compute a safe, comfortable trajectory.

“Drivers deal with an infinitely complex world,” said Jen-Hsun Huang, co-founder and CEO, NVIDIA. “Modern artificial intelligence and GPU breakthroughs enable us to finally tackle the daunting challenges of self-driving cars.
“NVIDIA’s GPU is central to advances in deep learning and supercomputing. We are leveraging these to create the brain of future autonomous vehicles that will be continuously alert, and eventually achieve superhuman levels of situational awareness. Autonomous cars will bring increased safety, new convenient mobility services and even beautiful urban designs — providing a powerful force for a better future.”

24 Trillion Deep Learning Operations per Second
Created to address the needs of NVIDIA’s automotive partners for an open development platform, DRIVE PX 2 provides unprecedented amounts of processing power for deep learning, equivalent to that of 150 MacBook Pros.

Its two next-generation Tegra® processors plus two next-generation discrete GPUs, based on the Pascal™ architecture, deliver up to 24 trillion deep learning operations per second, which are specialized instructions that accelerate the math used in deep learning network inference. That’s over 10 times more computational horsepower than the previous-generation product.

DRIVE PX 2’s deep learning capabilities enable it to quickly learn how to address the challenges of everyday driving, such as unexpected road debris, erratic drivers and construction zones. Deep learning also addresses numerous problem areas where traditional computer vision techniques are insufficient — such as poor weather conditions like rain, snow and fog, and difficult lighting conditions like sunrise, sunset and extreme darkness.

For general purpose floating point operations, DRIVE PX 2’s multi-precision GPU architecture is capable of up to 8 trillion operations per second. That’s over four times more than the previous-generation product. This enables partners to address the full breadth of autonomous driving algorithms, including sensor fusion, localization and path planning. It also provides high-precision compute when needed for layers of deep learning networks.

Deep Learning in Self-Driving Cars
Self-driving cars use a broad spectrum of sensors to understand their surroundings. DRIVE PX 2 can process the inputs of 12 video cameras, plus lidar, radar and ultrasonic sensors. It fuses them to accurately detect objects, identify them, determine where the car is relative to the world around it, and then calculate its optimal path for safe travel.

This complex work is facilitated by NVIDIA DriveWorks™, a suite of software tools, libraries and modules that accelerates development and testing of autonomous vehicles. DriveWorks enables sensor calibration, acquisition of surround data, synchronization, recording and then processing streams of sensor data through a complex pipeline of algorithms running on all of the DRIVE PX 2’s specialized and general-purpose processors. Software modules are included for every aspect of the autonomous driving pipeline, from object detection, classification and segmentation to map localization and path planning.

End-to-End Solution for Deep Learning
NVIDIA delivers an end-to-end solution — consisting of NVIDIA DIGITS™ and DRIVE PX 2 — for both training a deep neural network, as well as deploying the output of that network in a car.

DIGITS is a tool for developing, training and visualizing deep neural networks that can run on any NVIDIA GPU-based system — from PCs and supercomputers to Amazon Web Services and the recently announced Facebook Big Sur Open Rack-compatible hardware. The trained neural net model runs on NVIDIA DRIVE PX 2 within the car.

Strong Market Adoption
Since NVIDIA delivered the first-generation DRIVE PX last summer, more than 50 automakers, tier 1 suppliers, developers and research institutions have adopted NVIDIA’s AI platform for autonomous driving development. They are praising its performance, capabilities and ease of development.

“Using NVIDIA’s DIGITS deep learning platform, in less than four hours we achieved over 96 percent accuracy using Ruhr University Bochum’s traffic sign database. While others invested years of development to achieve similar levels of perception with classical computer vision algorithms, we have been able to do it at the speed of light.”
— Matthias Rudolph, director of Architecture Driver Assistance Systems at Audi

“BMW is exploring the use of deep learning for a wide range of automotive use cases, from autonomous driving to quality inspection in manufacturing. The ability to rapidly train deep neural networks on vast amounts of data is critical. Using an NVIDIA GPU cluster equipped with NVIDIA DIGITS, we are achieving excellent results.”
— Uwe Higgen, head of BMW Group Technology Office USA

“Due to deep learning, we brought the vehicle’s environment perception a significant step closer to human performance and exceed the performance of classic computer vision.”
— Ralf G. Herrtwich, director of Vehicle Automation at Daimler

“Deep learning on NVIDIA DIGITS has allowed for a 30X enhancement in training pedestrian detection algorithms, which are being further tested and developed as we move them onto NVIDIA DRIVE PX.”
— Dragos Maciuca, technical director of Ford Research and Innovation Center

The DRIVE PX 2 development engine will be generally available in the fourth quarter of 2016. Availability to early access development partners will be in the second quarter.

http://videocardz.com/58075/nvidia-...nm-finfet-pascal-powered-mobile-supercomputer
 