Research

What Is a GPU and Why Does AI Need Them?

From graphics card to AI accelerator: the chip that changed everything.

[01]

The Chip That Was Not Built for This

A Graphics Processing Unit (GPU) was designed to draw pixels. When you move a character across a screen, the GPU calculates colour and light for millions of pixels simultaneously, updating the image 60 or more times per second. That task demands thousands of small processors working in parallel rather than one powerful processor working in sequence.

That architectural quirk turned out to be exactly what AI needed. A Central Processing Unit (CPU) is a general-purpose chip. It executes complex instructions in sequence, switching rapidly between tasks.

Modern CPUs have 8 to 128 cores, each capable of sophisticated logic. GPUs have thousands of smaller cores (NVIDIA's B200 has over 10,000 CUDA cores), each individually simple but collectively capable of enormous throughput when the same operation is applied to many data points at once. AI training is exactly that kind of workload.

Training a neural network involves multiplying large matrices of numbers together, billions of times, adjusting weights based on the results. It is not complex logic. It is repetitive arithmetic at scale. GPUs are extraordinarily good at it. CPUs are not.
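That core workload can be sketched in a few lines of NumPy. The layer and batch sizes below are illustrative, not taken from any real model; the point is that one dense layer is nothing but a large matrix multiplication:

```python
import numpy as np

# A toy "layer": multiply a batch of inputs by a weight matrix.
# Real networks repeat this billions of times during training.
batch = np.random.rand(1024, 768).astype(np.float32)   # 1,024 training examples
weights = np.random.rand(768, 768).astype(np.float32)  # learned parameters

activations = batch @ weights  # one dense layer's worth of arithmetic

# A matmul of (m x k) by (k x n) costs roughly 2*m*k*n floating-point operations.
flops = 2 * batch.shape[0] * weights.shape[0] * weights.shape[1]
print(f"{flops / 1e9:.1f} GFLOPs for a single layer on a single batch")
```

Every one of those multiply-adds is independent of its neighbours, which is precisely why the work can be spread across thousands of GPU cores.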

[02]

What Makes a GPU Different

The architectural difference is fundamental. A CPU optimises for latency: completing one task as fast as possible. A GPU optimises for throughput: completing many tasks simultaneously, even if each individual task takes slightly longer. CPUs dedicate most of their silicon to cache memory and control logic that helps individual cores make smart decisions quickly.

GPUs dedicate most of their silicon to arithmetic units. The tradeoff is deliberate. CPUs excel at branching logic (if this, then that, else something else), which describes most software. GPUs excel at uniform computation: apply this calculation to all 1,000 inputs simultaneously.

AI models, particularly deep neural networks, are structured as layers of matrix multiplications. Each layer transforms input data through millions of learned parameters. The GPU applies those transformations in parallel across the entire dataset. A CPU would process each data point in turn.

A GPU processes thousands simultaneously. For a dataset of 1 million training examples, the difference in training time is measured in days versus months. This is not a minor efficiency gain. It is what makes practical AI possible.
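The sequential-versus-parallel gap can be felt even on a CPU, by comparing an element-by-element loop with a vectorised operation that applies the same arithmetic to every input at once. This is only an analogy for the GPU pattern (the timings are illustrative and machine-dependent):

```python
import time

import numpy as np

data = np.random.rand(1_000_000).astype(np.float32)

# Sequential: one element at a time, like a single core working in turn.
t0 = time.perf_counter()
out_seq = np.empty_like(data)
for i in range(data.size):
    out_seq[i] = data[i] * 2.0 + 1.0
t_seq = time.perf_counter() - t0

# Vectorised: the same operation applied to all elements at once,
# the pattern a GPU scales across thousands of cores.
t0 = time.perf_counter()
out_vec = data * 2.0 + 1.0
t_vec = time.perf_counter() - t0

print(f"sequential: {t_seq:.3f}s, vectorised: {t_vec:.4f}s")
```

The two results are identical; only the execution strategy differs, and the vectorised path is orders of magnitude faster.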

[03]

VRAM: Why Memory Matters as Much as Compute

The second critical attribute of a GPU is its memory, called VRAM (Video RAM). Standard system RAM connects to the CPU via a relatively slow bus. GPU memory sits on the same package as the chip, connected via a very wide, very fast interface called HBM (High Bandwidth Memory).

NVIDIA's B200 includes 192GB of HBM3e memory with 8 terabytes per second of memory bandwidth. That bandwidth figure matters enormously. Training a large language model requires keeping the entire model in memory while processing data.

GPT-3, with 175 billion parameters, requires roughly 350GB of VRAM in FP16 format. No single GPU holds that; it must be distributed across multiple GPUs connected by fast interconnects. The size of the model you can train;or serve for inference;is directly constrained by how much VRAM you can access simultaneously.
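The 350GB figure is simple arithmetic: parameter count times bytes per parameter, with FP16 using 2 bytes. A quick sketch (weights only; gradients, optimiser state, and activations multiply the requirement further during training):

```python
# Memory needed just to hold the model weights.
params = 175e9          # GPT-3 parameter count
bytes_per_param = 2     # FP16 stores each parameter in 2 bytes

weight_bytes = params * bytes_per_param
print(f"{weight_bytes / 1e9:.0f} GB of VRAM for weights alone")

gpu_vram = 80e9         # one H100's 80GB
print(f"spans at least {weight_bytes / gpu_vram:.1f} GPUs' worth of memory")
```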

This is why the GPU market segments on memory as much as on compute. An H100 with 80GB is useful. An H100 NVL with 188GB is substantially more capable. An NVL72 rack with 72 B200s sharing 13.5TB of pooled HBM is a different category altogether.

[04]

Why AI Cannot Simply Use CPUs

The question comes up often. The honest answer: AI training can run on CPUs, but at a cost that renders it economically unviable at scale.

Training GPT-3 on CPUs would have required hundreds of years of compute time. With GPUs, it took roughly 355 GPU-years, completed in weeks by running thousands of GPUs in parallel.
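The back-of-envelope maths behind "GPU-years versus weeks" (355 GPU-years is the commonly cited estimate for GPT-3; the cluster size here is illustrative):

```python
gpu_years = 355        # total compute, expressed as one GPU running for 355 years
gpus = 10_000          # illustrative cluster size

wall_clock_years = gpu_years / gpus
wall_clock_days = wall_clock_years * 365
print(f"{wall_clock_days:.0f} days of wall-clock time with {gpus:,} GPUs")
```

The total arithmetic is fixed; parallelism only compresses the calendar time, which is exactly what the GPU architecture permits.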

The mathematical structure of neural networks;matrix multiplications, convolutions, attention operations;maps almost perfectly onto GPU architecture. Every advancement in GPU hardware has been co-optimised with AI workloads.

NVIDIA's Tensor Cores, introduced in the Volta architecture and refined through Ampere, Hopper, and Blackwell, are dedicated silicon for mixed-precision matrix multiplication, the single most common operation in AI training. These cores do not exist on CPUs. They are purpose-built for AI. Alternative chips exist (Google's TPUs, AWS's Trainium, Cerebras's wafer-scale processors), but they all follow the same principle: massive parallelism dedicated to the arithmetic of AI. The GPU simply got there first and built the largest software ecosystem around it.
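The numeric idea behind mixed precision can be sketched in NumPy (purely illustrative; NumPy runs on the CPU and does not use Tensor Cores):

```python
import numpy as np

# FP16 inputs, as fed to Tensor Cores.
a = np.random.rand(128, 128).astype(np.float16)
b = np.random.rand(128, 128).astype(np.float16)

# Tensor Cores multiply FP16 inputs but accumulate the sums in FP32,
# emulated here by upcasting before the matmul.
c_fp32_accum = a.astype(np.float32) @ b.astype(np.float32)

# Storing the result back in FP16 rounds it, trading a little precision
# for half the memory and bandwidth per value.
c_fp16 = c_fp32_accum.astype(np.float16)

err = np.max(np.abs(c_fp32_accum - c_fp16.astype(np.float32)))
print(f"max rounding error from FP16 storage: {err:.4f}")
```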

[05]

The Ecosystem That Locks It In

NVIDIA's competitive advantage is not just hardware. CUDA (Compute Unified Device Architecture) is the programming layer that sits between software and GPU silicon. Launched in 2006, CUDA gave developers a way to write general-purpose programs that run on GPUs.

Every major AI framework (PyTorch, TensorFlow, JAX) is built on CUDA. Every AI researcher learns CUDA-compatible tooling. Every optimisation library (cuDNN, cuBLAS, TensorRT) targets CUDA.

Switching away from NVIDIA GPUs means rewriting or recompiling against different tooling, accepting performance degradation during the transition, and rebuilding institutional knowledge. This is not impossible (AMD's ROCm platform is advancing), but it is costly. The result: NVIDIA commands roughly 80-90% of the AI training accelerator market.

That share is not maintained by hardware alone. It is maintained by 18 years of software ecosystem development. If you are building AI infrastructure, understanding this dynamic matters. The GPU you choose is also a software platform decision. Disintermediate works with operators and investors navigating hardware strategy decisions;contact us at disintermediate.global/contact.

Key Takeaways
01

GPUs have thousands of small parallel cores optimised for throughput; CPUs have fewer, more powerful cores optimised for latency. AI needs throughput

02

VRAM bandwidth (up to 8TB/s on B200 HBM3e) determines how large a model you can train or serve; memory is as important as compute

03

Training GPT-3 on CPUs would take hundreds of years; GPUs completed it in weeks by parallelising matrix arithmetic across thousands of cores

04

NVIDIA's CUDA ecosystem, built over 18 years, is as much a moat as the hardware; switching costs are real and significant

05

Alternative accelerators (TPUs, Trainium, Gaudi) follow the same parallel architecture principle but face CUDA's deep software entrenchment

Next Steps

This analysis is produced by Disintermediate, drawing on data from the GPU intelligence platform tracking 2,800+ companies across 72 categories, real-time GPU pricing from 70+ providers, and advisory engagement experience across the GPU infrastructure value chain.