GPU Infrastructure Glossary
Terms, technologies, and concepts in AI compute infrastructure.
A
- All-Reduce Operation
All-reduce is a collective communication operation that aggregates data from all participating processes and distributes the combined result back to every process. In distributed training it is the standard mechanism for synchronising gradients across GPUs after each backward pass, and its speed is a key determinant of scaling efficiency.
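As a minimal sketch, here is how a gradient all-reduce might look with PyTorch's torch.distributed, assuming a process group has already been initialised (for example via torchrun); the helper name is illustrative:

```python
import torch
import torch.distributed as dist

def average_gradients(model: torch.nn.Module) -> None:
    """Sum each gradient across all ranks, then divide by world size."""
    world_size = dist.get_world_size()
    for param in model.parameters():
        if param.grad is not None:
            # All-reduce: every rank ends up with the summed gradient.
            dist.all_reduce(param.grad, op=dist.ReduceOp.SUM)
            param.grad /= world_size
```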
B
- Bare Metal
Bare metal refers to GPU servers accessed without virtualisation or hypervisor overhead; the customer receives direct hardware access to the entire server, including its GPUs, CPUs, memory, and network interfaces. This maximises performance and control at the cost of the elasticity that virtualised offerings provide.
C
- Colocation
Colocation (colo) is a data centre service model where the facility provider supplies power, cooling, physical security, and rack space, while the customer owns, installs, and operates its own IT hardware within the facility.
- Capacity Planning
Capacity planning is the discipline of forecasting GPU compute demand and aligning infrastructure procurement, deployment, and financing with that forecast, so capacity arrives neither too late to capture demand nor so early that expensive hardware sits idle.
- Cluster Utilisation
Cluster utilisation measures the percentage of available GPU capacity that is actively generating revenue at any given time. It is a primary driver of GPU cloud profitability, because idle GPUs still incur depreciation, power, and facility costs.
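As a back-of-envelope illustration with made-up numbers:

```python
# Hypothetical month for a 1,024-GPU cluster (all figures illustrative).
gpus = 1024
hours_in_month = 730
available_gpu_hours = gpus * hours_in_month          # 747,520
sold_gpu_hours = 560_000                             # revenue-generating hours

utilisation = sold_gpu_hours / available_gpu_hours
print(f"Cluster utilisation: {utilisation:.1%}")     # ~74.9%
```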
D
- Distributed Training
Distributed training is the practice of splitting a deep learning training workload across multiple GPUs, within a node or across many nodes, so that models too large or too slow for a single device can be trained in practical timeframes. The main strategies are data, tensor, and pipeline parallelism.
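A minimal data-parallel sketch using PyTorch's DistributedDataParallel, assuming a torchrun launch; the model dimensions are placeholders:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Minimal data-parallel setup; assumes launch via `torchrun --nproc_per_node=8`.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(4096, 4096).to(local_rank)
model = DDP(model, device_ids=[local_rank])  # gradients sync via all-reduce
```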
E
- Edge Compute
Edge compute deploys GPU processing capacity at the network edge, closer to end users and data sources rather than in centralised hyperscale facilities. This reduces inference latency and keeps data local where bandwidth or regulation demands it.
G
- GPU-as-a-Service (GPUaaS)
GPU-as-a-Service is a cloud delivery model where GPU compute capacity is rented on-demand or via reservation, rather than purchased as hardware. Customers pay per GPU-hour and avoid the capital outlay, procurement lead times, and operational burden of owning accelerators.
- GPU Memory (VRAM)
GPU memory (VRAM, typically HBM, or High Bandwidth Memory, on data centre GPUs) is the on-package memory that stores model parameters, activations, gradients, and optimiser state. Its capacity and bandwidth frequently determine whether a model fits on a single GPU or must be sharded across several.
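A rough sizing heuristic, using the common 16-bytes-per-parameter rule of thumb for mixed-precision Adam training (an approximation that ignores activations and framework overhead):

```python
# Rough VRAM estimate for training a model with Adam in mixed precision.
params_billions = 7
bytes_per_param = 2 + 2 + 4 + 4 + 4   # fp16 weights + fp16 grads + fp32 master
                                      # copy + Adam m and v states (fp32 each)
vram_gb = params_billions * 1e9 * bytes_per_param / 1e9
print(f"~{vram_gb:.0f} GB before activations")   # ~112 GB for a 7B model
```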
H
- Hyperscaler
A hyperscaler is a cloud infrastructure provider operating at massive global scale, specifically AWS (Amazon), Azure (Microsoft), and Google Cloud (Google), with Oracle and a few others sometimes included. Hyperscalers build their own data centres worldwide and offer broad service portfolios that extend well beyond GPU compute.
I
- InfiniBand
InfiniBand is a high-bandwidth, low-latency networking technology developed by Mellanox (now NVIDIA Networking) that serves as the dominant interconnect for large GPU training clusters. Its support for RDMA (remote direct memory access) lets GPUs exchange data without involving the host CPU, which is critical for fast collective operations.
- Immersion Cooling
Immersion cooling is a thermal management technique where IT equipment is fully submerged in a thermally conductive but electrically insulating (dielectric) fluid, which removes heat far more effectively than air and enables very high rack power densities.
- Inference Endpoint
An inference endpoint is a deployed model serving layer that accepts input data and returns predictions or generated content over a network API, typically HTTP or gRPC. Endpoints are the unit at which inference capacity is scaled, monitored, and billed.
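A minimal sketch of an HTTP endpoint using FastAPI; the route path and the trivial stand-in for a model are placeholders for a real serving stack:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class GenerateRequest(BaseModel):
    prompt: str

@app.post("/v1/generate")
def generate(req: GenerateRequest) -> dict:
    # Placeholder: a real endpoint would invoke a loaded model here.
    output = req.prompt.upper()
    return {"completion": output}
```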
L
- Liquid Cooling
Liquid cooling encompasses any data centre thermal management approach that uses liquid, typically water or a dielectric fluid, to carry heat away from IT equipment. Variants include direct-to-chip cold plates, rear-door heat exchangers, and full immersion; modern high-density GPU racks increasingly require it.
N
- Neocloud
A neocloud is a GPU-focused cloud provider that emerged outside the hyperscaler ecosystem to serve AI and high-performance computing workloads, CoreWeave and Lambda being prominent examples. Neoclouds typically compete on GPU availability, price, and specialisation rather than breadth of services.
- NVLink
NVLink is NVIDIA's proprietary high-speed interconnect for GPU-to-GPU communication within a single node. Unlike InfiniBand, which connects nodes across a cluster network, NVLink links GPUs directly (and, via NVSwitch, all GPUs in a node) at bandwidths far exceeding PCIe.
- Network Topology
Network topology describes the physical and logical arrangement of interconnections between nodes in a GPU cluster. The choice of design, with fat-tree and rail-optimised layouts being common, determines how much bandwidth is available between any two GPUs and how well collective operations scale.
- Network Bandwidth
Network bandwidth is the maximum data transfer rate of a network connection, measured in gigabits per second (Gb/s) or terabits per second (Tb/s). In GPU clusters, per-GPU bandwidth constrains how quickly gradients and activations can be exchanged during distributed training.
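A quick illustration of how link speed translates into transfer time (payload size chosen arbitrarily):

```python
# Time to move a 10 GB gradient shard over different link speeds.
payload_gb = 10
for gbps in (100, 400, 800):
    seconds = payload_gb * 8 / gbps   # GB -> gigabits, divided by Gb/s
    print(f"{gbps} Gb/s link: {seconds:.2f} s")
# 100 Gb/s -> 0.80 s, 400 Gb/s -> 0.20 s, 800 Gb/s -> 0.10 s
```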
- Network Latency
Network latency is the time delay for data to travel between two points in a network, measured in microseconds (µs) or milliseconds (ms). It matters most for small, frequent messages, such as the synchronisation steps of collective operations, where it can dominate total transfer time.
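The alpha-beta cost model (time = latency + size / bandwidth) shows why latency dominates small transfers; the figures below are illustrative:

```python
# Alpha-beta model: time = latency + size / bandwidth.
latency_s = 2e-6            # 2 µs per message
bandwidth_bps = 400e9 / 8   # 400 Gb/s expressed in bytes per second

for size_bytes in (1_000, 1_000_000, 1_000_000_000):
    t = latency_s + size_bytes / bandwidth_bps
    share = latency_s / t
    print(f"{size_bytes:>13,} B: {t*1e6:9.1f} µs (latency is {share:.0%} of it)")
# Latency dominates the 1 kB message but is negligible at 1 GB.
```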
P
- Pipeline Parallelism
Pipeline parallelism distributes different layers of a neural network across multiple GPUs or nodes, with each stage processing its portion of the model in sequence. Micro-batches are streamed through the stages to keep every GPU busy and limit the idle "bubble" time at the start and end of each batch, as the sketch below illustrates.
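Under the standard bubble-fraction formula for a GPipe-style schedule, (stages − 1) / (micro-batches + stages − 1), the idle share falls as the micro-batch count grows; a quick check with hypothetical values:

```python
# Pipeline bubble fraction for a naive GPipe-style schedule.
stages = 8          # pipeline depth (GPUs)
micro_batches = 32  # micro-batches per global batch

bubble = (stages - 1) / (micro_batches + stages - 1)
print(f"Idle 'bubble' share: {bubble:.1%}")   # ~17.9%; more micro-batches help
```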
- Power Density
Power density measures the electrical power consumed per unit of data centre floor space, typically expressed as kilowatts (kW) per rack. AI training racks now commonly draw 40 kW or more, with liquid-cooled GPU racks exceeding 100 kW, versus the 5-10 kW typical of traditional enterprise facilities.
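A quick rack-level sum with hypothetical server figures:

```python
# Rack power for a hypothetical 8-GPU air-cooled server build.
servers_per_rack = 4
kw_per_server = 10.2          # e.g. an 8-GPU HGX-class server at full load
overhead_kw = 2.0             # switches, fans, management gear

rack_kw = servers_per_rack * kw_per_server + overhead_kw
print(f"~{rack_kw:.0f} kW per rack")   # ~43 kW, far above legacy 5-10 kW racks
```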
- Power Usage Effectiveness (PUE)
PUE is the ratio of total facility energy to IT equipment energy, measuring how efficiently a data centre delivers power to its computing equipment. A PUE of 1.0 would mean every watt reaches the IT load; efficient modern facilities typically operate between roughly 1.1 and 1.4.
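The calculation itself is a simple ratio; the figures below are illustrative:

```python
# PUE = total facility energy / IT equipment energy.
it_energy_mwh = 10_000
cooling_and_overhead_mwh = 2_500

pue = (it_energy_mwh + cooling_and_overhead_mwh) / it_energy_mwh
print(f"PUE: {pue:.2f}")   # 1.25 -> 25% of IT energy again spent on overhead
```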
Q
- Quantisation
Quantisation reduces the numerical precision of model weights and activations, from 32-bit floating point (FP32) to 16-bit (FP16/BF16), 8-bit (INT8/FP8), or even 4-bit formats, shrinking memory footprint and raising throughput, usually at a small cost in accuracy.
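A minimal sketch of symmetric INT8 quantisation with NumPy (one of several quantisation schemes; the scaling choice here is illustrative):

```python
import numpy as np

# Symmetric INT8 quantisation of a weight tensor.
weights = np.random.randn(4, 4).astype(np.float32)

scale = np.abs(weights).max() / 127          # map the largest weight to ±127
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
dequantised = q.astype(np.float32) * scale   # approximate reconstruction

print("max error:", np.abs(weights - dequantised).max())
print("memory: 4 bytes/weight ->", q.itemsize, "byte/weight")
```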
R
- Reserved Instances
Reserved instances are GPU compute resources purchased via a time-bound commitment, typically 1, 6, 12, or 36 months, in exchange for a significant discount to on-demand pricing. Providers favour reservations because they make revenue and utilisation predictable; customers trade flexibility for cost certainty.
S
- Spot Instances
Spot instances are GPU compute resources offered at variable, discounted pricing with the caveat that the provider can reclaim the capacity at short notice. They suit fault-tolerant, checkpointed workloads and let providers monetise capacity that would otherwise sit idle; see the cost comparison below.
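A hypothetical cost comparison across pricing tiers; none of these rates are real provider prices, and the preemption overhead is an assumption:

```python
# Effective cost of the same 10,000 GPU-hour job at different pricing tiers.
on_demand_rate = 4.00    # $/GPU-hour
reserved_rate = 2.60     # 12-month commitment
spot_rate = 1.60         # interruptible
spot_overhead = 1.10     # ~10% extra hours lost to preemptions and restarts

gpu_hours = 10_000
for label, cost in [
    ("on-demand", gpu_hours * on_demand_rate),
    ("reserved", gpu_hours * reserved_rate),
    ("spot", gpu_hours * spot_rate * spot_overhead),
]:
    print(f"{label:>9}: ${cost:,.0f}")
```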
- Sovereign Compute
Sovereign compute refers to nationally controlled GPU and AI infrastructure operated within a country's borders, subject to its laws and jurisdiction rather than those of a foreign provider's home state. Governments pursue it to secure access to AI capability and keep sensitive data under domestic control.
T
- Tensor Parallelism
Tensor parallelism is a distributed computing strategy that splits individual neural network layers across multiple GPUs, with each device holding a shard of the layer's weight matrices and computing a partial result that collectives then reassemble. Because it communicates on every layer, it demands very high interconnect bandwidth and is usually confined to GPUs within a single node.
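A minimal single-process sketch of a column-parallel linear layer, using NumPy arrays to stand in for two GPUs' shards:

```python
import numpy as np

# Column-parallel linear layer split across two "GPUs".
x = np.random.randn(8, 512).astype(np.float32)      # activations (replicated)
w = np.random.randn(512, 1024).astype(np.float32)   # full weight matrix

w0, w1 = np.split(w, 2, axis=1)       # each device holds half the columns
y0, y1 = x @ w0, x @ w1               # partial results computed in parallel
y = np.concatenate([y0, y1], axis=1)  # an all-gather reassembles the output

assert np.allclose(y, x @ w)          # identical to the unsharded layer
```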
- Training Cluster
A training cluster is a tightly coupled array of GPU nodes connected via high-bandwidth interconnects, typically InfiniBand or RDMA-capable Ethernet, engineered so that hundreds or thousands of GPUs can work on a single training job as if they were one machine.
U
- Unit Economics
Unit economics in GPU infrastructure refers to the revenue, cost, and margin analysis at the per-GPU or per-MW level. The core inputs are revenue per GPU-hour, utilisation, and cost per GPU-hour covering depreciation, power, facility, network, and staffing; together they determine margin and payback period on deployed capital.
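A hypothetical per-GPU monthly model; every figure below is an assumption for illustration:

```python
# Per-GPU unit economics over one month (all figures hypothetical).
revenue_per_gpu_hour = 2.50
utilisation = 0.75
hours = 730

revenue = revenue_per_gpu_hour * utilisation * hours          # ~$1,369
costs = {
    "depreciation": 30_000 / 48,   # $30k GPU+server share over 48 months
    "power": 0.9 * hours * 0.08,   # ~0.9 kW average draw at $0.08/kWh
    "facility/network/staff": 250,
}
margin = revenue - sum(costs.values())
print(f"revenue ${revenue:,.0f}, cost ${sum(costs.values()):,.0f}, "
      f"margin ${margin:,.0f}/GPU/month")
```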
W
- Water Usage Effectiveness (WUE)
WUE measures the litres of water consumed per kilowatt-hour of IT energy, quantifying a data centre's water footprint. The metric has gained prominence as communities scrutinise the water draw of large AI facilities, especially those relying on evaporative cooling.
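The calculation is a simple ratio; the figures below are illustrative:

```python
# WUE = litres of water consumed / kWh of IT energy.
water_litres = 1_800_000
it_energy_kwh = 10_000_000

wue = water_litres / it_energy_kwh
print(f"WUE: {wue:.2f} L/kWh")   # 0.18 L/kWh; evaporative designs run far higher
```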
Each term includes a definition, technical context, and its relevance to GPU infrastructure decision-making. The glossary is updated as technology and market terminology evolve.