GPU-as-a-Service (GPUaaS)
GPU-as-a-Service is a cloud delivery model in which GPU compute capacity is rented on-demand or via reservation rather than purchased outright. Customers access GPU instances through APIs, web consoles, or orchestration platforms without managing the underlying hardware. GPUaaS providers handle procurement, rack deployment, cooling, networking, and maintenance. Pricing is typically per-GPU-hour, with discounts for longer commitments: roughly 25% for 12-month terms and 55% or more for 36-month contracts across major providers.
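The commitment-discount economics above can be sketched in a few lines. This is an illustrative model only: the base on-demand rate and the exact discount fractions are hypothetical examples, chosen to match the 25% (12-month) and 55%+ (36-month) ranges cited in the text, not any specific provider's rate card.

```python
# Illustrative sketch of GPUaaS commitment pricing.
# Rates and discount tiers are hypothetical, not real provider pricing.

def committed_rate(on_demand_rate: float, term_months: int) -> float:
    """Discounted per-GPU-hour rate for a given commitment term."""
    # term length (months) -> fractional discount off on-demand
    discounts = {0: 0.0, 12: 0.25, 36: 0.55}
    return on_demand_rate * (1 - discounts[term_months])

def total_cost(on_demand_rate: float, term_months: int, gpus: int,
               hours_per_month: int = 730) -> float:
    """Total spend for a reserved fleet over the full commitment term."""
    rate = committed_rate(on_demand_rate, term_months)
    return rate * gpus * hours_per_month * term_months

if __name__ == "__main__":
    base = 3.00  # hypothetical on-demand $/GPU-hour for a high-end GPU
    for term in (12, 36):
        rate = committed_rate(base, term)
        spend = total_cost(base, term, gpus=8)
        print(f"{term}-month term: ${rate:.2f}/GPU-hr, 8-GPU total ${spend:,.0f}")
```

Note the trade-off the calculation makes visible: the 36-month rate is less than half the on-demand rate, but the buyer is locked in for the full term regardless of utilisation, which is why utilisation assumptions dominate GPUaaS business-plan benchmarking.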
GPUaaS delivery spans the spectrum from fully managed Kubernetes clusters to bare-metal SSH access. Key differentiators include interconnect topology (fat-tree InfiniBand vs rail-optimised Ethernet), storage architecture (local NVMe vs shared parallel file systems), and scheduling capabilities (multi-tenant vs single-tenant isolation). The shift toward inference workloads is driving demand for fractional GPU access and serverless endpoints alongside traditional full-node reservations.
GPUaaS pricing and contract terms are central to our advisory work. We benchmark management business plans against live pricing data from 79 providers, covering 12,800+ data points. Understanding the real economics of GPUaaS, rather than the aspirational projections in pitch decks, is what separates informed investment decisions from uninformed ones.
This glossary is maintained by Disintermediate as a reference for GPU infrastructure professionals, investors, and operators. Each entry reflects terminology as used in active advisory engagements and market intelligence work.