Cloud GPU Pricing Comparison 2026

Why GPU Cloud Pricing Is So Confusing in 2026 

The GPU cloud market in 2026 is the most dynamic it has ever been, which means it’s also the most confusing.

Several overlapping forces are pulling prices in opposite directions simultaneously. H100 spot and marketplace rates have fallen 25–40% since Q1 2025 as H200 and B200 supply ramps up, yet several premium neocloud providers actually raised their published on-demand H100 rates in early 2026 (Lambda went from $2.99 to $3.99–$4.29; Verda went from $2.29 to $3.25). AWS cut its H100 pricing 44% in June 2025, which pressured the entire market – but its list price still starts at $6.88/GPU/hr, well above what neoclouds charge.

The result is a market where the same NVIDIA H100 80GB GPU can cost anywhere from $1.80 to $12.29 per hour depending on who you rent it from and under what terms. That’s a 6.8× price gap for identical hardware.

Understanding why that gap exists – and which price tier is appropriate for your workload – is the most valuable thing this guide delivers.

Three things that make GPU cloud pricing hard to compare:

Billing models differ. AWS bills per instance, not per GPU. An AWS p5.48xlarge has 8 H100s and costs $55.04/hr – divide by 8 and you get $6.88/GPU/hr. CoreWeave bills per GPU but only sells 8-GPU clusters for H100. Vast.ai prices are per GPU but fluctuate by the hour. You have to normalize to per-GPU/hr to compare honestly, which most comparison articles don’t do.

On-demand vs. reserved vs. spot prices are quoted interchangeably. A provider quoting a 60%-off reserved price while a competitor quotes on-demand makes the cheaper-looking option misleading. This guide specifies billing tier for every number.

“Included” vs. “optional” features change total cost. Some providers include 50 GB ephemeral storage and free egress. Others charge $0.08–$0.12/GB for egress that adds hundreds or thousands of dollars per month on data-intensive workloads. Headline GPU price alone doesn’t capture this.
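The normalization step from the first point can be sketched in a few lines. This is a minimal illustration, not any provider's billing logic; `per_gpu_rate` is a hypothetical helper name, and the figures are the AWS numbers quoted above.

```python
def per_gpu_rate(instance_price_per_hr: float, gpus_per_instance: int) -> float:
    """Convert an instance-level hourly price into a per-GPU hourly rate."""
    if gpus_per_instance <= 0:
        raise ValueError("gpus_per_instance must be positive")
    return instance_price_per_hr / gpus_per_instance


# AWS p5.48xlarge: 8 H100s at $55.04/hr (figures quoted in this guide)
aws_h100 = per_gpu_rate(55.04, 8)
print(f"AWS p5.48xlarge: ${aws_h100:.2f}/GPU/hr")
```

Running every quote through the same conversion before comparing is what keeps an 8-GPU instance price from being read against a single-GPU price.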

The 4-Tier GPU Cloud Market, Explained 

The GPU cloud market has stratified into four distinct tiers, each with a fundamentally different business model and customer profile.

Tier 1: Hyperscalers (AWS, GCP, Azure)

The big three cloud providers offer GPU compute as one line item in a 200+ service catalog. Their GPU pricing is 4–7× higher than neoclouds for equivalent hardware, but they offer what neoclouds can’t: global presence in 30+ regions, an ecosystem of integrated services (databases, AI platforms, DevOps tools, identity management), enterprise SLAs backed by contractual commitments, and compliance certifications that regulated industries require.

Getting H100 instances on a hyperscaler also requires quota approval – requests, wait times of days to weeks, and justification of use case. For many teams, this friction is a dealbreaker.

Who belongs here: Organizations with large existing cloud estates, regulated industries (healthcare, finance, government) that require hyperscaler compliance certifications, teams where cross-service integration value outweighs the GPU cost premium.

Tier 2: Self-Service Neoclouds

This is where most AI teams will find the best combination of price, reliability, and capability. Neoclouds are cloud providers built specifically for GPU-as-a-Service: CoreWeave, Lambda Labs, Nebius, Hyperstack, Verda (formerly DataCrunch), Crusoe, and others.

Self-service H100 pricing at neoclouds spans roughly $1.80–$6.16/GPU/hr depending on provider and configuration. Most offer self-service provisioning in minutes with no quota approval. The best neoclouds offer SOC 2 Type II compliance, managed Kubernetes, Slurm cluster management, and InfiniBand networking for distributed training.

Key caveat from SaturnCloud’s June 2026 GPU Report: many neoclouds market high-speed storage (VAST Data, WEKA) and InfiniBand fabrics, but these are often available only in reserved or bespoke contracts – not self-service on-demand. CoreWeave is the clearest exception where enterprise storage is selectable in self-service provisioning. Always verify what’s actually available without a sales call.

Who belongs here: Most AI startups and enterprise teams who need reliable GPU access at meaningful scale without hyperscaler pricing.

Tier 3: Marketplace Platforms

RunPod, Vast.ai, and SF Compute aggregate GPU supply from third-party providers – data centers, mining operations, research institutions – and present it through a unified marketplace. Prices are the lowest available (H100 from $1.38–$2.30/hr) but reliability varies because you’re renting from individual hardware operators.

Who belongs here: Budget-constrained researchers, teams with fault-tolerant checkpoint-based training workloads, experienced ML engineers comfortable managing infrastructure variability.

Tier 4: Serverless GPU Platforms

Modal and similar serverless platforms bill per second of GPU execution time, with no minimum. For bursty inference workloads or short iterative jobs, this eliminates idle compute costs entirely. Not appropriate for continuous long-running training.

Who belongs here: Teams building inference APIs with variable traffic, batch processing pipelines, anyone who pays for idle GPU time with always-on providers.


Master Pricing Table: H100 80GB SXM Across Every Major Provider


All prices are per GPU per hour, on-demand unless otherwise noted. Verified June 2026.

Provider | Tier | H100 Rate | Min GPUs | Billing | Free Egress | Notes
Vast.ai | Marketplace | $1.38–$1.87 | 1 | Hourly | No | P2P marketplace; prices vary by host
SF Compute | Marketplace | from $1.82 | 1 | Hourly | Yes | Fluctuates with supply
Hyperstack | Neocloud | ~$1.60 | 1 | Hourly | No | Competitive, growing inventory
Modal | Serverless | ~$1.50 eff. | Serverless | Per-second | Minimal | No idle costs; best for inference
RunPod Community | Marketplace | $1.99 | 1 | Hourly | No | Variable reliability by host
RunPod Secure | Neocloud | $2.39 | 1 | Hourly | No | SLA-backed, more reliable
Nebius | Neocloud | $2.95 | 1 | Hourly | Yes | EU data centers; stable pricing
Lambda Labs | Neocloud | $2.49 (PCIe) / $3.29–$4.29 (SXM) | 1 | Hourly | Yes | Free egress; raised SXM rates in 2026
Verda (DataCrunch) | Neocloud | $3.25 (up from $2.29) | 1 | Hourly | Yes | EU-based; SOC 2 Type II
CoreWeave | Neocloud | $6.16/GPU (8× cluster) | 8 | Hourly | No | Reserved: ~$2.47/GPU (60% off)
DigitalOcean | Neocloud | $1.99 | 1 | Hourly | Partial | Simple setup; limited GPU selection
AWS P5 | Hyperscaler | $6.88 | 8 (instance) | Hourly | No | 44% price cut June 2025; quota required
GCP A3 | Hyperscaler | $10.98 | 8 (instance) | Hourly | No | Highest-cost hyperscaler; quota required
Azure ND H100 v5 | Hyperscaler | $12.29 | 8 (instance) | Hourly | No | List price; Spot 60–80% off


Key insight:
Self-service H100 pricing now spans roughly $1.80–$6.16/hr depending on provider, form factor, and commitment, compared to $6.88/hr on AWS, $10.98/hr on GCP, and $12.29/hr on Azure. At list price, Azure’s H100 costs about 6.8× the $1.80 low end of that self-service range, and about 8.9× Vast.ai’s $1.38 marketplace low, for the same GPU.

Full GPU Pricing by Provider

Hyperscalers

Amazon Web Services (AWS)

H100 SXM instance: p5.48xlarge – 8× H100 80GB SXM, $55.04/hr ($6.88/GPU)
Quota process: Support ticket required; approval takes days to weeks
Spot pricing: Available; can reduce costs 60–70% for interruptible workloads
Reserved: 1-year Savings Plans reduce cost ~30–35%
Free egress: No – $0.087/GB outbound

AWS cut its H100 pricing by approximately 44% in June 2025, making it the most affordable hyperscaler GPU option. It remains roughly 4× more expensive than comparable neocloud on-demand rates, but for teams deeply embedded in the AWS ecosystem – with large S3 data lakes, SageMaker pipelines, EKS clusters – the cross-service integration value is real.

AWS’s GPU quota system is the most significant friction point. Unlike neoclouds where you provision in minutes, AWS H100 access typically requires a support ticket explaining your use case. Approvals can take days.

Full AWS P5-family GPU pricing (p5e instances carry H200 GPUs):

Instance | GPUs | GPU VRAM | vCPUs | RAM | On-Demand/hr | Per GPU/hr
p5.48xlarge | 8× H100 SXM | 640 GB | 192 | 2,048 GB | $55.04 | $6.88
p5e.48xlarge | 8× H200 SXM | 1,128 GB | 192 | 2,048 GB | $61.12 | $7.64

Best for: AWS-native enterprises with large existing infrastructure investment; SageMaker users; teams receiving AWS startup credits.

Google Cloud Platform (GCP)

H100 SXM instance: a3-highgpu-8g – 8× H100 80GB SXM, $87.84/hr ($10.98/GPU)
Quota process: Similar to AWS; region availability limited
Spot pricing: Available; ~60–70% discount
Free egress: No – standard GCP egress rates apply

GCP is the most expensive major hyperscaler for H100 GPUs at list price. Its differentiation is the Vertex AI platform and TPU ecosystem – for teams building on Google’s AI-native tooling, A3 GPU VMs enable hybrid GPU/TPU architectures that have no equivalent on other clouds.

GCP’s Jupiter network fabric provides high-bandwidth interconnect between A3 instances that is competitive with neocloud InfiniBand for many distributed training workloads.

Full GCP H100 GPU pricing:

Instance | GPUs | GPU VRAM | vCPUs | RAM | On-Demand/hr | Per GPU/hr
a3-highgpu-8g | 8× H100 SXM | 640 GB | 208 | 1,872 GB | $87.84 | $10.98
a3-megagpu-8g | 8× H100 SXM | 640 GB | 208 | 1,872 GB | $112.01 | $14.00

Best for: GCP-native teams, Vertex AI users, organizations building TPU + GPU hybrid training pipelines, BigQuery ML teams extending to GPU compute.

Microsoft Azure

H100 SXM instance: ND96isr H100 v5 – 8× H100 SXM, $98.32/hr ($12.29/GPU) at list price
NC H100 NVL v5: 1–2× H100 NVL (PCIe), unique to Azure
Spot pricing: Azure Spot VMs – 60–80% discount
Reserved: Up to 65% off with 3-year commitment
Confidential GPU: NCC H100 v5 – exclusive to Azure; TEE-protected GPU compute
Free egress: No

Azure’s list price is the highest of any major cloud, but this is rarely what enterprise customers pay. Azure Spot VMs at 60–80% off bring the effective rate to $2.46–$4.92/GPU/hr, which is competitive with premium neoclouds. Large enterprises on Azure Enterprise Agreements receive further negotiated discounts.
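The spot range quoted above is simply the list rate with the discount applied. A minimal sketch of that arithmetic, assuming the discount applies uniformly to the per-GPU list rate (`discounted_rate` is a hypothetical helper name):

```python
def discounted_rate(list_rate: float, discount: float) -> float:
    """Apply a fractional discount (0.60 means 60% off) to an hourly list rate."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return list_rate * (1.0 - discount)


azure_list = 12.29  # $/GPU/hr, ND96isr H100 v5 list price quoted above
best_case = discounted_rate(azure_list, 0.80)
worst_case = discounted_rate(azure_list, 0.60)
print(f"Azure Spot H100: ${best_case:.2f}–${worst_case:.2f}/GPU/hr")
```

Spot discounts move with demand, so the two ends of the range are better read as bounds than as a quoted price.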

Azure’s unique position is the confidential computing tier (NCC H100 v5) and the H100 NVL configuration (exclusive in cloud). No other major provider offers TEE-protected GPU compute – making Azure the only choice for regulated industries that need GPU acceleration without exposing data in memory.

Azure H100 GPU pricing (key VM series):

VM Series | GPUs | GPU Type | GPU VRAM | On-Demand/GPU/hr | Spot/GPU/hr
NC40ads H100 v5 | 1× H100 NVL | PCIe | 94 GB | ~$3.29 | ~$0.66–$1.32
NC80adis H100 v5 | 2× H100 NVL | PCIe | 188 GB | ~$3.29 | ~$0.66–$1.32
ND96isr H100 v5 | 8× H100 SXM | SXM | 640 GB | $12.29 | $2.46–$4.92
NCC H100 v5 | H100 Tensor Core | – | 80 GB | Contact sales | –

Best for: Azure-ecosystem enterprises, healthcare/finance/government needing confidential GPU, teams requiring HIPAA or FedRAMP compliance, large-scale distributed training on Azure.

Neoclouds:

CoreWeave

H100 SXM pricing: $6.16/GPU/hr on-demand (8-GPU cluster = $49.24/hr)
Reserved discount: Up to 60% off (effective ~$2.47/GPU/hr)
Spot pricing: Available
Min GPUs: 8 (most configurations)
Free egress: No
Compliance: SOC 2 Type II; managed Kubernetes; Slurm

CoreWeave is the enterprise neocloud of record: its customers include OpenAI, Mistral AI, and Jane Street. Its architecture – Kubernetes-native from the start, InfiniBand optional at 400 Gb/s per GPU, NVLink 4.0, GPU Direct RDMA – is built for exactly the scale of workloads those customers run.

GPU portfolio (2026):

GPU | Config | Total VRAM | On-Demand/GPU/hr | Total/hr
H100 SXM | NVL | 640 GB | $6.16 | $49.24
H200 | NVL | 1.1 TB | $6.30 | $50.44
A100 | NVL | 640 GB | $2.70 | $21.60
B200 | NVL | 1.4 TB | $8.60 | $68.80
GB200 | NVL72 | 744 GB | $10.50 | $42.00
L40S | – | 384 GB | $2.25 | $18.00
L40 | – | 384 GB | $1.25 | $10.00
GH200 | – | 96 GB | $6.50 | $6.50
RTX Pro 6000 | – | 768 GB | $2.50 | $20.00

Best for: Enterprise multi-node training (8+ GPUs), teams needing InfiniBand interconnect, organizations with Kubernetes expertise, long-running committed workloads on reserved pricing.

Lambda Labs

H100 PCIe pricing: $2.49/hr (1× GPU)
H100 SXM pricing: $3.29–$4.29/hr (rates raised in early 2026)
8× H100 SXM: ~$23.92/hr ($2.99/GPU)
Free egress: Yes – unlimited
Min GPUs: 1
Billing: Hourly

Lambda Labs’ free egress policy remains its most distinctive feature in 2026. Note that several self-service neoclouds raised published on-demand H100 rates in early 2026 (Lambda $2.99 to $3.99–$4.29) – making Lambda less of the budget option it once was, while its egress advantage grows more valuable relative to competitors who charge $0.08–$0.12/GB.

Lambda Labs GPU pricing (2026):

GPU Config | VRAM | vCPUs | RAM | Price/GPU/hr
H100 PCIe | 80 GB | 26 | 225 GB | $2.49
H100 SXM | 80 GB | varies | varies | $3.29–$4.29
H100 SXM (8×) | 640 GB | varies | varies | $2.99
B200 SXM | 180 GB | 26 | 360 GB | $4.99–$6.99
A100 40GB | 40 GB | 30 | 220 GB | $1.29
A100 80GB SXM (8×) | 640 GB | varies | varies | $1.10
RTX A6000 | 48 GB | 14 | 100 GB | $0.80
GH200 | 141 GB | – | – | $1.99

Best for: Teams that move large data volumes (free egress matters), researchers needing single-GPU access, startups wanting simplicity without Kubernetes expertise, teams not ready to commit to reserved instances.

Nebius AI Cloud

H100 SXM pricing: ~$2.95/hr (held steady in 2026 while others raised rates)
Free egress: Yes (standard tier)
Compliance: SOC 2 Type II
Data centers: Finland, Netherlands, France (EU), plus US
Min GPUs: 1

Nebius (spun out of Yandex) is the strongest EU-focused GPU cloud option. Its H100 rate of $2.95/hr has stayed stable while several competitors raised rates – and its EU data center presence in Finland, Netherlands, and France addresses GDPR and data residency requirements that most US-based neoclouds can’t meet. Nebius offers managed Kubernetes and Slurm, placing it in the top tier of neocloud enterprise maturity.

Nebius GPU pricing:

GPU | VRAM | Price/GPU/hr
H100 SXM | 80 GB | $2.95
A100 SXM | 80 GB | $1.65
H200 SXM | 141 GB | ~$4.50

Best for: EU teams with GDPR compliance requirements, organizations needing EU data residency, teams wanting mature neocloud infrastructure (SOC 2, Kubernetes, Slurm) at sub-$3/hr H100 pricing.

Hyperstack

H100 SXM pricing: ~$1.60/hr
A100 pricing: from $1.35/hr
Free egress: No
Min GPUs: 1
Billing: Hourly, no minimum commitment

Hyperstack occupies the sweet spot between marketplace unreliability and premium neocloud pricing. At ~$1.60/hr for H100 SXM with managed infrastructure (no marketplace variability), it offers some of the best price-performance available in the self-service neocloud tier. Growing EU and UK data center presence makes it increasingly relevant for European teams.

Best for: Startups and scale-ups wanting managed GPU infrastructure below Lambda pricing, EU/UK teams, teams needing H100 or A100 without minimum commitment.

Verda (formerly DataCrunch)

H100 SXM pricing: $3.25/hr (raised from $2.29 in early 2026)
B300 SXM pricing: Available (contact sales)
Free egress: Yes
Compliance: SOC 2 Type II
Data centers: EU-focused (Finland, expanding)
Services: Instances, clusters, serverless containers, managed inference endpoints

Verda is a notable EU-based neocloud with a broad service portfolio that goes beyond raw GPU rental: managed inference endpoints, serverless containers, and instant InfiniBand clusters. Its SOC 2 Type II certification and EU infrastructure make it competitive with Nebius for European enterprise teams. The rate increase in 2026 ($2.29 → $3.25) is worth noting for teams budgeting based on older pricing.

Verda GPU portfolio (2026):

GPU | VRAM | Notes
H100 SXM | 80 GB | $3.25/hr
H200 SXM | 141 GB | Available
B200 SXM | 180–192 GB | Available
B300 SXM | 262 GB | Available
GB300 NVL72 | – | New; available
A100 SXM | 80 GB | Available
RTX Pro 6000 | 96 GB | Available

Best for: EU teams needing managed inference alongside training infrastructure, organizations wanting a single EU provider for the full AI stack (training → serving).

Crusoe

H100 pricing: Available; contact sales for specific rates
Compliance: SOC 2 Type II; FedRAMP In Process
Infrastructure: InfiniBand clusters; managed Kubernetes; Slurm
Egress: Free

Crusoe differentiates on two dimensions: sustainability (it runs on otherwise-wasted gas flare energy) and government/defense suitability (FedRAMP In Process certification). It’s one of the few neoclouds working toward US federal compliance, making it relevant for government contractors and regulated industries that want neocloud pricing without hyperscaler lock-in.

Best for: Government agencies, federal contractors, sustainability-focused enterprises, teams needing FedRAMP-compatible infrastructure at neocloud pricing.

Marketplace Platforms

RunPod

H100 community cloud: ~$1.99/hr
H100 secure cloud: ~$2.39/hr
Spot (community): from ~$1.25/hr
RTX 4090: from $0.34/hr
Free egress: No
Billing: Hourly (on-demand); per-request (serverless)

RunPod’s dual-tier model gives teams meaningful choice. The community cloud (GPUs from independent operators) delivers the lowest managed-ish H100 rates in the market at $1.99/hr. The secure cloud (RunPod-managed infrastructure) adds reliability guarantees at $2.39/hr. Serverless GPU functions with sub-3-second cold starts make RunPod viable for inference workloads without always-on GPU costs.

RunPod GPU pricing:

GPU Config | VRAM | Community $/hr | Secure $/hr
RTX 4090 | 24 GB | $0.34 | $0.74
RTX 3090 | 24 GB | $0.22 | –
A100 SXM | 80 GB | $1.64 | $2.21
H100 PCIe | 80 GB | $1.99 | $2.39
H100 SXM | 80 GB | $1.99 | $2.39

Best for: Budget-conscious startups, researchers running checkpoint-based training, serverless inference workloads, teams wanting the widest GPU selection including consumer cards.

Vast.ai

H100 pricing: from $1.38/hr (marketplace low) to ~$2.30/hr (typical)
A100 spot: from $0.29/hr
RTX 4090: from $0.20/hr
Free egress: No
Reliability: Variable – host-dependent

Vast.ai is a pure peer-to-peer (P2P) GPU marketplace and offers the absolute lowest rates (H100 from $1.38/hr at the market low, $1.87–$2.30/hr typical), though with variable reliability. It is a good fit for research and checkpointed training workloads. The operational model requires that your training pipeline is genuinely fault-tolerant – hosts can take machines offline with limited warning.

Vast.ai GPU pricing (approximate market ranges, June 2026):

GPU | VRAM | Typical range | Market low
H100 SXM | 80 GB | $1.87–$2.30/hr | $1.38/hr
A100 80GB | 80 GB | $0.90–$1.50/hr | $0.29/hr (spot)
RTX 4090 | 24 GB | $0.20–$0.44/hr | $0.20/hr
H200 SXM | 141 GB | $3.00–$4.50/hr | varies

Best for: Research with checkpoint-based training, budget experimentation, teams comfortable with operational variability, experienced ML engineers.


GPU-by-GPU Pricing: A100, H100, H200, B200 

NVIDIA A100 80GB – The Established Workhorse

Still the most cost-effective GPU for many mid-scale training and inference workloads. A100 availability is high and prices have stabilized.

Provider | Config | Price/GPU/hr
Vast.ai | 1× (spot) | from $0.29
RunPod community | – | $1.64
Lambda Labs | 1× 40GB SXM | $1.29
Lambda Labs | 8× 80GB SXM | $1.10
Nebius | 1× SXM | $1.65
Hyperstack | – | from $1.35
CoreWeave | 8× NVL | $2.70
AWS (P4d) | 8× (instance) | $4.10

Verdict: A100 remains the best cost-per-TFLOP option for workloads that fit in 80 GB VRAM. For 7B–30B model training and most inference serving, A100 at $1.10–$1.65/hr from neoclouds is hard to beat.

NVIDIA H100 80GB – The Current Standard for AI Training

The most widely deployed GPU for frontier model training and high-throughput inference in 2026. Two configurations matter: PCIe (lower bandwidth, single-GPU friendly) and SXM (higher bandwidth, optimized for multi-GPU clusters).

Provider | Config | Price/GPU/hr
Vast.ai | 1× SXM (marketplace) | from $1.38
Hyperstack | 1× SXM | ~$1.60
RunPod community | – | $1.99
DigitalOcean | – | $1.99
Lambda Labs | 1× PCIe | $2.49
RunPod secure | – | $2.39
Nebius | 1× SXM | $2.95
Lambda Labs | 1× SXM | $3.29–$4.29
Verda | 1× SXM | $3.25
CoreWeave | 8× NVL (÷8) | $6.16
AWS P5 | 8× SXM (÷8) | $6.88
GCP A3 | 8× SXM (÷8) | $10.98
Azure ND H100 v5 | 8× SXM (÷8) | $12.29

Verdict: For single-GPU H100 access at managed infrastructure quality, RunPod secure ($2.39) and DigitalOcean ($1.99) offer strong value. For multi-GPU cluster training, CoreWeave reserved pricing (~$2.47/GPU after 60% discount) and Lambda’s 8× configuration ($2.99) are competitive. Hyperscalers remain 4–7× more expensive at list price.

NVIDIA H200 141GB – High-Memory Training and Inference

H200 (Hopper + HBM3e, 141 GB VRAM) is the right GPU when your model exceeds H100 VRAM capacity – 70B+ parameter inference without quantization, multimodal models with large context windows, or training configurations that exceed 80 GB.

Provider | Config | Price/GPU/hr
CoreWeave | 8× NVL (÷8) | $6.30
Nebius | 1× SXM | ~$4.50
Lambda Labs | – | ~$6.99
Azure ND H200 v5 | Available | Contact sales
GCP A3 Mega | 8× (÷8) | ~$14.00

Verdict: H200 pricing remains significantly higher than H100, and availability outside top-tier providers is limited. For most workloads that “need more VRAM,” quantized H100 inference or model parallelism across H100 GPUs is usually more cost-effective than moving to H200.

NVIDIA B200 192GB – Blackwell Generation, High Performance

The B200 (Blackwell architecture, 192 GB HBM3e, dramatically higher FP8 throughput) is now self-service on-demand at a handful of providers in 2026. GB200/GB300 remains largely reserved or contact-sales.

Provider | Config | Price/GPU/hr
Lambda Labs | 1× SXM | $4.99–$6.99
CoreWeave | 8× NVL (÷8) | $8.60
Verda | 1× SXM | Available
Hyperstack | Expanding | Contact

Verdict: B200 delivers meaningfully better FP8 inference throughput than H100, making it relevant for high-throughput inference serving. For most training workloads where throughput is not the bottleneck, H100 at 3–4× lower cost is more economical. Evaluate B200 specifically when inference throughput per dollar is your optimization target.

NVIDIA RTX 4090 24GB – Consumer GPU for Smaller Workloads

The RTX 4090 is a consumer GPU that punches far above its price class for inference and fine-tuning workloads that fit in 24 GB VRAM. At $0.20–$0.74/hr, it’s the most cost-effective option for LoRA fine-tuning on 7B models, small-scale inference, and developer testing.

Provider | Price/GPU/hr | Notes
Vast.ai | from $0.20/hr | Marketplace; variable
RunPod community | $0.34/hr | Variable reliability
RunPod secure | $0.74/hr | SLA-backed

Verdict: For workloads under 24 GB VRAM, RTX 4090 at $0.34–$0.74/hr is 4–8× cheaper than H100 with similar practical throughput for inference. Don’t rent an H100 for work that fits on a 4090.

The Hidden Costs Nobody Talks About 

Headline GPU $/hr is only one component of your actual GPU cloud bill. These four factors routinely change the ranking when teams do honest total cost comparisons.

Egress Costs

Most neoclouds charge zero egress, unlike hyperscalers ($0.087–0.12/GB). Lambda Labs, Nebius, Verda, SF Compute, and Crusoe all offer free egress. CoreWeave, RunPod, Vast.ai, and hyperscalers charge per GB.

The real impact: A team downloading 5 TB of model checkpoints per month from a provider that charges $0.10/GB pays ~$512/month in egress alone – $6,144/year. On Lambda or Nebius, that same data transfer is free. For data-intensive workflows, free egress can easily offset a $0.50–$1.00/hr premium in GPU pricing.
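The egress arithmetic above is easy to rerun for your own data volumes. The sketch below is a rough illustration: `monthly_egress_cost` is a hypothetical helper, it assumes 1 TB = 1,024 GB (which is what reproduces the $512 figure), and the 730 hours per month used for the per-hour equivalent is an assumption.

```python
def monthly_egress_cost(tb_per_month: float, usd_per_gb: float,
                        gb_per_tb: int = 1024) -> float:
    """Monthly egress bill for a given outbound volume and per-GB price."""
    return tb_per_month * gb_per_tb * usd_per_gb


monthly = monthly_egress_cost(5, 0.10)   # 5 TB/month at $0.10/GB
yearly = monthly * 12

# Express the egress bill as an hourly premium on one always-on GPU
# (730 hours per month is an assumed average).
per_gpu_hour = monthly / 730

print(f"Egress: ${monthly:,.0f}/month, ${yearly:,.0f}/year")
print(f"Equivalent to ${per_gpu_hour:.2f}/hr on a single always-on GPU")
```

Under these assumptions the egress bill works out to roughly $0.70 per GPU-hour for one always-on GPU, which is why free egress can outweigh a higher headline rate.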

Billing Granularity

Most providers bill hourly, rounding up. If your training run finishes in 23 minutes, you pay for 60. If you run 10 preprocessing jobs at 8 minutes each, you pay for 10 full hours. For teams running many short iterative jobs daily, this matters:

  • Hourly billing: Round up to next 60 minutes (most providers)
  • Per-minute billing: TensorDock, some RunPod configurations
  • Per-second billing: Modal – eliminates idle cost entirely for serverless workloads
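The effect of rounding can be sketched as below. The example assumes each job is billed separately and rounded up to the provider's increment; real providers may bill per instance lifetime instead, and the helper names are hypothetical.

```python
def billed_seconds(runtime_seconds: int, increment_seconds: int) -> int:
    """Round a single job's runtime up to the next billing increment."""
    increments = -(-runtime_seconds // increment_seconds)  # ceiling division
    return increments * increment_seconds


def billed_hours(job_minutes: list[int], increment_seconds: int) -> float:
    """Total billed hours when each job is rounded up independently."""
    total = sum(billed_seconds(m * 60, increment_seconds) for m in job_minutes)
    return total / 3600


jobs = [8] * 10  # ten preprocessing jobs of 8 minutes each

hourly = billed_hours(jobs, 3600)     # hourly billing
per_minute = billed_hours(jobs, 60)   # per-minute billing
per_second = billed_hours(jobs, 1)    # per-second billing

print(f"hourly: {hourly:.2f} h, per-minute: {per_minute:.2f} h, "
      f"per-second: {per_second:.2f} h")
```

With hourly rounding the ten jobs bill as 10 hours for 80 minutes of actual work; finer granularity bills close to the real runtime.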

Storage Costs

Provider | Persistent Storage | Notes
Lambda Labs | $0.20/GB/mo | Block storage
CoreWeave | $0.08/GB/mo | S3-compatible object storage
Nebius | Included tiers | Varies by plan
AWS | $0.08/GB/mo (S3) | Plus request fees
GCP | $0.02–$0.04/GB/mo | Coldline vs standard
Vast.ai | Separate from GPU | Host-provided

For a 10 TB training dataset stored long-term, storage costs range from about $200 to $2,000/month depending on provider – sometimes exceeding the GPU compute cost for iteration-heavy workflows where the dataset is stable.

Reserved vs On-Demand Gap

The difference between on-demand and committed reserved pricing at the same provider is often larger than the difference between providers:

  • CoreWeave on-demand H100: $6.16/GPU/hr
  • CoreWeave reserved H100: ~$2.47/GPU/hr (60% off)
  • Lambda on-demand H100 SXM: $3.29–$4.29/GPU/hr
  • Lambda on-demand vs CoreWeave reserved: Lambda is more expensive on-demand than CoreWeave on reserved

Before switching providers to save money, check whether reserved pricing at your current provider would be cheaper.
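One way to frame that check is as a utilization threshold. The sketch below assumes a reservation is paid for every hour of the term whether or not the GPU is busy, while on-demand capacity is paid for only while in use. That billing assumption and the helper name `reserved_break_even_utilization` are illustrative, not taken from any provider's contract.

```python
def reserved_break_even_utilization(on_demand_rate: float,
                                    reserved_rate: float) -> float:
    """Fraction of hours a GPU must be busy before reserving beats on-demand.

    Assumes reserved capacity is billed around the clock and on-demand
    capacity is billed only for hours actually used.
    """
    if on_demand_rate <= 0 or reserved_rate <= 0:
        raise ValueError("rates must be positive")
    return reserved_rate / on_demand_rate


# CoreWeave H100 figures quoted above: $6.16 on-demand, ~$2.47 reserved
threshold = reserved_break_even_utilization(6.16, 2.47)
print(f"Reserving pays off above {threshold:.0%} utilization")
```

Under these assumptions, a team that keeps its GPUs busy less than about 40% of the time would spend less staying on-demand, despite the 60% headline discount.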

Real Training Cost Scenarios



Scenario A: Fine-Tuning Llama 3.1 8B (LoRA, 1× A100, 40 Hours)

A typical research fine-tuning run for a 7B–8B parameter model.

Provider | Rate | 40hr Cost | Notes
Vast.ai | $0.90/hr | $36 | Variable reliability
RunPod community | $1.64/hr | $65.60 | Acceptable reliability
Lambda Labs | $1.29/hr (A100 40GB) | $51.60 | Free egress
Nebius | $1.65/hr | $66 | EU data centers
AWS P4d | $4.10/hr | $164 | 4× neocloud cost

Winner: Vast.ai or Lambda Labs A100, depending on reliability tolerance.

Scenario B: Training a 70B Model (8× H100 SXM, 2 Weeks Continuous)

A serious foundation model training run.

Provider | Config | Rate | 2-week Cost (8 GPUs × 336 hrs) | Notes
Vast.ai | 8× H100 | $1.87/GPU avg | $5,027 | Checkpoint required
RunPod secure | 8× H100 | $2.39/GPU | $6,424 | SLA-backed
Lambda Labs | 8× H100 | $2.99/GPU | $8,037 | Free egress
CoreWeave (on-demand) | 8× H100 | $6.16/GPU | $16,545 | Managed orchestration
CoreWeave (reserved) | 8× H100 | ~$2.47/GPU | $6,639 | Cheaper than Lambda on-demand
AWS P5 | 8× H100 | $6.88/GPU | $18,493 | AWS ecosystem

Winner on cost: Vast.ai. Winner on reliability + cost: RunPod secure cloud or CoreWeave reserved.
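The scenario cost is per-GPU rate × GPU count × hours. A minimal sketch for re-running it with your own rates follows; `training_run_cost` is a hypothetical helper, and the rates are the per-GPU figures quoted in this guide.

```python
def training_run_cost(rate_per_gpu_hr: float, gpus: int, hours: float) -> float:
    """Total compute cost for a run billed per GPU-hour."""
    return rate_per_gpu_hr * gpus * hours


TWO_WEEKS_HOURS = 14 * 24  # 336 hours of continuous training

# Per-GPU hourly rates quoted in this guide
rates = {
    "RunPod secure": 2.39,
    "Lambda Labs": 2.99,
    "AWS P5": 6.88,
}

costs = {name: training_run_cost(rate, 8, TWO_WEEKS_HOURS)
         for name, rate in rates.items()}

for name, cost in costs.items():
    print(f"{name}: ${cost:,.0f}")
```

This covers compute only; egress, storage, and any restarts after interruptions on marketplace hosts would come on top.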

Scenario C: Production Inference Serving (1× H100, Always-On, 6 Months)

An inference API serving a fintech application, always on.

Provider | Rate | 6-month cost | Egress (1TB/mo) | True 6-month total
Modal (serverless) | ~$1.50/hr eff. | ~$6,480 | Minimal | ~$6,700
RunPod secure | $2.39/hr | $10,450 | ~$600 | $11,050
Lambda Labs | $2.49/hr | $10,888 | Free | $10,888
Nebius | $2.95/hr | $12,882 | Free | $12,882
Azure Spot | ~$2.46/hr | $10,760 | ~$600 | $11,360
AWS P5 Spot | ~$2.75/hr | $12,018 | ~$600 | $12,618

Winner: Modal (serverless) for workloads with any quiet periods – it eliminates idle GPU costs. Lambda Labs for steady-state inference where free egress adds value.

Scenario D: Hyperparameter Search (100 short GPU jobs, 15 min each, 1× A100)

The billing granularity scenario – 100 jobs × 15 minutes = 25 actual GPU hours.

Provider | Billing | You Pay For | Effective rate | 25hr equivalent cost
Modal | Per-second | 25 GPU hours | $1.50/hr | $37.50
RunPod community | Hourly | 100 hours (rounded up) | $1.64/hr | $164
Lambda Labs | Hourly | 100 hours | $1.29/hr | $129

Winner: Modal – roughly 3.4–4.4× cheaper than the hourly-billed providers above for short iterative jobs.


Buy vs. Rent: When Does Owning GPUs Make Sense? 

This is a question Cantech’s clients ask regularly, and the answer is more nuanced than most “rent vs. buy” articles suggest.

The break-even math:

An NVIDIA H100 SXM server (8 GPUs) typically costs $250,000–$350,000 new in 2026 (prices have come down significantly from 2023–2024 peaks). At RunPod secure cloud H100 rates of $2.39/GPU/hr:

  • 8 GPUs × $2.39/hr × 8,760 hrs/year = $167,491/year for 24/7 operation
  • Break-even on a $300,000 server: ~1.8 years at full utilization

For organizations running GPUs at near-100% utilization for 2+ years, purchasing can be economical. But the true total cost of ownership for owned hardware includes:

  • Data center colocation or build-out cost
  • Power and cooling ($0.10–$0.15/kWh × 10+ kW per server ≈ $8,800–$13,100/year)
  • Hardware maintenance and support contracts
  • Engineering time for cluster management, firmware updates, driver maintenance
  • Opportunity cost of capital

Most teams that buy GPUs discover their utilization is 50–70%, not 100%, which stretches the 1.8-year figure to roughly 2.6–3.6 years even before power, colocation, and staffing are added – at which point the hardware is approaching end-of-support and newer GPU generations are available.

General rule: Rent unless you have consistent, predictable >80% GPU utilization for 24+ months with the operational staff to manage on-premise infrastructure. Very few AI teams meet all three criteria.
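The break-even reasoning above can be sketched as a small calculator. This is a simplified model: it treats the rental you would otherwise pay as scaling with utilization, lumps all ownership overhead into one annual figure, and ignores resale value and cost of capital. The function name and the $10,000/year overhead are illustrative assumptions.

```python
HOURS_PER_YEAR = 8760


def break_even_years(purchase_price: float, gpus: int,
                     rental_rate_per_gpu_hr: float,
                     utilization: float = 1.0,
                     annual_ownership_cost: float = 0.0) -> float:
    """Years until avoided rental spend covers the purchase price."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    avoided_rental = gpus * rental_rate_per_gpu_hr * HOURS_PER_YEAR * utilization
    net_saving = avoided_rental - annual_ownership_cost
    if net_saving <= 0:
        return float("inf")  # owning never pays back under these inputs
    return purchase_price / net_saving


# $300,000 8-GPU server vs. $2.39/GPU/hr rental (figures from this guide)
full = break_even_years(300_000, 8, 2.39)

# 60% utilization plus an assumed $10,000/year for power and cooling
typical = break_even_years(300_000, 8, 2.39, utilization=0.6,
                           annual_ownership_cost=10_000)

print(f"Full utilization: {full:.1f} years")
print(f"60% utilization with overhead: {typical:.1f} years")
```

Lowering utilization or adding overhead moves the break-even quickly, which is the main reason the rule of thumb leans toward renting.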

How to Choose Based on Your Workload 

“I’m a solo researcher or student”
→ Vast.ai (cheapest, if checkpoint-based) or RunPod community cloud. RTX 4090 instances at $0.34/hr if your model fits in 24 GB VRAM.

“I’m a startup fine-tuning 7B–30B models”
→ RunPod secure cloud ($2.39/hr H100) or Lambda Labs (free egress valuable if you download checkpoints often). DigitalOcean if you want the simplest possible setup.

“I’m doing large-scale 70B+ model training on 8+ GPUs”
→ CoreWeave reserved (most cost-effective after 60% discount if you can commit) or Lambda Labs 8× cluster ($2.99/GPU). Verify InfiniBand availability – it materially speeds up distributed training.

“I’m building a production inference API”
→ Modal (serverless, no idle costs) for variable traffic. Lambda Labs or Nebius (free egress) for steady-state traffic at predictable volume.

“My company has GDPR / EU data residency requirements”
→ Nebius (SOC 2, EU data centers) or Verda (SOC 2, EU-first). Both offer free egress.

“My company is in healthcare, finance, or government”
→ Azure NCC H100 v5 for confidential GPU computing (only option). Crusoe for FedRAMP-track compliance. AWS or Azure for existing compliance frameworks.

“I’m already deep in AWS / GCP / Azure”
→ Use the native GPU offering. Cross-cloud egress costs and operational overhead often exceed the per-GPU savings from switching for established teams.

“I need the newest GPUs (B200, GB200, B300)”
→ CoreWeave (best availability), Verda, or Hyperstack. Lambda is catching up but CoreWeave is clearest for newest Blackwell generation self-service access.

“I want to minimize total spend across training + inference + storage”
→ Multi-provider strategy: Vast.ai/RunPod for hyperparameter search, CoreWeave reserved or Lambda for committed training, Modal for inference. Cantech can design and manage this architecture.

How Cantech Helps You Optimize GPU Cloud Spend 

Most teams overspend on GPU cloud in predictable ways: they use the same provider for all workload types, they rent on-demand when reserved pricing is available, they ignore egress costs until they see the bill, and they don’t architect training jobs to take advantage of spot pricing.

At Cantech, we specialize in GPU cloud cost architecture – the work that happens before you provision a single GPU.

What We Offer

GPU Cloud Cost Assessment
We audit your current GPU cloud usage – provider, instance types, job durations, egress volume, utilization rates, and reserved vs on-demand mix – and produce a total cost of ownership analysis with projected savings under alternative configurations. Most teams are surprised by how much the egress and billing granularity picture changes their optimal provider choice.

Multi-Provider Architecture Design
We design GPU compute strategies that use each provider where it wins: Vast.ai or RunPod for hyperparameter search and early-stage experimentation, CoreWeave reserved or Lambda for committed training runs, Modal for production inference endpoints. This approach consistently delivers 35–55% reduction in total GPU cloud spend vs. single-provider strategies.

Reserved Instance Strategy
CoreWeave’s 60% reserved discount and Azure’s 65% Reserved Instance pricing are significant – but committing incorrectly can strand budget in the wrong GPU type or region. We analyze your compute utilization patterns and build a reservation strategy that maximizes discount without over-committing.

Migration and Containerization
Moving between GPU cloud providers is easier when your training environment is fully containerized. We build Docker images that run identically on any provider, eliminating the environment lock-in that makes migrations slow.

EU / GDPR-Compliant GPU Infrastructure
For European teams and regulated industries, we design GPU architectures using Nebius, Verda, or Azure EU regions that meet data residency requirements without sacrificing performance or paying hyperscaler premiums.

Confidential AI Deployment
For healthcare, finance, and government clients needing TEE-protected GPU compute, we specialize in deploying workloads on Azure’s NCC H100 v5 VMs – the only major cloud option for confidential GPU computing.

Results We Deliver

  • Average 43% GPU cloud cost reduction within 90 days
  • Multi-provider pipelines that eliminate single-provider availability risk
  • Reserved instance strategies that reduce committed spend 40–60%
  • Full MLOps stack: experiment tracking, checkpoint management, cost monitoring

Spending more than $5,000/month on GPU cloud? The ROI on a Cantech GPU cost assessment is typically 10:1 or better within 6 months.

Frequently Asked Questions 

What is the cheapest cloud GPU in 2026?

The cheapest H100 GPU access is on Vast.ai’s marketplace, with instances available from $1.38/hr – though this is a P2P marketplace with variable reliability. For managed infrastructure, Hyperstack (~$1.60/hr) and RunPod community cloud ($1.99/hr) offer the lowest prices. For non-H100 workloads, RTX 4090 instances on Vast.ai start from $0.20/hr.

Why is AWS so much more expensive than neoclouds for H100?

AWS is the cheapest hyperscaler per GPU at $6.88/hr, yet that is still roughly 4× the cheapest self-service marketplace rate. GCP and Azure run 6–7× that marketplace rate. The premium pays for global presence in 30+ regions, an ecosystem of 200+ integrated services, enterprise SLAs with contractual guarantees, regulatory compliance frameworks (HIPAA, FedRAMP, SOC 2), and self-service quota without waitlists – things neoclouds can’t fully replicate.

Is H100 PCIe or H100 SXM better for my workload?

H100 SXM has higher memory bandwidth (3.35 TB/s vs 2.0 TB/s for PCIe) and a faster NVLink interconnect between GPUs, which makes it the better choice for training-dominated workloads where GPU-to-GPU communication speed matters. H100 PCIe is adequate for single-GPU inference and training where memory bandwidth isn’t the bottleneck, and is typically cheaper ($2.49/hr on Lambda vs $3.29+/hr for SXM). For distributed training across multiple H100 GPUs, always use SXM.

Do all GPU clouds offer free egress?

No. Free egress is standard at most neoclouds but not at hyperscalers, which charge $0.087–$0.12/GB. Lambda Labs, Nebius, Verda, SF Compute, and Crusoe offer free egress. CoreWeave, RunPod, Vast.ai, and all three hyperscalers charge for outbound data. For teams regularly transferring large datasets or checkpoints, this difference can add $500–$2,000+/month to effective GPU cloud cost.
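To see how quickly egress adds up, here is a minimal sketch in Python. It assumes a flat per-GB rate with no free allowance or volume tiers (real hyperscaler pricing is usually tiered), and the function name `monthly_egress_cost` is illustrative only.

```python
def monthly_egress_cost(tb_out: float, rate_per_gb: float) -> float:
    """Outbound transfer cost for one month, using 1 TB = 1,000 GB.

    Assumption: a single flat rate, no free tier and no volume discounts.
    """
    return tb_out * 1000 * rate_per_gb


# Hyperscaler range quoted above: $0.087 to $0.12 per GB.
for tb in (5, 10, 20):
    low = monthly_egress_cost(tb, 0.087)
    high = monthly_egress_cost(tb, 0.12)
    print(f"{tb} TB/month: ${low:,.0f} to ${high:,.0f}")
```

Under these assumptions, moving about 10 TB of checkpoints and datasets out per month costs roughly $870 to $1,200, which is why egress belongs in any provider comparison alongside the hourly GPU rate.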

How much does it cost to train a 70B model?

Using Lambda Labs’ 8× H100 SXM cluster at $2.99/GPU/hr for a 2-week continuous training run (336 hours): ~$8,037. On CoreWeave reserved pricing: ~$6,629. On AWS P5 on-demand at $6.88/GPU/hr: ~$18,493. These figures don’t include storage, egress, or engineering time – all of which vary by provider and architecture.
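The arithmetic behind these estimates is per-GPU hourly rate × GPU count × wall-clock hours. The sketch below assumes a 14-day run is 336 hours of continuous billing; the function name `run_cost` is illustrative, and the rates are the ones quoted in this guide. It reproduces the Lambda and AWS estimates to within a fraction of a percent.

```python
def run_cost(rate_per_gpu_hr: float, gpus: int = 8, days: float = 14) -> float:
    """Compute-only cost of a continuous run: rate x GPUs x hours.

    Excludes storage, egress, and engineering time.
    """
    hours = days * 24
    return rate_per_gpu_hr * gpus * hours


# AWS bills per instance, so normalize the p5.48xlarge
# ($55.04/hr for 8 H100s) to a per-GPU rate first.
aws_per_gpu = 55.04 / 8

for name, rate in [("Lambda on-demand", 2.99), ("AWS P5 on-demand", aws_per_gpu)]:
    print(f"{name}: ${run_cost(rate):,.2f}")
```

Plugging in a reserved or spot rate for the same run gives a like-for-like comparison, provided every rate has been normalized to per-GPU/hr first.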

What GPU cloud is best for inference serving?

For inference serving with variable traffic, Modal (per-second serverless billing) eliminates idle GPU costs – typically 3–4× cheaper than always-on providers for workloads with any quiet periods. For steady-state high-throughput inference, Lambda Labs (free egress), RunPod secure cloud, or Azure NC H100 v5 (for Azure-native applications) are strong options.
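Whether serverless billing wins depends on how many hours per day the GPU is actually busy. The sketch below is a simplified model: it ignores cold starts and minimum billing increments. The function names and the two example rates are hypothetical placeholders, not quotes from any provider.

```python
def break_even_utilization(always_on_rate: float, serverless_rate: float) -> float:
    """Fraction of the day a GPU must be busy before always-on becomes cheaper."""
    return always_on_rate / serverless_rate


def daily_cost(always_on_rate: float, serverless_rate: float, busy_hours: float) -> dict:
    """Daily cost of each model: always-on bills 24 h, serverless bills busy time only."""
    return {
        "always_on": always_on_rate * 24,
        "serverless": serverless_rate * busy_hours,
    }


# Hypothetical placeholder rates in $/GPU/hr, for illustration only.
ALWAYS_ON = 3.00
SERVERLESS = 4.50

print(break_even_utilization(ALWAYS_ON, SERVERLESS))
print(daily_cost(ALWAYS_ON, SERVERLESS, busy_hours=6))
```

With these placeholder rates, serverless stays cheaper until the GPU is busy about two-thirds of the day; at 6 busy hours per day the always-on bill is $72 against $27 for serverless.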

Should I use spot instances for GPU training?

For training workloads that use checkpoint-based fault tolerance – saving model state every N steps so the job can resume if interrupted – spot instances are strongly recommended. Savings are 60–80% on hyperscalers, and 30–50% on neoclouds that offer spot pricing (CoreWeave, RunPod community). For production inference serving, don’t use spot – interruptions affect end users.
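Checkpoint-based fault tolerance can be sketched in a few lines of Python. This is a minimal illustration rather than a production training loop: the "training step" is a stand-in calculation, the `stop_at` parameter only simulates a spot interruption, and all function names are hypothetical. A real job would save model and optimizer state with its framework's own checkpoint API.

```python
import json
import os
import tempfile
from typing import Optional


def save_checkpoint(path: str, state: dict) -> None:
    """Write state atomically so an interruption mid-write cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic rename within the same filesystem


def load_checkpoint(path: str) -> dict:
    """Return the last saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "loss": None}


def train(path: str, total_steps: int, every_n: int,
          stop_at: Optional[int] = None) -> dict:
    """Resume from the last checkpoint; `stop_at` simulates a spot interruption."""
    state = load_checkpoint(path)
    for step in range(state["step"] + 1, total_steps + 1):
        if stop_at is not None and step > stop_at:
            return state  # instance reclaimed: work since the last save is lost
        state = {"step": step, "loss": 1.0 / step}  # stand-in for a real step
        if step % every_n == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

If a run saving every 10 steps is interrupted at step 37, the next launch resumes from step 30, so at most `every_n` steps of work are repeated. Choosing N is a trade-off between checkpoint write overhead and the amount of compute lost per interruption.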

Which GPU cloud providers offer SOC 2 Type II compliance?

As of June 2026, providers with verified SOC 2 Type II include: Nebius, CoreWeave, Crusoe, and Verda. AWS, GCP, and Azure all have SOC 2 Type II and broader compliance frameworks. For HIPAA or FedRAMP requirements, hyperscalers are the established path; Crusoe is pursuing FedRAMP authorization.

About the Author
Posted by Dharmesh Gohel

Dharmesh is a digital marketing and SEO specialist with 3+ years of experience in the web hosting and cloud infrastructure industry. He specializes in technical SEO, keyword research, analytics, and content creation related to VPS hosting, dedicated servers, cloud infrastructure, and server management.
