Why GPU Cloud Pricing Is So Confusing in 2026
The GPU cloud market in 2026 is the most dynamic it has ever been, which means it’s also the most confusing.
Several overlapping forces are pulling prices in opposite directions simultaneously. H100 spot and marketplace rates have fallen 25–40% since Q1 2025 as H200 and B200 supply ramps up, yet several premium neocloud providers actually raised their published on-demand H100 rates in early 2026 (Lambda went from $2.99 to $3.99–$4.29; Verda went from $2.29 to $3.25). AWS cut its H100 pricing 44% in June 2025, which pressured the entire market – but its list price still starts at $6.88/GPU/hr, well above what neoclouds charge.
The result is a market where the same NVIDIA H100 80GB GPU can cost anywhere from $1.80 to $12.29 per hour depending on who you rent it from and under what terms. That’s a 6.8× price gap for identical hardware.
Understanding why that gap exists – and which price tier is appropriate for your workload – is the most valuable thing this guide delivers.
Three things that make GPU cloud pricing hard to compare:
Billing models differ. AWS bills per instance, not per GPU. An AWS p5.48xlarge has 8 H100s and costs $55.04/hr – divide by 8 and you get $6.88/GPU/hr. CoreWeave bills per GPU but only sells 8-GPU clusters for H100. Vast.ai prices are per GPU but fluctuate by the hour. You have to normalize to per-GPU/hr to compare honestly, which most comparison articles don’t do.
On-demand vs. reserved vs. spot prices are quoted interchangeably. A provider quoting a 60%-off reserved price next to a competitor's on-demand price looks cheaper than it really is. This guide specifies billing tier for every number.
“Included” vs. “optional” features change total cost. Some providers include 50 GB ephemeral storage and free egress. Others charge $0.08–$0.12/GB for egress that adds hundreds or thousands of dollars per month on data-intensive workloads. Headline GPU price alone doesn’t capture this.
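The per-GPU normalization described under billing models is plain division, but writing it down keeps comparisons honest. A minimal Python sketch, using instance list prices quoted in this guide; the function name `per_gpu_hourly` is illustrative and not part of any provider's tooling:

```python
def per_gpu_hourly(instance_price_per_hr: float, gpus_per_instance: int) -> float:
    """Normalize a per-instance hourly price to a per-GPU hourly price."""
    if gpus_per_instance <= 0:
        raise ValueError("gpus_per_instance must be positive")
    return round(instance_price_per_hr / gpus_per_instance, 2)


# On-demand list prices for 8x H100 instances, as quoted in this guide.
instances = {
    "AWS p5.48xlarge": 55.04,
    "GCP a3-highgpu-8g": 87.84,
    "Azure ND96isr H100 v5": 98.32,
}

for name, price in instances.items():
    print(f"{name}: ${per_gpu_hourly(price, 8):.2f}/GPU/hr")
```

Normalizing first, then comparing, avoids setting a per-instance price against a per-GPU price by accident.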
The 4-Tier GPU Cloud Market, Explained
The GPU cloud market has stratified into four distinct tiers, each with a fundamentally different business model and customer profile.
Tier 1: Hyperscalers (AWS, GCP, Azure)
The big three cloud providers offer GPU compute as one line item in a 200+ service catalog. Their GPU pricing is 4–7× higher than neoclouds for equivalent hardware, but they offer what neoclouds can’t: global presence in 30+ regions, an ecosystem of integrated services (databases, AI platforms, DevOps tools, identity management), enterprise SLAs backed by contractual commitments, and compliance certifications that regulated industries require.
Getting H100 instances on a hyperscaler also requires quota approval – requests, wait times of days to weeks, and justification of use case. For many teams, this friction is a dealbreaker.
Who belongs here: Organizations with large existing cloud estates, regulated industries (healthcare, finance, government) that require hyperscaler compliance certifications, teams where cross-service integration value outweighs the GPU cost premium.
Tier 2: Self-Service Neoclouds
This is where most AI teams will find the best combination of price, reliability, and capability. Neoclouds are cloud providers built specifically for GPU-as-a-Service: CoreWeave, Lambda Labs, Nebius, Hyperstack, Verda (formerly DataCrunch), Crusoe, and others.
Self-service H100 pricing at neoclouds spans roughly $1.80–$6.16/GPU/hr depending on provider and configuration. Most offer self-service provisioning in minutes with no quota approval. The best neoclouds offer SOC 2 Type II compliance, managed Kubernetes, Slurm cluster management, and InfiniBand networking for distributed training.
Key caveat from SaturnCloud’s June 2026 GPU Report: many neoclouds market high-speed storage (VAST Data, WEKA) and InfiniBand fabrics, but these are often available only in reserved or bespoke contracts – not self-service on-demand. CoreWeave is the clearest exception where enterprise storage is selectable in self-service provisioning. Always verify what’s actually available without a sales call.
Who belongs here: Most AI startups and enterprise teams who need reliable GPU access at meaningful scale without hyperscaler pricing.
Tier 3: Marketplace Platforms
RunPod, Vast.ai, and SF Compute aggregate GPU supply from third-party providers – data centers, mining operations, research institutions – and present it through a unified marketplace. Prices are the lowest available (H100 from $1.38–$2.30/hr) but reliability varies because you’re renting from individual hardware operators.
Who belongs here: Budget-constrained researchers, teams with fault-tolerant checkpoint-based training workloads, experienced ML engineers comfortable managing infrastructure variability.
Tier 4: Serverless GPU Platforms
Modal and similar serverless platforms bill per second of GPU execution time, with no minimum. For bursty inference workloads or short iterative jobs, this eliminates idle compute costs entirely. Not appropriate for continuous long-running training.
Who belongs here: Teams building inference APIs with variable traffic, batch processing pipelines, anyone who pays for idle GPU time with always-on providers.
Master Pricing Table: H100 80GB SXM Across Every Major Provider
All prices are per GPU per hour, on-demand unless otherwise noted. Verified June 2026.
| Provider | Tier | H100 Rate | Min GPUs | Billing | Free Egress | Notes |
| Vast.ai | Marketplace | $1.38–$1.87 | 1 | Hourly | ❌ | P2P marketplace; prices vary by host |
| SF Compute | Marketplace | from $1.82 | 1 | Hourly | ✅ | Fluctuates with supply |
| Hyperstack | Neocloud | ~$1.60 | 1 | Hourly | ❌ | Competitive, growing inventory |
| Modal | Serverless | ~$1.50 eff. | Serverless | Per-second | Minimal | No idle costs; best for inference |
| RunPod Community | Marketplace | $1.99 | 1 | Hourly | ❌ | Variable reliability by host |
| RunPod Secure | Neocloud | $2.39 | 1 | Hourly | ❌ | SLA-backed, more reliable |
| Nebius | Neocloud | $2.95 | 1 | Hourly | ✅ | EU data centers; stable pricing |
| Lambda Labs | Neocloud | $2.49 (PCIe) / $3.29–$4.29 (SXM) | 1 | Hourly | ✅ | Free egress; raised SXM rates in 2026 |
| Verda (DataCrunch) | Neocloud | $3.25 (up from $2.29) | 1 | Hourly | ✅ | EU-based; SOC 2 Type II |
| CoreWeave | Neocloud | $6.16/GPU (8× cluster) | 8 | Hourly | ❌ | Reserved: ~$2.47/GPU (60% off) |
| DigitalOcean | Neocloud | $1.99 | 1 | Hourly | Partial | Simple setup; limited GPU selection |
| AWS P5 | Hyperscaler | $6.88 | 8 (instance) | Hourly | ❌ | 44% price cut June 2025; quota required |
| GCP A3 | Hyperscaler | $10.98 | 8 (instance) | Hourly | ❌ | Highest-cost hyperscaler; quota required |
| Azure ND H100 v5 | Hyperscaler | $12.29 | 8 (instance) | Hourly | ❌ | List price; Spot 60–80% off |
Key insight: Self-service H100 pricing now spans roughly $1.80–$6.16/hr depending on provider, form factor, and commitment, compared to $6.88/hr on AWS, $10.98/hr on GCP, and $12.29/hr on Azure. At list price, an Azure H100 costs about 6.8× the $1.80 low end of the self-service range, and about 8.9× Vast.ai’s $1.38 marketplace low, for the same GPU.
Full GPU Pricing by Provider
Hyperscalers
Amazon Web Services (AWS)
H100 SXM instance: p5.48xlarge – 8× H100 80GB SXM, $55.04/hr ($6.88/GPU)
Quota process: Support ticket required; approval takes days to weeks
Spot pricing: Available; can reduce costs 60–70% for interruptible workloads
Reserved: 1-year Savings Plans reduce cost ~30–35%
Free egress: No – $0.087/GB outbound
AWS cut its H100 pricing by approximately 44% in June 2025, making it the most affordable hyperscaler GPU option. It remains roughly 4× more expensive than comparable neocloud on-demand rates, but for teams deeply embedded in the AWS ecosystem – with large S3 data lakes, SageMaker pipelines, EKS clusters – the cross-service integration value is real.
AWS’s GPU quota system is the most significant friction point. Unlike neoclouds where you provision in minutes, AWS H100 access typically requires a support ticket explaining your use case. Approvals can take days.
Full AWS H100 GPU pricing:
| Instance | GPUs | GPU VRAM | vCPUs | RAM | On-Demand/hr | Per GPU/hr |
| p5.48xlarge | 8× H100 SXM | 640 GB | 192 | 2,048 GB | $55.04 | $6.88 |
| p5e.48xlarge | 8× H100 SXM | 640 GB | 192 | 2,048 GB | $61.12 | $7.64 |
Best for: AWS-native enterprises with large existing infrastructure investment; SageMaker users; teams receiving AWS startup credits.
Google Cloud Platform (GCP)
H100 SXM instance: a3-highgpu-8g – 8× H100 80GB SXM, $87.84/hr ($10.98/GPU)
Quota process: Similar to AWS; region availability limited
Spot pricing: Available; ~60–70% discount
Free egress: No – standard GCP egress rates apply
GCP is the most expensive major hyperscaler for H100 GPUs at list price. Its differentiation is the Vertex AI platform and TPU ecosystem – for teams building on Google’s AI-native tooling, A3 GPU VMs enable hybrid GPU/TPU architectures that have no equivalent on other clouds.
GCP’s Jupiter network fabric provides high-bandwidth interconnect between A3 instances competitive with neocloud InfiniBand for many distributed training workloads.
Full GCP H100 GPU pricing:
| Instance | GPUs | GPU VRAM | vCPUs | RAM | On-Demand/hr | Per GPU/hr |
| a3-highgpu-8g | 8× H100 SXM | 640 GB | 208 | 1,872 GB | $87.84 | $10.98 |
| a3-megagpu-8g | 8× H100 SXM | 640 GB | 208 | 1,872 GB | $112.01 | $14.00 |
Best for: GCP-native teams, Vertex AI users, organizations building TPU + GPU hybrid training pipelines, BigQuery ML teams extending to GPU compute.
Microsoft Azure
H100 SXM instance: ND96isr H100 v5 – 8× H100 SXM, $98.32/hr ($12.29/GPU) at list price
NC H100 NVL v5: 1–2× H100 NVL (PCIe), unique to Azure
Spot pricing: Azure Spot VMs – 60–80% discount
Reserved: Up to 65% off with 3-year commitment
Confidential GPU: NCC H100 v5 – exclusive to Azure; TEE-protected GPU compute
Free egress: No
Azure’s list price is the highest of any major cloud, but this is rarely what enterprise customers pay. Azure Spot VMs at 60–80% off bring the effective rate to $2.46–$4.92/GPU/hr, which is competitive with premium neoclouds. Large enterprises on Azure Enterprise Agreements receive further negotiated discounts.
Azure’s unique position is the confidential computing tier (NCC H100 v5) and the H100 NVL configuration (exclusive in cloud). No other major provider offers TEE-protected GPU compute – making Azure the only choice for regulated industries that need GPU acceleration without exposing data in memory.
Azure H100 GPU pricing (key VM series):
| VM Series | GPUs | GPU Type | GPU VRAM | On-Demand/GPU/hr | Spot/GPU/hr |
| NC40ads H100 v5 | 1× H100 NVL | PCIe | 94 GB | ~$3.29 | ~$0.66–$1.32 |
| NC80adis H100 v5 | 2× H100 NVL | PCIe | 188 GB | ~$3.29 | ~$0.66–$1.32 |
| ND96isr H100 v5 | 8× H100 SXM | SXM | 640 GB | $12.29 | $2.46–$4.92 |
| NCC H100 v5 | H100 Tensor Core | – | 80 GB | Contact sales | – |
Best for: Azure-ecosystem enterprises, healthcare/finance/government needing confidential GPU, teams requiring HIPAA or FedRAMP compliance, large-scale distributed training on Azure.
Neoclouds
CoreWeave
H100 SXM pricing: $6.16/GPU/hr on-demand (8-GPU cluster = $49.24/hr)
Reserved discount: Up to 60% off (effective ~$2.47/GPU/hr)
Spot pricing: Available
Min GPUs: 8 (most configurations)
Free egress: No
Compliance: SOC 2 Type II; managed Kubernetes; Slurm
CoreWeave is the enterprise neocloud of record: its customers include OpenAI, Mistral AI, and Jane Street. Its architecture – Kubernetes-native from the start, InfiniBand optional at 400 Gb/s per GPU, NVLink 4.0, GPU Direct RDMA – is built for exactly the scale of workloads those customers run.
GPU portfolio (2026):
| GPU | Config | Total VRAM | On-Demand/GPU/hr | Total/hr |
| H100 SXM NVL | 8× | 640 GB | $6.16 | $49.24 |
| H200 NVL | 8× | 1.1 TB | $6.30 | $50.44 |
| A100 NVL | 8× | 640 GB | $2.70 | $21.60 |
| B200 NVL | 8× | 1.4 TB | $8.60 | $68.80 |
| GB200 NVL72 | 4× | 744 GB | $10.50 | $42.00 |
| L40S | 8× | 384 GB | $2.25 | $18.00 |
| L40 | 8× | 384 GB | $1.25 | $10.00 |
| GH200 | 1× | 96 GB | $6.50 | $6.50 |
| RTX Pro 6000 | 8× | 768 GB | $2.50 | $20.00 |
Best for: Enterprise multi-node training (8+ GPUs), teams needing InfiniBand interconnect, organizations with Kubernetes expertise, long-running committed workloads on reserved pricing.
Lambda Labs
H100 PCIe pricing: $2.49/hr (1× GPU)
H100 SXM pricing: $3.29–$4.29/hr (rates raised in early 2026)
8× H100 SXM: ~$23.92/hr ($2.99/GPU)
Free egress: Yes – unlimited
Min GPUs: 1
Billing: Hourly
Lambda Labs’ free egress policy remains its most distinctive feature in 2026. Note that several self-service neoclouds raised published on-demand H100 rates in early 2026 (Lambda $2.99 to $3.99–$4.29) – making Lambda less of the budget option it once was, while its egress advantage grows more valuable relative to competitors who charge $0.08–$0.12/GB.
Lambda Labs GPU pricing (2026):
| GPU | Config | VRAM | vCPUs | RAM | Price/GPU/hr |
| H100 PCIe | 1× | 80 GB | 26 | 225 GB | $2.49 |
| H100 SXM | 1× | 80 GB | varies | varies | $3.29–$4.29 |
| H100 SXM | 8× | 640 GB | varies | varies | $2.99 |
| B200 SXM | 1× | 180 GB | 26 | 360 GB | $4.99–$6.99 |
| A100 40GB | 1× | 40 GB | 30 | 220 GB | $1.29 |
| A100 80GB SXM | 8× | 640 GB | varies | varies | $1.10 |
| RTX A6000 | 1× | 48 GB | 14 | 100 GB | $0.80 |
| GH200 | 1× | 141 GB | – | – | $1.99 |
Best for: Teams that move large data volumes (free egress matters), researchers needing single-GPU access, startups wanting simplicity without Kubernetes expertise, teams not ready to commit to reserved instances.
Nebius AI Cloud
H100 SXM pricing: ~$2.95/hr (held steady in 2026 while others raised rates)
Free egress: Yes (standard tier)
Compliance: SOC 2 Type II
Data centers: Finland, Netherlands, France (EU), plus US
Min GPUs: 1
Nebius (spun out of Yandex) is the strongest EU-focused GPU cloud option. Its H100 rate of $2.95/hr has stayed stable while several competitors raised rates – and its EU data center presence in Finland, Netherlands, and France addresses GDPR and data residency requirements that most US-based neoclouds can’t meet. Nebius offers managed Kubernetes and Slurm, placing it in the top tier of neocloud enterprise maturity.
Nebius GPU pricing:
| GPU | VRAM | Price/GPU/hr |
| H100 SXM | 80 GB | $2.95 |
| A100 SXM | 80 GB | $1.65 |
| H200 SXM | 141 GB | ~$4.50 |
Best for: EU teams with GDPR compliance requirements, organizations needing EU data residency, teams wanting mature neocloud infrastructure (SOC 2, Kubernetes, Slurm) at sub-$3/hr H100 pricing.
Hyperstack
H100 SXM pricing: ~$1.60/hr
A100 pricing: from $1.35/hr
Free egress: No
Min GPUs: 1
Billing: Hourly, no minimum commitment
Hyperstack occupies the sweet spot between marketplace unreliability and premium neocloud pricing. At ~$1.60/hr for H100 SXM with managed infrastructure (no marketplace variability), it offers some of the best price-performance available in the self-service neocloud tier. Growing EU and UK data center presence makes it increasingly relevant for European teams.
Best for: Startups and scale-ups wanting managed GPU infrastructure below Lambda pricing, EU/UK teams, teams needing H100 or A100 without minimum commitment.
Verda (formerly DataCrunch)
H100 SXM pricing: $3.25/hr (raised from $2.29 in early 2026)
B300 SXM pricing: Available (contact sales)
Free egress: Yes
Compliance: SOC 2 Type II
Data centers: EU-focused (Finland, expanding)
Services: Instances, clusters, serverless containers, managed inference endpoints
Verda is a notable EU-based neocloud with a broad service portfolio that goes beyond raw GPU rental: managed inference endpoints, serverless containers, and instant InfiniBand clusters. Its SOC 2 Type II certification and EU infrastructure make it competitive with Nebius for European enterprise teams. The rate increase in 2026 ($2.29 → $3.25) is worth noting for teams budgeting based on older pricing.
Verda GPU portfolio (2026):
| GPU | VRAM | Notes |
| H100 SXM | 80 GB | $3.25/hr |
| H200 SXM | 141 GB | Available |
| B200 SXM | 180–192 GB | Available |
| B300 SXM | 262 GB | Available |
| GB300 NVL72 | New | Available |
| A100 SXM | 80 GB | Available |
| RTX Pro 6000 | 96 GB | Available |
Best for: EU teams needing managed inference alongside training infrastructure, organizations wanting a single EU provider for the full AI stack (training → serving).
Crusoe
H100 pricing: Available; contact sales for specific rates
Compliance: SOC 2 Type II; FedRAMP In Process
Infrastructure: InfiniBand clusters; managed Kubernetes; Slurm
Egress: Free
Crusoe differentiates on two dimensions: sustainability (it runs on otherwise-wasted gas flare energy) and government/defense suitability (FedRAMP In Process certification). It’s one of the few neoclouds working toward US federal compliance, making it relevant for government contractors and regulated industries that want neocloud pricing without hyperscaler lock-in.
Best for: Government agencies, federal contractors, sustainability-focused enterprises, teams needing FedRAMP-compatible infrastructure at neocloud pricing.
Marketplace Platforms
RunPod
H100 community cloud: ~$1.99/hr
H100 secure cloud: ~$2.39/hr
Spot (community): from ~$1.25/hr
RTX 4090: from $0.34/hr
Free egress: No
Billing: Hourly (on-demand); per-request (serverless)
RunPod’s dual-tier model gives teams meaningful choice. The community cloud (GPUs from independent operators) delivers some of the lowest H100 rates in the market at $1.99/hr. The secure cloud (RunPod-managed infrastructure) adds reliability guarantees at $2.39/hr. Serverless GPU functions with sub-3-second cold starts make RunPod viable for inference workloads without always-on GPU costs.
RunPod GPU pricing:
| GPU | Config | VRAM | Community $/hr | Secure $/hr |
| RTX 4090 | 1× | 24 GB | $0.34 | $0.74 |
| RTX 3090 | 1× | 24 GB | $0.22 | – |
| A100 SXM | 1× | 80 GB | $1.64 | $2.21 |
| H100 PCIe | 1× | 80 GB | $1.99 | $2.39 |
| H100 SXM | 1× | 80 GB | $1.99 | $2.39 |
Best for: Budget-conscious startups, researchers running checkpoint-based training, serverless inference workloads, teams wanting the widest GPU selection including consumer cards.
Vast.ai
H100 pricing: from $1.38/hr (marketplace low) to ~$2.30/hr (typical)
A100 spot: from $0.29/hr
RTX 4090: from $0.20/hr
Free egress: No
Reliability: Variable – host-dependent
Vast.ai is a pure peer-to-peer GPU marketplace. It offers the lowest rates in this guide (H100 from $1.38/hr at the marketplace low, $1.87–$2.30/hr typical), though with variable reliability. It suits research and checkpointed training workloads. The operational model requires that your training pipeline is genuinely fault-tolerant – hosts can take machines offline with limited warning.
Vast.ai GPU pricing (approximate market ranges, June 2026):
| GPU | VRAM | Typical range | Market low |
| H100 SXM | 80 GB | $1.87–$2.30/hr | $1.38/hr |
| A100 80GB | 80 GB | $0.90–$1.50/hr | $0.29/hr (spot) |
| RTX 4090 | 24 GB | $0.20–$0.44/hr | $0.20/hr |
| H200 SXM | 141 GB | $3.00–$4.50/hr | varies |
Best for: Research with checkpoint-based training, budget experimentation, teams comfortable with operational variability, experienced ML engineers.
GPU-by-GPU Pricing: A100, H100, H200, B200
NVIDIA A100 80GB – The Established Workhorse
Still the most cost-effective GPU for many mid-scale training and inference workloads. A100 availability is high and prices have stabilized.
| Provider | Config | Price/GPU/hr |
| Vast.ai | 1× (spot) | from $0.29 |
| RunPod community | 1× | $1.64 |
| Lambda Labs | 1× 40GB SXM | $1.29 |
| Lambda Labs | 8× 80GB SXM | $1.10 |
| Nebius | 1× SXM | $1.65 |
| Hyperstack | 1× | from $1.35 |
| CoreWeave | 8× NVL | $2.70 |
| AWS (P4d) | 8× (instance) | $4.10 |
Verdict: A100 remains the best cost-per-TFLOP option for workloads that fit in 80 GB VRAM. For 7B–30B model training and most inference serving, A100 at $1.10–$1.65/hr from neoclouds is hard to beat.
NVIDIA H100 80GB – The Current Standard for AI Training
The most widely deployed GPU for frontier model training and high-throughput inference in 2026. Two configurations matter: PCIe (lower bandwidth, single-GPU friendly) and SXM (higher bandwidth, optimized for multi-GPU clusters).
| Provider | Config | Price/GPU/hr |
| Vast.ai | 1× SXM (marketplace) | from $1.38 |
| Hyperstack | 1× SXM | ~$1.60 |
| RunPod community | 1× | $1.99 |
| DigitalOcean | 1× | $1.99 |
| Lambda Labs | 1× PCIe | $2.49 |
| RunPod secure | 1× | $2.39 |
| Nebius | 1× SXM | $2.95 |
| Lambda Labs | 1× SXM | $3.29–$4.29 |
| Verda | 1× SXM | $3.25 |
| CoreWeave | 8× NVL (÷8) | $6.16 |
| AWS P5 | 8× SXM (÷8) | $6.88 |
| GCP A3 | 8× SXM (÷8) | $10.98 |
| Azure ND H100 v5 | 8× SXM (÷8) | $12.29 |
Verdict: For single-GPU H100 access at managed infrastructure quality, RunPod secure ($2.39) and DigitalOcean ($1.99) offer strong value. For multi-GPU cluster training, CoreWeave reserved pricing (~$2.47/GPU after 60% discount) and Lambda’s 8× configuration ($2.99) are competitive. Hyperscalers remain 4–7× more expensive at list price.
NVIDIA H200 141GB – High-Memory Training and Inference
H200 (Hopper + HBM3e, 141 GB VRAM) is the right GPU when your model exceeds H100 VRAM capacity – 70B+ parameter inference without quantization, multimodal models with large context windows, or training configurations that exceed 80 GB.
| Provider | Config | Price/GPU/hr |
| CoreWeave | 8× NVL (÷8) | $6.30 |
| Nebius | 1× SXM | ~$4.50 |
| Lambda Labs | 1× | ~$6.99 |
| Azure ND H200 v5 | Available | Contact sales |
| GCP A3 Mega | 8× (÷8) | ~$14.00 |
Verdict: H200 pricing remains significantly higher than H100, and availability outside top-tier providers is limited. For most workloads that “need more VRAM,” quantized H100 inference or model parallelism across H100 GPUs is usually more cost-effective than moving to H200.
NVIDIA B200 192GB – Blackwell Generation, High Performance
The B200 (Blackwell architecture, 192 GB HBM3e, dramatically higher FP8 throughput) is now self-service on-demand at a handful of providers in 2026. GB200/GB300 remains largely reserved or contact-sales.
| Provider | Config | Price/GPU/hr |
| Lambda Labs | 1× SXM | $4.99–$6.99 |
| CoreWeave | 8× NVL (÷8) | $8.60 |
| Verda | 1× SXM | Available |
| Hyperstack | Expanding | Contact |
Verdict: B200 delivers meaningfully better FP8 inference throughput than H100, making it relevant for high-throughput inference serving. For most training workloads where throughput is not the bottleneck, H100 at 3–4× lower cost is more economical. Evaluate B200 specifically when inference throughput per dollar is your optimization target.
NVIDIA RTX 4090 24GB – Consumer GPU for Smaller Workloads
The RTX 4090 is a consumer GPU that punches far above its price class for inference and fine-tuning workloads that fit in 24 GB VRAM. At $0.20–$0.74/hr, it’s the most cost-effective option for LoRA fine-tuning on 7B models, small-scale inference, and developer testing.
| Provider | Price/GPU/hr | Notes |
| Vast.ai | from $0.20/hr | Marketplace; variable |
| RunPod community | $0.34/hr | Variable reliability |
| RunPod secure | $0.74/hr | SLA-backed |
Verdict: For workloads under 24 GB VRAM, an RTX 4090 at $0.34–$0.74/hr is roughly 3–6× cheaper than an H100 on the same RunPod tier ($1.99 community, $2.39 secure), and is often fast enough for small-model inference. Don’t rent an H100 for work that fits on a 4090.
The Hidden Costs Nobody Talks About
Headline GPU $/hr is only one component of your actual GPU cloud bill. These four factors routinely change the ranking when teams do honest total cost comparisons.
Egress Costs
Most neoclouds charge zero egress, unlike hyperscalers ($0.087–0.12/GB). Lambda Labs, Nebius, Verda, SF Compute, and Crusoe all offer free egress. CoreWeave, RunPod, Vast.ai, and hyperscalers charge per GB.
The real impact: A team downloading 5 TB of model checkpoints per month from a provider that charges $0.10/GB pays ~$512/month in egress alone – $6,144/year. On Lambda or Nebius, that same data transfer is free. For data-intensive workflows, free egress can easily offset a $0.50–$1.00/hr premium in GPU pricing.
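The egress arithmetic above can be sketched in a few lines of Python. This is an illustration under stated assumptions: 1 TB is taken as 1,024 GB (which reproduces the $512 figure), and the 730 hours per month used to convert the bill into an equivalent hourly GPU premium assumes a single always-on GPU. The function names are hypothetical.

```python
GB_PER_TB = 1024  # binary terabyte; reproduces the $512/month figure


def monthly_egress_cost(tb_per_month: float, price_per_gb: float) -> float:
    """Monthly egress bill in dollars."""
    return round(tb_per_month * GB_PER_TB * price_per_gb, 2)


def equivalent_gpu_premium(monthly_egress: float,
                           gpu_hours_per_month: float = 730) -> float:
    """Hourly GPU price premium that costs the same as the egress bill.

    Assumes one always-on GPU (about 730 hours per month).
    """
    return round(monthly_egress / gpu_hours_per_month, 2)


monthly = monthly_egress_cost(tb_per_month=5, price_per_gb=0.10)
print(f"Monthly egress: ${monthly:,.2f}")
print(f"Annual egress:  ${monthly * 12:,.2f}")
print(f"Equivalent premium: ${equivalent_gpu_premium(monthly):.2f}/GPU/hr")
```

For a multi-GPU deployment, pass the total GPU hours per month instead; the equivalent per-GPU premium shrinks as the same egress bill is spread over more GPU hours.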
Billing Granularity
Most providers bill hourly, rounding up. If your training run finishes in 23 minutes, you pay for 60. If you run 10 preprocessing jobs at 8 minutes each, you pay for 10 full hours. For teams running many short iterative jobs daily, this matters:
- Hourly billing: Round up to next 60 minutes (most providers)
- Per-minute billing: TensorDock, some RunPod configurations
- Per-second billing: Modal – eliminates idle cost entirely for serverless workloads
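The rounding effect of each billing model can be sketched as follows. This is a simplified model, assuming every job is launched and billed separately and that the provider rounds each job up to its billing increment; real providers may bill per instance lifetime rather than per job. The function name `billed_hours` is illustrative.

```python
def billed_hours(job_seconds: int, jobs: int, granularity_seconds: int) -> float:
    """Total billed hours when each job is rounded up to the billing increment."""
    if job_seconds <= 0 or jobs <= 0 or granularity_seconds <= 0:
        raise ValueError("all arguments must be positive")
    # Ceiling division: number of billing increments consumed by one job.
    increments = -(-job_seconds // granularity_seconds)
    return jobs * increments * granularity_seconds / 3600


HOURLY, PER_MINUTE, PER_SECOND = 3600, 60, 1

# Ten 8-minute preprocessing jobs under each billing model.
for label, granularity in [("hourly", HOURLY),
                           ("per-minute", PER_MINUTE),
                           ("per-second", PER_SECOND)]:
    hours = billed_hours(job_seconds=8 * 60, jobs=10,
                         granularity_seconds=granularity)
    print(f"{label:>10}: {hours:.2f} billed hours")
```

Under this model the ten 8-minute jobs bill as 10 hours with hourly rounding but about 1.33 hours with per-minute or per-second billing, which is why job duration matters as much as the headline rate.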
Storage Costs
| Provider | Persistent Storage | Notes |
| Lambda Labs | $0.20/GB/mo | Block storage |
| CoreWeave | $0.08/GB/mo | S3-compatible object storage |
| Nebius | Included tiers | Varies by plan |
| AWS | $0.08/GB/mo (S3) | Plus request fees |
| GCP | $0.02–$0.04/GB/mo | Coldline vs standard |
| Vast.ai | Separate from GPU | Host-provided |
For a 10 TB training dataset stored long-term, storage cost differences range from $200 to $2,000/month depending on provider – sometimes exceeding the GPU compute cost for iteration-heavy workflows where the dataset is stable.
Reserved vs On-Demand Gap
The difference between on-demand and committed reserved pricing at the same provider is often larger than the difference between providers:
- CoreWeave on-demand H100: $6.16/GPU/hr
- CoreWeave reserved H100: ~$2.47/GPU/hr (60% off)
- Lambda on-demand H100 SXM: $3.29–$4.29/GPU/hr
- Net effect: Lambda on-demand ($3.29–$4.29/GPU/hr) costs more than CoreWeave reserved (~$2.47/GPU/hr)
Before switching providers to save money, check whether reserved pricing at your current provider would be cheaper.
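A quick way to run that check is to compute the effective reserved rate before comparing across providers. The sketch below uses the CoreWeave and Lambda figures from this section; the 60% discount is the "up to" figure quoted here, so an actual contract may land higher. Applying 60% to $6.16 gives about $2.46, within a cent of the ~$2.47 quoted above.

```python
def effective_rate(list_rate: float, discount: float) -> float:
    """Hourly rate after a fractional discount (0.60 means 60% off)."""
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    return list_rate * (1 - discount)


current_on_demand = 6.16          # CoreWeave on-demand H100, $/GPU/hr
current_reserved = effective_rate(current_on_demand, 0.60)
alternative_low, alternative_high = 3.29, 4.29  # Lambda on-demand H100 SXM

print(f"Reserved at current provider: ${current_reserved:.2f}/GPU/hr")
if current_reserved < alternative_low:
    print("Reserving where you are beats switching to the on-demand alternative.")
else:
    print("Switching may be cheaper; compare commitment terms too.")
```

The comparison ignores commitment length and migration cost, both of which belong in the decision.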
Real Training Cost Scenarios
Scenario A: Fine-Tuning Llama 3.1 8B (LoRA, 1× A100, 40 Hours)
A typical research fine-tuning run for a 7B–8B parameter model.
| Provider | Rate | 40hr Cost | Notes |
| Vast.ai | $0.90/hr | $36 | Variable reliability |
| RunPod community | $1.64/hr | $65.60 | Acceptable reliability |
| Lambda Labs | $1.29/hr (A100 40GB) | $51.60 | Free egress |
| Nebius | $1.65/hr | $66 | EU data centers |
| AWS P4d | $4.10/hr | $164 | 4× neocloud cost |
Winner: Vast.ai or Lambda Labs A100, depending on reliability tolerance.
Scenario B: Training a 70B Model (8× H100 SXM, 2 Weeks Continuous)
A serious foundation model training run. Costs assume 14 days × 24 hours = 336 hours.
| Provider | Config | Rate | 2-week Cost | Notes |
| Vast.ai | 8× H100 | $1.87/GPU avg | $5,027 | Checkpoint required |
| RunPod secure | 8× H100 | $2.39/GPU | $6,424 | SLA-backed |
| Lambda Labs | 8× H100 | $2.99/GPU | $8,037 | Free egress |
| CoreWeave (on-demand) | 8× H100 | $6.16/GPU | $16,545 | Managed orchestration |
| CoreWeave (reserved) | 8× H100 | ~$2.47/GPU | ~$6,639 | Cheaper than Lambda on-demand |
| AWS P5 | 8× H100 | $6.88/GPU | $18,493 | AWS ecosystem |
Winner on cost: Vast.ai. Winner on reliability + cost: RunPod secure cloud or CoreWeave reserved.
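The cost-versus-reliability trade-off can be made concrete with a small model. The sketch below computes a run's cost as GPUs × rate × hours and adds a hypothetical `rework_fraction`: the share of extra compute spent redoing work lost to host interruptions. That parameter is an assumption for illustration, not a measured figure for any provider.

```python
def training_run_cost(gpus: int, rate_per_gpu_hr: float, days: float,
                      rework_fraction: float = 0.0) -> float:
    """Cost of a continuous run, inflated by work redone after interruptions."""
    hours = days * 24 * (1 + rework_fraction)
    return gpus * rate_per_gpu_hr * hours


def breakeven_rework_fraction(cheap_rate: float, reliable_rate: float) -> float:
    """Rework overhead at which the cheaper provider stops being cheaper."""
    return reliable_rate / cheap_rate - 1


# Rates from the table above: Vast.ai average vs RunPod secure cloud.
threshold = breakeven_rework_fraction(cheap_rate=1.87, reliable_rate=2.39)
print(f"Marketplace stays cheaper below {threshold:.0%} rework overhead")

for overhead in (0.0, 0.10, 0.30):
    cost = training_run_cost(8, 1.87, 14, rework_fraction=overhead)
    print(f"rework {overhead:.0%}: ${cost:,.0f}")
```

If your checkpointing keeps lost work well under that threshold, the marketplace price advantage survives; if interruptions cost more than that, the SLA-backed option wins on total spend.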
Scenario C: Production Inference Serving (1× H100, Always-On, 6 Months)
An inference API serving a fintech application, always on. Costs for always-on providers assume 6 months ≈ 4,380 hours (half of 8,760).
| Provider | Rate | 6-month cost | Egress (1TB/mo) | True 6-month total |
| Modal (serverless) | ~$1.50/hr eff. | ~$6,480 | Minimal | ~$6,700 |
| RunPod secure | $2.39/hr | $10,468 | ~$600 | $11,068 |
| Lambda Labs | $2.49/hr | $10,906 | Free | $10,906 |
| Nebius | $2.95/hr | $12,921 | Free | $12,921 |
| Azure Spot | ~$2.46/hr | ~$10,775 | ~$600 | ~$11,375 |
| AWS P5 Spot | ~$2.75/hr | ~$12,045 | ~$600 | ~$12,645 |
Winner: Modal (serverless) for workloads with any quiet periods – it eliminates idle GPU costs. Lambda Labs for steady-state inference where free egress adds value.
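To see how egress reorders a ranking like this one, here is a minimal sketch that adds egress to compute cost for an always-on GPU. Assumptions: 730 hours per month, 1 TB taken as 1,000 GB (which matches the ~$600 egress figure), and a round $0.10/GB egress price for the provider that charges. The GPU rates come from the table above.

```python
HOURS_PER_MONTH = 730  # 8,760 hours per year / 12


def total_cost(gpu_rate: float, egress_price_per_gb: float,
               egress_gb_per_month: float = 1000, months: int = 6) -> float:
    """Compute cost plus egress cost for one always-on GPU."""
    compute = gpu_rate * HOURS_PER_MONTH * months
    egress = egress_gb_per_month * egress_price_per_gb * months
    return round(compute + egress, 2)


# (GPU $/hr, egress $/GB). The $0.10/GB figure is an assumed round number.
providers = {
    "RunPod secure": (2.39, 0.10),
    "Lambda Labs": (2.49, 0.00),
    "Nebius": (2.95, 0.00),
}

ranked = sorted(providers, key=lambda name: total_cost(*providers[name]))
for name in ranked:
    print(f"{name}: ${total_cost(*providers[name]):,.0f}")
```

On GPU rate alone RunPod secure is cheapest of the three; once egress is included, Lambda Labs moves ahead, which is the pattern the table shows.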
Scenario D: Hyperparameter Search (100 short GPU jobs, 15 min each, 1× A100)
The billing granularity scenario – 100 jobs × 15 minutes = 25 actual GPU hours.
| Provider | Billing | You Pay For | Effective rate | 25hr equivalent cost |
| Modal | Per-second | 25 GPU hours | $1.50/hr | $37.50 |
| RunPod community | Hourly | 100 hours (rounded up) | $1.64/hr | $164 |
| Lambda Labs | Hourly | 100 hours | $1.29/hr | $129 |
Winner: Modal – roughly 3.4–4.4× cheaper than the hourly providers above for short iterative jobs.
Buy vs. Rent: When Does Owning GPUs Make Sense?
This is a question Cantech’s clients ask regularly, and the answer is more nuanced than most “rent vs. buy” articles suggest.
The break-even math:
An NVIDIA H100 SXM server (8 GPUs) typically costs $250,000–$350,000 new in 2026 (prices have come down significantly from 2023–2024 peaks). At RunPod secure cloud H100 rates of $2.39/GPU/hr:
- 8 GPUs × $2.39/hr × 8,760 hrs/year = $167,491/year for 24/7 operation
- Break-even on a $300,000 server: ~1.8 years at full utilization
For organizations running GPUs at near-100% utilization for 2+ years, purchasing can be economical. But the true total cost of ownership for owned hardware includes:
- Data center colocation or build-out cost
- Power and cooling ($0.10–$0.15/kWh × 10+ kW per server ≈ $8,800–$13,100/year at 10 kW)
- Hardware maintenance and support contracts
- Engineering time for cluster management, firmware updates, driver maintenance
- Opportunity cost of capital
Most teams that buy GPUs discover their utilization is 50–70%, not 100%, which pushes the break-even out to 3+ years – at which point the hardware is approaching end-of-support and newer GPU generations are available.
General rule: Rent unless you have consistent, predictable >80% GPU utilization for 24+ months with the operational staff to manage on-premise infrastructure. Very few AI teams meet all three criteria.
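The break-even math above can be sketched as a small function that also accounts for utilization and running costs. This is a simplified model: it treats the rental spend you avoid as the only benefit of owning, uses a flat `annual_opex` figure (hypothetical parameter) for power, colocation, and support, and ignores resale value and cost of capital.

```python
HOURS_PER_YEAR = 8760


def break_even_years(server_cost: float, gpus: int, rental_rate: float,
                     utilization: float = 1.0, annual_opex: float = 0.0) -> float:
    """Years until avoided rental spend, net of running costs, repays the server."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    avoided_rental = gpus * rental_rate * HOURS_PER_YEAR * utilization
    net_saving = avoided_rental - annual_opex
    if net_saving <= 0:
        return float("inf")  # owning never pays back under these inputs
    return server_cost / net_saving


# $300,000 8-GPU server vs renting at $2.39/GPU/hr.
for utilization in (1.0, 0.7, 0.5):
    years = break_even_years(300_000, 8, 2.39, utilization=utilization)
    print(f"{utilization:.0%} utilization: {years:.1f} years")

# Same server at 50% utilization with $13,000/year of power and cooling.
print(f"{break_even_years(300_000, 8, 2.39, 0.5, annual_opex=13_000):.1f} years")
```

Utilization dominates the result: halving it doubles the payback period before any running costs are counted.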
How to Choose Based on Your Workload
“I’m a solo researcher or student”
→ Vast.ai (cheapest, if checkpoint-based) or RunPod community cloud. RTX 4090 instances at $0.34/hr if your model fits in 24 GB VRAM.
“I’m a startup fine-tuning 7B–30B models”
→ RunPod secure cloud ($2.39/hr H100) or Lambda Labs (free egress valuable if you download checkpoints often). DigitalOcean if you want the simplest possible setup.
“I’m doing large-scale 70B+ model training on 8+ GPUs”
→ CoreWeave reserved (most cost-effective after 60% discount if you can commit) or Lambda Labs 8× cluster ($2.99/GPU). Verify InfiniBand availability – it materially speeds up distributed training.
“I’m building a production inference API”
→ Modal (serverless, no idle costs) for variable traffic. Lambda Labs or Nebius (free egress) for steady-state traffic at predictable volume.
“My company has GDPR / EU data residency requirements”
→ Nebius (SOC 2, EU data centers) or Verda (SOC 2, EU-first). Both offer free egress.
“My company is in healthcare, finance, or government”
→ Azure NCC H100 v5 for confidential GPU computing (only option). Crusoe for FedRAMP-track compliance. AWS or Azure for existing compliance frameworks.
“I’m already deep in AWS / GCP / Azure”
→ Use the native GPU offering. Cross-cloud egress costs and operational overhead often exceed the per-GPU savings from switching for established teams.
“I need the newest GPUs (B200, GB200, B300)”
→ CoreWeave (best availability), Verda, or Hyperstack. Lambda is catching up but CoreWeave is clearest for newest Blackwell generation self-service access.
“I want to minimize total spend across training + inference + storage”
→ Multi-provider strategy: Vast.ai/RunPod for hyperparameter search, CoreWeave reserved or Lambda for committed training, Modal for inference. Cantech can design and manage this architecture.
How Cantech Helps You Optimize GPU Cloud Spend
Most teams overspend on GPU cloud in predictable ways: they use the same provider for all workload types, they rent on-demand when reserved pricing is available, they ignore egress costs until they see the bill, and they don’t architect training jobs to take advantage of spot pricing.
At Cantech, we specialize in GPU cloud cost architecture – the work that happens before you provision a single GPU.
What We Offer
GPU Cloud Cost Assessment
We audit your current GPU cloud usage – provider, instance types, job durations, egress volume, utilization rates, and reserved vs on-demand mix – and produce a total cost of ownership analysis with projected savings under alternative configurations. Most teams are surprised by how much the egress and billing granularity picture changes their optimal provider choice.
Multi-Provider Architecture Design
We design GPU compute strategies that use each provider where it wins: Vast.ai or RunPod for hyperparameter search and early-stage experimentation, CoreWeave reserved or Lambda for committed training runs, Modal for production inference endpoints. This approach consistently delivers a 35–55% reduction in total GPU cloud spend vs. single-provider strategies.
Reserved Instance Strategy
CoreWeave’s 60% reserved discount and Azure’s 65% Reserved Instance pricing are significant – but committing incorrectly can strand budget in the wrong GPU type or region. We analyze your compute utilization patterns and build a reservation strategy that maximizes discount without over-committing.
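The over-commitment risk can be put in numbers with a rough break-even sketch. It assumes a reservation bills every committed hour at the discounted rate while on-demand bills only the hours actually used. The $4.00/hr on-demand rate and the function names are illustrative assumptions, not any provider's published terms.

```python
HOURS_PER_MONTH = 730  # average hours in a calendar month


def break_even_utilization(discount: float) -> float:
    """Fraction of committed hours you must use for reserved to match on-demand."""
    return 1.0 - discount


def monthly_costs(on_demand_rate: float, discount: float,
                  utilization: float) -> tuple[float, float]:
    """Return (on_demand_cost, reserved_cost) for one GPU over one month.

    On-demand pays only for hours used; reserved pays for every hour
    at the discounted rate, whether or not the GPU is busy.
    """
    on_demand = on_demand_rate * HOURS_PER_MONTH * utilization
    reserved = on_demand_rate * (1.0 - discount) * HOURS_PER_MONTH
    return on_demand, reserved


if __name__ == "__main__":
    rate = 4.00  # hypothetical on-demand $/GPU/hr, for illustration only
    for discount in (0.60, 0.65):  # the reserved discounts quoted above
        print(f"{discount:.0%} discount: break-even at "
              f"{break_even_utilization(discount):.0%} utilization")
    for utilization in (0.25, 0.40, 0.80):
        od, rs = monthly_costs(rate, 0.60, utilization)
        print(f"utilization {utilization:.0%}: "
              f"on-demand ${od:,.0f} vs reserved ${rs:,.0f}")
```

Under these assumptions a 60% discount only pays off once the GPU is busy more than about 40% of the committed hours; below that, on-demand is cheaper despite the higher hourly rate.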
Migration and Containerization
Moving between GPU cloud providers is easier when your training environment is fully containerized. We build Docker images that run identically on any provider, eliminating the environment lock-in that makes migrations slow.
EU / GDPR-Compliant GPU Infrastructure
For European teams and regulated industries, we design GPU architectures using Nebius, Verda, or Azure EU regions that meet data residency requirements without sacrificing performance or paying hyperscaler premiums.
Confidential AI Deployment
For healthcare, finance, and government clients needing TEE-protected GPU compute, we specialize in deploying workloads on Azure’s NCC H100 v5 VMs – the only major cloud option for confidential GPU computing.
Results We Deliver
- Average 43% GPU cloud cost reduction within 90 days
- Multi-provider pipelines that eliminate single-provider availability risk
- Reserved instance strategies that reduce committed spend 40–60%
- Full MLOps stack: experiment tracking, checkpoint management, cost monitoring
Spending more than $5,000/month on GPU cloud? The ROI on a Cantech GPU cost assessment is typically 10:1 or better within 6 months.
Frequently Asked Questions
What is the cheapest cloud GPU in 2026?
The cheapest H100 GPU access is on Vast.ai’s marketplace, with instances available from $1.38/hr – though this is a P2P marketplace with variable reliability. For managed infrastructure, Hyperstack (~$1.60/hr) and RunPod community cloud ($1.99/hr) offer the lowest prices. For non-H100 workloads, RTX 4090 instances on Vast.ai start from $0.20/hr.
Why is AWS so much more expensive than neoclouds for H100?
AWS remains the cheapest hyperscaler per GPU at $6.88/hr, but that is still roughly 4× the cheapest self-service marketplace rate, and GCP and Azure list prices run 6–7× that marketplace rate. The premium pays for global presence in 30+ regions, an ecosystem of 200+ integrated services, enterprise SLAs with contractual guarantees, regulatory compliance frameworks (HIPAA, FedRAMP, SOC 2), and self-service quota without waitlists – things neoclouds can’t fully replicate.
Is H100 PCIe or H100 SXM better for my workload?
H100 SXM has higher memory bandwidth (3.35 TB/s vs 2.0 TB/s for PCIe), and SXM systems link their GPUs over NVLink, so it is the better choice for training-dominated workloads where GPU-to-GPU communication speed matters. H100 PCIe is adequate for single-GPU inference and training where memory bandwidth isn’t the bottleneck, and is typically cheaper ($2.49/hr on Lambda vs $3.29+/hr for SXM). For distributed training across multiple H100 GPUs, always use SXM.
Do all GPU clouds offer free egress?
No. Free egress is standard at most neoclouds but not at hyperscalers, which charge $0.087–$0.12/GB. Lambda Labs, Nebius, Verda, SF Compute, and Crusoe offer free egress. CoreWeave, RunPod, Vast.ai, and all three hyperscalers charge for outbound data. For teams regularly transferring large datasets or checkpoints, this difference can add $500–$2,000+/month to effective GPU cloud cost.
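As a rough illustration of how that range arises, the sketch below multiplies monthly outbound volume by a per-GB rate. The 10 TB/month volume is a hypothetical example, and the 1 TB = 1,000 GB conversion is an assumption; real bills may use tiered rates or binary units.

```python
def monthly_egress_cost(tb_per_month: float, rate_per_gb: float) -> float:
    """Flat-rate egress cost in dollars, assuming 1 TB = 1,000 GB and no tiering."""
    return tb_per_month * 1000 * rate_per_gb


if __name__ == "__main__":
    volume_tb = 10  # hypothetical monthly outbound volume (datasets + checkpoints)
    for rate in (0.087, 0.12):  # the hyperscaler range quoted above, $/GB
        cost = monthly_egress_cost(volume_tb, rate)
        print(f"{volume_tb} TB at ${rate}/GB: ${cost:,.0f}/month")
```

At these rates, 10 TB of monthly egress comes to roughly $870–$1,200, which is zero on a free-egress provider.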
How much does it cost to train a 70B model?
Using Lambda Labs’ 8× H100 SXM cluster at $2.99/GPU/hr for a 2-week (336-hour) continuous training run: ~$8,037. On CoreWeave reserved pricing: ~$6,629. On AWS P5 on-demand at $6.88/GPU/hr: ~$18,493. These figures don’t include storage, egress, or engineering time – all of which vary by provider and architecture.
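The arithmetic behind estimates like these is GPUs × hourly rate × hours. A minimal sketch, using the per-GPU rates quoted in this guide ($2.99 for Lambda, $6.88 for AWS P5) and assuming a 14-day run that is billed for every hour:

```python
def training_run_cost(num_gpus: int, rate_per_gpu_hr: float, hours: float) -> float:
    """GPU-only cost of a continuous run; excludes storage, egress, and labor."""
    return num_gpus * rate_per_gpu_hr * hours


if __name__ == "__main__":
    hours = 14 * 24  # two weeks of continuous training = 336 hours
    rates = {
        "Lambda 8x H100 SXM": 2.99,    # $/GPU/hr as quoted in this guide
        "AWS P5 on-demand": 6.88,      # $55.04/hr instance divided by 8 GPUs
    }
    for name, rate in rates.items():
        print(f"{name}: ${training_run_cost(8, rate, hours):,.0f}")
```

Swapping in a current rate or a different run length is a one-line change, which makes it easy to re-check any provider's quote on the same per-GPU-hour basis.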
What GPU cloud is best for inference serving?
For inference serving with variable traffic, Modal (per-second serverless billing) eliminates idle GPU costs – typically 3–4× cheaper than always-on providers for workloads with substantial quiet periods. For steady-state high-throughput inference, Lambda Labs (free egress), RunPod secure cloud, or Azure NC H100 v5 (for Azure-native applications) are strong options.
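Whether serverless wins depends on how busy the endpoint is. The sketch below compares the two billing models under simplifying assumptions: both hourly rates are hypothetical placeholders rather than published prices, serverless bills only busy time, and cold starts are ignored.

```python
HOURS_PER_MONTH = 730  # average hours in a calendar month


def always_on_cost(rate_per_hr: float) -> float:
    """Monthly cost of a GPU that is rented around the clock."""
    return rate_per_hr * HOURS_PER_MONTH


def serverless_cost(rate_per_hr: float, busy_fraction: float) -> float:
    """Monthly cost when only the busy fraction of each hour is billed."""
    return rate_per_hr * busy_fraction * HOURS_PER_MONTH


def break_even_busy_fraction(always_on_rate: float, serverless_rate: float) -> float:
    """Busy fraction above which the always-on GPU becomes cheaper."""
    return always_on_rate / serverless_rate


if __name__ == "__main__":
    always_on_rate = 3.00   # hypothetical $/GPU/hr
    serverless_rate = 4.00  # hypothetical effective $/GPU/hr while running
    for busy in (0.10, 0.20, 0.50, 0.90):
        a = always_on_cost(always_on_rate)
        s = serverless_cost(serverless_rate, busy)
        print(f"busy {busy:.0%}: always-on ${a:,.0f} vs serverless ${s:,.0f}")
    print(f"break-even busy fraction: "
          f"{break_even_busy_fraction(always_on_rate, serverless_rate):.0%}")
```

With these placeholder rates, an endpoint that is busy 20% of the time costs about a quarter as much on serverless, while one that is busy more than 75% of the time is cheaper on an always-on GPU.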
Should I use spot instances for GPU training?
For training workloads that use checkpoint-based fault tolerance – saving model state every N steps so the job can resume if interrupted – spot instances are strongly recommended. Savings are 60–80% on hyperscalers, and 30–50% on neoclouds that offer spot pricing (CoreWeave, RunPod community). For production inference serving, don’t use spot – interruptions affect end users.
Which GPU cloud providers offer SOC 2 Type II compliance?
As of June 2026, providers with verified SOC 2 Type II include: Nebius, CoreWeave, Crusoe, and Verda. AWS, GCP, and Azure all have SOC 2 Type II and broader compliance frameworks. For HIPAA or FedRAMP requirements, hyperscalers are the established path; Crusoe is pursuing FedRAMP authorization.