TCO Calculator Expert

Compare Cloud API, Cloud GPU Rental, and On-Premise deployments with real-time hardware validation and cost breakdown. Built for senior architects making production decisions.

Pricing dataset

Verified October 2025 pricing.

Start here

Three steps, one clean comparison

  1. Set the workload once.

    Enter queries and tokens in Cloud API. The same demand syncs across Cloud GPU and On-Premise.

  2. Align the model pair.

    Select the provider and the open-source match. VRAM sizing and GPU validation update automatically.

  3. Review results.

    Check the 3-year chart, breakeven, and recommendations before exporting.

Quick profiles

Pick a profile to prefill everything

Profiles sync across all tabs and can be edited anytime.

A. Regulated enterprise

LLM-based knowledge assistant, 500K queries/month.

  • Claude 3.7 Sonnet with Llama 3.3 70B FP8 baseline.
  • Compare Cloud GPU H100 commit vs dual H200 racks.

B. Industrial co-pilot

Streaming telemetry, 1.2M queries/month.

  • High volume, GPT-4o paired with DeepSeek 67B.
  • Auto-suggests 4x H100 or MI300X sizing.

C. Air-gapped research pod

80K queries/month, strict sovereignty.

  • Gemini 1.5 Pro paired with Qwen 32B.
  • On-Prem discounts removed, opex increased.

1 USD = 0.92 EUR

Cloud API 3-year TCO: €0
Cloud GPU 3-year TCO: €0
On-Premise 3-year TCO: €0

📊 Default Scenario: Enterprise with Existing Datacenter (500K queries/month)

Profile: Mid-sized enterprise, 500K queries/month, Claude 3.7 Sonnet equivalent (Llama 3.3 70B FP8), 40% GPU discount, industrial power (€0.12/kWh), automated DevOps (0.05 FTE), existing datacenter.

Real pricing (Oct 2025): Cloud API: €227K over 3 years ($3/$15 per M input/output tokens). Cloud GPU: Lambda H100 at $1.85/hr, 2× H100 80GB, €91K over 3 years. On-premise: 2× H200, €44K capex + €39K opex = €83K total. Breakeven at ~18 months.

Key Insight: The larger the model, the stronger the case for on-premise. Small models (8B): Cloud API is cheapest (€10K). Medium models (70B): on-premise wins (€83K vs €91K for Cloud GPU). Large models (671B): on-premise saves 62-82%. Self-hosting starts to make financial sense at about 500K queries/month, and over a horizon of 5 years or more the on-premise advantage grows substantially.

Hide advanced inputs for a clean flow.

Cloud API Configuration

Guided view hides advanced assumptions. Toggle above for full detail.

Step 1 of 3

📊 Workload

🔄 Auto-synced with Cloud GPU scenario
Typical RAG: context + question
⚠️ Output tokens cost 2-5× more than input!

💰 Pricing

Typical for RAG with document retrieval

💵 Cost Summary

Input tokens cost: €0
Output tokens cost: €0
Egress bandwidth: €0
Monthly Total: €0
Annual Total: €0
3-Year TCO: €0
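
The token lines in this summary can be reproduced by hand. The following is a minimal sketch, not the calculator's actual code: it assumes the default scenario's 500K queries/month, 1200 input and 600 output tokens per query, and $3/$15 per million input/output tokens. The function name `api_cost_usd` is hypothetical, and egress bandwidth and currency conversion are deliberately left out.

```python
def api_cost_usd(queries_per_month: int,
                 input_tokens: int,
                 output_tokens: int,
                 usd_per_m_input: float,
                 usd_per_m_output: float,
                 months: int = 36) -> dict:
    """Token cost only, in USD. Egress and FX conversion are not modeled."""
    # Monthly token volume in millions, multiplied by the per-million price.
    input_monthly = queries_per_month * input_tokens / 1_000_000 * usd_per_m_input
    output_monthly = queries_per_month * output_tokens / 1_000_000 * usd_per_m_output
    monthly = input_monthly + output_monthly
    return {
        "input_monthly": input_monthly,
        "output_monthly": output_monthly,
        "monthly": monthly,
        "annual": monthly * 12,
        "total": monthly * months,
    }


# Default scenario inputs (assumed from the profile text).
cost = api_cost_usd(500_000, 1200, 600, 3.0, 15.0)
print(cost["monthly"])  # 6300.0
print(cost["total"])    # 226800.0
```

With these inputs, output tokens account for 4,500 of the 6,300 per month even though there are half as many of them, which is why the output price dominates the comparison.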

Cloud GPU Configuration

📊 Workload

🔄 Auto-synced with API scenario
1200 input + 600 output = 1800 total tokens per query
🚀 Performance:
  • Throughput: 480 tokens/sec
  • Max queries/hour: 1,570
  • GPU utilization: 87%
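
How the panel derives its throughput and utilization figures is not shown, so the sketch below does not try to reproduce them. It only converts the synced workload into an average demand rate, which is a useful sanity check before comparing against any throughput number. The 730 hours/month constant and the function name `average_demand` are assumptions, and real traffic is bursty, so peak demand will be higher than this average.

```python
HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12


def average_demand(queries_per_month: int,
                   tokens_per_query: int,
                   hours_per_month: float = HOURS_PER_MONTH) -> tuple:
    """Return (queries per hour, tokens per second) averaged over the month."""
    queries_per_hour = queries_per_month / hours_per_month
    tokens_per_second = queries_per_hour * tokens_per_query / 3600
    return queries_per_hour, tokens_per_second


# Default scenario: 500K queries/month at 1800 tokens per query.
qph, tps = average_demand(500_000, 1800)
print(round(qph), round(tps))
```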

🤖 Model Selection

📊 Total VRAM Required: 88 GB
  • Model weights: 70 GB
  • KV cache: 18 GB
  • Safety margin (20%): 18 GB, on top of the 88 GB total
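
The breakdown above can be checked with simple arithmetic: weights plus KV cache give 88 GB, and 20% of that is about 18 GB. The sketch below assumes FP8 weights take 1 byte per parameter, takes the 18 GB KV cache as given rather than deriving it, and assumes the margin is applied to weights plus KV cache. The function name `estimate_vram_gb` is hypothetical.

```python
def estimate_vram_gb(params_billion: float,
                     bytes_per_param: float,
                     kv_cache_gb: float,
                     margin: float = 0.20) -> dict:
    """Rough VRAM estimate in GB (1 GB treated as 1e9 bytes)."""
    # 1e9 parameters at N bytes each is N GB per billion parameters.
    weights_gb = params_billion * bytes_per_param
    subtotal_gb = weights_gb + kv_cache_gb
    # Assumption: the safety margin is a fraction of weights + KV cache.
    margin_gb = subtotal_gb * margin
    return {
        "weights_gb": weights_gb,
        "kv_cache_gb": kv_cache_gb,
        "subtotal_gb": subtotal_gb,
        "margin_gb": margin_gb,
        "with_margin_gb": subtotal_gb + margin_gb,
    }


# Llama 3.3 70B at FP8 (1 byte per parameter), KV cache taken from the panel.
est = estimate_vram_gb(70, 1.0, 18)
print(est["subtotal_gb"])               # 88.0
print(round(est["with_margin_gb"], 1))  # 105.6
```

Under these assumptions the hardware should be sized for roughly 106 GB rather than 88 GB.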

🖥️ Hardware

Auto-filled from provider, or enter custom rate

💵 Cost Summary

GPU rental (24/7): €0
Storage: €0
Egress bandwidth: €0
Monthly Total: €0
Annual Total: €0
3-Year TCO: €0
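
The GPU rental line is a straight hours-times-rate calculation. As a rough sketch, assuming the default scenario's Lambda rate of $1.85 per GPU-hour, two H100s running 24/7, and the 0.92 USD-to-EUR rate shown on the page (the function name `gpu_rental_eur` is hypothetical):

```python
HOURS_PER_YEAR = 24 * 365  # 24/7 rental, leap days ignored


def gpu_rental_eur(usd_per_gpu_hour: float,
                   gpu_count: int,
                   usd_to_eur: float,
                   years: int = 3) -> float:
    """Rental cost only; storage and egress are separate lines in the summary."""
    usd = usd_per_gpu_hour * gpu_count * HOURS_PER_YEAR * years
    return usd * usd_to_eur


rental = gpu_rental_eur(1.85, 2, 0.92)
print(round(rental))  # 89457
```

This comes out a little under the €91K quoted for the default scenario; the remainder is presumably storage and egress, which this sketch does not model.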

On-Premise Configuration

🤖 Model Selection

📊 Total VRAM Required: 88 GB
  • Model weights: 70 GB
  • KV cache: 18 GB
  • Safety margin (20%): 18 GB, on top of the 88 GB total

🖥️ Hardware Capex

GPUs: 2× H200, sufficient for Llama 3.3 70B FP8
GPU discount: 30-40% is typical for multi-GPU enterprise orders
CPU: AMD EPYC 9554 or similar
Total Capex: $0
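
The claim that 2× H200 is sufficient can be sanity-checked on memory alone. The sketch below is a simplified check, not the calculator's validation logic: it uses the published per-GPU memory of 80 GB for the H100 80GB and 141 GB for the H200, and a requirement of 106 GB (88 GB plus the 18 GB safety margin). The names `GPU_VRAM_GB`, `gpus_needed`, and `fits` are hypothetical.

```python
import math

# Per-GPU memory in GB (H100 80GB variant, H200 141 GB).
GPU_VRAM_GB = {"H100": 80, "H200": 141}


def gpus_needed(required_gb: float, gpu: str) -> int:
    """Minimum GPU count by memory alone; ignores throughput and redundancy."""
    return math.ceil(required_gb / GPU_VRAM_GB[gpu])


def fits(required_gb: float, gpu: str, count: int) -> bool:
    """True if the pooled memory of `count` GPUs covers the requirement."""
    return count * GPU_VRAM_GB[gpu] >= required_gb


required = 88 + 18  # weights + KV cache, plus the 20% safety margin
print(gpus_needed(required, "H100"))  # 2
print(fits(required, "H200", 2))      # True
```

Memory is only a lower bound. Throughput targets, tensor-parallel layout, and redundancy can justify more GPUs than this check suggests.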

💵 Cost Summary

Capex (amortized 36 months): €0
Power & cooling: €0
Maintenance: €0
IT staff: €0
ISP: €0
Monthly Total: €0
Annual Total: €0
3-Year TCO: €0
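
The summary spreads capex over 36 months and adds recurring opex (power and cooling, maintenance, IT staff, ISP). A minimal sketch of that roll-up, using the default scenario's €44K capex and €39K of opex over three years; the function name `on_prem_tco` is hypothetical, and opex is treated as one flat monthly figure rather than broken out by line:

```python
def on_prem_tco(capex_eur: float,
                monthly_opex_eur: float,
                months: int = 36) -> dict:
    """Straight-line amortization of capex plus flat monthly opex."""
    amortized_monthly = capex_eur / months
    monthly_total = amortized_monthly + monthly_opex_eur
    return {
        "amortized_monthly": amortized_monthly,
        "monthly_total": monthly_total,
        "annual_total": monthly_total * 12,
        "total": monthly_total * months,
    }


# €44K capex, €39K opex spread evenly across 36 months.
tco = on_prem_tco(44_000, 39_000 / 36)
print(round(tco["monthly_total"]))  # 2306
print(round(tco["total"]))          # 83000
```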

📈 3-Year TCO Comparison

Metric              Cloud API   Cloud GPU   On-Premise
Monthly Cost        €0          €0          €0
3-Year TCO          €0          €0          €0
Breakeven vs API    -           -           -
ROI over 3 years    -           -           -
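
The calculator's own breakeven and ROI formulas are not shown, so the following is only one common way to define them, with made-up inputs. Breakeven is taken as the first whole month in which cumulative self-hosted cost (upfront capex plus monthly opex) drops to or below cumulative API spend. ROI is taken as savings relative to the alternative's 3-year cost. Both definitions, the function names, and the example figures are assumptions, and the result is not expected to match the calculator's reported breakeven.

```python
import math
from typing import Optional


def breakeven_month(capex: float,
                    self_hosted_monthly_opex: float,
                    api_monthly: float) -> Optional[int]:
    """First whole month where capex + opex * m <= api_monthly * m."""
    monthly_savings = api_monthly - self_hosted_monthly_opex
    if monthly_savings <= 0:
        return None  # self-hosting never catches up
    return math.ceil(capex / monthly_savings)


def roi(api_total: float, alternative_total: float) -> float:
    """Savings versus the API, relative to what the alternative costs."""
    return (api_total - alternative_total) / alternative_total


# Hypothetical figures for illustration only.
print(breakeven_month(60_000, 2_000, 5_000))  # 20
print(round(roi(180_000, 132_000), 2))        # 0.36
```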

💡 Recommendations