Google Vertex AI Pricing 2026: Gemini 2.5 Flash, Pro & Enterprise Costs
Complete Google Vertex AI pricing guide for 2026. It covers Gemini 2.5 Flash-Lite, Flash, and Pro, how Vertex AI compares with calling the Gemini API directly, compliance features, and when to choose each path. Last verified: 2026-04-01.
10 min read · Updated April 2026
Gemini 2.0 shutdown notice: Gemini 2.0 Flash and Gemini 2.0 Flash-Lite are scheduled for shutdown on 2026-06-01. The current production models on Vertex AI are the Gemini 2.5 family. This page reflects Gemini 2.5 pricing.
Vertex AI Gemini 2.5 Pricing at a Glance
- Gemini 2.5 Flash-Lite input: $0.10/M tokens (cheapest)
- Gemini 2.5 Flash input: $0.30/M tokens
- Gemini 2.5 Pro input: $1.25/M tokens
- Context window: 1M tokens (all three Gemini 2.5 tiers)
Gemini 2.5 on Vertex AI — Current Model Pricing
| Model | Input / 1M tokens | Output / 1M tokens | Context window | Best for |
|---|---|---|---|---|
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | 1M tokens | High-volume classification, chatbots, simple tasks |
| Gemini 2.5 Flash | $0.30 | $2.50 | 1M tokens | Mid-range reasoning, long documents, coding |
| Gemini 2.5 Pro | $1.25 ($2.50 for prompts over 200K tokens) | $10.00 ($15.00 for prompts over 200K tokens) | 1M tokens | Complex reasoning, full codebase analysis, research |
| text-embedding-005 | $0.025 | N/A | 2K tokens | Semantic search, RAG ingestion |
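The table converts to a per-request cost by multiplying each token count by its per-million rate. Below is a minimal sketch in Python that assumes the standard online list rates above. It does not model batch discounts, long-context rates, or modality-specific pricing. `PRICES` and `request_cost_usd` are hypothetical names used for illustration and are not part of any Google SDK.

```python
from decimal import Decimal

# Hypothetical rate table copied from the pricing table above:
# model -> (input USD per 1M tokens, output USD per 1M tokens).
PRICES = {
    "gemini-2.5-flash-lite": (Decimal("0.10"), Decimal("0.40")),
    "gemini-2.5-flash": (Decimal("0.30"), Decimal("2.50")),
    "gemini-2.5-pro": (Decimal("1.25"), Decimal("10.00")),
}

MILLION = Decimal(1_000_000)


def request_cost_usd(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the list-price cost of a single request in USD."""
    in_rate, out_rate = PRICES[model]
    # Decimal avoids binary floating-point rounding on sub-cent amounts.
    return (input_tokens * in_rate + output_tokens * out_rate) / MILLION


# Example: a 10,000-token prompt that produces a 1,000-token answer.
for name in PRICES:
    print(name, request_cost_usd(name, 10_000, 1_000))
```

The key design choice is decimal arithmetic. Per-request costs are often fractions of a cent, and small float rounding errors compound when you sum millions of requests.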
A key Gemini 2.5 advantage is that the 1M-token context window is available at every tier, including the cheapest, Flash-Lite, at $0.10/M. By contrast, GPT-5.4 nano and mini are capped at 128K tokens.
Vertex AI vs Gemini API Direct: Key Differences
| Feature | Vertex AI | Gemini API (AI Studio) |
|---|---|---|
| Pricing | Same token rates | Same token rates |
| Free tier | $300 Google Cloud credits | Generous free tier (Flash-Lite) |
| Enterprise compliance (GDPR, HIPAA, SOC 2) | Full support | Limited |
| Data residency | EU, US, APAC regions | US primarily |
| Fine-tuning (supervised) | Full support | Limited |
| Batch predictions (50% off) | Yes | Yes |
| Google Cloud integration (BigQuery, GCS) | Native | Not available |
| Private networking (VPC) | VPC Service Controls | Not available |
Choose the Gemini API (AI Studio) directly for development and cost-sensitive production workloads. Choose Vertex AI when you need enterprise compliance, data residency, or deep Google Cloud integration.
Gemini 2.5 vs OpenAI GPT-5.4 on Price
| Tier | Google model | Google input/1M | OpenAI model | OpenAI input/1M | Input price gap |
|---|---|---|---|---|---|
| Budget | Gemini 2.5 Flash-Lite | $0.10 | GPT-5.4 nano | $0.20 | Google 2× cheaper |
| Mid-range | Gemini 2.5 Flash | $0.30 | GPT-5.4 mini | $0.75 | Google 2.5× cheaper |
| Premium | Gemini 2.5 Pro | $1.25 | GPT-5.4 | $2.50 | Google 2× cheaper |
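The gap column is the OpenAI input rate divided by the Google input rate. The quick Python check below uses only the input prices from the table. `TIERS` and `price_gap` are illustrative names.

```python
from decimal import Decimal

# Input prices per 1M tokens from the comparison table:
# (tier, Google model, Google rate, OpenAI model, OpenAI rate).
TIERS = [
    ("Budget", "Gemini 2.5 Flash-Lite", Decimal("0.10"), "GPT-5.4 nano", Decimal("0.20")),
    ("Mid-range", "Gemini 2.5 Flash", Decimal("0.30"), "GPT-5.4 mini", Decimal("0.75")),
    ("Premium", "Gemini 2.5 Pro", Decimal("1.25"), "GPT-5.4", Decimal("2.50")),
]


def price_gap(google_rate: Decimal, openai_rate: Decimal) -> Decimal:
    """How many times the Google input rate the OpenAI input rate is."""
    return openai_rate / google_rate


for tier, g_model, g_rate, o_model, o_rate in TIERS:
    print(f"{tier}: {o_model} input costs {price_gap(g_rate, o_rate)}x {g_model}")
```

These ratios cover input tokens only. Output-token gaps can be quite different. For example, the GPT-5.4 nano output total in the cost example implies about $1.25/M, against $0.40/M for Flash-Lite.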
Real-World Vertex AI Cost Example
Document Processing Pipeline (1M pages/month)
- Average page: 500 tokens input + 200 tokens output
- Total: 500M input + 200M output tokens
- Gemini 2.5 Flash-Lite: $50 + $80 = $130/month
- Gemini 2.5 Flash: $150 + $500 = $650/month
- GPT-5.4 nano (OpenAI): $100 + $250 = $350/month
- GPT-5.4 (OpenAI): $1,250 + $3,000 = $4,250/month
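The same arithmetic works at any volume: pages × tokens per page gives total tokens, which are then priced per million. Below is a minimal sketch that assumes the rates used in the example. This page does not quote the OpenAI output rates ($1.25/M for GPT-5.4 nano, $15.00/M for GPT-5.4); they are back-calculated from the $250 and $3,000 output totals above. `RATES` and `monthly_cost_usd` are hypothetical names.

```python
from decimal import Decimal

# model -> (input USD per 1M tokens, output USD per 1M tokens).
# OpenAI output rates are inferred from the example totals, not quoted prices.
RATES = {
    "Gemini 2.5 Flash-Lite": (Decimal("0.10"), Decimal("0.40")),
    "Gemini 2.5 Flash": (Decimal("0.30"), Decimal("2.50")),
    "GPT-5.4 nano": (Decimal("0.20"), Decimal("1.25")),
    "GPT-5.4": (Decimal("2.50"), Decimal("15.00")),
}


def monthly_cost_usd(model: str, pages: int, in_per_page: int, out_per_page: int) -> Decimal:
    """Monthly list-price cost of a document pipeline, in USD."""
    in_rate, out_rate = RATES[model]
    # Convert total tokens to millions of tokens before applying per-1M rates.
    millions_in = Decimal(pages * in_per_page) / 1_000_000
    millions_out = Decimal(pages * out_per_page) / 1_000_000
    return millions_in * in_rate + millions_out * out_rate


# 1M pages/month at 500 input + 200 output tokens per page.
for model in RATES:
    print(f"{model}: ${monthly_cost_usd(model, 1_000_000, 500, 200):,.2f}/month")
```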
Vertex AI Fine-Tuning Costs
- Gemini 2.5 Flash fine-tuning: $8.00 per 1M training tokens
- Fine-tuned model inference: standard Gemini 2.5 Flash pricing applies
- Minimum training dataset: 100 examples
- Typical fine-tune: 10,000 examples (about 500 tokens each) = ~5M tokens = ~$40 one-time cost for a single pass over the data
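The one-time estimate can be parameterized in the same way. Below is a minimal sketch that makes two assumptions. First, each example is roughly 500 tokens, which is implied by 10,000 examples ≈ 5M tokens. Second, billed training tokens equal dataset tokens × epochs, so the ~$40 figure corresponds to one epoch. Check the epoch multiplier against current Vertex AI tuning documentation. `tuning_cost_usd` is a hypothetical helper.

```python
from decimal import Decimal

# Gemini 2.5 Flash tuning rate from the list above, USD per 1M training tokens.
TUNING_RATE_PER_M = Decimal("8.00")
MIN_EXAMPLES = 100  # minimum training dataset size from the list above


def tuning_cost_usd(examples: int, tokens_per_example: int, epochs: int = 1) -> Decimal:
    """Estimate a one-time supervised tuning cost in USD.

    Assumes billed training tokens = dataset tokens x epochs
    (an assumption, not a quoted Vertex AI billing rule).
    """
    if examples < MIN_EXAMPLES:
        raise ValueError("dataset is below the 100-example minimum")
    training_tokens = examples * tokens_per_example * epochs
    return Decimal(training_tokens) / 1_000_000 * TUNING_RATE_PER_M


print(tuning_cost_usd(10_000, 500))            # -> 40.00, one pass over the data
print(tuning_cost_usd(10_000, 500, epochs=3))  # -> 120.00, three passes
```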
When to Choose Vertex AI
- You're already on Google Cloud — consolidated billing, existing credits, no new vendor
- HIPAA/GDPR compliance required — Vertex AI is a Google Cloud HIPAA-eligible service
- Data needs to stay in EU or APAC — Vertex supports regional data residency
- You need 1M context at the lowest cost — Flash-Lite at $0.10/M with 1M context; no OpenAI equivalent
- You want batch processing discounts — 50% off for batch predictions via Vertex
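The sketch below re-prices the document-processing example from earlier at the 50% batch discount. It assumes the discount applies uniformly to both input and output tokens and to the whole workload. `batch_cost_usd` is a hypothetical helper, not a Vertex AI API.

```python
from decimal import Decimal

BATCH_DISCOUNT = Decimal("0.50")  # 50% off batch predictions, per the list above


def batch_cost_usd(online_cost: Decimal, discount: Decimal = BATCH_DISCOUNT) -> Decimal:
    """Apply a flat batch discount to an online (synchronous) cost estimate."""
    return online_cost * (1 - discount)


# Online monthly costs from the document-processing example.
ONLINE = {
    "Gemini 2.5 Flash-Lite": Decimal("130"),
    "Gemini 2.5 Flash": Decimal("650"),
}

for model, cost in ONLINE.items():
    print(f"{model}: ${cost} online vs ${batch_cost_usd(cost):.2f} batch")
```

Batch jobs trade latency for price, so this discount fits offline pipelines like the one in the example better than interactive chat.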