Gemini API Pricing 2026:
Gemini 2.5 Flash, Flash-Lite, Pro & 3.1 Preview
Current Google Gemini API pricing for production models: Gemini 2.5 Flash, 2.5 Flash-Lite, and 2.5 Pro. Includes Gemini 3.1 Pro Preview details and deprecation notes for Gemini 2.0 Flash. Last verified: 2026-04-01.
Google's Gemini pricing landscape has shifted significantly in 2026. For new production work, the recommended models are Gemini 2.5 Flash, Gemini 2.5 Flash-Lite, and Gemini 2.5 Pro. Gemini 3.1 Pro Preview is also available for early access, clearly labeled as a preview model.
Production Models — Gemini 2.5 Family
| Model | Input / 1M tokens | Output / 1M tokens | Context | Notes |
|---|---|---|---|---|
| Gemini 2.5 Flash | $0.30 | $2.50 | 1M tokens | High-volume reasoning-capable default |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | 1M tokens | Cheapest stable production option |
| Gemini 2.5 Pro | $1.25 / $2.50* | $10.00 / $15.00* | 1M tokens | Strongest 2.5-series; *tiered above 200K tokens |
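The 2.5 Pro tiering can be turned into a quick per-request estimate. A minimal sketch, assuming (as the asterisked footnote suggests) that a request whose prompt exceeds 200K tokens is billed entirely at the higher tier:

```python
# Sketch: estimate one Gemini 2.5 Pro request's cost from the table above.
# Assumption: a prompt over 200K tokens is billed entirely at the higher tier.

TIER_THRESHOLD = 200_000  # prompt tokens

def gemini_25_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Return estimated USD cost for one request, per the 2.5 Pro row above."""
    if input_tokens <= TIER_THRESHOLD:
        input_rate, output_rate = 1.25, 10.00   # $ per 1M tokens
    else:
        input_rate, output_rate = 2.50, 15.00
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# 100K-token prompt, 2K-token answer -> standard tier
print(f"${gemini_25_pro_cost(100_000, 2_000):.4f}")  # $0.1450
# 500K-token prompt, 2K-token answer -> higher tier
print(f"${gemini_25_pro_cost(500_000, 2_000):.4f}")  # $1.2800
```

Note the discontinuity at the threshold: a prompt one token over 200K roughly doubles the input rate for the whole request under this assumption.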
Preview Models — Gemini 3.1
| Model | Input / 1M (≤200K ctx) | Input / 1M (>200K ctx) | Output / 1M (≤200K) | Output / 1M (>200K) |
|---|---|---|---|---|
| Gemini 3.1 Pro Preview | $2.00 | $4.00 | $12.00 | $18.00 |
Legacy / Deprecated — Gemini 2.0 Flash
| Model | Input / 1M | Output / 1M | Status |
|---|---|---|---|
| Gemini 2.0 Flash | $0.10 | $0.40 | Deprecated — shutdown 2026-06-01 |
| Gemini 2.0 Flash-Lite | $0.075 | $0.30 | Deprecated — migrate to 2.5 Flash-Lite |
Gemini's Key Advantage: 1 Million Token Context
Every Gemini 2.5 model includes a 1 million token context window — roughly 8× the size of GPT-4o's 128K window and 5× the size of Claude Haiku 4.5's 200K window. This makes Gemini the natural choice for:
- Full codebase analysis without chunking
- Processing entire legal or financial documents in one request
- Long-form video or audio transcription analysis
- Large dataset inspection without a separate retrieval layer
Gemini Free Tier
Google AI Studio offers a free tier for development and prototyping:
- Gemini 2.5 Flash: 15 requests/minute, 1,500 requests/day
- Gemini 2.5 Pro: 5 requests/minute, 25 requests/day
For early-stage projects and MVPs, Gemini can be effectively free until you hit production traffic levels.
Gemini 2.5 Flash vs GPT-4o mini vs Claude Haiku 4.5
| Monthly input tokens | Gemini 2.5 Flash | Gemini 2.5 Flash-Lite | GPT-4o mini | Claude Haiku 4.5 |
|---|---|---|---|---|
| 10M tokens | $3 | $1 | $1.50 | $10 |
| 100M tokens | $30 | $10 | $15 | $100 |
| 1B tokens | $300 | $100 | $150 | $1,000 |
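The comparison table reduces to simple arithmetic. A sketch assuming the table counts input tokens only, at the input rates quoted on this page ($0.30, $0.10, $0.15, and $1.00 per 1M tokens respectively):

```python
# Sketch: reproduce the monthly-volume comparison table above.
# Assumption: the table counts input tokens only, at these $-per-1M-input rates.
INPUT_RATE_PER_M = {
    "Gemini 2.5 Flash": 0.30,
    "Gemini 2.5 Flash-Lite": 0.10,
    "GPT-4o mini": 0.15,
    "Claude Haiku 4.5": 1.00,
}

def monthly_cost(model: str, tokens_per_month: int) -> float:
    """Estimated monthly USD spend for a given input-token volume."""
    return INPUT_RATE_PER_M[model] * tokens_per_month / 1_000_000

for volume in (10_000_000, 100_000_000, 1_000_000_000):
    row = ", ".join(f"{m}: ${monthly_cost(m, volume):,.0f}" for m in INPUT_RATE_PER_M)
    print(f"{volume:>13,} tokens -> {row}")
```

Real workloads also pay for output tokens, which are priced higher across all four providers, so treat these figures as a floor rather than a full bill.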
When to Choose Gemini API
- Long document processing — 1M context window without extra cost
- Lowest-cost production option — Gemini 2.5 Flash-Lite is among the cheapest stable APIs
- Multimodal applications — Native image, video, and audio support in all 2.5 models
- Google Cloud ecosystem — Native Vertex AI, BigQuery, and Google Workspace integration
- Free-tier prototyping — Most generous free tier of the major providers
How to Access Gemini API
Obtain an API key via Google AI Studio (aistudio.google.com) for free-tier usage. For production with SLAs, deploy through Google Cloud Vertex AI. No approval process required for AI Studio API keys.
Frequently Asked Questions
Is Gemini 2.0 Flash still safe to start new projects with?
No. Gemini 2.0 Flash is deprecated, with a shutdown date of 2026-06-01. Start new projects on Gemini 2.5 Flash or Gemini 2.5 Flash-Lite instead: 2.5 Flash-Lite matches 2.0 Flash's $0.10/$0.40 pricing, 2.5 Flash adds stronger reasoning capability at a modest premium, and both are actively maintained.
Which Gemini model is cheapest for production?
Gemini 2.5 Flash-Lite at $0.10/M input is the cheapest stable production option. For tasks needing more reasoning capability, Gemini 2.5 Flash at $0.30/M is still highly competitive.
Should I use Gemini 3.1 Pro Preview in production?
Not yet. Preview models can change or be removed without notice. Use it for evaluation and early testing, but keep production workloads on stable Gemini 2.5 models until Gemini 3.1 reaches general availability.
When does Gemini beat OpenAI or Claude on cost?
For high-volume simple tasks, Gemini 2.5 Flash-Lite ($0.10/M) undercuts GPT-4o mini ($0.15/M) and is dramatically cheaper than Claude Haiku 4.5 ($1.00/M). For document-heavy tasks where 1M context matters, Gemini avoids chunking overhead that adds latency and cost in RAG setups.
How does Gemini's long context change total system cost?
If your application needs to process documents longer than 128K tokens, GPT-4o requires chunking + retrieval (RAG), adding engineering complexity, embedding costs, and latency. Gemini's 1M context lets you skip that layer entirely, which can reduce total system cost even if per-token pricing is slightly higher.
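As a rough feasibility check of that point, here is a sketch testing whether a hypothetical 500K-token document fits in a single request for each model, using the context windows and input rates quoted on this page (GPT-4o's per-token price is not quoted here, so only its fit is checked):

```python
# Sketch: can a 500K-token document fit in one request, and what does it cost?
# The 500K document size is a hypothetical example, not a benchmark.
DOC_TOKENS = 500_000

MODELS = {
    # name: (context window in tokens, $ per 1M input tokens; None = not quoted here)
    "Gemini 2.5 Flash": (1_000_000, 0.30),
    "GPT-4o": (128_000, None),
    "Claude Haiku 4.5": (200_000, 1.00),
}

for name, (context, rate) in MODELS.items():
    if DOC_TOKENS > context:
        print(f"{name}: does not fit ({DOC_TOKENS:,} > {context:,}) -> needs chunking/RAG")
    else:
        print(f"{name}: fits in one request, input cost ${DOC_TOKENS * rate / 1_000_000:.2f}")
```

Under these numbers only Gemini handles the document in one request (about $0.15 of input on 2.5 Flash); the other two would need the chunk-and-retrieve layer described above, whose embedding and pipeline costs sit outside per-token pricing.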
Calculate Your Gemini API Costs
Compare Gemini 2.5 vs OpenAI vs Claude for your specific usage volume.