Gemini API Pricing 2026:
Gemini 2.5 Flash, Flash-Lite, Pro & 3.1 Preview
Current Google Gemini API pricing for production models: Gemini 2.5 Flash, 2.5 Flash-Lite, and 2.5 Pro. Includes Gemini 3.1 Pro Preview details and deprecation notes for Gemini 2.0 Flash. Last verified: 2026-04-01.
Google's Gemini pricing landscape has shifted significantly in 2026. For new production work, the recommended models are Gemini 2.5 Flash, Gemini 2.5 Flash-Lite, and Gemini 2.5 Pro. Gemini 3.1 Pro Preview is also available for early access, clearly labeled as a preview model.
Production Models — Gemini 2.5 Family
| Model | Input / 1M tokens | Output / 1M tokens | Context | Notes |
|---|---|---|---|---|
| Gemini 2.5 Flash | $0.30 | $2.50 | 1M tokens | High-volume reasoning-capable default |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | 1M tokens | Cheapest stable production option |
| Gemini 2.5 Pro | $1.25 / $2.50* | $10.00 / $15.00* | 1M tokens | Strongest 2.5-series; *tiered above 200K tokens |
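The 2.5 Pro tiering can be turned into a quick per-request estimate. A minimal sketch, assuming (as the asterisked footnote suggests) that a request whose prompt exceeds 200K tokens is billed entirely at the higher tier:

```python
# Sketch: estimate one Gemini 2.5 Pro request's cost from the table above.
# Assumption: a prompt over 200K tokens is billed entirely at the higher tier.

TIER_THRESHOLD = 200_000  # prompt tokens

def gemini_25_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """Return estimated USD cost for one request, per the 2.5 Pro row above."""
    if input_tokens <= TIER_THRESHOLD:
        input_rate, output_rate = 1.25, 10.00   # $ per 1M tokens
    else:
        input_rate, output_rate = 2.50, 15.00
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# 100K-token prompt, 2K-token answer -> standard tier
print(f"${gemini_25_pro_cost(100_000, 2_000):.4f}")  # $0.1450
# 500K-token prompt, 2K-token answer -> higher tier
print(f"${gemini_25_pro_cost(500_000, 2_000):.4f}")  # $1.2800
```

Note the discontinuity at the threshold: a prompt one token over 200K roughly doubles the input rate for the whole request under this assumption.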
Preview Models — Gemini 3.1
| Model | Input / 1M (≤200K ctx) | Input / 1M (>200K ctx) | Output / 1M (≤200K) | Output / 1M (>200K) |
|---|---|---|---|---|
| Gemini 3.1 Pro Preview | $2.00 | $4.00 | $12.00 | $18.00 |
Legacy / Deprecated — Gemini 2.0 Flash
| Model | Input / 1M | Output / 1M | Status |
|---|---|---|---|
| Gemini 2.0 Flash | $0.10 | $0.40 | Deprecated — shutdown 2026-06-01 |
| Gemini 2.0 Flash-Lite | $0.075 | $0.30 | Deprecated — migrate to 2.5 Flash-Lite |
Gemini's Key Advantage: 1 Million Token Context
Every Gemini 2.5 model includes a 1 million token context window — roughly 8× the size of GPT-4o's 128K window and 5× the size of Claude Haiku 4.5's 200K window. This makes Gemini the natural choice for:
- Full codebase analysis without chunking
- Processing entire legal or financial documents in one request
- Long-form video or audio transcription analysis
- Large dataset inspection without a separate retrieval layer
Gemini Free Tier
Google AI Studio offers a free tier for development and prototyping:
- Gemini 2.5 Flash: 15 requests/minute, 1,500 requests/day
- Gemini 2.5 Pro: 5 requests/minute, 25 requests/day
For early-stage projects and MVPs, Gemini can be effectively free until you hit production traffic levels.
Gemini 2.5 Flash vs GPT-4o mini vs Claude Haiku 4.5
| Monthly input tokens | Gemini 2.5 Flash | Gemini 2.5 Flash-Lite | GPT-4o mini | Claude Haiku 4.5 |
|---|---|---|---|---|
| 10M tokens | $3 | $1 | $1.50 | $10 |
| 100M tokens | $30 | $10 | $15 | $100 |
| 1B tokens | $300 | $100 | $150 | $1,000 |
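The comparison table reduces to simple arithmetic. A sketch assuming the table counts input tokens only, at the input rates quoted on this page ($0.30, $0.10, $0.15, and $1.00 per 1M tokens respectively):

```python
# Sketch: reproduce the monthly-volume comparison table above.
# Assumption: the table counts input tokens only, at these $-per-1M-input rates.
INPUT_RATE_PER_M = {
    "Gemini 2.5 Flash": 0.30,
    "Gemini 2.5 Flash-Lite": 0.10,
    "GPT-4o mini": 0.15,
    "Claude Haiku 4.5": 1.00,
}

def monthly_cost(model: str, tokens_per_month: int) -> float:
    """Estimated monthly USD spend for a given input-token volume."""
    return INPUT_RATE_PER_M[model] * tokens_per_month / 1_000_000

for volume in (10_000_000, 100_000_000, 1_000_000_000):
    row = ", ".join(f"{m}: ${monthly_cost(m, volume):,.0f}" for m in INPUT_RATE_PER_M)
    print(f"{volume:>13,} tokens -> {row}")
```

Real workloads also pay for output tokens, which are priced higher across all four providers, so treat these figures as a floor rather than a full bill.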
When to Choose Gemini API
- Long document processing — 1M context window without extra cost
- Lowest-cost production option — Gemini 2.5 Flash-Lite is among the cheapest stable APIs
- Multimodal applications — Native image, video, and audio support in all 2.5 models
- Google Cloud ecosystem — Native Vertex AI, BigQuery, and Google Workspace integration
- Free-tier prototyping — Most generous free tier of the major providers
How to Access Gemini API
Obtain an API key via Google AI Studio (aistudio.google.com) for free-tier usage. For production with SLAs, deploy through Google Cloud Vertex AI. No approval process required for AI Studio API keys.
Frequently Asked Questions
Is Gemini 2.0 Flash still safe to start new projects with?
No. Gemini 2.0 Flash is deprecated, with a shutdown date of 2026-06-01. Start new projects on Gemini 2.5 Flash or Gemini 2.5 Flash-Lite instead: 2.5 Flash-Lite matches 2.0 Flash's $0.10/$0.40 pricing, 2.5 Flash adds stronger reasoning capability at a modest premium, and both are actively maintained.
Which Gemini model is cheapest for production?
Gemini 2.5 Flash-Lite at $0.10/M input is the cheapest stable production option. For tasks needing more reasoning capability, Gemini 2.5 Flash at $0.30/M is still highly competitive.
Should I use Gemini 3.1 Pro Preview in production?
Not yet. Preview models can change or be removed without notice. Use it for evaluation and early testing, but keep production workloads on stable Gemini 2.5 models until Gemini 3.1 reaches general availability.
When does Gemini beat OpenAI or Claude on cost?
For high-volume simple tasks, Gemini 2.5 Flash-Lite ($0.10/M) undercuts GPT-4o mini ($0.15/M) and is dramatically cheaper than Claude Haiku 4.5 ($1.00/M). For document-heavy tasks where 1M context matters, Gemini avoids chunking overhead that adds latency and cost in RAG setups.
How does Gemini's long context change total system cost?
If your application needs to process documents longer than 128K tokens, GPT-4o requires chunking + retrieval (RAG), adding engineering complexity, embedding costs, and latency. Gemini's 1M context lets you skip that layer entirely, which can reduce total system cost even if per-token pricing is slightly higher.
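As a rough feasibility check of that point, here is a sketch testing whether a hypothetical 500K-token document fits in a single request for each model, using the context windows and input rates quoted on this page (GPT-4o's per-token price is not quoted here, so only its fit is checked):

```python
# Sketch: can a 500K-token document fit in one request, and what does it cost?
# The 500K document size is a hypothetical example, not a benchmark.
DOC_TOKENS = 500_000

MODELS = {
    # name: (context window in tokens, $ per 1M input tokens; None = not quoted here)
    "Gemini 2.5 Flash": (1_000_000, 0.30),
    "GPT-4o": (128_000, None),
    "Claude Haiku 4.5": (200_000, 1.00),
}

for name, (context, rate) in MODELS.items():
    if DOC_TOKENS > context:
        print(f"{name}: does not fit ({DOC_TOKENS:,} > {context:,}) -> needs chunking/RAG")
    else:
        print(f"{name}: fits in one request, input cost ${DOC_TOKENS * rate / 1_000_000:.2f}")
```

Under these numbers only Gemini handles the document in one request (about $0.15 of input on 2.5 Flash); the other two would need the chunk-and-retrieve layer described above, whose embedding and pipeline costs sit outside per-token pricing.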
Calculate Your Gemini API Costs
Compare Gemini 2.5 vs OpenAI vs Claude for your specific usage volume.