
Gemini API Pricing 2026:
Gemini 2.5 Flash, Flash-Lite, Pro & 3.1 Preview

Current Google Gemini API pricing for production models: Gemini 2.5 Flash, 2.5 Flash-Lite, and 2.5 Pro. Includes Gemini 3.1 Pro Preview details and deprecation notes for Gemini 2.0 Flash. Last verified: 2026-04-01.

Updated April 2026
Gemini API Cost at a Glance

  • Gemini 2.5 Flash-Lite input: $0.10 / 1M tokens
  • Gemini 2.5 Flash input: $0.30 / 1M tokens
  • Context window: 1M tokens
  • Free tier: available

Google's Gemini pricing landscape has shifted significantly in 2026. For new production work, the recommended models are Gemini 2.5 Flash, Gemini 2.5 Flash-Lite, and Gemini 2.5 Pro. Gemini 3.1 Pro Preview is also available for early access, clearly labeled as a preview model.

Production Models — Gemini 2.5 Family

| Model | Input / 1M tokens | Output / 1M tokens | Context | Notes |
|---|---|---|---|---|
| Gemini 2.5 Flash | $0.30 | $2.50 | 1M tokens | High-volume reasoning-capable default |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | 1M tokens | Cheapest stable production option |
| Gemini 2.5 Pro | $1.25 / $2.50* | $10.00 / $15.00* | 1M tokens | Strongest 2.5-series model; *tiered above 200K prompt tokens |
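As a sanity check, the list prices above translate into monthly spend like this (a rough sketch; the prices are snapshots from the table and may change):

```python
# Rough monthly cost estimate for the Gemini 2.5 production models,
# using the per-million-token list prices from the table above.
PRICES_PER_M = {
    # model: (input USD / 1M tokens, output USD / 1M tokens)
    "gemini-2.5-flash":      (0.30, 2.50),
    "gemini-2.5-flash-lite": (0.10, 0.40),
    "gemini-2.5-pro":        (1.25, 10.00),  # base tier (prompts ≤ 200K tokens)
}

def monthly_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated monthly USD cost for a given token volume."""
    in_price, out_price = PRICES_PER_M[model]
    return (input_tokens / 1_000_000) * in_price + (output_tokens / 1_000_000) * out_price

# Example: 50M input + 10M output tokens per month on 2.5 Flash
print(round(monthly_cost("gemini-2.5-flash", 50_000_000, 10_000_000), 2))  # 40.0
```

Note that output tokens dominate the bill for generation-heavy workloads: on 2.5 Flash they cost more than 8× as much as input tokens.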

Preview Models — Gemini 3.1

Preview model notice: Gemini 3.1 Pro Preview is available for early testing but is not a stable production release. Pricing and availability may change. Do not build critical production dependencies on preview models.
| Model | Input / 1M (≤200K ctx) | Input / 1M (>200K ctx) | Output / 1M (≤200K ctx) | Output / 1M (>200K ctx) |
|---|---|---|---|---|
| Gemini 3.1 Pro Preview | $2.00 | $4.00 | $12.00 | $18.00 |
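Tiered pricing can be modeled per request. The sketch below assumes the higher tier applies to the entire request once the prompt exceeds 200K tokens, which matches how Google has described tiering for Gemini 2.5 Pro; verify this against the current pricing page before relying on it:

```python
# Per-request cost under the tiered pricing shown above.
# Assumption: a prompt over 200K tokens puts the whole request in the
# higher tier (mirrors the documented 2.5 Pro tiering behavior).
TIERS = {
    # model: (input ≤200K, input >200K, output ≤200K, output >200K), USD / 1M
    "gemini-3.1-pro-preview": (2.00, 4.00, 12.00, 18.00),
    "gemini-2.5-pro":         (1.25, 2.50, 10.00, 15.00),
}

def request_cost(model: str, prompt_tokens: int, output_tokens: int) -> float:
    lo_in, hi_in, lo_out, hi_out = TIERS[model]
    over = prompt_tokens > 200_000
    in_rate = hi_in if over else lo_in
    out_rate = hi_out if over else lo_out
    return prompt_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A 300K-token prompt with 5K output tokens on the preview model:
print(round(request_cost("gemini-3.1-pro-preview", 300_000, 5_000), 2))  # 1.29
```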

Legacy / Deprecated — Gemini 2.0 Flash

⚠ Gemini 2.0 Flash is deprecated. Google has announced shutdown on 2026-06-01. If you have existing integrations built on Gemini 2.0 Flash, migrate to Gemini 2.5 Flash or Gemini 2.5 Flash-Lite before that date. Do not start new projects on Gemini 2.0 Flash.
| Model | Input / 1M | Output / 1M | Status |
|---|---|---|---|
| Gemini 2.0 Flash | $0.10 | $0.40 | Deprecated; shutdown 2026-06-01 |
| Gemini 2.0 Flash-Lite | $0.075 | $0.30 | Deprecated; migrate to 2.5 Flash-Lite |

Gemini's Key Advantage: 1 Million Token Context

Every Gemini 2.5 model includes a 1 million token context window, roughly 8× the 128K window of GPT-4o and 5× the 200K window of Claude Haiku 4.5. This makes Gemini the natural choice for:

  • Full codebase analysis without chunking
  • Processing entire legal or financial documents in one request
  • Long-form video or audio transcription analysis
  • Large dataset inspection without a separate retrieval layer
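A quick way to decide whether a document needs chunking at all is to estimate its token count against the 1M window. The heuristic of ~4 characters per token is an assumption for English prose; the API's token-counting endpoint gives exact figures:

```python
# Quick check whether a document fits Gemini's 1M-token context window
# without chunking. Uses a rough ~4 characters/token heuristic (an
# assumption for English text; use the API's count_tokens for exact counts).
CONTEXT_WINDOW = 1_000_000
CHARS_PER_TOKEN = 4  # rough English-text average

def fits_in_context(text: str, reserved_output_tokens: int = 8_192) -> bool:
    est_tokens = len(text) / CHARS_PER_TOKEN
    return est_tokens + reserved_output_tokens <= CONTEXT_WINDOW

# A ~2 MB text file (~500K estimated tokens) fits comfortably:
print(fits_in_context("x" * 2_000_000))  # True
```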

Gemini Free Tier

Google AI Studio offers a free tier for development and prototyping:

  • Gemini 2.5 Flash: 15 requests/minute, 1,500 requests/day
  • Gemini 2.5 Pro: 5 requests/minute, 25 requests/day

For early-stage projects and MVPs, Gemini can be effectively free until you hit production traffic levels.
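Before committing a prototype to the free tier, it is worth checking projected traffic against the quotas quoted above (which Google can change at any time):

```python
# Back-of-envelope check that a prototype's traffic fits the free-tier
# quotas listed above for Google AI Studio (requests/minute, requests/day).
FREE_TIER = {
    "gemini-2.5-flash": {"rpm": 15, "rpd": 1_500},
    "gemini-2.5-pro":   {"rpm": 5,  "rpd": 25},
}

def fits_free_tier(model: str, peak_rpm: int, requests_per_day: int) -> bool:
    quota = FREE_TIER[model]
    return peak_rpm <= quota["rpm"] and requests_per_day <= quota["rpd"]

print(fits_free_tier("gemini-2.5-flash", 10, 1_200))  # True
print(fits_free_tier("gemini-2.5-pro", 10, 1_200))    # False: exceeds both quotas
```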

Gemini 2.5 Flash vs GPT-4o mini vs Claude Haiku 4.5

| Monthly input tokens | Gemini 2.5 Flash | Gemini 2.5 Flash-Lite | GPT-4o mini | Claude Haiku 4.5 |
|---|---|---|---|---|
| 10M | $3 | $1 | $1.50 | $10 |
| 100M | $30 | $10 | $15 | $100 |
| 1B | $300 | $100 | $150 | $1,000 |

Costs above reflect input tokens only; output pricing differs by model.
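The comparison figures follow directly from each model's input price, so you can reproduce the table (or plug in your own volume) with a few lines. Prices are the snapshots used in this article:

```python
# Reproduce the comparison table above from per-million input-token prices.
# Input tokens only; output pricing differs by model.
INPUT_PRICE_PER_M = {
    "gemini-2.5-flash":      0.30,
    "gemini-2.5-flash-lite": 0.10,
    "gpt-4o-mini":           0.15,
    "claude-haiku-4.5":      1.00,
}

def input_cost(model: str, tokens: int) -> float:
    return tokens / 1_000_000 * INPUT_PRICE_PER_M[model]

for volume in (10_000_000, 100_000_000, 1_000_000_000):
    row = {m: round(input_cost(m, volume), 2) for m in INPUT_PRICE_PER_M}
    print(f"{volume:>13,} tokens: {row}")
```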

When to Choose Gemini API

  • Long document processing — 1M context window without extra cost
  • Lowest-cost production option — Gemini 2.5 Flash-Lite is among the cheapest stable APIs
  • Multimodal applications — Native image, video, and audio support in all 2.5 models
  • Google Cloud ecosystem — Native Vertex AI, BigQuery, and Google Workspace integration
  • Free-tier prototyping — Most generous free tier of the major providers

How to Access Gemini API

Obtain an API key via Google AI Studio (aistudio.google.com) for free-tier usage. For production with SLAs, deploy through Google Cloud Vertex AI. No approval process required for AI Studio API keys.
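A minimal REST call looks like the sketch below. The endpoint and payload shape follow the public generateContent REST API; the key is a placeholder, and the actual network call is left commented out, so verify the details against Google's current docs before wiring this into anything:

```python
# Minimal sketch of a Gemini REST request (not sent here; you need a real
# API key from Google AI Studio). Endpoint/payload per the public
# generateContent REST API; verify against current documentation.
import json
import urllib.request

API_KEY = "YOUR_AI_STUDIO_KEY"  # placeholder
MODEL = "gemini-2.5-flash"
URL = f"https://generativelanguage.googleapis.com/v1beta/models/{MODEL}:generateContent"

payload = {"contents": [{"parts": [{"text": "Summarize this document."}]}]}
req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json", "x-goog-api-key": API_KEY},
)
# urllib.request.urlopen(req)  # uncomment with a real key to send the request
print(req.full_url)
```

The official `google-genai` SDK wraps this same endpoint; the raw request is shown only to make the wire format concrete.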

Frequently Asked Questions

Is Gemini 2.0 Flash still safe to start new projects with?

No. Gemini 2.0 Flash is deprecated with a shutdown date of 2026-06-01. Start new projects on Gemini 2.5 Flash or Gemini 2.5 Flash-Lite instead — both offer comparable or better pricing and are actively maintained.

Which Gemini model is cheapest for production?

Gemini 2.5 Flash-Lite at $0.10/M input is the cheapest stable production option. For tasks needing more reasoning capability, Gemini 2.5 Flash at $0.30/M is still highly competitive.

Should I use Gemini 3.1 Pro Preview in production?

Not yet. Preview models can change or be removed without notice. Use it for evaluation and early testing, but keep production workloads on stable Gemini 2.5 models until Gemini 3.1 reaches general availability.

When does Gemini beat OpenAI or Claude on cost?

For high-volume simple tasks, Gemini 2.5 Flash-Lite ($0.10/M) undercuts GPT-4o mini ($0.15/M) and is dramatically cheaper than Claude Haiku 4.5 ($1.00/M). For document-heavy tasks where 1M context matters, Gemini avoids chunking overhead that adds latency and cost in RAG setups.

How does Gemini's long context change total system cost?

If your application needs to process documents longer than 128K tokens, GPT-4o requires chunking + retrieval (RAG), adding engineering complexity, embedding costs, and latency. Gemini's 1M context lets you skip that layer entirely, which can reduce total system cost even if per-token pricing is slightly higher.
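The tradeoff can be put in rough numbers. Every figure below is an assumption chosen for the arithmetic (the embedding price and retrieval size in particular are illustrative, not measured):

```python
# Illustrative per-document cost tradeoff: one long-context Gemini call
# vs a chunked RAG pipeline on a 128K-window model. All numbers are
# assumptions for the sake of the arithmetic, not benchmarks.
DOC_TOKENS = 500_000          # a document too large for a 128K window

# Long-context path: send the whole document to Gemini 2.5 Flash once.
gemini_cost = DOC_TOKENS / 1e6 * 0.30          # input only, $0.30 / 1M

# RAG path (assumed): embed the document once, then retrieve ~8K tokens
# per query into GPT-4o mini at $0.15 / 1M input.
EMBED_PRICE = 0.02                              # assumed USD / 1M tokens
embedding_cost = DOC_TOKENS / 1e6 * EMBED_PRICE
retrieval_cost = 8_000 / 1e6 * 0.15

print(f"long-context call: ${gemini_cost:.3f}")
print(f"RAG: ${embedding_cost:.3f} one-time + ${retrieval_cost:.4f} per query")
```

The token math alone can favor RAG at high query volume; the article's point is that the token bill excludes the retrieval infrastructure, embedding pipeline, and added latency, which is where the long-context path saves total system cost.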

Calculate Your Gemini API Costs

Compare Gemini 2.5 vs OpenAI vs Claude for your specific usage volume.
