
How to Calculate AI API Costs Before You Launch (2026 Guide)

Step-by-step framework for estimating AI API costs before building. Covers token math, usage assumptions, provider comparison, and the hidden cost factors that blow budgets. Last verified: 2026-04-01.

The Formula
Monthly API Cost =
(Monthly Requests × Avg Input Tokens × Input Price / 1,000,000)
+ (Monthly Requests × Avg Output Tokens × Output Price / 1,000,000)
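As a sanity check, the formula translates directly into a few lines of Python. This is a sketch; prices are dollars per 1M tokens:

```python
def monthly_api_cost(requests, avg_in_tokens, avg_out_tokens,
                     in_price_per_m, out_price_per_m):
    """Estimated monthly spend in dollars, given per-1M-token prices."""
    input_cost = requests * avg_in_tokens * in_price_per_m / 1_000_000
    output_cost = requests * avg_out_tokens * out_price_per_m / 1_000_000
    return input_cost + output_cost

# 250k calls at 800 input / 200 output tokens, Gemini 2.5 Flash-Lite pricing
print(f"${monthly_api_cost(250_000, 800, 200, 0.10, 0.40):.2f}")  # $40.00
```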

Step 1 — Understand Token Pricing

AI APIs charge separately for input tokens (your prompt, system context, conversation history) and output tokens (the model's response). Both are priced per 1 million tokens.

| Model | Input / 1M | Output / 1M | Note |
| --- | --- | --- | --- |
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | Cheapest stable production model |
| Mistral Small 3.2 | $0.10 | $0.30 | Tied cheapest; open weights |
| GPT-5.4 nano | $0.20 | $1.25 | Cheapest OpenAI model |
| Gemini 2.5 Flash | $0.30 | $2.50 | Reasoning-capable budget-to-mid tier |
| Claude Haiku 4.5 | $1.00 | $5.00 | Anthropic budget tier |
| GPT-5.4 | $2.50 | $15.00 | OpenAI premium |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Best coding/agentic |

Step 2 — Measure or Estimate Your Token Usage

How to count tokens

As a rough rule of thumb: 1 token ≈ 4 characters ≈ 0.75 words in English. A 500-word message is approximately 650 tokens.

  • Short chatbot turn (50-word response): ~65 tokens output
  • Medium analysis response (400 words): ~530 tokens output
  • Long document summary (1,500 words): ~2,000 tokens output
  • System prompt (200 words): ~265 tokens input per call
  • 1-page PDF (~500 words): ~650 tokens input
  • 10-page report (~5,000 words): ~6,500 tokens input
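If you just need a ballpark before wiring up a real tokenizer, the rule of thumb above can be scripted. This estimator averages the characters/4 and words/0.75 heuristics; it is an approximation, not a tokenizer:

```python
def estimate_tokens(text: str) -> int:
    """Ballpark token count: average of the chars/4 and words/0.75 heuristics."""
    by_chars = len(text) / 4
    by_words = len(text.split()) / 0.75
    return round((by_chars + by_words) / 2)

print(estimate_tokens("hello world test message"))  # 6
```

For production estimates, measure with the provider's actual tokenizer instead (see the FAQ below).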

Input vs output ratio matters

Most applications have significantly more input than output. Common ratios:

| App Type | Typical Input | Typical Output | In:Out Ratio |
| --- | --- | --- | --- |
| Simple chatbot | 300–500 tokens/turn | 100–300 tokens/turn | 2:1 to 4:1 |
| RAG / document Q&A | 2,000–10,000 tokens | 200–500 tokens | 5:1 to 20:1 |
| Code generation | 500–2,000 tokens | 500–3,000 tokens | 1:1 to 1:3 |
| Summarization | 2,000–20,000 tokens | 200–500 tokens | 10:1 to 40:1 |
| AI agent (research) | 5,000–50,000 tokens | 1,000–5,000 tokens | 5:1 to 20:1 |

Step 3 — Run the Formula

Example: Customer support chatbot

  • 50,000 conversations/month
  • Average 5 turns per conversation
  • Input per turn: 800 tokens (system prompt 300 + history + user message)
  • Output per turn: 200 tokens
  • Total calls: 50,000 × 5 = 250,000
  • Total input: 250,000 × 800 = 200M tokens
  • Total output: 250,000 × 200 = 50M tokens
| Model | Input Cost (200M) | Output Cost (50M) | Total/Month |
| --- | --- | --- | --- |
| Gemini 2.5 Flash-Lite | $20 | $20 | $40 |
| Mistral Small 3.2 | $20 | $15 | $35 |
| GPT-5.4 nano | $40 | $62.50 | $102.50 |
| Claude Haiku 4.5 | $200 | $250 | $450 |
| Claude Sonnet 4.6 | $600 | $750 | $1,350 |
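The comparison follows mechanically from the formula. This sketch reproduces the numbers using the Step 1 prices:

```python
PRICES = {  # (input $/1M, output $/1M) from the Step 1 table
    "Gemini 2.5 Flash-Lite": (0.10, 0.40),
    "Mistral Small 3.2":     (0.10, 0.30),
    "GPT-5.4 nano":          (0.20, 1.25),
    "Claude Haiku 4.5":      (1.00, 5.00),
    "Claude Sonnet 4.6":     (3.00, 15.00),
}

calls = 50_000 * 5          # conversations x turns per conversation
in_tokens = calls * 800     # 200M input tokens
out_tokens = calls * 200    # 50M output tokens

for model, (pi, po) in PRICES.items():
    total = (in_tokens * pi + out_tokens * po) / 1_000_000
    print(f"{model}: ${total:,.2f}/month")
```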

Step 4 — Factor In Cost Reducers

Prompt caching (Anthropic)

If your system prompt is large and reused across many calls, Anthropic's prompt caching can save 80–90% on the cached portion. Cache read costs: Haiku $0.10/M, Sonnet $0.30/M, Opus $0.50/M.

Example: 300-token system prompt × 250,000 calls = 75M tokens. At standard Haiku input ($1.00/M) = $75. With caching at $0.10/M = $7.50. Saving: $67.50/month.
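The caching arithmetic generalizes to any prompt size and call volume. This helper assumes the cached portion is billed at the cache-read rate on every call and ignores the one-time cache-write cost:

```python
def caching_savings(prompt_tokens, calls, std_price, cache_read_price):
    """Monthly savings when the system prompt is read from cache on every call
    (ignores the one-time cache-write cost)."""
    cached_millions = prompt_tokens * calls / 1_000_000
    return cached_millions * (std_price - cache_read_price)

# 300-token system prompt, 250k calls, Haiku 4.5: $1.00/M standard vs $0.10/M cache read
print(caching_savings(300, 250_000, 1.00, 0.10))  # 67.5
```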

Batch API (50% off)

OpenAI and Anthropic both offer ~50% discounts for async batch processing. If your workload is not latency-sensitive, batch it:

  • GPT-5.4 nano batch: $0.10/M input (vs $0.20/M standard)
  • Claude Haiku 4.5 batch: $0.50/M input (vs $1.00/M standard)
  • Claude Sonnet 4.6 batch: $1.50/M input (vs $3.00/M standard)
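Applied to the Step 3 chatbot example, the batch discount halves the Haiku bill (a sketch assuming the ~50% discount applies to both input and output):

```python
# Step 3 chatbot on Claude Haiku 4.5: standard vs. Batch API pricing
in_m, out_m = 200, 50                  # millions of tokens from Step 3
standard = in_m * 1.00 + out_m * 5.00  # $1.00/M in, $5.00/M out
batch = in_m * 0.50 + out_m * 2.50     # ~50% off both directions
print(standard, batch)  # 450.0 225.0
```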

Model routing

Route simple requests to cheap models and complex requests to premium ones. Example: Use GPT-5.4 nano ($0.20/M) for classification → escalate to Claude Sonnet ($3.00/M) only for complex cases. If 80% of calls are simple, average cost drops by 70%+.
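The routing math is a weighted average. This sketch assumes the 80/20 simple-to-complex split from the example and compares against sending everything to the premium model:

```python
def blended_price(simple_share, cheap_price, premium_price):
    """Average per-1M-token price under two-tier model routing."""
    return simple_share * cheap_price + (1 - simple_share) * premium_price

blended = blended_price(0.80, 0.20, 3.00)
print(round(blended, 2))                    # 0.76 per 1M input tokens
print(f"{1 - blended / 3.00:.0%} cheaper")  # vs. sending everything to Sonnet
```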

Step 5 — Identify Hidden Cost Factors

  • Context accumulation: Multi-turn conversations grow input tokens with every turn. A 20-turn conversation costs far more than 20× a single turn, because every call resends the full history, so total input grows roughly quadratically with turn count.
  • System prompt size: A 500-token system prompt on 1M monthly calls costs $500/month on Sonnet ($3/M) just for the system prompt alone — before any user content.
  • Retry logic: Failed requests that are retried double or triple the token cost for those requests. Cap retries and use exponential backoff.
  • Output token sprawl: Without max_tokens set, models can generate verbose responses. A 500-token output vs 200-token output is 2.5× the output cost. Always set explicit limits.
  • Agentic loops: AI agents calling tools in loops can consume 10–100× more tokens than equivalent single-turn requests. Budget separately for agent tasks.
  • Long-context pricing tiers: Gemini 2.5 Pro has a price increase above 200k tokens. GPT-5.4 has a threshold around 270k tokens. Verify your p95 context size before committing.
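To see how fast history compounds, here is a toy model of context accumulation. The 300-token system prompt and 250-token-per-turn growth are illustrative assumptions, not measured values:

```python
def total_input_tokens(turns, system=300, growth=250):
    """Total input tokens across a conversation where every call resends
    the full history; each turn adds ~250 tokens (user message + reply)."""
    return sum(system + (t + 1) * growth for t in range(turns))

one = total_input_tokens(1)      # 550
twenty = total_input_tokens(20)  # 58,500 -- roughly 106x one turn, not 20x
print(twenty / one)
```

Summarizing or truncating old turns is the standard defense against this growth.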

Step 6 — Build Your Monthly Budget Estimate

Budget worksheet:
Monthly requests: _______ (Q)
Avg input tokens/request: _______ (I)
Avg output tokens/request: _______ (O)
Input price ($/1M): _______ (Pi)
Output price ($/1M): _______ (Po)
Monthly cost = Q × I × Pi / 1,000,000 + Q × O × Po / 1,000,000
Add 30–50% buffer for growth, retries, and context drift
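The worksheet maps to a one-line function; the buffer defaults to 40%, the middle of the suggested range:

```python
def monthly_budget(q, i, o, pi, po, buffer=0.40):
    """Worksheet as code: base cost plus a growth/retry/context-drift buffer."""
    base = (q * i * pi + q * o * po) / 1_000_000
    return base * (1 + buffer)

# Step 3 chatbot on Claude Haiku 4.5: $450 base, $630 with the 40% buffer
print(f"${monthly_budget(250_000, 800, 200, 1.00, 5.00):.2f}")  # $630.00
```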

Common Mistakes in AI API Cost Estimation

  1. Forgetting system prompt tokens: Every API call includes the system prompt. A 1,000-token system prompt × 1M calls = 1B extra input tokens
  2. Assuming a 1:1 input-to-output token ratio: Most apps are heavily input-weighted; output-heavy apps like code generation are the exception
  3. Not accounting for conversation history: RAG and chatbot contexts grow with each turn
  4. Using list price for batch-eligible workloads: Always check if your use case qualifies for Batch API
  5. Not testing with real prompts: Synthetic benchmarks don't reflect actual token counts from real user inputs
  6. Underestimating agent costs: Agent tasks with tool calls consume 10–100× more tokens than simple completions

Frequently Asked Questions

How do I count tokens before I build?

Use OpenAI's tiktoken library (Python) or Anthropic's token counting endpoint to count tokens in your prompt templates, then add your typical user input length based on the use case.

How much should I budget for an MVP?

Most AI MVPs with <10,000 users/month cost $10–$200/month in API fees depending on model and usage patterns. Start with a budget-tier model (Gemini 2.5 Flash-Lite or Mistral Small 3.2) and upgrade based on quality requirements after launch.

How do I reduce costs after I've launched?

The most impactful levers: (1) implement prompt caching if on Anthropic, (2) use Batch API for non-latency-sensitive tasks, (3) implement model routing to send simple queries to cheaper models, (4) set explicit max_tokens limits, (5) compress conversation history by summarizing old turns.

What's a good cost target per user per month?

For SaaS AI products, $1–5/user/month in API costs is typical for light users. Heavy AI-native products may run $5–$20/user/month. Target 15–30% of your plan price as API cost to maintain healthy gross margins.
