How to Calculate AI API Costs Before You Launch (2026 Guide)
Step-by-step framework for estimating AI API costs before building. Covers token math, usage assumptions, provider comparison, and the hidden cost factors that blow budgets. Last verified: 2026-04-01.
Step 1 — Understand Token Pricing
AI APIs charge separately for input tokens (your prompt, system context, conversation history) and output tokens (the model's response). Both are priced per 1 million tokens.
| Model | Input / 1M | Output / 1M | Note |
|---|---|---|---|
| Gemini 2.5 Flash-Lite | $0.10 | $0.40 | Cheapest stable production model |
| Mistral Small 3.2 | $0.10 | $0.30 | Tied cheapest; open weights |
| GPT-5.4 nano | $0.20 | $1.25 | Cheapest OpenAI model |
| Gemini 2.5 Flash | $0.30 | $2.50 | Reasoning-capable budget-mid |
| Claude Haiku 4.5 | $1.00 | $5.00 | Anthropic budget tier |
| GPT-5.4 | $2.50 | $15.00 | OpenAI premium |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Best coding/agentic |
Step 2 — Measure or Estimate Your Token Usage
How to count tokens
As a rough rule of thumb: 1 token ≈ 4 characters ≈ 0.75 words in English. A 500-word message is approximately 650 tokens.
- Short chatbot turn (50-word response): ~65 tokens output
- Medium analysis response (400 words): ~530 tokens output
- Long document summary (1,500 words): ~2,000 tokens output
- System prompt (200 words): ~265 tokens input per call
- 1-page PDF (~500 words): ~650 tokens input
- 10-page report (~5,000 words): ~6,500 tokens input
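The rule of thumb above can be sketched as a quick estimator. This is an approximation only; use a real tokenizer (such as tiktoken) for accurate counts:

```python
def estimate_tokens(text: str) -> int:
    """Approximate token count from character length (1 token ≈ 4 chars)."""
    return max(1, round(len(text) / 4))

def estimate_tokens_from_words(word_count: int) -> int:
    """Approximate token count from word count (1 token ≈ 0.75 words)."""
    return round(word_count / 0.75)

print(estimate_tokens_from_words(500))  # 667 tokens for a 500-word message
print(estimate_tokens("x" * 400))       # 100 tokens for 400 characters
```

Real tokenizers vary by model and language, so treat these numbers as planning estimates, not billing-accurate counts.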
Input vs output ratio matters
Most applications have significantly more input than output. Common ratios:
| App Type | Typical Input | Typical Output | In:Out Ratio |
|---|---|---|---|
| Simple chatbot | 300–500 tokens/turn | 100–300 tokens/turn | 2:1 to 4:1 |
| RAG / document Q&A | 2,000–10,000 tokens | 200–500 tokens | 5:1 to 20:1 |
| Code generation | 500–2,000 tokens | 500–3,000 tokens | 1:1 to 1:3 |
| Summarization | 2,000–20,000 tokens | 200–500 tokens | 10:1 to 40:1 |
| AI agent (research) | 5,000–50,000 tokens | 1,000–5,000 tokens | 5:1 to 20:1 |
Step 3 — Run the Formula
Example: Customer support chatbot
- 50,000 conversations/month
- Average 5 turns per conversation
- Input per turn: 800 tokens (system prompt 300 + history + user message)
- Output per turn: 200 tokens
- Total calls: 50,000 × 5 = 250,000
- Total input: 250,000 × 800 = 200M tokens
- Total output: 250,000 × 200 = 50M tokens
| Model | Input Cost (200M) | Output Cost (50M) | Total/Month |
|---|---|---|---|
| Gemini 2.5 Flash-Lite | $20 | $20 | $40 |
| Mistral Small 3.2 | $20 | $15 | $35 |
| GPT-5.4 nano | $40 | $62.50 | $102.50 |
| Claude Haiku 4.5 | $200 | $250 | $450 |
| Claude Sonnet 4.6 | $600 | $750 | $1,350 |
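The formula behind the table generalizes to a small helper; the prices plugged in below are the per-1M figures from the Step 1 table:

```python
def monthly_cost(calls: int, in_tokens_per_call: int, out_tokens_per_call: int,
                 in_price_per_m: float, out_price_per_m: float) -> float:
    """Monthly API cost in dollars, given per-call token usage and $/1M prices."""
    total_in = calls * in_tokens_per_call
    total_out = calls * out_tokens_per_call
    return (total_in / 1e6) * in_price_per_m + (total_out / 1e6) * out_price_per_m

# Support chatbot example: 50,000 conversations × 5 turns, 800 in / 200 out per turn.
calls = 50_000 * 5
print(monthly_cost(calls, 800, 200, 0.10, 0.40))   # Gemini 2.5 Flash-Lite → 40.0
print(monthly_cost(calls, 800, 200, 3.00, 15.00))  # Claude Sonnet 4.6 → 1350.0
```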
Step 4 — Factor In Cost Reducers
Prompt caching (Anthropic)
If your system prompt is large and reused across many calls, Anthropic's prompt caching can save 80–90% on the cached portion. Cache read costs: Haiku $0.10/M, Sonnet $0.30/M, Opus $0.50/M.
Example: 300-token system prompt × 250,000 calls = 75M tokens. At standard Haiku input ($1.00/M) = $75. With caching at $0.10/M = $7.50. Saving: $67.50/month.
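The savings math from that example, spelled out (prices are the Haiku figures quoted above):

```python
system_prompt_tokens = 300
calls_per_month = 250_000
cached_tokens = system_prompt_tokens * calls_per_month  # 75M tokens/month

standard = (cached_tokens / 1e6) * 1.00  # Haiku standard input: $1.00/M → $75
cached = (cached_tokens / 1e6) * 0.10    # Haiku cache read: $0.10/M → $7.50
print(standard - cached)                 # 67.5 ($/month saved)
```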
Batch API (50% off)
OpenAI and Anthropic both offer ~50% discounts for async batch processing. If your workload is not latency-sensitive, batch it:
- GPT-5.4 nano batch: $0.10/M input (vs $0.20/M standard)
- Claude Haiku 4.5 batch: $0.50/M input (vs $1.00/M standard)
- Claude Sonnet 4.6 batch: $1.50/M input (vs $3.00/M standard)
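Applied to the chatbot example (200M input, 50M output) on Claude Haiku 4.5, the batch discount roughly halves the bill. Note: the list above quotes batch input prices only, so the ~50% discount on output tokens is an assumption here:

```python
input_m, output_m = 200, 50  # millions of tokens/month from the Step 3 example

standard = input_m * 1.00 + output_m * 5.00  # Haiku standard prices → $450
batch = standard * 0.5                       # ~50% batch discount (assumed on both sides)
print(standard, batch)                       # 450.0 225.0
```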
Model routing
Route simple requests to cheap models and complex requests to premium ones. Example: Use GPT-5.4 nano ($0.20/M) for classification → escalate to Claude Sonnet ($3.00/M) only for complex cases. If 80% of calls are simple, average cost drops by 70%+.
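The routing math can be checked in a few lines; the 80/20 split is the example assumption, and prices are the input figures from the Step 1 table:

```python
simple_share, simple_price = 0.80, 0.20    # GPT-5.4 nano input $/M
complex_share, complex_price = 0.20, 3.00  # Claude Sonnet 4.6 input $/M

blended = simple_share * simple_price + complex_share * complex_price
savings_vs_all_sonnet = 1 - blended / complex_price
print(f"${blended:.2f}/M blended, {savings_vs_all_sonnet:.0%} cheaper")  # $0.76/M blended, 75% cheaper
```

The real-world saving depends on how accurately your router classifies requests; misrouted complex queries cost quality, not just money.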
Step 5 — Identify Hidden Cost Factors
- Context accumulation: Multi-turn conversations grow input tokens with every turn. A 20-turn conversation costs far more than 20× a single turn, because each call resends the full history, so cumulative input grows roughly quadratically with turn count.
- System prompt size: A 500-token system prompt on 1M monthly calls costs $500/month on Sonnet ($3/M) just for the system prompt alone — before any user content.
- Retry logic: Failed requests that are retried double or triple the token cost for those requests. Cap retries and use exponential backoff.
- Output token sprawl: Without `max_tokens` set, models can generate verbose responses. A 500-token output vs a 200-token output is 2.5× the output cost. Always set explicit limits.
- Agentic loops: AI agents calling tools in loops can consume 10–100× more tokens than equivalent single-turn requests. Budget separately for agent tasks.
- Long-context pricing tiers: Gemini 2.5 Pro has a price increase above 200k tokens. GPT-5.4 has a threshold around 270k tokens. Verify your p95 context size before committing.
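The context-accumulation effect above is worth quantifying before launch. A minimal sketch, using hypothetical per-turn token counts:

```python
def total_input_tokens(turns: int, system: int = 300,
                       user: int = 100, assistant: int = 200) -> int:
    """Cumulative input tokens when the full history is resent every turn."""
    total = 0
    history = 0
    for _ in range(turns):
        total += system + history + user  # each call resends everything so far
        history += user + assistant       # history grows by this turn's exchange
    return total

print(total_input_tokens(1))   # 400
print(total_input_tokens(20))  # 65000, vs 8,000 for 20 independent single turns
```

That is an ~8× multiplier at 20 turns, which is why history compression or summarization pays off quickly.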
Step 6 — Build Your Monthly Budget Estimate
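The steps above can be combined into a minimal budget script. The workload rows and the 20% buffer below are placeholder assumptions to replace with your own numbers:

```python
PRICES = {  # $ per 1M tokens (input, output), from the Step 1 table
    "gemini-2.5-flash-lite": (0.10, 0.40),
    "claude-haiku-4.5": (1.00, 5.00),
}

workloads = [
    # (model, calls/month, input tokens/call, output tokens/call)
    ("gemini-2.5-flash-lite", 250_000, 800, 200),  # support chatbot (Step 3)
    ("claude-haiku-4.5", 10_000, 5_000, 400),      # hypothetical document Q&A
]

total = 0.0
for model, calls, tin, tout in workloads:
    pin, pout = PRICES[model]
    total += calls * (tin * pin + tout * pout) / 1e6

budget = total * 1.2  # 20% buffer for retries, growth, and prompt drift
print(f"Estimated: ${total:,.2f}/month, budgeted: ${budget:,.2f}/month")
```

Re-run the script with measured token counts from real prompts once you have them; synthetic estimates tend to undercount input tokens.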
Common Mistakes in AI API Cost Estimation
- Forgetting system prompt tokens: Every API call includes the system prompt. A 1,000-token system prompt × 1M calls = 1B extra input tokens
- Assuming input = output token ratio is 1:1: Most apps are heavily input-weighted; output-heavy apps like code generation can be output-weighted
- Not accounting for conversation history: RAG and chatbot contexts grow with each turn
- Using list price for batch-eligible workloads: Always check if your use case qualifies for Batch API
- Not testing with real prompts: Synthetic benchmarks don't reflect actual token counts from real user inputs
- Underestimating agent costs: Agent tasks with tool calls consume 10–100× more tokens than simple completions
Frequently Asked Questions
How do I count tokens before I build?
Use OpenAI's tiktoken library (Python) or Anthropic's token counting endpoint to count tokens in your prompt templates. Add your typical user input length based on the use case. Use the AI Token Counter tool above to estimate token counts from sample text.
How much should I budget for an MVP?
Most AI MVPs with <10,000 users/month cost $10–$200/month in API fees depending on model and usage patterns. Start with a budget-tier model (Gemini 2.5 Flash-Lite or Mistral Small 3.2) and upgrade based on quality requirements after launch.
How do I reduce costs after I've launched?
The most impactful levers: (1) implement prompt caching if on Anthropic, (2) use Batch API for non-latency-sensitive tasks, (3) implement model routing to send simple queries to cheaper models, (4) set explicit max_tokens limits, (5) compress conversation history by summarizing old turns.
What's a good cost target per user per month?
For SaaS AI products, $1–5/user/month in API costs is typical for light users. Heavy AI-native products may run $5–$20/user/month. Target 15–30% of your plan price as API cost to maintain healthy gross margins.
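A quick sanity check against that margin guideline; both inputs below are hypothetical:

```python
plan_price = 20.0        # hypothetical plan price, $/user/month
api_cost_per_user = 4.0  # hypothetical measured API cost, $/user/month

share = api_cost_per_user / plan_price
print(f"API cost is {share:.0%} of plan price")  # 20%, inside the 15–30% target band
```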
Calculate Your AI API Budget
Enter your estimated monthly requests and token counts to get exact cost projections across all providers.
Open AI API Cost Calculator