What Is the Batch API?
50% Off AI Inference for Async Workloads Explained
The Batch API is a pricing tier offered by Anthropic and OpenAI that cuts AI inference costs by 50% for asynchronous, non-real-time jobs. Jobs complete within 24 hours. This guide explains how it works, when to use it, and how to implement it. Last verified: 2026-04-01.
How the Batch API Works
Instead of sending API requests one at a time and waiting for each response (real-time inference), the Batch API lets you:
- Submit a JSONL file containing many requests in a single job
- Receive a job ID immediately
- Poll for completion (or use a webhook) — jobs complete within 24 hours
- Download results as a JSONL file when complete
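As a concrete sketch of the first step, here is how a batch input file might be assembled. The line schema (`custom_id`, `method`, `url`, `body`) follows OpenAI's published batch format; the model string is taken from the pricing table below and should be verified against current model IDs:

```python
import json

def batch_request_lines(prompts, model="gpt-5.4-mini"):
    """Build one JSON object per request, ready to be written as a JSONL batch file."""
    lines = []
    for i, prompt in enumerate(prompts):
        lines.append(json.dumps({
            "custom_id": f"req-{i}",          # lets you match results back to inputs
            "method": "POST",
            "url": "/v1/chat/completions",    # endpoint each request targets
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }))
    return lines

# Writing the file is then one line:
# open("batch_input.jsonl", "w").write("\n".join(batch_request_lines(prompts)))
```

The `custom_id` field matters: results are not guaranteed to come back in submission order, so it is the only reliable way to join outputs back to inputs.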
The provider can schedule your batch during off-peak hours when capacity would otherwise sit idle, which is how they can afford to pass the savings on to you as a 50% discount. The trade-off is latency: you don't get responses instantly.
Batch vs Real-Time Pricing: All Major Models
| Model | Standard input/1M | Batch input/1M | Standard output/1M | Batch output/1M |
|---|---|---|---|---|
| Claude Haiku 4.5 | $1.00 | $0.50 | $5.00 | $2.50 |
| Claude Sonnet 4.6 | $3.00 | $1.50 | $15.00 | $7.50 |
| Claude Opus 4.6 | $5.00 | $2.50 | $25.00 | $12.50 |
| GPT-5.4 nano | $0.20 | $0.10 | $1.25 | $0.625 |
| GPT-5.4 mini | $0.75 | $0.375 | $4.50 | $2.25 |
| GPT-5.4 | $2.50 | $1.25 | $15.00 | $7.50 |
Gemini 2.5 models do not currently offer a Batch API equivalent — their standard pricing is already lower than batch pricing on comparable OpenAI/Anthropic models for many use cases.
When to Use the Batch API
| Use case | Good for batch? | Why |
|---|---|---|
| Nightly document enrichment | Yes | All documents processed overnight, results ready by morning |
| Bulk content generation | Yes | Product descriptions, email sequences, article outlines — no user waiting |
| Training data labeling | Yes | Thousands of samples can be labeled in a single batch job |
| Contract/document analysis | Yes | Upload 1,000 contracts, get structured analysis by end of day |
| Embeddings at scale | Yes | Large corpus indexing can run overnight in batch |
| Real-time chatbot responses | No | Users expect sub-second responses — batch doesn't work for interactive UX |
| Live customer support | No | Tickets need resolution in minutes, not hours |
| On-demand code completion | No | Latency requirement is <200ms — incompatible with batch model |
Batch API Implementation (Anthropic)
Anthropic's Message Batches API accepts up to 10,000 requests per batch.
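A minimal end-to-end sketch using Anthropic's Python SDK is below. The model string and prompts are placeholders, and the polling is shown as a single retrieve call rather than a loop; check the current SDK docs for exact model IDs:

```python
def build_requests(prompts, model="claude-haiku-4-5"):
    """Build the request list for Anthropic's Message Batches API."""
    return [
        {
            "custom_id": f"doc-{i}",           # join key for matching results to inputs
            "params": {
                "model": model,                # placeholder — verify the current model ID
                "max_tokens": 512,
                "messages": [{"role": "user", "content": p}],
            },
        }
        for i, p in enumerate(prompts)
    ]

if __name__ == "__main__":
    # Requires `pip install anthropic` and ANTHROPIC_API_KEY in the environment.
    from anthropic import Anthropic

    client = Anthropic()
    batch = client.messages.batches.create(
        requests=build_requests(["Summarize contract A.", "Summarize contract B."])
    )
    print(batch.id, batch.processing_status)   # returns immediately with a job ID

    # Later: poll (or use a webhook), then stream results when the batch has ended.
    done = client.messages.batches.retrieve(batch.id)
    if done.processing_status == "ended":
        for entry in client.messages.batches.results(batch.id):
            print(entry.custom_id, entry.result.type)
```

Note that `results()` streams entries one at a time, so even large batches can be processed without loading the whole output file into memory.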
Stacking Batch API with Prompt Caching
Batch API and prompt caching stack multiplicatively for maximum savings:
- Standard Haiku input: $1.00/M
- Batch API only: $0.50/M (50% off)
- Prompt caching only: $0.10/M for cache reads (90% off)
- Batch + cache reads: $0.05/M — 95% off standard rate
Example: 100K document summarizations with a 2,000-token shared system prompt. With batch processing and prompt caching on the system prompt, your blended input cost can approach $0.05–$0.10/M — the cheapest possible rate for any Anthropic model.
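To check that arithmetic, here is a small helper computing the blended $/M input rate. The discounts are taken from the bullets above; the assumption that the 50% batch discount also applies to cache reads follows the "stack multiplicatively" claim:

```python
def blended_input_rate(cached_tokens, fresh_tokens, base_rate=1.00):
    """Effective $/M input tokens when batch (50% off) stacks on cache reads (90% off).

    base_rate is the standard $/M input price (Haiku 4.5 = $1.00 per the table above).
    """
    batch_discount = 0.5        # Batch API: 50% off everything
    cache_read_factor = 0.10    # cache reads bill at 10% of the standard rate
    cost = (cached_tokens * base_rate * cache_read_factor * batch_discount
            + fresh_tokens * base_rate * batch_discount)
    return cost / (cached_tokens + fresh_tokens)

# Example from above: 2,000-token cached system prompt plus ~100 fresh tokens
# per request lands the blended rate inside the quoted $0.05–$0.10/M range.
```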
Batch API Limits
- Anthropic: 10,000 requests per batch, 256MB JSONL file limit
- OpenAI: Up to 50,000 requests per batch, 200MB file limit
- Completion time: Results guaranteed within 24 hours (usually much faster)
- Rate limits: Batch jobs share the same tokens-per-minute limits as real-time traffic — large batches may take longer to clear if you're rate-limited
- Partial results: Both providers return results for completed requests even if the batch partially fails
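Given the per-batch caps above, oversized workloads need to be split across multiple jobs. A simple chunker (Anthropic's 10,000-request cap used as the default here) might look like:

```python
def chunk_requests(requests, max_per_batch=10_000):
    """Split a request list into batch-sized chunks that respect the provider's cap."""
    return [requests[i:i + max_per_batch]
            for i in range(0, len(requests), max_per_batch)]
```

A production version would also enforce the file-size limit (256MB for Anthropic, 200MB for OpenAI) by tracking serialized bytes per chunk, not just request counts.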
Calculate Batch API Savings
To estimate your savings, multiply your monthly input and output token volumes by the standard and batch rates in the table above; since batch is a flat 50% off, your savings are simply half of what the same volume costs at standard rates.
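That calculation can be sketched in a few lines. The rates are copied from the pricing table above; the dictionary keys are informal labels for this sketch, not official model IDs:

```python
# $/M token rates from the pricing table above: (input, output), standard tier.
STANDARD_RATES = {
    "claude-haiku-4.5": (1.00, 5.00),
    "claude-sonnet-4.6": (3.00, 15.00),
    "gpt-5.4-mini": (0.75, 4.50),
}

def monthly_batch_savings(model, input_mtok, output_mtok):
    """Dollars saved per month by moving this volume from standard to batch pricing.

    input_mtok / output_mtok are monthly volumes in millions of tokens.
    """
    in_rate, out_rate = STANDARD_RATES[model]
    standard_cost = input_mtok * in_rate + output_mtok * out_rate
    return standard_cost * 0.5  # batch is half price, so savings = half the standard bill
```

For example, 10M input and 2M output tokens per month on Haiku costs $20 at standard rates, so moving that workload to batch saves $10/month.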