Skip to content
Cost Optimization

OpenAI Batch API:
50% Cost Savings — How It Works in 2026

OpenAI's Batch API cuts costs by 50% across GPT-4o, GPT-4o mini, and o-series models. Here's exactly how it works, when to use it, and how much you can save.

7 min read·Updated April 2026
Batch API vs Standard API — Pricing
50%
discount vs standard
24h
max turnaround time
$1.25
GPT-4o input (vs $2.50)
$5.00
GPT-4o output (vs $10.00)

Batch API Pricing vs Standard API

ModelStandard InputBatch InputStandard OutputBatch Output
GPT-4o$2.50/M$1.25/M$10.00/M$5.00/M
GPT-4o mini$0.15/M$0.075/M$0.60/M$0.30/M
o3-mini$1.10/M$0.55/M$4.40/M$2.20/M
o3$10.00/M$5.00/M$40.00/M$20.00/M
text-embedding-3-large$0.13/M$0.065/M

How the Batch API Works

The Batch API processes requests asynchronously — you submit a batch of requests, OpenAI processes them within 24 hours, and you retrieve results when ready:

  1. Create a batch file: JSONL file with up to 50,000 requests
  2. Upload the file: via Files API
  3. Submit the batch: POST to /v1/batches
  4. Poll for completion: GET /v1/batches/{batch_id}
  5. Retrieve results: Download output JSONL file

Key constraints:

  • Maximum 24-hour turnaround (not guaranteed latency)
  • Up to 50,000 requests per batch
  • Enqueued token limits: 90,000 tokens/minute per model by default
  • No streaming — results available only after batch completes

Real-World Batch API Savings

Example: Content Moderation at Scale

Processing 100,000 user reviews per day for sentiment and safety:

  • Average: 200 tokens input + 50 tokens output per review
  • Daily tokens: 20M input + 5M output
  • Standard GPT-4o mini: 20M × $0.15 + 5M × $0.60 = $3.00 + $3.00 = $6.00/day
  • Batch GPT-4o mini: 20M × $0.075 + 5M × $0.30 = $1.50 + $1.50 = $3.00/day
  • Annual savings: $1,095

Example: Research Paper Analysis Pipeline

Processing 10,000 academic papers monthly with GPT-4o:

  • Average: 3,000 tokens input + 800 tokens output
  • Monthly tokens: 30M input + 8M output
  • Standard: 30M × $2.50 + 8M × $10.00 = $75 + $80 = $155/month
  • Batch: 30M × $1.25 + 8M × $5.00 = $37.50 + $40 = $77.50/month
  • Annual savings: $930

When to Use Batch vs Real-Time API

Use Batch APIUse Standard API
✅ Nightly data processing jobs⚡ User-facing chatbots (response in <5s)
✅ Document classification pipelines⚡ Real-time content generation
✅ Embedding generation for vector DBs⚡ Interactive code completion
✅ Research analysis (offline)⚡ Real-time translation
✅ Content moderation (async)⚡ Customer service with SLA
✅ SEO metadata generation⚡ Safety-critical real-time decisions

Combining Batch API with Prompt Caching

For maximum savings, combine Batch API (50% off) with prompt caching (up to 90% off on cached tokens):

  • Long system prompt (10K tokens) shared across all batch requests
  • After first request, system prompt cached at 50% discount
  • Effective cost on cached prompt tokens in batch: $0.0625/M (GPT-4o mini)
  • That's 96% cheaper than standard GPT-4o mini uncached input

Calculate Your Batch API Savings

Enter your monthly token volume and see how much you'd save with Batch API.

AI Cost Calculator