
AI RAG System Cost 2026:
Embeddings, Vector DBs, and LLM Retrieval Pricing

Complete cost breakdown for Retrieval-Augmented Generation (RAG) systems in 2026 — embedding API costs, vector database pricing (Pinecone, Weaviate, pgvector), retrieval costs, and full stack monthly budgets.

13 min read·Updated March 2026

RAG Cost Components

  • OpenAI embedding tokens: $0.02 per 1M tokens
  • Pinecone vector DB: $70–$700/mo
  • pgvector (self-hosted): $0
  • Full RAG system (typical): $50–$500/mo

What Makes Up a RAG System's Cost?

A RAG pipeline has 4 cost components:

  1. Embedding (ingestion): Converting documents to vectors — one-time + incremental
  2. Vector database: Storing and searching vectors — monthly recurring
  3. Retrieval (query embeddings): Embedding each user query — per-query cost
  4. LLM generation: Generating the final answer with retrieved context — per-query cost

Embedding API Pricing (2026)

| Model | Price per 1M tokens | Dimensions | Notes |
|---|---|---|---|
| OpenAI text-embedding-3-small | $0.020 | 1,536 | Best price/performance ratio |
| OpenAI text-embedding-3-large | $0.130 | 3,072 | Higher quality for complex search |
| Google text-embedding-004 | $0.000 | 768 | Free via Vertex AI (limits apply) |
| Cohere embed-english-v3 | $0.100 | 1,024 | Optimized for semantic search |
| Voyage AI voyage-3 | $0.060 | 1,024 | Strong multilingual support |
| BAAI/bge-large (self-hosted) | $0.000 | 1,024 | Free, GPU/CPU inference required |
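
To compare the paid models on a concrete corpus, the per-token prices above can be applied to an estimated token count. The sketch below is illustrative only: it assumes 1M documents averaging 500 tokens each (the figures used in the medium example later in this article), and the names `EMBEDDING_PRICES` and `ingestion_cost` are hypothetical helpers, not part of any vendor SDK.

```python
# USD per 1M tokens, copied from the pricing table above.
EMBEDDING_PRICES = {
    "text-embedding-3-small": 0.020,
    "text-embedding-3-large": 0.130,
    "embed-english-v3": 0.100,
    "voyage-3": 0.060,
}


def ingestion_cost(num_docs: int, avg_tokens_per_doc: int, price_per_million: float) -> float:
    """One-time cost in USD to embed a corpus at a flat per-token price."""
    total_tokens = num_docs * avg_tokens_per_doc
    return total_tokens / 1_000_000 * price_per_million


# Assumption: 1M documents, 500 tokens each (500M tokens in total).
costs = {
    model: ingestion_cost(1_000_000, 500, price)
    for model, price in EMBEDDING_PRICES.items()
}

# Print the models from cheapest to most expensive.
for model, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{model:<26} ${cost:,.2f}")
```

Under these assumptions the one-time ingestion bill ranges from about $10 (text-embedding-3-small) to about $65 (text-embedding-3-large), so the embedding choice is rarely the dominant line item.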

Vector Database Pricing (2026)

| Database | Free Tier | Starter | Growth | Free tier capacity |
|---|---|---|---|---|
| Pinecone | Yes | $70/mo | $700+/mo | 1 index, 100K vectors |
| Weaviate Cloud | Yes | $25/mo | Custom | 1M vectors (sandbox) |
| Qdrant Cloud | Yes | $9/mo | $25+/mo | 1GB storage free |
| Chroma (self-hosted) | Free | $0 | $0 | Unlimited (own infrastructure) |
| pgvector (PostgreSQL) | Free | $0 | $0 | Unlimited (own infrastructure) |
| Supabase pgvector | Yes | $25/mo | $599+/mo | 500MB storage free |
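
Some free tiers above are capped by vector count and others by storage, so it helps to translate one into the other. The following is a rough sketch, assuming vectors are stored as uncompressed 32-bit floats (4 bytes per dimension) and ignoring index structures, metadata, and any quantization a provider may apply; real usage will differ, and the helper names are hypothetical.

```python
BYTES_PER_FLOAT32 = 4  # assumption: uncompressed 32-bit floats per dimension


def raw_vector_bytes(num_vectors: int, dimensions: int) -> int:
    """Raw size of the vectors alone, excluding index overhead and metadata."""
    return num_vectors * dimensions * BYTES_PER_FLOAT32


def to_gb(num_bytes: int) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_bytes / 1e9


# Dimensions match the embedding models listed earlier (1,536 and 768).
for label, count, dims in [
    ("100K vectors, 1,536 dims", 100_000, 1536),
    ("1M vectors, 1,536 dims", 1_000_000, 1536),
    ("1M vectors, 768 dims", 1_000_000, 768),
]:
    print(f"{label}: {to_gb(raw_vector_bytes(count, dims)):.2f} GB")
```

Under these assumptions, 100K vectors at 1,536 dimensions need roughly 0.61 GB of raw storage and 1M vectors roughly 6.14 GB, which suggests a storage-capped free tier may fill up sooner than a vector-count cap implies.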

Full RAG System Monthly Cost Examples

Small RAG app (10,000 queries/month, 50K documents indexed)

  • Embeddings (initial): 50K docs × 500 tokens = 25M tokens × $0.020/M = $0.50 one-time
  • Query embeddings: 10K × 200 tokens = 2M tokens × $0.020/M = $0.04/month
  • Vector DB: Qdrant free tier = $0/month
  • LLM generation (GPT-4o mini, 1K tokens/query): 10K × 1K = 10M tokens × $0.15/M = $1.50/month
  • Total: ~$2/month, plus the $0.50 one-time embedding cost

Medium RAG app (100K queries/month, 1M documents)

  • Embeddings (initial): 1M docs × 500 tokens = 500M tokens × $0.020/M = $10 one-time
  • Query embeddings: 100K × 200 tokens = 20M tokens × $0.020/M = $0.40/month
  • Vector DB: Pinecone Starter = $70/month
  • LLM generation (GPT-4o mini, 1K tokens/query): 100K × 1K = 100M tokens × $0.15/M = $15/month
  • Total: ~$85/month, plus the $10 one-time embedding cost
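
The small and medium budgets above can be reproduced with a few lines of arithmetic. This is a minimal sketch, not a pricing tool: it assumes a flat $0.15 per 1M tokens for GPT-4o mini (the rate implied by "10M tokens = $1.50", with no split between input and output tokens), and `rag_budget` and its parameter names are hypothetical.

```python
EMBED_PRICE_PER_M = 0.020  # text-embedding-3-small, USD per 1M tokens
LLM_PRICE_PER_M = 0.15     # assumption: flat rate implied by the examples above


def tokens_cost(tokens: int, price_per_million: float) -> float:
    """Cost in USD for a token count at a flat per-million price."""
    return tokens / 1_000_000 * price_per_million


def rag_budget(
    docs: int,
    queries: int,
    vector_db_monthly: float,
    tokens_per_doc: int = 500,
    tokens_per_query: int = 200,
    llm_tokens_per_query: int = 1_000,
) -> dict:
    """Split a RAG budget into its one-time and monthly cost components."""
    ingestion = tokens_cost(docs * tokens_per_doc, EMBED_PRICE_PER_M)
    query_embeddings = tokens_cost(queries * tokens_per_query, EMBED_PRICE_PER_M)
    generation = tokens_cost(queries * llm_tokens_per_query, LLM_PRICE_PER_M)
    return {
        "ingestion_one_time": ingestion,
        "query_embeddings": query_embeddings,
        "vector_db": vector_db_monthly,
        "generation": generation,
        "monthly_total": query_embeddings + vector_db_monthly + generation,
    }


small = rag_budget(docs=50_000, queries=10_000, vector_db_monthly=0.0)
medium = rag_budget(docs=1_000_000, queries=100_000, vector_db_monthly=70.0)

for name, budget in [("small", small), ("medium", medium)]:
    print(
        f"{name}: ${budget['monthly_total']:.2f}/month "
        f"+ ${budget['ingestion_one_time']:.2f} one-time"
    )
```

The unrounded monthly totals come to $1.54 and $85.40. In the medium case the vector database subscription, not token usage, accounts for most of the bill.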

Enterprise RAG (1M queries/month, 10M documents)

  • Vector DB: Pinecone or Weaviate dedicated = $500–$2,000/month
  • LLM (GPT-4o mini at scale, 1K tokens/query): 1M × 1K = 1B tokens × $0.15/M = ~$150/month
  • Embeddings refresh: ~$20/month (incremental)
  • Total: $670–$2,170/month

Cost Optimization for RAG Systems

  • Use pgvector instead of Pinecone: avoids the $70–$700/month subscription at most scales, though you still pay for the PostgreSQL infrastructure it runs on
  • Cache frequent queries: 30–50% of queries are often identical, so cache both the vector search results and the LLM response
  • Reduce retrieved chunks: fetching top-3 instead of top-10 cuts retrieved-context tokens by 70%
  • Use small embedding models: text-embedding-3-small costs about 85% less than text-embedding-3-large ($0.020 vs. $0.130 per 1M tokens) with minimal quality loss
  • Re-rank with a cheap model: use Cohere Rerank or a cross-encoder to trim context before the expensive LLM call
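
The percentages in the list above follow from simple ratios, which the sketch below works through. It is illustrative only: it assumes equally sized chunks, a flat $0.15 per 1M tokens for GPT-4o mini, and a 30% cache hit rate (the low end of the range quoted above). All function names are hypothetical.

```python
def context_token_reduction(old_k: int, new_k: int) -> float:
    """Fraction of retrieved-context tokens saved, assuming equal-sized chunks."""
    return 1 - new_k / old_k


def price_saving(cheaper: float, pricier: float) -> float:
    """Fractional saving from switching to the cheaper per-token price."""
    return 1 - cheaper / pricier


def monthly_llm_cost(
    queries: int,
    tokens_per_query: int,
    price_per_million: float,
    cache_hit_rate: float = 0.0,
) -> float:
    """LLM spend in USD when cached queries skip the generation call entirely."""
    billable_queries = queries * (1 - cache_hit_rate)
    return billable_queries * tokens_per_query / 1_000_000 * price_per_million


# Top-3 instead of top-10 retrieved chunks.
print(f"Context reduction: {context_token_reduction(10, 3):.0%}")

# text-embedding-3-small ($0.020) versus text-embedding-3-large ($0.130).
print(f"Embedding saving: {price_saving(0.020, 0.130):.1%}")

# Medium app: 100K queries at 1K tokens each, with and without caching.
uncached = monthly_llm_cost(100_000, 1_000, 0.15)
cached = monthly_llm_cost(100_000, 1_000, 0.15, cache_hit_rate=0.30)
print(f"LLM cost: ${uncached:.2f} uncached, ${cached:.2f} with a 30% hit rate")
```

Note that the 70% figure applies to the retrieved context only; the system prompt and the user query still count toward input tokens, so the saving on the full prompt is somewhat smaller.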
