Unit Economics

AI COGS for SaaS 2026:
What Goes Into Cost of Goods Sold

AI SaaS cost of goods sold (COGS) is not just your LLM API bill: it also includes vector databases, hosting, support infrastructure, and embedding costs. Here's exactly what goes into AI COGS and how to target 80%+ gross margins. Last verified: 2026-04-01.

9 min read · Updated April 2026

AI SaaS COGS Targets

  • <20% — target COGS as % of revenue
  • 60–80% — LLM API share of COGS
  • 5–15% — vector DB share of COGS
  • 80%+ — target gross margin

What Is COGS for an AI SaaS Product?

COGS (Cost of Goods Sold) in SaaS represents the direct costs of delivering your product to customers. For AI SaaS, this typically includes:

  • LLM API costs — what you pay Anthropic, OpenAI, or Google per token
  • Vector database — Pinecone, Weaviate, pgvector hosting for semantic search/RAG
  • Application hosting — servers, CDN, load balancers running your app
  • Customer-facing support infrastructure — support tooling, status page, monitoring
  • Embedding model costs — typically OpenAI embeddings at $0.02/M tokens
  • Third-party APIs — STT (Deepgram), TTS, search APIs if applicable

What is NOT in COGS: engineering salaries, marketing, sales, G&A. Those are operating expenses (OpEx).
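The COGS/OpEx split above can be expressed as a simple classifier over monthly line items. A minimal sketch — the category names, line items, and amounts are illustrative, not a standard chart of accounts:

```python
# Sketch: classify monthly spend into COGS vs OpEx and compute gross margin.
# Category names and amounts are illustrative, not from any real P&L.

COGS_CATEGORIES = {"llm_api", "vector_db", "hosting",
                   "support_infra", "embeddings", "third_party_apis"}

def gross_margin(mrr: float, line_items: dict[str, tuple[str, float]]) -> dict:
    """line_items maps name -> (category, monthly_cost)."""
    cogs = sum(cost for cat, cost in line_items.values() if cat in COGS_CATEGORIES)
    return {
        "cogs": cogs,
        "cogs_pct": 100 * cogs / mrr,
        "gross_margin_pct": 100 * (mrr - cogs) / mrr,
    }

items = {
    "Claude API": ("llm_api", 12_500),
    "Pinecone": ("vector_db", 70),
    "Vercel": ("hosting", 800),
    "Engineer salaries": ("opex", 40_000),  # excluded: OpEx, not COGS
}
print(gross_margin(100_000, items))
```

Note that "Engineer salaries" is tagged `opex` and never enters the COGS sum — mixing the two is the most common gross-margin mistake in early-stage AI SaaS models.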

COGS Breakdown by AI Product Type

| Product type | LLM API | Vector DB | Hosting | Other | Total COGS (% of revenue) |
|---|---|---|---|---|---|
| AI writing tool (content gen) | 8% | 1% | 4% | 2% | 15% |
| AI customer support bot | 6% | 4% | 5% | 3% | 18% |
| AI coding copilot | 25% | 2% | 8% | 5% | 40% |
| AI voice agent (STT+LLM+TTS) | 15% | 2% | 5% | 18% (STT/TTS) | 40% |
| AI document processor | 4% | 3% | 5% | 2% | 14% |
| AI knowledge base / RAG tool | 10% | 12% | 6% | 2% | 30% |

Coding copilots and voice agents have structurally higher COGS: copilots burn large token volumes per user while competing on commoditized per-seat pricing, and voice agents stack STT/TTS fees on top of the LLM bill. Document processors and content tools are the most margin-friendly AI product types.

Worked Example: $100K MRR AI Writing Tool

| COGS line item | Monthly cost | % of $100K MRR |
|---|---|---|
| LLM API (Claude Haiku 4.5, 1B output tokens) | $5,000 | 5.0% |
| LLM API (Claude Sonnet 4.6, 500M output tokens, premium users) | $7,500 | 7.5% |
| Embeddings (OpenAI text-embedding-3-small, 10B tokens) | $200 | 0.2% |
| Vector DB (Pinecone Starter, 10M vectors) | $70 | 0.1% |
| App hosting (Vercel + Railway) | $800 | 0.8% |
| Monitoring, logging, error tracking | $300 | 0.3% |
| Support tooling (Intercom) | $500 | 0.5% |
| **Total COGS** | **$14,370** | **14.4%** |
| **Gross margin** | **$85,630** | **85.6%** |

This is a realistic scenario for an AI writing tool that routes premium users to Sonnet and the majority of requests to Haiku. A gross margin of 85.6% clears the 80%+ benchmark for AI-native SaaS.
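The table's totals are easy to sanity-check in a few lines (amounts copied from the table above):

```python
# Recompute the worked-example table for a $100K MRR AI writing tool.
mrr = 100_000
cogs_items = {
    "LLM API (Haiku)": 5_000,
    "LLM API (Sonnet, premium)": 7_500,
    "Embeddings": 200,
    "Vector DB (Pinecone)": 70,
    "App hosting": 800,
    "Monitoring/logging": 300,
    "Support tooling": 500,
}
total = sum(cogs_items.values())
print(f"Total COGS: ${total:,} ({100 * total / mrr:.1f}% of MRR)")
print(f"Gross margin: ${mrr - total:,} ({100 * (mrr - total) / mrr:.1f}%)")
# prints:
# Total COGS: $14,370 (14.4% of MRR)
# Gross margin: $85,630 (85.6%)
```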

COGS Warning: High-Compute Use Cases

| Scenario | Why COGS explodes | COGS % at risk | Fix |
|---|---|---|---|
| Flat-fee unlimited AI coding tool | Power users run 1,000+ completions/day vs ~50 for light users | 60–80% | Hard daily limits or a credit model |
| Agent-first product with no loop limit | A 5-step agent becomes a 50-step agent if no max_iterations is set | 100%+ | Always set max_iterations in agent code |
| Long-context document Q&A with no chunking | A 100K-token document sent on every query costs ~$0.30/query at Sonnet input pricing | 30–50% | Use RAG to retrieve 3–5 relevant chunks instead |
| Chatbot with full conversation history | Turn 20 of a conversation carries 10K+ accumulated input tokens | 20–40% | Truncate to the last 5 turns; summarize older turns |
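The max_iterations fix is worth showing concretely. A minimal sketch of a hard step budget — `call_model`, the message format, and the `done` flag are placeholders for illustration, not a real SDK:

```python
# Sketch: cap agent loop iterations so a runaway agent can't blow up COGS.
# call_model and the reply shape are placeholders, not a real SDK.

MAX_ITERATIONS = 8  # hard ceiling on model calls per request

def run_agent(task: str, call_model) -> str:
    messages = [{"role": "user", "content": task}]
    for step in range(MAX_ITERATIONS):
        reply = call_model(messages)
        if reply.get("done"):  # model signalled completion
            return reply["content"]
        messages.append({"role": "assistant", "content": reply["content"]})
    # Budget exhausted: fail loudly instead of burning more tokens.
    raise RuntimeError(f"agent exceeded {MAX_ITERATIONS} steps")
```

The key design choice is failing loudly at the budget rather than silently continuing: an exception caps the worst-case cost of a single request at MAX_ITERATIONS model calls.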

COGS Reduction Playbook

  1. Model tiering: Use Haiku for ~80% of requests, Sonnet for premium users. At Haiku's $1/M vs Sonnet's $3/M input pricing, the blended rate is 0.8 × $1 + 0.2 × $3 = $1.40/M — a ~53% cut versus running everything on Sonnet.
  2. Prompt caching: System prompts cached at 90% discount. 2,000-token system prompt cached saves $0.0018/call on Haiku vs uncached.
  3. Batch API for async: All non-realtime processing (document enrichment, scheduled reports) should go through Batch API at 50% off.
  4. Output length control: Instruct models to be concise. Every token saved is money saved. "Respond in 3 bullet points max" vs "Respond thoroughly".
  5. Response caching: Cache LLM responses for identical or near-identical inputs (FAQ questions, common prompts). No API call needed for repeat questions.
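Item 5 can be sketched with a hash-keyed response cache. `normalize()` here is a naive stand-in for real near-duplicate detection (embeddings similarity, for instance), and `call_model` is a placeholder:

```python
# Sketch: cache responses for repeated prompts so identical questions
# skip the API entirely. normalize() is a naive near-duplicate stand-in.
import hashlib

_cache: dict[str, str] = {}

def normalize(prompt: str) -> str:
    # Collapse case and whitespace so trivially different prompts share a key.
    return " ".join(prompt.lower().split())

def cached_completion(prompt: str, call_model) -> tuple[str, bool]:
    key = hashlib.sha256(normalize(prompt).encode()).hexdigest()
    if key in _cache:
        return _cache[key], True   # cache hit: zero API cost
    answer = call_model(prompt)
    _cache[key] = answer
    return answer, False
```

For FAQ-style workloads where a handful of questions dominate traffic, even this naive exact-match cache can eliminate a large share of API calls; add a TTL or invalidation hook if answers can go stale.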

Model Your AI COGS

Enter your monthly token volume and model mix to calculate exactly what your AI COGS will be.

AI API Cost Calculator