Unit Economics

AI COGS for SaaS 2026:
What Goes Into Cost of Goods Sold

AI SaaS cost of goods sold (COGS) is not just your LLM API bill: it also includes vector databases, hosting, support infrastructure, and embedding costs. Here's exactly what goes into AI COGS and how to target 80%+ gross margins. Last verified: 2026-04-01.

9 min read · Updated April 2026

AI SaaS COGS Targets

  • <20% — target COGS as % of revenue
  • 60–80% — LLM API share of COGS
  • 5–15% — vector DB share of COGS
  • 80%+ — target gross margin

What Is COGS for an AI SaaS Product?

COGS (Cost of Goods Sold) in SaaS represents the direct costs of delivering your product to customers. For AI SaaS, this typically includes:

  • LLM API costs — what you pay Anthropic, OpenAI, or Google per token
  • Vector database — Pinecone, Weaviate, pgvector hosting for semantic search/RAG
  • Application hosting — servers, CDN, load balancers running your app
  • Customer-facing support infrastructure — support tooling, status page, monitoring
  • Embedding model costs — typically OpenAI embeddings at $0.02/M tokens
  • Third-party APIs — STT (Deepgram), TTS, search APIs if applicable

What is NOT in COGS: engineering salaries, marketing, sales, G&A. Those are operating expenses (OpEx).
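The COGS/OpEx split above can be expressed as a simple classifier over monthly line items. A minimal sketch — the category names, line items, and amounts are illustrative, not a standard chart of accounts:

```python
# Sketch: classify monthly spend into COGS vs OpEx and compute gross margin.
# Category names and amounts are illustrative, not from any real P&L.

COGS_CATEGORIES = {"llm_api", "vector_db", "hosting",
                   "support_infra", "embeddings", "third_party_apis"}

def gross_margin(mrr: float, line_items: dict[str, tuple[str, float]]) -> dict:
    """line_items maps name -> (category, monthly_cost)."""
    cogs = sum(cost for cat, cost in line_items.values() if cat in COGS_CATEGORIES)
    return {
        "cogs": cogs,
        "cogs_pct": 100 * cogs / mrr,
        "gross_margin_pct": 100 * (mrr - cogs) / mrr,
    }

items = {
    "Claude API": ("llm_api", 12_500),
    "Pinecone": ("vector_db", 70),
    "Vercel": ("hosting", 800),
    "Engineer salaries": ("opex", 40_000),  # excluded: OpEx, not COGS
}
print(gross_margin(100_000, items))
```

Note that "Engineer salaries" is tagged `opex` and never enters the COGS sum — mixing the two is the most common gross-margin mistake in early-stage AI SaaS models.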

COGS Breakdown by AI Product Type

| Product type | LLM API | Vector DB | Hosting | Other | Total COGS (% of revenue) |
|---|---|---|---|---|---|
| AI writing tool (content gen) | 8% | 1% | 4% | 2% | 15% |
| AI customer support bot | 6% | 4% | 5% | 3% | 18% |
| AI coding copilot | 25% | 2% | 8% | 5% | 40% |
| AI voice agent (STT+LLM+TTS) | 15% | 2% | 5% | 18% (STT/TTS) | 40% |
| AI document processor | 4% | 3% | 5% | 2% | 14% |
| AI knowledge base / RAG tool | 10% | 12% | 6% | 2% | 30% |

Coding copilots and voice agents have structurally higher COGS: copilots burn large token volumes per user while competing on commoditized per-seat pricing, and voice agents stack STT/TTS fees on top of the LLM bill. Document processors and content tools are the most margin-friendly AI product types.

Worked Example: $100K MRR AI Writing Tool

| COGS line item | Monthly cost | % of $100K MRR |
|---|---|---|
| LLM API (Claude Haiku 4.5, 1B output tokens) | $5,000 | 5.0% |
| LLM API (Claude Sonnet 4.6, 500M output tokens, premium users) | $7,500 | 7.5% |
| Embeddings (OpenAI text-embedding-3-small, 10B tokens) | $200 | 0.2% |
| Vector DB (Pinecone Starter, 10M vectors) | $70 | 0.1% |
| App hosting (Vercel + Railway) | $800 | 0.8% |
| Monitoring, logging, error tracking | $300 | 0.3% |
| Support tooling (Intercom) | $500 | 0.5% |
| **Total COGS** | **$14,370** | **14.4%** |
| **Gross margin** | **$85,630** | **85.6%** |

This is a realistic scenario for an AI writing tool that routes premium users to Sonnet and the majority of requests to Haiku. A gross margin of 85.6% clears the 80%+ benchmark for AI-native SaaS.
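The table's totals are easy to sanity-check in a few lines (amounts copied from the table above):

```python
# Recompute the worked-example table for a $100K MRR AI writing tool.
mrr = 100_000
cogs_items = {
    "LLM API (Haiku)": 5_000,
    "LLM API (Sonnet, premium)": 7_500,
    "Embeddings": 200,
    "Vector DB (Pinecone)": 70,
    "App hosting": 800,
    "Monitoring/logging": 300,
    "Support tooling": 500,
}
total = sum(cogs_items.values())
print(f"Total COGS: ${total:,} ({100 * total / mrr:.1f}% of MRR)")
print(f"Gross margin: ${mrr - total:,} ({100 * (mrr - total) / mrr:.1f}%)")
# prints:
# Total COGS: $14,370 (14.4% of MRR)
# Gross margin: $85,630 (85.6%)
```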

COGS Warning: High-Compute Use Cases

| Scenario | Why COGS explodes | COGS % at risk | Fix |
|---|---|---|---|
| Flat-fee unlimited AI coding tool | Power users run 1,000+ completions/day vs ~50 for light users | 60–80% | Hard daily limits or a credit model |
| Agent-first product with no loop limit | A 5-step agent becomes a 50-step agent if no max_iterations is set | 100%+ | Always set max_iterations in agent code |
| Long-context document Q&A with no chunking | A 100K-token document sent on every query costs ~$0.30/query at Sonnet input pricing | 30–50% | Use RAG to retrieve 3–5 relevant chunks instead |
| Chatbot with full conversation history | Turn 20 of a conversation carries 10K+ accumulated input tokens | 20–40% | Truncate to the last 5 turns; summarize older turns |
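The max_iterations fix is worth showing concretely. A minimal sketch of a hard step budget — `call_model`, the message format, and the `done` flag are placeholders for illustration, not a real SDK:

```python
# Sketch: cap agent loop iterations so a runaway agent can't blow up COGS.
# call_model and the reply shape are placeholders, not a real SDK.

MAX_ITERATIONS = 8  # hard ceiling on model calls per request

def run_agent(task: str, call_model) -> str:
    messages = [{"role": "user", "content": task}]
    for step in range(MAX_ITERATIONS):
        reply = call_model(messages)
        if reply.get("done"):  # model signalled completion
            return reply["content"]
        messages.append({"role": "assistant", "content": reply["content"]})
    # Budget exhausted: fail loudly instead of burning more tokens.
    raise RuntimeError(f"agent exceeded {MAX_ITERATIONS} steps")
```

The key design choice is failing loudly at the budget rather than silently continuing: an exception caps the worst-case cost of a single request at MAX_ITERATIONS model calls.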

COGS Reduction Playbook

  1. Model tiering: Use Haiku for ~80% of requests, Sonnet for premium users. At Haiku's $1/M vs Sonnet's $3/M input pricing, the blended rate is 0.8 × $1 + 0.2 × $3 = $1.40/M — a ~53% cut versus running everything on Sonnet.
  2. Prompt caching: System prompts cached at 90% discount. 2,000-token system prompt cached saves $0.0018/call on Haiku vs uncached.
  3. Batch API for async: All non-realtime processing (document enrichment, scheduled reports) should go through Batch API at 50% off.
  4. Output length control: Instruct models to be concise. Every token saved is money saved. "Respond in 3 bullet points max" vs "Respond thoroughly".
  5. Response caching: Cache LLM responses for identical or near-identical inputs (FAQ questions, common prompts). No API call needed for repeat questions.
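Item 5 can be sketched with a hash-keyed response cache. `normalize()` here is a naive stand-in for real near-duplicate detection (embeddings similarity, for instance), and `call_model` is a placeholder:

```python
# Sketch: cache responses for repeated prompts so identical questions
# skip the API entirely. normalize() is a naive near-duplicate stand-in.
import hashlib

_cache: dict[str, str] = {}

def normalize(prompt: str) -> str:
    # Collapse case and whitespace so trivially different prompts share a key.
    return " ".join(prompt.lower().split())

def cached_completion(prompt: str, call_model) -> tuple[str, bool]:
    key = hashlib.sha256(normalize(prompt).encode()).hexdigest()
    if key in _cache:
        return _cache[key], True   # cache hit: zero API cost
    answer = call_model(prompt)
    _cache[key] = answer
    return answer, False
```

For FAQ-style workloads where a handful of questions dominate traffic, even this naive exact-match cache can eliminate a large share of API calls; add a TTL or invalidation hook if answers can go stale.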

Model Your AI COGS

Enter your monthly token volume and model mix to calculate exactly what your AI COGS will be.

AI API Cost Calculator