2026-05-26 | 4 min read | Tanuj Garg

Unit Economics for AI: Calculating Cost Per Token, Per Inference, and Per Customer

Cloud & DevOps | Tags: Unit Economics, AI Cost, FinOps, LLM, SaaS Metrics

Introduction

Cloud bills tell you what you spent. Unit economics tell you whether what you spent was worth it.

For traditional SaaS, the formula is simple: infrastructure cost per customer per month. AI-native products add more variables, including tokens per request, model tier, cache hit rate, agent loop depth, and embedding costs, and every one of them varies by user, by feature, and by interaction.

Nearly half of organizations (49%) now use unit economics to connect technology spend to business outcomes. For AI workloads this is not optional: it is how you avoid building a product that loses money on every user.


Section 1: The Three Unit Economics Metrics for AI

Cost per token

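The atomic unit of LLM spend is the token. Providers price input and output tokens separately (typically quoted per million tokens), so the token cost of one request is:

cost_per_request = (input_tokens × input_price_per_token) + (output_tokens × output_price_per_token)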

Track this per model and per use case. A support bot averaging 3,000 tokens per response at GPT-4o pricing costs roughly $0.03–$0.05 per response, depending on the input/output split and the price list in effect. At 10,000 responses/day, that is $300–$500/day.
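As a minimal Python sketch, the request-cost formula and the daily scaling in the example look like this. Prices are passed in USD per million tokens, the unit providers typically quote; the 1,000/2,000 input/output split and the $5/$15 rates below are placeholders chosen for illustration, not a current price list.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost of one LLM call; prices are USD per million tokens."""
    return (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000


def daily_cost_usd(cost_per_response: float, responses_per_day: int) -> float:
    """Scale a per-response cost to a daily bill."""
    return cost_per_response * responses_per_day


# Placeholder prices (USD per 1M tokens), not a real price list:
# substitute your provider's current input/output rates.
per_response = request_cost_usd(1_000, 2_000, input_price_per_m=5.0, output_price_per_m=15.0)
print(f"${per_response:.4f} per response")                      # $0.0350 per response
print(f"${daily_cost_usd(per_response, 10_000):,.2f} per day")  # $350.00 per day
```

Keeping input and output prices separate matters because output tokens are usually priced several times higher than input tokens, so two requests with the same total token count can cost very different amounts.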

Cost per inference

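One inference is one user-facing model call, which may involve auxiliary calls (embedding, reranking) alongside the main generation:

cost_per_inference = llm_token_cost + embedding_cost + reranking_cost

Here llm_token_cost is the input-plus-output token cost from the formula above.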

For RAG systems, add retrieval costs:

cost_per_rag_query = embedding_cost + vector_search_cost + llm_inference_cost + reranking_cost
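Keeping each component as a separate field, rather than a single number, makes it obvious which step dominates a RAG query's cost. A minimal sketch; the dollar amounts in the usage line are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RagQueryCost:
    """Per-query cost components in USD."""
    embedding: float        # embedding the user's query
    vector_search: float    # vector database read/compute attributed to this query
    llm_inference: float    # input + output token cost of the generation call
    reranking: float = 0.0  # optional reranker call

    def total(self) -> float:
        return self.embedding + self.vector_search + self.llm_inference + self.reranking

    def llm_share(self) -> float:
        """Fraction of the query cost spent on the LLM call."""
        total = self.total()
        return self.llm_inference / total if total else 0.0


# Hypothetical component costs for one query.
q = RagQueryCost(embedding=0.00002, vector_search=0.0001, llm_inference=0.035, reranking=0.001)
print(f"${q.total():.5f} per query, {q.llm_share():.0%} of it on the LLM call")
```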

Cost per customer

The metric finance cares about:

cost_per_customer_month = (total_ai_spend + total_infra_spend) / active_customers

Compare it against revenue per customer. If serving a customer costs more than that customer pays, gross margin is negative and you have a unit economics problem, not a scaling problem: growth multiplies the loss instead of fixing it.
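The blended formula and the margin comparison, as a minimal sketch. The spend and customer counts in the usage lines are hypothetical, and the 29.0 revenue figure simply stands in for a $29/month plan.

```python
def cost_per_customer_month(total_ai_spend: float, total_infra_spend: float,
                            active_customers: int) -> float:
    """Blended monthly cost to serve one active customer, in USD."""
    if active_customers <= 0:
        raise ValueError("active_customers must be positive")
    return (total_ai_spend + total_infra_spend) / active_customers


def gross_margin_pct(revenue_per_customer: float, cost_per_customer: float) -> float:
    """Gross margin as a percentage of revenue; negative means each customer loses money."""
    if revenue_per_customer <= 0:
        raise ValueError("revenue_per_customer must be positive")
    return 100 * (revenue_per_customer - cost_per_customer) / revenue_per_customer


# Hypothetical month: $12,000 AI spend + $3,000 infra spend across 1,000 active customers.
cost = cost_per_customer_month(12_000, 3_000, 1_000)  # 15.0
margin = gross_margin_pct(29.0, cost)
print(f"cost/customer ${cost:.2f}, gross margin {margin:.1f}%")
```

A blended average can hide a handful of heavy customers, which is why per-request attribution is worth building.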


Section 2: Building the Attribution Pipeline

You need to trace every dollar from API call to business unit:

Instrumentation layer

Log on every AI request:

  • request_id,
  • customer_id or tenant_id,
  • model_id,
  • input_tokens, output_tokens,
  • use_case (support, search, generation, classification),
  • cache_hit (boolean),
  • latency_ms,
  • estimated_cost_usd.
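A minimal sketch of that log record in Python, emitted as one JSON line per request. The model names and the per-million-token price table are hypothetical placeholders, and treating a cache hit as zero model cost is an assumption (a semantic cache may still pay for an embedding call).

```python
import json
import uuid
from dataclasses import asdict, dataclass

# Hypothetical price table, USD per 1M tokens: model_id -> (input, output).
PRICES_PER_M = {"large-model": (5.0, 15.0), "small-model": (0.15, 0.60)}


@dataclass
class AIRequestLog:
    request_id: str
    tenant_id: str
    model_id: str
    input_tokens: int
    output_tokens: int
    use_case: str  # "support", "search", "generation", "classification"
    cache_hit: bool
    latency_ms: float
    estimated_cost_usd: float


def make_log(tenant_id: str, model_id: str, input_tokens: int, output_tokens: int,
             use_case: str, cache_hit: bool, latency_ms: float) -> AIRequestLog:
    in_price, out_price = PRICES_PER_M[model_id]
    # Assumption: a cache hit incurs no model cost.
    cost = 0.0 if cache_hit else (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return AIRequestLog(str(uuid.uuid4()), tenant_id, model_id, input_tokens, output_tokens,
                        use_case, cache_hit, latency_ms, cost)


entry = make_log("acme", "large-model", 1_200, 400, "support", cache_hit=False, latency_ms=850.0)
print(json.dumps(asdict(entry)))  # ship this line to your log pipeline
```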

Aggregation layer

Roll up daily:

  • cost per customer per day,
  • cost per use case per day,
  • cost per model per day,
  • cache hit rate per use case.
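In production this rollup is usually a scheduled query in a data warehouse; the in-memory Python version below is only a sketch of the grouping logic. It assumes each log is a dict carrying a `date` field plus the instrumentation fields listed earlier; the sample records are made up.

```python
from collections import defaultdict


def daily_rollup(logs):
    """Group per-request log dicts into the four daily views listed above.

    Each log is assumed to have: date, tenant_id, use_case, model_id,
    cache_hit, estimated_cost_usd.
    """
    cost_per_customer = defaultdict(float)
    cost_per_use_case = defaultdict(float)
    cost_per_model = defaultdict(float)
    calls = defaultdict(int)
    hits = defaultdict(int)
    for log in logs:
        day, cost = log["date"], log["estimated_cost_usd"]
        cost_per_customer[(day, log["tenant_id"])] += cost
        cost_per_use_case[(day, log["use_case"])] += cost
        cost_per_model[(day, log["model_id"])] += cost
        calls[(day, log["use_case"])] += 1
        hits[(day, log["use_case"])] += int(log["cache_hit"])
    return {
        "cost_per_customer": dict(cost_per_customer),
        "cost_per_use_case": dict(cost_per_use_case),
        "cost_per_model": dict(cost_per_model),
        "cache_hit_rate": {key: hits[key] / calls[key] for key in calls},
    }


# Made-up sample records for one day.
logs = [
    {"date": "2026-05-25", "tenant_id": "acme", "use_case": "support",
     "model_id": "large-model", "cache_hit": False, "estimated_cost_usd": 0.012},
    {"date": "2026-05-25", "tenant_id": "acme", "use_case": "support",
     "model_id": "large-model", "cache_hit": True, "estimated_cost_usd": 0.0},
    {"date": "2026-05-25", "tenant_id": "globex", "use_case": "search",
     "model_id": "small-model", "cache_hit": False, "estimated_cost_usd": 0.0004},
]
report = daily_rollup(logs)
print(report["cache_hit_rate"])
# {('2026-05-25', 'support'): 0.5, ('2026-05-25', 'search'): 0.0}
```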

Reporting layer

Weekly dashboard:

Customer tier | Avg cost/customer/month | Revenue/customer/month | Margin/customer/month
Free | $0.45 | $0 | -$0.45
Pro | $2.10 | $29 | $26.90
Enterprise | $18.50 | $500 | $481.50

This table tells you whether your free tier is subsidized acceptably and whether enterprise margins hold at scale.
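The margin column, plus one derived number that answers the free-tier question directly: how many free users one Pro customer's margin covers. A minimal sketch using the figures from the table; the `Tier` class is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tier:
    name: str
    cost_per_customer: float     # USD per customer per month
    revenue_per_customer: float  # USD per customer per month

    @property
    def margin(self) -> float:
        return self.revenue_per_customer - self.cost_per_customer

    @property
    def margin_pct(self) -> Optional[float]:
        # Undefined for a tier with no revenue, such as a free tier.
        if self.revenue_per_customer == 0:
            return None
        return 100 * self.margin / self.revenue_per_customer


tiers = [Tier("Free", 0.45, 0.0), Tier("Pro", 2.10, 29.0), Tier("Enterprise", 18.50, 500.0)]
for t in tiers:
    pct = "n/a" if t.margin_pct is None else f"{t.margin_pct:.1f}%"
    print(f"{t.name:<10} margin/month ${t.margin:>7.2f} ({pct})")

# How many free users does one Pro customer's margin pay for?
free, pro = tiers[0], tiers[1]
print(f"{pro.margin / -free.margin:.1f} free users per Pro customer")  # 59.8 free users per Pro customer
```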


Section 3: The Agent Loop Multiplier

Agents are the unit economics wildcard. A single user request can trigger:

  • 1 planning call (2,000 tokens),
  • 3 tool invocations (1,500 tokens each),
  • 1 synthesis call (3,000 tokens),
  • 1 retry after a tool failure (5,000 tokens).

Total: 14,500 tokens for one "simple" user request. At GPT-4o pricing, that is roughly $0.15–$0.25 per agent task, versus $0.03–$0.05 for a single-shot completion of about 3,000 tokens.
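The arithmetic behind this example, as a quick check. It assumes a flat blended rate taken from the support-bot figures ($0.03–$0.05 per 3,000 tokens); real agent calls have their own input/output splits and context usually grows from step to step, so this is only a rough approximation.

```python
# Token counts per step of the example agent task.
AGENT_STEPS = [
    ("planning", 2_000),
    ("tool_call_1", 1_500),
    ("tool_call_2", 1_500),
    ("tool_call_3", 1_500),
    ("synthesis", 3_000),
    ("retry_after_tool_failure", 5_000),
]
SINGLE_SHOT_TOKENS = 3_000

total_tokens = sum(tokens for _, tokens in AGENT_STEPS)
print(f"{total_tokens:,} tokens, {total_tokens / SINGLE_SHOT_TOKENS:.1f}x a single-shot call")
# 14,500 tokens, 4.8x a single-shot call

# Blended rate implied by the support-bot example: $0.03-$0.05 per 3,000 tokens.
for cost_per_3k in (0.03, 0.05):
    task_cost = total_tokens * cost_per_3k / 3_000
    print(f"${task_cost:.3f} per agent task")  # $0.145, then $0.242
```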

Track cost per agent task, not just cost per API call. Set token budgets per task and per session.
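One way to enforce those budgets, sketched as a small hypothetical helper (`TokenBudget` is not from any library). The agent loop charges each model call against both limits and stops cleanly instead of overrunning; the 10,000-token task limit in the usage is an arbitrary example.

```python
class TokenBudgetExceeded(Exception):
    """Raised when a model call would push a task or session over budget."""


class TokenBudget:
    """Hypothetical helper tracking tokens against per-task and per-session limits."""

    def __init__(self, task_limit: int, session_limit: int):
        self.task_limit = task_limit
        self.session_limit = session_limit
        self.task_used = 0
        self.session_used = 0

    def start_task(self) -> None:
        """Reset the per-task counter; the session counter keeps accumulating."""
        self.task_used = 0

    def charge(self, tokens: int) -> None:
        """Record one model call's tokens, refusing it if either limit would be exceeded."""
        if self.task_used + tokens > self.task_limit:
            raise TokenBudgetExceeded(f"task budget of {self.task_limit} tokens exceeded")
        if self.session_used + tokens > self.session_limit:
            raise TokenBudgetExceeded(f"session budget of {self.session_limit} tokens exceeded")
        self.task_used += tokens
        self.session_used += tokens


# Replaying the agent example against a 10,000-token task budget.
budget = TokenBudget(task_limit=10_000, session_limit=50_000)
budget.start_task()
for step_tokens in (2_000, 1_500, 1_500, 1_500, 3_000, 5_000):
    try:
        budget.charge(step_tokens)  # call before each model call, using an estimate
    except TokenBudgetExceeded as err:
        print(f"stopping agent: {err}")  # stopping agent: task budget of 10000 tokens exceeded
        break
```

Charging before the call requires an estimate of output tokens; charging actual usage right after each call is simpler but lets the final call overshoot.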


Section 4: Optimizing Unit Economics

Once you can measure, you can optimize:

Lever | Typical savings | Tradeoff
Semantic caching | 50–70% on repeated queries | Stale responses if invalidation is poor
Model routing | 60–80% on simple tasks | Quality drop if routing is too aggressive
Prompt compression | 20–40% on token count | May lose context for complex tasks
Batch processing | 30–50% on non-real-time workloads | Increased latency
Context window management | 15–30% on long conversations | Requires summarization logic

Optimize in order of impact: caching first, then model routing, then prompt-level work such as compression and context window management.
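Since caching comes first, here is a minimal semantic-cache sketch: embed the query, find the most similar cached query by cosine similarity, and reuse its response only above a similarity threshold and within a TTL (the TTL is the simplest answer to the stale-response tradeoff in the table). `toy_embed` is a stand-in for a real embedding model, and the 0.9 threshold and one-hour TTL are illustrative assumptions to tune on your own traffic. A production cache would use a vector index instead of this linear scan.

```python
import math
import time
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Reuse a cached response when a new query is close enough in embedding space."""

    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.9,
                 ttl_seconds: float = 3600.0):
        self.embed = embed
        self.threshold = threshold
        self.ttl_seconds = ttl_seconds
        self.entries: List[Tuple[Vector, str, float]] = []  # (embedding, response, stored_at)

    def get(self, query: str) -> Optional[str]:
        now = time.time()
        # Expire old entries so stale answers are not served.
        self.entries = [e for e in self.entries if now - e[2] < self.ttl_seconds]
        if not self.entries:
            return None
        q = self.embed(query)
        score, response = max(((cosine(q, emb), resp) for emb, resp, _ in self.entries),
                              key=lambda pair: pair[0])
        return response if score >= self.threshold else None

    def put(self, query: str, response: str) -> None:
        self.entries.append((self.embed(query), response, time.time()))


# Toy stand-in for a real embedding model: counts of a few keywords.
VOCAB = ["reset", "password", "cancel", "subscription", "invoice"]


def toy_embed(text: str) -> Vector:
    words = text.lower().replace("?", "").split()
    return [float(words.count(w)) for w in VOCAB]


cache = SemanticCache(toy_embed)
cache.put("How do I reset my password?", "Use Settings > Security > Reset password.")
print(cache.get("how can I reset my password"))       # cache hit: same intent
print(cache.get("How do I cancel my subscription?"))  # None: different intent
```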


Section 5: When to Worry

Red flags in your unit economics:

  • Cost per customer growing faster than revenue per customer (margin compression),
  • Free tier cost exceeding 10% of total AI spend (unsustainable subsidy),
  • Agent tasks costing 5x or more than a single-shot completion (loop control problem),
  • No correlation between AI spend and retention/engagement (feature may not justify cost).
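The first three red flags are mechanical enough to check automatically each month. A minimal sketch; the `MonthlySnapshot` fields are hypothetical names for monthly figures you would pull from the aggregation layer and billing, and the sample values are made up. The fourth flag needs engagement data and a real analysis, so it is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MonthlySnapshot:
    cost_per_customer: float
    revenue_per_customer: float
    free_tier_ai_spend: float
    total_ai_spend: float
    avg_agent_task_cost: float
    avg_single_shot_cost: float


def red_flags(prev: MonthlySnapshot, curr: MonthlySnapshot) -> List[str]:
    """Check the first three red flags; assumes all prev/curr figures are positive."""
    flags = []
    cost_growth = curr.cost_per_customer / prev.cost_per_customer - 1
    revenue_growth = curr.revenue_per_customer / prev.revenue_per_customer - 1
    if cost_growth > revenue_growth:
        flags.append("margin compression: cost per customer growing faster than revenue")
    if curr.free_tier_ai_spend > 0.10 * curr.total_ai_spend:
        flags.append("free tier above 10% of total AI spend")
    if curr.avg_agent_task_cost >= 5 * curr.avg_single_shot_cost:
        flags.append("agent tasks cost 5x+ a single-shot completion")
    return flags


# Made-up figures for two consecutive months.
prev = MonthlySnapshot(2.00, 29.0, 800.0, 10_000.0, 0.15, 0.04)
curr = MonthlySnapshot(2.40, 29.0, 1_500.0, 12_000.0, 0.22, 0.04)
for flag in red_flags(prev, curr):
    print("RED FLAG:", flag)
```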

Conclusion

Unit economics for AI is not a finance exercise; it is an engineering discipline. Instrument cost per token today, aggregate to cost per customer this week, and review margins monthly.

If you cannot tell your CFO what it costs to serve one customer, you are not ready to scale AI features.
