2026-05-31 · 5 min read · Tanuj Garg

Building the 'Cost Observability' Layer: Every AI Architecture Needs One in 2026

AI & Automation · Tags: Cost Observability, AI Infrastructure, FinOps, LLM, Monitoring

Introduction

You have application monitoring. You have infrastructure monitoring. You probably do not have cost monitoring—and for AI systems, that gap is expensive.

Traditional APM tracks latency, error rates, and throughput. None of those metrics tell you that a single agent task burned $0.47 in tokens, that your free-tier users consume 3x the AI budget of paid users, or that switching models last Tuesday doubled inference cost with no quality improvement.

Cost observability is the practice of tracking dollar-level spend per request, per feature, per model, and per customer—with the same granularity you apply to latency and error rates. In 2026, it is foundational infrastructure for any AI-native product.


Section 1: What Cost Observability Tracks

Per-request cost

Log these fields for every AI API call:

{
  "request_id": "req_abc123",
  "model": "gpt-4o",
  "input_tokens": 1842,
  "output_tokens": 356,
  "input_cost_usd": 0.0046,
  "output_cost_usd": 0.0036,
  "total_cost_usd": 0.0082,
  "latency_ms": 1240,
  "cache_hit": false,
  "use_case": "support_bot"
}
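The cost fields in that record are simple arithmetic: token count divided by 1,000, times the per-1K price. A minimal sketch of building the record, assuming the gpt-4o per-1K rates from the pricing table later in this article; `build_cost_event` is a hypothetical helper, not a library API:

```python
import json


def build_cost_event(request_id: str, model: str, input_tokens: int,
                     output_tokens: int, input_per_1k: float,
                     output_per_1k: float, **metadata) -> dict:
    """Build one per-request cost record (hypothetical helper)."""
    # Prices are quoted per 1,000 tokens, so scale token counts down by 1,000.
    input_cost = input_tokens / 1000 * input_per_1k
    output_cost = output_tokens / 1000 * output_per_1k
    return {
        "request_id": request_id,
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        # Round to 4 decimals to match the example record; the total is
        # computed from unrounded parts and rounded once.
        "input_cost_usd": round(input_cost, 4),
        "output_cost_usd": round(output_cost, 4),
        "total_cost_usd": round(input_cost + output_cost, 4),
        **metadata,  # latency_ms, cache_hit, use_case, ...
    }


event = build_cost_event("req_abc123", "gpt-4o", 1842, 356,
                         input_per_1k=0.0025, output_per_1k=0.01,
                         latency_ms=1240, cache_hit=False, use_case="support_bot")
print(json.dumps(event))
```

With those inputs, 1,842 input tokens cost about $0.0046 and 356 output tokens about $0.0036, which reproduces the $0.0082 total shown above.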

Per-task cost (agents)

Agents make multiple model calls per user request, so roll those calls up into one record per task:

{
  "task_id": "task_xyz789",
  "total_model_calls": 4,
  "total_tokens": 12400,
  "total_cost_usd": 0.047,
  "tool_calls": 3,
  "duration_ms": 8500,
  "outcome": "completed"
}
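One way to produce that roll-up is to sum the per-call records that share a task. A sketch under stated assumptions: `ModelCall` and `roll_up_task` are hypothetical names, and the two example calls are illustrative values priced at the gpt-4o rates from the pricing table, not measured data:

```python
from dataclasses import dataclass


@dataclass
class ModelCall:
    input_tokens: int
    output_tokens: int
    cost_usd: float  # unrounded per-call cost


def roll_up_task(task_id: str, calls: list, tool_calls: int,
                 duration_ms: int, outcome: str) -> dict:
    """Aggregate every model call an agent made while handling one user request."""
    return {
        "task_id": task_id,
        "total_model_calls": len(calls),
        "total_tokens": sum(c.input_tokens + c.output_tokens for c in calls),
        # Sum unrounded per-call costs, then round once, so rounding
        # error does not accumulate across calls.
        "total_cost_usd": round(sum(c.cost_usd for c in calls), 4),
        "tool_calls": tool_calls,
        "duration_ms": duration_ms,
        "outcome": outcome,
    }


# Illustrative calls: 1000 in / 100 out and 2000 in / 200 out at gpt-4o rates.
calls = [ModelCall(1000, 100, 0.0035), ModelCall(2000, 200, 0.007)]
task = roll_up_task("task_xyz789", calls, tool_calls=1,
                    duration_ms=3200, outcome="completed")
print(task)
```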

Per-customer cost

Aggregate per customer, by day and by month:

{
  "customer_id": "cust_456",
  "period": "2026-06",
  "total_requests": 2840,
  "total_cost_usd": 42.30,
  "cost_per_request_usd": 0.0149,
  "revenue_usd": 29.00,
  "margin_usd": -13.30
}

That last field, a negative margin, is the one that should trigger an alert: this customer costs $13.30 more to serve than they pay.
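The per-customer row is a group-by over per-request events, joined with revenue for the period. A minimal sketch, assuming each event carries hypothetical `customer_id` and `total_cost_usd` keys and that revenue comes from your billing system as a plain dict:

```python
from collections import defaultdict


def aggregate_customers(events, revenue_by_customer: dict, period: str) -> list:
    """Roll per-request cost events up to one row per customer for a period."""
    cost = defaultdict(float)
    count = defaultdict(int)
    for e in events:
        cost[e["customer_id"]] += e["total_cost_usd"]
        count[e["customer_id"]] += 1

    rows = []
    for customer_id, total in cost.items():
        revenue = revenue_by_customer.get(customer_id, 0.0)
        rows.append({
            "customer_id": customer_id,
            "period": period,
            "total_requests": count[customer_id],
            "total_cost_usd": round(total, 2),
            "cost_per_request_usd": round(total / count[customer_id], 4),
            "revenue_usd": revenue,
            "margin_usd": round(revenue - total, 2),
        })
    return rows


def negative_margin(rows: list) -> list:
    # The condition worth alerting on: the customer costs more than they pay.
    return [r["customer_id"] for r in rows if r["margin_usd"] < 0]
```

Feeding it 2,840 events for `cust_456` that sum to $42.30, with $29.00 of revenue, yields the row above: $0.0149 per request and a margin of -$13.30, which `negative_margin` flags.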


Section 2: Architecture

Middleware layer

Insert cost tracking between your application and model providers:

App → Cost Middleware → Model Provider
         ↓
    Cost Event Store
         ↓
    Aggregation + Dashboard

The middleware:

  • intercepts every model call,
  • calculates cost from token counts and model pricing,
  • attaches cost metadata to the request context,
  • emits events to your cost store.
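The four middleware steps can be sketched as a wrapper around whatever function calls your model provider. This is an illustration, not a specific SDK's API: it assumes a hypothetical `call_model(model=..., **kwargs)` that returns a dict with a `usage` entry holding `input_tokens` and `output_tokens`, and a hard-coded `PRICING` dict standing in for your pricing table:

```python
import time
from typing import Callable

# Hypothetical pricing table: USD per 1K tokens, keyed by model name.
PRICING = {"gpt-4o": {"input_per_1k": 0.0025, "output_per_1k": 0.01}}


def with_cost_tracking(call_model: Callable[..., dict],
                       emit: Callable[[dict], None]) -> Callable[..., dict]:
    """Wrap a model-calling function so every call is priced and emitted."""
    def wrapped(*, model: str, context: dict, **kwargs) -> dict:
        # 1. Intercept the call and time it.
        start = time.perf_counter()
        response = call_model(model=model, **kwargs)
        latency_ms = int((time.perf_counter() - start) * 1000)

        # 2. Calculate cost from token counts and model pricing.
        usage = response["usage"]
        price = PRICING[model]  # unknown model raises KeyError, not a silent $0
        cost = (usage["input_tokens"] / 1000 * price["input_per_1k"]
                + usage["output_tokens"] / 1000 * price["output_per_1k"])

        # 3. Attach a running cost total to the request context.
        context["cost_usd"] = context.get("cost_usd", 0.0) + cost

        # 4. Emit an event to the cost store.
        emit({
            "request_id": context.get("request_id"),
            "use_case": context.get("use_case"),
            "model": model,
            "input_tokens": usage["input_tokens"],
            "output_tokens": usage["output_tokens"],
            "total_cost_usd": cost,
            "latency_ms": latency_ms,
        })
        return response

    return wrapped


def fake_model(model, prompt):
    # Stand-in for a real provider call.
    return {"text": "ok", "usage": {"input_tokens": 1842, "output_tokens": 356}}


events = []
ctx = {"request_id": "req_abc123", "use_case": "support_bot"}
tracked = with_cost_tracking(fake_model, events.append)
tracked(model="gpt-4o", context=ctx, prompt="Where is my order?")
```

Because the wrapper only sees the function boundary, the same pattern applies whether `call_model` wraps an HTTP client or a vendor SDK, and `emit` can be swapped for any store.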

Cost event store

Options by scale:

  • Structured logs (CloudWatch, Datadog): good for < 100K requests/day,
  • Time-series DB (TimescaleDB, InfluxDB): good for aggregation and alerting,
  • Dedicated LLM observability platforms (Helicone, LangSmith): good for LLM-specific features.
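At the structured-log tier, emitting to the cost store can be as simple as writing one JSON object per line and letting your log agent ship it to CloudWatch, Datadog, or similar. A standard-library sketch; the logger name `cost_events` and the choice of stdout are assumptions:

```python
import json
import logging
import sys

# Dedicated logger that writes bare JSON lines (no level or timestamp prefix),
# so a log shipper can parse each line directly.
logger = logging.getLogger("cost_events")
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # avoid duplicate output through the root logger


def emit_cost_event(event: dict) -> str:
    """Write one cost event as a single compact JSON line and return it."""
    line = json.dumps(event, separators=(",", ":"), sort_keys=True)
    logger.info(line)
    return line
```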

Pricing table

Maintain a versioned pricing table:

{
  "gpt-4o": [
    {
      "input_per_1k": 0.0025,
      "output_per_1k": 0.01,
      "effective_date": "2026-01-01"
    }
  ]
}

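When a provider changes pricing, add a new entry with its own effective_date instead of overwriting the old one. Historical costs must be computed with the price in effect at the time of the request.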
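Looking up "the price in effect at the time of the request" means finding the latest entry whose effective date is on or before the request date. A sketch using binary search; the 2026-01-01 gpt-4o row comes from the table above, while the 2026-07-01 row is a made-up price change included only to show the lookup:

```python
from bisect import bisect_right
from datetime import date

# Versions per model, sorted by effective_date. The second gpt-4o row is
# hypothetical and exists only to demonstrate a price change.
PRICE_HISTORY = {
    "gpt-4o": [
        (date(2026, 1, 1), {"input_per_1k": 0.0025, "output_per_1k": 0.01}),
        (date(2026, 7, 1), {"input_per_1k": 0.0020, "output_per_1k": 0.008}),
    ],
}


def price_at(model: str, when: date) -> dict:
    """Return the price for `model` with the latest effective_date <= `when`."""
    versions = PRICE_HISTORY[model]
    dates = [effective for effective, _ in versions]
    i = bisect_right(dates, when)  # number of versions already in effect
    if i == 0:
        raise LookupError(f"no price for {model} before {versions[0][0]}")
    return versions[i - 1][1]
```

Recomputing an old request's cost should go through a lookup like `price_at(model, request_date)`, never through the current price.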


Section 3: Dashboards and Alerts

Essential dashboards

  1. Daily spend trend (total + by model + by use case),
  2. Cost per request distribution (p50, p95, p99),
  3. Top 10 most expensive customers/tasks,
  4. Cache hit rate vs cost savings,
  5. Model routing effectiveness (cost vs quality by model tier).
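The distribution and top-N dashboards reduce to two small computations over stored cost events. A sketch using the nearest-rank percentile definition (other percentile definitions interpolate and give slightly different values); the function names are illustrative:

```python
import heapq
import math


def percentile(values, p: int):
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    # Multiply before dividing to avoid float error in the rank.
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def cost_distribution(costs) -> dict:
    """p50/p95/p99 of cost per request."""
    return {f"p{p}": percentile(costs, p) for p in (50, 95, 99)}


def top_n(cost_by_key: dict, n: int = 10) -> list:
    """Most expensive customers or tasks as (key, cost) pairs, highest first."""
    return heapq.nlargest(n, cost_by_key.items(), key=lambda kv: kv[1])
```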

Essential alerts

Alert | Threshold | Action
--- | --- | ---
Daily spend exceeds budget | > 100% of daily budget | Page on-call
Cost per request spike | > 2x 7-day average | Investigate model/routing change
Customer margin negative | Cost > revenue for tier | Review usage patterns
Agent task cost spike | > $0.50 per task | Check for infinite loops
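Each alert row translates into a plain rule check. A sketch, assuming the inputs are computed elsewhere from your cost store; the `Snapshot` shape and alert strings are illustrative, and the $0.50 task limit is the example threshold from the alert table:

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Snapshot:
    daily_spend_usd: float
    daily_budget_usd: float
    cost_per_request_usd: float
    last_7_days_cost_per_request_usd: list[float]
    customer_margins_usd: dict[str, float]
    task_costs_usd: dict[str, float]


def evaluate_alerts(s: Snapshot, task_limit_usd: float = 0.50) -> list[str]:
    alerts = []
    # Daily spend above 100% of budget: page on-call.
    if s.daily_spend_usd > s.daily_budget_usd:
        alerts.append("page: daily spend exceeds budget")
    # Cost per request above 2x the 7-day average: investigate routing/model.
    if s.cost_per_request_usd > 2 * mean(s.last_7_days_cost_per_request_usd):
        alerts.append("investigate: cost per request > 2x 7-day average")
    # Negative margin: review that customer's usage.
    for customer_id, margin in s.customer_margins_usd.items():
        if margin < 0:
            alerts.append(f"review: negative margin for {customer_id}")
    # Expensive agent task: likely a loop.
    for task_id, cost in s.task_costs_usd.items():
        if cost > task_limit_usd:
            alerts.append(f"check loop: task {task_id} cost ${cost:.2f}")
    return alerts
```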

Section 4: Integrating with Existing Observability

Cost observability should live alongside your APM, not in a separate silo:

  • Attach cost_usd as a span attribute in distributed traces,
  • Include cost in your existing request dashboards,
  • Correlate cost spikes with deployment events and model changes,
  • Export cost metrics to the same alerting system (PagerDuty, Opsgenie).

When you debug a latency spike, you should also see whether cost spiked, because the two often share a root cause: a model change, context-window bloat, or an agent loop.
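Correlating cost spikes with deployments and model changes can start very simply: flag points well above a trailing average, then list the changes that landed shortly before each one. A sketch; the 2x factor mirrors the alert threshold above, while the 24-point lookback and 6-hour window are assumptions to tune:

```python
from datetime import datetime, timedelta
from statistics import mean


def find_spikes(series, factor: float = 2.0, lookback: int = 24) -> list:
    """Timestamps whose value exceeds `factor` x the mean of the previous `lookback` points.

    `series` is a time-ordered list of (timestamp, cost_per_request_usd) pairs.
    """
    spikes = []
    for i in range(lookback, len(series)):
        baseline = mean(v for _, v in series[i - lookback:i])
        ts, value = series[i]
        if value > factor * baseline:
            spikes.append(ts)
    return spikes


def changes_before(spike_at: datetime, changes, window=timedelta(hours=6)) -> list:
    """Descriptions of deploys or model changes that landed in `window` before a spike."""
    return [desc for at, desc in sorted(changes) if spike_at - window <= at <= spike_at]
```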


Section 5: Implementation Priority

Week 1:

  • Log token counts and estimated cost on every model call,
  • Create a daily spend spreadsheet from logs.

Week 2:

  • Build per-request cost middleware,
  • Set daily budget alert.

Week 3:

  • Add per-customer aggregation,
  • Build cost per request dashboard.

Week 4:

  • Integrate with distributed tracing,
  • Add cost to CI/CD eval reports.
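For the Week 4 item, a cost line in an eval report can be a comparison of mean cost per eval case between the baseline and the candidate run. A sketch; `cost_regression` is a hypothetical helper, the 20% limit is an assumed budget, and the example costs are illustrative:

```python
def cost_regression(baseline_costs: list[float], candidate_costs: list[float],
                    max_increase: float = 0.20) -> tuple[bool, float]:
    """Return (passed, relative_change) for mean cost per eval case."""
    base = sum(baseline_costs) / len(baseline_costs)
    cand = sum(candidate_costs) / len(candidate_costs)
    if base == 0:
        raise ValueError("baseline mean cost is zero")
    change = (cand - base) / base
    # Pass only if the candidate is at most `max_increase` more expensive.
    return change <= max_increase, change


# Illustrative per-case costs from two eval runs.
passed, change = cost_regression([0.0082, 0.0079, 0.0085], [0.0160, 0.0171, 0.0158])
print(f"mean cost per eval case: {change:+.1%} vs baseline -> {'PASS' if passed else 'FAIL'}")
```

Reporting cost next to quality scores makes the trade-off visible in review, which is exactly the "doubled cost, no quality improvement" case from the introduction.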

Conclusion

Cost observability is not a FinOps luxury—it is production infrastructure. You cannot optimize what you cannot measure, and you cannot measure AI costs with cloud billing alone.

Build the middleware layer this week. The first time it catches a runaway agent loop, it pays for itself.
