Transactional AI Agents: Patterns for Database Consistency and Safe Rollbacks
Introduction
Read-only AI agents are relatively safe. They query data, summarize results, and return answers. The risks are low because they don't modify state.
Write-capable AI agents are a different story.
When an AI agent creates a user account, updates a payment status, or provisions infrastructure, it's modifying state in your database. If the agent makes a mistake, gets interrupted mid-operation, or encounters a partial failure, you're left with inconsistent data.
Traditional software handles this with transactions, ACID guarantees, and compensation logic. AI agents need the same—but adapted for their probabilistic, multi-step nature.
Section 1: The Consistency Problem with Agent Writes
AI agents fail differently than traditional software, and these failure modes create unique consistency challenges:
Partial completion
An agent tasked with "create user, assign role, send welcome email" might succeed at creating the user, fail at assigning the role, and never reach the email step. The user exists in an inconsistent state—created but not properly configured.
Cascading writes
Agents often perform multiple database operations in sequence. If operation 3 fails, operations 1 and 2 have already committed. Without rollback logic, the database is left partially updated.
Hallucinated operations
An agent might "think" it already created a record and skip the creation step, or "think" a record doesn't exist and create a duplicate. The agent's internal state doesn't always match the actual database state.
Retry storms
If an agent encounters an error and retries, it might double-execute write operations that aren't idempotent. Two duplicate records, two charges, two emails.
Section 2: Pattern 1 — Database Transactions for Atomic Operations
The first line of defense: wrap related database operations in a transaction. If any operation fails, the entire transaction rolls back.
async function agentCreateUserWithRole(
  db: Database,
  userData: UserData,
  role: string,
  runId: string
) {
  const result = await db.transaction(async (tx) => {
    // All operations inside this transaction succeed or fail together
    const user = await tx.users.create(userData);
    await tx.userRoles.create({ userId: user.id, role });
    await tx.auditLog.create({
      action: "user_created",
      userId: user.id,
      agentRunId: runId
    });
    return user;
  });

  // Only reached if the transaction commits successfully
  await sendWelcomeEmail(result);
  return result;
}
Limitations of transactions
Transactions only cover database operations. If your agent also calls external APIs (send email, call webhook, provision cloud resource), those aren't rolled back by a database transaction. For those, you need compensation logic (see Section 4).
Also, long-running agent tasks may hold transactions open too long, causing lock contention. Keep transactions short—prefer executing the transaction at the end of a planning phase rather than holding it open during agent reasoning.
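The plan-then-commit split can be sketched as follows. This is a minimal illustration, not the article's API: `WriteOp`, `planUserSetup`, and the in-memory `applyPlan` are hypothetical stand-ins, and in a real system the apply phase would run inside a short `db.transaction`.

```typescript
// Sketch: keep agent reasoning outside the transaction, then apply the
// accumulated plan in one short atomic step. An in-memory Map stands in
// for the database; applyPlan restores a snapshot on failure, the way a
// real transaction would roll back.

type WriteOp = { table: string; row: Record<string, unknown> };

// Phase 1: the agent reasons and accumulates a plan (no locks held)
function planUserSetup(user: { id: string; email: string }): WriteOp[] {
  return [
    { table: "users", row: user },
    { table: "user_roles", row: { userId: user.id, role: "member" } },
  ];
}

// Phase 2: apply the whole plan at once; only this loop needs a transaction
function applyPlan(
  store: Map<string, Record<string, unknown>[]>,
  plan: WriteOp[]
): void {
  const snapshot = new Map<string, Record<string, unknown>[]>();
  for (const [table, rows] of store) snapshot.set(table, [...rows]);
  try {
    for (const op of plan) {
      const rows = store.get(op.table) ?? [];
      rows.push(op.row);
      store.set(op.table, rows);
    }
  } catch (err) {
    // Restore the snapshot: all-or-nothing, like a rolled-back transaction
    store.clear();
    for (const [table, rows] of snapshot) store.set(table, rows);
    throw err;
  }
}
```

The key property is that the transaction's lifetime covers only the fast apply loop, never the slow model calls.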
Section 3: Pattern 2 — Idempotency Keys for Safe Retries
Agents retry operations when they encounter errors. Without idempotency, retries cause duplicate writes.
Every agent-driven write should include an idempotency key that uniquely identifies the operation:
async function agentCreatePayment(db: Database, amount: number, runId: string) {
  const idempotencyKey = `agent-run-${runId}-payment`;

  // Check if this operation was already completed
  const existing = await db.payments.findUnique({ idempotencyKey });
  if (existing) {
    return existing; // Already done, return existing record
  }

  // Execute the operation
  return await db.payments.create({
    amount,
    idempotencyKey,
    createdBy: "agent",
    agentRunId: runId
  });
}
The idempotency key should be deterministic: given the same task and run ID, the same key is generated. This ensures that retries don't create duplicates.
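Note that a read-then-create check has a window where two concurrent retries can both see "not found". A common hardening step is to back the key with a unique database constraint and treat a duplicate-key result as "already done". The sketch below models that with a hypothetical in-memory `PaymentTable`, whose Map gives the same first-writer-wins behavior a unique index provides.

```typescript
// Sketch: the unique index is the real guard; check-then-create alone can
// race under concurrent retries. `PaymentTable` is an illustrative stand-in.

interface Payment {
  idempotencyKey: string;
  amount: number;
}

class PaymentTable {
  private byKey = new Map<string, Payment>();

  // Insert, or return the existing row if the key was already used.
  // Against a real database this would be an INSERT that catches the
  // unique-violation error and re-reads by idempotencyKey.
  createIdempotent(payment: Payment): Payment {
    const existing = this.byKey.get(payment.idempotencyKey);
    if (existing) return existing;
    this.byKey.set(payment.idempotencyKey, payment);
    return payment;
  }
}
```

With this in place, a retry that re-sends the same key gets the original record back instead of creating a second charge.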
Section 4: Pattern 3 — Compensation Transactions (Sagas)
For multi-step operations that span databases and external services, use the Saga pattern. Each step has a corresponding compensation action that undoes it.
Step 1: Create user → Compensation: Delete user
Step 2: Assign role → Compensation: Remove role
Step 3: Send email → Compensation: Log rollback (emails can't be unsent)
type CompletedStep =
  | { type: "create_user"; data: { id: string } }
  | { type: "assign_role"; data: { userId: string } }
  | { type: "send_email"; data: { userId: string } };

async function createUserSaga(db: Database, userData: UserData) {
  const completedSteps: CompletedStep[] = [];
  try {
    // Step 1
    const user = await db.users.create(userData);
    completedSteps.push({ type: "create_user", data: user });

    // Step 2
    await db.userRoles.create({ userId: user.id, role: "member" });
    completedSteps.push({ type: "assign_role", data: { userId: user.id } });

    // Step 3 (external)
    await sendWelcomeEmail(user);
    completedSteps.push({ type: "send_email", data: { userId: user.id } });

    return user;
  } catch (error) {
    // Compensation: undo completed steps in reverse order
    await compensate(db, completedSteps.reverse());
    throw error;
  }
}

async function compensate(db: Database, steps: CompletedStep[]) {
  for (const step of steps) {
    try {
      switch (step.type) {
        case "create_user":
          await db.users.delete({ id: step.data.id });
          break;
        case "assign_role":
          await db.userRoles.delete({ userId: step.data.userId });
          break;
        case "send_email":
          // Can't unsend an email, but log the rollback
          await db.auditLog.create({
            action: "email_rollback_logged",
            userId: step.data.userId
          });
          break;
      }
    } catch (compError) {
      // Log but continue compensating remaining steps
      console.error("Compensation failed:", compError);
    }
  }
}
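One gap worth noting: an in-process `completedSteps` array disappears if the agent crashes mid-saga, leaving nothing to compensate from. A common hardening step is to persist each completed step durably. The sketch below uses an in-memory array as a hypothetical stand-in for a `saga_log` table that a recovery job could read after a crash.

```typescript
// Sketch: a durable saga log. Each completed step is recorded before the
// next one runs; a recovery job can later ask for the steps of an
// abandoned run, newest first, and feed them to the compensation logic.

interface SagaLogEntry {
  runId: string;
  step: string;
  status: "done" | "compensated";
}

class SagaLog {
  private entries: SagaLogEntry[] = [];

  record(runId: string, step: string): void {
    this.entries.push({ runId, step, status: "done" });
  }

  // Steps still marked "done" for a run, in reverse completion order,
  // ready for compensation.
  pendingFor(runId: string): string[] {
    return this.entries
      .filter((e) => e.runId === runId && e.status === "done")
      .map((e) => e.step)
      .reverse();
  }

  markCompensated(runId: string, step: string): void {
    const entry = this.entries.find(
      (e) => e.runId === runId && e.step === step
    );
    if (entry) entry.status = "compensated";
  }
}
```

In a real system the `record` call would share a transaction with the step it describes, so the log never claims a step that didn't commit.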
Section 5: Pattern 4 — Optimistic Locking for Concurrent Agents
If multiple agents (or an agent and a human user) might modify the same record concurrently, you need optimistic locking to prevent lost updates.
Add a version column to records that agents modify:
// Schema
{
  id: "uuid",
  data: "jsonb",
  version: "integer", // Incremented on each update
}
When the agent reads a record, it captures the version. When it writes, it includes a version check:
async function agentUpdateRecord(db: Database, id: string, newData: any) {
  // Read with current version
  const record = await db.records.findUnique({ id });

  // Agent does its reasoning...

  // Write with version check
  const result = await db.records.update({
    where: { id, version: record.version },
    data: { ...newData, version: record.version + 1 }
  });

  if (!result) {
    // Version mismatch: someone else updated the record
    throw new Error("Concurrent modification detected. Re-read and retry.");
  }
  return result;
}
If the version check fails, the agent should re-read the record, re-evaluate its plan, and retry the update.
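That re-read-and-retry loop can be sketched as follows. `VersionedStore` and `updateWithRetry` are illustrative stand-ins rather than part of the article's `Database` API: the store models a table with a `version` column, and the loop recomputes the new value from freshly read state on every attempt.

```typescript
// Sketch: optimistic-lock retry loop over an in-memory versioned table.

interface Rec {
  id: string;
  value: number;
  version: number;
}

class VersionedStore {
  private rows = new Map<string, Rec>();

  put(r: Rec): void {
    this.rows.set(r.id, r);
  }

  get(id: string): Rec | undefined {
    return this.rows.get(id);
  }

  // Compare-and-swap: fails when the stored version no longer matches
  updateIfVersion(id: string, expectedVersion: number, value: number): boolean {
    const r = this.rows.get(id);
    if (!r || r.version !== expectedVersion) return false;
    this.rows.set(id, { ...r, value, version: r.version + 1 });
    return true;
  }
}

function updateWithRetry(
  store: VersionedStore,
  id: string,
  compute: (current: number) => number,
  maxAttempts = 3
): boolean {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const rec = store.get(id);
    if (!rec) return false;
    // Recompute the new value from freshly read state each attempt
    if (store.updateIfVersion(id, rec.version, compute(rec.value))) return true;
  }
  return false;
}
```

Bounding the attempts matters for agents: an unbounded retry loop against a hotly contested record is its own incident.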
Section 6: Pattern 5 — Dry-Run Mode for High-Stakes Operations
For operations with significant impact (large data changes, financial transactions, infrastructure changes), implement a dry-run mode where the agent plans the changes but doesn't execute them.
async function agentBulkUpdateUsers(
  db: Database,
  filter: UserFilter,
  updates: Partial<User>,
  dryRun: boolean = true
) {
  // First, show what would be affected
  const affectedUsers = await db.users.findMany({ where: filter });
  const plan = {
    operation: "bulk_update",
    filter,
    updates,
    affectedCount: affectedUsers.length,
    sampleUsers: affectedUsers.slice(0, 5),
  };

  if (dryRun) {
    return { plan, executed: false, message: "Dry run complete. Set dryRun=false to execute." };
  }

  // Execute with a transaction
  const result = await db.transaction(async (tx) => {
    return await tx.users.updateMany({
      where: filter,
      data: updates
    });
  });

  return { plan, executed: true, result };
}
This gives humans visibility into what the agent intends to do before it does it.
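The resulting two-phase call flow looks like this in miniature. `bulkDeactivate` is a hypothetical in-memory stand-in for `agentBulkUpdateUsers`: the first call only produces a reviewable plan, and execution is a separate, explicit call made after approval.

```typescript
// Sketch: dry run produces a plan and touches nothing; execution is a
// second, deliberate call. `bulkDeactivate` is an illustrative stand-in.

interface User {
  id: string;
  active: boolean;
}

function bulkDeactivate(users: User[], dryRun: boolean) {
  const affected = users.filter((u) => u.active);
  const plan = { operation: "bulk_deactivate", affectedCount: affected.length };
  if (dryRun) {
    return { plan, executed: false }; // preview only, no writes
  }
  for (const u of affected) u.active = false; // the real mutation
  return { plan, executed: true };
}
```

The agent defaults to the dry-run path; flipping to execution is a human or policy decision, never something the model toggles on its own.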
Section 7: Audit Trails for Agent-Driven Changes
When agents modify data, you need an audit trail that captures:
- What was changed (before/after values),
- Why (the task the agent was executing),
- Which agent run made the change,
- When the change occurred.
async function agentUpdateWithAudit(
  db: Database,
  recordId: string,
  updates: any,
  runId: string
) {
  return await db.transaction(async (tx) => {
    const before = await tx.records.findUnique({ id: recordId });
    const updated = await tx.records.update({
      where: { id: recordId },
      data: updates
    });
    await tx.auditLog.create({
      entityType: "record",
      entityId: recordId,
      action: "agent_update",
      before: before,
      after: updated,
      agentRunId: runId,
      timestamp: new Date(),
    });
    return updated;
  });
}
This audit trail is essential for debugging agent mistakes and for compliance in regulated industries.
Conclusion
AI agents that write to databases need the same consistency patterns that reliable distributed systems use: transactions, idempotency, compensation logic, optimistic locking, and audit trails.
The difference is that agents add probabilistic failures to the mix. A transaction might fail because the agent's plan was wrong, not because of a database error. Compensation logic might need to handle cases where the agent's understanding of state doesn't match reality.
Build these patterns into your agent framework from the start. Agents that modify data without them will cause incidents that are difficult to diagnose and expensive to clean up.
Related Service: Backend System Scaling
Need help designing transactional AI agent systems with strong consistency guarantees?