Apply Process Builders

AI Audit Trails at Scale

7 min Grayson Campbell 15 Feb 2026
In this guide
  • What an AI audit trail must capture beyond traditional logging
  • Schema design for high-volume, queryable audit records
  • Strategies for balancing write throughput with query performance
  • How audit trails support debugging, compliance, and continuous improvement

When an AI workflow makes a decision - classifying an email, routing a lead, updating a CRM record - someone will eventually ask: "Why did it do that?" Your audit trail needs to answer that question completely, quickly, and at any scale.

This guide covers how to design audit systems that capture the full context of AI decisions, handle high-volume write loads, and remain queryable for debugging and compliance.

What AI Audit Trails Must Capture

Traditional application logs capture what happened: "Record X was updated at time T by user U." AI audit trails must also capture why it happened and what the AI considered when making the decision.

Every AI workflow execution should produce an audit record that includes:

| Field | Purpose | Example |
| --- | --- | --- |
| Run ID | Unique execution identifier | run_a1b2c3d4 |
| Workflow ID | Which workflow was executed | wf_lead_triage |
| Trigger | What started the execution | webhook: new_email from [email protected] |
| Node trace | Ordered list of nodes executed | [source, classify, route, update-crm] |
| AI inputs | Full prompt sent to each AI node | Complete prompt with all context |
| AI outputs | Raw model response | Full JSON response |
| Decisions | Conditional evaluations and results | intent=purchase, confidence=0.91 → fast-track |
| Actions taken | External side effects | Created Salesforce Lead: 00Q123 |
| Duration | Per-node and total execution time | classify: 1.2s, total: 4.8s |
| Errors | Any failures, retries, or fallbacks | Enrichment API timeout, retried 2x |

The AI inputs and outputs are the critical addition. Without them, you can see that a lead was routed to the fast-track queue, but you cannot explain why the AI classified it as a hot lead. With them, you can reconstruct the entire decision chain.

{
  "runId": "run_a1b2c3d4",
  "workflowId": "wf_lead_triage",
  "workspaceId": "workspace_abc",
  "trigger": {
    "type": "webhook",
    "source": "gmail",
    "metadata": { "from": "[email protected]", "subject": "Pricing for enterprise plan" }
  },
  "nodes": [
    {
      "nodeId": "classify",
      "type": "ai",
      "startedAt": "2026-02-15T10:30:01Z",
      "completedAt": "2026-02-15T10:30:02.2Z",
      "input": {
        "prompt": "Classify this email...",
        "context_tokens": 450
      },
      "output": {
        "intent": "purchase_inquiry",
        "confidence": 0.91,
        "reasoning": "Explicitly asks for enterprise pricing, mentions team size"
      },
      "model": "claude-sonnet-4-20250514",
      "tokens_used": { "input": 450, "output": 85 }
    },
    {
      "nodeId": "route",
      "type": "conditional",
      "evaluation": "intent == 'purchase_inquiry' AND confidence >= 0.9",
      "result": "fast-track",
      "skippedBranches": ["standard-nurture", "support-queue"]
    }
  ],
  "totalDuration": 4800,
  "status": "completed"
}
Key Takeaway

AI audit trails capture the why, not just the what. Every AI decision must be traceable back to the exact prompt, context, and model output that produced it. Without this, you cannot debug incorrect classifications, satisfy compliance inquiries, or improve your workflows over time.

Schema Design for Scale

Audit records are write-heavy and query-occasional. Optimise the schema for fast writes with targeted query support.

Collection Structure

Separate the high-level run record from the detailed node traces:

// workflow_runs collection - one document per run
{
  _id: "run_a1b2c3d4",
  workflowId: "wf_lead_triage",
  status: "completed",
  startedAt: ISODate("2026-02-15T10:30:00Z"),
  completedAt: ISODate("2026-02-15T10:30:04.8Z"),
  trigger: { type: "webhook", source: "gmail" },
  summary: {
    nodesExecuted: 4,
    aiCallsMade: 1,
    actionsPerformed: ["salesforce.create_lead"],
    totalTokens: 535
  }
}

// workflow_node_runs collection - one document per node execution
{
  _id: "noderun_xyz789",
  runId: "run_a1b2c3d4",
  nodeId: "classify",
  type: "ai",
  startedAt: ISODate("2026-02-15T10:30:01Z"),
  completedAt: ISODate("2026-02-15T10:30:02.2Z"),
  input: { /* full prompt context */ },
  output: { /* full model response */ },
  metadata: { model: "claude-sonnet-4-20250514", tokens: { input: 450, output: 85 } }
}

This separation means querying run summaries (for dashboards and monitoring) doesn't require loading the full AI prompt and response data. When you need the detail for debugging, you query the node runs collection by run ID.
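The split can be sketched as two read paths. Function names here are illustrative, and `wsDb` is assumed to be a connected MongoDB Db handle for the workspace:

```javascript
// Dashboard/monitoring path: run summaries only, so the large prompt and
// response payloads are never loaded.
function recentRuns(wsDb, workflowId, limit = 50) {
  return wsDb.collection('workflow_runs')
    .find(
      { workflowId },
      { projection: { status: 1, startedAt: 1, completedAt: 1, summary: 1 } }
    )
    .sort({ startedAt: -1 })
    .limit(limit)
    .toArray();
}

// Debugging path: full node traces, fetched only for the run under inspection.
function runDetail(wsDb, runId) {
  return wsDb.collection('workflow_node_runs')
    .find({ runId })
    .sort({ startedAt: 1 })
    .toArray();
}
```

The projection on the summary query is belt-and-braces: even if a field were added to the run document later, the dashboard path still reads a fixed, small set of fields.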

Indexing Strategy

Create indexes for the queries you will actually run:

// Run-level queries
db.workflow_runs.createIndex({ workflowId: 1, startedAt: -1 });  // Recent runs by workflow
db.workflow_runs.createIndex({ status: 1, startedAt: -1 });       // Failed runs
db.workflow_runs.createIndex({ "trigger.source": 1 });            // Runs by trigger source

// Node-level queries
db.workflow_node_runs.createIndex({ runId: 1 });                  // All nodes for a run
db.workflow_node_runs.createIndex({ type: 1, startedAt: -1 });    // AI nodes over time

Avoid indexing the full prompt or response text - those fields are large and rarely queried directly. If you need text search on AI outputs, use a separate search index or sampling-based analysis.
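A sampling-based pass can be sketched as a standard aggregation pipeline. The function name, the `since` cutoff, and the 200-document default are illustrative, not part of any fixed API:

```javascript
// Sketch: randomly sample recent AI node outputs for offline review,
// instead of maintaining a text index over the large prompt/response fields.
function aiOutputSamplePipeline(since, size = 200) {
  return [
    { $match: { type: 'ai', startedAt: { $gte: since } } }, // served by the { type, startedAt } index
    { $sample: { size } },                                  // random sample, no text index needed
    { $project: { runId: 1, nodeId: 1, output: 1, _id: 0 } } // drop the heavy input/prompt field
  ];
}

// Usage:
// const docs = await wsDb.collection('workflow_node_runs')
//   .aggregate(aiOutputSamplePipeline(thirtyDaysAgo)).toArray();
```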

Write Throughput Patterns

A busy system might generate thousands of audit records per minute. Two patterns keep writes fast:

Batched Writes

Instead of writing each node completion individually, buffer node results and write the complete run record in a single operation:

class AuditBuffer {
  constructor(runId) {
    this.runId = runId;
    this.nodes = [];
  }

  recordNode(nodeId, data) {
    this.nodes.push({ nodeId, ...data, timestamp: new Date() });
  }

  async flush(wsDb) {
    // Single write for the complete run
    await wsDb.collection('workflow_runs').updateOne(
      { _id: this.runId },
      { $set: { nodes: this.nodes, completedAt: new Date(), status: 'completed' } },
      { upsert: true }  // create the run document if the engine has not written one yet
    );
  }
}

Async Write-Behind

For highest throughput, decouple audit writes from the workflow execution path. Push audit events to a queue and write them asynchronously:

[Workflow Engine] → [Redis Stream] → [Audit Writer] → [MongoDB]

The workflow engine publishes audit events to a Redis stream and continues execution immediately. A separate audit writer process consumes the stream and writes to MongoDB in batches. This means a slow database write doesn't block workflow execution.
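The two sides of that pipeline can be sketched as below. This assumes a node-redis v4 client created elsewhere (via `createClient` from the 'redis' package); the stream name, group name, and function names are illustrative:

```javascript
const STREAM = 'audit:events';
const GROUP = 'audit-writers';

// Producer: the workflow engine appends one entry and returns immediately.
async function publishAuditEvent(redis, event) {
  // '*' lets Redis assign the entry ID; the event travels as one JSON field.
  await redis.xAdd(STREAM, '*', { payload: JSON.stringify(event) });
}

// Consumer: a separate audit-writer process drains the stream in batches.
async function drainOnce(redis, wsDb, consumerName) {
  const res = await redis.xReadGroup(GROUP, consumerName,
    { key: STREAM, id: '>' }, { COUNT: 100, BLOCK: 5000 });
  if (!res) return 0;
  let written = 0;
  for (const { messages } of res) {
    const events = messages.map((m) => JSON.parse(m.message.payload));
    await wsDb.collection('workflow_node_runs').insertMany(events);
    // Ack only after the MongoDB write succeeds: at-least-once delivery.
    await redis.xAck(STREAM, GROUP, messages.map((m) => m.id));
    written += events.length;
  }
  return written;
}
```

Because acknowledgement happens only after the database write, a crash between `insertMany` and `xAck` means the batch is redelivered, which is exactly why the writes need to be idempotent.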

Technical Deep Dive

When using async write-behind, ensure the audit stream has durability guarantees. Redis Streams with consumer groups provide at-least-once delivery. If the audit writer crashes mid-batch, the unacknowledged messages are redelivered when it restarts. Design audit records with idempotent writes (using the run ID and node ID as a composite key) so duplicate processing doesn't create duplicate records.
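An idempotent write under that composite key can be sketched as a bulk replace-with-upsert. The function name is illustrative; `replaceOne` with `upsert: true` is standard MongoDB bulkWrite syntax:

```javascript
// Sketch: idempotent audit write keyed on runId + nodeId. A deterministic
// _id means a redelivered stream message overwrites the same document
// instead of creating a duplicate.
function idempotentWriteOp(event) {
  const compositeId = `${event.runId}:${event.nodeId}`;
  return {
    replaceOne: {
      filter: { _id: compositeId },
      replacement: { ...event, _id: compositeId },
      upsert: true
    }
  };
}

// Consumer batch:
// await wsDb.collection('workflow_node_runs')
//   .bulkWrite(batch.map(idempotentWriteOp), { ordered: false });
```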

Retention and Archival

Not all audit data has the same shelf life. Implement a tiered retention policy:

| Data | Hot Storage | Warm Storage | Cold Storage |
| --- | --- | --- | --- |
| Run summaries | 90 days | 1 year | 7 years |
| Node traces | 30 days | 90 days | 1 year |
| AI prompts/responses | 30 days | 90 days | Per compliance |
| Error details | 90 days | 1 year | 7 years |

Hot storage is your primary database with full indexing. Warm storage is compressed archives with limited query capability. Cold storage is object storage for compliance retention.

Implement automatic archival:

// Monthly archival job
async function archiveOldRuns(wsDb, cutoffDate) {
  const oldRuns = await wsDb.collection('workflow_runs')
    .find({ completedAt: { $lt: cutoffDate } })
    .toArray();

  if (oldRuns.length === 0) return;  // nothing to archive this cycle

  // Write to archive storage
  await archiveStorage.putBatch(
    `audit/${wsDb.databaseName}/${cutoffDate.toISOString()}.json.gz`,
    compress(JSON.stringify(oldRuns))
  );

  // Remove only the runs that were archived, matching by ID so that runs
  // completing mid-job are never deleted without an archive copy
  await wsDb.collection('workflow_runs')
    .deleteMany({ _id: { $in: oldRuns.map((run) => run._id) } });
}

Using Audit Trails for Improvement

Audit trails are not just for compliance - they are your best tool for improving AI workflow performance:

Classification accuracy. Sample audit records where the AI classified with low confidence. Review the prompt and output to identify patterns where the AI struggles. Refine prompts accordingly.

Performance bottlenecks. Aggregate node duration data to find slow nodes. If the enrichment step consistently takes 3 seconds, consider caching or parallelising it.

Error patterns. Group errors by type and frequency. If 40% of failures are due to CRM API timeouts, the issue is infrastructure, not logic.

Default branch analysis. Query for runs where conditional nodes hit the default branch. Each one represents a classification gap in your prompt.

// Find workflow runs where the AI was uncertain
const uncertainRuns = await wsDb.collection('workflow_node_runs').find({
  type: 'ai',
  'output.confidence': { $lt: 0.7 },
  startedAt: { $gte: thirtyDaysAgo }
}).sort({ startedAt: -1 }).limit(100).toArray();
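The per-node duration data supports the bottleneck analysis the same way. A sketch of that aggregation, assuming the node-run schema above ($dateDiff requires MongoDB 5.0+; the function name is illustrative):

```javascript
// Sketch: average per-node latency over a window, slowest nodes first.
function slowNodePipeline(since) {
  return [
    { $match: { startedAt: { $gte: since }, completedAt: { $exists: true } } },
    { $group: {
        _id: '$nodeId',
        avgMs: { $avg: { $dateDiff: {
          startDate: '$startedAt', endDate: '$completedAt', unit: 'millisecond'
        } } },
        executions: { $sum: 1 }
    } },
    { $sort: { avgMs: -1 } }
  ];
}

// Usage:
// const slowNodes = await wsDb.collection('workflow_node_runs')
//   .aggregate(slowNodePipeline(thirtyDaysAgo)).toArray();
```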
Try it in Outrun

Outrun's comprehensive audit trails capture every AI decision, routing event, and action across all workflow runs. The audit system stores full prompt context and model outputs, making it possible to trace any automated decision back to its inputs. Combined with multi-tenant isolation, each workspace's audit data is completely segregated.

Compliance Considerations

Different regulatory frameworks have specific audit requirements:

  • SOC 2: Requires evidence of access controls, data processing records, and change management logs. AI audit trails satisfy the processing records requirement.
  • GDPR: Right to explanation means you must be able to explain automated decisions that affect individuals. The AI input/output capture enables this.
  • Industry-specific regulations: Financial services, healthcare, and government contexts may require longer retention periods and additional metadata.

Design your audit schema to be extensible. Adding new metadata fields should not require a schema migration or backfill.

Wrapping Up the Process Builders Track

This article completes the Apply tier for process builders. You now have the patterns for building AI workflows (from basic pipelines to complex DAGs), implementing conditional logic and agent-based triage, ensuring reliability at scale, automating code changes safely, integrating data across tools, isolating tenant data, and auditing every decision.

The next step is to put these patterns into practice. Start with a single workflow - the one causing the most operational pain - and build from there.

Want the business perspective?
AI Governance and Audit Trails