Human-in-the-Loop Review

Human-in-the-loop (HITL) review lets you verify AI agent responses before they reach customers. When enabled, the agent's draft response is queued for human approval instead of being sent immediately.

Review Modes

Each agent has a configurable HITL mode:

  • review — all responses go to the review queue. Best for new agents and high-stakes channels.
  • confidence — only low-confidence responses are queued. Best for established agents with a correction history.
  • autonomous — the agent responds directly, with no review. Best for mature agents with proven accuracy.

Set the mode in the agent's settings under Human-in-the-Loop.
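The three modes reduce to a simple dispatch on the agent's configuration. A minimal sketch, assuming hypothetical names (`HitlMode`, `should_queue`) and an illustrative 0.7 confidence threshold — not Outrun's actual API:

```python
from enum import Enum

class HitlMode(Enum):
    REVIEW = "review"          # every draft goes to the review queue
    CONFIDENCE = "confidence"  # only low-confidence drafts are queued
    AUTONOMOUS = "autonomous"  # drafts are sent directly

# Illustrative cutoff; Outrun's actual threshold is not documented here.
CONFIDENCE_THRESHOLD = 0.7

def should_queue(mode: HitlMode, confidence: float) -> bool:
    """Decide whether a draft response needs human review."""
    if mode is HitlMode.REVIEW:
        return True
    if mode is HitlMode.CONFIDENCE:
        return confidence < CONFIDENCE_THRESHOLD
    return False  # AUTONOMOUS: send without review
```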

Blocking vs. Non-Blocking

  • Blocking — the agent waits for human approval before sending the response. The visitor sees a "thinking" indicator until the reviewer acts. Best for high-stakes conversations where accuracy is critical.
  • Non-blocking — the agent responds immediately. If a human later edits the response, the correction is stored for future reference but the original response has already been sent. Best for high-volume, lower-stakes interactions.
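The two delivery paths differ only in when the draft is sent relative to review. A sketch of both, assuming a thread-based reviewer (the function names and queue shape are illustrative):

```python
import queue
import threading

review_queue: "queue.Queue[dict]" = queue.Queue()

def deliver_blocking(draft: dict, timeout: float = 300.0) -> dict:
    """Queue the draft and wait for the reviewer before sending."""
    decision = {"event": threading.Event(), "draft": draft, "final": None}
    review_queue.put(decision)
    # Visitor sees a "thinking" indicator while we wait.
    decision["event"].wait(timeout)
    # Hypothetical fallback: send the original draft if the reviewer times out.
    return decision["final"] or draft

def deliver_non_blocking(draft: dict, send) -> None:
    """Send immediately; a later edit is stored as a correction, not re-sent."""
    send(draft)
    review_queue.put({"draft": draft, "already_sent": True})
```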

The Review Queue

Navigate to HITL Review in the sidebar to see pending agent responses.

Each review item shows:

  • Conversation context — the full message history so you understand what the visitor is asking
  • Agent draft — the response the agent wants to send
  • Agent name — which specialist agent generated the response
  • Confidence — the agent's self-assessed confidence in its response
  • Routing history — how the message was routed (triage classification, agent selection)
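A queue entry bundles all five of these fields. As a data-structure sketch (field names are illustrative, not Outrun's schema):

```python
from dataclasses import dataclass, field

@dataclass
class ReviewItem:
    """One pending entry in the HITL review queue."""
    conversation: list[dict]   # full message history for context
    agent_draft: str           # the response awaiting approval
    agent_name: str            # which specialist agent produced it
    confidence: float          # agent's self-assessed confidence, 0.0-1.0
    routing_history: list[str] = field(default_factory=list)  # triage/selection steps
```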

Actions

  • Approve — sends the agent's draft as-is.
  • Edit & Approve — sends your edited version and stores a correction.
  • Reject — discards the draft; the visitor receives no response. Use for spam or out-of-scope messages.

Keyboard Shortcuts

  • A — Approve
  • E — Focus the edit field
  • Enter (in edit mode) — Submit edited response
  • R — Reject

Correction Memory

Every time you edit an agent's response, a correction record is created:

```json
{
  "trigger_message": "What's your uptime guarantee?",
  "agent_draft": "We aim for high availability across all services.",
  "human_correction": "Outrun guarantees 99.9% uptime on all paid plans, backed by our SLA. Enterprise plans include a 99.95% uptime SLA with dedicated support.",
  "agent_id": "pre-sales-agent",
  "category": "product-info"
}
```

How Corrections Improve Responses

  1. A new message arrives that's similar to a previously corrected one
  2. The system retrieves the top 2-3 matching corrections by semantic similarity
  3. The corrections are injected into the agent's prompt as few-shot examples
  4. The agent uses these examples to generate a better response
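The retrieve-and-inject loop above can be sketched end to end. This uses bag-of-words cosine similarity as a stand-in for Outrun's semantic embeddings, and the function names are illustrative:

```python
import math
from collections import Counter

def _vec(text: str) -> Counter:
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_corrections(message: str, corrections: list[dict], k: int = 3) -> list[dict]:
    """Return the k stored corrections most similar to the incoming message."""
    msg_vec = _vec(message)
    scored = sorted(
        corrections,
        key=lambda c: _cosine(msg_vec, _vec(c["trigger_message"])),
        reverse=True,
    )
    return scored[:k]

def build_prompt(system_prompt: str, message: str, corrections: list[dict]) -> str:
    """Inject retrieved corrections as few-shot examples ahead of the new message."""
    examples = "\n\n".join(
        f"Visitor: {c['trigger_message']}\nApproved answer: {c['human_correction']}"
        for c in top_corrections(message, corrections)
    )
    return f"{system_prompt}\n\n{examples}\n\nVisitor: {message}"
```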

Over time, the correction rate decreases as the agent accumulates examples for common question patterns.

Managing Corrections

View stored corrections in the agent's Correction Memory Trail (Settings > Agent > Correction Memory). From here you can:

  • See all stored corrections and when they were created
  • Check retrieval frequency — how often each correction is being used
  • Delete outdated corrections (e.g. after a product change makes old answers wrong)
  • Identify patterns — if 5+ corrections address the same issue, update the system prompt instead

Prompt vs. Corrections

The system prompt defines general behaviour. Corrections handle specific edge cases. If you find yourself adding corrections for the same class of question repeatedly, the fix belongs in the prompt — not in more corrections.

Email Capture

For chat agents, HITL settings include an email capture delay — a configurable timer (in seconds) before the agent asks for the visitor's email address. This lets you balance engagement (asking too early feels pushy) with lead capture (waiting too long risks losing the visitor).

Set the delay per-agent in Settings > Agent > Email Capture Delay.
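The delay behaves like a one-shot timer started when the conversation opens. A minimal sketch (the function name is hypothetical; Outrun handles this internally):

```python
import threading

def schedule_email_capture(delay_seconds: float, ask_for_email) -> threading.Timer:
    """Ask for the visitor's email after the configured per-agent delay."""
    timer = threading.Timer(delay_seconds, ask_for_email)
    timer.start()
    return timer  # call timer.cancel() if the visitor volunteers an email first
```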

Metrics

Track HITL effectiveness with these metrics (visible in the Outrun dashboard):

  • Review volume — how many responses are queued per day
  • Approval rate — percentage approved without edits (target: increasing over time)
  • Edit rate — percentage requiring human edits (target: decreasing over time)
  • Average review time — how long items sit in the queue before action
  • Correction retrieval rate — how often stored corrections are matched to new messages
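Most of these metrics fall out of a simple aggregation over the review log. A sketch, assuming an illustrative log shape with `action` and `review_seconds` fields (not Outrun's export format):

```python
def hitl_metrics(reviews: list[dict]) -> dict:
    """Compute review volume, approval/edit rates, and average review time."""
    total = len(reviews)
    approved = sum(1 for r in reviews if r["action"] == "approve")
    edited = sum(1 for r in reviews if r["action"] == "edit")
    avg_wait = sum(r["review_seconds"] for r in reviews) / total if total else 0.0
    return {
        "review_volume": total,
        "approval_rate": approved / total if total else 0.0,  # target: increasing
        "edit_rate": edited / total if total else 0.0,        # target: decreasing
        "avg_review_seconds": avg_wait,
    }
```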

Recommended Rollout

  1. Week 1 — Set all agents to review mode with blocking enabled. Review every response to understand the agent's strengths and failure patterns.
  2. Weeks 2-3 — Switch well-performing agents to confidence mode. Only edge cases and low-confidence responses come to the queue.
  3. Month 2+ — Move mature agents with strong correction memory to autonomous mode. Continue monitoring correction rates and spot-check responses weekly.

Next Steps