Human-in-the-Loop Review

Human-in-the-loop (HITL) review lets you verify AI agent responses before they reach customers. When enabled, the agent's draft response is queued for human approval instead of being sent immediately.

Review Modes

Each agent has a configurable HITL mode:

  • review — all responses go to the review queue. Best for new agents and high-stakes channels.
  • confidence — only low-confidence responses are queued. Best for established agents with a correction history.
  • autonomous — the agent responds directly, with no review. Best for mature agents with proven accuracy.

Set the mode in the agent's settings under Human-in-the-Loop.
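The three modes reduce to a simple dispatch on the agent's configuration. A minimal sketch, assuming hypothetical names (`HitlMode`, `should_queue`) and an illustrative 0.7 confidence threshold — not Outrun's actual API:

```python
from enum import Enum

class HitlMode(Enum):
    REVIEW = "review"          # every draft goes to the review queue
    CONFIDENCE = "confidence"  # only low-confidence drafts are queued
    AUTONOMOUS = "autonomous"  # drafts are sent directly

# Illustrative cutoff; Outrun's actual threshold is not documented here.
CONFIDENCE_THRESHOLD = 0.7

def should_queue(mode: HitlMode, confidence: float) -> bool:
    """Decide whether a draft response needs human review."""
    if mode is HitlMode.REVIEW:
        return True
    if mode is HitlMode.CONFIDENCE:
        return confidence < CONFIDENCE_THRESHOLD
    return False  # AUTONOMOUS: send without review
```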

Blocking vs. Non-Blocking

  • Blocking — the agent waits for human approval before sending the response. The visitor sees a "thinking" indicator until the reviewer acts. Best for high-stakes conversations where accuracy is critical.
  • Non-blocking — the agent responds immediately. If a human later edits the response, the correction is stored for future reference but the original response has already been sent. Best for high-volume, lower-stakes interactions.
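The two delivery paths differ only in when the draft is sent relative to review. A sketch of both, assuming a thread-based reviewer (the function names and queue shape are illustrative):

```python
import queue
import threading

review_queue: "queue.Queue[dict]" = queue.Queue()

def deliver_blocking(draft: dict, timeout: float = 300.0) -> dict:
    """Queue the draft and wait for the reviewer before sending."""
    decision = {"event": threading.Event(), "draft": draft, "final": None}
    review_queue.put(decision)
    # Visitor sees a "thinking" indicator while we wait.
    decision["event"].wait(timeout)
    # Hypothetical fallback: send the original draft if the reviewer times out.
    return decision["final"] or draft

def deliver_non_blocking(draft: dict, send) -> None:
    """Send immediately; a later edit is stored as a correction, not re-sent."""
    send(draft)
    review_queue.put({"draft": draft, "already_sent": True})
```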

The Review Queue

Navigate to HITL Review in the sidebar to see pending agent responses.

Each review item shows:

  • Conversation context — the full message history so you understand what the visitor is asking
  • Agent draft — the response the agent wants to send
  • Agent name — which specialist agent generated the response
  • Confidence — the agent's self-assessed confidence in its response
  • Routing history — how the message was routed (triage classification, agent selection)
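A queue entry bundles all five of these fields. As a data-structure sketch (field names are illustrative, not Outrun's schema):

```python
from dataclasses import dataclass, field

@dataclass
class ReviewItem:
    """One pending entry in the HITL review queue."""
    conversation: list[dict]   # full message history for context
    agent_draft: str           # the response awaiting approval
    agent_name: str            # which specialist agent produced it
    confidence: float          # agent's self-assessed confidence, 0.0-1.0
    routing_history: list[str] = field(default_factory=list)  # triage/selection steps
```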

Actions

  • Approve — sends the agent's draft as-is.
  • Edit & Approve — sends your edited version and stores a correction.
  • Reject — discards the draft; the visitor receives no response. Use for spam or out-of-scope messages.

Keyboard Shortcuts

  • A — Approve
  • E — Focus the edit field
  • Enter (in edit mode) — Submit edited response
  • R — Reject

Correction Memory

Every time you edit an agent's response, a correction record is created:

```json
{
  "trigger_message": "What's your uptime guarantee?",
  "agent_draft": "We aim for high availability across all services.",
  "human_correction": "Outrun guarantees 99.9% uptime on all paid plans, backed by our SLA. Enterprise plans include a 99.95% uptime SLA with dedicated support.",
  "agent_id": "pre-sales-agent",
  "category": "product-info"
}
```

How Corrections Improve Responses

  1. A new message arrives that's similar to a previously corrected one
  2. The system retrieves the top 2-3 matching corrections by semantic similarity
  3. The corrections are injected into the agent's prompt as few-shot examples
  4. The agent uses these examples to generate a better response
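The retrieve-and-inject loop above can be sketched end to end. This uses bag-of-words cosine similarity as a stand-in for Outrun's semantic embeddings, and the function names are illustrative:

```python
import math
from collections import Counter

def _vec(text: str) -> Counter:
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def top_corrections(message: str, corrections: list[dict], k: int = 3) -> list[dict]:
    """Return the k stored corrections most similar to the incoming message."""
    msg_vec = _vec(message)
    scored = sorted(
        corrections,
        key=lambda c: _cosine(msg_vec, _vec(c["trigger_message"])),
        reverse=True,
    )
    return scored[:k]

def build_prompt(system_prompt: str, message: str, corrections: list[dict]) -> str:
    """Inject retrieved corrections as few-shot examples ahead of the new message."""
    examples = "\n\n".join(
        f"Visitor: {c['trigger_message']}\nApproved answer: {c['human_correction']}"
        for c in top_corrections(message, corrections)
    )
    return f"{system_prompt}\n\n{examples}\n\nVisitor: {message}"
```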

Over time, the correction rate decreases as the agent accumulates examples for common question patterns.

Managing Corrections

View stored corrections in the agent's Correction Memory Trail (Settings > Agent > Correction Memory). From here you can:

  • See all stored corrections and when they were created
  • Check retrieval frequency — how often each correction is being used
  • Delete outdated corrections (e.g. after a product change makes old answers wrong)
  • Identify patterns — if 5+ corrections address the same issue, update the system prompt instead

Prompt vs. Corrections

The system prompt defines general behaviour. Corrections handle specific edge cases. If you find yourself adding corrections for the same class of question repeatedly, the fix belongs in the prompt — not in more corrections.

Email Capture

For chat agents, HITL settings include an email capture delay — a configurable timer (in seconds) before the agent asks for the visitor's email address. This lets you balance engagement (asking too early feels pushy) with lead capture (waiting too long risks losing the visitor).

Set the delay per-agent in Settings > Agent > Email Capture Delay.
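The delay behaves like a one-shot timer started when the conversation opens. A minimal sketch (the function name is hypothetical; Outrun handles this internally):

```python
import threading

def schedule_email_capture(delay_seconds: float, ask_for_email) -> threading.Timer:
    """Ask for the visitor's email after the configured per-agent delay."""
    timer = threading.Timer(delay_seconds, ask_for_email)
    timer.start()
    return timer  # call timer.cancel() if the visitor volunteers an email first
```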

Metrics

Track HITL effectiveness with these metrics (visible in the Outrun dashboard):

  • Review volume — how many responses are queued per day
  • Approval rate — percentage approved without edits (target: increasing over time)
  • Edit rate — percentage requiring human edits (target: decreasing over time)
  • Average review time — how long items sit in the queue before action
  • Correction retrieval rate — how often stored corrections are matched to new messages
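Most of these metrics fall out of a simple aggregation over the review log. A sketch, assuming an illustrative log shape with `action` and `review_seconds` fields (not Outrun's export format):

```python
def hitl_metrics(reviews: list[dict]) -> dict:
    """Compute review volume, approval/edit rates, and average review time."""
    total = len(reviews)
    approved = sum(1 for r in reviews if r["action"] == "approve")
    edited = sum(1 for r in reviews if r["action"] == "edit")
    avg_wait = sum(r["review_seconds"] for r in reviews) / total if total else 0.0
    return {
        "review_volume": total,
        "approval_rate": approved / total if total else 0.0,  # target: increasing
        "edit_rate": edited / total if total else 0.0,        # target: decreasing
        "avg_review_seconds": avg_wait,
    }
```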

Recommended Rollout

  1. Week 1 — Set all agents to review mode with blocking enabled. Review every response to understand the agent's strengths and failure patterns.
  2. Weeks 2-3 — Switch well-performing agents to confidence mode. Only edge cases and low-confidence responses come to the queue.
  3. Month 2+ — Move mature agents with strong correction memory to autonomous mode. Continue monitoring correction rates and spot-check responses weekly.

Next Steps