Human-in-the-Loop Review
Human-in-the-loop (HITL) review lets you verify AI agent responses before they reach customers. When enabled, the agent's draft response is queued for human approval instead of being sent immediately.
Review Modes
Each agent has a configurable HITL mode:
| Mode | Behaviour | Best For |
|---|---|---|
| review | All responses go to the review queue | New agents, high-stakes channels |
| confidence | Only low-confidence responses are queued | Established agents with correction history |
| autonomous | Agent responds directly, no review | Mature agents with proven accuracy |
Set the mode in the agent's settings under Human-in-the-Loop.
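In effect, the mode acts as a per-draft gate. Below is a minimal sketch of that decision, assuming a hypothetical `should_queue_for_review` helper and a configurable confidence threshold (the names and the 0.8 default are illustrative, not Outrun's actual API):

```python
def should_queue_for_review(mode: str, confidence: float,
                            threshold: float = 0.8) -> bool:
    """Decide whether an agent draft goes to the HITL queue.

    mode is the agent's HITL setting; confidence is the agent's
    self-assessed score for this draft (0.0 to 1.0).
    """
    if mode == "review":
        return True                    # every draft is reviewed
    if mode == "confidence":
        return confidence < threshold  # only low-confidence drafts
    if mode == "autonomous":
        return False                   # send directly, no review
    raise ValueError(f"unknown HITL mode: {mode!r}")
```

For example, a draft with confidence 0.95 is queued under `review` mode but sent directly under `confidence` mode.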
Blocking vs. Non-Blocking
- Blocking — the agent waits for human approval before sending the response. The visitor sees a "thinking" indicator until the reviewer acts. Best for high-stakes conversations where accuracy is critical.
- Non-blocking — the agent responds immediately. If a human later edits the response, the correction is stored for future reference but the original response has already been sent. Best for high-volume, lower-stakes interactions.
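The two delivery paths might be sketched as follows. Here `dispatch`, `send`, and the in-memory queue are hypothetical stand-ins for the real transport and review queue, not Outrun internals:

```python
import queue
import threading

review_queue: "queue.Queue[dict]" = queue.Queue()

def dispatch(draft: dict, blocking: bool, send, timeout: float = 300.0):
    """Deliver an agent draft under blocking or non-blocking review.

    `send` is whatever callable actually delivers the message to
    the visitor (a stand-in for the real transport).
    """
    if blocking:
        # Visitor sees a "thinking" indicator; wait for the reviewer.
        approved = threading.Event()
        draft["on_approve"] = approved.set
        review_queue.put(draft)
        if approved.wait(timeout):
            send(draft["text"])
    else:
        # Send immediately; a later human edit only stores a correction.
        send(draft["text"])
        review_queue.put(draft)
```

The design difference is simply where the `send` call sits relative to the queue: after approval in the blocking path, before it in the non-blocking path.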
The Review Queue
Navigate to HITL Review in the sidebar to see pending agent responses.
Each review item shows:
- Conversation context — the full message history so you understand what the visitor is asking
- Agent draft — the response the agent wants to send
- Agent name — which specialist agent generated the response
- Confidence — the agent's self-assessed confidence in its response
- Routing history — how the message was routed (triage classification, agent selection)
Actions
| Action | Effect |
|---|---|
| Approve | Sends the agent's draft as-is |
| Edit & Approve | Sends your edited version and stores a correction |
| Reject | Discards the draft. The visitor receives no response (use for spam or out-of-scope messages) |
Keyboard Shortcuts
- A — Approve
- E — Focus the edit field
- Enter (in edit mode) — Submit edited response
- R — Reject
Correction Memory
Every time you edit an agent's response, a correction record is created:
```json
{
  "trigger_message": "What's your uptime guarantee?",
  "agent_draft": "We aim for high availability across all services.",
  "human_correction": "Outrun guarantees 99.9% uptime on all paid plans, backed by our SLA. Enterprise plans include a 99.95% uptime SLA with dedicated support.",
  "agent_id": "pre-sales-agent",
  "category": "product-info"
}
```
How Corrections Improve Responses
- A new message arrives that's similar to a previously corrected one
- The system retrieves the top 2-3 matching corrections by semantic similarity
- The corrections are injected into the agent's prompt as few-shot examples
- The agent uses these examples to generate a better response
Over time, the correction rate decreases as the agent accumulates examples for common question patterns.
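The retrieval-and-injection steps above can be sketched roughly like this, assuming each stored correction carries a precomputed embedding of its trigger message and that cosine similarity is the matching measure (the actual retrieval mechanism may differ):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_corrections(message_vec, corrections, k=3):
    """Top-k stored corrections by semantic similarity to the new message."""
    ranked = sorted(corrections,
                    key=lambda c: cosine(message_vec, c["embedding"]),
                    reverse=True)
    return ranked[:k]

def inject_few_shot(prompt, corrections):
    """Append retrieved corrections to the agent prompt as few-shot examples."""
    examples = "\n\n".join(
        f"Visitor: {c['trigger_message']}\n"
        f"Correct response: {c['human_correction']}"
        for c in corrections
    )
    return f"{prompt}\n\nLearn from these past corrections:\n{examples}"
```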
Managing Corrections
View stored corrections in the agent's Correction Memory Trail (Settings > Agent > Correction Memory). From here you can:
- See all stored corrections and when they were created
- Check retrieval frequency — how often each correction is being used
- Delete outdated corrections (e.g. after a product change makes old answers wrong)
- Identify patterns — if 5+ corrections address the same issue, update the system prompt instead
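As a sketch of that last check, assuming corrections carry the `category` field shown earlier, a hypothetical helper could flag categories that have accumulated enough corrections to warrant a prompt change:

```python
from collections import Counter

def categories_needing_prompt_fix(corrections, threshold=5):
    """Return correction categories that recur often enough that the
    fix belongs in the system prompt rather than in more corrections."""
    counts = Counter(c["category"] for c in corrections)
    return [cat for cat, n in counts.items() if n >= threshold]
```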
Prompt vs. Corrections
The system prompt defines general behaviour. Corrections handle specific edge cases. If you find yourself adding corrections for the same class of question repeatedly, the fix belongs in the prompt — not in more corrections.
Email Capture
For chat agents, HITL settings include an email capture delay — a configurable timer (in seconds) before the agent asks for the visitor's email address. This lets you balance engagement (asking too early feels pushy) with lead capture (waiting too long risks losing the visitor).
Set the delay per-agent in Settings > Agent > Email Capture Delay.
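A minimal sketch of how such a delay might be scheduled, with `ask` standing in for whatever actually sends the email-capture message (illustrative only):

```python
import threading

def schedule_email_capture(delay_seconds, ask):
    """Ask for the visitor's email after the configured delay.

    Returns the timer so it can be cancelled, e.g. if the visitor
    volunteers an email before the delay elapses.
    """
    timer = threading.Timer(delay_seconds, ask)
    timer.start()
    return timer
```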
Metrics
Track HITL effectiveness with these metrics (visible in the Outrun dashboard):
- Review volume — how many responses are queued per day
- Approval rate — percentage approved without edits (target: increasing over time)
- Edit rate — percentage requiring human edits (target: decreasing over time)
- Average review time — how long items sit in the queue before action
- Correction retrieval rate — how often stored corrections are matched to new messages
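Assuming review records with illustrative `action` ("approve", "edit", or "reject") and `review_seconds` fields, the rate metrics could be computed like this:

```python
def hitl_metrics(reviews):
    """Compute HITL effectiveness metrics from a list of review records."""
    total = len(reviews)
    approved = sum(1 for r in reviews if r["action"] == "approve")
    edited = sum(1 for r in reviews if r["action"] == "edit")
    return {
        "review_volume": total,
        "approval_rate": approved / total if total else 0.0,
        "edit_rate": edited / total if total else 0.0,
        "avg_review_seconds": (
            sum(r["review_seconds"] for r in reviews) / total if total else 0.0
        ),
    }
```

A healthy trend is `approval_rate` rising and `edit_rate` falling as correction memory accumulates.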
Recommended Rollout
- Week 1 — Set all agents to review mode with blocking enabled. Review every response to understand the agent's strengths and failure patterns.
- Weeks 2-3 — Switch well-performing agents to confidence mode. Only edge cases and low-confidence responses come to the queue.
- Month 2+ — Move mature agents with strong correction memory to autonomous mode. Continue monitoring correction rates and spot-check responses weekly.