Trust-Based Review Model

Human code review is the single largest bottleneck in AI-accelerated delivery. This model defines how to progressively reduce human review overhead without sacrificing safety — by letting agents earn trust through demonstrated reliability.

The Core Principle

Trust is earned per agent, per stage, per project. An agent that demonstrates consistent quality at one stage does not automatically earn trust at another. Trust levels are not organizational — they are scoped.

Four Trust Levels

Trust Level 0: Full Supervision (Default)

Every new agent starts here. All outputs require human review before proceeding.

| Aspect | Requirement |
| --- | --- |
| Human review | 100% of agent outputs reviewed by qualified human |
| Approval | Explicit human approval required at every gate |
| Monitoring | Every agent action logged and reviewed weekly |
| Duration | Minimum 2 weeks or 20 agent runs (whichever is longer) |
| Promotion criteria | ≥90% first-pass acceptance rate, 0 critical defects escaped, complete handoff artifacts |

What this looks like in practice:

  • developer-agent produces code → Senior Developer reviews every line
  • qa-agent produces test matrix → QA Lead validates every test case
  • security-agent produces findings → Security Engineer verifies every finding
  • Every handoff artifact is manually checked for completeness

Trust Level 1: Guided Autonomy

Agent handles routine work autonomously. Human reviews exceptions and high-risk outputs.

| Aspect | Requirement |
| --- | --- |
| Human review | ~60% of outputs — focus on Tier 2+ risk, new patterns, and flagged items |
| Approval | Auto-approve for Tier 1 work within established patterns; human approval for Tier 2+ |
| Monitoring | Weekly sampling of 30% of auto-approved outputs |
| Duration | Minimum 4 weeks or 50 agent runs at Level 1 |
| Promotion criteria | ≥95% first-pass acceptance on sampled outputs, 0 critical defects escaped, ≤2 medium defects per 50 runs |

What this looks like in practice:

  • developer-agent produces a standard CRUD endpoint → auto-reviewed by CI gates only
  • developer-agent produces authentication logic → routed to Senior Developer for review
  • qa-agent produces test matrix for known patterns → auto-accepted
  • qa-agent produces test matrix for new integration → QA Lead reviews

Risk-based routing rules at Level 1:

| Condition | Action |
| --- | --- |
| Tier 1 work + established pattern | Auto-approve via CI gates |
| Tier 1 work + new pattern | Route to human reviewer |
| Tier 2+ work | Always route to human reviewer |
| Agent flags uncertainty | Always route to human reviewer |
| Architecture-impacting change | Always route to Solution Architect |
| Auth/crypto/PII handling | Always route to Security Engineer |
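These routing rules reduce to a small decision function evaluated in priority order. The sketch below is illustrative only — the `ReviewRequest` fields and reviewer identifiers are assumptions, not part of any mandated tooling:

```python
from dataclasses import dataclass

@dataclass
class ReviewRequest:
    tier: int                  # work risk tier (1 = lowest)
    established_pattern: bool  # matches a known, approved pattern
    agent_flagged: bool        # agent self-reported uncertainty
    architecture_impact: bool  # change affects architecture
    touches_sensitive: bool    # auth / crypto / PII handling

def route_level1(req: ReviewRequest) -> str:
    """Apply the Level 1 risk-based routing rules, most specific first."""
    if req.touches_sensitive:
        return "security-engineer"
    if req.architecture_impact:
        return "solution-architect"
    if req.agent_flagged or req.tier >= 2 or not req.established_pattern:
        return "human-reviewer"
    # Tier 1 work within an established pattern: CI gates only
    return "auto-approve"
```

The security and architecture rules are checked first so a flagged CRUD endpoint that also touches PII still reaches the Security Engineer.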

Trust Level 2: Supervised Autonomy

Agent operates autonomously for most work. Human oversight shifts to exception-based and statistical sampling.

| Aspect | Requirement |
| --- | --- |
| Human review | ~25% of outputs — only Tier 3+, flagged items, and random sampling |
| Approval | Auto-approve for Tier 1-2 work; human approval for Tier 3+ |
| Monitoring | Bi-weekly sampling of 15% of auto-approved outputs; automated anomaly detection |
| Duration | Minimum 8 weeks or 100 agent runs at Level 2 |
| Promotion criteria | ≥98% first-pass acceptance on sampled outputs, 0 critical/high defects escaped, automated anomaly detection in place |

What this looks like in practice:

  • developer-agent handles most feature implementation autonomously
  • qa-agent generates and validates test matrices without human review for Tier 1-2
  • security-agent auto-classifies and routes findings; human reviews only critical/high
  • Statistical sampling catches drift before it becomes systemic

Trust Level 3: Autonomous with Guardrails

Agent operates with full autonomy within its contract boundaries. Human involvement is limited to non-negotiable checkpoints and anomaly response.

| Aspect | Requirement |
| --- | --- |
| Human review | ~10% — only non-negotiable checkpoints (see below) + anomaly triggers |
| Approval | Auto-approve for Tier 1-3; human approval for Tier 4 only |
| Monitoring | Continuous automated monitoring; monthly human audit of 5% of outputs |
| Duration | No minimum — maintained as long as metrics hold |
| Demotion criteria | Any critical defect escaped, ≥3 high defects in 30-day window, anomaly detection triggers |

Non-negotiable human checkpoints (even at Level 3):

| Checkpoint | Reason |
| --- | --- |
| Requirements approval (Gate 1) | Business intent must be human-owned |
| Architecture approval for Tier 4 | Irreversible at enterprise scale |
| Critical security findings | False negatives have outsized impact |
| Production deployment approval | Last line of defense |
| Incident response decisions | Human accountability required |
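The review and approval parameters of all four levels can be collected into a single configuration. This is a minimal sketch assuming illustrative field names; the percentages come from the level tables above:

```python
# Per-level review parameters, taken from the trust level tables.
# review_share: approximate fraction of outputs a human reviews
# auto_approve_max_tier: highest tier that may bypass human approval
# sample_rate: fraction of auto-approved outputs audited by humans
TRUST_LEVELS = {
    0: {"review_share": 1.00, "auto_approve_max_tier": 0, "sample_rate": 1.00},
    1: {"review_share": 0.60, "auto_approve_max_tier": 1, "sample_rate": 0.30},
    2: {"review_share": 0.25, "auto_approve_max_tier": 2, "sample_rate": 0.15},
    3: {"review_share": 0.10, "auto_approve_max_tier": 3, "sample_rate": 0.05},
}

def needs_human_approval(level: int, tier: int) -> bool:
    """Work above the level's auto-approve ceiling always goes to a human."""
    return tier > TRUST_LEVELS[level]["auto_approve_max_tier"]
```

A ceiling of 0 at Level 0 encodes that nothing is auto-approved; Tier 4 work requires human approval even at Level 3.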

Trust Promotion Process

```
┌────────────────────────────────────────────────────────────────────┐
│                      TRUST PROMOTION WORKFLOW                      │
│                                                                    │
│   Level 0        Level 1        Level 2        Level 3             │
│  ┌────────┐     ┌────────┐     ┌────────┐     ┌────────┐           │
│  │  FULL  │────>│ GUIDED │────>│SUPERV. │────>│  AUTO  │           │
│  │ SUPER  │     │ AUTON. │     │ AUTON. │     │  WITH  │           │
│  │        │     │        │     │        │     │ GUARD  │           │
│  └────────┘     └────────┘     └────────┘     └────────┘           │
│      │ 20 runs      │ 50 runs      │ 100 runs    │                 │
│      │ ≥90% pass    │ ≥95% pass    │ ≥98% pass   │                 │
│      │ 0 critical   │ 0 critical   │ 0 crit/high │                 │
│      │              │ ≤2 medium    │ anomaly det.│                 │
│      │                                           │                 │
│      └── DEMOTION (any critical defect escaped) ─┘                 │
│                                                                    │
│  Demotion is IMMEDIATE and drops agent to Level 0.                 │
│  Re-promotion follows the full progression from Level 0.           │
└────────────────────────────────────────────────────────────────────┘
```

Promotion Requirements

| From → To | Min Runs | Min Duration | Pass Rate | Defect Threshold | Additional |
| --- | --- | --- | --- | --- | --- |
| 0 → 1 | 20 | 2 weeks | ≥90% | 0 critical | Complete handoffs |
| 1 → 2 | 50 | 4 weeks | ≥95% | 0 critical, ≤2 medium | |
| 2 → 3 | 100 | 8 weeks | ≥98% | 0 critical, 0 high | Anomaly detection active |
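A promotion-eligibility check following this table might be sketched as below. The `AgentStats` fields are hypothetical names for metrics an organization would already be tracking; the gate values mirror the table:

```python
from dataclasses import dataclass

@dataclass
class AgentStats:
    runs: int
    weeks_at_level: int
    pass_rate: float          # first-pass acceptance on sampled outputs
    critical_defects: int
    high_defects: int
    medium_defects: int
    handoffs_complete: bool
    anomaly_detection_active: bool

# (min_runs, min_weeks, min_pass_rate) per transition
PROMOTION_GATES = {
    (0, 1): (20, 2, 0.90),
    (1, 2): (50, 4, 0.95),
    (2, 3): (100, 8, 0.98),
}

def eligible_for_promotion(level: int, s: AgentStats) -> bool:
    gate = PROMOTION_GATES.get((level, level + 1))
    if gate is None:
        return False  # Level 3 is the ceiling
    min_runs, min_weeks, min_rate = gate
    if s.runs < min_runs or s.weeks_at_level < min_weeks or s.pass_rate < min_rate:
        return False
    if s.critical_defects > 0:
        return False  # zero critical defects at every transition
    if level == 0 and not s.handoffs_complete:
        return False  # 0 -> 1 additionally requires complete handoffs
    if level == 1 and s.medium_defects > 2:
        return False  # 1 -> 2 tolerates at most 2 medium defects
    if level == 2 and (s.high_defects > 0 or not s.anomaly_detection_active):
        return False  # 2 -> 3 requires 0 high defects and active anomaly detection
    return True
```

Both run count and duration minimums must hold, matching the "whichever is longer" wording at Level 0.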

Demotion Triggers

| Trigger | Action | Recovery |
| --- | --- | --- |
| Critical defect escaped to production | Immediate demotion to Level 0 | Full re-promotion from Level 0 |
| 3+ high defects in 30-day window | Demotion to Level 0 | Full re-promotion from Level 0 |
| Anomaly detection fires | Pause agent, investigate | Resume at current level if false alarm; demote if real |
| Contract violation (forbidden action attempted) | Immediate suspension | Root cause analysis required before reactivation |
| Handoff completeness drops below 90% | Demotion by 1 level | Fix handoff generation, re-promote |
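The trigger table maps naturally to a small dispatch routine. A minimal sketch, assuming hypothetical trigger identifiers:

```python
# Trigger -> (action, target level; None means the level is decided later)
DEMOTION_RULES = {
    "critical_defect_escaped":        ("demote", 0),
    "high_defects_30d_window":        ("demote", 0),   # fires at 3+ high defects
    "anomaly_detected":               ("pause", None),  # investigate first
    "contract_violation":             ("suspend", None),  # root cause analysis required
    "handoff_completeness_below_90":  ("demote_one_level", None),
}

def apply_trigger(current_level: int, trigger: str) -> tuple[str, int]:
    """Return the action and the agent's resulting trust level."""
    action, target = DEMOTION_RULES[trigger]
    if action == "demote":
        return action, target
    if action == "demote_one_level":
        return action, max(current_level - 1, 0)
    # paused/suspended agents hold their level pending human investigation
    return action, current_level
```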

Trust Metrics Dashboard

Track these metrics per agent to manage trust levels:

| Metric | Formula | Target (Level 3) |
| --- | --- | --- |
| First-pass acceptance rate | Approved outputs / Total outputs | ≥98% |
| Defect escape rate | Defects found after gate / Total outputs | <0.5% |
| Handoff completeness | Complete handoffs / Total handoffs | 100% |
| Mean time to gate pass | Avg time from output to gate approval | Decreasing trend |
| Anomaly trigger rate | Anomaly detections / Total runs | <2% |
| Human intervention rate | Outputs requiring human review / Total outputs | <10% |
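The rate metrics can be computed directly from per-run records. A sketch assuming hypothetical boolean record fields; the one time-based metric, mean time to gate pass, is omitted here:

```python
def trust_metrics(records: list[dict]) -> dict[str, float]:
    """Compute the dashboard rate metrics from per-run records.

    Each record is assumed to carry boolean fields:
    approved, defect_escaped, handoff_complete, anomaly, human_reviewed.
    """
    n = len(records)
    return {
        "first_pass_acceptance":   sum(r["approved"] for r in records) / n,
        "defect_escape_rate":      sum(r["defect_escaped"] for r in records) / n,
        "handoff_completeness":    sum(r["handoff_complete"] for r in records) / n,
        "anomaly_trigger_rate":    sum(r["anomaly"] for r in records) / n,
        "human_intervention_rate": sum(r["human_reviewed"] for r in records) / n,
    }
```

Computing these per agent, per stage keeps the scoped-trust principle measurable rather than anecdotal.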

Per-Stage Trust Configuration

Not all stages reach the same trust level at the same pace. Typical progression:

| Stage | Typical Time to Level 1 | Typical Time to Level 2 | Typical Time to Level 3 | Notes |
| --- | --- | --- | --- | --- |
| Requirements (Stage 1) | 2 weeks | 6 weeks | 12 weeks | Slower — business intent requires careful validation |
| Design (Stage 2) | 3 weeks | 8 weeks | 16 weeks | Slowest — architecture decisions are high-impact |
| Implementation (Stage 3) | 2 weeks | 4 weeks | 10 weeks | Fastest for routine code patterns |
| Testing (Stage 4) | 2 weeks | 5 weeks | 10 weeks | Moderate — test quality has compounding effect |
| Security (Stage 5) | 3 weeks | 8 weeks | Never for critical findings | Critical findings always require human review |
| Deployment (Stage 6) | N/A | N/A | N/A | Production deployment always requires human approval |
| Operations (Stage 7) | 2 weeks | 4 weeks | 8 weeks | Fast for monitoring; slow for incident response |
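One way to encode these stage differences is a per-stage trust ceiling that caps an agent's effective level. This is a sketch of one possible encoding, not a mandated configuration; in particular, the ceilings for security and deployment are one interpretation of the "never for critical findings" and "always requires human approval" notes above:

```python
# Per-stage trust ceilings: some stages never reach full autonomy.
STAGE_TRUST_CEILING = {
    "requirements":   3,
    "design":         3,
    "implementation": 3,
    "testing":        3,
    "security":       2,  # critical findings always need human review
    "deployment":     0,  # production deployment always needs human approval
    "operations":     3,
}

def effective_level(stage: str, agent_level: int) -> int:
    """An agent's trust never exceeds the ceiling of the stage it works in."""
    return min(agent_level, STAGE_TRUST_CEILING[stage])
```

This keeps trust scoped per stage: a Level 3 implementation agent still operates under full supervision when its output feeds a deployment gate.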

Relationship to PRD-STD-002 (Code Review)

PRD-STD-002 mandates human review for production-bound code. The trust model does not override this standard. Instead, it defines how the scope and depth of human review change:

| Trust Level | PRD-STD-002 Compliance | Review Scope |
| --- | --- | --- |
| Level 0 | Full manual review of all code | Line-by-line review |
| Level 1 | Full review for Tier 2+; sampling for Tier 1 | Logic and architecture focus |
| Level 2 | Full review for Tier 3+; statistical sampling for Tier 1-2 | Exception and anomaly focus |
| Level 3 | Full review for Tier 4; automated gates + sampling for Tier 1-3 | Non-negotiable checkpoints only |

At every level, the human reviewer retains the authority to reject any agent output and trigger re-work.

**Warning:** Trust Level 3 requires a formal variance approval from the CTO or equivalent authority, documented in the governance record. This acknowledges that the organization accepts reduced human review coverage in exchange for demonstrated agent reliability.