February 8, 2026

Section 10: Feedback Loops Implementation Plan

The Nervous System — Five feedback loops that turn execution into learning

1. Overview

Feedback loops are the P0 foundation of Forge. They transform a one-shot execution system into a continuously learning one. Every action generates data, every result creates a signal, and every loop feeds information back to where it's useful.

The five loops operate at different timescales:

Inner Loop (ms) — Tool results → agent reasoning
Phase Loop (min) — Review/test failures → fix → re-check
Run Loop (min-hr) — Post-run reflection → memory storage
Human Loop (hr-days) — Human feedback → pattern adjustment
Production Loop (days) — Deployed code → monitoring → new tasks

Build Priority: Loops 1-4 ship with MVP (Weeks 1-6). Loop 5 interface designed now, implemented post-MVP.

2. Loop 1: Inner Loop (Already Handled by Agent Loop)

2.1 What It Is

The inner loop is the perceive → reason → act → learn cycle inside every agent. Each tool call result feeds directly back into the next LLM reasoning step.

This loop already exists in the base agent design (Section 6 of SYSTEM-DESIGN.md). No special infrastructure needed.

2.2 Interface Points

The data that flows back from tool execution to the next reasoning step:

typescript
// ─── Tool execution result ────────────────────────────────
interface ToolResult<TOutput = unknown> {
  success: boolean;
  output?: TOutput;
  error?: Error;
  duration: number;
  cost?: { tokens: number; usd: number };
}

// ─── This feeds into WorkingMemory for next iteration ─────
interface WorkingMemory {
  messages: Message[];
  toolResults: ToolResult[];
  iterationCount: number;
  accumulatedCost: number;
}

// ─── The agent loop uses this to decide next action ───────
class BaseAgent implements Agent {
  private updateWorkingMemory(
    memory: WorkingMemory,
    decision: AgentDecision,
    result: ToolResult
  ): WorkingMemory {
    return {
      ...memory,
      messages: [
        ...memory.messages,
        { role: 'assistant', content: decision.reasoning },
        { role: 'tool', content: JSON.stringify(result) },
      ],
      toolResults: [...memory.toolResults, result],
      iterationCount: memory.iterationCount + 1,
      accumulatedCost: memory.accumulatedCost + (result.cost?.usd ?? 0),
    };
  }
}

2.3 Observable Metrics

Track these to measure inner loop health:

typescript
// ─── src/safety/metrics.ts ────────────────────────────────
interface InnerLoopMetrics {
  // Per-phase averages
  avgIterationsPerPhase: {
    planning: number;
    implementation: number;
    review: number;
    testing: number;
  };

  // Tool effectiveness
  toolSuccessRate: {
    [toolName: string]: {
      successRate: number;      // % of calls that succeeded
      avgDuration: number;       // ms
      avgCost: number;           // USD
    };
  };

  // Progress indicators
  stagnationDetection: {
    consecutiveIterationsWithoutProgress: number;
    threshold: 3;  // Trigger reflection if > 3
  };
}

// Calculate from events
function calculateInnerLoopMetrics(traceId: string): Promise<InnerLoopMetrics> {
  const events = await bus.replay(traceId);

  const toolEvents = events.filter(e => e.type === 'tool.executed');
  const phaseEvents = events.filter(e => e.type === 'phase.entered');

  // Group by phase, calculate averages
  // ...
}

2.4 Integration Timeline

Week 1: Already part of base agent implementation. Just add metrics tracking.

3. Loop 2: Phase Loop (The Bounce-Back)

3.1 What It Solves

When Review finds issues or Tests fail, the pipeline doesn't just halt — it bounces back to the Implementer with structured feedback. The Implementer fixes the issues and the work is re-checked. This is the most important loop for code quality.

3.2 Phase Loop Configuration

typescript
// ─── src/orchestrator/phase-loop.ts ───────────────────────
interface PhaseLoopConfig {
  // Global default
  maxBounces: number;

  // Per-phase overrides
  phases: {
    review: {
      onChangesRequested: 'implementation';  // Bounce target
      maxBounces: 3;
      diminishingReturnsThreshold: 2;  // If no improvement after 2 bounces, escalate
    };
    testing: {
      onFailure: 'implementation';
      maxBounces: 2;
      autoFixOnly: boolean;  // Only bounce if AI can suggest fix
    };
  };
}

// Default config
const DEFAULT_PHASE_LOOP_CONFIG: PhaseLoopConfig = {
  maxBounces: 3,
  phases: {
    review: {
      onChangesRequested: 'implementation',
      maxBounces: 3,
      diminishingReturnsThreshold: 2,
    },
    testing: {
      onFailure: 'implementation',
      maxBounces: 2,
      autoFixOnly: true,
    },
  },
};

3.3 Structured Feedback Types

The phase loop carries structured feedback, not just "it failed":

typescript
// ─── src/agents/reviewer.ts (output) ──────────────────────
interface Finding {
  id: string;
  severity: 'info' | 'warning' | 'error' | 'critical';
  category: 'style' | 'security' | 'correctness' | 'performance';
  message: string;
  file?: string;
  line?: number;
  confidence: number;
  fixable: boolean;
  suggestedFix?: {
    description: string;
    diff: string;
  };
}

interface ReviewResult {
  findings: Finding[];
  riskScore: {
    total: number;
    level: 'low' | 'medium' | 'high' | 'critical';
  };
  decision: 'approve' | 'request_changes' | 'require_human';
}

// ─── src/agents/tester.ts (output) ────────────────────────
interface FailureAnalysis {
  test: string;
  error: string;
  rootCause: {
    type: 'logic' | 'syntax' | 'runtime' | 'flaky' | 'env';
    description: string;
    location?: { file: string; line: number };
  };
  confidence: number;
  suggestedFix?: {
    description: string;
    changes: FileChange[];
  };
}

interface TestResult {
  summary: {
    total: number;
    passed: number;
    failed: number;
    skipped: number;
  };
  coverage: {
    line: number;
    branch: number;
    diff: number;  // Coverage on changed lines only
  };
  failures: FailureAnalysis[];
  generatedTests?: TestFile[];
}

3.4 Core Implementation: `runPipelineWithBounces()`

typescript
// ─── src/orchestrator/pipeline.ts ─────────────────────────
interface BounceState {
  phase: PhaseName;
  count: number;
  history: {
    bounce: number;
    findingsCount?: number;
    failuresCount?: number;
    timestamp: Date;
  }[];
}

async function runPipelineWithBounces(
  task: string,
  ctx: PipelineContext
): Promise<PipelineResult> {
  const bounceState: Record<PhaseName, BounceState> = {
    review: { phase: 'review', count: 0, history: [] },
    testing: { phase: 'testing', count: 0, history: [] },
  };

  // ── Planning phase ──────────────────────────────────────
  const plan = await runPhase('planning', { task }, ctx);

  ctx.bus.emit({
    type: 'phase.completed',
    source: 'orchestrator',
    traceId: ctx.traceId,
    payload: { phase: 'planning', plan },
  });

  // ── Implementation phase ────────────────────────────────
  let code = await runPhase('implementation', plan, ctx);

  // ── Review loop: implement → review → fix → re-review ──
  while (bounceState.review.count < ctx.config.phases.review.maxBounces) {
    const review = await runPhase('review', code, ctx);

    // Approved? Break out.
    if (review.decision === 'approve') {
      ctx.bus.emit({
        type: 'phase.approved',
        source: 'orchestrator',
        traceId: ctx.traceId,
        payload: { phase: 'review', bounces: bounceState.review.count },
      });
      break;
    }

    // Requires human? Gate.
    if (review.decision === 'require_human') {
      await ctx.gates.requestHumanReview(review);
      break;
    }

    // Request changes? Bounce back.
    const criticalFindings = review.findings.filter(
      f => f.severity === 'error' || f.severity === 'critical'
    );

    if (criticalFindings.length === 0) {
      // No critical issues, move forward
      break;
    }

    // Record bounce
    bounceState.review.count++;
    bounceState.review.history.push({
      bounce: bounceState.review.count,
      findingsCount: criticalFindings.length,
      timestamp: new Date(),
    });

    ctx.bus.emit({
      type: 'loop.phase_bounce',
      source: 'orchestrator',
      traceId: ctx.traceId,
      payload: {
        from: 'review',
        to: 'implementation',
        bounce: bounceState.review.count,
        findings: criticalFindings,
      },
    });

    // Check for diminishing returns
    if (bounceState.review.count >= ctx.config.phases.review.diminishingReturnsThreshold) {
      const improving = detectImprovement(bounceState.review.history);
      if (!improving) {
        ctx.bus.emit({
          type: 'loop.diminishing_returns',
          source: 'orchestrator',
          traceId: ctx.traceId,
          payload: { phase: 'review', bounces: bounceState.review.count },
        });
        await ctx.gates.requestHumanHelp({
          reason: 'Fixes not improving review findings',
          history: bounceState.review.history,
        });
        break;
      }
    }

    // Bounce back: feed findings to implementer
    code = await runPhase('implementation', {
      ...plan,
      mode: 'fix',  // ← Implementer knows this is a fix pass
      existingCode: code,
      fixFindings: criticalFindings,
    }, ctx);
  }

  // ── Test loop: implement → test → fix → re-test ────────
  while (bounceState.testing.count < ctx.config.phases.testing.maxBounces) {
    const tests = await runPhase('testing', code, ctx);

    // All tests pass? Break.
    if (tests.summary.failed === 0) {
      ctx.bus.emit({
        type: 'phase.passed',
        source: 'orchestrator',
        traceId: ctx.traceId,
        payload: { phase: 'testing', bounces: bounceState.testing.count },
      });
      break;
    }

    // Only auto-fix if failures are analyzable
    const fixableFailures = tests.failures.filter(
      f => f.suggestedFix && f.confidence > 0.7
    );

    if (fixableFailures.length === 0) {
      // Can't auto-fix, escalate to human
      await ctx.gates.requestHumanHelp({
        reason: 'Test failures cannot be auto-fixed',
        failures: tests.failures,
      });
      break;
    }

    // Record bounce
    bounceState.testing.count++;
    bounceState.testing.history.push({
      bounce: bounceState.testing.count,
      failuresCount: fixableFailures.length,
      timestamp: new Date(),
    });

    ctx.bus.emit({
      type: 'loop.phase_bounce',
      source: 'orchestrator',
      traceId: ctx.traceId,
      payload: {
        from: 'testing',
        to: 'implementation',
        bounce: bounceState.testing.count,
        failures: fixableFailures,
      },
    });

    // Bounce back: feed failure analysis to implementer
    code = await runPhase('implementation', {
      ...plan,
      mode: 'fix',
      existingCode: code,
      fixFailures: fixableFailures,
    }, ctx);
  }

  // ── Deploy phase ────────────────────────────────────────
  await runPhase('deployment', { code, review, tests }, ctx);

  return {
    status: 'completed',
    bounces: bounceState,
    output: code,
  };
}

// ─── Diminishing returns detection ────────────────────────
function detectImprovement(history: BounceState['history']): boolean {
  if (history.length < 2) return true;

  const latest = history[history.length - 1];
  const previous = history[history.length - 2];

  // If findings/failures didn't decrease, no improvement
  const latestCount = latest.findingsCount ?? latest.failuresCount ?? 0;
  const previousCount = previous.findingsCount ?? previous.failuresCount ?? 0;

  return latestCount < previousCount;
}

3.5 Implementer Prompt Changes for Fix Mode

When the Implementer receives a bounce-back, its system prompt changes:

typescript
// ─── src/agents/implementer.ts ────────────────────────────
function buildImplementerPrompt(input: PhaseInput): string {
  if (input.mode === 'fix' && input.fixFindings) {
    return IMPLEMENTER_FIX_PROMPT({
      findings: input.fixFindings,
      existingCode: input.existingCode,
      plan: input.plan,
    });
  }

  if (input.mode === 'fix' && input.fixFailures) {
    return IMPLEMENTER_FIX_FAILURES_PROMPT({
      failures: input.fixFailures,
      existingCode: input.existingCode,
      plan: input.plan,
    });
  }

  // Fresh implementation
  return IMPLEMENTER_FRESH_PROMPT(input.plan);
}

// ─── Prompt templates ──────────────────────────────────────
const IMPLEMENTER_FIX_PROMPT = (ctx: FixContext) => `
You are fixing code based on review findings. DO NOT rewrite everything — make surgical changes to address the specific issues.

## Review Findings to Fix

${ctx.findings.map(f => `
### ${f.severity.toUpperCase()}: ${f.message}
File: ${f.file}:${f.line}
${f.suggestedFix ? `Suggested fix: ${f.suggestedFix.description}` : ''}
`).join('\n')}

## Existing Code

${ctx.existingCode.files.map(f => `
File: ${f.path}
\`\`\`${f.language}
${f.content}
\`\`\`
`).join('\n')}

## Task

For each finding:
1. Locate the issue in the code
2. Apply the minimal fix needed
3. Run typecheck/tests to validate the fix
4. Do NOT introduce new issues

Output only the changed files with explanations.
`;

const IMPLEMENTER_FIX_FAILURES_PROMPT = (ctx: FailureContext) => `
You are fixing code to make tests pass. Focus on the root causes identified in the failure analysis.

## Test Failures to Fix

${ctx.failures.map(f => `
### Test: ${f.test}
Error: ${f.error}
Root Cause: ${f.rootCause.description}
${f.suggestedFix ? `Suggested fix: ${f.suggestedFix.description}` : ''}
`).join('\n')}

## Existing Code

${ctx.existingCode.files.map(f => `
File: ${f.path}
\`\`\`${f.language}
${f.content}
\`\`\`
`).join('\n')}

## Task

For each failure:
1. Understand the root cause
2. Apply the fix
3. Re-run the failing test to validate
4. Do NOT break other tests

Output only the changed files.
`;

3.6 Bounce Tracking Metrics

typescript
// ─── src/safety/metrics.ts ────────────────────────────────
interface PhaseLoopMetrics {
  // Bounce frequency
  avgBouncesPerRun: {
    review: number;
    testing: number;
  };

  // Bounce effectiveness
  bounceResolutionRate: {
    review: number;  // % of bounces that fixed issues
    testing: number;
  };

  // Quality improvement
  firstPassApprovalRate: number;  // % of reviews approved without bounces

  // Diminishing returns detection
  diminishingReturnsRate: number;  // % of runs that hit DR threshold
}

// Calculate from events
async function calculatePhaseLoopMetrics(
  traceId: string
): Promise<PhaseLoopMetrics> {
  const events = await bus.replay(traceId);

  const bounceEvents = events.filter(e => e.type === 'loop.phase_bounce');
  const drEvents = events.filter(e => e.type === 'loop.diminishing_returns');

  // Calculate averages, rates, etc.
  // ...
}

3.7 Integration Timeline

Week 3: Basic review bounce logic (review → fix → re-review) Week 4: Test bounce logic (test → fix → re-test) Week 6: Diminishing returns detection, bounce metrics

4. Loop 3: Run Loop (Post-Run Reflection)

4.1 What It Solves

After a full pipeline run completes (success or failure), the system reflects on the entire execution trace to extract durable learnings. These learnings populate the memory system and improve future runs.

4.2 Reflection Triggers

Reflection doesn't just happen at run end — it's triggered by multiple conditions:

typescript
// ─── src/orchestrator/reflection-triggers.ts ──────────────
interface ReflectionTrigger {
  name: string;
  condition: (ctx: PipelineContext) => boolean;
  priority: 'immediate' | 'deferred';
}

const REFLECTION_TRIGGERS: ReflectionTrigger[] = [
  {
    name: 'run_completed',
    condition: (ctx) => ctx.status === 'completed' || ctx.status === 'failed',
    priority: 'deferred',  // Run after pipeline finishes
  },
  {
    name: 'phase_bounced_2x',
    condition: (ctx) => {
      const reviewBounces = ctx.bounceState?.review?.count ?? 0;
      const testBounces = ctx.bounceState?.testing?.count ?? 0;
      return reviewBounces >= 2 || testBounces >= 2;
    },
    priority: 'immediate',  // Reflect mid-run to course-correct
  },
  {
    name: 'cost_exceeded_50pct',
    condition: (ctx) => ctx.cost.current > ctx.cost.budget * 0.5,
    priority: 'immediate',
  },
  {
    name: 'error_rate_high',
    condition: (ctx) => {
      const errorEvents = ctx.events.filter(e => e.type.includes('error'));
      const totalEvents = ctx.events.length;
      return errorEvents.length / totalEvents > 0.1;  // 10%
    },
    priority: 'immediate',
  },
  {
    name: 'human_overrode_decision',
    condition: (ctx) => ctx.events.some(e => e.type === 'gate.human_override'),
    priority: 'deferred',  // Learn from this after run
  },
];

4.3 Core Implementation: `reflectOnRun()`

typescript
// ─── src/orchestrator/reflection.ts ───────────────────────
interface RunReflection {
  summary: {
    task: string;
    outcome: 'success' | 'failure' | 'partial';
    phases: PhaseOutcome[];
    totalCost: number;
    totalDuration: number;
    bounces: { phase: string; count: number }[];
  };

  learnings: Learning[];
}

interface Learning {
  type: 'episodic' | 'semantic' | 'procedural';
  content: string;             // Human-readable insight
  context: string;             // When is this relevant?
  confidence: number;          // 0.0 - 1.0
  source: string;              // Which event/pattern triggered this
  tags: string[];              // For retrieval
}

async function reflectOnRun(
  traceId: string,
  ctx: PipelineContext
): Promise<RunReflection> {
  // ── Replay all events from this run ──────────────────────
  const events = await ctx.bus.replay(traceId);

  // ── Summarize events for LLM consumption ─────────────────
  const summary = summarizeEvents(events);

  // ── Ask LLM to extract learnings ─────────────────────────
  const reflection = await ctx.llm.chat({
    system: RUN_REFLECTION_PROMPT,
    messages: [{
      role: 'user',
      content: `Analyze this pipeline run and extract learnings.

Task: ${ctx.task}
Outcome: ${ctx.status}

Events Summary:
${JSON.stringify(summary, null, 2)}

Focus on:
- Patterns that could help future runs
- Mistakes to avoid
- Strategies that worked well
- Any surprises or anomalies
- Why bounces happened (if any)
- Human feedback (if any)

Output a JSON array of learnings.`,
    }],
  });

  // ── Parse learnings ───────────────────────────────────────
  const learnings = parseLearnings(reflection.content);

  // ── Store each learning in memory ────────────────────────
  for (const learning of learnings) {
    await ctx.memory.store({
      type: learning.type,
      content: learning.content,
      context: learning.context,
      confidence: learning.confidence,
      source: `run:${traceId}`,
      tags: learning.tags,
    });
  }

  // ── Emit reflection event ─────────────────────────────────
  ctx.bus.emit({
    type: 'reflection.completed',
    source: 'orchestrator',
    traceId,
    payload: {
      learningsCount: learnings.length,
      costOfReflection: reflection.usage.cost,
    },
  });

  return {
    summary: {
      task: ctx.task,
      outcome: ctx.status,
      phases: extractPhaseOutcomes(events),
      totalCost: calculateTotalCost(events),
      totalDuration: calculateDuration(events),
      bounces: extractBounces(events),
    },
    learnings,
  };
}

// ─── Event summarization ───────────────────────────────────
function summarizeEvents(events: ForgeEvent[]): object {
  // Group by type
  const byType = events.reduce((acc, e) => {
    acc[e.type] = acc[e.type] || [];
    acc[e.type].push(e);
    return acc;
  }, {} as Record<string, ForgeEvent[]>);

  // Extract key data
  return {
    phaseTransitions: byType['phase.entered']?.map(e => e.payload),
    toolCalls: byType['tool.executed']?.length ?? 0,
    findings: byType['finding.detected']?.map(e => e.payload) ?? [],
    bounces: byType['loop.phase_bounce']?.map(e => e.payload) ?? [],
    errors: Object.keys(byType)
      .filter(k => k.includes('error'))
      .flatMap(k => byType[k]),
    humanInteractions: byType['gate.requested']?.length ?? 0,
  };
}

4.4 Reflection Prompt (Full Text)

typescript
// ─── src/orchestrator/prompts.ts ──────────────────────────
const RUN_REFLECTION_PROMPT = `
You are a reflection engine analyzing an agentic SDLC pipeline run.

Your job is to extract learnings from the execution trace. Learnings should be:
- **Specific**: Tied to concrete context (not generic advice)
- **Actionable**: Future runs can apply this knowledge
- **Grounded**: Based on actual evidence from the trace

## Learning Types

**Episodic** — What happened in this specific run
- Example: "PR #42 review missed a null check in auth.ts:102"
- Use when: Recording a specific event for future reference

**Semantic** — Generalized patterns or facts
- Example: "Auth endpoints in this repo use JWT with 24h expiry"
- Use when: Extracting a pattern that applies across runs

**Procedural** — Strategies or "how to" knowledge
- Example: "When tests fail with mock errors, add clearAllMocks() in beforeEach"
- Use when: Identifying a strategy that worked or should be avoided

## Output Format

Respond with a JSON array of learnings:

[
  {
    "type": "episodic" | "semantic" | "procedural",
    "content": "Human-readable insight",
    "context": "When is this relevant? (e.g., 'reviewing auth code', 'testing with mocks')",
    "confidence": 0.5-1.0,  // How sure are you this is valuable?
    "tags": ["auth", "testing", "typescript"]  // For retrieval
  }
]

## Guidelines

- Extract 3-7 learnings per run (quality > quantity)
- Higher confidence (0.8+) for learnings with clear evidence
- Lower confidence (0.5-0.6) for tentative hypotheses
- Tag learnings with relevant domain/tech terms for retrieval
- If nothing notable happened, return empty array

## Analysis Instructions

1. Identify what went well (patterns to repeat)
2. Identify what went poorly (patterns to avoid)
3. Identify surprises (unexpected events)
4. Identify inefficiencies (wasted effort, bounces)
5. Extract actionable insights from each finding
`;

4.5 Learning Classification

How learnings are classified and stored:

typescript
// ─── src/memory/classification.ts ─────────────────────────
interface LearningClassifier {
  classify(
    content: string,
    context: string
  ): Promise<'episodic' | 'semantic' | 'procedural'>;
}

class HeuristicClassifier implements LearningClassifier {
  async classify(content: string, context: string): Promise<LearningType> {
    // Episodic indicators
    if (
      content.includes('PR #') ||
      content.includes('commit') ||
      content.match(/\d{4}-\d{2}-\d{2}/)  // Date
    ) {
      return 'episodic';
    }

    // Procedural indicators
    if (
      content.includes('when') ||
      content.includes('if') ||
      content.includes('strategy') ||
      content.includes('approach')
    ) {
      return 'procedural';
    }

    // Semantic (default)
    return 'semantic';
  }
}

4.6 Reflection Cost Budget

Reflection itself costs tokens — manage this:

typescript
// ─── src/orchestrator/reflection-budget.ts ────────────────
interface ReflectionBudget {
  maxCostPerReflection: number;  // USD
  maxCostPerRun: number;         // % of total run cost

  canAffordReflection(
    runCost: number,
    reflectionCost: number
  ): boolean {
    return (
      reflectionCost <= this.maxCostPerReflection &&
      reflectionCost <= runCost * this.maxCostPerRun
    );
  }
}

const DEFAULT_REFLECTION_BUDGET: ReflectionBudget = {
  maxCostPerReflection: 0.50,  // $0.50
  maxCostPerRun: 0.10,         // 10% of run cost
};

// Use a cheaper model for reflection
async function reflectOnRun(
  traceId: string,
  ctx: PipelineContext
): Promise<RunReflection> {
  const llm = ctx.llm.fastModel;  // Use Haiku, not Sonnet

  const reflection = await llm.chat({
    system: RUN_REFLECTION_PROMPT,
    // ...
  });

  // Check budget
  if (!ctx.reflectionBudget.canAffordReflection(ctx.cost.total, reflection.cost)) {
    console.warn('Reflection exceeded budget, skipping storage');
    return { summary, learnings: [] };
  }

  // Continue with storage...
}

4.7 Integration Timeline

Week 2: Basic run reflection (post-run only) Week 3: Mid-run reflection triggers (bounces, cost) Week 6: Reflection budget enforcement, metrics

5. Loop 4: Human Feedback Loop

5.1 What It Solves

The system learns from human behavior — both explicit feedback (dismissals, approvals) and implicit signals (which suggestions get applied, timing of dismissals).

5.2 Explicit Feedback Handlers

typescript
// ─── src/feedback/handlers.ts ─────────────────────────────

// ── Finding Dismissal ───────────────────────────────────────
async function onFindingDismissed(
  findingId: string,
  reason?: string,
  ctx: FeedbackContext
): Promise<void> {
  const finding = await ctx.db
    .select()
    .from(findings)
    .where(eq(findings.id, findingId))
    .get();

  if (!finding) return;

  // Mark as dismissed
  await ctx.db
    .update(findings)
    .set({
      dismissed: true,
      dismissedBy: reason ?? 'No reason provided',
    })
    .where(eq(findings.id, findingId));

  // Decrease confidence in related patterns
  const relatedPatterns = await ctx.memory.recall({
    context: `review finding: ${finding.category} in ${finding.file}`,
    type: 'semantic',
    limit: 5,
  });

  for (const pattern of relatedPatterns) {
    await ctx.memory.update(pattern.id, {
      confidence: Math.max(0, pattern.confidence - 0.2),  // Significant penalty
    });

    ctx.bus.emit({
      type: 'feedback.confidence_decreased',
      source: 'feedback-loop',
      traceId: ctx.traceId,
      payload: {
        memoryId: pattern.id,
        oldConfidence: pattern.confidence,
        newConfidence: pattern.confidence - 0.2,
        reason: 'finding_dismissed',
      },
    });
  }

  // Store dismissal as learning
  await ctx.memory.store({
    type: 'semantic',
    content: `Finding "${finding.message}" was dismissed by human. ${reason ?? ''}`,
    context: `reviewing ${finding.category} issues in ${finding.file}`,
    confidence: 0.7,  // Human-sourced = higher confidence
    source: `finding:${findingId}`,
    tags: [finding.category, finding.file ?? 'unknown'],
  });

  ctx.bus.emit({
    type: 'feedback.finding_dismissed',
    source: 'feedback-loop',
    traceId: ctx.traceId,
    payload: { findingId, reason },
  });
}

// ── Human Approved With Edits ──────────────────────────────
async function onHumanApprovedWithEdits(
  gateId: string,
  edits: string,
  ctx: FeedbackContext
): Promise<void> {
  // Learn from the edits
  await ctx.memory.store({
    type: 'procedural',
    content: `Human approved but made edits: ${edits}`,
    context: `gate approval at ${gateId}`,
    confidence: 0.8,  // High confidence, human-sourced
    source: `gate:${gateId}`,
    tags: ['human-edit', 'approval'],
  });

  // Analyze what changed
  const editAnalysis = await ctx.llm.chat({
    system: `Analyze what the human changed and why.`,
    messages: [{
      role: 'user',
      content: `The AI suggested an action, and the human approved it but made these edits:\n\n${edits}\n\nWhat pattern can we learn from this?`,
    }],
  });

  // Store the pattern
  await ctx.memory.store({
    type: 'procedural',
    content: editAnalysis.content,
    context: `gate approval edits`,
    confidence: 0.7,
    source: `gate:${gateId}`,
    tags: ['human-edit', 'preference'],
  });

  ctx.bus.emit({
    type: 'feedback.human_edited',
    source: 'feedback-loop',
    traceId: ctx.traceId,
    payload: { gateId, edits },
  });
}

// ── Suggested Fix Applied ──────────────────────────────────
async function onSuggestedFixApplied(
  findingId: string,
  ctx: FeedbackContext
): Promise<void> {
  const finding = await ctx.db
    .select()
    .from(findings)
    .where(eq(findings.id, findingId))
    .get();

  if (!finding || !finding.fix) return;

  // Boost confidence in the pattern that generated this fix
  const relatedPatterns = await ctx.memory.recall({
    context: `fix for ${finding.category} in ${finding.file}`,
    type: 'procedural',
    limit: 5,
  });

  for (const pattern of relatedPatterns) {
    await ctx.memory.update(pattern.id, {
      confidence: Math.min(1.0, pattern.confidence + 0.1),  // Modest boost
    });

    ctx.bus.emit({
      type: 'feedback.confidence_increased',
      source: 'feedback-loop',
      traceId: ctx.traceId,
      payload: {
        memoryId: pattern.id,
        oldConfidence: pattern.confidence,
        newConfidence: pattern.confidence + 0.1,
        reason: 'fix_applied',
      },
    });
  }

  ctx.bus.emit({
    type: 'feedback.fix_applied',
    source: 'feedback-loop',
    traceId: ctx.traceId,
    payload: { findingId },
  });
}

// ── Suggested Fix Rewritten ────────────────────────────────
async function onSuggestedFixRewritten(
  findingId: string,
  humanVersion: string,
  ctx: FeedbackContext
): Promise<void> {
  const finding = await ctx.db
    .select()
    .from(findings)
    .where(eq(findings.id, findingId))
    .get();

  if (!finding || !finding.fix) return;

  // Learn the preference difference
  await ctx.memory.store({
    type: 'procedural',
    content: `AI suggested: ${finding.fix}\nHuman preferred: ${humanVersion}`,
    context: `fixing ${finding.category} in ${finding.file}`,
    confidence: 0.8,  // High confidence from human feedback
    source: `finding:${findingId}`,
    tags: [finding.category, 'human-preference'],
  });

  // Analyze the difference
  const analysis = await ctx.llm.chat({
    system: `Compare AI and human fixes to extract preference patterns.`,
    messages: [{
      role: 'user',
      content: `AI fix: ${finding.fix}\nHuman fix: ${humanVersion}\n\nWhat pattern preference can we learn?`,
    }],
  });

  await ctx.memory.store({
    type: 'procedural',
    content: analysis.content,
    context: `${finding.category} fix preferences`,
    confidence: 0.7,
    source: `finding:${findingId}`,
    tags: [finding.category, 'preference', 'pattern'],
  });

  ctx.bus.emit({
    type: 'feedback.fix_rewritten',
    source: 'feedback-loop',
    traceId: ctx.traceId,
    payload: { findingId, humanVersion },
  });
}

// ── Human Added Comment ─────────────────────────────────────
async function onHumanAddedComment(
  pr: string,
  comment: string,
  ctx: FeedbackContext
): Promise<void> {
  // This is a gap the AI didn't catch
  await ctx.memory.store({
    type: 'semantic',
    content: `Human added comment on PR ${pr}: ${comment}`,
    context: `code review gaps`,
    confidence: 0.7,
    source: `pr:${pr}`,
    tags: ['review', 'gap', 'human-addition'],
  });

  // Analyze what the gap was
  const analysis = await ctx.llm.chat({
    system: `Identify what the AI reviewer missed.`,
    messages: [{
      role: 'user',
      content: `The AI reviewed a PR but didn't comment on this. A human did:\n\n"${comment}"\n\nWhat category of issue did the AI miss?`,
    }],
  });

  await ctx.memory.store({
    type: 'procedural',
    content: `Review gap identified: ${analysis.content}`,
    context: `code review completeness`,
    confidence: 0.7,
    source: `pr:${pr}`,
    tags: ['review', 'gap', 'improvement'],
  });

  ctx.bus.emit({
    type: 'feedback.human_comment_added',
    source: 'feedback-loop',
    traceId: ctx.traceId,
    payload: { pr, comment },
  });
}

5.3 Implicit Feedback Signals

Track signals that aren't explicit feedback but reveal preferences:

typescript
// ─── src/feedback/implicit.ts ─────────────────────────────
interface ImplicitSignal {
  type: string;
  value: number;  // Normalized 0-1 (positive signal)
  context: string;
}

// ── Dismissal Timing ────────────────────────────────────────
function onDismissalTiming(
  findingId: string,
  timeToDecisionMs: number
): ImplicitSignal {
  // Fast dismissal (< 5s) = obviously wrong
  // Slow dismissal (> 60s) = was worth considering

  const confidence = Math.min(1.0, timeToDecisionMs / 60000);  // 60s = 1.0

  return {
    type: 'dismissal_timing',
    value: confidence,
    context: `finding:${findingId}`,
  };
}

// ── Apply vs Ignore Rate ───────────────────────────────────
async function calculateSuggestionApplyRate(
  category: string,
  timeWindow: number = 30 * 24 * 60 * 60 * 1000  // 30 days
): Promise<number> {
  const recentFindings = await db
    .select()
    .from(findings)
    .where(
      and(
        eq(findings.category, category),
        gte(findings.createdAt, Date.now() - timeWindow)
      )
    );

  const suggested = recentFindings.filter(f => f.fixable);
  const applied = suggested.filter(f => !f.dismissed);

  return suggested.length > 0 ? applied.length / suggested.length : 0;
}

// Update memory confidence based on apply rate
async function adjustConfidenceByApplyRate(
  category: string,
  ctx: FeedbackContext
): Promise<void> {
  const applyRate = await calculateSuggestionApplyRate(category);

  const relatedPatterns = await ctx.memory.recall({
    context: category,
    type: 'procedural',
    limit: 10,
  });

  for (const pattern of relatedPatterns) {
    // Apply rate < 0.5 → decrease confidence
    // Apply rate > 0.7 → increase confidence
    const adjustment = (applyRate - 0.6) * 0.1;  // -0.1 to +0.1

    await ctx.memory.update(pattern.id, {
      confidence: Math.max(0, Math.min(1, pattern.confidence + adjustment)),
    });
  }
}

5.4 Integration Timeline

Week 3: Finding dismissal tracking and confidence penalty Week 6: Suggested fix tracking (apply vs rewrite) Post-MVP: Full implicit signal tracking and analysis

6. Loop 5: Production Loop (Interface Only, Post-MVP Implementation)

6.1 What It Will Do

After deployment, production metrics feed back into the system as new tasks or pattern updates. This loop closes the full cycle: code → deploy → monitor → learn → improve.

MVP Scope: Design the interface, stub the implementation, ship it later.

6.2 Production Feedback Interface

typescript
// ─── src/feedback/production.ts ───────────────────────────
interface ProductionFeedback {
  // Monitor detects anomaly correlated with recent deploy
  onAnomalyDetected(anomaly: {
    metric: string;           // "error_rate", "latency_p95"
    baseline: number;
    current: number;
    deploymentId: string;     // Which deploy caused this?
    severity: 'warning' | 'critical';
  }): Promise<void>;

  // Error report from production maps back to a code change
  onProductionError(error: {
    stack: string;
    frequency: number;
    firstSeen: Date;
    affectedUsers: number;
    relatedCommit?: string;   // Git blame correlation
  }): Promise<void>;

  // Health check failure
  onHealthCheckFailure(check: {
    endpoint: string;
    expected: number;  // Response code
    actual: number;
    deploymentId: string;
  }): Promise<void>;
}

// ─── MVP Implementation (Stubs) ────────────────────────────
class ProductionFeedbackHandler implements ProductionFeedback {
  async onAnomalyDetected(anomaly: AnomalyDetected): Promise<void> {
    // Log the event
    ctx.bus.emit({
      type: 'production.anomaly_detected',
      source: 'production-monitor',
      traceId: anomaly.deploymentId,
      payload: anomaly,
    });

    // TODO (post-MVP): Create a new task in the pipeline
    // "Investigate production error spike after deploy X"
    console.warn('Production anomaly detected (monitoring only):', anomaly);
  }

  async onProductionError(error: ProductionError): Promise<void> {
    ctx.bus.emit({
      type: 'production.error_detected',
      source: 'production-monitor',
      traceId: error.relatedCommit ?? 'unknown',
      payload: error,
    });

    // TODO (post-MVP): Correlate with recent changes, auto-file issue
    console.error('Production error detected (monitoring only):', error);
  }

  async onHealthCheckFailure(check: HealthCheckFailure): Promise<void> {
    ctx.bus.emit({
      type: 'production.health_check_failed',
      source: 'production-monitor',
      traceId: check.deploymentId,
      payload: check,
    });

    // TODO (post-MVP): Auto-rollback or create incident task
    console.error('Health check failed (monitoring only):', check);
  }
}

6.3 Future Implementation Plan

Post-MVP, the production loop will:

Receive signals from production monitoring (Datadog, Sentry, etc.)
Correlate errors with recent deployments (git blame + deploy time)
Create tasks automatically ("Fix production error in auth.ts:102")
Feed learnings into memory (error patterns, regression patterns)
Trigger rollbacks if health checks fail critically

6.4 Integration Timeline

Week 6: Interface design, event logging (stubs) Post-MVP (Q3): Full implementation with monitoring integration

7. Feedback Metrics (Cross-Loop)

7.1 Metric Definitions

typescript
// ─── src/safety/metrics.ts ────────────────────────────────
interface FeedbackMetrics {
  // ── Inner Loop Health ──────────────────────────────────
  avgIterationsPerPhase: {
    planning: number;
    implementation: number;
    review: number;
    testing: number;
  };

  toolSuccessRate: {
    [toolName: string]: {
      successRate: number;
      avgDuration: number;
      avgCost: number;
    };
  };

  stagnationRate: number;  // % of runs that hit stagnation

  // ── Phase Loop Health ──────────────────────────────────
  avgBouncesPerRun: {
    review: number;
    testing: number;
  };

  bounceResolutionRate: {
    review: number;  // % of bounces that fixed issues
    testing: number;
  };

  firstPassApprovalRate: number;  // % approved without bounces

  // ── Run Loop Health ────────────────────────────────────
  learningsPerRun: number;           // Avg learnings extracted
  learningApplicationRate: number;   // % of runs that recalled learnings
  memoryPrecision: number;           // % of recalled memories that were relevant

  // ── Human Loop Health ──────────────────────────────────
  findingDismissalRate: number;      // % of findings dismissed
  humanOverrideRate: number;         // % of decisions overridden
  timeToHumanResponse: number;       // Avg time for human to respond (ms)

  // ── Cross-Run Improvement ──────────────────────────────
  costPerRun: number;                // Trending down = efficiency improving
  successRateOverTime: number;       // Trending up = system learning
  firstPassApprovalRate: number;     // % of reviews approved without bounces
}

7.2 Metric Calculation

typescript
// ─── src/safety/metrics-calculator.ts ─────────────────────
class MetricsCalculator {
  async calculateFeedbackMetrics(
    timeWindow: number = 7 * 24 * 60 * 60 * 1000  // 7 days
  ): Promise<FeedbackMetrics> {
    const runs = await this.getRecentRuns(timeWindow);

    return {
      // Inner loop
      avgIterationsPerPhase: await this.calcAvgIterations(runs),
      toolSuccessRate: await this.calcToolSuccessRate(runs),
      stagnationRate: await this.calcStagnationRate(runs),

      // Phase loop
      avgBouncesPerRun: await this.calcAvgBounces(runs),
      bounceResolutionRate: await this.calcBounceResolution(runs),
      firstPassApprovalRate: await this.calcFirstPassApproval(runs),

      // Run loop
      learningsPerRun: await this.calcLearningsPerRun(runs),
      learningApplicationRate: await this.calcLearningApplication(runs),
      memoryPrecision: await this.calcMemoryPrecision(runs),

      // Human loop
      findingDismissalRate: await this.calcFindingDismissalRate(runs),
      humanOverrideRate: await this.calcHumanOverrideRate(runs),
      timeToHumanResponse: await this.calcTimeToResponse(runs),

      // Cross-run
      costPerRun: await this.calcAvgCostPerRun(runs),
      successRateOverTime: await this.calcSuccessRateTrend(runs),
    };
  }

  private async calcAvgIterations(
    runs: Run[]
  ): Promise<Record<PhaseName, number>> {
    const iterationsByPhase: Record<PhaseName, number[]> = {
      planning: [],
      implementation: [],
      review: [],
      testing: [],
    };

    for (const run of runs) {
      const events = await bus.replay(run.id);
      const iterationEvents = events.filter(e => e.type === 'agent.iteration');

      for (const phase of Object.keys(iterationsByPhase)) {
        const phaseIterations = iterationEvents.filter(
          e => e.payload.phase === phase
        );
        iterationsByPhase[phase].push(phaseIterations.length);
      }
    }

    return Object.entries(iterationsByPhase).reduce((acc, [phase, counts]) => {
      acc[phase] = counts.reduce((sum, c) => sum + c, 0) / counts.length;
      return acc;
    }, {} as Record<PhaseName, number>);
  }

  // Similar implementations for other metrics...
}

7.3 Metrics Dashboard (CLI)

typescript
// ─── src/cli/commands/metrics.ts ──────────────────────────
async function showFeedbackMetrics(options: { days?: number }): Promise<void> {
  const calculator = new MetricsCalculator();
  const metrics = await calculator.calculateFeedbackMetrics(
    (options.days ?? 7) * 24 * 60 * 60 * 1000
  );

  console.log(`
╔══════════════════════════════════════════════════════════════╗
║              FEEDBACK LOOP HEALTH (Last ${options.days ?? 7} days)              ║
╚══════════════════════════════════════════════════════════════╝

┌─ INNER LOOP ───────────────────────────────────────────────┐
│ Avg Iterations per Phase                                   │
│   Planning:        ${metrics.avgIterationsPerPhase.planning.toFixed(1).padEnd(8)} │
│   Implementation:  ${metrics.avgIterationsPerPhase.implementation.toFixed(1).padEnd(8)} │
│   Review:          ${metrics.avgIterationsPerPhase.review.toFixed(1).padEnd(8)} │
│   Testing:         ${metrics.avgIterationsPerPhase.testing.toFixed(1).padEnd(8)} │
│ Tool Success Rate: ${(metrics.toolSuccessRate['all'] * 100).toFixed(1)}%        │
└────────────────────────────────────────────────────────────┘

┌─ PHASE LOOP ───────────────────────────────────────────────┐
│ Avg Bounces per Run                                        │
│   Review:  ${metrics.avgBouncesPerRun.review.toFixed(2).padEnd(8)}                │
│   Testing: ${metrics.avgBouncesPerRun.testing.toFixed(2).padEnd(8)}                │
│ Bounce Resolution Rate: ${(metrics.bounceResolutionRate.review * 100).toFixed(1)}%  │
│ First Pass Approval:    ${(metrics.firstPassApprovalRate * 100).toFixed(1)}%        │
└────────────────────────────────────────────────────────────┘

┌─ RUN LOOP ─────────────────────────────────────────────────┐
│ Learnings per Run:        ${metrics.learningsPerRun.toFixed(1).padEnd(8)}       │
│ Learning Application:     ${(metrics.learningApplicationRate * 100).toFixed(1)}%  │
│ Memory Precision:         ${(metrics.memoryPrecision * 100).toFixed(1)}%          │
└────────────────────────────────────────────────────────────┘

┌─ HUMAN LOOP ───────────────────────────────────────────────┐
│ Finding Dismissal Rate:   ${(metrics.findingDismissalRate * 100).toFixed(1)}%     │
│ Human Override Rate:      ${(metrics.humanOverrideRate * 100).toFixed(1)}%        │
│ Avg Response Time:        ${(metrics.timeToHumanResponse / 60000).toFixed(1)} min │
└────────────────────────────────────────────────────────────┘

┌─ CROSS-RUN TRENDS ─────────────────────────────────────────┐
│ Avg Cost per Run:        $${metrics.costPerRun.toFixed(2).padEnd(8)}      │
│ Success Rate:             ${(metrics.successRateOverTime * 100).toFixed(1)}%      │
└────────────────────────────────────────────────────────────┘
  `);
}

8. Feedback Store (Database Schema)

8.1 Additional Tables Needed

The feedback loops require additional tables beyond the core schema:

typescript
// ─── src/memory/schema.ts (additions) ─────────────────────
import { sqliteTable, text, integer, real } from 'drizzle-orm/sqlite-core';

// ─── Feedback Events ───────────────────────────────────────
export const feedbackEvents = sqliteTable('feedback_events', {
  id: text('id').primaryKey(),
  traceId: text('trace_id').notNull(),
  type: text('type').notNull(),  // dismissal, approval, edit, etc.
  findingId: text('finding_id'),  // If related to a finding
  gateId: text('gate_id'),        // If related to a gate
  humanId: text('human_id'),      // Who provided feedback
  content: text('content', { mode: 'json' }),  // Feedback details
  timestamp: integer('timestamp', { mode: 'timestamp_ms' }).notNull(),
});

// ─── Bounce History ────────────────────────────────────────
export const bounces = sqliteTable('bounces', {
  id: text('id').primaryKey(),
  runId: text('run_id').notNull(),
  phase: text('phase').notNull(),  // review | testing
  bounceNumber: integer('bounce_number').notNull(),
  findingsCount: integer('findings_count'),
  failuresCount: integer('failures_count'),
  resolved: integer('resolved', { mode: 'boolean' }).default(false),
  timestamp: integer('timestamp', { mode: 'timestamp_ms' }).notNull(),
});

// ─── Reflections ───────────────────────────────────────────
export const reflections = sqliteTable('reflections', {
  id: text('id').primaryKey(),
  runId: text('run_id').notNull(),
  trigger: text('trigger').notNull(),  // run_completed, cost_exceeded, etc.
  learningsCount: integer('learnings_count').notNull(),
  costUsd: real('cost_usd').notNull(),
  timestamp: integer('timestamp', { mode: 'timestamp_ms' }).notNull(),
});

// ─── Metrics Snapshots ─────────────────────────────────────
export const metricsSnapshots = sqliteTable('metrics_snapshots', {
  id: text('id').primaryKey(),
  timestamp: integer('timestamp', { mode: 'timestamp_ms' }).notNull(),
  metrics: text('metrics', { mode: 'json' }).notNull(),  // Full FeedbackMetrics object
});

8.2 Query Patterns

Common queries for feedback analysis:

typescript
// ─── src/memory/queries.ts ────────────────────────────────

// Get all bounces for a run
async function getRunBounces(runId: string): Promise<Bounce[]> {
  return db
    .select()
    .from(bounces)
    .where(eq(bounces.runId, runId))
    .orderBy(bounces.timestamp);
}

// Get dismissal rate for a category over time
async function getDismissalRateTrend(
  category: string,
  timeWindow: number
): Promise<{ date: Date; rate: number }[]> {
  const findings = await db
    .select()
    .from(findings)
    .where(
      and(
        eq(findings.category, category),
        gte(findings.createdAt, Date.now() - timeWindow)
      )
    );

  // Group by day, calculate rate
  const byDay = groupByDay(findings);
  return byDay.map(day => ({
    date: day.date,
    rate: day.findings.filter(f => f.dismissed).length / day.findings.length,
  }));
}

// Get most effective learnings (high access count, high confidence)
async function getTopLearnings(limit: number = 10): Promise<Memory[]> {
  return db
    .select()
    .from(memories)
    .where(gte(memories.confidence, 0.7))
    .orderBy(desc(memories.accessCount))
    .limit(limit);
}

// Get runs with diminishing returns
async function getRunsWithDiminishingReturns(): Promise<Run[]> {
  const runs = await db
    .select()
    .from(runs)
    .where(eq(runs.status, 'completed'));

  const runsWithDR = [];
  for (const run of runs) {
    const events = await bus.replay(run.id);
    const drEvents = events.filter(e => e.type === 'loop.diminishing_returns');
    if (drEvents.length > 0) {
      runsWithDR.push(run);
    }
  }

  return runsWithDR;
}

9. Integration Timeline (Exact Mapping to Build Order)

This maps each feedback loop piece to the exact week it ships:

Week 1 ── Core + Inner Loop
├── Event bus (emit, subscribe, replay)
├── Events table (SQLite)
├── Bus replay functionality
├── Tool result → working memory flow (already in base agent)
└── Inner loop metrics tracking

Week 2 ── Memory + Run Loop Foundation
├── Memory store (CRUD + similarity search)
├── Learning classification (episodic/semantic/procedural)
├── Run reflection (basic, post-run only)
├── RUN_REFLECTION_PROMPT
├── Learnings storage
└── Reflection budget enforcement

Week 3 ── Reviewer + Phase Loop (Review Bounce)
├── Review bounce logic (review → fix → re-review)
├── Finding dismissal tracking
├── onFindingDismissed() handler
├── Confidence penalty on dismissal
├── PhaseLoopConfig type
└── Review bounce metrics

Week 4 ── Tester + Phase Loop (Test Bounce)
├── Test bounce logic (test → fix → re-test)
├── FailureAnalysis type
├── Test failure → fix → re-test flow
├── Failure pattern memory
└── Test bounce metrics

Week 5 ── Planner + Implementer
├── Implementer fix mode (separate from fresh implementation)
├── IMPLEMENTER_FIX_PROMPT
├── IMPLEMENTER_FIX_FAILURES_PROMPT
└── Fix quality tracking

Week 6 ── Orchestrator + Full Phase Loop
├── runPipelineWithBounces() — full implementation
├── Diminishing returns detection
├── Mid-run reflection triggers (bounces, cost, error rate)
├── Feedback metrics calculation
├── Metrics dashboard (CLI)
├── onSuggestedFixApplied() handler
├── onSuggestedFixRewritten() handler
└── Production loop interface (stubs only)

Week 7 ── Polish
├── CLI metrics command
├── Metrics snapshots (periodic storage)
├── Feedback visualization
└── Documentation

Post-MVP (Q3+)
├── Production monitoring integration
├── Full production loop implementation
├── Implicit signal tracking (apply rate, timing, etc.)
├── Meta-reflection (reflect on reflection quality)
└── Cross-run pattern consolidation

10. Acceptance Criteria

The feedback loops are complete when:

Loop 1 (Inner Loop)

Tool results flow into next iteration's working memory
Inner loop metrics (iterations, tool success rate) are tracked
Stagnation detection triggers reflection

Loop 2 (Phase Loop)

Review findings bounce back to implementer with structured feedback
Test failures bounce back to implementer with root cause analysis
Implementer prompt changes based on fix mode
Diminishing returns detection halts unproductive bounces
Bounce history is persisted and queryable

Loop 3 (Run Loop)

Post-run reflection extracts 3-7 learnings per run
Learnings are classified (episodic/semantic/procedural)
Mid-run reflection triggers on bounces, cost, error rate
Reflection cost budget is enforced
Learnings are stored in memory with appropriate confidence

Loop 4 (Human Loop)

Finding dismissal decreases pattern confidence
Suggested fix application increases pattern confidence
Human edits are learned as preferences
Gap detection when human adds comments AI missed
All human feedback is logged as events

Loop 5 (Production Loop)

ProductionFeedback interface is defined
Stub implementations log events
Post-MVP plan is documented

Cross-Loop Metrics

All 15+ feedback metrics are calculable
Metrics dashboard shows trends over 7/30/90 days
Metrics snapshots are stored periodically
CLI command forge metrics shows feedback health

11. Example: Full Feedback Flow

Here's how all loops work together in one run:

1. User runs: forge run "add auth"

2. PLANNING PHASE
   - Planner recalls memories about auth (Run Loop 3)
   - Finds procedural memory: "Use JWT with 24h expiry" (confidence: 0.8)
   - Creates plan

3. IMPLEMENTATION PHASE
   - Implementer generates code
   - Uses recalled JWT pattern
   - Each tool call feeds back into next iteration (Inner Loop 1)
   - 8 iterations total (metrics tracked)

4. REVIEW PHASE (first pass)
   - Reviewer finds 3 issues:
     - Critical: SQL injection vulnerability
     - Error: Missing error handling
     - Warning: Inconsistent naming

5. PHASE BOUNCE #1 (Loop 2)
   - Critical + Error findings → bounce back to implementation
   - Implementer receives structured feedback
   - Prompt switches to fix mode
   - Implements fixes

6. REVIEW PHASE (second pass)
   - Critical issue fixed ✓
   - Error issue fixed ✓
   - Warning issue remains (but not critical)
   - Approved

7. TESTING PHASE (first pass)
   - 12 tests run, 2 fail
   - Tester analyzes root cause: "Mock not cleared between tests"
   - Suggests fix: "Add clearAllMocks() in beforeEach"

8. PHASE BOUNCE #2 (Loop 2)
   - Implementer receives failure analysis
   - Applies suggested fix
   - Re-runs tests → all pass ✓

9. DEPLOYMENT PHASE
   - Human gate: production deploy requires approval
   - Human reviews, approves with minor edit (Loop 4)
   - System learns: "Human prefers explicit type annotations" (confidence: 0.8)

10. POST-RUN REFLECTION (Loop 3)
    - Replay 147 events from this run
    - Extract 5 learnings:
      - Episodic: "Run #42 fixed auth SQL injection"
      - Semantic: "Auth endpoints should validate JWT expiry"
      - Procedural: "When mock tests fail, add clearAllMocks()"
      - Semantic: "Human prefers explicit types in auth code"
      - Procedural: "Review bounced once on error handling — add earlier"
    - Store learnings with confidence 0.5-0.8

11. NEXT RUN (1 day later)
    - User runs: forge run "add password reset"
    - Planner recalls JWT pattern (confidence now 0.9, reinforced)
    - Recalls error handling lesson → plans it upfront
    - Recalls mock clearing pattern → implementer applies it
    - First pass approval (no bounces) — system is learning ✓

12. HUMAN DISMISSES A FINDING (Loop 4)
    - Week later, human dismisses a "missing null check" finding
    - System decreases confidence in related pattern from 0.7 → 0.5
    - Stores dismissal as learning for future

13. PRODUCTION MONITORING (Loop 5, post-MVP)
    - Deploy goes live
    - Monitoring detects latency spike
    - Correlates with recent auth change
    - Creates new task: "Investigate auth endpoint latency"
    - Feeds into next planning cycle

12. Open Questions & Future Work

Resolved in This Plan

✓ How tool results feed back into reasoning (Inner Loop)
✓ How review/test failures bounce back to implementer (Phase Loop)
✓ How to detect diminishing returns in bounce cycles
✓ What triggers run reflection (events, not just completion)
✓ How human feedback adjusts confidence scores

Deferred to Post-MVP

How to weight recent vs. historical learnings? (Recency decay)
When to consolidate similar learnings into patterns?
How to share learnings across agent instances/teams?
How to validate that reflection is extracting useful insights?
What's the optimal reflection cost budget?

This plan provides complete implementations for Loops 1-4, interface design for Loop 5, and exact integration into the 8-week build schedule. Feedback loops are the nervous system — they ship with Week 1 foundations and complete by Week 6.