Implementation Plan: Agent Designs (Section 7)

Plan: 07-agent-designs.md
Date: 2026-02-07
Scope: Detailed implementation specifications for all five specialized agents in the Forge system
Dependencies: Core abstractions (types.ts, base.ts), Tool Layer, Memory Layer


Overview

This plan provides complete implementation specifications for the five core agents in the Forge pipeline. Each agent extends BaseAgent and follows the perceive → reason → act → learn loop, with specialized prompts, tools, and decision logic.
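
The plan references Tool, AgentContext, Memory, and BaseAgent from the core abstractions without restating them. As a reading aid, here is a minimal sketch of the assumed shapes — names and fields are inferred from how the agents below use them, not taken from the actual types.ts/base.ts:

```typescript
// Hypothetical sketch of the core abstractions this plan builds on.
// Field names are inferred from usage in the agent specs below.
import { z } from 'zod';

interface Memory {
  content: string;
  confidence: number; // 0-1
}

interface Tool {
  name: string;
  description: string;
  schema: { input: z.ZodTypeAny; output: z.ZodTypeAny };
  execute(input: any, ctx: AgentContext): Promise<any>;
}

// Partial: later sections also use ctx.tools, ctx.gates, and ctx.config.
interface AgentContext {
  llm: {
    chat(req: {
      system: string;
      messages: { role: 'user' | 'assistant'; content: string }[];
      temperature?: number;
    }): Promise<{ content: string }>;
  };
  shell: { exec(cmd: string): Promise<{ exitCode: number; stdout: string; stderr: string }> };
  memory: { recall(q: { type: string; context: string; limit: number }): Promise<Memory[]> };
  bus: { emit(event: { type: string; payload: unknown }): void };
}

abstract class BaseAgent {
  abstract type: string;
  abstract tools: Tool[];
  abstract systemPrompt: string;

  // The shared loop: recall memories (perceive), buildPrompt + LLM turns
  // (reason), dispatch tool calls (act), reflect on completion (learn).
  async execute(input: unknown, ctx: AgentContext): Promise<unknown> {
    throw new Error('sketch only — see base.ts');
  }

  protected abstract buildPrompt(input: unknown, memories: Memory[]): string;
}
```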

Agents covered:

  1. Planner Agent — Requirements → Architecture + Task List
  2. Implementer Agent — Tasks → Code + Tests
  3. Reviewer Agent — Code → Findings + Risk Score
  4. Tester Agent — Code → Test Results + Analysis
  5. Deployer Agent — Validated Code → Deployment

Each agent specification includes:

  • Full system prompt text
  • Available tools
  • Input/output types
  • Decision flow logic
  • Class implementation extending BaseAgent

1. Planner Agent

1.1 Purpose

Transforms natural language requirements into structured implementation plans with architecture decisions, task decomposition, and risk assessment.

1.2 System Prompt

````markdown
# ROLE
You are a senior software architect creating implementation plans from requirements.

# TASK
Given a task description and codebase context, you will:
1. Analyze the requirements and decompose them into implementable units
2. Design system architecture and component interfaces
3. Create an ordered task list with dependencies
4. Assess implementation risk and complexity
5. Estimate effort and identify blockers

# AVAILABLE TOOLS
- read_file(path): Read file contents
- glob(pattern): Find files matching pattern
- grep(pattern, path?): Search for text in files
- llm_analyze(text, question): Ask follow-up questions to an LLM

# ARCHITECTURE DECISIONS
When designing architecture:
- Prefer existing patterns in the codebase
- Minimize new dependencies
- Consider testability and maintainability
- Document trade-offs explicitly
- Flag breaking changes

# RISK ASSESSMENT
Risk level is determined by:
- **Low**: Bug fix, UI tweak, refactor of non-critical code
- **Medium**: New feature, moderate complexity, some integration needed
- **High**: Core system changes, breaking changes, security-sensitive
- **Critical**: Authentication, payment processing, data integrity

Risk score components:
- Complexity (cyclomatic, cognitive)
- Criticality (affects core business logic)
- Change size (lines of code)
- Test coverage (existing vs. new)
- Uncertainty (novel patterns, unclear requirements)

# OUTPUT FORMAT
Return a structured ImplementationPlan with:
```typescript
{
  architecture: {
    components: Component[],     // What needs to be built/changed
    interfaces: InterfaceSpec[], // APIs, types, contracts
    decisions: ArchDecision[],   // Key decisions with rationale
    dependencies: string[]       // New packages needed
  },
  tasks: Task[],                 // Ordered list with dependencies
  risk: {
    level: 'low' | 'medium' | 'high' | 'critical',
    score: number,               // 0-100
    factors: { complexity, criticality, changeSize, coverage, uncertainty },
    mitigations: string[]        // How to reduce risk
  },
  estimates: {
    complexity: number,          // Story points or T-shirt size
    effort: number,              // Hours estimate
    uncertainty: 'low' | 'medium' | 'high'
  }
}
```

# MEMORY INTEGRATION
Before planning, query memory for:
- Similar tasks (context: task description)
- Architecture patterns (context: feature area)
- Past failures (context: similar requirements)

Apply relevant learnings to this plan.

# ITERATION
You may use tools multiple times to:
- Explore the codebase structure
- Understand existing patterns
- Validate assumptions
- Refine the plan

When done, return `{ done: true, result: implementationPlan }`.
````

1.3 Tools Available

```typescript
import { z } from 'zod';
import fs from 'node:fs/promises';
import { glob } from 'glob';
// Note: searchFiles (used by the grep tool below) is a project helper, not shown here.

const plannerTools: Tool[] = [
  {
    name: 'read_file',
    description: 'Read the contents of a file',
    schema: {
      input: z.object({ path: z.string() }),
      output: z.object({ content: z.string() })
    },
    execute: async ({ path }) => {
      return { content: await fs.readFile(path, 'utf-8') };
    }
  },
  {
    name: 'glob',
    description: 'Find files matching a glob pattern',
    schema: {
      input: z.object({ pattern: z.string() }),
      output: z.object({ files: z.array(z.string()) })
    },
    execute: async ({ pattern }) => {
      return { files: await glob(pattern) };
    }
  },
  {
    name: 'grep',
    description: 'Search for text in files',
    schema: {
      input: z.object({
        pattern: z.string(),
        path: z.string().optional()
      }),
      output: z.object({
        matches: z.array(z.object({
          file: z.string(),
          line: z.number(),
          content: z.string()
        }))
      })
    },
    execute: async ({ pattern, path }) => {
      return { matches: await searchFiles(pattern, path) };
    }
  },
  {
    name: 'llm_analyze',
    description: 'Ask a follow-up question to analyze context',
    schema: {
      input: z.object({ text: z.string(), question: z.string() }),
      output: z.object({ answer: z.string() })
    },
    execute: async ({ text, question }, ctx) => {
      const response = await ctx.llm.chat({
        system: 'You are a helpful assistant analyzing code and requirements.',
        messages: [{ role: 'user', content: `${text}\n\nQuestion: ${question}` }],
        temperature: 0.2
      });
      return { answer: response.content };
    }
  }
];
```

1.4 Input/Output Types

```typescript
interface PlannerInput {
  task: string; // Natural language description
  codebaseContext?: {
    rootPath: string;
    language: string;
    frameworks: string[];
    existingPatterns?: string[];
  };
}

interface ImplementationPlan {
  architecture: {
    components: Component[];
    interfaces: InterfaceSpec[];
    decisions: ArchDecision[];
    dependencies: string[];
  };
  tasks: Task[];
  risk: RiskAssessment;
  estimates: {
    complexity: number;
    effort: number;
    uncertainty: 'low' | 'medium' | 'high';
  };
}

interface Component {
  id: string;
  name: string;
  type: 'service' | 'library' | 'function' | 'module';
  responsibilities: string[];
  interfaces: string[];
  dependencies: string[];
}

interface InterfaceSpec {
  id: string;
  name: string;
  type: 'api' | 'function' | 'type' | 'event';
  signature: string;
  documentation: string;
}

interface ArchDecision {
  id: string;
  title: string;
  context: string;
  decision: string;
  consequences: string[];
  alternatives: string[];
}

interface Task {
  id: string;
  title: string;
  description: string;
  type: 'implement' | 'test' | 'document' | 'refactor';
  dependencies: string[];  // Task IDs that must complete first
  estimatedEffort: number; // Hours
  files: string[];         // Files expected to change
}

interface RiskAssessment {
  level: 'low' | 'medium' | 'high' | 'critical';
  score: number; // 0-100
  factors: {
    complexity: number;  // 0-20
    criticality: number; // 0-20
    changeSize: number;  // 0-20
    coverage: number;    // 0-20
    uncertainty: number; // 0-20
  };
  mitigations: string[];
}
```

1.5 Decision Flow

┌─────────────────────────────────────────┐
│ 1. PERCEIVE                             │
│   - Load task description               │
│   - Query memory for similar tasks      │
│   - Explore codebase structure          │
└──────────────┬──────────────────────────┘
               ▼
┌─────────────────────────────────────────┐
│ 2. REASON - Decompose Requirements      │
│   - Break down task into components     │
│   - Identify what exists vs. new        │
│   - Map to existing patterns            │
└──────────────┬──────────────────────────┘
               ▼
┌─────────────────────────────────────────┐
│ 3. REASON - Design Architecture         │
│   - Define component boundaries         │
│   - Specify interfaces                  │
│   - Make technology decisions           │
│   - Document trade-offs                 │
└──────────────┬──────────────────────────┘
               ▼
┌─────────────────────────────────────────┐
│ 4. REASON - Create Task List            │
│   - Order tasks by dependencies         │
│   - Estimate effort per task            │
│   - Identify critical path              │
└──────────────┬──────────────────────────┘
               ▼
┌─────────────────────────────────────────┐
│ 5. REASON - Assess Risk                 │
│   - Calculate risk score                │
│   - Identify risk factors               │
│   - Suggest mitigations                 │
└──────────────┬──────────────────────────┘
               ▼
┌─────────────────────────────────────────┐
│ 6. OUTPUT - Return ImplementationPlan   │
└─────────────────────────────────────────┘

1.6 Risk Scoring Logic

```typescript
function calculateRiskScore(plan: ImplementationPlan): number {
  // Each factor scored 0-20
  const factors = {
    complexity: scoreComplexity(plan),   // Cyclomatic + cognitive
    criticality: scoreCriticality(plan), // Business impact
    changeSize: scoreChangeSize(plan),   // Lines of code
    coverage: scoreCoverage(plan),       // Test coverage delta
    uncertainty: scoreUncertainty(plan)  // Novel patterns, unclear reqs
  };

  return Object.values(factors).reduce((a, b) => a + b, 0);
}

function scoreComplexity(plan: ImplementationPlan): number {
  // Estimate cyclomatic complexity from task descriptions
  const estimatedComplexity = plan.tasks.reduce((acc, task) => {
    if (task.description.includes('conditional') ||
        task.description.includes('loop') ||
        task.description.includes('switch')) {
      return acc + 5;
    }
    return acc + 1;
  }, 0);

  // Normalize to 0-20
  return Math.min(20, estimatedComplexity / plan.tasks.length);
}

function scoreCriticality(plan: ImplementationPlan): number {
  const criticalKeywords = [
    'auth', 'payment', 'security', 'data integrity',
    'user data', 'credentials', 'encryption'
  ];
  const taskText = plan.tasks.map(t => t.description).join(' ').toLowerCase();
  const matches = criticalKeywords.filter(kw => taskText.includes(kw)).length;
  return Math.min(20, matches * 5);
}

function scoreChangeSize(plan: ImplementationPlan): number {
  const totalFiles = new Set(plan.tasks.flatMap(t => t.files)).size;
  if (totalFiles < 3) return 2;
  if (totalFiles < 10) return 8;
  if (totalFiles < 20) return 14;
  return 20;
}

function scoreCoverage(plan: ImplementationPlan): number {
  // How much new code vs. existing code with tests
  const hasTests = plan.tasks.some(t => t.type === 'test');
  const hasNewCode = plan.tasks.some(t => t.type === 'implement');
  if (!hasNewCode) return 0;
  if (hasTests) return 5;
  return 15; // High risk if no tests planned
}

function scoreUncertainty(plan: ImplementationPlan): number {
  return plan.estimates.uncertainty === 'high' ? 15 :
         plan.estimates.uncertainty === 'medium' ? 8 : 3;
}

function determineRiskLevel(score: number): RiskLevel {
  if (score < 20) return 'low';
  if (score < 50) return 'medium';
  if (score < 70) return 'high';
  return 'critical';
}
```
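
As a quick sanity check, here is how the scoring plays out on a small hypothetical two-task plan (the plan object is illustrative only):

```typescript
// Illustrative only: a hypothetical plan exercising the scoring functions above.
const examplePlan = {
  tasks: [
    { id: 't1', title: 'Add auth middleware', description: 'Add conditional auth check',
      type: 'implement', dependencies: [], estimatedEffort: 4, files: ['src/auth.ts'] },
    { id: 't2', title: 'Middleware tests', description: 'Unit tests for middleware',
      type: 'test', dependencies: ['t1'], estimatedEffort: 2, files: ['src/auth.test.ts'] }
  ],
  estimates: { complexity: 3, effort: 6, uncertainty: 'medium' }
} as unknown as ImplementationPlan;

// complexity = min(20, (5 + 1) / 2) = 3, criticality = 5 ('auth' keyword),
// changeSize = 2 (fewer than 3 files), coverage = 5 (tests planned),
// uncertainty = 8 ('medium') → total = 23
const score = calculateRiskScore(examplePlan); // 23
const level = determineRiskLevel(score);       // 'medium' (20 ≤ 23 < 50)
```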

1.7 Memory Queries

Before planning, the agent queries memory for relevant patterns:

```typescript
async function queryRelevantMemories(
  task: string,
  ctx: AgentContext
): Promise<Memory[]> {
  const queries = [
    { type: 'episodic' as const, context: `planning similar task: ${task}`, limit: 5 },
    { type: 'semantic' as const, context: `architecture patterns for ${extractFeatureArea(task)}`, limit: 5 },
    { type: 'procedural' as const, context: `planning strategy`, limit: 3 }
  ];

  const results = await Promise.all(
    queries.map(q => ctx.memory.recall(q))
  );

  return results.flat();
}

function extractFeatureArea(task: string): string {
  // Simple keyword extraction - could use LLM for better results
  const keywords = ['auth', 'payment', 'search', 'notification', 'analytics'];
  return keywords.find(k => task.toLowerCase().includes(k)) || 'general';
}
```

1.8 Class Implementation

```typescript
import { BaseAgent } from './base';
import type { Agent, PhaseInput, PhaseOutput, AgentContext } from '../core/types';

export class PlannerAgent extends BaseAgent {
  type = 'planner' as const;
  tools = plannerTools;
  systemPrompt = PLANNER_SYSTEM_PROMPT; // From section 1.2

  async execute(
    input: PlannerInput,
    ctx: AgentContext
  ): Promise<ImplementationPlan> {
    // This calls the base agent loop which handles:
    // - Iteration tracking
    // - Circuit breaker checks
    // - Tool execution
    // - Reflection on completion
    return super.execute(input, ctx) as Promise<ImplementationPlan>;
  }

  protected buildPrompt(
    input: PlannerInput,
    memories: Memory[]
  ): string {
    let prompt = `# Task\n${input.task}\n\n`;

    if (input.codebaseContext) {
      prompt += `# Codebase Context\n`;
      prompt += `- Language: ${input.codebaseContext.language}\n`;
      prompt += `- Frameworks: ${input.codebaseContext.frameworks.join(', ')}\n`;
      prompt += `- Root: ${input.codebaseContext.rootPath}\n\n`;
    }

    if (memories.length > 0) {
      prompt += `# Relevant Learnings\n`;
      memories.forEach(m => {
        prompt += `- ${m.content} (confidence: ${m.confidence.toFixed(2)})\n`;
      });
      prompt += '\n';
    }

    prompt += `Create an implementation plan following the OUTPUT FORMAT.`;
    return prompt;
  }
}
```

2. Implementer Agent

2.1 Purpose

Takes an implementation plan and generates code + tests, with self-validation and refinement based on feedback from the Reviewer.

2.2 System Prompt

````markdown
# ROLE
You are a senior software engineer implementing features from specifications.

# TASK
Given an implementation plan (or a plan + fix findings from review), you will:
1. Read relevant files to understand existing patterns
2. Generate code following the task specifications
3. Create comprehensive tests for new code
4. Self-validate: run typecheck and tests
5. Fix issues if validation fails
6. If this is a bounce-back from review, address specific findings

# AVAILABLE TOOLS
- read_file(path): Read file contents
- write_file(path, content): Write/overwrite a file
- run_command(cmd): Execute a shell command (typecheck, test, lint)
- search_code(query): Search for code patterns
- llm_generate(prompt, context): Generate code with LLM assistance

# IMPLEMENTATION STRATEGY
**File-by-File Approach (MVP):**
- Process tasks sequentially
- For each task, identify target files
- Read existing code, understand patterns
- Generate changes, write files
- Validate changes (typecheck, test)
- Fix issues before moving to next task

**Future: Feature-by-Feature with Swarm:**
- Spawn sub-agents for parallel implementation
- Each sub-agent handles one feature/module
- Coordinate integration

# SELF-VALIDATION LOOP
After writing code:
1. Run typecheck → fix type errors
2. Run affected tests → fix failures
3. Run lint → auto-fix formatting
4. Loop until clean or max iterations (5)

# HANDLING REVIEW FEEDBACK
If input includes `fixFindings`:
- Read each finding (file, line, message)
- Understand the issue
- Apply fix
- Validate fix doesn't break tests
- Re-run self-validation

# OUTPUT FORMAT
```typescript
{
  files: FileChange[],
  testsAdded: string[],
  validated: boolean,
  validationResults: {
    typecheck: { passed: boolean, errors: string[] },
    tests: { passed: boolean, failures: string[] },
    lint: { passed: boolean, issues: string[] }
  }
}
```

# MEMORY INTEGRATION
Query memory for:
- Implementation patterns (context: language + feature)
- Common pitfalls (context: similar tasks)
- Test strategies (context: file type)

# ITERATION
Use tools iteratively to:
- Read code to understand patterns
- Generate code
- Validate code
- Fix issues

When done and validated, return `{ done: true, result: codeChanges }`.
````

2.3 Tools Available

```typescript
const implementerTools: Tool[] = [
  {
    name: 'read_file',
    description: 'Read file contents',
    schema: {
      input: z.object({ path: z.string() }),
      output: z.object({ content: z.string() })
    },
    execute: async ({ path }) => {
      return { content: await fs.readFile(path, 'utf-8') };
    }
  },
  {
    name: 'write_file',
    description: 'Write or overwrite a file',
    schema: {
      input: z.object({ path: z.string(), content: z.string() }),
      output: z.object({ success: z.boolean() })
    },
    execute: async ({ path, content }) => {
      await fs.writeFile(path, content, 'utf-8');
      return { success: true };
    }
  },
  {
    name: 'run_command',
    description: 'Execute a shell command',
    schema: {
      input: z.object({ cmd: z.string() }),
      output: z.object({
        exitCode: z.number(),
        stdout: z.string(),
        stderr: z.string()
      })
    },
    execute: async ({ cmd }, ctx) => {
      return await ctx.shell.exec(cmd);
    }
  },
  {
    name: 'search_code',
    description: 'Search for code patterns',
    schema: {
      input: z.object({ query: z.string(), filePattern: z.string().optional() }),
      output: z.object({
        results: z.array(z.object({
          file: z.string(),
          line: z.number(),
          snippet: z.string()
        }))
      })
    },
    execute: async ({ query, filePattern }) => {
      return { results: await searchCodebase(query, filePattern) };
    }
  },
  {
    name: 'llm_generate',
    description: 'Generate code using LLM',
    schema: {
      input: z.object({ prompt: z.string(), context: z.string().optional() }),
      output: z.object({ code: z.string() })
    },
    execute: async ({ prompt, context }, ctx) => {
      const response = await ctx.llm.chat({
        system: 'You are a code generation assistant. Output only code, no explanations.',
        messages: [{
          role: 'user',
          content: context ? `${context}\n\n${prompt}` : prompt
        }],
        temperature: 0.2
      });
      return { code: response.content };
    }
  }
];
```

2.4 Input/Output Types

```typescript
interface ImplementerInput {
  plan: ImplementationPlan;

  // Bounce-back from reviewer
  existingCode?: CodeChanges;
  fixFindings?: Finding[];

  // Bounce-back from tester
  fixFailures?: FailureAnalysis[];
}

interface CodeChanges {
  files: FileChange[];
  testsAdded: string[];
  validated: boolean;
  validationResults: {
    typecheck: ValidationResult;
    tests: ValidationResult;
    lint: ValidationResult;
  };
}

interface FileChange {
  path: string;
  action: 'created' | 'modified' | 'deleted';
  before?: string;
  after: string;
  diff?: string;
}

interface ValidationResult {
  passed: boolean;
  errors?: string[];
  warnings?: string[];
}
```

2.5 Implementation Strategy

┌─────────────────────────────────────────┐
│ 1. PERCEIVE                             │
│   - Load implementation plan            │
│   - Query memory for patterns           │
│   - If bounce-back, load findings       │
└──────────────┬──────────────────────────┘
               ▼
┌─────────────────────────────────────────┐
│ 2. For each task in plan.tasks:         │
│   ┌─────────────────────────────────┐   │
│   │ a. Read target files            │   │
│   │ b. Understand existing patterns │   │
│   │ c. Generate code                │   │
│   │ d. Write files                  │   │
│   │ e. Generate tests               │   │
│   │ f. Write test files             │   │
│   └──────────────┬──────────────────┘   │
│                  │                       │
└──────────────────┼───────────────────────┘
                   ▼
┌─────────────────────────────────────────┐
│ 3. SELF-VALIDATION LOOP                 │
│   (max 5 iterations)                    │
│   ┌─────────────────────────────────┐   │
│   │ a. Run typecheck                │   │
│   │ b. If errors → fix → retry      │   │
│   │ c. Run tests                    │   │
│   │ d. If failures → fix → retry    │   │
│   │ e. Run lint                     │   │
│   │ f. If issues → auto-fix         │   │
│   └─────────────────────────────────┘   │
└──────────────┬──────────────────────────┘
               ▼
┌─────────────────────────────────────────┐
│ 4. OUTPUT - Return CodeChanges          │
└─────────────────────────────────────────┘

2.6 Self-Validation Loop

```typescript
async function selfValidate(
  files: FileChange[],
  ctx: AgentContext
): Promise<ValidationResults> {
  let iteration = 0;
  const maxIterations = 5;

  while (iteration < maxIterations) {
    iteration++;

    // Typecheck
    const typecheckResult = await ctx.shell.exec('bun run typecheck');
    if (typecheckResult.exitCode !== 0) {
      // Try to fix type errors
      const fixes = await generateTypeFixes(typecheckResult.stderr, ctx);
      if (fixes) {
        await applyFixes(fixes);
        continue; // Retry
      } else {
        // Can't auto-fix, return error
        return {
          typecheck: { passed: false, errors: [typecheckResult.stderr] },
          tests: { passed: false },
          lint: { passed: false }
        };
      }
    }

    // Test
    const testResult = await ctx.shell.exec('bun test');
    if (testResult.exitCode !== 0) {
      const fixes = await generateTestFixes(testResult.stderr, ctx);
      if (fixes) {
        await applyFixes(fixes);
        continue;
      } else {
        return {
          typecheck: { passed: true },
          tests: { passed: false, errors: [testResult.stderr] },
          lint: { passed: false }
        };
      }
    }

    // Lint
    const lintResult = await ctx.shell.exec('bun run lint --fix');

    // All passed
    return {
      typecheck: { passed: true },
      tests: { passed: true },
      lint: { passed: lintResult.exitCode === 0 }
    };
  }

  // Max iterations reached
  throw new Error('Self-validation failed after max iterations');
}
```
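
The helpers generateTypeFixes, generateTestFixes, and applyFixes are referenced above but not specified in this plan. A minimal sketch of one of them, assuming fixes come back from the LLM as whole-file replacements (generateTestFixes would follow the same shape):

```typescript
// Hypothetical sketch of the fix helpers referenced in selfValidate. Assumes
// the LLM returns whole-file replacements as JSON; the real helpers may
// operate on diffs instead.
import fs from 'node:fs/promises';

interface FileFix { path: string; content: string }

async function generateTypeFixes(
  typecheckStderr: string,
  ctx: AgentContext
): Promise<FileFix[] | null> {
  const response = await ctx.llm.chat({
    system: 'You fix TypeScript compiler errors. Respond with JSON: ' +
      '[{ "path": "...", "content": "<full updated file>" }]',
    messages: [{ role: 'user', content: `Typecheck output:\n${typecheckStderr}` }],
    temperature: 0.1
  });
  try {
    return JSON.parse(response.content) as FileFix[];
  } catch {
    return null; // Unparseable → caller reports the raw error instead
  }
}

async function applyFixes(fixes: FileFix[]): Promise<void> {
  for (const fix of fixes) {
    await fs.writeFile(fix.path, fix.content, 'utf-8');
  }
}
```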

2.7 Handling Bounce-Back from Reviewer

When the reviewer sends back findings:

```typescript
async function handleReviewFindings(
  findings: Finding[],
  existingCode: CodeChanges,
  ctx: AgentContext
): Promise<CodeChanges> {
  for (const finding of findings) {
    // Read the file with the issue
    const file = await ctx.tools.read_file({ path: finding.file });

    // Generate fix
    const fix = await ctx.llm.chat({
      system: 'You are fixing code review findings.',
      messages: [{
        role: 'user',
        content: `
File: ${finding.file}
Line: ${finding.line}
Issue: ${finding.message}
Suggested fix: ${finding.suggestedFix || 'none'}

Current code:
${file.content}

Fix the issue and return the complete updated file.
`
      }],
      temperature: 0.1
    });

    // Write fixed file
    await ctx.tools.write_file({
      path: finding.file,
      content: fix.content
    });
  }

  // Re-validate
  const validationResults = await selfValidate(existingCode.files, ctx);

  return {
    ...existingCode,
    validated: validationResults.typecheck.passed && validationResults.tests.passed,
    validationResults
  };
}
```

2.8 Class Implementation

```typescript
export class ImplementerAgent extends BaseAgent {
  type = 'implementer' as const;
  tools = implementerTools;
  systemPrompt = IMPLEMENTER_SYSTEM_PROMPT;

  async execute(
    input: ImplementerInput,
    ctx: AgentContext
  ): Promise<CodeChanges> {
    // If this is a bounce-back, handle findings first
    if (input.fixFindings && input.existingCode) {
      return this.handleReviewFindings(input.fixFindings, input.existingCode, ctx);
    }

    if (input.fixFailures && input.existingCode) {
      return this.handleTestFailures(input.fixFailures, input.existingCode, ctx);
    }

    // Normal implementation flow
    return super.execute(input, ctx) as Promise<CodeChanges>;
  }

  protected buildPrompt(
    input: ImplementerInput,
    memories: Memory[]
  ): string {
    let prompt = `# Implementation Plan\n`;
    input.plan.tasks.forEach((task, i) => {
      prompt += `\n## Task ${i + 1}: ${task.title}\n`;
      prompt += `${task.description}\n`;
      prompt += `Files: ${task.files.join(', ')}\n`;
    });

    if (memories.length > 0) {
      prompt += `\n# Relevant Patterns\n`;
      memories.forEach(m => {
        prompt += `- ${m.content}\n`;
      });
    }

    prompt += `\nImplement the tasks following the strategy in the system prompt.`;
    return prompt;
  }

  private async handleReviewFindings(
    findings: Finding[],
    existingCode: CodeChanges,
    ctx: AgentContext
  ): Promise<CodeChanges> {
    // Implementation from section 2.7
    return handleReviewFindings(findings, existingCode, ctx);
  }

  private async handleTestFailures(
    failures: FailureAnalysis[],
    existingCode: CodeChanges,
    ctx: AgentContext
  ): Promise<CodeChanges> {
    // Similar to handleReviewFindings but for test failures
    // TODO: Implement
    return existingCode;
  }
}
```
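
The handleTestFailures path above is left as a TODO. A minimal sketch of how it could mirror handleReviewFindings, driven by the tester's FailureAnalysis output — note that mapping a failure to the changed source file via its stack trace is an assumption of this sketch, not part of the plan:

```typescript
// Hypothetical sketch for the TODO above. Mirrors handleReviewFindings, but
// drives fixes from the tester's root-cause analysis. Locating the changed
// file through the stack trace is an assumption of this sketch.
async function handleTestFailures(
  failures: FailureAnalysis[],
  existingCode: CodeChanges,
  ctx: AgentContext
): Promise<CodeChanges> {
  for (const failure of failures) {
    // Only act on failures the tester classified as real bugs with confidence
    if (failure.classification !== 'bug' || failure.confidence < 0.7) continue;

    // Find the changed file implicated by the stack trace
    const target = existingCode.files.find(f => failure.stackTrace.includes(f.path));
    if (!target) continue;

    const file = await ctx.tools.read_file({ path: target.path });
    const fix = await ctx.llm.chat({
      system: 'You are fixing a bug that causes a test failure.',
      messages: [{
        role: 'user',
        content:
          `Failing test: ${failure.testName}\n` +
          `Root cause: ${failure.rootCause}\n` +
          `Suggested fix: ${failure.suggestedFix || 'none'}\n\n` +
          `Error:\n${failure.errorMessage}\n\n` +
          `Current code (${target.path}):\n${file.content}\n\n` +
          `Fix the bug and return the complete updated file.`
      }],
      temperature: 0.1
    });
    await ctx.tools.write_file({ path: target.path, content: fix.content });
  }

  // Re-validate, as in handleReviewFindings
  const validationResults = await selfValidate(existingCode.files, ctx);
  return {
    ...existingCode,
    validated: validationResults.typecheck.passed && validationResults.tests.passed,
    validationResults
  };
}
```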

3. Reviewer Agent

3.1 Purpose

Performs multi-layer code review (static → security → AI) and calculates risk scores to determine gate decisions.

3.2 System Prompt

````markdown
# ROLE
You are a senior code reviewer performing AI-assisted code review.

# TASK
You are the **third layer** of a three-layer review pipeline:
1. Static Analysis (already run)
2. Security Scan (already run)
3. **AI Review** (your task)

Given:
- Code changes (diff)
- Static analysis results
- Security scan results

You will:
1. Review code for logic correctness
2. Identify edge cases not covered
3. Assess performance implications
4. Evaluate architecture fit
5. Check for maintainability issues

# AVAILABLE TOOLS
- run_linter(files): Run ESLint/Biome
- run_security_scan(files): Run SAST + secret detection
- llm_review(diff, context): Deep review with LLM
- read_file(path): Read file contents

# REVIEW FOCUS AREAS
**Logic Correctness:**
- Are edge cases handled?
- Are error conditions properly managed?
- Is the happy path correct?
- Are race conditions possible?

**Architecture Fit:**
- Does this follow existing patterns?
- Are abstractions appropriate?
- Is coupling minimized?
- Are responsibilities clear?

**Performance:**
- Are there obvious performance issues?
- Could this cause memory leaks?
- Are queries optimized?
- Is caching appropriate?

**Maintainability:**
- Is the code readable?
- Are comments helpful (not redundant)?
- Is complexity reasonable?
- Would a new developer understand this?

# FINDING DEDUPLICATION
Before outputting findings:
1. Merge duplicate findings (same file + line)
2. If static analysis already caught it, don't repeat
3. Prioritize: security > correctness > performance > style

# RISK SCORE CALCULATION
Risk score = weighted sum of:
- Complexity: 30%
- Criticality: 25%
- Change size: 20%
- Test coverage: 15%
- Findings severity: 10%

# GATE DECISION LOGIC
- **approve**: score < 20 AND no critical findings
- **request_changes**: 20 ≤ score < 50 OR has error-level findings
- **require_human**: score ≥ 50 OR has critical security findings

# OUTPUT FORMAT
```typescript
{
  findings: Finding[],
  riskScore: {
    total: number,
    level: 'low' | 'medium' | 'high' | 'critical',
    breakdown: { complexity, criticality, changeSize, coverage, findings }
  },
  decision: 'approve' | 'request_changes' | 'require_human',
  reasoning: string
}
```

# SKIP LOGIC
If risk.level === 'low' (from planner), you MAY skip AI review and rely only on static + security.
````

3.3 Tools Available

```typescript
const reviewerTools: Tool[] = [
  {
    name: 'run_linter',
    description: 'Run static analysis (ESLint/Biome)',
    schema: {
      input: z.object({ files: z.array(z.string()) }),
      output: z.object({
        issues: z.array(z.object({
          file: z.string(),
          line: z.number(),
          severity: z.enum(['info', 'warning', 'error']),
          message: z.string(),
          rule: z.string()
        }))
      })
    },
    execute: async ({ files }, ctx) => {
      const result = await ctx.shell.exec(`eslint ${files.join(' ')} --format json`);
      return { issues: JSON.parse(result.stdout) };
    }
  },
  {
    name: 'run_security_scan',
    description: 'Run security scans (SAST + secrets)',
    schema: {
      input: z.object({ files: z.array(z.string()) }),
      output: z.object({
        vulnerabilities: z.array(z.object({
          severity: z.enum(['low', 'medium', 'high', 'critical']),
          category: z.string(),
          file: z.string(),
          line: z.number(),
          description: z.string()
        }))
      })
    },
    execute: async ({ files }, ctx) => {
      // Run multiple security tools
      const sastResult = await ctx.shell.exec(`semgrep scan ${files.join(' ')} --json`);
      const secretResult = await ctx.shell.exec(`gitleaks detect --source . --verbose`);

      return {
        vulnerabilities: [
          ...parseSemgrepOutput(sastResult.stdout),
          ...parseGitleaksOutput(secretResult.stdout)
        ]
      };
    }
  },
  {
    name: 'llm_review',
    description: 'Perform deep AI code review',
    schema: {
      input: z.object({ diff: z.string(), context: z.string().optional() }),
      output: z.object({
        comments: z.array(z.object({
          file: z.string(),
          line: z.number(),
          severity: z.enum(['info', 'warning', 'error', 'critical']),
          category: z.enum(['logic', 'performance', 'maintainability', 'architecture']),
          message: z.string(),
          suggestedFix: z.string().optional()
        }))
      })
    },
    execute: async ({ diff, context }, ctx) => {
      const response = await ctx.llm.chat({
        system: `You are a code reviewer. Focus on logic, edge cases, performance, and maintainability.
Output a JSON array of review comments.`,
        messages: [{
          role: 'user',
          content: `${context || ''}\n\nDiff:\n${diff}\n\nProvide review comments as JSON.`
        }],
        temperature: 0.2
      });

      return { comments: JSON.parse(response.content) };
    }
  },
  {
    name: 'read_file',
    description: 'Read file for context',
    schema: {
      input: z.object({ path: z.string() }),
      output: z.object({ content: z.string() })
    },
    execute: async ({ path }) => {
      return { content: await fs.readFile(path, 'utf-8') };
    }
  }
];
```

3.4 Input/Output Types

```typescript
interface ReviewerInput {
  codeChanges: CodeChanges;
  planRisk?: RiskAssessment; // From planner, used for skip logic
}

interface ReviewResult {
  findings: Finding[];
  riskScore: RiskScore;
  decision: 'approve' | 'request_changes' | 'require_human';
  reasoning: string;
}

interface Finding {
  id: string;
  source: 'static' | 'security' | 'ai';
  file: string;
  line: number;
  severity: 'info' | 'warning' | 'error' | 'critical';
  category: 'style' | 'security' | 'correctness' | 'performance' | 'maintainability';
  message: string;
  confidence: number; // 0-1
  fixable: boolean;
  suggestedFix?: string;
}

interface RiskScore {
  total: number; // 0-100
  level: 'low' | 'medium' | 'high' | 'critical';
  breakdown: {
    complexity: number;
    criticality: number;
    changeSize: number;
    coverage: number;
    findings: number;
  };
}
```

3.5 Three-Layer Pipeline

┌─────────────────────────────────────────┐
│ LAYER 1: Static Analysis                │
│  - ESLint / Biome                       │
│  - TypeScript strict check              │
│  - Formatting check                     │
│  ⏱ Fast (seconds)                       │
│  💰 Cheap (free)                        │
│  ✓ Deterministic                        │
└──────────────┬──────────────────────────┘
               ▼
┌─────────────────────────────────────────┐
│ LAYER 2: Security Scan                  │
│  - Secret detection (gitleaks)          │
│  - SAST (semgrep)                       │
│  - Dependency vulnerabilities (npm audit)│
│  ⏱ Fast (seconds)                       │
│  💰 Cheap (mostly free)                 │
│  ✓ High signal                          │
└──────────────┬──────────────────────────┘
               ▼
┌─────────────────────────────────────────┐
│ LAYER 3: AI Review                      │
│  - Logic correctness                    │
│  - Edge cases                           │
│  - Performance implications             │
│  - Architecture fit                     │
│  ⏱ Slow (10-30 seconds)                 │
│  💰 Expensive ($0.01-0.10 per review)   │
│  ~ Variable quality                     │
│                                         │
│  SKIP if risk.level === 'low'           │
└──────────────┬──────────────────────────┘
               ▼
┌─────────────────────────────────────────┐
│ Synthesis & Deduplication               │
│  - Merge duplicate findings             │
│  - Calculate final risk score           │
│  - Determine gate decision              │
└──────────────┬──────────────────────────┘
               ▼
         ReviewResult

3.6 Finding Deduplication

```typescript
function deduplicateFindings(
  staticFindings: Finding[],
  securityFindings: Finding[],
  aiFindings: Finding[]
): Finding[] {
  const allFindings = [
    ...staticFindings.map(f => ({ ...f, source: 'static' as const })),
    ...securityFindings.map(f => ({ ...f, source: 'security' as const })),
    ...aiFindings.map(f => ({ ...f, source: 'ai' as const }))
  ];

  // Group by file + line
  const grouped = new Map<string, Finding[]>();
  allFindings.forEach(f => {
    const key = `${f.file}:${f.line}`;
    if (!grouped.has(key)) {
      grouped.set(key, []);
    }
    grouped.get(key)!.push(f);
  });

  // Deduplicate: prefer security > static > ai
  const deduplicated: Finding[] = [];
  grouped.forEach(findings => {
    const security = findings.find(f => f.source === 'security');
    if (security) {
      deduplicated.push(security);
      return;
    }

    const static_ = findings.find(f => f.source === 'static');
    if (static_) {
      deduplicated.push(static_);
      return;
    }

    // Take AI finding with highest confidence
    const ai = findings
      .filter(f => f.source === 'ai')
      .sort((a, b) => b.confidence - a.confidence)[0];
    if (ai) {
      deduplicated.push(ai);
    }
  });

  return deduplicated;
}
```

3.7 Risk Score Calculation

```typescript
function calculateReviewRiskScore(
  changes: CodeChanges,
  findings: Finding[]
): RiskScore {
  // Complexity (0-30)
  const complexity = Math.min(30,
    changes.files.length * 2 + // More files = higher complexity
    findings.filter(f => f.category === 'maintainability').length * 5
  );

  // Criticality (0-25)
  const criticalityKeywords = ['auth', 'security', 'payment', 'data'];
  const criticalFiles = changes.files.filter(f =>
    criticalityKeywords.some(kw => f.path.includes(kw))
  );
  const criticality = Math.min(25, criticalFiles.length * 10);

  // Change size (0-20)
  const totalLines = changes.files.reduce((acc, f) => {
    const lines = f.diff?.split('\n')
      .filter(l => l.startsWith('+') || l.startsWith('-')).length || 0;
    return acc + lines;
  }, 0);
  const changeSize = Math.min(20, Math.floor(totalLines / 50));

  // Coverage (0-15)
  const hasTests = changes.testsAdded.length > 0;
  const coverage = hasTests ? 3 : 12;

  // Findings (0-10)
  const criticalCount = findings.filter(f => f.severity === 'critical').length;
  const errorCount = findings.filter(f => f.severity === 'error').length;
  const findingsScore = Math.min(10, criticalCount * 5 + errorCount * 2);

  const total = complexity + criticality + changeSize + coverage + findingsScore;

  return {
    total,
    level: total < 20 ? 'low' : total < 50 ? 'medium' : total < 70 ? 'high' : 'critical',
    breakdown: { complexity, criticality, changeSize, coverage, findings: findingsScore }
  };
}
```

3.8 Gate Decision Logic

```typescript
function determineGateDecision(
  riskScore: RiskScore,
  findings: Finding[]
): { decision: ReviewDecision; reasoning: string } {
  // Critical security findings always require human
  const criticalSecurity = findings.some(
    f => f.severity === 'critical' && f.category === 'security'
  );
  if (criticalSecurity) {
    return {
      decision: 'require_human',
      reasoning: 'Critical security finding requires human review'
    };
  }

  // High or critical risk requires human
  if (riskScore.level === 'high' || riskScore.level === 'critical') {
    return {
      decision: 'require_human',
      reasoning: `Risk level ${riskScore.level} (score: ${riskScore.total}) requires human oversight`
    };
  }

  // Any error-level findings require changes
  const hasErrors = findings.some(f => f.severity === 'error');
  if (hasErrors) {
    return {
      decision: 'request_changes',
      reasoning: 'Code has error-level findings that must be addressed'
    };
  }

  // Medium risk with warnings → request changes
  if (riskScore.level === 'medium' && findings.length > 0) {
    return {
      decision: 'request_changes',
      reasoning: 'Medium-risk change with findings should be refined'
    };
  }

  // Low risk, no critical issues → approve
  return {
    decision: 'approve',
    reasoning: `Low risk (score: ${riskScore.total}), no blocking issues`
  };
}
```

3.9 Class Implementation

```typescript
export class ReviewerAgent extends BaseAgent {
  type = 'reviewer' as const;
  tools = reviewerTools;
  systemPrompt = REVIEWER_SYSTEM_PROMPT;

  async execute(
    input: ReviewerInput,
    ctx: AgentContext
  ): Promise<ReviewResult> {
    // Layer 1: Static analysis
    const staticFindings = await this.runStaticAnalysis(input.codeChanges, ctx);

    // Layer 2: Security scan
    const securityFindings = await this.runSecurityScan(input.codeChanges, ctx);

    // Layer 3: AI review (skip if low risk)
    let aiFindings: Finding[] = [];
    if (!input.planRisk || input.planRisk.level !== 'low') {
      aiFindings = await this.runAIReview(input.codeChanges, ctx);
    } else {
      ctx.bus.emit({
        type: 'review.ai_skipped',
        payload: { reason: 'Low risk, static + security only' }
      });
    }

    // Deduplicate and synthesize
    const allFindings = deduplicateFindings(staticFindings, securityFindings, aiFindings);

    // Calculate risk
    const riskScore = calculateReviewRiskScore(input.codeChanges, allFindings);

    // Determine gate decision
    const { decision, reasoning } = determineGateDecision(riskScore, allFindings);

    return { findings: allFindings, riskScore, decision, reasoning };
  }

  private async runStaticAnalysis(
    changes: CodeChanges,
    ctx: AgentContext
  ): Promise<Finding[]> {
    const files = changes.files.map(f => f.path);
    const result = await this.tools.find(t => t.name === 'run_linter')!
      .execute({ files }, ctx as any);

    return (result as any).issues.map((issue: any) => ({
      id: ulid(),
      source: 'static',
      file: issue.file,
      line: issue.line,
      severity: issue.severity,
      category: 'style',
      message: issue.message,
      confidence: 1.0,
      fixable: true
    }));
  }

  private async runSecurityScan(
    changes: CodeChanges,
    ctx: AgentContext
  ): Promise<Finding[]> {
    const files = changes.files.map(f => f.path);
    const result = await this.tools.find(t => t.name === 'run_security_scan')!
      .execute({ files }, ctx as any);

    return (result as any).vulnerabilities.map((vuln: any) => ({
      id: ulid(),
      source: 'security',
      file: vuln.file,
      line: vuln.line,
      severity: vuln.severity,
      category: 'security',
      message: vuln.description,
      confidence: 0.95,
      fixable: false
    }));
  }

  private async runAIReview(
    changes: CodeChanges,
    ctx: AgentContext
  ): Promise<Finding[]> {
    const diff = changes.files.map(f => f.diff).join('\n');
    const result = await this.tools.find(t => t.name === 'llm_review')!
      .execute({ diff }, ctx as any);

    return (result as any).comments.map((comment: any) => ({
      id: ulid(),
      source: 'ai',
      file: comment.file,
      line: comment.line,
      severity: comment.severity,
      category: comment.category,
      message: comment.message,
      confidence: 0.7,
      fixable: !!comment.suggestedFix,
      suggestedFix: comment.suggestedFix
    }));
  }

  protected buildPrompt(input: ReviewerInput, memories: Memory[]): string {
    // AI review layer is invoked via the llm_review tool;
    // this method is not used in the current implementation
    return '';
  }
}
```

4. Tester Agent

4.1 Purpose

Selects and executes tests based on code changes, analyzes failures, and generates missing tests.

4.2 System Prompt

````markdown
# ROLE
You are a QA engineer responsible for test execution and analysis.

# TASK
Given code changes and existing test suite, you will:
1. Select which tests to run (risk-based selection)
2. Execute selected tests
3. Analyze any failures (root cause, classification)
4. Identify test coverage gaps
5. Generate missing tests if needed

# AVAILABLE TOOLS
- run_tests(pattern, options): Execute test suite
- read_file(path): Read file contents
- write_file(path, content): Write test files
- llm_analyze(text, question): Analyze failures with LLM
- get_coverage(files): Get coverage report

# TEST SELECTION STRATEGY
**Always run:**
- Tests covering changed files
- Tests importing changed modules

**Risk-based addition:**
- Low risk: Only above
- Medium risk: + integration tests for affected features
- High risk: + full test suite

# FAILURE ANALYSIS
For each failure:
1. Classify: real bug | flaky test | environment issue | test needs update
2. Root cause analysis (use LLM)
3. Suggest fix (if auto-fixable)
4. Confidence score (0-1)

Only bounce back to implementer if:
- Failure is a real bug
- Confidence > 0.7 in root cause
- Have suggested fix

Otherwise, escalate to human.

# TEST GENERATION
Generate tests if:
- Coverage delta < 70% (new code not tested)
- Critical paths not covered
- Edge cases identified but not tested

# OUTPUT FORMAT
```typescript
{
  summary: { total: number, passed: number, failed: number, skipped: number },
  coverage: {
    line: number,
    branch: number,
    function: number,
    diffCoverage: number // % of changed lines covered
  },
  failures: FailureAnalysis[],
  generatedTests: string[]
}
```

# FLAKINESS DETECTION
If a test fails, retry once. If passes on retry:
- Mark as flaky
- Report to monitoring
- Don't fail the build

# ITERATION
Use tools to:
- Execute tests
- Analyze failures
- Generate new tests
- Re-run tests

When all tests pass or failures analyzed, return `{ done: true, result: testResult }`.
````

4.3 Tools Available

```typescript
const testerTools: Tool[] = [
  {
    name: 'run_tests',
    description: 'Execute test suite',
    schema: {
      input: z.object({
        pattern: z.string().optional(),
        options: z.object({
          coverage: z.boolean().optional(),
          bail: z.boolean().optional(),
          timeout: z.number().optional()
        }).optional()
      }),
      output: z.object({
        exitCode: z.number(),
        results: z.array(z.object({
          file: z.string(),
          name: z.string(),
          status: z.enum(['passed', 'failed', 'skipped']),
          duration: z.number(),
          error: z.string().optional()
        })),
        coverage: z.object({
          line: z.number(),
          branch: z.number(),
          function: z.number()
        }).optional()
      })
    },
    execute: async ({ pattern, options }, ctx) => {
      const cmd = `bun test ${pattern || ''} ${options?.coverage ? '--coverage' : ''}`;
      const result = await ctx.shell.exec(cmd);
      return parseTestOutput(result);
    }
  },
  {
    name: 'read_file',
    description: 'Read file contents',
    schema: {
      input: z.object({ path: z.string() }),
      output: z.object({ content: z.string() })
    },
    execute: async ({ path }) => {
      return { content: await fs.readFile(path, 'utf-8') };
    }
  },
  {
    name: 'write_file',
    description: 'Write test file',
    schema: {
      input: z.object({ path: z.string(), content: z.string() }),
      output: z.object({ success: z.boolean() })
    },
    execute: async ({ path, content }) => {
      await fs.writeFile(path, content, 'utf-8');
      return { success: true };
    }
  },
  {
    name: 'llm_analyze',
    description: 'Analyze test failures',
    schema: {
      input: z.object({ failure: z.string(), context: z.string().optional() }),
      output: z.object({
        classification: z.enum(['bug', 'flaky', 'environment', 'test_outdated']),
        rootCause: z.string(),
        suggestedFix: z.string().optional(),
        confidence: z.number()
      })
    },
    execute: async ({ failure, context }, ctx) => {
      const response = await ctx.llm.chat({
        system: `You are analyzing test failures. Classify and diagnose.`,
        messages: [{
          role: 'user',
          content: `${context || ''}\n\nFailure:\n${failure}\n\nProvide analysis as JSON.`
        }],
        temperature: 0.1
      });
      return JSON.parse(response.content);
    }
  },
  {
    name: 'get_coverage',
    description: 'Get coverage for specific files',
    schema: {
      input: z.object({ files: z.array(z.string()) }),
      output: z.object({
        files: z.array(z.object({
          path: z.string(),
          line: z.number(),
          branch: z.number(),
          uncovered: z.array(z.number())
        }))
      })
    },
    execute: async ({ files }, ctx) => {
      // Parse coverage report for specific files
      return getCoverageForFiles(files);
    }
  }
];
```

4.4 Input/Output Types

```typescript
interface TesterInput {
  codeChanges: CodeChanges;
  planRisk?: RiskAssessment;
}

interface TestResult {
  summary: {
    total: number;
    passed: number;
    failed: number;
    skipped: number;
  };
  coverage: {
    line: number;
    branch: number;
    function: number;
    diffCoverage: number;
  };
  failures: FailureAnalysis[];
  generatedTests: string[];
}

interface FailureAnalysis {
  testName: string;
  testFile: string;
  classification: 'bug' | 'flaky' | 'environment' | 'test_outdated';
  rootCause: string;
  suggestedFix?: string;
  confidence: number;
  errorMessage: string;
  stackTrace: string;
}
```

4.5 Test Selection Strategy

┌─────────────────────────────────────────┐
│ 1. Identify Changed Files              │
│   - Files modified in codeChanges       │
└──────────────┬──────────────────────────┘
               ▼
┌─────────────────────────────────────────┐
│ 2. Find Direct Tests                    │
│   - Tests importing changed files       │
│   - Tests in same directory             │
└──────────────┬──────────────────────────┘
               ▼
┌─────────────────────────────────────────┐
│ 3. Risk-Based Expansion                 │
│   Low risk:    Direct tests only        │
│   Medium risk: + integration tests      │
│   High risk:   + full suite             │
└──────────────┬──────────────────────────┘
               ▼
┌─────────────────────────────────────────┐
│ 4. Execute Selected Tests               │
└─────────────────────────────────────────┘

```typescript
function selectTests(
  changes: CodeChanges,
  risk: RiskAssessment,
  allTests: string[]
): string[] {
  const changedFiles = new Set(changes.files.map(f => f.path));

  // Always run tests covering changed files
  const directTests = allTests.filter(testFile => {
    const imports = getImportsFromFile(testFile);
    return imports.some(imp => changedFiles.has(imp));
  });

  // Risk-based expansion
  if (risk.level === 'low') {
    return directTests;
  }

  if (risk.level === 'medium') {
    // Add integration tests for affected features
    const features = extractFeatures(changes);
    const integrationTests = allTests.filter(t =>
      t.includes('integration') && features.some(f => t.includes(f))
    );
    return [...directTests, ...integrationTests];
  }

  // High or critical: run everything
  return allTests;
}
```
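
selectTests relies on getImportsFromFile, which this plan does not specify. A rough sketch using a regex over import statements — a real implementation would resolve module specifiers through the TypeScript resolver rather than string matching:

```typescript
// Hypothetical sketch: extract relative import paths from a test file so
// selectTests can match them against changed files.
import fs from 'node:fs';
import path from 'node:path';

function getImportsFromFile(testFile: string): string[] {
  const source = fs.readFileSync(testFile, 'utf-8');
  const importRe = /from\s+['"](\.{1,2}\/[^'"]+)['"]/g;
  const imports: string[] = [];
  for (const match of source.matchAll(importRe)) {
    // Resolve relative to the test file and normalize to a .ts path
    const resolved = path.resolve(path.dirname(testFile), match[1]);
    imports.push(resolved.endsWith('.ts') ? resolved : `${resolved}.ts`);
  }
  return imports;
}
```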

4.6 Failure Analysis

```typescript
async function analyzeFailure(
  failure: TestFailure,
  ctx: AgentContext
): Promise<FailureAnalysis> {
  // Retry once to check for flakiness
  const retryResult = await ctx.tools.run_tests({ pattern: failure.testFile });
  const retryTest = retryResult.results.find(r => r.name === failure.testName);

  if (retryTest?.status === 'passed') {
    return {
      testName: failure.testName,
      testFile: failure.testFile,
      classification: 'flaky',
      rootCause: 'Test is flaky - passed on retry',
      confidence: 0.9,
      errorMessage: failure.error || '',
      stackTrace: failure.stackTrace || ''
    };
  }

  // Not flaky - analyze root cause
  const analysis = await ctx.tools.llm_analyze({
    failure: `${failure.error}\n${failure.stackTrace}`,
    context: `Test: ${failure.testName}\nFile: ${failure.testFile}`
  });

  return {
    testName: failure.testName,
    testFile: failure.testFile,
    classification: analysis.classification,
    rootCause: analysis.rootCause,
    suggestedFix: analysis.suggestedFix,
    confidence: analysis.confidence,
    errorMessage: failure.error || '',
    stackTrace: failure.stackTrace || ''
  };
}
```
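
The TestFailure and CoverageReport shapes consumed here and in section 4.7 are not defined in this plan; plausible minimal sketches, with field names inferred from usage:

```typescript
// Hypothetical shapes inferred from usage above; the real definitions may differ.
interface TestFailure {
  testName: string;
  testFile: string;
  error?: string;
  stackTrace?: string;
}

interface CoverageReport {
  line: number;
  branch: number;
  function: number;
  diffCoverage: number; // % of changed lines covered
  files: { path: string; line: number; branch: number; uncovered: number[] }[];
}
```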

4.7 Test Generation

```typescript
async function generateMissingTests(
  changes: CodeChanges,
  coverage: CoverageReport,
  ctx: AgentContext
): Promise<string[]> {
  const generatedFiles: string[] = [];

  // Find files with low coverage
  for (const file of changes.files) {
    const fileCoverage = coverage.files.find(f => f.path === file.path);

    if (!fileCoverage || fileCoverage.line < 70) {
      // Generate tests
      const sourceCode = await ctx.tools.read_file({ path: file.path });

      const testCode = await ctx.llm.chat({
        system: `Generate comprehensive tests for this code. Use the testing framework already in use.`,
        messages: [{
          role: 'user',
          content: `File: ${file.path}\n\n${sourceCode.content}\n\nGenerate tests covering all functions and edge cases.`
        }],
        temperature: 0.2
      });

      // Write test file
      const testPath = file.path.replace(/\.(ts|js)$/, '.test.$1');
      await ctx.tools.write_file({ path: testPath, content: testCode.content });
      generatedFiles.push(testPath);
    }
  }

  return generatedFiles;
}
```

4.8 Class Implementation

```typescript
export class TesterAgent extends BaseAgent {
  type = 'tester' as const;
  tools = testerTools;
  systemPrompt = TESTER_SYSTEM_PROMPT;

  async execute(
    input: TesterInput,
    ctx: AgentContext
  ): Promise<TestResult> {
    // Select tests
    const allTests = await this.getAllTests();
    const selectedTests = selectTests(
      input.codeChanges,
      input.planRisk || { level: 'medium' },
      allTests
    );

    ctx.bus.emit({
      type: 'test.selection',
      payload: { total: allTests.length, selected: selectedTests.length }
    });

    // Execute tests
    const testResults = await this.executeTests(selectedTests, ctx);

    // Analyze failures
    const failures = await this.analyzeFailures(testResults.failures, ctx);

    // Get coverage
    const coverage = await this.getCoverage(input.codeChanges, ctx);

    // Generate missing tests if needed
    let generatedTests: string[] = [];
    if (coverage.diffCoverage < 70) {
      generatedTests = await this.generateMissingTests(input.codeChanges, coverage, ctx);
    }

    return { summary: testResults.summary, coverage, failures, generatedTests };
  }

  private async getAllTests(): Promise<string[]> {
    return await glob('**/*.test.{ts,js}');
  }

  private async executeTests(
    tests: string[],
    ctx: AgentContext
  ): Promise<any> {
    const result = await this.tools.find(t => t.name === 'run_tests')!
      .execute({ pattern: tests.join(' '), options: { coverage: true } }, ctx as any);
    return result;
  }

  private async analyzeFailures(
    failures: TestFailure[],
    ctx: AgentContext
  ): Promise<FailureAnalysis[]> {
    return await Promise.all(failures.map(f => analyzeFailure(f, ctx)));
  }

  private async getCoverage(
    changes: CodeChanges,
    ctx: AgentContext
  ): Promise<CoverageReport> {
    const files = changes.files.map(f => f.path);
    const result = await this.tools.find(t => t.name === 'get_coverage')!
      .execute({ files }, ctx as any);
    return calculateCoverageMetrics(result);
  }

  private async generateMissingTests(
    changes: CodeChanges,
    coverage: CoverageReport,
    ctx: AgentContext
  ): Promise<string[]> {
    return await generateMissingTests(changes, coverage, ctx);
  }

  protected buildPrompt(input: TesterInput, memories: Memory[]): string {
    // Tester uses tools directly, prompt not heavily used
    return `Analyze test results for ${input.codeChanges.files.length} changed files.`;
  }
}
```

5. Deployer Agent

5.1 Purpose

Orchestrates deployment with build verification, human approval gates, canary rollout, and health monitoring.

5.2 System Prompt

````markdown
# ROLE
You are a deployment engineer responsible for safe production releases.

# TASK
Given validated code and test results, you will:
1. Verify build artifact
2. Request human approval (if production)
3. Execute canary deployment
4. Monitor health metrics during rollout
5. Auto-rollback if unhealthy
6. Complete full rollout if healthy

# AVAILABLE TOOLS
- run_command(cmd): Execute build/deploy commands
- github_api(endpoint, data): Interact with GitHub
- read_file(path): Read config files
- wait(ms): Wait for duration

# BUILD VERIFICATION
Before deployment:
1. Build artifact
2. Verify checksum
3. Check artifact size (flag if > 2x previous)
4. Validate manifest

# HUMAN APPROVAL GATE
Always required for:
- Production environment
- Breaking changes
- High/critical risk

Request approval with:
- Summary of changes
- Risk assessment
- Test results
- Findings summary

# CANARY DEPLOYMENT LOGIC
Based on risk level:
- Low risk: 5% → 25% → 100% (5 min between stages)
- Medium risk: 5% → 10% → 25% → 50% → 100% (10 min between stages)
- High risk: 5% → 10% → 25% → 50% → 100% (30 min between stages)

At each stage:
1. Shift traffic
2. Wait for stabilization
3. Check health metrics
4. If unhealthy → auto-rollback
5. If healthy → continue

# HEALTH CHECKS
Monitor:
- Error rate vs. baseline (threshold: +10%)
- Latency p95 vs. baseline (threshold: +50%)
- Throughput vs. baseline (threshold: -20%)

# ROLLBACK TRIGGERS
Auto-rollback if:
- Error rate > baseline + 1%
- Latency p95 > baseline * 1.5
- Any critical health check fails

# OUTPUT FORMAT
```typescript
{
  status: 'healthy' | 'degraded' | 'rolled_back',
  stages: DeploymentStage[],
  metrics: {
    errorRate: number,
    latency: { p50, p95, p99 },
    throughput: number
  },
  url: string
}
```

# ITERATION
Deployment is a linear process with checkpoints:
1. Build
2. Approval (if needed)
3. Deploy stage 1
4. Health check
5. Deploy stage 2
6. Health check
7. ... until complete or rollback

Return `{ done: true, result: deploymentResult }` when complete.
````

5.3 Tools Available

```typescript
const deployerTools: Tool[] = [
  {
    name: 'run_command',
    description: 'Execute shell command',
    schema: {
      input: z.object({ cmd: z.string() }),
      output: z.object({
        exitCode: z.number(),
        stdout: z.string(),
        stderr: z.string()
      })
    },
    execute: async ({ cmd }, ctx) => {
      return await ctx.shell.exec(cmd);
    }
  },
  {
    name: 'github_api',
    description: 'Interact with GitHub API',
    schema: {
      input: z.object({
        endpoint: z.string(),
        method: z.enum(['GET', 'POST', 'PUT', 'PATCH']),
        data: z.any().optional()
      }),
      output: z.object({ response: z.any() })
    },
    execute: async ({ endpoint, method, data }, ctx) => {
      const response = await fetch(`https://api.github.com${endpoint}`, {
        method,
        headers: {
          'Authorization': `token ${ctx.config.githubToken}`,
          'Content-Type': 'application/json'
        },
        body: data ? JSON.stringify(data) : undefined
      });
      return { response: await response.json() };
    }
  },
  {
    name: 'read_file',
    description: 'Read file',
    schema: {
      input: z.object({ path: z.string() }),
      output: z.object({ content: z.string() })
    },
    execute: async ({ path }) => {
      return { content: await fs.readFile(path, 'utf-8') };
    }
  },
  {
    name: 'wait',
    description: 'Wait for duration',
    schema: {
      input: z.object({ ms: z.number() }),
      output: z.object({ waited: z.number() })
    },
    execute: async ({ ms }) => {
      await new Promise(resolve => setTimeout(resolve, ms));
      return { waited: ms };
    }
  }
];
```

5.4 Input/Output Types

```typescript
interface DeployerInput {
  codeChanges: CodeChanges;
  testResults: TestResult;
  reviewResults: ReviewResult;
  environment: 'staging' | 'production';
}

interface DeploymentResult {
  status: 'healthy' | 'degraded' | 'rolled_back';
  stages: DeploymentStage[];
  metrics: {
    errorRate: number;
    latency: { p50: number; p95: number; p99: number };
    throughput: number;
  };
  url: string;
}

interface DeploymentStage {
  name: string;
  trafficPercent: number;
  startTime: Date;
  endTime?: Date;
  status: 'pending' | 'deploying' | 'healthy' | 'unhealthy' | 'rolled_back';
  healthChecks: HealthCheck[];
}

interface HealthCheck {
  timestamp: Date;
  metric: string;
  value: number;
  baseline: number;
  threshold: number;
  passed: boolean;
}
```

5.5 Deployment Flow

┌─────────────────────────────────────────┐
│ 1. Build Verification                   │
│   - Run build command                   │
│   - Verify artifact                     │
│   - Check size                          │
└──────────────┬──────────────────────────┘
               ▼
┌─────────────────────────────────────────┐
│ 2. Human Approval Gate                  │
│   (if production or high risk)          │
│   - Show summary                        │
│   - Wait for approval                   │
│   - If rejected → stop                  │
└──────────────┬──────────────────────────┘
               ▼
┌─────────────────────────────────────────┐
│ 3. Canary Stage 1 (5%)                  │
│   - Shift 5% traffic                    │
│   - Wait 5 minutes                      │
│   - Health check                        │
│   - If unhealthy → rollback             │
└──────────────┬──────────────────────────┘
               ▼
┌─────────────────────────────────────────┐
│ 4. Canary Stage 2 (25%)                 │
│   - Shift 25% traffic                   │
│   - Wait 5 minutes                      │
│   - Health check                        │
│   - If unhealthy → rollback             │
└──────────────┬──────────────────────────┘
               ▼
┌─────────────────────────────────────────┐
│ 5. Full Rollout (100%)                  │
│   - Shift 100% traffic                  │
│   - Final health check                  │
└──────────────┬──────────────────────────┘
               ▼
┌─────────────────────────────────────────┐
│ 6. Monitor                              │
│   - Continuous health monitoring        │
│   - Report metrics                      │
└─────────────────────────────────────────┘

5.6 Health Check Logic

```typescript
async function checkHealth(
  deployment: Deployment,
  baseline: Metrics
): Promise<{ healthy: boolean; issues: string[] }> {
  const current = await collectMetrics(deployment);
  const issues: string[] = [];

  // Error rate check
  const errorRateIncrease = current.errorRate - baseline.errorRate;
  if (errorRateIncrease > 0.01) { // 1% absolute increase
    issues.push(`Error rate increased by ${(errorRateIncrease * 100).toFixed(2)}%`);
  }

  // Latency check
  const latencyIncrease = current.latency.p95 / baseline.latency.p95;
  if (latencyIncrease > 1.5) { // 50% increase
    issues.push(`P95 latency increased by ${((latencyIncrease - 1) * 100).toFixed(0)}%`);
  }

  // Throughput check
  const throughputDecrease = current.throughput / baseline.throughput;
  if (throughputDecrease < 0.8) { // 20% decrease
    issues.push(`Throughput decreased by ${((1 - throughputDecrease) * 100).toFixed(0)}%`);
  }

  return { healthy: issues.length === 0, issues };
}
```
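
checkHealth depends on collectMetrics, which this plan does not define. A sketch assuming a JSON metrics endpoint — the URL and response shape are assumptions, not a real API (the Deployment shape is sketched under 5.7 below):

```typescript
// Hypothetical sketch: collectMetrics is referenced above but not specified.
// Assumes an internal metrics service exposing a JSON summary; the endpoint
// and field names are illustrative only.
interface Metrics {
  errorRate: number;
  latency: { p50: number; p95: number; p99: number };
  throughput: number;
}

async function collectMetrics(deployment: Deployment): Promise<Metrics> {
  const res = await fetch(
    `https://metrics.internal/api/services/${deployment.service}/summary?window=5m`
  );
  if (!res.ok) {
    throw new Error(`Metrics fetch failed: ${res.status}`);
  }
  return (await res.json()) as Metrics;
}
```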

5.7 Canary Stages by Risk

```typescript
function getCanaryStages(risk: RiskLevel): CanaryConfig {
  switch (risk) {
    case 'low':
      return {
        stages: [
          { percent: 5, waitMs: 5 * 60_000 },
          { percent: 25, waitMs: 5 * 60_000 },
          { percent: 100, waitMs: 0 }
        ]
      };
    case 'medium':
      return {
        stages: [
          { percent: 5, waitMs: 10 * 60_000 },
          { percent: 10, waitMs: 10 * 60_000 },
          { percent: 25, waitMs: 10 * 60_000 },
          { percent: 50, waitMs: 10 * 60_000 },
          { percent: 100, waitMs: 0 }
        ]
      };
    case 'high':
    case 'critical':
      return {
        stages: [
          { percent: 5, waitMs: 30 * 60_000 },
          { percent: 10, waitMs: 30 * 60_000 },
          { percent: 25, waitMs: 30 * 60_000 },
          { percent: 50, waitMs: 30 * 60_000 },
          { percent: 100, waitMs: 0 }
        ]
      };
  }
}
```
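
As a sanity check on these schedules, the minimum time to reach 100% traffic falls directly out of the wait times; a small helper (hypothetical, not part of the plan's API) makes that explicit:

```typescript
// Sum of per-stage waits, in minutes; ignores deploy and health-check time.
function minRolloutMinutes(config: CanaryConfig): number {
  return config.stages.reduce((total, stage) => total + stage.waitMs, 0) / 60_000;
}

// With the schedules above:
//   low           → 10 minutes
//   medium        → 40 minutes
//   high/critical → 120 minutes
```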

5.8 Rollback Logic

```typescript
async function rollback(
  deployment: Deployment,
  reason: string,
  ctx: AgentContext
): Promise<void> {
  ctx.bus.emit({
    type: 'deployment.rollback_started',
    payload: { deploymentId: deployment.id, reason }
  });

  // Shift all traffic back to the previous version
  await ctx.tools.run_command({
    cmd: `kubectl set image deployment/${deployment.service} app=${deployment.previousVersion}`
  });

  // Wait for rollback to complete
  await ctx.tools.wait({ ms: 30_000 });

  // Verify rollback succeeded
  const health = await checkHealth(deployment, deployment.baselineMetrics);
  if (!health.healthy) {
    // Rollback itself failed - escalate
    await ctx.gates.escalate({
      type: 'rollback_failed',
      deployment,
      reason: health.issues.join(', ')
    });
  }

  ctx.bus.emit({
    type: 'deployment.rollback_completed',
    payload: { deploymentId: deployment.id, healthy: health.healthy }
  });
}
```
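
A design note: the sketch pins the previous image explicitly, which is deterministic but requires tracking previousVersion. An alternative, assuming a standard Kubernetes Deployment that retains revision history, is to let Kubernetes restore the prior ReplicaSet and block until it settles:

```typescript
// Alternative: rely on Kubernetes revision history instead of a pinned tag.
await ctx.tools.run_command({
  cmd: `kubectl rollout undo deployment/${deployment.service}`
});
// Block until the rollout completes rather than sleeping a fixed 30s.
await ctx.tools.run_command({
  cmd: `kubectl rollout status deployment/${deployment.service} --timeout=120s`
});
```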

5.9 Class Implementation

```typescript
export class DeployerAgent extends BaseAgent {
  type = 'deployer' as const;
  tools = deployerTools;
  systemPrompt = DEPLOYER_SYSTEM_PROMPT;

  async execute(
    input: DeployerInput,
    ctx: AgentContext
  ): Promise<DeploymentResult> {
    // Build
    const artifact = await this.buildArtifact(ctx);

    // Human approval for production
    if (input.environment === 'production') {
      const approved = await this.requestApproval(input, ctx);
      if (!approved) {
        throw new Error('Deployment rejected by human');
      }
    }

    // Get baseline metrics
    const baseline = await this.getBaselineMetrics(ctx);

    // Determine canary stages
    const canaryConfig = getCanaryStages(input.reviewResults.riskScore.level);

    // Execute canary deployment
    const stages: DeploymentStage[] = [];
    for (const stage of canaryConfig.stages) {
      const stageResult = await this.deployStage(
        artifact,
        stage.percent,
        stage.waitMs,
        baseline,
        ctx
      );
      stages.push(stageResult);

      if (stageResult.status === 'unhealthy') {
        // Rollback
        await rollback(
          { id: 'current', service: 'app', previousVersion: 'v1' } as any,
          'Health check failed',
          ctx
        );
        return {
          status: 'rolled_back',
          stages,
          metrics: await this.getCurrentMetrics(ctx),
          url: ctx.config.appUrl
        };
      }
    }

    // Success
    return {
      status: 'healthy',
      stages,
      metrics: await this.getCurrentMetrics(ctx),
      url: ctx.config.appUrl
    };
  }

  private async buildArtifact(ctx: AgentContext): Promise<BuildArtifact> {
    const result = await this.tools.find(t => t.name === 'run_command')!
      .execute({ cmd: 'bun run build' }, ctx as any);
    if (result.exitCode !== 0) {
      throw new Error(`Build failed: ${result.stderr}`);
    }
    return {
      id: ulid(),
      version: 'v1',
      checksum: 'abc123',
      size: 1024,
      files: [],
      metadata: {}
    };
  }

  private async requestApproval(
    input: DeployerInput,
    ctx: AgentContext
  ): Promise<boolean> {
    const approval = await ctx.gates.requestHumanApproval({
      type: 'deploy_production',
      context: {
        files: input.codeChanges.files.length,
        risk: input.reviewResults.riskScore.level,
        tests: input.testResults.summary,
        findings: input.reviewResults.findings.length
      },
      riskAssessment: input.reviewResults.riskScore as any,
      automatedChecks: []
    });
    return approval.decision === 'approved';
  }

  private async getBaselineMetrics(ctx: AgentContext): Promise<Metrics> {
    // Fetch current production metrics
    return {
      errorRate: 0.001,
      latency: { p50: 100, p95: 250, p99: 500 },
      throughput: 1000
    };
  }

  private async deployStage(
    artifact: BuildArtifact,
    percent: number,
    waitMs: number,
    baseline: Metrics,
    ctx: AgentContext
  ): Promise<DeploymentStage> {
    const stage: DeploymentStage = {
      name: `${percent}%`,
      trafficPercent: percent,
      startTime: new Date(),
      status: 'deploying',
      healthChecks: []
    };

    // Shift traffic
    await this.tools.find(t => t.name === 'run_command')!
      .execute({ cmd: `kubectl set image deployment/app app=${artifact.version} --record` }, ctx as any);

    // Wait for stabilization
    await this.tools.find(t => t.name === 'wait')!
      .execute({ ms: waitMs }, ctx as any);

    // Health check
    const health = await checkHealth({ id: 'current' } as any, baseline);
    stage.status = health.healthy ? 'healthy' : 'unhealthy';
    stage.endTime = new Date();
    stage.healthChecks.push({
      timestamp: new Date(),
      metric: 'overall',
      value: health.healthy ? 1 : 0,
      baseline: 1,
      threshold: 1,
      passed: health.healthy
    });

    return stage;
  }

  private async getCurrentMetrics(ctx: AgentContext): Promise<any> {
    return {
      errorRate: 0.001,
      latency: { p50: 100, p95: 250, p99: 500 },
      throughput: 1000
    };
  }

  protected buildPrompt(input: DeployerInput, memories: Memory[]): string {
    return `Deploy to ${input.environment} with risk level ${input.reviewResults.riskScore.level}`;
  }
}
```
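
For orientation, a usage sketch; the wiring below (where ctx and the upstream results come from) is assumed, not specified by this plan:

```typescript
const deployer = new DeployerAgent();

const result = await deployer.execute(
  {
    codeChanges,               // from ImplementerAgent
    testResults,               // from TesterAgent
    reviewResults,             // from ReviewerAgent; carries riskScore
    environment: 'staging'     // 'production' would trigger the approval gate
  },
  ctx                          // AgentContext provided by the runtime
);

if (result.status === 'rolled_back') {
  // Surface the failing stage's health checks for diagnosis
  console.error('Rolled back:', result.stages.at(-1)?.healthChecks);
}
```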

6. Summary and Integration

6.1 Agent Dependencies

PlannerAgent
     │
     ▼
ImplementerAgent ◀───────────────────────────────────┐
     │                                               │
     ▼                                               │
ReviewerAgent ───(fixFindings if changes needed)─────┤
     │                                               │
     ▼ approved                                      │
TesterAgent ─────(fixFailures if tests fail)─────────┘
     │
     ▼ tests pass
DeployerAgent

6.2 Shared Context

All agents access shared context via AgentContext:

```typescript
interface AgentContext {
  traceId: string;
  bus: EventBus;
  memory: MemoryStore;
  llm: LLMProvider;
  tools: ToolRegistry;
  shell: ShellExecutor;
  config: RuntimeConfig;
  gates: HumanGateSystem;
  safety: SafetyControls;
  cost: CostTracker;
  elapsed: number;
}
```
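
As a sketch of how agents consume this context (the event names follow the convention used in the rollback example; the helper itself is hypothetical):

```typescript
// Wrap a unit of agent work with trace events on the shared bus.
async function withTrace<T>(
  ctx: AgentContext,
  step: string,
  fn: () => Promise<T>
): Promise<T> {
  ctx.bus.emit({ type: 'agent.step_started', payload: { traceId: ctx.traceId, step } });
  const result = await fn();
  ctx.bus.emit({ type: 'agent.step_completed', payload: { traceId: ctx.traceId, step } });
  return result;
}
```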

6.3 Build Order

  1. Week 1: BaseAgent + Planner (simple version, no memory queries)
  2. Week 2: Implementer (file-by-file, no bounce-back)
  3. Week 3: Reviewer (3-layer pipeline)
  4. Week 4: Tester (selection + execution)
  5. Week 5: Implementer bounce-back logic
  6. Week 6: Deployer
  7. Week 7: Memory integration for all agents
  8. Week 8: Polish and edge cases

6.4 Testing Strategy

Each agent should have:

  • Unit tests for decision logic (risk scoring, test selection, etc.; see the sketch after this list)
  • Integration tests with mock tools
  • End-to-end tests against a real codebase (small fixture)
  • Cost tracking tests (ensure within budget)
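
For instance, the canary-schedule logic from section 5.7 can be unit-tested directly; a sketch assuming Bun's built-in test runner (consistent with the bun run build step above):

```typescript
import { test, expect } from 'bun:test';

test('high-risk changes get the slow canary schedule', () => {
  const config = getCanaryStages('high');
  expect(config.stages.map(s => s.percent)).toEqual([5, 10, 25, 50, 100]);
  expect(config.stages[0].waitMs).toBe(30 * 60_000);
});
```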

7. Open Questions and Decisions Needed

  1. Planner parallelization: Should planning happen in parallel for large features? (Start sequential)
  2. Implementer swarm: When to introduce parallel sub-agents? (Post-MVP)
  3. Review confidence thresholds: What confidence scores should gate decisions? (Start conservative: 0.7)
  4. Test generation quality: How to validate generated tests are useful? (Coverage + manual review)
  5. Deployment rollback automation: Always auto-rollback or require human? (Auto-rollback with notification)

End of Agent Designs Implementation Plan