February 8, 2026

Implementation Sub-Plan: Build Order (8-Week Detailed Breakdown)

Document: 13-build-order.md
Date: 2026-02-07
Status: Detailed implementation plan
Source: SYSTEM-DESIGN.md Section 13, 02-roadmap.md, 01-architecture.md


Overview

This document breaks down the 8-week build order from the system design into specific day-by-day tasks. Each day is scoped to 4-6 hours of focused work. The plan follows a "skeleton first, organs second" approach: build the foundational infrastructure early, then fill in specialized agent logic.

Core Philosophy:

  • Week 1-2: Foundation (types, bus, memory, tools)
  • Week 3: First vertical slice (Reviewer)
  • Week 4: Second agent (Tester)
  • Week 5: Complete agent set (Planner + Implementer)
  • Week 6: Orchestration layer
  • Week 7: CLI and polish
  • Week 8: Harden

Week 1: Core Skeleton

Goal: Build the foundational types, event system, configuration, and database schema that everything else depends on. By end of week, the project compiles and has passing tests for all core modules.

Day 1: Project Initialization and Schema Design

Time Estimate: 5 hours Dependencies: None

Tasks:

  1. Initialize Bun project with TypeScript

    • bun init
    • Configure tsconfig.json (strict mode, path aliases)
    • Add dependencies: drizzle-orm, better-sqlite3, zod, ulid
  2. Set up directory structure

    forge/
    ├── src/
    │   ├── core/
    │   ├── safety/
    │   ├── memory/
    │   ├── tools/
    │   ├── agents/
    │   ├── orchestrator/
    │   └── cli/
    ├── drizzle/
    ├── tests/
    └── package.json
    
  3. Create Drizzle schema (src/memory/schema.ts)

    • Events table
    • Memories table
    • Patterns table
    • Checkpoints table
    • Runs table
    • Findings table
    • (Full schema from SYSTEM-DESIGN.md Section 5)
  4. Set up Drizzle migrations

    • drizzle.config.ts
    • Generate initial migration
    • Create seed data script

Files to Create:

  • package.json
  • tsconfig.json
  • src/memory/schema.ts
  • drizzle.config.ts
  • drizzle/0000_initial.sql
  • scripts/seed.ts

Acceptance Criteria:

  • ✓ bun run typecheck passes
  • ✓ bun run drizzle-kit migrate creates database
  • ✓ Seed script populates test data

Day 2: Core Types and Error Taxonomy

Time Estimate: 5 hours Dependencies: Day 1 complete

Tasks:

  1. Implement core abstractions (src/core/types.ts)

    • Agent interface
    • ForgeEvent interface
    • Tool interface
    • Phase interface
    • Memory interface
    • Checkpoint interface
    • All supporting types from SYSTEM-DESIGN.md Section 3
  2. Create error taxonomy (src/core/errors.ts)

    • Base ForgeError class
    • Specialized errors:
      • CircuitBreakerError
      • ConfigurationError
      • AgentError
      • ToolExecutionError
      • ValidationError
    • Error classification helpers
    • Severity and recoverability enums
  3. Create Zod schemas for validation (src/core/schemas.ts)

    • Validation schemas for all core types
    • Runtime type guards
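A minimal sketch of how the taxonomy could fit together, assuming the class names from the task list; the Severity values and the `recoverable` flag are illustrative, not taken from SYSTEM-DESIGN.md:

```typescript
type Severity = "info" | "warning" | "error" | "fatal";

class ForgeError extends Error {
  constructor(
    message: string,
    public readonly severity: Severity = "error",
    public readonly recoverable: boolean = false,
  ) {
    super(message);
    this.name = new.target.name; // subclasses report their own class name
  }
}

class CircuitBreakerError extends ForgeError {
  constructor(message: string, public readonly breaker: string) {
    super(message, "fatal", false); // a tripped breaker is never retryable
  }
}

class ToolExecutionError extends ForgeError {
  constructor(message: string, public readonly tool: string) {
    super(message, "error", true); // tool failures are often transient
  }
}

// Classification helper: can the pipeline retry after this error?
function isRecoverable(err: unknown): boolean {
  return err instanceof ForgeError && err.recoverable;
}
```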

Files to Create:

  • src/core/types.ts
  • src/core/errors.ts
  • src/core/schemas.ts
  • tests/core/types.test.ts
  • tests/core/errors.test.ts

Acceptance Criteria:

  • ✓ All types export correctly
  • ✓ Error classes have proper inheritance
  • ✓ Zod schemas validate example data
  • ✓ Test coverage > 80%

Day 3: Event Bus Implementation

Time Estimate: 5 hours Dependencies: Day 2 complete

Tasks:

  1. Implement in-memory event bus (src/core/bus.ts)

    • EventBus class with emit/on/off
    • Wildcard subscriptions
    • Event persistence to SQLite
    • Replay functionality
    • Subscription cleanup
  2. Add event tracing

    • Generate traceId (ulid)
    • Chain events with trace IDs
    • Event timestamps and ordering
  3. Create event helpers

    • Event builder pattern
    • Common event factories
    • Event filtering utilities
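The emit/on/off surface with `*` wildcard subscriptions and cleanup can be sketched as below; SQLite persistence and replay are omitted here, and the `"*"` wildcard token is an assumption:

```typescript
type Handler = (event: { type: string; payload: unknown }) => void;

class EventBus {
  private handlers = new Map<string, Set<Handler>>();

  // Subscribe; the returned function unsubscribes (prevents handler leaks).
  on(type: string, handler: Handler): () => void {
    if (!this.handlers.has(type)) this.handlers.set(type, new Set());
    this.handlers.get(type)!.add(handler);
    return () => this.off(type, handler);
  }

  off(type: string, handler: Handler): void {
    this.handlers.get(type)?.delete(handler);
  }

  emit(type: string, payload: unknown): void {
    const event = { type, payload };
    this.handlers.get(type)?.forEach((h) => h(event));
    this.handlers.get("*")?.forEach((h) => h(event)); // wildcard subscribers see everything
  }
}
```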

Files to Create:

  • src/core/bus.ts
  • tests/core/bus.test.ts

Acceptance Criteria:

  • ✓ Can emit events
  • ✓ Subscribers receive events
  • ✓ Events persist to database
  • ✓ Replay reconstructs event history
  • ✓ Wildcard subscriptions work
  • ✓ No memory leaks in subscriptions

Day 4: Configuration and Safety Defaults

Time Estimate: 4 hours Dependencies: Day 2 complete

Tasks:

  1. Implement configuration system (src/core/config.ts)

    • Default configuration from SYSTEM-DESIGN.md Section 8
    • Per-project config loading (forge.config.ts)
    • Environment variable overrides
    • Config validation with Zod
    • Config merge strategy
  2. Create safety control structures (src/safety/breakers.ts)

    • CircuitBreaker class
    • Iteration counter breaker
    • Cost tracker breaker
    • Time limit breaker
    • Error rate breaker
    • Breaker state machine (closed/open/half-open)
  3. Implement safety budget (src/safety/budget.ts)

    • Cost tracking per phase
    • Cost tracking per run
    • Budget exhaustion handling

Files to Create:

  • src/core/config.ts
  • src/safety/breakers.ts
  • src/safety/budget.ts
  • forge.config.example.ts
  • tests/core/config.test.ts
  • tests/safety/breakers.test.ts

Acceptance Criteria:

  • ✓ Default config loads
  • ✓ Can override with project config
  • ✓ Circuit breakers trip at thresholds
  • ✓ Budget tracking is accurate

Day 5: LLM Provider Abstraction

Time Estimate: 6 hours Dependencies: Day 2, Day 4 complete

Tasks:

  1. Create LLM provider interface (src/tools/llm.ts)

    • LLMProvider interface
    • ChatRequest / ChatResponse types
    • Token counting utilities
    • Cost calculation per model
  2. Implement Anthropic provider

    • Claude API integration
    • Streaming support
    • Tool use protocol
    • Error handling and retry
    • Rate limiting
  3. Add prompt management

    • System prompt templates
    • Message formatting
    • Token limit enforcement
  4. Create LLM cost tracker

    • Track tokens per call
    • Calculate USD cost
    • Integrate with budget system
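The provider surface and per-model cost accounting could look like the following; the model ids and per-million-token prices are placeholders, not real Anthropic pricing:

```typescript
interface ChatResponse {
  text: string;
  usage: { inputTokens: number; outputTokens: number };
}

interface LLMProvider {
  chat(messages: { role: "user" | "assistant"; content: string }[]): Promise<ChatResponse>;
}

// USD per million tokens, keyed by model id (illustrative numbers only).
const PRICES: Record<string, { input: number; output: number }> = {
  "fast-model": { input: 1, output: 5 },
  "strong-model": { input: 3, output: 15 },
};

// Convert a call's token usage into USD for the budget system.
function costUsd(model: string, usage: ChatResponse["usage"]): number {
  const p = PRICES[model];
  if (!p) throw new Error(`unknown model: ${model}`);
  return (usage.inputTokens * p.input + usage.outputTokens * p.output) / 1_000_000;
}
```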

Files to Create:

  • src/tools/llm.ts
  • src/tools/llm-anthropic.ts
  • src/tools/prompts.ts
  • tests/tools/llm.test.ts (with mocks)

Acceptance Criteria:

  • ✓ Can make chat completions
  • ✓ Tool use protocol works
  • ✓ Cost tracking is accurate
  • ✓ Retries on transient errors
  • ✓ Integration test with real API passes

Week 1 Deliverable:

  • Project skeleton that compiles
  • Core types defined
  • Event bus working
  • Config and safety defaults in place
  • LLM abstraction ready
  • All unit tests passing

Week 2: Memory + Tools

Goal: Build the memory system (store, recall, consolidation) and essential tool layer (git, runner, linter). By end of week, an agent loop can execute against a real LLM, call tools, and store memories.

Day 1: Memory Store CRUD

Time Estimate: 5 hours Dependencies: Week 1 complete

Tasks:

  1. Implement memory store (src/memory/store.ts)

    • CRUD operations for memories
    • Query by type (episodic/semantic/procedural)
    • Query by confidence threshold
    • Query by recency
    • Update confidence scores
    • Archive low-confidence memories
  2. Add memory indexing

    • Tag-based search
    • Context string matching
    • Access tracking (lastAccessed, accessCount)
  3. Implement confidence decay

    • Weekly decay formula: confidence -= 0.05
    • Reinforcement on access: confidence += 0.1
    • Pruning threshold: confidence < 0.2
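The decay, reinforcement, and pruning rules above reduce to three pure functions; clamping confidence to [0, 1] is an assumption not stated in the plan:

```typescript
const clamp = (x: number): number => Math.min(1, Math.max(0, x));

// Applied once per week to every memory.
const weeklyDecay = (confidence: number): number => clamp(confidence - 0.05);

// Applied each time a memory is recalled.
const reinforce = (confidence: number): number => clamp(confidence + 0.1);

// Memories below the threshold are archived.
const shouldPrune = (confidence: number): boolean => confidence < 0.2;
```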

Files to Create:

  • src/memory/store.ts
  • tests/memory/store.test.ts

Acceptance Criteria:

  • ✓ Can store and retrieve memories
  • ✓ Confidence decay works correctly
  • ✓ Can query by multiple criteria
  • ✓ Access tracking increments

Day 2: Similarity Search and Embeddings

Time Estimate: 5 hours Dependencies: Day 1 complete

Tasks:

  1. Add embedding support to memory store

    • Store embeddings as Float32Array in blob column
    • Generate embeddings via LLM provider
    • Cosine similarity calculation
  2. Implement similarity search

    • Embed query string
    • Calculate similarity scores
    • Return top-k memories
    • Cache embeddings to reduce API calls
  3. Create memory recall function

    • recall(context, type, limit): Memory[]
    • Combine similarity + recency + confidence
    • Ranking algorithm
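Cosine similarity over `Float32Array` embeddings, plus a multi-factor recall score; the 0.5/0.3/0.2 weights are an assumption about how similarity, confidence, and recency might be combined:

```typescript
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Each factor is normalized to [0, 1] by the caller; weights are illustrative.
function recallScore(similarity: number, confidence: number, recency: number): number {
  return 0.5 * similarity + 0.3 * confidence + 0.2 * recency;
}
```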

Files to Create:

  • src/memory/similarity.ts
  • tests/memory/similarity.test.ts

Acceptance Criteria:

  • ✓ Embeddings stored correctly
  • ✓ Similarity search returns relevant memories
  • ✓ Recall function uses multi-factor ranking

Day 3: Episodic and Semantic Memory

Time Estimate: 5 hours Dependencies: Day 1-2 complete

Tasks:

  1. Implement episodic memory (src/memory/episodes.ts)

    • Store execution events as episodic memories
    • Link to trace ID
    • Query episodes by time range
    • Query episodes by outcome (success/failure)
  2. Implement semantic memory (src/memory/patterns.ts)

    • Extract patterns from episodic memories
    • Pattern frequency tracking
    • Success rate calculation
    • Pattern trigger matching
  3. Create pattern extraction logic

    • LLM-based pattern extraction from events
    • Pattern deduplication
    • Pattern confidence scoring

Files to Create:

  • src/memory/episodes.ts
  • src/memory/patterns.ts
  • tests/memory/episodes.test.ts
  • tests/memory/patterns.test.ts

Acceptance Criteria:

  • ✓ Episodes stored with trace IDs
  • ✓ Patterns extracted from episodes
  • ✓ Pattern matching works

Day 4: Tool Registry and Core Tools

Time Estimate: 6 hours Dependencies: Week 1 complete

Tasks:

  1. Create tool registry (src/tools/registry.ts)

    • Tool registration
    • Tool discovery
    • Tool execution with validation
    • Tool sandboxing (execution timeout, resource limits)
  2. Implement git tool (src/tools/git.ts)

    • git status
    • git diff
    • git add
    • git commit
    • git push
    • Parse git output
  3. Implement shell runner tool (src/tools/runner.ts)

    • Execute shell commands
    • Capture stdout/stderr
    • Timeout enforcement
    • Working directory management
  4. Implement linter tool (src/tools/linter.ts)

    • Run ESLint or Biome
    • Parse linter output
    • Categorize findings
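A dependency-free sketch of register/execute with input validation; the real registry would validate with Zod schemas, which are replaced here by a plain `validate` callback:

```typescript
interface Tool<I, O> {
  name: string;
  validate(input: unknown): I; // throws on invalid input
  execute(input: I): Promise<O>;
}

class ToolRegistry {
  private tools = new Map<string, Tool<unknown, unknown>>();

  register(tool: Tool<any, any>): void {
    if (this.tools.has(tool.name)) throw new Error(`duplicate tool: ${tool.name}`);
    this.tools.set(tool.name, tool);
  }

  // Validate first, then execute; unknown tools and bad input both throw.
  async execute(name: string, input: unknown): Promise<unknown> {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`unknown tool: ${name}`);
    return tool.execute(tool.validate(input));
  }
}
```

Timeouts and resource limits would wrap `tool.execute` at this choke point.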

Files to Create:

  • src/tools/registry.ts
  • src/tools/git.ts
  • src/tools/runner.ts
  • src/tools/linter.ts
  • tests/tools/registry.test.ts
  • tests/tools/git.test.ts
  • tests/tools/runner.test.ts

Acceptance Criteria:

  • ✓ Tools can be registered
  • ✓ Tool input/output validated
  • ✓ Git operations work
  • ✓ Shell commands execute with timeout
  • ✓ Linter integration works

Day 5: Base Agent with Loop + First Integration Test

Time Estimate: 6 hours Dependencies: Week 2 Days 1-4 complete

Tasks:

  1. Implement base agent (src/agents/base.ts)

    • Agent loop: perceive → reason → act → learn
    • Integration with LLM provider
    • Integration with tool registry
    • Integration with memory store
    • Circuit breaker integration
    • Reflection after execution
  2. Create working memory structure

    • Message history
    • Tool results
    • Iteration tracking
  3. Implement reflection mechanism

    • Post-execution reflection prompt
    • Extract learnings
    • Store learnings in memory
  4. Write first integration test

    • Create simple agent that uses a tool
    • Execute against real LLM
    • Verify tool execution
    • Verify memory storage
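The perceive → reason → act → learn loop can be sketched as a skeleton; the abstract methods stand in for the LLM call, tool execution, and reflection, and the iteration cap plays the role of the circuit breaker:

```typescript
interface AgentStep {
  done: boolean;
  action?: string;
}

abstract class BaseAgent {
  constructor(private readonly maxIterations = 5) {}

  abstract perceive(): string;                  // gather working-memory context
  abstract reason(context: string): AgentStep;  // decide next action (LLM call)
  abstract act(action: string): void;           // execute the chosen tool
  abstract learn(): void;                       // post-run reflection

  // Returns the number of iterations used.
  run(): number {
    let i = 0;
    for (; i < this.maxIterations; i++) {       // iteration breaker
      const step = this.reason(this.perceive());
      if (step.done) break;
      if (step.action) this.act(step.action);
    }
    this.learn();                               // reflect even on early exit
    return i;
  }
}
```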

Files to Create:

  • src/agents/base.ts
  • tests/agents/base.integration.test.ts

Acceptance Criteria:

  • ✓ Agent loop executes
  • ✓ LLM decides which tool to use
  • ✓ Tool executes successfully
  • ✓ Memory stores execution
  • ✓ Circuit breakers work
  • ✓ Reflection extracts learnings

Week 2 Deliverable:

  • Memory system fully functional
  • Tool layer operational
  • Base agent can execute with real LLM
  • First integration test passing
  • Feedback loop foundation in place

Week 3: Reviewer Agent (First Vertical Slice)

Goal: Build a complete vertical slice: the Reviewer agent that reviews code changes, posts findings, learns from dismissals. This proves the full stack works end-to-end.

Day 1: Reviewer Agent Skeleton

Time Estimate: 5 hours Dependencies: Week 2 complete

Tasks:

  1. Create reviewer agent (src/agents/reviewer.ts)

    • Extend BaseAgent
    • Define agent type: reviewer
    • Three-layer review structure:
      • Layer 1: Static analysis
      • Layer 2: Security scan
      • Layer 3: AI review
  2. Define reviewer input/output types

    • ReviewInput: code changes, context
    • ReviewOutput: findings, risk score, decision
  3. Implement risk scoring algorithm

    • Complexity component
    • Change size component
    • Criticality component
    • Calculate overall risk level (low/medium/high/critical)
  4. Define review decision logic

    • Approve (low risk, no critical findings)
    • Request changes (fixable issues)
    • Require human (high risk or critical findings)
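A weighted-sum sketch of the risk score and its bands; the weights and band boundaries are placeholders, since the real formula comes from SYSTEM-DESIGN.md:

```typescript
type RiskLevel = "low" | "medium" | "high" | "critical";

// Each component is normalized to [0, 1] by the caller; weights are illustrative.
function riskScore(complexity: number, changeSize: number, criticality: number): number {
  return 0.3 * complexity + 0.3 * changeSize + 0.4 * criticality;
}

// Band boundaries are illustrative.
function riskLevel(score: number): RiskLevel {
  if (score < 0.25) return "low";
  if (score < 0.5) return "medium";
  if (score < 0.75) return "high";
  return "critical";
}
```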

Files to Create:

  • src/agents/reviewer.ts
  • src/agents/reviewer-types.ts
  • tests/agents/reviewer.test.ts

Acceptance Criteria:

  • ✓ Reviewer extends BaseAgent
  • ✓ Risk scoring formula implemented
  • ✓ Decision logic implemented

Day 2: Static Analysis Integration

Time Estimate: 5 hours Dependencies: Day 1 complete

Tasks:

  1. Create static analysis layer

    • Integrate linter tool (ESLint/Biome)
    • Run TypeScript strict check
    • Run formatting check (Prettier/Biome)
    • Parse and categorize findings
  2. Map linter output to Finding type

    • Extract file, line, column
    • Map severity
    • Categorize (style/correctness/etc)
    • Determine fixability
  3. Deduplicate findings

    • Hash-based deduplication
    • Merge similar findings
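Hash-based deduplication can key on file, line, and rule while ignoring message wording; the `Finding` shape here is a simplified stand-in for the real type:

```typescript
interface Finding {
  file: string;
  line: number;
  rule: string;
  message: string;
}

function dedupeFindings(findings: Finding[]): Finding[] {
  const seen = new Set<string>();
  const out: Finding[] = [];
  for (const f of findings) {
    const key = `${f.file}:${f.line}:${f.rule}`; // identity ignores message wording
    if (!seen.has(key)) {
      seen.add(key);
      out.push(f); // keep the first occurrence
    }
  }
  return out;
}
```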

Files to Create:

  • src/agents/reviewer-static.ts
  • tests/agents/reviewer-static.test.ts

Acceptance Criteria:

  • ✓ Static analysis runs on code changes
  • ✓ Findings extracted correctly
  • ✓ Deduplication works

Day 3: AI Review Layer

Time Estimate: 6 hours Dependencies: Day 1-2 complete

Tasks:

  1. Create AI review layer

    • Construct review prompt with code diff
    • Include relevant patterns from memory
    • Request LLM to review for:
      • Logic correctness
      • Edge cases
      • Performance implications
      • Architecture fit
  2. Parse LLM review output

    • Extract findings from structured output
    • Map to Finding type
    • Confidence scoring
  3. Implement risk-based review depth

    • Low risk: skip AI review (static only)
    • Medium risk: AI review with fast model
    • High risk: AI review with strong model
    • Critical risk: AI review + human required
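The risk-to-depth mapping above is a small lookup table; the model ids are placeholders for the fast/strong tiers:

```typescript
type Risk = "low" | "medium" | "high" | "critical";

type ReviewDepth =
  | { aiReview: false }
  | { aiReview: true; model: string; requireHuman: boolean };

const REVIEW_DEPTH: Record<Risk, ReviewDepth> = {
  low: { aiReview: false }, // static analysis only
  medium: { aiReview: true, model: "fast-model", requireHuman: false },
  high: { aiReview: true, model: "strong-model", requireHuman: false },
  critical: { aiReview: true, model: "strong-model", requireHuman: true },
};
```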

Files to Create:

  • src/agents/reviewer-ai.ts
  • src/agents/reviewer-prompts.ts
  • tests/agents/reviewer-ai.test.ts

Acceptance Criteria:

  • ✓ AI review runs for medium+ risk
  • ✓ Findings extracted from LLM
  • ✓ Risk-based depth selection works

Day 4: GitHub Integration

Time Estimate: 5 hours Dependencies: Day 1-3 complete

Tasks:

  1. Create GitHub tool (src/tools/github.ts)

    • Fetch PR details
    • Fetch PR diff
    • Post review comments
    • Update PR status check
    • Dismiss comments
    • React to comment events
  2. Integrate with reviewer agent

    • Fetch PR on review request
    • Post findings as PR comments
    • Update check status based on decision
    • Group findings by file
  3. Format review comments

    • Markdown formatting
    • Code snippets
    • Severity badges
    • Fix suggestions

Files to Create:

  • src/tools/github.ts
  • src/agents/reviewer-github.ts
  • tests/tools/github.test.ts (mocked)

Acceptance Criteria:

  • ✓ Can fetch PR details
  • ✓ Can post review comments
  • ✓ Comments formatted nicely
  • ✓ Status checks update

Day 5: Risk Scoring, Finding Persistence, Phase Bounce

Time Estimate: 5 hours Dependencies: Day 1-4 complete

Tasks:

  1. Persist findings to database

    • Store findings in findings table
    • Link to run ID
    • Track dismissals
  2. Implement finding dismissal learning

    • Track when humans dismiss findings
    • Decrease confidence in related patterns
    • Store dismissal reason as learning
  3. Implement phase bounce logic (review → implement → re-review)

    • Detect "changes requested" decision
    • Package findings for implementer
    • Track bounce count
    • Max bounces: 3
  4. Create review metrics

    • False positive rate (dismissals / total)
    • Review duration
    • Findings per review

Files to Create:

  • src/agents/reviewer-persistence.ts
  • src/agents/reviewer-learning.ts
  • tests/agents/reviewer-learning.test.ts

Acceptance Criteria:

  • ✓ Findings persist to database
  • ✓ Dismissals decrease pattern confidence
  • ✓ Bounce logic works
  • ✓ Metrics tracked

Week 3 Deliverable:

  • Complete Reviewer agent operational
  • Three-layer review working
  • GitHub PR integration functional
  • Findings persisted and learned from
  • Phase bounce logic implemented

Week 4: Tester Agent

Goal: Build the Tester agent that selects tests, executes them, analyzes failures, and suggests fixes. Integrate with phase bounce (test → implement → retest).

Day 1: Tester Agent Skeleton + Test Selection

Time Estimate: 5 hours Dependencies: Week 3 complete

Tasks:

  1. Create tester agent (src/agents/tester.ts)

    • Extend BaseAgent
    • Define agent type: tester
  2. Define tester input/output types

    • TestInput: code changes, existing test suite
    • TestOutput: test results, failures, coverage, generated tests
  3. Implement risk-based test selection

    • Changed files → unit tests covering those files
    • Medium+ risk → integration tests
    • High+ risk → full suite
    • Parse test files to build dependency graph
  4. Create test selector algorithm

    • Static analysis to find test files for modules
    • Impact analysis for integration tests
    • Prioritize by risk and recency

Files to Create:

  • src/agents/tester.ts
  • src/agents/tester-selection.ts
  • tests/agents/tester.test.ts

Acceptance Criteria:

  • ✓ Tester extends BaseAgent
  • ✓ Test selection based on changes works
  • ✓ Risk-based selection works

Day 2: Test Runner Integration

Time Estimate: 5 hours Dependencies: Day 1 complete

Tasks:

  1. Create test runner tool (src/tools/test-runner.ts)

    • Execute Jest/Vitest via shell
    • Parse JSON output
    • Capture stdout/stderr
    • Timeout enforcement
  2. Parse test results

    • Extract passed/failed/skipped counts
    • Extract failure messages
    • Extract stack traces
    • Extract coverage data
  3. Implement retry logic for flaky tests

    • Retry failed tests once
    • Mark as flaky if inconsistent
    • Track flakiness over time
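The retry-once rule reduces to a small classifier: a test that fails and then passes on retry is flaky rather than broken. `runTest` stands in for the real runner:

```typescript
type Outcome = "passed" | "failed" | "flaky";

function classifyWithRetry(runTest: () => boolean): Outcome {
  if (runTest()) return "passed";
  // One retry: inconsistent results mean the test is flaky, not failing.
  return runTest() ? "flaky" : "failed";
}
```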

Files to Create:

  • src/tools/test-runner.ts
  • tests/tools/test-runner.test.ts

Acceptance Criteria:

  • ✓ Can execute tests
  • ✓ Results parsed correctly
  • ✓ Retry logic works
  • ✓ Flaky tests detected

Day 3: Failure Analysis

Time Estimate: 6 hours Dependencies: Day 1-2 complete

Tasks:

  1. Implement failure analyzer (src/agents/tester-analysis.ts)

    • Classify failure type:
      • Real bug
      • Flaky test
      • Environment issue
      • Outdated snapshot
    • Use LLM for root cause analysis
    • Extract relevant code context
  2. Create failure analysis prompt

    • Include test code
    • Include failure message
    • Include relevant source code
    • Request root cause and fix suggestion
  3. Confidence scoring for suggested fixes

    • High confidence (>0.7) → auto-fixable
    • Low confidence → escalate to human

Files to Create:

  • src/agents/tester-analysis.ts
  • src/agents/tester-prompts.ts
  • tests/agents/tester-analysis.test.ts

Acceptance Criteria:

  • ✓ Failures classified correctly
  • ✓ LLM provides root cause analysis
  • ✓ Fix suggestions generated
  • ✓ Confidence scores assigned

Day 4: Test Gap Detection + Generation

Time Estimate: 5 hours Dependencies: Day 1-3 complete

Tasks:

  1. Implement test gap detection

    • Parse coverage report
    • Identify uncovered lines in changed files
    • Identify uncovered branches
    • Identify functions without tests
  2. Create test generator

    • Use LLM to generate test cases
    • Input: function signature, implementation, examples
    • Output: test code
    • Validate generated tests (parse, typecheck)
  3. Format generated tests

    • Match existing test style
    • Follow project conventions
    • Add descriptive test names

Files to Create:

  • src/agents/tester-gaps.ts
  • src/agents/tester-generator.ts
  • tests/agents/tester-generator.test.ts

Acceptance Criteria:

  • ✓ Uncovered code detected
  • ✓ Tests generated for gaps
  • ✓ Generated tests are valid

Day 5: Phase Bounce Integration + Metrics

Time Estimate: 5 hours Dependencies: Day 1-4 complete

Tasks:

  1. Implement test → implement → retest bounce

    • Detect test failures
    • Package failure analysis for implementer
    • Track bounce count
    • Max bounces: 2
  2. Create test metrics

    • Test duration
    • Pass rate over time
    • Flaky test rate
    • Coverage delta (before/after)
    • Test gap count
  3. Integrate with memory system

    • Store failure patterns
    • Store successful fixes
    • Recall similar failures from memory

Files to Create:

  • src/agents/tester-bounce.ts
  • src/agents/tester-metrics.ts
  • tests/agents/tester-bounce.test.ts

Acceptance Criteria:

  • ✓ Test failures bounce to implementer
  • ✓ Bounce count tracked
  • ✓ Metrics tracked
  • ✓ Failure patterns stored

Week 4 Deliverable:

  • Complete Tester agent operational
  • Test selection and execution working
  • Failure analysis with suggested fixes
  • Test generation for gaps
  • Phase bounce (test → fix → retest) working

Week 5: Planner + Implementer Agents

Goal: Build the Planner agent (requirements → plan) and Implementer agent (plan → code). Integrate self-validation loop (write → typecheck → test → fix). End-to-end flow works: plan → implement → review → test.

Day 1: Planner Agent - Requirements Analysis

Time Estimate: 5 hours Dependencies: Week 4 complete

Tasks:

  1. Create planner agent (src/agents/planner.ts)

    • Extend BaseAgent
    • Define agent type: planner
  2. Define planner input/output types

    • PlanInput: requirements (natural language)
    • PlanOutput: analysis, architecture, tasks, risk assessment
  3. Implement requirements analysis

    • Use LLM to parse requirements
    • Extract acceptance criteria
    • Identify constraints
    • Decompose into stories
  4. Create planning prompts

    • Requirements analysis prompt
    • Include codebase context (file structure)
    • Include relevant patterns from memory

Files to Create:

  • src/agents/planner.ts
  • src/agents/planner-analysis.ts
  • src/agents/planner-prompts.ts
  • tests/agents/planner.test.ts

Acceptance Criteria:

  • ✓ Requirements parsed into structured format
  • ✓ Stories extracted
  • ✓ Constraints identified

Day 2: Planner Agent - Architecture Design + Task Decomposition

Time Estimate: 6 hours Dependencies: Day 1 complete

Tasks:

  1. Implement architecture design

    • Use LLM to design system architecture
    • Identify components
    • Define interfaces
    • Document architecture decisions
  2. Implement task decomposition

    • Break architecture into implementation tasks
    • Order tasks by dependencies
    • Estimate complexity per task
    • Assign priorities
  3. Implement risk assessment

    • Calculate risk score (same formula as reviewer)
    • Determine review depth requirement
    • Determine test coverage requirement
  4. Create implementation plan

    • Combine architecture + tasks + risk
    • Package for implementer

Files to Create:

  • src/agents/planner-design.ts
  • src/agents/planner-tasks.ts
  • src/agents/planner-risk.ts
  • tests/agents/planner-design.test.ts

Acceptance Criteria:

  • ✓ Architecture designed
  • ✓ Tasks decomposed and ordered
  • ✓ Risk assessed
  • ✓ Plan packaged for implementer

Day 3: Implementer Agent - Code Generation

Time Estimate: 6 hours Dependencies: Day 1-2 complete

Tasks:

  1. Create implementer agent (src/agents/implementer.ts)

    • Extend BaseAgent
    • Define agent type: implementer
  2. Define implementer input/output types

    • ImplementInput: plan, task
    • ImplementOutput: code changes, tests added, validated
  3. Implement code generation

    • Use LLM to generate code from task spec
    • Include architecture context
    • Include existing code patterns
    • Generate tests alongside code
  4. Create file operations

    • Read existing files
    • Write new files
    • Modify existing files (diff-based edits)

Files to Create:

  • src/agents/implementer.ts
  • src/agents/implementer-generation.ts
  • src/agents/implementer-prompts.ts
  • tests/agents/implementer.test.ts

Acceptance Criteria:

  • ✓ Code generated from task spec
  • ✓ Tests generated alongside code
  • ✓ Files written to disk

Day 4: Implementer Agent - Self-Validation Loop

Time Estimate: 6 hours Dependencies: Day 1-3 complete

Tasks:

  1. Implement self-validation loop

    • After code generation:
      • Run typecheck
      • Run linter
      • Run affected tests
    • If issues found:
      • Analyze issues
      • Fix issues
      • Loop back (max 3 iterations)
  2. Create validation tools

    • Typecheck runner (tsc --noEmit)
    • Linter runner (reuse from Week 3)
    • Test runner (reuse from Week 4)
  3. Implement fix logic

    • Parse validation errors
    • Use LLM to suggest fixes
    • Apply fixes
    • Re-validate
  4. Track validation metrics

    • Iterations to valid code
    • Issues fixed per iteration
    • Final validation status
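The write → validate → fix loop with its 3-iteration cap can be sketched as one function; `validate` and `fix` are stubs standing in for the typecheck/lint/test runners and the LLM fixer:

```typescript
function selfValidate(
  validate: (code: string) => string[],            // returns remaining issues
  fix: (code: string, issues: string[]) => string, // attempts a repair
  code: string,
  maxIterations = 3,
): { code: string; valid: boolean; iterations: number } {
  for (let i = 0; i < maxIterations; i++) {
    const issues = validate(code);
    if (issues.length === 0) return { code, valid: true, iterations: i };
    code = fix(code, issues);
  }
  // Out of iterations: report whatever state we reached.
  return { code, valid: validate(code).length === 0, iterations: maxIterations };
}
```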

Files to Create:

  • src/agents/implementer-validation.ts
  • src/agents/implementer-fix.ts
  • tests/agents/implementer-validation.test.ts

Acceptance Criteria:

  • ✓ Typecheck runs after code generation
  • ✓ Linter runs after code generation
  • ✓ Tests run after code generation
  • ✓ Issues auto-fixed (up to max iterations)
  • ✓ Final code typechecks and passes tests

Day 5: End-to-End Integration Test

Time Estimate: 5 hours Dependencies: Day 1-4 complete

Tasks:

  1. Create integration test for full flow

    • Input: simple feature requirement
    • Plan → Implement → Review → Test
    • Verify each phase output
    • Verify phase transitions
    • Verify bounce logic works
  2. Test phase bounces

    • Review finds issue → bounce to implement → fix → re-review
    • Test fails → bounce to implement → fix → re-test
  3. Verify memory integration

    • Learnings stored at each phase
    • Patterns extracted after run
    • Memories recalled in next run
  4. Create test fixtures

    • Sample requirements
    • Sample codebase
    • Expected outputs

Files to Create:

  • tests/integration/full-flow.test.ts
  • tests/fixtures/sample-requirements.ts
  • tests/fixtures/sample-codebase/

Acceptance Criteria:

  • ✓ Full flow executes without errors
  • ✓ Phase bounces work
  • ✓ Memories stored and recalled
  • ✓ Final code is valid and tested

Week 5 Deliverable:

  • Planner agent functional (requirements → plan)
  • Implementer agent functional (plan → code)
  • Self-validation loop working
  • End-to-end flow tested (plan → implement → review → test)
  • Phase bounces working across all agents

Week 6: Orchestrator

Goal: Build the orchestration layer that coordinates agents through the pipeline. State machine, checkpoints, shared context, human gates. "forge run" executes the complete pipeline.

Day 1: Pipeline State Machine + Phase Sequencing

Time Estimate: 6 hours Dependencies: Week 5 complete

Tasks:

  1. Create pipeline orchestrator (src/orchestrator/pipeline.ts)

    • State machine: idle → planning → implementing → reviewing → testing → deploying → monitoring → completed
    • Phase transitions
    • Phase input/output validation
    • Phase execution logic
  2. Define phase configuration

    • Each phase has:
      • Agent assignment
      • Guards (pre-conditions)
      • Gates (human approval checkpoints)
      • Breakers (circuit breakers)
      • Next phase
  3. Implement phase execution

    • Load phase config
    • Check guards
    • Execute agent
    • Check breakers
    • Check gates
    • Transition to next phase
  4. Handle phase failures

    • Capture errors
    • Determine recoverability
    • Retry logic (if transient)
    • Checkpoint before retry
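The linear happy path of the state machine is a transition table; bounce transitions (review → implement, test → implement) are omitted from this sketch:

```typescript
const NEXT: Record<string, string> = {
  idle: "planning",
  planning: "implementing",
  implementing: "reviewing",
  reviewing: "testing",
  testing: "deploying",
  deploying: "monitoring",
  monitoring: "completed",
};

function transition(phase: string): string {
  const next = NEXT[phase];
  if (!next) throw new Error(`no transition from ${phase}`); // terminal or unknown
  return next;
}
```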

Files to Create:

  • src/orchestrator/pipeline.ts
  • src/orchestrator/phases.ts
  • tests/orchestrator/pipeline.test.ts

Acceptance Criteria:

  • ✓ State machine transitions correctly
  • ✓ Phases execute in sequence
  • ✓ Phase failures handled
  • ✓ Guards and breakers enforced

Day 2: Checkpoint System

Time Estimate: 5 hours Dependencies: Day 1 complete

Tasks:

  1. Create checkpoint system (src/orchestrator/checkpoint.ts)

    • Create checkpoint after each phase
    • Store phase state to database
    • List checkpoints for a run
    • Restore from checkpoint
    • Compare checkpoints (diff)
  2. Define checkpoint schema

    • Checkpoint ID (ulid)
    • Trace ID (links to run)
    • Phase name
    • State (JSON serialized)
    • Timestamp
  3. Implement state serialization

    • Serialize complex types (Dates, Maps, etc.)
    • Deserialize on restore
    • Validate restored state
  4. Integrate with pipeline

    • Auto-checkpoint after each phase
    • Resume from last checkpoint on failure
    • Clean up old checkpoints (retention policy)
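JSON cannot round-trip Dates and Maps, so the serializer needs to tag them; a minimal replacer/reviver sketch (the `__type` tag names are an assumption):

```typescript
function serialize(state: unknown): string {
  // A classic function (not arrow) so `this` is the containing object,
  // letting us see the raw Date before JSON.stringify applies toJSON.
  return JSON.stringify(state, function (this: Record<string, unknown>, key: string, value: unknown) {
    const raw = this[key];
    if (raw instanceof Date) return { __type: "Date", iso: raw.toISOString() };
    if (raw instanceof Map) return { __type: "Map", entries: [...raw.entries()] };
    return value;
  });
}

function deserialize(json: string): unknown {
  return JSON.parse(json, (_key, value) =>
    value?.__type === "Date" ? new Date(value.iso)
    : value?.__type === "Map" ? new Map(value.entries)
    : value,
  );
}
```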

Files to Create:

  • src/orchestrator/checkpoint.ts
  • tests/orchestrator/checkpoint.test.ts

Acceptance Criteria:

  • ✓ Checkpoints created after each phase
  • ✓ State serialized correctly
  • ✓ Can restore from checkpoint
  • ✓ Pipeline resumes from checkpoint

Day 3: Context Bus (Shared State)

Time Estimate: 5 hours Dependencies: Day 1-2 complete

Tasks:

  1. Create context bus (src/orchestrator/context.ts)

    • Shared state store (in-memory + persistent)
    • Key-value interface
    • Get/set/update/subscribe
    • Snapshot and restore
  2. Implement context scoping

    • Global context (across all runs)
    • Run context (scoped to trace ID)
    • Phase context (scoped to phase)
  3. Integrate with agents

    • Agents read from context
    • Agents write to context
    • Context persisted at checkpoints
  4. Create context helpers

    • Context builders
    • Type-safe context accessors
    • Context validation

Files to Create:

  • src/orchestrator/context.ts
  • tests/orchestrator/context.test.ts

Acceptance Criteria:

  • ✓ Context shared across agents
  • ✓ Context persisted at checkpoints
  • ✓ Context scoping works
  • ✓ Type safety maintained

Day 4: Human Approval Gates

Time Estimate: 5 hours Dependencies: Day 1-3 complete

Tasks:

  1. Create gate system (src/safety/gates.ts)

    • Define gate types:
      • Architecture approval (high-risk plans)
      • Production deploy (always)
      • Security findings (critical severity)
      • Cost overrun (approaching budget)
    • Gate condition evaluation
    • Gate timeout handling
  2. Implement gate workflow

    • Pause pipeline at gate
    • Notify human (CLI prompt, webhook, etc.)
    • Wait for approval/rejection
    • Resume pipeline on approval
    • Halt pipeline on rejection
    • Timeout fallback (default: reject)
  3. Create CLI prompts for gates

    • Display gate context
    • Display relevant information
    • Yes/no/defer prompt
    • Optional feedback input
  4. Integrate with pipeline

    • Check gates after phase execution
    • Store gate decisions as events
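The wait-with-timeout behavior, failing closed to "reject" as specified above, can be sketched with a promise race; `ask` stands in for the real CLI prompt or webhook:

```typescript
async function waitForGate(
  ask: () => Promise<"approve" | "reject">,
  timeoutMs: number,
): Promise<"approve" | "reject"> {
  const timeout = new Promise<"reject">((resolve) =>
    setTimeout(() => resolve("reject"), timeoutMs), // fail closed on timeout
  );
  return Promise.race([ask(), timeout]);
}
```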

Files to Create:

  • src/safety/gates.ts
  • src/cli/gates.ts
  • tests/safety/gates.test.ts

Acceptance Criteria:

  • ✓ Gates evaluated correctly
  • ✓ Pipeline pauses at gate
  • ✓ CLI prompts display
  • ✓ Approval resumes pipeline
  • ✓ Rejection halts pipeline
  • ✓ Timeout handled

Day 5: Full Pipeline Integration Test + "forge run"

Time Estimate: 5 hours Dependencies: Day 1-4 complete

Tasks:

  1. Wire up complete pipeline

    • Integrate all agents
    • Integrate all phases
    • Integrate checkpoints
    • Integrate context bus
    • Integrate gates
    • Integrate breakers
  2. Create "forge run" CLI command (src/cli/commands/run.ts)

    • Parse requirements from CLI
    • Initialize pipeline
    • Execute pipeline
    • Display progress
    • Handle interruptions (Ctrl+C)
    • Display final status
  3. Create full integration test

    • Run complete pipeline with real LLM
    • Verify all phases execute
    • Verify checkpoints created
    • Verify gates triggered
    • Verify final output

Files to Create:

  • src/cli/commands/run.ts
  • tests/integration/pipeline-full.test.ts

Acceptance Criteria:

  • ✓ "forge run" command works
  • ✓ Complete pipeline executes
  • ✓ All agents coordinate correctly
  • ✓ Checkpoints work
  • ✓ Gates work
  • ✓ Breakers work

Week 6 Deliverable:

  • Pipeline orchestrator functional
  • State machine working
  • Checkpoint system operational
  • Context bus sharing state
  • Human approval gates working
  • "forge run" executes complete pipeline
  • Full integration test passing

Week 7: CLI + Polish

Goal: Build a polished CLI with commands for running, reviewing, testing, status checks, and history. Terminal UI for progress and findings. Per-project configuration. Memory consolidation job.

Day 1: CLI Framework + Core Commands (Part 1)

Time Estimate: 5 hours Dependencies: Week 6 complete

Tasks:

  1. Set up CLI framework

    • Use commander or similar for CLI parsing
    • Define command structure
    • Add help text
    • Add version flag
  2. Implement "forge run" (refine from Week 6 Day 5)

    • Accept requirements as argument or file
    • Accept config overrides via flags
    • Display progress indicators
    • Handle errors gracefully
  3. Implement "forge review"

    • Manually trigger review of PR or local changes
    • Display review findings
    • Option to auto-fix findings
  4. Create CLI utilities

    • Spinner/progress indicators
    • Error formatting
    • Success messages

Files to Create:

  • src/cli/index.ts
  • src/cli/commands/run.ts (refine)
  • src/cli/commands/review.ts
  • src/cli/ui/spinner.ts
  • src/cli/ui/format.ts

Acceptance Criteria:

  • ✓ CLI parses commands
  • ✓ Help text displays
  • ✓ "forge run" works
  • ✓ "forge review" works
  • ✓ Progress indicators display
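The command structure the plan describes ("commander or similar") boils down to a registry of named handlers plus a help listing. A dependency-free sketch of that shape — the real CLI would delegate parsing to the chosen library:

```typescript
// Minimal command registry: register named commands, dispatch argv,
// fall back to help text or an "unknown command" message.
type Handler = (args: string[]) => string;

const commands = new Map<string, { desc: string; run: Handler }>();

function register(name: string, desc: string, run: Handler): void {
  commands.set(name, { desc, run });
}

function dispatch(argv: string[]): string {
  const [name, ...rest] = argv;
  if (!name || name === "--help") {
    return [...commands].map(([n, c]) => `${n}\t${c.desc}`).join("\n");
  }
  const cmd = commands.get(name);
  if (!cmd) return `unknown command: ${name}`;
  return cmd.run(rest);
}

register("run", "execute the full pipeline", (args) => `run: ${args.join(" ")}`);
register("review", "review a PR or local changes", () => "review");
```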

Day 2: CLI Core Commands (Part 2)

Time Estimate: 5 hours Dependencies: Day 1 complete

Tasks:

  1. Implement "forge test"

    • Manually trigger test execution
    • Display test results
    • Display failure analysis
    • Option to generate missing tests
  2. Implement "forge status"

    • Display current pipeline status
    • Display recent runs
    • Display cost usage
    • Display agent health
  3. Implement "forge history"

    • List past runs
    • Filter by status/date
    • Display run details
    • Replay events from a run
  4. Create status display

    • Table formatting
    • Color coding by status
    • Duration formatting
    • Cost formatting

Files to Create:

  • src/cli/commands/test.ts
  • src/cli/commands/status.ts
  • src/cli/commands/history.ts
  • src/cli/ui/table.ts

Acceptance Criteria:

  • ✓ "forge test" works
  • ✓ "forge status" displays current state
  • ✓ "forge history" lists runs
  • ✓ Tables formatted nicely

Day 3: Terminal UI (Progress + Findings Display)

Time Estimate: 5 hours Dependencies: Day 1-2 complete

Tasks:

  1. Create progress display for pipeline

    • Show current phase
    • Show phase progress (iteration count, time elapsed)
    • Show cost so far
    • Update in real-time
  2. Create findings display

    • Group findings by file
    • Color code by severity
    • Show code snippets
    • Show fix suggestions
    • Collapsible sections
  3. Create event stream display

    • Show events as they occur
    • Filter by event type
    • Expandable details
  4. Create dashboard view

    • Overview of system state
    • Recent activity
    • Cost dashboard
    • Agent status

Files to Create:

  • src/cli/ui/progress.ts
  • src/cli/ui/findings.ts
  • src/cli/ui/events.ts
  • src/cli/ui/dashboard.ts

Acceptance Criteria:

  • ✓ Progress updates in real-time
  • ✓ Findings display is readable
  • ✓ Events stream displays
  • ✓ Dashboard shows system state
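The findings display's "group by file, color code by severity" step can be sketched independently of rendering. The `Finding` shape here is simplified for illustration; the real type comes from the Reviewer's persistence layer:

```typescript
// Group findings by file, then sort each file's findings so the most
// severe appear first — the order the terminal UI would render them in.
type Severity = "critical" | "high" | "medium" | "low";
const rank: Record<Severity, number> = { critical: 0, high: 1, medium: 2, low: 3 };

interface Finding {
  file: string;
  severity: Severity;
  message: string;
}

function groupFindings(findings: Finding[]): Map<string, Finding[]> {
  const byFile = new Map<string, Finding[]>();
  for (const f of findings) {
    if (!byFile.has(f.file)) byFile.set(f.file, []);
    byFile.get(f.file)!.push(f);
  }
  for (const list of byFile.values()) {
    list.sort((a, b) => rank[a.severity] - rank[b.severity]);
  }
  return byFile;
}
```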

Day 4: Per-Project Configuration

Time Estimate: 4 hours Dependencies: Week 6 complete

Tasks:

  1. Create configuration loader

    • Load forge.config.ts from project root
    • Merge with defaults
    • Validate configuration
  2. Define configuration schema

    • Project metadata (name, language)
    • LLM provider settings
    • Tool commands (test, lint, build, typecheck)
    • Safety settings (cost limits, iteration limits)
    • GitHub integration settings
    • Memory settings
  3. Implement "forge init" command

    • Generate forge.config.ts template
    • Interactive prompts for key settings
    • Detect project type (package.json, etc.)
  4. Create configuration validation

    • Validate all required fields
    • Validate types
    • Provide helpful error messages

Files to Create:

  • src/cli/commands/init.ts
  • src/core/config-loader.ts
  • forge.config.template.ts
  • tests/core/config-loader.test.ts

Acceptance Criteria:

  • ✓ "forge init" creates config file
  • ✓ Config loads from project root
  • ✓ Config merges with defaults
  • ✓ Config validation works
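The "merge with defaults" step is the core of the loader: required fields fail loudly, optional fields fall back, and nested objects (like tool commands) shallow-merge so a partial override keeps the remaining defaults. A sketch with illustrative field names (`costLimitUsd`, `toolCommands`) — the real schema would be defined with zod:

```typescript
// Default-merging config loader sketch. Field names are illustrative,
// not the actual forge.config.ts schema.
interface ForgeConfig {
  projectName: string;
  costLimitUsd: number;
  toolCommands: { test: string; lint: string };
}

interface UserConfig {
  projectName?: string;
  costLimitUsd?: number;
  toolCommands?: Partial<ForgeConfig["toolCommands"]>;
}

const defaults = {
  costLimitUsd: 10,
  toolCommands: { test: "bun test", lint: "bunx biome check ." },
};

function loadConfig(user: UserConfig): ForgeConfig {
  if (!user.projectName) {
    // Helpful, actionable error message per the validation task.
    throw new Error("forge.config.ts: 'projectName' is required");
  }
  return {
    projectName: user.projectName,
    costLimitUsd: user.costLimitUsd ?? defaults.costLimitUsd,
    // Shallow-merge so overriding one tool command keeps the others.
    toolCommands: { ...defaults.toolCommands, ...user.toolCommands },
  };
}
```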

Day 5: Memory Consolidation Job

Time Estimate: 6 hours Dependencies: Week 2 complete

Tasks:

  1. Create consolidation job (src/memory/consolidate.ts)

    • Run periodically (nightly or on-demand)
    • Extract patterns from episodic memories
    • Merge similar patterns
    • Promote high-frequency episodes to patterns
    • Decay confidence on unused memories
    • Prune low-confidence memories
    • Archive old memories
  2. Implement pattern extraction

    • Query recent episodic memories
    • Use LLM to extract patterns
    • Deduplicate patterns
    • Calculate frequency and success rate
  3. Implement memory pruning

    • Identify low-confidence memories (< 0.2)
    • Identify stale memories (not accessed in 90 days)
    • Archive or delete
  4. Implement "forge consolidate" command

    • Manually trigger consolidation
    • Display consolidation results
    • Show patterns extracted
    • Show memories pruned

Files to Create:

  • src/memory/consolidate.ts
  • src/cli/commands/consolidate.ts
  • tests/memory/consolidate.test.ts

Acceptance Criteria:

  • ✓ Consolidation job runs
  • ✓ Patterns extracted from episodes
  • ✓ Similar patterns merged
  • ✓ Low-confidence memories pruned
  • ✓ "forge consolidate" command works
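The decay-and-prune pass can be sketched with the thresholds the plan specifies (confidence < 0.2, not accessed in 90 days). The record shape and the 1%-per-idle-day decay rate are illustrative assumptions:

```typescript
// Decay confidence on unused memories, then prune anything below the
// confidence threshold or past the staleness window.
interface MemoryRecord {
  id: string;
  confidence: number;   // 0..1
  lastAccessed: number; // epoch ms
}

const DAY_MS = 24 * 60 * 60 * 1000;

function consolidate(
  memories: MemoryRecord[],
  now: number,
): { kept: MemoryRecord[]; pruned: string[] } {
  const kept: MemoryRecord[] = [];
  const pruned: string[] = [];
  for (const m of memories) {
    const staleDays = (now - m.lastAccessed) / DAY_MS;
    // Illustrative decay: lose 1% confidence per idle day.
    const decayed = m.confidence * Math.pow(0.99, Math.floor(staleDays));
    if (decayed < 0.2 || staleDays > 90) {
      pruned.push(m.id); // archive or delete
    } else {
      kept.push({ ...m, confidence: decayed });
    }
  }
  return { kept, pruned };
}
```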

Week 7 Deliverable:

  • Polished CLI with all core commands
  • Terminal UI with progress and findings display
  • Per-project configuration system
  • "forge init" bootstraps new projects
  • Memory consolidation job operational
  • User experience polished

Week 8: Harden + Document

Goal: Error recovery, cost tracking dashboard, real-world testing on actual projects, edge case handling, documentation. Production-ready v0.1.

Day 1: Error Recovery (Part 1)

Time Estimate: 5 hours Dependencies: Week 7 complete

Tasks:

  1. Implement retry logic

    • Retry transient errors (network, rate limit)
    • Exponential backoff
    • Max retries per operation
    • Log all retries
  2. Implement fallback strategies

    • If LLM fails: fallback to cached response or simpler prompt
    • If tool fails: fallback to alternative tool or manual mode
    • If agent fails: fallback to previous checkpoint
  3. Classify errors by recoverability

    • Transient (retry)
    • Permanent (fail)
    • User error (prompt for correction)
  4. Enhance error messages

    • Actionable error messages
    • Suggest recovery steps
    • Link to documentation

Files to Create:

  • src/core/retry.ts
  • src/core/fallback.ts
  • tests/core/retry.test.ts

Acceptance Criteria:

  • ✓ Transient errors retried
  • ✓ Fallback strategies work
  • ✓ Error messages are helpful
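The retry loop above — exponential backoff, a retry cap, and a recoverability check — can be sketched as a generic wrapper. The `isTransient` predicate is an assumption standing in for the real error classifier; `sleep` is injectable so tests don't actually wait:

```typescript
// Retry an async operation with exponential backoff, but only for
// errors the classifier deems transient.
async function withRetry<T>(
  op: () => Promise<T>,
  opts = { maxRetries: 3, baseDelayMs: 100 },
  isTransient: (e: unknown) => boolean = () => true,
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt <= opts.maxRetries; attempt++) {
    try {
      return await op();
    } catch (e) {
      lastErr = e;
      // Permanent errors and exhausted retries propagate immediately.
      if (!isTransient(e) || attempt === opts.maxRetries) throw e;
      // Backoff doubles each attempt: 100ms, 200ms, 400ms, …
      await sleep(opts.baseDelayMs * 2 ** attempt);
    }
  }
  throw lastErr;
}
```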

Day 2: Error Recovery (Part 2) + Checkpoint Resume

Time Estimate: 5 hours Dependencies: Day 1 complete

Tasks:

  1. Implement checkpoint resume

    • "forge resume" command
    • List resumable runs (failed or interrupted)
    • Resume from last checkpoint
    • Replay events up to checkpoint
  2. Implement error recovery workflow

    • When pipeline fails:
      • Create checkpoint
      • Log error
      • Notify user
      • Offer to resume or rollback
    • When user resumes:
      • Restore from checkpoint
      • Attempt to fix error
      • Continue pipeline
  3. Test recovery scenarios

    • Network failure during LLM call
    • Tool execution timeout
    • Circuit breaker trip
    • User interrupt (Ctrl+C)

Files to Create:

  • src/cli/commands/resume.ts
  • src/orchestrator/recovery.ts
  • tests/orchestrator/recovery.test.ts

Acceptance Criteria:

  • ✓ Failed runs can be resumed
  • ✓ Checkpoint restore works
  • ✓ Recovery workflow works
  • ✓ All recovery scenarios tested

Day 3: Cost Tracking Dashboard

Time Estimate: 5 hours Dependencies: Week 7 complete

Tasks:

  1. Create cost tracking system

    • Track costs per run
    • Track costs per phase
    • Track costs per agent
    • Track costs per day/week/month
    • Store in database
  2. Create cost dashboard

    • Display current spend
    • Display budget remaining
    • Display cost breakdown (by agent, by phase)
    • Display cost trends over time
    • Warn on approaching limits
  3. Implement "forge costs" command

    • Display cost dashboard
    • Filter by date range
    • Export cost data (CSV)
  4. Add cost estimates

    • Before pipeline execution, estimate cost
    • Display estimate to user
    • Prompt for confirmation if > threshold

Files to Create:

  • src/safety/cost-tracker.ts
  • src/cli/commands/costs.ts
  • src/cli/ui/cost-dashboard.ts
  • tests/safety/cost-tracker.test.ts

Acceptance Criteria:

  • ✓ Costs tracked accurately
  • ✓ Dashboard displays cost breakdown
  • ✓ "forge costs" command works
  • ✓ Cost estimates before execution
  • ✓ Warnings on approaching limits
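The per-run/per-agent breakdown and the "warn on approaching limits" check reduce to simple aggregation over cost rows. A sketch under the assumption of a flat `CostEntry` shape; the real tracker would persist these rows via Drizzle:

```typescript
// Aggregate cost entries by run or by agent, and warn once spend
// crosses a configurable fraction of the budget (default 80%).
interface CostEntry {
  runId: string;
  agent: string;
  usd: number;
}

function breakdown(entries: CostEntry[], by: "runId" | "agent"): Map<string, number> {
  const totals = new Map<string, number>();
  for (const e of entries) {
    const key = e[by];
    totals.set(key, (totals.get(key) ?? 0) + e.usd);
  }
  return totals;
}

function warnIfNearBudget(totalUsd: number, budgetUsd: number, ratio = 0.8): boolean {
  return totalUsd >= budgetUsd * ratio;
}
```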

Day 4: Real-World Testing + Bug Fixes

Time Estimate: 6 hours Dependencies: Week 7 complete, Days 1-3 complete

Tasks:

  1. Test on real projects

    • Select 2-3 real codebases
    • Run "forge review" on recent PRs
    • Run "forge test" on test suites
    • Run "forge run" with simple requirements
    • Document all issues
  2. Fix discovered bugs

    • Prioritize critical bugs
    • Fix issues from real-world testing
    • Add regression tests
  3. Performance profiling

    • Profile slow operations
    • Optimize database queries
    • Add caching where appropriate
    • Reduce LLM calls where possible
  4. Improve robustness

    • Add null checks
    • Add input validation
    • Handle edge cases
    • Improve error handling

Files to Create:

  • tests/real-world/ (test logs and results)
  • Various bug fix commits

Acceptance Criteria:

  • ✓ Tested on real projects
  • ✓ Critical bugs fixed
  • ✓ Performance acceptable
  • ✓ Regression tests added

Day 5: Edge Cases + Documentation + Release Prep

Time Estimate: 6 hours Dependencies: Week 8 Days 1-4 complete

Tasks:

  1. Handle edge cases

    • Empty codebase
    • Very large codebase
    • Binary files
    • Permission issues
    • Network offline
    • Out of disk space
    • Invalid configuration
  2. Write documentation

    • README.md (installation, quick start)
    • ARCHITECTURE.md (system overview)
    • CONFIGURATION.md (config options)
    • CLI.md (command reference)
    • DEVELOPMENT.md (contributing guide)
  3. Create examples

    • Example configurations
    • Example workflows
    • Tutorial: first feature with Forge
  4. Release preparation

    • Update version to v0.1.0
    • Write CHANGELOG.md
    • Create GitHub release
    • Tag release commit

Files to Create:

  • README.md
  • docs/ARCHITECTURE.md
  • docs/CONFIGURATION.md
  • docs/CLI.md
  • docs/DEVELOPMENT.md
  • CHANGELOG.md
  • examples/ (various examples)

Acceptance Criteria:

  • ✓ Edge cases handled
  • ✓ Documentation complete
  • ✓ Examples working
  • ✓ Release tagged

Week 8 Deliverable:

  • Error recovery working
  • Checkpoint resume functional
  • Cost tracking dashboard
  • Real-world testing complete
  • Bugs fixed
  • Edge cases handled
  • Documentation complete
  • Production-ready v0.1 released

Dependencies Between Weeks

Critical Path

The following must be completed in sequence (cannot parallelize):

Week 1 → Week 2 → Week 3
                    ↓
                 Week 4
                    ↓
                 Week 5
                    ↓
                 Week 6 → Week 7 → Week 8

Dependency Details

| Week | Must Complete Before | Why |
| --- | --- | --- |
| Week 1 | Week 2 | Memory and tools need core types, bus, config |
| Week 2 | Week 3 | Reviewer needs memory system and tools |
| Week 3 | Week 4 | Tester uses same patterns as Reviewer |
| Week 4 | Week 5 | Planner/Implementer need Reviewer and Tester to exist for bounces |
| Week 5 | Week 6 | Orchestrator coordinates existing agents |
| Week 6 | Week 7 | CLI commands need working orchestrator |
| Week 7 | Week 8 | Can't harden what doesn't exist yet |

Parallelization Opportunities

Within each week, some tasks can be parallelized:

Week 1:

  • Day 3 (Bus) and Day 4 (Config) can be parallelized

Week 2:

  • Day 1-2 (Memory) and Day 4 (Tools) can be partially parallelized

Week 3:

  • Day 2 (Static) and Day 3 (AI review) can be developed in parallel

Week 5:

  • Day 1-2 (Planner) and Day 3-4 (Implementer) can be parallelized by 2 developers

Week 7:

  • Day 1-2 (CLI commands) and Day 3 (UI) can be parallelized

Risk per Week

Week 1 Risks

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| Schema design wrong | Medium | High | Review against all agent requirements before Day 2 |
| Drizzle setup issues | Low | Medium | Use Drizzle docs, test migrations early |
| Type system too complex | Medium | Medium | Start simple, iterate; avoid premature abstraction |

Week 2 Risks

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| Similarity search too slow | Medium | Medium | Start with simple implementation, optimize later |
| LLM API rate limits | Medium | Medium | Implement exponential backoff, use caching |
| Memory bloat | Medium | High | Implement pruning early; test with large datasets |

Week 3 Risks

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| AI review too expensive | High | High | Use cheap model (Haiku) for reviews; cache aggressively |
| GitHub API rate limits | Medium | Medium | Implement backoff; use GraphQL for efficiency |
| False positive rate too high | High | Medium | Tune confidence thresholds; learn from dismissals |

Week 4 Risks

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| Test selection misses bugs | Medium | Critical | Always run baseline smoke tests; increase coverage for high-risk |
| Flaky test detection fails | Medium | Medium | Require 2+ inconsistent results to mark as flaky |
| Test generation low quality | High | Low | Human review required before merging generated tests |

Week 5 Risks

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| Code generation quality poor | High | Critical | Self-validation loop catches many issues; reviewer catches rest |
| Self-validation loop infinite | Medium | High | Max iterations (3); circuit breakers |
| Integration test takes too long | Medium | Medium | Use small test fixture; mock LLM for speed tests |

Week 6 Risks

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| State machine bugs | Medium | High | Thorough testing of all state transitions |
| Checkpoint serialization fails | Medium | High | Test with complex state; validate on restore |
| Phase bounce infinite loop | Medium | High | Max bounce count (3); circuit breaker |

Week 7 Risks

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| CLI UX confusing | Medium | Medium | User testing; iterate on feedback |
| Configuration too complex | Medium | Medium | Sensible defaults; "forge init" generates valid config |
| Memory consolidation too slow | Low | Medium | Run async; user can continue working |

Week 8 Risks

| Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- |
| Real-world testing uncovers major bugs | High | High | Budget extra time; prioritize critical bugs |
| Recovery logic flawed | Medium | Critical | Test all recovery scenarios; manual testing |
| Not enough time for polish | High | Medium | Cut scope if needed; v0.1 is MVP, not perfect |

Acceptance Criteria per Week

Week 1: Core Skeleton

  • Project initializes and compiles with bun run typecheck
  • Database schema created and migrations run
  • All core types exported from src/core/types.ts
  • Error taxonomy has 5+ specialized error classes
  • Event bus can emit, subscribe, persist, and replay events
  • Configuration loads from defaults and project config
  • Circuit breakers trip at configured thresholds
  • LLM abstraction can make chat completions with Anthropic
  • All unit tests pass with >80% coverage
  • At least one integration test with real LLM passes

Week 2: Memory + Tools

  • Memory store can create, read, update, delete memories
  • Similarity search returns relevant memories
  • Episodic memories stored and queried by trace ID
  • Patterns extracted from episodic memories
  • Tool registry can register and execute tools
  • Git tool can run basic git commands
  • Shell runner tool executes commands with timeout
  • Linter tool integrates with ESLint or Biome
  • Base agent loop executes with real LLM
  • Base agent can call tools and store memories
  • All unit tests pass with >80% coverage

Week 3: Reviewer Agent

  • Reviewer agent extends BaseAgent
  • Static analysis layer runs and parses lint output
  • AI review layer runs for medium+ risk changes
  • Risk scoring algorithm implemented
  • GitHub tool can fetch PR and post comments
  • Findings persist to database
  • Finding dismissals decrease pattern confidence
  • Phase bounce logic (review → implement → re-review) works
  • Review metrics tracked (false positive rate, duration)
  • Integration test: review a real PR

Week 4: Tester Agent

  • Tester agent extends BaseAgent
  • Risk-based test selection works
  • Test runner tool executes tests and parses output
  • Flaky test detection (retry logic) works
  • Failure analysis classifies failures correctly
  • LLM provides root cause analysis for failures
  • Test gap detection identifies uncovered code
  • Test generator creates valid tests for gaps
  • Phase bounce logic (test → implement → re-test) works
  • Test metrics tracked (pass rate, flaky rate, coverage)

Week 5: Planner + Implementer

  • Planner agent analyzes requirements and extracts stories
  • Planner agent designs architecture
  • Planner agent decomposes into ordered tasks
  • Planner agent assesses risk
  • Implementer agent generates code from task spec
  • Implementer agent generates tests alongside code
  • Self-validation loop runs typecheck, linter, tests
  • Self-validation loop auto-fixes issues (max 3 iterations)
  • End-to-end integration test: plan → implement → review → test
  • Phase bounces work across all agents

Week 6: Orchestrator

  • Pipeline state machine transitions through all phases
  • Checkpoint system creates checkpoints after each phase
  • Checkpoint restore works correctly
  • Context bus shares state across agents
  • Human approval gates pause pipeline and wait for approval
  • Circuit breakers halt pipeline when thresholds exceeded
  • "forge run" command executes complete pipeline
  • Full integration test with all phases passes

Week 7: CLI + Polish

  • CLI framework parses commands and displays help
  • "forge run" command works (from Week 6 refined)
  • "forge review" command works
  • "forge test" command works
  • "forge status" displays current pipeline status
  • "forge history" lists past runs
  • "forge init" generates project configuration
  • Terminal UI displays progress in real-time
  • Terminal UI displays findings nicely formatted
  • Memory consolidation job extracts patterns and prunes memories
  • "forge consolidate" command works

Week 8: Harden + Document

  • Retry logic handles transient errors
  • Fallback strategies implemented for LLM and tool failures
  • "forge resume" command resumes failed runs from checkpoint
  • Error recovery workflow tested (all scenarios)
  • Cost tracking accurate per run/phase/agent
  • "forge costs" command displays cost dashboard
  • Cost estimates shown before pipeline execution
  • Tested on 2-3 real projects
  • Critical bugs from real-world testing fixed
  • Performance profiled and optimized
  • Edge cases handled (empty codebase, large codebase, etc.)
  • Documentation complete (README, architecture, CLI reference, configuration, development guide)
  • Examples created and working
  • v0.1.0 tagged and released

Integration Test Plan

Week Boundary Integration Tests

End of Week 1:

  • Test: Initialize project, load config, emit event, persist to database, replay events
  • Test: LLM chat completion with real API
  • Expected: All pass, no errors

End of Week 2:

  • Test: Store memory, query by similarity, recall relevant memories
  • Test: Register tool, execute tool, capture result
  • Test: Base agent loop with LLM + tool execution
  • Expected: Agent executes tool based on LLM decision, stores memory

End of Week 3:

  • Test: Review a small code change (mock PR)
  • Test: Reviewer runs static analysis, AI review, posts findings
  • Test: Finding dismissed, confidence decreases
  • Test: Phase bounce (review → implement → re-review)
  • Expected: Complete review with findings, bounce works

End of Week 4:

  • Test: Tester selects tests based on code changes
  • Test: Tester executes tests, analyzes failures
  • Test: Tester generates tests for gaps
  • Test: Phase bounce (test → implement → re-test)
  • Expected: Test failures analyzed, gaps filled, bounce works

End of Week 5:

  • Test: Full flow: requirements → plan → implement → review → test
  • Test: Self-validation loop fixes typecheck/lint/test issues
  • Test: Phase bounces work across all agents
  • Expected: End-to-end flow produces valid, tested code

End of Week 6:

  • Test: "forge run" executes complete pipeline
  • Test: Checkpoints created and restored correctly
  • Test: Human approval gates pause and resume pipeline
  • Test: Circuit breakers halt pipeline when thresholds exceeded
  • Expected: Complete pipeline execution with all safety controls

End of Week 7:

  • Test: All CLI commands work ("forge run", "forge review", "forge test", "forge status", "forge history", "forge init", "forge consolidate")
  • Test: Terminal UI displays correctly
  • Test: Memory consolidation extracts patterns
  • Expected: Polished user experience, all commands functional

End of Week 8:

  • Test: Recovery from all failure scenarios (network error, tool timeout, circuit breaker, user interrupt)
  • Test: Cost tracking accurate across multiple runs
  • Test: Real-world project testing (run full pipeline on actual codebase)
  • Expected: Production-ready system, all edge cases handled

File Creation Checklist by Week

Week 1 Files

src/memory/schema.ts
src/core/types.ts
src/core/errors.ts
src/core/schemas.ts
src/core/bus.ts
src/core/config.ts
src/safety/breakers.ts
src/safety/budget.ts
src/tools/llm.ts
src/tools/llm-anthropic.ts
src/tools/prompts.ts
tests/core/types.test.ts
tests/core/errors.test.ts
tests/core/bus.test.ts
tests/core/config.test.ts
tests/safety/breakers.test.ts
tests/tools/llm.test.ts
drizzle.config.ts
drizzle/0000_initial.sql
scripts/seed.ts
forge.config.example.ts

Week 2 Files

src/memory/store.ts
src/memory/similarity.ts
src/memory/episodes.ts
src/memory/patterns.ts
src/tools/registry.ts
src/tools/git.ts
src/tools/runner.ts
src/tools/linter.ts
src/agents/base.ts
tests/memory/store.test.ts
tests/memory/similarity.test.ts
tests/memory/episodes.test.ts
tests/memory/patterns.test.ts
tests/tools/registry.test.ts
tests/tools/git.test.ts
tests/tools/runner.test.ts
tests/agents/base.integration.test.ts

Week 3 Files

src/agents/reviewer.ts
src/agents/reviewer-types.ts
src/agents/reviewer-static.ts
src/agents/reviewer-ai.ts
src/agents/reviewer-prompts.ts
src/agents/reviewer-github.ts
src/agents/reviewer-persistence.ts
src/agents/reviewer-learning.ts
src/tools/github.ts
tests/agents/reviewer.test.ts
tests/agents/reviewer-static.test.ts
tests/agents/reviewer-ai.test.ts
tests/agents/reviewer-learning.test.ts
tests/tools/github.test.ts

Week 4 Files

src/agents/tester.ts
src/agents/tester-selection.ts
src/agents/tester-analysis.ts
src/agents/tester-prompts.ts
src/agents/tester-gaps.ts
src/agents/tester-generator.ts
src/agents/tester-bounce.ts
src/agents/tester-metrics.ts
src/tools/test-runner.ts
tests/agents/tester.test.ts
tests/agents/tester-analysis.test.ts
tests/agents/tester-generator.test.ts
tests/agents/tester-bounce.test.ts
tests/tools/test-runner.test.ts

Week 5 Files

src/agents/planner.ts
src/agents/planner-analysis.ts
src/agents/planner-design.ts
src/agents/planner-tasks.ts
src/agents/planner-risk.ts
src/agents/planner-prompts.ts
src/agents/implementer.ts
src/agents/implementer-generation.ts
src/agents/implementer-validation.ts
src/agents/implementer-fix.ts
src/agents/implementer-prompts.ts
tests/agents/planner.test.ts
tests/agents/planner-design.test.ts
tests/agents/implementer.test.ts
tests/agents/implementer-validation.test.ts
tests/integration/full-flow.test.ts
tests/fixtures/sample-requirements.ts
tests/fixtures/sample-codebase/

Week 6 Files

src/orchestrator/pipeline.ts
src/orchestrator/phases.ts
src/orchestrator/checkpoint.ts
src/orchestrator/context.ts
src/safety/gates.ts
src/cli/gates.ts
src/cli/commands/run.ts
tests/orchestrator/pipeline.test.ts
tests/orchestrator/checkpoint.test.ts
tests/orchestrator/context.test.ts
tests/safety/gates.test.ts
tests/integration/pipeline-full.test.ts

Week 7 Files

src/cli/index.ts
src/cli/commands/run.ts (refine)
src/cli/commands/review.ts
src/cli/commands/test.ts
src/cli/commands/status.ts
src/cli/commands/history.ts
src/cli/commands/init.ts
src/cli/commands/consolidate.ts
src/cli/ui/spinner.ts
src/cli/ui/format.ts
src/cli/ui/table.ts
src/cli/ui/progress.ts
src/cli/ui/findings.ts
src/cli/ui/events.ts
src/cli/ui/dashboard.ts
src/core/config-loader.ts
src/memory/consolidate.ts
forge.config.template.ts
tests/core/config-loader.test.ts
tests/memory/consolidate.test.ts

Week 8 Files

src/core/retry.ts
src/core/fallback.ts
src/cli/commands/resume.ts
src/cli/commands/costs.ts
src/orchestrator/recovery.ts
src/safety/cost-tracker.ts
src/cli/ui/cost-dashboard.ts
tests/core/retry.test.ts
tests/orchestrator/recovery.test.ts
tests/safety/cost-tracker.test.ts
tests/real-world/
README.md
docs/ARCHITECTURE.md
docs/CONFIGURATION.md
docs/CLI.md
docs/DEVELOPMENT.md
CHANGELOG.md
examples/

Total Files: ~150+ (core implementation + tests + docs)


Daily Time Estimates Summary

| Week | Total Hours | Avg Hours/Day |
| --- | --- | --- |
| Week 1 | 25 hours | 5 hours |
| Week 2 | 27 hours | 5.4 hours |
| Week 3 | 26 hours | 5.2 hours |
| Week 4 | 26 hours | 5.2 hours |
| Week 5 | 28 hours | 5.6 hours |
| Week 6 | 26 hours | 5.2 hours |
| Week 7 | 25 hours | 5 hours |
| Week 8 | 27 hours | 5.4 hours |
| Total | 210 hours | 5.25 hours/day |

Note: Each day is scoped to 4-6 hours of focused work, leaving buffer time for unexpected issues, learning, and context switching.


Success Criteria for MVP (v0.1)

At the end of 8 weeks, the system must meet these criteria to be considered production-ready:

Functional Criteria

  • "forge run" executes a complete pipeline from requirements to tested code
  • All five agents (Planner, Implementer, Reviewer, Tester, Deployer stub) operational
  • Memory system stores and recalls learnings across runs
  • Phase bounces work (review → fix, test → fix)
  • Human approval gates work for high-risk decisions
  • Circuit breakers halt runaway execution
  • Checkpoint system allows resume from failure
  • CLI provides commands for all core workflows
  • Real-world testing on 2+ projects successful

Quality Criteria

  • Test coverage > 80% for core modules
  • All integration tests pass
  • No critical bugs in issue tracker
  • Performance acceptable (pipeline completes in <30 min for simple features)
  • Cost per run < $10 for typical feature

Documentation Criteria

  • README with installation and quick start
  • Architecture documentation
  • CLI reference
  • Configuration guide
  • At least 2 working examples

Safety Criteria

  • No unintended code execution
  • No secret leakage
  • Human approval required for production deploys
  • Cost limits enforced
  • Error recovery tested for all failure modes

Post-MVP Roadmap (Weeks 9-12+)

After v0.1 release, these are the next priorities:

Week 9-10: Deployer Agent (Full Implementation)

Currently just a stub. Implement:

  • Canary deployment strategy
  • Health check monitoring
  • Auto-rollback on failure
  • Feature flag integration

Week 11: Multi-Agent Parallelization

  • Implement agent swarm for parallel implementation
  • Coordinate multiple implementers working on different modules
  • Merge strategy for parallel changes

Week 12: Advanced Memory

  • Vector database integration (replace naive similarity search)
  • Knowledge graph for relationships between patterns
  • Cross-run learning improvements
  • Meta-reflection (reflect on reflection quality)

Beyond Week 12

  • Natural language requirements interface
  • Multi-repo intelligence
  • Self-improving prompts (GEP protocol full implementation)
  • Visual design-to-code pipeline
  • Integration with issue trackers (Jira, Linear)

Conclusion

This 8-week build plan provides a detailed, day-by-day roadmap for implementing the Forge agentic SDLC orchestration system. Each day's work is scoped to 4-6 hours of focused development, with clear deliverables and acceptance criteria.

Key Success Factors:

  1. Follow the order strictly - Dependencies are real; skipping ahead will cause rework
  2. Test continuously - Each day has tests; don't accumulate testing debt
  3. Commit frequently - Small, atomic commits make debugging easier
  4. Review the system design - Keep SYSTEM-DESIGN.md open; refer to it constantly
  5. Adapt as needed - This is a plan, not a contract; adjust based on learnings

By end of Week 8, you will have a production-ready v0.1 of Forge that can orchestrate the full SDLC with AI agents, learn from every execution, and provide a polished developer experience.

Now go build it.