Implementation Sub-Plan: Build Order (8-Week Detailed Breakdown)
Document: 13-build-order.md
Date: 2026-02-07
Status: Detailed implementation plan
Source: SYSTEM-DESIGN.md Section 13, 02-roadmap.md, 01-architecture.md
Overview
This document breaks down the 8-week build order from the system design into specific day-by-day tasks. Each day is scoped to 4-6 hours of focused work. The plan follows a "skeleton first, organs second" approach: build the foundational infrastructure early, then fill in specialized agent logic.
Core Philosophy:
- Week 1-2: Foundation (types, bus, memory, tools)
- Week 3-4: First vertical slice (Reviewer)
- Week 5: Second agent (Tester)
- Week 6: Complete agent set (Planner + Implementer)
- Week 7: Orchestration layer
- Week 8: Polish and harden
Week 1: Core Skeleton
Goal: Build the foundational types, event system, configuration, and database schema that everything else depends on. By end of week, the project compiles and has passing tests for all core modules.
Day 1: Project Initialization and Schema Design
Time Estimate: 5 hours Dependencies: None
Tasks:
- Initialize Bun project with TypeScript (`bun init`)
  - Configure `tsconfig.json` (strict mode, path aliases)
  - Add dependencies: `drizzle-orm`, `better-sqlite3`, `zod`, `ulid`
- Set up directory structure

```
forge/
├── src/
│   ├── core/
│   ├── safety/
│   ├── memory/
│   ├── tools/
│   ├── agents/
│   ├── orchestrator/
│   └── cli/
├── drizzle/
├── tests/
└── package.json
```

- Create Drizzle schema (`src/memory/schema.ts`)
  - Events table
  - Memories table
  - Patterns table
  - Checkpoints table
  - Runs table
  - Findings table
  - (Full schema from SYSTEM-DESIGN.md Section 5)
- Set up Drizzle migrations (`drizzle.config.ts`)
  - Generate initial migration
  - Create seed data script
Files to Create:
- `package.json`
- `tsconfig.json`
- `src/memory/schema.ts`
- `drizzle.config.ts`
- `drizzle/0000_initial.sql`
- `scripts/seed.ts`
Acceptance Criteria:
- ✓ `bun run typecheck` passes
- ✓ `bun run drizzle-kit migrate` creates database
- ✓ Seed script populates test data
Day 2: Core Types and Error Taxonomy
Time Estimate: 5 hours Dependencies: Day 1 complete
Tasks:
- Implement core abstractions (`src/core/types.ts`)
  - `Agent` interface
  - `ForgeEvent` interface
  - `Tool` interface
  - `Phase` interface
  - `Memory` interface
  - `Checkpoint` interface
  - All supporting types from SYSTEM-DESIGN.md Section 3
- Create error taxonomy (`src/core/errors.ts`)
  - Base `ForgeError` class
  - Specialized errors: `CircuitBreakerError`, `ConfigurationError`, `AgentError`, `ToolExecutionError`, `ValidationError`
  - Error classification helpers
  - Severity and recoverability enums
- Create Zod schemas for validation (`src/core/schemas.ts`)
  - Validation schemas for all core types
  - Runtime type guards
Files to Create:
- `src/core/types.ts`
- `src/core/errors.ts`
- `src/core/schemas.ts`
- `tests/core/types.test.ts`
- `tests/core/errors.test.ts`
Acceptance Criteria:
- ✓ All types export correctly
- ✓ Error classes have proper inheritance
- ✓ Zod schemas validate example data
- ✓ Test coverage > 80%
Day 3: Event Bus Implementation
Time Estimate: 5 hours Dependencies: Day 2 complete
Tasks:
- Implement in-memory event bus (`src/core/bus.ts`)
  - `EventBus` class with emit/on/off
  - Wildcard subscriptions
  - Event persistence to SQLite
  - Replay functionality
  - Subscription cleanup
- Add event tracing
  - Generate `traceId` (ulid)
  - Chain events with trace IDs
  - Event timestamps and ordering
- Create event helpers
  - Event builder pattern
  - Common event factories
  - Event filtering utilities
Files to Create:
- `src/core/bus.ts`
- `tests/core/bus.test.ts`
Acceptance Criteria:
- ✓ Can emit events
- ✓ Subscribers receive events
- ✓ Events persist to database
- ✓ Replay reconstructs event history
- ✓ Wildcard subscriptions work
- ✓ No memory leaks in subscriptions
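The emit/on/off surface with wildcard matching, replay, and subscription cleanup can be sketched as below. SQLite persistence is stubbed with an in-memory log, and the pattern syntax (`"review.*"`) is an assumption about how wildcards will look.

```typescript
// Minimal EventBus sketch; the in-memory `log` stands in for the SQLite
// events table, and the dotted wildcard syntax is illustrative.
type ForgeEvent = { type: string; traceId?: string; payload?: unknown };
type Handler = (e: ForgeEvent) => void;

class EventBus {
  private handlers = new Map<string, Set<Handler>>();
  private log: ForgeEvent[] = [];

  // Returns an unsubscribe function so callers can clean up easily.
  on(pattern: string, fn: Handler): () => void {
    if (!this.handlers.has(pattern)) this.handlers.set(pattern, new Set());
    this.handlers.get(pattern)!.add(fn);
    return () => this.off(pattern, fn);
  }

  off(pattern: string, fn: Handler): void {
    this.handlers.get(pattern)?.delete(fn);
  }

  emit(event: ForgeEvent): void {
    this.log.push(event); // persist before fan-out
    for (const [pattern, fns] of this.handlers) {
      if (this.matches(pattern, event.type)) fns.forEach((fn) => fn(event));
    }
  }

  // "review.*" matches "review.started"; "*" matches everything.
  private matches(pattern: string, type: string): boolean {
    if (pattern === "*" || pattern === type) return true;
    return pattern.endsWith(".*") && type.startsWith(pattern.slice(0, -1));
  }

  // Replay reconstructs event history in insertion order.
  replay(handler: Handler): void {
    this.log.forEach(handler);
  }
}
```

Returning the unsubscribe closure from `on` is one way to satisfy the "no memory leaks in subscriptions" criterion: callers hold a handle they must invoke, which tests can verify.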
Day 4: Configuration and Safety Defaults
Time Estimate: 4 hours Dependencies: Day 2 complete
Tasks:
- Implement configuration system (`src/core/config.ts`)
  - Default configuration from SYSTEM-DESIGN.md Section 8
  - Per-project config loading (`forge.config.ts`)
  - Environment variable overrides
  - Config validation with Zod
  - Config merge strategy
- Create safety control structures (`src/safety/breakers.ts`)
  - `CircuitBreaker` class
  - Iteration counter breaker
  - Cost tracker breaker
  - Time limit breaker
  - Error rate breaker
  - Breaker state machine (closed/open/half-open)
- Implement safety budget (`src/safety/budget.ts`)
  - Cost tracking per phase
  - Cost tracking per run
  - Budget exhaustion handling
Files to Create:
- `src/core/config.ts`
- `src/safety/breakers.ts`
- `src/safety/budget.ts`
- `forge.config.example.ts`
- `tests/core/config.test.ts`
- `tests/safety/breakers.test.ts`
Acceptance Criteria:
- ✓ Default config loads
- ✓ Can override with project config
- ✓ Circuit breakers trip at thresholds
- ✓ Budget tracking is accurate
Day 5: LLM Provider Abstraction
Time Estimate: 6 hours Dependencies: Day 2, Day 4 complete
Tasks:
- Create LLM provider interface (`src/tools/llm.ts`)
  - `LLMProvider` interface
  - `ChatRequest` / `ChatResponse` types
  - Token counting utilities
  - Cost calculation per model
- Implement Anthropic provider
  - Claude API integration
  - Streaming support
  - Tool use protocol
  - Error handling and retry
  - Rate limiting
- Add prompt management
  - System prompt templates
  - Message formatting
  - Token limit enforcement
- Create LLM cost tracker
  - Track tokens per call
  - Calculate USD cost
  - Integrate with budget system
Files to Create:
- `src/tools/llm.ts`
- `src/tools/llm-anthropic.ts`
- `src/tools/prompts.ts`
- `tests/tools/llm.test.ts` (with mocks)
Acceptance Criteria:
- ✓ Can make chat completions
- ✓ Tool use protocol works
- ✓ Cost tracking is accurate
- ✓ Retries on transient errors
- ✓ Integration test with real API passes
Week 1 Deliverable:
- Project skeleton that compiles
- Core types defined
- Event bus working
- Config and safety defaults in place
- LLM abstraction ready
- All unit tests passing
Week 2: Memory + Tools
Goal: Build the memory system (store, recall, consolidation) and essential tool layer (git, runner, linter). By end of week, an agent loop can execute against a real LLM, call tools, and store memories.
Day 1: Memory Store CRUD
Time Estimate: 5 hours Dependencies: Week 1 complete
Tasks:
- Implement memory store (`src/memory/store.ts`)
  - CRUD operations for memories
  - Query by type (episodic/semantic/procedural)
  - Query by confidence threshold
  - Query by recency
  - Update confidence scores
  - Archive low-confidence memories
- Add memory indexing
  - Tag-based search
  - Context string matching
  - Access tracking (lastAccessed, accessCount)
- Implement confidence decay
  - Weekly decay formula: `confidence -= 0.05`
  - Reinforcement on access: `confidence += 0.1`
  - Pruning threshold: `confidence < 0.2`
Files to Create:
- `src/memory/store.ts`
- `tests/memory/store.test.ts`
Acceptance Criteria:
- ✓ Can store and retrieve memories
- ✓ Confidence decay works correctly
- ✓ Can query by multiple criteria
- ✓ Access tracking increments
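The decay, reinforcement, and pruning rules above are simple enough to express as pure functions; the clamp to the [0, 1] range is an assumption, while the constants come directly from the plan.

```typescript
// Confidence lifecycle sketch using the plan's constants.
const DECAY = 0.05;      // subtracted per week
const REINFORCE = 0.1;   // added on each access
const PRUNE_BELOW = 0.2; // archive threshold

// Assumption: confidence is kept in [0, 1].
const clamp = (x: number) => Math.min(1, Math.max(0, x));

function decayWeekly(confidence: number, weeks: number): number {
  return clamp(confidence - DECAY * weeks);
}

function reinforce(confidence: number): number {
  return clamp(confidence + REINFORCE);
}

function shouldPrune(confidence: number): boolean {
  return confidence < PRUNE_BELOW;
}
```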
Day 2: Similarity Search and Embeddings
Time Estimate: 5 hours Dependencies: Day 1 complete
Tasks:
- Add embedding support to memory store
  - Store embeddings as `Float32Array` in blob column
  - Generate embeddings via LLM provider
  - Cosine similarity calculation
- Implement similarity search
  - Embed query string
  - Calculate similarity scores
  - Return top-k memories
  - Cache embeddings to reduce API calls
- Create memory recall function
  - `recall(context, type, limit)` → `Memory[]`
  - Combine similarity + recency + confidence
  - Ranking algorithm
Files to Create:
- `src/memory/similarity.ts`
- `tests/memory/similarity.test.ts`
Acceptance Criteria:
- ✓ Embeddings stored correctly
- ✓ Similarity search returns relevant memories
- ✓ Recall function uses multi-factor ranking
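A minimal sketch of the cosine similarity over `Float32Array` embeddings and the multi-factor recall ranking. The 0.6/0.2/0.2 weights are an assumption standing in for whatever the final ranking algorithm chooses.

```typescript
// Cosine similarity for embeddings stored as Float32Array blobs.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

interface ScoredMemory {
  similarity: number; // cosine vs. the query embedding
  recency: number;    // 1 = just accessed, 0 = very old
  confidence: number; // current confidence score
}

// Assumed weights: similarity dominates, recency and confidence break ties.
function recallScore(m: ScoredMemory): number {
  return 0.6 * m.similarity + 0.2 * m.recency + 0.2 * m.confidence;
}
```

`recall(context, type, limit)` would then embed the context string, score candidate memories with `recallScore`, and return the top `limit` results.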
Day 3: Episodic and Semantic Memory
Time Estimate: 5 hours Dependencies: Day 1-2 complete
Tasks:
- Implement episodic memory (`src/memory/episodes.ts`)
  - Store execution events as episodic memories
  - Link to trace ID
  - Query episodes by time range
  - Query episodes by outcome (success/failure)
- Implement semantic memory (`src/memory/patterns.ts`)
  - Extract patterns from episodic memories
  - Pattern frequency tracking
  - Success rate calculation
  - Pattern trigger matching
- Create pattern extraction logic
  - LLM-based pattern extraction from events
  - Pattern deduplication
  - Pattern confidence scoring
Files to Create:
- `src/memory/episodes.ts`
- `src/memory/patterns.ts`
- `tests/memory/episodes.test.ts`
- `tests/memory/patterns.test.ts`
Acceptance Criteria:
- ✓ Episodes stored with trace IDs
- ✓ Patterns extracted from episodes
- ✓ Pattern matching works
Day 4: Tool Registry and Core Tools
Time Estimate: 6 hours Dependencies: Week 1 complete
Tasks:
- Create tool registry (`src/tools/registry.ts`)
  - Tool registration
  - Tool discovery
  - Tool execution with validation
  - Tool sandboxing (execution timeout, resource limits)
- Implement git tool (`src/tools/git.ts`)
  - `git status`, `git diff`, `git add`, `git commit`, `git push`
  - Parse git output
- Implement shell runner tool (`src/tools/runner.ts`)
  - Execute shell commands
  - Capture stdout/stderr
  - Timeout enforcement
  - Working directory management
- Implement linter tool (`src/tools/linter.ts`)
  - Run ESLint or Biome
  - Parse linter output
  - Categorize findings
Files to Create:
- `src/tools/registry.ts`
- `src/tools/git.ts`
- `src/tools/runner.ts`
- `src/tools/linter.ts`
- `tests/tools/registry.test.ts`
- `tests/tools/git.test.ts`
- `tests/tools/runner.test.ts`
Acceptance Criteria:
- ✓ Tools can be registered
- ✓ Tool input/output validated
- ✓ Git operations work
- ✓ Shell commands execute with timeout
- ✓ Linter integration works
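The registry's "execution with validation" and timeout sandboxing can be sketched as follows. Zod validation is stood in for by a plain `validate` function, and the `Promise.race` timeout is one possible sandboxing mechanism, not necessarily the design's.

```typescript
// Tool registry sketch: register by name, validate input before
// execution, and race execution against a timeout.
interface Tool<I, O> {
  name: string;
  validate(input: unknown): I; // throws on bad input (Zod parse in practice)
  execute(input: I): Promise<O>;
}

class ToolRegistry {
  private tools = new Map<string, Tool<any, any>>();

  register(tool: Tool<any, any>): void {
    this.tools.set(tool.name, tool);
  }

  async execute(name: string, input: unknown, timeoutMs = 30_000): Promise<unknown> {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`unknown tool: ${name}`);
    const parsed = tool.validate(input); // reject bad input before running
    const timeout = new Promise<never>((_, reject) =>
      setTimeout(() => reject(new Error(`${name} timed out`)), timeoutMs)
    );
    return Promise.race([tool.execute(parsed), timeout]);
  }
}
```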
Day 5: Base Agent with Loop + First Integration Test
Time Estimate: 6 hours Dependencies: Week 2 Days 1-4 complete
Tasks:
- Implement base agent (`src/agents/base.ts`)
  - Agent loop: perceive → reason → act → learn
  - Integration with LLM provider
  - Integration with tool registry
  - Integration with memory store
  - Circuit breaker integration
  - Reflection after execution
- Create working memory structure
  - Message history
  - Tool results
  - Iteration tracking
- Implement reflection mechanism
  - Post-execution reflection prompt
  - Extract learnings
  - Store learnings in memory
- Write first integration test
  - Create simple agent that uses a tool
  - Execute against real LLM
  - Verify tool execution
  - Verify memory storage
Files to Create:
- `src/agents/base.ts`
- `tests/agents/base.integration.test.ts`
Acceptance Criteria:
- ✓ Agent loop executes
- ✓ LLM decides which tool to use
- ✓ Tool executes successfully
- ✓ Memory stores execution
- ✓ Circuit breakers work
- ✓ Reflection extracts learnings
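The perceive → reason → act → learn loop with an iteration breaker can be sketched with the LLM, tools, and memory abstracted behind callbacks. All names here (`reason`, `act`, `learn`, `runAgentLoop`) are illustrative, not the `BaseAgent` API.

```typescript
// Agent loop skeleton: the LLM decides (reason), a tool runs (act),
// results join working memory, and reflection runs at the end (learn).
interface AgentDeps {
  reason(history: string[]): Promise<{ tool?: string; done: boolean }>;
  act(tool: string): Promise<string>;
  learn(history: string[]): Promise<void>; // reflection + memory write
}

async function runAgentLoop(
  task: string,
  deps: AgentDeps,
  maxIterations = 10
): Promise<string[]> {
  const history: string[] = [task]; // working memory: messages + tool results
  for (let i = 0; i < maxIterations; i++) {
    const decision = await deps.reason(history);
    if (decision.done) {
      await deps.learn(history); // reflect before returning
      return history;
    }
    if (decision.tool) history.push(await deps.act(decision.tool));
  }
  throw new Error("iteration breaker tripped"); // circuit breaker integration
}
```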
Week 2 Deliverable:
- Memory system fully functional
- Tool layer operational
- Base agent can execute with real LLM
- First integration test passing
- Feedback loop foundation in place
Week 3: Reviewer Agent (First Vertical Slice)
Goal: Build a complete vertical slice: the Reviewer agent that reviews code changes, posts findings, learns from dismissals. This proves the full stack works end-to-end.
Day 1: Reviewer Agent Skeleton
Time Estimate: 5 hours Dependencies: Week 2 complete
Tasks:
- Create reviewer agent (`src/agents/reviewer.ts`)
  - Extend `BaseAgent`
  - Define agent type: `reviewer`
  - Three-layer review structure:
    - Layer 1: Static analysis
    - Layer 2: Security scan
    - Layer 3: AI review
- Define reviewer input/output types
  - `ReviewInput`: code changes, context
  - `ReviewOutput`: findings, risk score, decision
- Implement risk scoring algorithm
  - Complexity component
  - Change size component
  - Criticality component
  - Calculate overall risk level (low/medium/high/critical)
- Define review decision logic
  - Approve (low risk, no critical findings)
  - Request changes (fixable issues)
  - Require human (high risk or critical findings)
Files to Create:
- `src/agents/reviewer.ts`
- `src/agents/reviewer-types.ts`
- `tests/agents/reviewer.test.ts`
Acceptance Criteria:
- ✓ Reviewer extends BaseAgent
- ✓ Risk scoring formula implemented
- ✓ Decision logic implemented
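One way to combine the three risk components into the four-band risk level. The weights and band thresholds below are assumptions for illustration; the design's exact formula lives in SYSTEM-DESIGN.md.

```typescript
// Risk scoring sketch: weighted sum of normalized components, then
// banded into the low/medium/high/critical levels from the plan.
type RiskLevel = "low" | "medium" | "high" | "critical";

interface RiskInput {
  complexity: number;  // 0..1, e.g. normalized cyclomatic complexity
  changeSize: number;  // 0..1, e.g. lines changed / 500, capped at 1
  criticality: number; // 0..1, e.g. touches auth/payments => 1
}

// Assumed weights; criticality counts slightly more than the others.
function riskScore(r: RiskInput): number {
  return 0.3 * r.complexity + 0.3 * r.changeSize + 0.4 * r.criticality;
}

// Assumed band thresholds.
function riskLevel(score: number): RiskLevel {
  if (score >= 0.8) return "critical";
  if (score >= 0.6) return "high";
  if (score >= 0.3) return "medium";
  return "low";
}
```

The same score then drives decision logic: low maps toward approve, high and critical toward requiring a human.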
Day 2: Static Analysis Integration
Time Estimate: 5 hours Dependencies: Day 1 complete
Tasks:
- Create static analysis layer
  - Integrate linter tool (ESLint/Biome)
  - Run TypeScript strict check
  - Run formatting check (Prettier/Biome)
  - Parse and categorize findings
- Map linter output to `Finding` type
  - Extract file, line, column
  - Map severity
  - Categorize (style/correctness/etc.)
  - Determine fixability
- Deduplicate findings
  - Hash-based deduplication
  - Merge similar findings
Files to Create:
- `src/agents/reviewer-static.ts`
- `tests/agents/reviewer-static.test.ts`
Acceptance Criteria:
- ✓ Static analysis runs on code changes
- ✓ Findings extracted correctly
- ✓ Deduplication works
Day 3: AI Review Layer
Time Estimate: 6 hours Dependencies: Day 1-2 complete
Tasks:
- Create AI review layer
  - Construct review prompt with code diff
  - Include relevant patterns from memory
  - Request LLM to review for:
    - Logic correctness
    - Edge cases
    - Performance implications
    - Architecture fit
- Parse LLM review output
  - Extract findings from structured output
  - Map to `Finding` type
  - Confidence scoring
- Implement risk-based review depth
  - Low risk: skip AI review (static only)
  - Medium risk: AI review with fast model
  - High risk: AI review with strong model
  - Critical risk: AI review + human required
Files to Create:
- `src/agents/reviewer-ai.ts`
- `src/agents/reviewer-prompts.ts`
- `tests/agents/reviewer-ai.test.ts`
Acceptance Criteria:
- ✓ AI review runs for medium+ risk
- ✓ Findings extracted from LLM
- ✓ Risk-based depth selection works
Day 4: GitHub Integration
Time Estimate: 5 hours Dependencies: Day 1-3 complete
Tasks:
- Create GitHub tool (`src/tools/github.ts`)
  - Fetch PR details
  - Fetch PR diff
  - Post review comments
  - Update PR status check
  - Dismiss comments
  - React to comment events
- Integrate with reviewer agent
  - Fetch PR on review request
  - Post findings as PR comments
  - Update check status based on decision
  - Group findings by file
- Format review comments
  - Markdown formatting
  - Code snippets
  - Severity badges
  - Fix suggestions
Files to Create:
- `src/tools/github.ts`
- `src/agents/reviewer-github.ts`
- `tests/tools/github.test.ts` (mocked)
Acceptance Criteria:
- ✓ Can fetch PR details
- ✓ Can post review comments
- ✓ Comments formatted nicely
- ✓ Status checks update
Day 5: Risk Scoring, Finding Persistence, Phase Bounce
Time Estimate: 5 hours Dependencies: Day 1-4 complete
Tasks:
- Persist findings to database
  - Store findings in `findings` table
  - Link to run ID
  - Track dismissals
- Implement finding dismissal learning
  - Track when humans dismiss findings
  - Decrease confidence in related patterns
  - Store dismissal reason as learning
- Implement phase bounce logic (review → implement → re-review)
  - Detect "changes requested" decision
  - Package findings for implementer
  - Track bounce count
  - Max bounces: 3
- Create review metrics
  - False positive rate (dismissals / total)
  - Review duration
  - Findings per review
Files to Create:
- `src/agents/reviewer-persistence.ts`
- `src/agents/reviewer-learning.ts`
- `tests/agents/reviewer-learning.test.ts`
Acceptance Criteria:
- ✓ Findings persist to database
- ✓ Dismissals decrease pattern confidence
- ✓ Bounce logic works
- ✓ Metrics tracked
Week 3 Deliverable:
- Complete Reviewer agent operational
- Three-layer review working
- GitHub PR integration functional
- Findings persisted and learned from
- Phase bounce logic implemented
Week 4: Tester Agent
Goal: Build the Tester agent that selects tests, executes them, analyzes failures, and suggests fixes. Integrate with phase bounce (test → implement → retest).
Day 1: Tester Agent Skeleton + Test Selection
Time Estimate: 5 hours Dependencies: Week 3 complete
Tasks:
- Create tester agent (`src/agents/tester.ts`)
  - Extend `BaseAgent`
  - Define agent type: `tester`
- Define tester input/output types
  - `TestInput`: code changes, existing test suite
  - `TestOutput`: test results, failures, coverage, generated tests
- Implement risk-based test selection
  - Changed files → unit tests covering those files
  - Medium+ risk → integration tests
  - High+ risk → full suite
  - Parse test files to build dependency graph
- Create test selector algorithm
  - Static analysis to find test files for modules
  - Impact analysis for integration tests
  - Prioritize by risk and recency
Files to Create:
- `src/agents/tester.ts`
- `src/agents/tester-selection.ts`
- `tests/agents/tester.test.ts`
Acceptance Criteria:
- ✓ Tester extends BaseAgent
- ✓ Test selection based on changes works
- ✓ Risk-based selection works
Day 2: Test Runner Integration
Time Estimate: 5 hours Dependencies: Day 1 complete
Tasks:
- Create test runner tool (`src/tools/test-runner.ts`)
  - Execute Jest/Vitest via shell
  - Parse JSON output
  - Capture stdout/stderr
  - Timeout enforcement
- Parse test results
  - Extract passed/failed/skipped counts
  - Extract failure messages
  - Extract stack traces
  - Extract coverage data
- Implement retry logic for flaky tests
  - Retry failed tests once
  - Mark as flaky if inconsistent
  - Track flakiness over time
Files to Create:
- `src/tools/test-runner.ts`
- `tests/tools/test-runner.test.ts`
Acceptance Criteria:
- ✓ Can execute tests
- ✓ Results parsed correctly
- ✓ Retry logic works
- ✓ Flaky tests detected
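The single-retry flaky-test policy reduces to a small function: a failure that passes on retry is inconsistent, so it is marked flaky rather than failed. The `run` callback is an illustrative stand-in for one test execution.

```typescript
// Flaky-test policy sketch: retry a failed test once; an inconsistent
// result (fail then pass) means flaky, two failures mean failed.
type TestVerdict = "passed" | "failed" | "flaky";

async function runWithRetry(run: () => Promise<boolean>): Promise<TestVerdict> {
  if (await run()) return "passed";
  return (await run()) ? "flaky" : "failed";
}
```

Tracking flakiness over time would then increment a per-test counter whenever the verdict is `"flaky"`.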
Day 3: Failure Analysis
Time Estimate: 6 hours Dependencies: Day 1-2 complete
Tasks:
- Implement failure analyzer (`src/agents/tester-analysis.ts`)
  - Classify failure type:
    - Real bug
    - Flaky test
    - Environment issue
    - Outdated snapshot
  - Use LLM for root cause analysis
  - Extract relevant code context
- Create failure analysis prompt
  - Include test code
  - Include failure message
  - Include relevant source code
  - Request root cause and fix suggestion
- Confidence scoring for suggested fixes
  - High confidence (>0.7) → auto-fixable
  - Low confidence → escalate to human
Files to Create:
- `src/agents/tester-analysis.ts`
- `src/agents/tester-prompts.ts`
- `tests/agents/tester-analysis.test.ts`
Acceptance Criteria:
- ✓ Failures classified correctly
- ✓ LLM provides root cause analysis
- ✓ Fix suggestions generated
- ✓ Confidence scores assigned
Day 4: Test Gap Detection + Generation
Time Estimate: 5 hours Dependencies: Day 1-3 complete
Tasks:
- Implement test gap detection
  - Parse coverage report
  - Identify uncovered lines in changed files
  - Identify uncovered branches
  - Identify functions without tests
- Create test generator
  - Use LLM to generate test cases
  - Input: function signature, implementation, examples
  - Output: test code
  - Validate generated tests (parse, typecheck)
- Format generated tests
  - Match existing test style
  - Follow project conventions
  - Add descriptive test names
Files to Create:
- `src/agents/tester-gaps.ts`
- `src/agents/tester-generator.ts`
- `tests/agents/tester-generator.test.ts`
Acceptance Criteria:
- ✓ Uncovered code detected
- ✓ Tests generated for gaps
- ✓ Generated tests are valid
Day 5: Phase Bounce Integration + Metrics
Time Estimate: 5 hours Dependencies: Day 1-4 complete
Tasks:
- Implement test → implement → retest bounce
  - Detect test failures
  - Package failure analysis for implementer
  - Track bounce count
  - Max bounces: 2
- Create test metrics
  - Test duration
  - Pass rate over time
  - Flaky test rate
  - Coverage delta (before/after)
  - Test gap count
- Integrate with memory system
  - Store failure patterns
  - Store successful fixes
  - Recall similar failures from memory
Files to Create:
- `src/agents/tester-bounce.ts`
- `src/agents/tester-metrics.ts`
- `tests/agents/tester-bounce.test.ts`
Acceptance Criteria:
- ✓ Test failures bounce to implementer
- ✓ Bounce count tracked
- ✓ Metrics tracked
- ✓ Failure patterns stored
Week 4 Deliverable:
- Complete Tester agent operational
- Test selection and execution working
- Failure analysis with suggested fixes
- Test generation for gaps
- Phase bounce (test → fix → retest) working
Week 5: Planner + Implementer Agents
Goal: Build the Planner agent (requirements → plan) and Implementer agent (plan → code). Integrate self-validation loop (write → typecheck → test → fix). End-to-end flow works: plan → implement → review → test.
Day 1: Planner Agent - Requirements Analysis
Time Estimate: 5 hours Dependencies: Week 4 complete
Tasks:
- Create planner agent (`src/agents/planner.ts`)
  - Extend `BaseAgent`
  - Define agent type: `planner`
- Define planner input/output types
  - `PlanInput`: requirements (natural language)
  - `PlanOutput`: analysis, architecture, tasks, risk assessment
- Implement requirements analysis
  - Use LLM to parse requirements
  - Extract acceptance criteria
  - Identify constraints
  - Decompose into stories
- Create planning prompts
  - Requirements analysis prompt
  - Include codebase context (file structure)
  - Include relevant patterns from memory
Files to Create:
- `src/agents/planner.ts`
- `src/agents/planner-analysis.ts`
- `src/agents/planner-prompts.ts`
- `tests/agents/planner.test.ts`
Acceptance Criteria:
- ✓ Requirements parsed into structured format
- ✓ Stories extracted
- ✓ Constraints identified
Day 2: Planner Agent - Architecture Design + Task Decomposition
Time Estimate: 6 hours Dependencies: Day 1 complete
Tasks:
- Implement architecture design
  - Use LLM to design system architecture
  - Identify components
  - Define interfaces
  - Document architecture decisions
- Implement task decomposition
  - Break architecture into implementation tasks
  - Order tasks by dependencies
  - Estimate complexity per task
  - Assign priorities
- Implement risk assessment
  - Calculate risk score (same formula as reviewer)
  - Determine review depth requirement
  - Determine test coverage requirement
- Create implementation plan
  - Combine architecture + tasks + risk
  - Package for implementer
Files to Create:
- `src/agents/planner-design.ts`
- `src/agents/planner-tasks.ts`
- `src/agents/planner-risk.ts`
- `tests/agents/planner-design.test.ts`
Acceptance Criteria:
- ✓ Architecture designed
- ✓ Tasks decomposed and ordered
- ✓ Risk assessed
- ✓ Plan packaged for implementer
Day 3: Implementer Agent - Code Generation
Time Estimate: 6 hours Dependencies: Day 1-2 complete
Tasks:
- Create implementer agent (`src/agents/implementer.ts`)
  - Extend `BaseAgent`
  - Define agent type: `implementer`
- Define implementer input/output types
  - `ImplementInput`: plan, task
  - `ImplementOutput`: code changes, tests added, validated
- Implement code generation
  - Use LLM to generate code from task spec
  - Include architecture context
  - Include existing code patterns
  - Generate tests alongside code
- Create file operations
  - Read existing files
  - Write new files
  - Modify existing files (diff-based edits)
Files to Create:
- `src/agents/implementer.ts`
- `src/agents/implementer-generation.ts`
- `src/agents/implementer-prompts.ts`
- `tests/agents/implementer.test.ts`
Acceptance Criteria:
- ✓ Code generated from task spec
- ✓ Tests generated alongside code
- ✓ Files written to disk
Day 4: Implementer Agent - Self-Validation Loop
Time Estimate: 6 hours Dependencies: Day 1-3 complete
Tasks:
- Implement self-validation loop
  - After code generation:
    - Run typecheck
    - Run linter
    - Run affected tests
  - If issues found:
    - Analyze issues
    - Fix issues
    - Loop back (max 3 iterations)
- Create validation tools
  - Typecheck runner (`tsc --noEmit`)
  - Linter runner (reuse from Week 3)
  - Test runner (reuse from Week 4)
- Implement fix logic
  - Parse validation errors
  - Use LLM to suggest fixes
  - Apply fixes
  - Re-validate
- Track validation metrics
  - Iterations to valid code
  - Issues fixed per iteration
  - Final validation status
Files to Create:
- `src/agents/implementer-validation.ts`
- `src/agents/implementer-fix.ts`
- `tests/agents/implementer-validation.test.ts`
Acceptance Criteria:
- ✓ Typecheck runs after code generation
- ✓ Linter runs after code generation
- ✓ Tests run after code generation
- ✓ Issues auto-fixed (up to max iterations)
- ✓ Final code typechecks and passes tests
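The write → validate → fix loop with its max-3-iteration bound can be sketched with the validators and fixer injected as callbacks; the `Validation` shape and `selfValidate` name are illustrative, not the module's API.

```typescript
// Self-validation loop sketch: run every validator, and if any issues
// remain, ask the fixer to apply fixes and re-validate (max 3 rounds).
interface Validation {
  name: string;             // e.g. "typecheck", "lint", "test"
  run(): Promise<string[]>; // list of issues; empty = clean
}

async function selfValidate(
  validators: Validation[],
  fix: (issues: string[]) => Promise<void>,
  maxIterations = 3
): Promise<{ valid: boolean; iterations: number }> {
  for (let i = 1; i <= maxIterations; i++) {
    const issues: string[] = [];
    for (const v of validators) issues.push(...(await v.run()));
    if (issues.length === 0) return { valid: true, iterations: i };
    await fix(issues); // LLM-suggested fixes applied, then loop back
  }
  return { valid: false, iterations: maxIterations };
}
```

The returned `iterations` count feeds the "iterations to valid code" metric directly.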
Day 5: End-to-End Integration Test
Time Estimate: 5 hours Dependencies: Day 1-4 complete
Tasks:
- Create integration test for full flow
  - Input: simple feature requirement
  - Plan → Implement → Review → Test
  - Verify each phase output
  - Verify phase transitions
  - Verify bounce logic works
- Test phase bounces
  - Review finds issue → bounce to implement → fix → re-review
  - Test fails → bounce to implement → fix → re-test
- Verify memory integration
  - Learnings stored at each phase
  - Patterns extracted after run
  - Memories recalled in next run
- Create test fixtures
  - Sample requirements
  - Sample codebase
  - Expected outputs
Files to Create:
- `tests/integration/full-flow.test.ts`
- `tests/fixtures/sample-requirements.ts`
- `tests/fixtures/sample-codebase/`
Acceptance Criteria:
- ✓ Full flow executes without errors
- ✓ Phase bounces work
- ✓ Memories stored and recalled
- ✓ Final code is valid and tested
Week 5 Deliverable:
- Planner agent functional (requirements → plan)
- Implementer agent functional (plan → code)
- Self-validation loop working
- End-to-end flow tested (plan → implement → review → test)
- Phase bounces working across all agents
Week 6: Orchestrator
Goal: Build the orchestration layer that coordinates agents through the pipeline. State machine, checkpoints, shared context, human gates. "forge run" executes the complete pipeline.
Day 1: Pipeline State Machine + Phase Sequencing
Time Estimate: 6 hours Dependencies: Week 5 complete
Tasks:
- Create pipeline orchestrator (`src/orchestrator/pipeline.ts`)
  - State machine: idle → planning → implementing → reviewing → testing → deploying → monitoring → completed
  - Phase transitions
  - Phase input/output validation
  - Phase execution logic
- Define phase configuration
  - Each phase has:
    - Agent assignment
    - Guards (pre-conditions)
    - Gates (human approval checkpoints)
    - Breakers (circuit breakers)
    - Next phase
- Implement phase execution
  - Load phase config
  - Check guards
  - Execute agent
  - Check breakers
  - Check gates
  - Transition to next phase
- Handle phase failures
  - Capture errors
  - Determine recoverability
  - Retry logic (if transient)
  - Checkpoint before retry
Files to Create:
- `src/orchestrator/pipeline.ts`
- `src/orchestrator/phases.ts`
- `tests/orchestrator/pipeline.test.ts`
Acceptance Criteria:
- ✓ State machine transitions correctly
- ✓ Phases execute in sequence
- ✓ Phase failures handled
- ✓ Guards and breakers enforced
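The phase sequence above can be sketched as a transition table, with the review and test bounces modeled as explicit extra edges. The table shape is illustrative; the real phase config also carries guards, gates, and breakers.

```typescript
// Pipeline state machine sketch: legal next-phases per phase, with
// bounce edges back to "implementing" from reviewing and testing.
type PhaseName =
  | "idle" | "planning" | "implementing" | "reviewing"
  | "testing" | "deploying" | "monitoring" | "completed";

const TRANSITIONS: Record<PhaseName, PhaseName[]> = {
  idle: ["planning"],
  planning: ["implementing"],
  implementing: ["reviewing"],
  reviewing: ["testing", "implementing"], // bounce on "changes requested"
  testing: ["deploying", "implementing"], // bounce on test failures
  deploying: ["monitoring"],
  monitoring: ["completed"],
  completed: [],
};

function transition(from: PhaseName, to: PhaseName): PhaseName {
  if (!TRANSITIONS[from].includes(to)) {
    throw new Error(`illegal transition: ${from} -> ${to}`);
  }
  return to;
}
```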
Day 2: Checkpoint System
Time Estimate: 5 hours Dependencies: Day 1 complete
Tasks:
- Create checkpoint system (`src/orchestrator/checkpoint.ts`)
  - Create checkpoint after each phase
  - Store phase state to database
  - List checkpoints for a run
  - Restore from checkpoint
  - Compare checkpoints (diff)
- Define checkpoint schema
  - Checkpoint ID (ulid)
  - Trace ID (links to run)
  - Phase name
  - State (JSON serialized)
  - Timestamp
- Implement state serialization
  - Serialize complex types (Dates, Maps, etc.)
  - Deserialize on restore
  - Validate restored state
- Integrate with pipeline
  - Auto-checkpoint after each phase
  - Resume from last checkpoint on failure
  - Clean up old checkpoints (retention policy)
Files to Create:
- `src/orchestrator/checkpoint.ts`
- `tests/orchestrator/checkpoint.test.ts`
Acceptance Criteria:
- ✓ Checkpoints created after each phase
- ✓ State serialized correctly
- ✓ Can restore from checkpoint
- ✓ Pipeline resumes from checkpoint
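Dates and Maps are the tricky part of "State (JSON serialized)": neither round-trips through plain `JSON.stringify`. One approach is to tag them on the way out and revive them on the way in. The `__type` tagging convention here is an assumption for illustration.

```typescript
// Checkpoint state serialization sketch. Note: in a stringify replacer
// a Date has already been converted to an ISO string, so the raw value
// must be read off `this[key]` to detect it.
function serializeState(state: unknown): string {
  return JSON.stringify(state, function (this: any, key: string, value: any) {
    const raw = this[key]; // pre-toJSON value
    if (raw instanceof Date) return { __type: "Date", iso: raw.toISOString() };
    if (value instanceof Map) return { __type: "Map", entries: [...value] };
    return value;
  });
}

function deserializeState(json: string): unknown {
  return JSON.parse(json, (_key, value: any) => {
    if (value && value.__type === "Date") return new Date(value.iso);
    if (value && value.__type === "Map") return new Map(value.entries);
    return value;
  });
}
```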
Day 3: Context Bus (Shared State)
Time Estimate: 5 hours Dependencies: Day 1-2 complete
Tasks:
- Create context bus (`src/orchestrator/context.ts`)
  - Shared state store (in-memory + persistent)
  - Key-value interface
  - Get/set/update/subscribe
  - Snapshot and restore
- Implement context scoping
  - Global context (across all runs)
  - Run context (scoped to trace ID)
  - Phase context (scoped to phase)
- Integrate with agents
  - Agents read from context
  - Agents write to context
  - Context persisted at checkpoints
- Create context helpers
  - Context builders
  - Type-safe context accessors
  - Context validation
Files to Create:
- `src/orchestrator/context.ts`
- `tests/orchestrator/context.test.ts`
Acceptance Criteria:
- ✓ Context shared across agents
- ✓ Context persisted at checkpoints
- ✓ Context scoping works
- ✓ Type safety maintained
Day 4: Human Approval Gates
Time Estimate: 5 hours Dependencies: Day 1-3 complete
Tasks:
- Create gate system (`src/safety/gates.ts`)
  - Define gate types:
    - Architecture approval (high-risk plans)
    - Production deploy (always)
    - Security findings (critical severity)
    - Cost overrun (approaching budget)
  - Gate condition evaluation
  - Gate timeout handling
- Implement gate workflow
  - Pause pipeline at gate
  - Notify human (CLI prompt, webhook, etc.)
  - Wait for approval/rejection
  - Resume pipeline on approval
  - Halt pipeline on rejection
  - Timeout fallback (default: reject)
- Create CLI prompts for gates
  - Display gate context
  - Display relevant information
  - Yes/no/defer prompt
  - Optional feedback input
- Integrate with pipeline
  - Check gates after phase execution
  - Store gate decisions as events
Files to Create:
- `src/safety/gates.ts`
- `src/cli/gates.ts`
- `tests/safety/gates.test.ts`
Acceptance Criteria:
- ✓ Gates evaluated correctly
- ✓ Pipeline pauses at gate
- ✓ CLI prompts display
- ✓ Approval resumes pipeline
- ✓ Rejection halts pipeline
- ✓ Timeout handled
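The wait-for-human-with-timeout workflow reduces to racing the human's decision against a timer that resolves to the default ("reject"). The `askHuman` callback is an illustrative stand-in for the CLI prompt or webhook.

```typescript
// Gate wait sketch: whichever settles first wins; an unanswered gate
// falls back to "rejected" per the default above.
type GateDecision = "approved" | "rejected";

async function awaitGate(
  askHuman: () => Promise<GateDecision>,
  timeoutMs: number
): Promise<GateDecision> {
  let timer: ReturnType<typeof setTimeout>;
  const timeout = new Promise<GateDecision>((resolve) => {
    timer = setTimeout(() => resolve("rejected"), timeoutMs); // default: reject
  });
  const decision = await Promise.race([askHuman(), timeout]);
  clearTimeout(timer!); // don't keep the process alive after a decision
  return decision;
}
```

Defaulting to reject on timeout is the safe direction: an unattended gate halts the pipeline rather than letting a deploy or high-risk plan through.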
Day 5: Full Pipeline Integration Test + "forge run"
Time Estimate: 5 hours Dependencies: Day 1-4 complete
Tasks:
- Wire up complete pipeline
  - Integrate all agents
  - Integrate all phases
  - Integrate checkpoints
  - Integrate context bus
  - Integrate gates
  - Integrate breakers
- Create "forge run" CLI command (`src/cli/commands/run.ts`)
  - Parse requirements from CLI
  - Initialize pipeline
  - Execute pipeline
  - Display progress
  - Handle interruptions (Ctrl+C)
  - Display final status
- Create full integration test
  - Run complete pipeline with real LLM
  - Verify all phases execute
  - Verify checkpoints created
  - Verify gates triggered
  - Verify final output
Files to Create:
- `src/cli/commands/run.ts`
- `tests/integration/pipeline-full.test.ts`
Acceptance Criteria:
- ✓ "forge run" command works
- ✓ Complete pipeline executes
- ✓ All agents coordinate correctly
- ✓ Checkpoints work
- ✓ Gates work
- ✓ Breakers work
Week 6 Deliverable:
- Pipeline orchestrator functional
- State machine working
- Checkpoint system operational
- Context bus sharing state
- Human approval gates working
- "forge run" executes complete pipeline
- Full integration test passing
Week 7: CLI + Polish
Goal: Build a polished CLI with commands for running, reviewing, testing, status checks, and history. Terminal UI for progress and findings. Per-project configuration. Memory consolidation job.
Day 1: CLI Framework + Core Commands (Part 1)
Time Estimate: 5 hours Dependencies: Week 6 complete
Tasks:
- Set up CLI framework
  - Use `commander` or similar for CLI parsing
  - Define command structure
  - Add help text
  - Add version flag
- Implement "forge run" (refine from Week 6 Day 5)
  - Accept requirements as argument or file
  - Accept config overrides via flags
  - Display progress indicators
  - Handle errors gracefully
- Implement "forge review"
  - Manually trigger review of PR or local changes
  - Display review findings
  - Option to auto-fix findings
- Create CLI utilities
  - Spinner/progress indicators
  - Error formatting
  - Success messages
Files to Create:
- `src/cli/index.ts`
- `src/cli/commands/run.ts` (refine)
- `src/cli/commands/review.ts`
- `src/cli/ui/spinner.ts`
- `src/cli/ui/format.ts`
Acceptance Criteria:
- ✓ CLI parses commands
- ✓ Help text displays
- ✓ "forge run" works
- ✓ "forge review" works
- ✓ Progress indicators display
Day 2: CLI Core Commands (Part 2)
Time Estimate: 5 hours Dependencies: Day 1 complete
Tasks:
- Implement "forge test"
  - Manually trigger test execution
  - Display test results
  - Display failure analysis
  - Option to generate missing tests
- Implement "forge status"
  - Display current pipeline status
  - Display recent runs
  - Display cost usage
  - Display agent health
- Implement "forge history"
  - List past runs
  - Filter by status/date
  - Display run details
  - Replay events from a run
- Create status display
  - Table formatting
  - Color coding by status
  - Duration formatting
  - Cost formatting
Files to Create:
- `src/cli/commands/test.ts`
- `src/cli/commands/status.ts`
- `src/cli/commands/history.ts`
- `src/cli/ui/table.ts`
Acceptance Criteria:
- ✓ "forge test" works
- ✓ "forge status" displays current state
- ✓ "forge history" lists runs
- ✓ Tables formatted nicely
Day 3: Terminal UI (Progress + Findings Display)
Time Estimate: 5 hours Dependencies: Day 1-2 complete
Tasks:
-
Create progress display for pipeline
- Show current phase
- Show phase progress (iteration count, time elapsed)
- Show cost so far
- Update in real-time
-
Create findings display
- Group findings by file
- Color code by severity
- Show code snippets
- Show fix suggestions
- Collapsible sections
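The grouping step can be sketched as follows; the `Finding` shape and severity names are assumptions for illustration, not the real schema:

```typescript
// Sketch: group findings by file, most severe first within each file.
type Severity = "critical" | "high" | "medium" | "low";

interface Finding {
  file: string;
  line: number;
  severity: Severity;
  message: string;
}

const severityRank: Record<Severity, number> = { critical: 0, high: 1, medium: 2, low: 3 };

function groupFindings(findings: Finding[]): Map<string, Finding[]> {
  const groups = new Map<string, Finding[]>();
  for (const f of findings) {
    const bucket = groups.get(f.file) ?? [];
    bucket.push(f);
    groups.set(f.file, bucket);
  }
  // Within each file, surface the most severe findings first.
  for (const bucket of groups.values()) {
    bucket.sort((a, b) => severityRank[a.severity] - severityRank[b.severity]);
  }
  return groups;
}
```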
-
Create event stream display
- Show events as they occur
- Filter by event type
- Expandable details
-
Create dashboard view
- Overview of system state
- Recent activity
- Cost dashboard
- Agent status
Files to Create:
- `src/cli/ui/progress.ts`
- `src/cli/ui/findings.ts`
- `src/cli/ui/events.ts`
- `src/cli/ui/dashboard.ts`
Acceptance Criteria:
- ✓ Progress updates in real-time
- ✓ Findings display is readable
- ✓ Events stream displays
- ✓ Dashboard shows system state
Day 4: Per-Project Configuration
Time Estimate: 4 hours Dependencies: Week 6 complete
Tasks:
-
Create configuration loader
- Load `forge.config.ts` from project root
- Merge with defaults
- Validate configuration
-
Define configuration schema
- Project metadata (name, language)
- LLM provider settings
- Tool commands (test, lint, build, typecheck)
- Safety settings (cost limits, iteration limits)
- GitHub integration settings
- Memory settings
-
Implement "forge init" command
- Generate `forge.config.ts` template
- Interactive prompts for key settings
- Detect project type (package.json, etc.)
-
Create configuration validation
- Validate all required fields
- Validate types
- Provide helpful error messages
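The merge-and-validate flow can be sketched like this. The `ForgeConfig` fields shown are placeholders, not the real schema (which will likely be a zod schema per the Week 1 dependencies):

```typescript
// Sketch of config loading: merge user config over defaults, then validate,
// collecting every error so the user sees all problems at once.
interface ForgeConfig {
  projectName: string;     // assumed fields for illustration only
  language: string;
  maxIterations: number;
  costLimitUsd: number;
}

const defaults: ForgeConfig = {
  projectName: "",
  language: "typescript",
  maxIterations: 3,
  costLimitUsd: 10,
};

function loadConfig(userConfig: Partial<ForgeConfig>): ForgeConfig {
  const merged: ForgeConfig = { ...defaults, ...userConfig };
  const errors: string[] = [];
  if (!merged.projectName) errors.push("projectName is required");
  if (merged.maxIterations < 1) errors.push("maxIterations must be >= 1");
  if (merged.costLimitUsd <= 0) errors.push("costLimitUsd must be positive");
  if (errors.length > 0) {
    throw new Error(`Invalid forge.config.ts:\n  - ${errors.join("\n  - ")}`);
  }
  return merged;
}
```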
Files to Create:
- `src/cli/commands/init.ts`
- `src/core/config-loader.ts`
- `forge.config.template.ts`
- `tests/core/config-loader.test.ts`
Acceptance Criteria:
- ✓ "forge init" creates config file
- ✓ Config loads from project root
- ✓ Config merges with defaults
- ✓ Config validation works
Day 5: Memory Consolidation Job
Time Estimate: 6 hours Dependencies: Week 2 complete
Tasks:
-
Create consolidation job (`src/memory/consolidate.ts`)
- Run periodically (nightly or on-demand)
- Extract patterns from episodic memories
- Merge similar patterns
- Promote high-frequency episodes to patterns
- Decay confidence on unused memories
- Prune low-confidence memories
- Archive old memories
-
Implement pattern extraction
- Query recent episodic memories
- Use LLM to extract patterns
- Deduplicate patterns
- Calculate frequency and success rate
-
Implement memory pruning
- Identify low-confidence memories (< 0.2)
- Identify stale memories (not accessed in 90 days)
- Archive or delete
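The pruning rules above reduce to a simple predicate. The thresholds (confidence < 0.2, 90-day staleness) come from this plan; the `Memory` shape is an assumption:

```typescript
// Sketch of the memory pruning predicate.
interface Memory {
  id: string;
  confidence: number;     // 0..1
  lastAccessedAt: number; // epoch ms
}

const NINETY_DAYS_MS = 90 * 24 * 60 * 60 * 1000;

function shouldPrune(memory: Memory, now: number): boolean {
  const lowConfidence = memory.confidence < 0.2;
  const stale = now - memory.lastAccessedAt > NINETY_DAYS_MS;
  return lowConfidence || stale;
}
```

The consolidation job would run this over all memories and archive (or delete) the matches.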
-
Implement "forge consolidate" command
- Manually trigger consolidation
- Display consolidation results
- Show patterns extracted
- Show memories pruned
Files to Create:
- `src/memory/consolidate.ts`
- `src/cli/commands/consolidate.ts`
- `tests/memory/consolidate.test.ts`
Acceptance Criteria:
- ✓ Consolidation job runs
- ✓ Patterns extracted from episodes
- ✓ Similar patterns merged
- ✓ Low-confidence memories pruned
- ✓ "forge consolidate" command works
Week 7 Deliverable:
- Polished CLI with all core commands
- Terminal UI with progress and findings display
- Per-project configuration system
- "forge init" bootstraps new projects
- Memory consolidation job operational
- User experience polished
Week 8: Harden + Document
Goal: Error recovery, cost tracking dashboard, real-world testing on actual projects, edge case handling, documentation. Production-ready v0.1.
Day 1: Error Recovery (Part 1)
Time Estimate: 5 hours Dependencies: Week 7 complete
Tasks:
-
Implement retry logic
- Retry transient errors (network, rate limit)
- Exponential backoff
- Max retries per operation
- Log all retries
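A sketch of the retry policy, with the backoff schedule factored out as a pure function and the sleep injectable so tests don't wait (function names are assumptions, not the final `src/core/retry.ts` API):

```typescript
// Exponential backoff with a cap: base, 2*base, 4*base, ... up to capMs.
function backoffSchedule(maxRetries: number, baseMs: number, capMs: number): number[] {
  return Array.from({ length: maxRetries }, (_, i) => Math.min(baseMs * 2 ** i, capMs));
}

async function retry<T>(
  op: () => Promise<T>,
  isTransient: (err: unknown) => boolean,
  delays: number[],
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((r) => setTimeout(r, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Permanent errors and exhausted retry budgets fail immediately.
      if (!isTransient(err) || attempt >= delays.length) throw err;
      await sleep(delays[attempt]);
    }
  }
}
```

Logging each retry (attempt number, delay, error) would hook into the event bus inside the catch branch.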
-
Implement fallback strategies
- If LLM fails: fallback to cached response or simpler prompt
- If tool fails: fallback to alternative tool or manual mode
- If agent fails: fallback to previous checkpoint
-
Classify errors by recoverability
- Transient (retry)
- Permanent (fail)
- User error (prompt for correction)
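Classification can dispatch on the error taxonomy from Week 1. The class names below are placeholders; the real hierarchy lives in `src/core/errors.ts`:

```typescript
// Sketch of error classification by recoverability.
type Recoverability = "transient" | "permanent" | "user-error";

class RateLimitError extends Error {}
class NetworkError extends Error {}
class ConfigError extends Error {}

function classify(err: unknown): Recoverability {
  if (err instanceof RateLimitError || err instanceof NetworkError) return "transient";
  if (err instanceof ConfigError) return "user-error";
  // Anything unrecognized is treated as permanent: fail fast rather than retry blindly.
  return "permanent";
}
```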
-
Enhance error messages
- Actionable error messages
- Suggest recovery steps
- Link to documentation
Files to Create:
- `src/core/retry.ts`
- `src/core/fallback.ts`
- `tests/core/retry.test.ts`
Acceptance Criteria:
- ✓ Transient errors retried
- ✓ Fallback strategies work
- ✓ Error messages are helpful
Day 2: Error Recovery (Part 2) + Checkpoint Resume
Time Estimate: 5 hours Dependencies: Day 1 complete
Tasks:
-
Implement checkpoint resume
- "forge resume" command
- List resumable runs (failed or interrupted)
- Resume from last checkpoint
- Replay events up to checkpoint
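The replay step can be sketched as filtering the event log by the checkpoint's position and re-applying in order. Event and checkpoint shapes here are assumptions for illustration:

```typescript
// Sketch: select the events to re-apply when resuming from a checkpoint.
interface PipelineEvent {
  seq: number;  // monotonically increasing sequence number from the event log
  type: string;
}

interface Checkpoint {
  lastSeq: number; // last event sequence number captured by this checkpoint
  phase: string;
}

function eventsToReplay(events: PipelineEvent[], checkpoint: Checkpoint): PipelineEvent[] {
  return events
    .filter((e) => e.seq <= checkpoint.lastSeq)
    .sort((a, b) => a.seq - b.seq);
}
```

The orchestrator would then rebuild in-memory state by folding these events, and continue the pipeline from `checkpoint.phase`.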
-
Implement error recovery workflow
- When pipeline fails:
- Create checkpoint
- Log error
- Notify user
- Offer to resume or rollback
- When user resumes:
- Restore from checkpoint
- Attempt to fix error
- Continue pipeline
-
Test recovery scenarios
- Network failure during LLM call
- Tool execution timeout
- Circuit breaker trip
- User interrupt (Ctrl+C)
Files to Create:
- `src/cli/commands/resume.ts`
- `src/orchestrator/recovery.ts`
- `tests/orchestrator/recovery.test.ts`
Acceptance Criteria:
- ✓ Failed runs can be resumed
- ✓ Checkpoint restore works
- ✓ Recovery workflow works
- ✓ All recovery scenarios tested
Day 3: Cost Tracking Dashboard
Time Estimate: 5 hours Dependencies: Week 7 complete
Tasks:
-
Create cost tracking system
- Track costs per run
- Track costs per phase
- Track costs per agent
- Track costs per day/week/month
- Store in database
-
Create cost dashboard
- Display current spend
- Display budget remaining
- Display cost breakdown (by agent, by phase)
- Display cost trends over time
- Warn on approaching limits
-
Implement "forge costs" command
- Display cost dashboard
- Filter by date range
- Export cost data (CSV)
-
Add cost estimates
- Before pipeline execution, estimate cost
- Display estimate to user
- Prompt for confirmation if > threshold
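A token-based estimate is one way to implement this. The per-token rates below are placeholder numbers, not real provider pricing, and the phase-estimate shape is an assumption:

```typescript
// Sketch of pre-run cost estimation with a confirmation threshold.
interface PhaseEstimate {
  phase: string;
  inputTokens: number;
  outputTokens: number;
}

const INPUT_RATE_PER_MTOK = 3;   // assumed $/million input tokens (placeholder)
const OUTPUT_RATE_PER_MTOK = 15; // assumed $/million output tokens (placeholder)

function estimateCostUsd(phases: PhaseEstimate[]): number {
  return phases.reduce(
    (sum, p) =>
      sum +
      (p.inputTokens / 1_000_000) * INPUT_RATE_PER_MTOK +
      (p.outputTokens / 1_000_000) * OUTPUT_RATE_PER_MTOK,
    0,
  );
}

// The CLI prompts for confirmation when the estimate exceeds the user's threshold.
function needsConfirmation(estimateUsd: number, thresholdUsd: number): boolean {
  return estimateUsd > thresholdUsd;
}
```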
Files to Create:
- `src/safety/cost-tracker.ts`
- `src/cli/commands/costs.ts`
- `src/cli/ui/cost-dashboard.ts`
- `tests/safety/cost-tracker.test.ts`
Acceptance Criteria:
- ✓ Costs tracked accurately
- ✓ Dashboard displays cost breakdown
- ✓ "forge costs" command works
- ✓ Cost estimates before execution
- ✓ Warnings on approaching limits
Day 4: Real-World Testing + Bug Fixes
Time Estimate: 6 hours Dependencies: Week 7 complete, Days 1-3 complete
Tasks:
-
Test on real projects
- Select 2-3 real codebases
- Run "forge review" on recent PRs
- Run "forge test" on test suites
- Run "forge run" with simple requirements
- Document all issues
-
Fix discovered bugs
- Prioritize critical bugs
- Fix issues from real-world testing
- Add regression tests
-
Performance profiling
- Profile slow operations
- Optimize database queries
- Add caching where appropriate
- Reduce LLM calls where possible
-
Improve robustness
- Add null checks
- Add input validation
- Handle edge cases
- Improve error handling
Files to Create:
- `tests/real-world/` (test logs and results)
- Various bug fix commits
Acceptance Criteria:
- ✓ Tested on real projects
- ✓ Critical bugs fixed
- ✓ Performance acceptable
- ✓ Regression tests added
Day 5: Edge Cases + Documentation + Release Prep
Time Estimate: 6 hours Dependencies: Week 8 Days 1-4 complete
Tasks:
-
Handle edge cases
- Empty codebase
- Very large codebase
- Binary files
- Permission issues
- Network offline
- Out of disk space
- Invalid configuration
-
Write documentation
- README.md (installation, quick start)
- ARCHITECTURE.md (system overview)
- CONFIGURATION.md (config options)
- CLI.md (command reference)
- DEVELOPMENT.md (contributing guide)
-
Create examples
- Example configurations
- Example workflows
- Tutorial: first feature with Forge
-
Release preparation
- Update version to v0.1.0
- Write CHANGELOG.md
- Create GitHub release
- Tag release commit
Files to Create:
- `README.md`
- `docs/ARCHITECTURE.md`
- `docs/CONFIGURATION.md`
- `docs/CLI.md`
- `docs/DEVELOPMENT.md`
- `CHANGELOG.md`
- `examples/` (various examples)
Acceptance Criteria:
- ✓ Edge cases handled
- ✓ Documentation complete
- ✓ Examples working
- ✓ Release tagged
Week 8 Deliverable:
- Error recovery working
- Checkpoint resume functional
- Cost tracking dashboard
- Real-world testing complete
- Bugs fixed
- Edge cases handled
- Documentation complete
- Production-ready v0.1 released
Dependencies Between Weeks
Critical Path
The following must be completed in sequence (cannot parallelize):
Week 1 → Week 2 → Week 3 → Week 4 → Week 5 → Week 6 → Week 7 → Week 8
Dependency Details
| Week | Must Complete Before | Why |
|---|---|---|
| Week 1 | Week 2 | Memory and tools need core types, bus, config |
| Week 2 | Week 3 | Reviewer needs memory system and tools |
| Week 3 | Week 4 | Tester uses same patterns as Reviewer |
| Week 4 | Week 5 | Planner/Implementer need Reviewer and Tester to exist for bounces |
| Week 5 | Week 6 | Orchestrator coordinates existing agents |
| Week 6 | Week 7 | CLI commands need working orchestrator |
| Week 7 | Week 8 | Can't harden what doesn't exist yet |
Parallelization Opportunities
Within each week, some tasks can be parallelized:
Week 1:
- Day 3 (Bus) and Day 4 (Config) can be parallelized
Week 2:
- Day 1-2 (Memory) and Day 4 (Tools) can be partially parallelized
Week 3:
- Day 2 (Static) and Day 3 (AI review) can be developed in parallel
Week 5:
- Day 1-2 (Planner) and Day 3-4 (Implementer) can be parallelized by 2 developers
Week 7:
- Day 1-2 (CLI commands) and Day 3 (UI) can be parallelized
Risk per Week
Week 1 Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Schema design wrong | Medium | High | Review against all agent requirements before Day 2 |
| Drizzle setup issues | Low | Medium | Use Drizzle docs, test migrations early |
| Type system too complex | Medium | Medium | Start simple, iterate; avoid premature abstraction |
Week 2 Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Similarity search too slow | Medium | Medium | Start with simple implementation, optimize later |
| LLM API rate limits | Medium | Medium | Implement exponential backoff, use caching |
| Memory bloat | Medium | High | Implement pruning early; test with large datasets |
Week 3 Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| AI review too expensive | High | High | Use cheap model (Haiku) for reviews; cache aggressively |
| GitHub API rate limits | Medium | Medium | Implement backoff; use GraphQL for efficiency |
| False positive rate too high | High | Medium | Tune confidence thresholds; learn from dismissals |
Week 4 Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Test selection misses bugs | Medium | Critical | Always run baseline smoke tests; increase coverage for high-risk |
| Flaky test detection fails | Medium | Medium | Require 2+ inconsistent results to mark as flaky |
| Test generation low quality | High | Low | Human review required before merging generated tests |
Week 5 Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Code generation quality poor | High | Critical | Self-validation loop catches many issues; reviewer catches the rest |
| Self-validation loop infinite | Medium | High | Max iterations (3); circuit breakers |
| Integration test takes too long | Medium | Medium | Use small test fixture; mock LLM for speed tests |
Week 6 Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| State machine bugs | Medium | High | Thorough testing of all state transitions |
| Checkpoint serialization fails | Medium | High | Test with complex state; validate on restore |
| Phase bounce infinite loop | Medium | High | Max bounce count (3); circuit breaker |
Week 7 Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| CLI UX confusing | Medium | Medium | User testing; iterate on feedback |
| Configuration too complex | Medium | Medium | Sensible defaults; "forge init" generates valid config |
| Memory consolidation too slow | Low | Medium | Run async; user can continue working |
Week 8 Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Real-world testing uncovers major bugs | High | High | Budget extra time; prioritize critical bugs |
| Recovery logic flawed | Medium | Critical | Test all recovery scenarios; manual testing |
| Not enough time for polish | High | Medium | Cut scope if needed; v0.1 is MVP, not perfect |
Acceptance Criteria per Week
Week 1: Core Skeleton
- Project initializes and compiles with `bun run typecheck`
- Database schema created and migrations run
- All core types exported from `src/core/types.ts`
- Error taxonomy has 5+ specialized error classes
- Event bus can emit, subscribe, persist, and replay events
- Configuration loads from defaults and project config
- Circuit breakers trip at configured thresholds
- LLM abstraction can make chat completions with Anthropic
- All unit tests pass with >80% coverage
- At least one integration test with real LLM passes
Week 2: Memory + Tools
- Memory store can create, read, update, delete memories
- Similarity search returns relevant memories
- Episodic memories stored and queried by trace ID
- Patterns extracted from episodic memories
- Tool registry can register and execute tools
- Git tool can run basic git commands
- Shell runner tool executes commands with timeout
- Linter tool integrates with ESLint or Biome
- Base agent loop executes with real LLM
- Base agent can call tools and store memories
- All unit tests pass with >80% coverage
Week 3: Reviewer Agent
- Reviewer agent extends BaseAgent
- Static analysis layer runs and parses lint output
- AI review layer runs for medium+ risk changes
- Risk scoring algorithm implemented
- GitHub tool can fetch PR and post comments
- Findings persist to database
- Finding dismissals decrease pattern confidence
- Phase bounce logic (review → implement → re-review) works
- Review metrics tracked (false positive rate, duration)
- Integration test: review a real PR
Week 4: Tester Agent
- Tester agent extends BaseAgent
- Risk-based test selection works
- Test runner tool executes tests and parses output
- Flaky test detection (retry logic) works
- Failure analysis classifies failures correctly
- LLM provides root cause analysis for failures
- Test gap detection identifies uncovered code
- Test generator creates valid tests for gaps
- Phase bounce logic (test → implement → re-test) works
- Test metrics tracked (pass rate, flaky rate, coverage)
Week 5: Planner + Implementer
- Planner agent analyzes requirements and extracts stories
- Planner agent designs architecture
- Planner agent decomposes into ordered tasks
- Planner agent assesses risk
- Implementer agent generates code from task spec
- Implementer agent generates tests alongside code
- Self-validation loop runs typecheck, linter, tests
- Self-validation loop auto-fixes issues (max 3 iterations)
- End-to-end integration test: plan → implement → review → test
- Phase bounces work across all agents
Week 6: Orchestrator
- Pipeline state machine transitions through all phases
- Checkpoint system creates checkpoints after each phase
- Checkpoint restore works correctly
- Context bus shares state across agents
- Human approval gates pause pipeline and wait for approval
- Circuit breakers halt pipeline when thresholds exceeded
- "forge run" command executes complete pipeline
- Full integration test with all phases passes
Week 7: CLI + Polish
- CLI framework parses commands and displays help
- "forge run" command works (refined from Week 6)
- "forge review" command works
- "forge test" command works
- "forge status" displays current pipeline status
- "forge history" lists past runs
- "forge init" generates project configuration
- Terminal UI displays progress in real-time
- Terminal UI displays findings nicely formatted
- Memory consolidation job extracts patterns and prunes memories
- "forge consolidate" command works
Week 8: Harden + Document
- Retry logic handles transient errors
- Fallback strategies implemented for LLM and tool failures
- "forge resume" command resumes failed runs from checkpoint
- Error recovery workflow tested (all scenarios)
- Cost tracking accurate per run/phase/agent
- "forge costs" command displays cost dashboard
- Cost estimates shown before pipeline execution
- Tested on 2-3 real projects
- Critical bugs from real-world testing fixed
- Performance profiled and optimized
- Edge cases handled (empty codebase, large codebase, etc.)
- Documentation complete (README, architecture, CLI reference, configuration, development guide)
- Examples created and working
- v0.1.0 tagged and released
Integration Test Plan
Week Boundary Integration Tests
End of Week 1:
- Test: Initialize project, load config, emit event, persist to database, replay events
- Test: LLM chat completion with real API
- Expected: All pass, no errors
End of Week 2:
- Test: Store memory, query by similarity, recall relevant memories
- Test: Register tool, execute tool, capture result
- Test: Base agent loop with LLM + tool execution
- Expected: Agent executes tool based on LLM decision, stores memory
End of Week 3:
- Test: Review a small code change (mock PR)
- Test: Reviewer runs static analysis, AI review, posts findings
- Test: Finding dismissed, confidence decreases
- Test: Phase bounce (review → implement → re-review)
- Expected: Complete review with findings, bounce works
End of Week 4:
- Test: Tester selects tests based on code changes
- Test: Tester executes tests, analyzes failures
- Test: Tester generates tests for gaps
- Test: Phase bounce (test → implement → re-test)
- Expected: Test failures analyzed, gaps filled, bounce works
End of Week 5:
- Test: Full flow: requirements → plan → implement → review → test
- Test: Self-validation loop fixes typecheck/lint/test issues
- Test: Phase bounces work across all agents
- Expected: End-to-end flow produces valid, tested code
End of Week 6:
- Test: "forge run" executes complete pipeline
- Test: Checkpoints created and restored correctly
- Test: Human approval gates pause and resume pipeline
- Test: Circuit breakers halt pipeline when thresholds exceeded
- Expected: Complete pipeline execution with all safety controls
End of Week 7:
- Test: All CLI commands work ("forge run", "forge review", "forge test", "forge status", "forge history", "forge init", "forge consolidate")
- Test: Terminal UI displays correctly
- Test: Memory consolidation extracts patterns
- Expected: Polished user experience, all commands functional
End of Week 8:
- Test: Recovery from all failure scenarios (network error, tool timeout, circuit breaker, user interrupt)
- Test: Cost tracking accurate across multiple runs
- Test: Real-world project testing (run full pipeline on actual codebase)
- Expected: Production-ready system, all edge cases handled
File Creation Checklist by Week
Week 1 Files
src/memory/schema.ts
src/core/types.ts
src/core/errors.ts
src/core/schemas.ts
src/core/bus.ts
src/core/config.ts
src/safety/breakers.ts
src/safety/budget.ts
src/tools/llm.ts
src/tools/llm-anthropic.ts
src/tools/prompts.ts
tests/core/types.test.ts
tests/core/errors.test.ts
tests/core/bus.test.ts
tests/core/config.test.ts
tests/safety/breakers.test.ts
tests/tools/llm.test.ts
drizzle.config.ts
drizzle/0000_initial.sql
scripts/seed.ts
forge.config.example.ts
Week 2 Files
src/memory/store.ts
src/memory/similarity.ts
src/memory/episodes.ts
src/memory/patterns.ts
src/tools/registry.ts
src/tools/git.ts
src/tools/runner.ts
src/tools/linter.ts
src/agents/base.ts
tests/memory/store.test.ts
tests/memory/similarity.test.ts
tests/memory/episodes.test.ts
tests/memory/patterns.test.ts
tests/tools/registry.test.ts
tests/tools/git.test.ts
tests/tools/runner.test.ts
tests/agents/base.integration.test.ts
Week 3 Files
src/agents/reviewer.ts
src/agents/reviewer-types.ts
src/agents/reviewer-static.ts
src/agents/reviewer-ai.ts
src/agents/reviewer-prompts.ts
src/agents/reviewer-github.ts
src/agents/reviewer-persistence.ts
src/agents/reviewer-learning.ts
src/tools/github.ts
tests/agents/reviewer.test.ts
tests/agents/reviewer-static.test.ts
tests/agents/reviewer-ai.test.ts
tests/agents/reviewer-learning.test.ts
tests/tools/github.test.ts
Week 4 Files
src/agents/tester.ts
src/agents/tester-selection.ts
src/agents/tester-analysis.ts
src/agents/tester-prompts.ts
src/agents/tester-gaps.ts
src/agents/tester-generator.ts
src/agents/tester-bounce.ts
src/agents/tester-metrics.ts
src/tools/test-runner.ts
tests/agents/tester.test.ts
tests/agents/tester-analysis.test.ts
tests/agents/tester-generator.test.ts
tests/agents/tester-bounce.test.ts
tests/tools/test-runner.test.ts
Week 5 Files
src/agents/planner.ts
src/agents/planner-analysis.ts
src/agents/planner-design.ts
src/agents/planner-tasks.ts
src/agents/planner-risk.ts
src/agents/planner-prompts.ts
src/agents/implementer.ts
src/agents/implementer-generation.ts
src/agents/implementer-validation.ts
src/agents/implementer-fix.ts
src/agents/implementer-prompts.ts
tests/agents/planner.test.ts
tests/agents/planner-design.test.ts
tests/agents/implementer.test.ts
tests/agents/implementer-validation.test.ts
tests/integration/full-flow.test.ts
tests/fixtures/sample-requirements.ts
tests/fixtures/sample-codebase/
Week 6 Files
src/orchestrator/pipeline.ts
src/orchestrator/phases.ts
src/orchestrator/checkpoint.ts
src/orchestrator/context.ts
src/safety/gates.ts
src/cli/gates.ts
src/cli/commands/run.ts
tests/orchestrator/pipeline.test.ts
tests/orchestrator/checkpoint.test.ts
tests/orchestrator/context.test.ts
tests/safety/gates.test.ts
tests/integration/pipeline-full.test.ts
Week 7 Files
src/cli/index.ts
src/cli/commands/run.ts (refine)
src/cli/commands/review.ts
src/cli/commands/test.ts
src/cli/commands/status.ts
src/cli/commands/history.ts
src/cli/commands/init.ts
src/cli/commands/consolidate.ts
src/cli/ui/spinner.ts
src/cli/ui/format.ts
src/cli/ui/table.ts
src/cli/ui/progress.ts
src/cli/ui/findings.ts
src/cli/ui/events.ts
src/cli/ui/dashboard.ts
src/core/config-loader.ts
src/memory/consolidate.ts
forge.config.template.ts
tests/core/config-loader.test.ts
tests/memory/consolidate.test.ts
Week 8 Files
src/core/retry.ts
src/core/fallback.ts
src/cli/commands/resume.ts
src/cli/commands/costs.ts
src/orchestrator/recovery.ts
src/safety/cost-tracker.ts
src/cli/ui/cost-dashboard.ts
tests/core/retry.test.ts
tests/orchestrator/recovery.test.ts
tests/safety/cost-tracker.test.ts
tests/real-world/
README.md
docs/ARCHITECTURE.md
docs/CONFIGURATION.md
docs/CLI.md
docs/DEVELOPMENT.md
CHANGELOG.md
examples/
Total Files: ~150+ (core implementation + tests + docs)
Daily Time Estimates Summary
| Week | Total Hours | Avg Hours/Day |
|---|---|---|
| Week 1 | 25 hours | 5 hours |
| Week 2 | 27 hours | 5.4 hours |
| Week 3 | 26 hours | 5.2 hours |
| Week 4 | 26 hours | 5.2 hours |
| Week 5 | 28 hours | 5.6 hours |
| Week 6 | 26 hours | 5.2 hours |
| Week 7 | 25 hours | 5 hours |
| Week 8 | 27 hours | 5.4 hours |
| Total | 210 hours | 5.25 hours/day |
Note: Each day is scoped to 4-6 hours of focused work, leaving buffer time for unexpected issues, learning, and context switching.
Success Criteria for MVP (v0.1)
At the end of 8 weeks, the system must meet these criteria to be considered production-ready:
Functional Criteria
- "forge run" executes a complete pipeline from requirements to tested code
- All five agents (Planner, Implementer, Reviewer, Tester, Deployer stub) operational
- Memory system stores and recalls learnings across runs
- Phase bounces work (review → fix, test → fix)
- Human approval gates work for high-risk decisions
- Circuit breakers halt runaway execution
- Checkpoint system allows resume from failure
- CLI provides commands for all core workflows
- Real-world testing on 2+ projects successful
Quality Criteria
- Test coverage > 80% for core modules
- All integration tests pass
- No critical bugs in issue tracker
- Performance acceptable (pipeline completes in <30 min for simple features)
- Cost per run < $10 for typical feature
Documentation Criteria
- README with installation and quick start
- Architecture documentation
- CLI reference
- Configuration guide
- At least 2 working examples
Safety Criteria
- No unintended code execution
- No secret leakage
- Human approval required for production deploys
- Cost limits enforced
- Error recovery tested for all failure modes
Post-MVP Roadmap (Weeks 9-12+)
After v0.1 release, these are the next priorities:
Week 9-10: Deployer Agent (Full Implementation)
Currently just a stub. Implement:
- Canary deployment strategy
- Health check monitoring
- Auto-rollback on failure
- Feature flag integration
Week 11: Multi-Agent Parallelization
- Implement agent swarm for parallel implementation
- Coordinate multiple implementers working on different modules
- Merge strategy for parallel changes
Week 12: Advanced Memory
- Vector database integration (replace naive similarity search)
- Knowledge graph for relationships between patterns
- Cross-run learning improvements
- Meta-reflection (reflect on reflection quality)
Beyond Week 12
- Natural language requirements interface
- Multi-repo intelligence
- Self-improving prompts (GEP protocol full implementation)
- Visual design-to-code pipeline
- Integration with issue trackers (Jira, Linear)
Conclusion
This 8-week build plan provides a detailed, day-by-day roadmap for implementing the Forge agentic SDLC orchestration system. Each day's work is scoped to 4-6 hours of focused development, with clear deliverables and acceptance criteria.
Key Success Factors:
- Follow the order strictly - Dependencies are real; skipping ahead will cause rework
- Test continuously - Each day has tests; don't accumulate testing debt
- Commit frequently - Small, atomic commits make debugging easier
- Review the system design - Keep SYSTEM-DESIGN.md open; refer to it constantly
- Adapt as needed - This is a plan, not a contract; adjust based on learnings
By end of Week 8, you will have a production-ready v0.1 of Forge that can orchestrate the full SDLC with AI agents, learn from every execution, and provide a polished developer experience.
Now go build it.