February 8, 2026

Forge Review: Architecture Module-by-Module Analysis

Deep-dive into each module's design, strengths, weaknesses, and alignment with SYSTEM-DESIGN.md.


1. Type System (src/types/index.ts — 775 lines)

Strengths

  • Comprehensive union types for all domain concepts (phases, severities, agent types)
  • Zod schemas paired with TypeScript types for runtime validation
  • Clear separation between input/output types per phase

Weaknesses

  • Duplicate error classes: ForgeError + ForgeBaseError in types collide with richer hierarchy in src/core/errors.ts
  • Two LLMClient.chat overloads with incompatible signatures create ambiguity for implementors
  • 775 lines in a single file — should be split by domain (agent types, event types, tool types, config types)
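
One way to remove the overload ambiguity is a single chat signature that takes a request object. The sketch below is illustrative only: ChatMessage, ChatRequest, ChatResponse, fromPrompt, and EchoClient are hypothetical names, since the actual overloads are not reproduced in this review.

```typescript
// Hypothetical shapes; the real types in src/types/index.ts are not shown in this review.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  messages: ChatMessage[];
  maxTokens?: number;
}

interface ChatResponse {
  content: string;
}

// One signature instead of two overloads: implementors satisfy a single contract.
interface LLMClient {
  chat(request: ChatRequest): Promise<ChatResponse>;
}

// Convenience helper for call sites that only have a prompt string.
function fromPrompt(prompt: string, system?: string): ChatRequest {
  const messages: ChatMessage[] = [];
  if (system !== undefined) {
    messages.push({ role: "system", content: system });
  }
  messages.push({ role: "user", content: prompt });
  return { messages };
}

// Toy implementation that echoes the last message back.
class EchoClient implements LLMClient {
  async chat(request: ChatRequest): Promise<ChatResponse> {
    const last = request.messages[request.messages.length - 1];
    return { content: last ? last.content : "" };
  }
}
```

Call sites that used the string-based overload would go through fromPrompt, so the interface itself stays unambiguous.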

Alignment Score: 7/10

Covers all concepts from SYSTEM-DESIGN.md but the duplication indicates organic growth without pruning.


2. Core Module (src/core/)

Components

  • bus.ts — InMemoryEventBus (EventEmitter-based)
  • config.ts — Config loader with file detection
  • errors.ts — Rich error hierarchy
  • types.ts — Re-export barrel

Strengths

  • Error hierarchy is well-designed with details: ErrorDetails and type-safe getters
  • Config loader supports multiple file formats (.ts, .js, .mjs)
  • Wildcard event subscription (bus.on('*', ...)) is useful for debugging

Weaknesses

  • Unbounded event array in bus.ts — this.events.push(event) never prunes, will leak memory in long-running processes
  • Config merge bug: breakers field always uses defaults (line 90)
  • Two EventBus implementations — core/bus.ts (in-memory) and events/bus.ts (SQLite) with no adapter or factory to choose between them
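
The unbounded array could be capped with a fixed-capacity history. This is a minimal sketch under assumptions: BusEvent and BoundedEventLog are hypothetical names, and the real bus may need a different retention policy (for example, time-based).

```typescript
// Hypothetical event shape; the real event type lives in the project's type module.
interface BusEvent {
  type: string;
  payload: unknown;
}

// Keeps at most `capacity` recent events, dropping the oldest when full.
class BoundedEventLog {
  private readonly capacity: number;
  private events: BusEvent[] = [];

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity < 1) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
  }

  push(event: BusEvent): void {
    this.events.push(event);
    // Trim from the front so memory stays bounded in long-running processes.
    if (this.events.length > this.capacity) {
      this.events.splice(0, this.events.length - this.capacity);
    }
  }

  // Returns a copy so callers cannot mutate the internal buffer.
  snapshot(): readonly BusEvent[] {
    return [...this.events];
  }
}
```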

Alignment Score: 6/10

SYSTEM-DESIGN.md specifies SQLite-backed event bus with checkpointing. The in-memory version exists as a simpler fallback but they're not unified.


3. Database Layer (src/db/schema.ts)

Strengths

  • Drizzle ORM schema covers all entities: events, agents, checkpoints, memories, patterns, runs, findings, executions
  • ULID generation for IDs (sortable, distributed-safe)
  • Proper relations defined between tables
  • JSON columns for flexible payload storage

Weaknesses

  • No migration files checked in — only drizzle.config.ts exists
  • Schema defines relations but some are aspirational (e.g., executions table referenced but no code writes to it)
  • Uses better-sqlite3 instead of bun:sqlite per CLAUDE.md preference

Alignment Score: 8/10

Good coverage of the data model from SYSTEM-DESIGN.md Section 4.


4. Agent Framework (src/agents/)

Components

  • base.ts — BaseAgent with perceive/reason/act/learn loop
  • planner.ts, implementer.ts, reviewer.ts, tester.ts, deployer.ts — 5 specialized agents
  • index.ts — Factory + metadata
  • pi-adapter.ts, pi-model-bridge.ts, pi-event-bridge.ts, pi-tool-converter.ts — pi-agent-core integration

Strengths

  • BaseAgent loop faithfully implements SYSTEM-DESIGN.md Section 6 (perceive → reason → act → learn)
  • Reviewer implements the designed 3-layer review (static → security → AI)
  • Tester has smart risk-based test selection (low/medium/high/critical → different scopes)
  • pi-agent-core adapter is well-structured with proper event bridging, safety integration, and cost tracking
  • Tool definitions use Zod schemas for validation
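
For readers unfamiliar with the loop, its shape can be sketched as follows. The method names come from the review; the signatures, the LoopAgent class, and the toy WordCountAgent are assumptions, and the methods are kept synchronous for brevity even though the real ones presumably return promises.

```typescript
// Hypothetical generic loop; method names follow the review, signatures are assumed.
abstract class LoopAgent<Obs, Plan, Result> {
  protected abstract perceive(input: string): Obs;
  protected abstract reason(observation: Obs): Plan;
  protected abstract act(plan: Plan): Result;
  protected abstract learn(plan: Plan, result: Result): void;

  run(input: string): Result {
    const observation = this.perceive(input); // gather context
    const plan = this.reason(observation); // decide what to do
    const result = this.act(plan); // execute the decision
    this.learn(plan, result); // record the outcome
    return result;
  }
}

// Toy agent used only to show the call order.
class WordCountAgent extends LoopAgent<string[], number, string> {
  readonly trace: string[] = [];

  protected perceive(input: string): string[] {
    this.trace.push("perceive");
    return input.split(/\s+/).filter(Boolean);
  }

  protected reason(words: string[]): number {
    this.trace.push("reason");
    return words.length;
  }

  protected act(count: number): string {
    this.trace.push("act");
    return `${count} words`;
  }

  protected learn(): void {
    this.trace.push("learn");
  }
}
```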

Weaknesses

  • Command injection in planner (glob via find), implementer (git branch/commit) — tools shell out unsafely
  • Inline tools — each agent defines its own tools instead of using the ToolRegistry. This means 30+ tools scattered across 5 files with no central inventory
  • Deployer's emit() is a no-op — entire agent runs blind to observability
  • BaseAgent reflection calls LLM on every act() success, even for trivial operations — expensive and wasteful
  • Duplicate system prompts: PI_AGENT_PROMPTS in index.ts duplicates prompts already defined in each agent class
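
The injection findings generally come down to building a shell string from untrusted input. A common mitigation, sketched here under assumptions, is to validate the input and pass arguments as an array so that no shell parses them. The helper names and the allow-list pattern are hypothetical, and the pattern is deliberately stricter than git's own rules.

```typescript
// Conservative allow-list for branch names (an assumption, stricter than git itself).
const SAFE_BRANCH = /^[A-Za-z0-9][A-Za-z0-9._\/-]{0,99}$/;

// Builds argv for `git checkout -b <branch>` after validating the name.
function gitCheckoutArgs(branch: string): string[] {
  if (!SAFE_BRANCH.test(branch) || branch.includes("..")) {
    throw new Error(`Refusing unsafe branch name: ${JSON.stringify(branch)}`);
  }
  return ["checkout", "-b", branch];
}

// Builds argv for `git commit -m <message>`.
function gitCommitArgs(message: string): string[] {
  // The message is a single argv entry, so quotes and `$(...)` stay inert
  // as long as the command is spawned without a shell.
  return ["commit", "-m", message];
}

// Usage (not executed here): pass the array to a spawn API that takes argv,
// e.g. Bun.spawn(["git", ...gitCheckoutArgs(name)]) or
// execFile("git", gitCheckoutArgs(name)) from node:child_process.
```

The same idea applies to the planner's glob tool: a filesystem API or an argv-based spawn avoids interpolating patterns into a shell command.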

Alignment Score: 7/10

Core loop matches the design. Tool integration and safety are the main gaps.


5. Orchestrator (src/orchestrator/)

Components

  • pipeline.ts — State machine with bounce-back loops
  • context.ts — PipelineContext factory with defaults
  • checkpoint.ts — Checkpoint persistence (InMemory + SQLite)
  • beads-pipeline.ts — Alternative beads-driven pipeline
  • index.ts — Module exports

Strengths

  • Pipeline state machine correctly implements the bounce-back pattern from SYSTEM-DESIGN.md Section 10
  • Configurable max bounces for review (3) and test (2) with clear failure on exceeded limits
  • Phase input wiring properly passes outputs between phases (plan → impl → review → test → deploy)
  • Checkpoint support enables pipeline resumption after failures
  • Beads pipeline provides an alternative work-discovery mode that integrates with external issue tracking
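
The bounce-back behavior can be illustrated with a small pure transition function. The limits (3 review bounces, 2 test bounces) are the ones quoted above; everything else is an assumption made for illustration, including the PipelineState shape, the choice to bounce test failures back to implementation, and failing outright when plan, implement, or deploy fails.

```typescript
type Phase = "plan" | "implement" | "review" | "test" | "deploy" | "done" | "failed";
type Outcome = "pass" | "fail";

interface PipelineState {
  phase: Phase;
  reviewBounces: number;
  testBounces: number;
}

const MAX_REVIEW_BOUNCES = 3; // limits quoted in the review
const MAX_TEST_BOUNCES = 2;

function moveTo(state: PipelineState, phase: Phase): PipelineState {
  return { ...state, phase };
}

// Pure transition function: given the current state and a phase outcome,
// return the next state. Bounces send work back to "implement".
function nextState(state: PipelineState, outcome: Outcome): PipelineState {
  switch (state.phase) {
    case "plan":
      return moveTo(state, outcome === "pass" ? "implement" : "failed");
    case "implement":
      return moveTo(state, outcome === "pass" ? "review" : "failed");
    case "review":
      if (outcome === "pass") return moveTo(state, "test");
      if (state.reviewBounces >= MAX_REVIEW_BOUNCES) return moveTo(state, "failed");
      return { ...state, phase: "implement", reviewBounces: state.reviewBounces + 1 };
    case "test":
      if (outcome === "pass") return moveTo(state, "deploy");
      if (state.testBounces >= MAX_TEST_BOUNCES) return moveTo(state, "failed");
      return { ...state, phase: "implement", testBounces: state.testBounces + 1 };
    case "deploy":
      return moveTo(state, outcome === "pass" ? "done" : "failed");
    default:
      return state; // "done" and "failed" are terminal
  }
}
```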

Weaknesses

  • SQL injection in SQLiteCheckpointStorage.save() — string interpolation instead of parameterized queries
  • Default context is non-functional: DefaultLLMClient returns dummy strings, DefaultSafetyContext auto-approves everything
  • No retry logic — if a phase throws, the entire pipeline fails with no retry
  • Beads pipeline has hardcoded label-matching heuristics for phase determination
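
The SQL injection finding is typically addressed by binding values as parameters instead of interpolating them. The sketch below assumes a hypothetical checkpoints table and a minimal structural interface. Both better-sqlite3 and bun:sqlite expose prepare(sql).run(...params) in roughly this shape, but the actual SQLiteCheckpointStorage code is not shown in this review.

```typescript
// Minimal structural interface covering the calls used below.
interface Statement {
  run(...params: unknown[]): unknown;
}

interface SqlDatabase {
  prepare(sql: string): Statement;
}

interface Checkpoint {
  id: string;
  phase: string;
  state: unknown;
}

// Table and column names are hypothetical.
const INSERT_CHECKPOINT =
  "INSERT INTO checkpoints (id, phase, state) VALUES (?, ?, ?)";

function saveCheckpoint(db: SqlDatabase, cp: Checkpoint): void {
  // Values travel as bound parameters, never as part of the SQL text.
  db.prepare(INSERT_CHECKPOINT).run(cp.id, cp.phase, JSON.stringify(cp.state));
}

// Recording fake used to show that hostile input never reaches the SQL string.
class RecordingDb implements SqlDatabase {
  readonly calls: { sql: string; params: unknown[] }[] = [];

  prepare(sql: string): Statement {
    return {
      run: (...params: unknown[]) => {
        this.calls.push({ sql, params });
        return undefined;
      },
    };
  }
}
```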

Alignment Score: 8/10

Closest module to the design spec. The state machine, bounce-backs, and checkpointing all match.


6. Tools Module (src/tools/)

Components

  • index.ts — Registry, sandbox, and tool category exports
  • beads.ts — 9 beads CLI wrapper tools
  • beads-availability.ts — Availability check for bd CLI

Strengths

  • Beads tools are well-structured: Zod schemas, proper error handling, JSON output parsing
  • Tool collections (beadsPlannerTools, beadsOrchestratorTools) provide role-appropriate subsets
  • Registry supports categories and metadata

Weaknesses

  • Registry is populated but never consumed — agents define their own tools inline
  • Async registration of beads tools creates race conditions
  • Sandbox is declared but tools run unsandboxed: Bun.spawn() with no resource limits

Alignment Score: 5/10

SYSTEM-DESIGN.md Section 5 specifies a tool registry with sandboxing, permissions, and timeout enforcement. Only the registry shell exists.
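
To make the gap concrete, here is a minimal sketch of a registry that enforces role permissions and timeouts. It is not the design from SYSTEM-DESIGN.md, which is not reproduced here. Role, ToolSpec, and this ToolRegistry shape are hypothetical, and a timeout implemented this way only stops waiting; it does not kill an underlying process or provide sandboxing.

```typescript
type Role = "planner" | "implementer" | "reviewer" | "tester" | "deployer";

interface ToolSpec {
  name: string;
  allowedRoles: readonly Role[];
  timeoutMs: number;
  run(input: string): Promise<string>;
}

class ToolRegistry {
  private readonly tools = new Map<string, ToolSpec>();

  register(tool: ToolSpec): void {
    if (this.tools.has(tool.name)) {
      throw new Error(`Duplicate tool: ${tool.name}`);
    }
    this.tools.set(tool.name, tool);
  }

  // Permission check: a role only resolves tools it is allowed to use.
  resolve(name: string, role: Role): ToolSpec {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`Unknown tool: ${name}`);
    if (!tool.allowedRoles.includes(role)) {
      throw new Error(`${role} may not use ${name}`);
    }
    return tool;
  }

  // Timeout enforcement: reject if the tool does not settle in time.
  async invoke(name: string, role: Role, input: string): Promise<string> {
    const tool = this.resolve(name, role);
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(() => reject(new Error(`${name} timed out`)), tool.timeoutMs);
    });
    try {
      return await Promise.race([tool.run(input), timeout]);
    } finally {
      clearTimeout(timer); // avoid a dangling timer once the race settles
    }
  }
}
```

If agents resolved their tools through something like resolve() instead of defining them inline, the registry would also serve as the central inventory noted as missing above.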


7. Safety Module (src/safety/)

Components

  • breakers.ts — 4 circuit breakers (iteration, cost, time, error-rate)
  • gates.ts — Human approval gate manager
  • budget.ts — Budget tracking and enforcement
  • index.ts — Module exports + SafetyManager

Strengths

  • All 4 circuit breaker types from SYSTEM-DESIGN.md are implemented
  • Budget tracking with per-run and per-day limits
  • Human gates with configurable automation levels
  • The module's only TypeScript errors are 5 minor noUncheckedIndexedAccess issues
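
As a rough illustration of the breaker pattern, not the code in breakers.ts, two of the four breaker types could look like this. The class names and the throw-on-trip behavior are assumptions.

```typescript
// Thrown when any breaker trips; callers are expected to stop the run.
class BreakerTripped extends Error {}

// Trips once more than `limit` iterations have been recorded.
class IterationBreaker {
  private readonly limit: number;
  private count = 0;

  constructor(limit: number) {
    this.limit = limit;
  }

  tick(): void {
    this.count += 1;
    if (this.count > this.limit) {
      throw new BreakerTripped(`iteration limit ${this.limit} exceeded`);
    }
  }
}

// Trips once cumulative spend exceeds `limitUsd`.
class CostBreaker {
  private readonly limitUsd: number;
  private spentUsd = 0;

  constructor(limitUsd: number) {
    this.limitUsd = limitUsd;
  }

  record(costUsd: number): void {
    this.spentUsd += costUsd;
    if (this.spentUsd > this.limitUsd) {
      throw new BreakerTripped(`cost limit $${this.limitUsd} exceeded`);
    }
  }
}
```

Because each breaker is a plain object with one method, the same instances could be called from either agent execution path.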

Weaknesses

  • 5 TS errors prevent clean compilation
  • Circuit breakers are defined but only integrated through the pi-adapter path, not the BaseAgent path
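
Errors under noUncheckedIndexedAccess usually mean an indexed read is used as if it could never be undefined. The failing lines are not shown in this review, so the functions below are hypothetical examples of the usual fix: read once, then narrow.

```typescript
// With "noUncheckedIndexedAccess": true, `values[0]` has type `number | undefined`.
function firstOrDefault(values: readonly number[], fallback: number): number {
  const first = values[0];
  return first === undefined ? fallback : first;
}

// The same applies to index signatures: narrow before returning.
function lookup(table: Record<string, number>, key: string): number {
  const value = table[key];
  if (value === undefined) {
    throw new Error(`Missing key: ${key}`);
  }
  return value;
}
```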

Alignment Score: 8/10

Good coverage of SYSTEM-DESIGN.md Section 8.


8. Memory Module (src/memory/)

Components

  • store.ts — MemoryStore with SQLite backend
  • index.ts — MemoryManager coordinating episodic, semantic, and procedural memory

Strengths

  • 3 memory types match SYSTEM-DESIGN.md (episodic, semantic/pattern, procedural)
  • Confidence decay over time
  • Memory consolidation pipeline
  • Pattern extraction from episodic memories
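
The review does not say how confidence decay is computed. One common choice, shown here purely as an assumption, is exponential decay with a half-life; the function name and the 30-day default are hypothetical.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Exponential decay: confidence halves every `halfLifeMs` milliseconds.
function decayedConfidence(
  initial: number,
  ageMs: number,
  halfLifeMs: number = 30 * DAY_MS,
): number {
  if (ageMs <= 0) return initial; // brand-new or clock-skewed memories keep full confidence
  return initial * Math.pow(0.5, ageMs / halfLifeMs);
}
```

For example, under these assumptions a memory stored with confidence 0.8 would be weighted at about 0.4 after 30 days and about 0.2 after 60.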

Weaknesses

  • Similarity search is keyword-based only (no embeddings)
  • No integration tests for the consolidation pipeline

Alignment Score: 7/10

Structure matches design; quality of retrieval is the gap.


9. Events Module (src/events/bus.ts)

Strengths

  • SQLite persistence with proper table creation
  • Checkpoint snapshotting with phase tracking
  • Event replay capability

Weaknesses

  • Sort direction bug in getLatestCheckpoint() — returns oldest instead of latest
  • Raw SQL instead of Drizzle ORM (the rest of the app uses Drizzle)
  • Uses better-sqlite3 instead of bun:sqlite
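
The sort-direction bug comes down to ascending versus descending order before taking the first row. The query text and column names below are hypothetical, since the real SQL in getLatestCheckpoint() is not shown, and the in-memory function is an equivalent that could back a regression test.

```typescript
interface CheckpointRow {
  id: string;
  phase: string;
  createdAt: number; // epoch milliseconds
}

// Hypothetical table and column names. DESC puts the newest row first;
// ASC with LIMIT 1 would return the oldest checkpoint instead.
const LATEST_CHECKPOINT_SQL =
  "SELECT id, phase, created_at FROM checkpoints " +
  "WHERE run_id = ? ORDER BY created_at DESC LIMIT 1";

// In-memory equivalent of the query above, handy for unit tests.
function latestCheckpoint(rows: readonly CheckpointRow[]): CheckpointRow | undefined {
  return [...rows].sort((a, b) => b.createdAt - a.createdAt)[0];
}
```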

Alignment Score: 7/10


10. CLI (src/cli/index.ts)

Strengths

  • Clean commander.js setup with 4 commands
  • Human gate integration via readline prompts
  • Beads mode correctly delegates to real pipeline

Weaknesses

  • forge run uses simulatePhase() (setTimeout) instead of the actual Pipeline class — the main command is non-functional
  • Review and test commands also use simulation
  • No structured output option (everything is console.log)
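
A structured output option could be as small as one formatter with two modes. The PhaseReport shape and the idea of a --json flag are assumptions for illustration; the CLI currently has no such option.

```typescript
interface PhaseReport {
  phase: string;
  status: "ok" | "failed";
  durationMs: number;
}

// Renders one report either for humans or as a single JSON line for scripts.
function formatReport(report: PhaseReport, json: boolean): string {
  if (json) {
    return JSON.stringify(report);
  }
  const mark = report.status === "ok" ? "PASS" : "FAIL";
  return `[${mark}] ${report.phase} (${report.durationMs} ms)`;
}
```

Emitting one JSON object per line keeps the output easy to pipe into other tools while leaving the human-readable mode as the default.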

Alignment Score: 4/10

CLI skeleton exists but doesn't wire to real pipeline execution (except beads mode).


Overall Architecture Assessment

Dimension            | Rating | Notes
Design Fidelity      | 7/10   | Most SYSTEM-DESIGN.md concepts are represented in code
Code Quality         | 5/10   | Command injection, SQL injection, TS errors, any types
Completeness         | 6/10   | All modules exist but many have TODO/simulated paths
Test Coverage        | 7/10   | 104 tests pass, but no integration tests for the full pipeline
Security             | 3/10   | Multiple injection vectors in tools that shell out
Observability        | 5/10   | Event bus exists but deployer is silent, no structured logging
Production Readiness | 3/10   | Not ready: simulated CLI, injection vulns, no real LLM integration