architecture

February 8, 2026

Implementation Sub-Plan: Deferred Features & Future Extensibility

Section: 15 - What This Design Explicitly Defers
Generated: 2026-02-07
Status: Draft
Dependencies: System Design (SYSTEM-DESIGN.md), Roadmap, Orchestration, Self-Improvement, Evaluation, Human-AI Collaboration

Executive Summary

This plan catalogs everything that is NOT in the MVP but that the MVP design explicitly accommodates. The goal is to ensure MVP architecture decisions don't paint us into a corner. Each deferred item specifies:

What the MVP MUST do to not block this later
What the MVP MUST NOT do to avoid breaking changes
Estimated effort to add post-MVP
The trigger/milestone that indicates it's time to build this

Philosophy: The MVP is sequential and single-process. Every abstraction boundary is designed so that swapping the implementation doesn't break calling code.

1. Parallel Agent Execution

Current: Sequential pipeline (Plan → Implement → Review → Test → Deploy)
Future: Parallel implementation agents working on independent modules

1.1 MVP Requirements to Enable This

MUST implement:

typescript
// Agent interface already assumes statelessness
interface Agent {
  id: string;
  type: AgentType;

  // CRITICAL: No mutable shared state — all inputs via parameters
  execute(input: PhaseInput, ctx: AgentContext): Promise<PhaseOutput>;
}

// Context is READ-ONLY for agents
interface AgentContext {
  readonly memory: MemoryStore;
  readonly llm: LLMProvider;
  readonly bus: EventBus;
  readonly safety: SafetyControls;
  // Agents CANNOT modify these — only read and emit events
}

MUST design:

Event bus must support concurrent emitters (already true for in-memory pub/sub)
Each agent execution must have isolated working memory (no shared scratch space)
Phase outputs must be immutable once returned (freeze objects)
Tool execution must be thread-safe (each tool call gets its own sandbox)

MUST enforce:

typescript
// In base.ts
abstract class BaseAgent {
  // Working memory is created per execute() call (inside createLocalContext),
  // not stored on the instance, so concurrent executions cannot share it.

  async execute(input: PhaseInput, ctx: AgentContext): Promise<PhaseOutput> {
    // NEVER mutate input or ctx: treat both as immutable
    const localContext = this.createLocalContext(ctx);
    const result = await this.runLoop(input, localContext);

    // Return immutable output. Object.freeze is shallow, so nested
    // objects must be frozen too if the output contains them.
    return Object.freeze(result);
  }
}
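`Object.freeze`, as used above, freezes only the top level of an object, so nested arrays and objects inside a phase output would stay mutable. A minimal sketch of a recursive freeze follows, assuming phase outputs are plain JSON-like data. The name `deepFreeze` is hypothetical and not part of the existing codebase.

```typescript
// Recursively freezes a plain object graph (objects and arrays).
// Assumption: phase outputs are JSON-like data. Nodes that are already
// frozen are skipped, which also stops recursion on cyclic references;
// the flip side is that children of a pre-frozen node are not visited.
function deepFreeze<T>(value: T): Readonly<T> {
  if (value === null || typeof value !== 'object' || Object.isFrozen(value)) {
    return value as Readonly<T>;
  }
  Object.freeze(value);
  for (const key of Object.getOwnPropertyNames(value)) {
    deepFreeze((value as unknown as Record<string, unknown>)[key]);
  }
  return value as Readonly<T>;
}

const phaseOutput = deepFreeze({ files: [{ path: 'a.ts', content: 'x' }] });
console.log(Object.isFrozen(phaseOutput.files[0])); // nested object is frozen too
```

The freeze happens once, at the boundary where the agent returns, so code inside the agent can still build the result with ordinary mutation.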

1.2 MVP Must NOT Do

Forbidden patterns:

typescript
// ❌ BAD: Shared mutable state
class PlannerAgent {
  private cachedPlan: Plan; // Parallel agents would clobber this
}

// ❌ BAD: Context mutation
async execute(input, ctx) {
  ctx.sharedState.lastResult = result; // Race conditions
}

// ❌ BAD: Sequential assumptions
const code = await implementer.execute(plan);
// Assumes single implementer; breaks if parallel

Safe alternatives:

typescript
// ✅ GOOD: Stateless with explicit state passing
async execute(input: PhaseInput, ctx: AgentContext): Promise<PhaseOutput> {
  const state = this.perceive(input, ctx); // Local state
  const result = await this.act(state);
  return result; // No side effects on ctx
}

1.3 Orchestrator Changes for Parallelism

MVP orchestrator (sequential):

typescript
// pipeline.ts (MVP)
async function runPipelineSequential(task: string) {
  const plan = await runPhase('planning', { task });
  const code = await runPhase('implementation', plan); // One at a time
  // ...
}

Post-MVP orchestrator (parallel):

typescript
// pipeline.ts (Post-MVP)
async function runPipelineParallel(task: string) {
  const plan = await runPhase('planning', { task });

  // Split plan into independent modules (tasks with no dependencies)
  const modules = plan.tasks.filter(t => t.dependencies.length === 0);

  // Spawn parallel implementers; Promise.all rejects on the first failure
  const implementations = await Promise.all(
    modules.map(module => spawnImplementer(module, plan.architecture))
  );

  // Work-stealing for dependent tasks, run once their dependencies finish
  const remaining = plan.tasks.filter(t => t.dependencies.length > 0);
  const completed = await workStealingExecution(remaining, implementations);

  // Merge results
  const code = mergeImplementations([...implementations, ...completed]);
  // ...
}

Interface changes needed: NONE — execute() signature stays the same. Only orchestrator internals change.
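The dependent tasks in the parallel orchestrator need some ordering rule. As a rough illustration, here is a sketch of a scheduler that runs tasks in "waves", which is simpler than the work-stealing approach named above. The names `PlanTask`, `runInWaves`, and `runTask` are hypothetical; the real task shape may differ.

```typescript
interface PlanTask {
  id: string;
  dependencies: string[]; // ids of tasks that must finish first
}

// Runs tasks in waves: each wave contains every task whose dependencies
// have completed, and all tasks in a wave run concurrently.
async function runInWaves<R>(
  tasks: PlanTask[],
  runTask: (task: PlanTask) => Promise<R>
): Promise<Map<string, R>> {
  const results = new Map<string, R>();
  let pending = [...tasks];

  while (pending.length > 0) {
    const ready = pending.filter(t => t.dependencies.every(d => results.has(d)));
    if (ready.length === 0) {
      // Nothing can make progress: a cycle or a reference to an unknown task
      const stuck = pending.map(t => t.id).join(', ');
      throw new Error(`Unresolvable dependencies among: ${stuck}`);
    }
    const outputs = await Promise.all(
      ready.map(async t => ({ id: t.id, output: await runTask(t) }))
    );
    outputs.forEach(o => results.set(o.id, o.output));
    pending = pending.filter(t => !results.has(t.id));
  }
  return results;
}
```

A wave waits for its slowest task before the next wave starts, so this leaves some parallelism unused compared with work stealing. It may still be a reasonable first step because it is easy to checkpoint between waves.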

1.4 Context Bus Changes

MVP (in-memory):

typescript
interface ContextBus {
  get<T>(key: string): T;
  events: EventBus;
}

Post-MVP (concurrent-safe):

typescript
interface ContextBus {
  get<T>(key: string): Readonly<T>; // Enforce immutability
  getSnapshot(): Snapshot;          // Atomic read
  events: EventBus;
}
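To make the post-MVP contract concrete, the following sketch shows one way an in-memory store could hand out copies so that readers never observe later writes. It assumes stored values are JSON-serializable; `InMemoryContextStore` and its `set` method are hypothetical names.

```typescript
type Snapshot = Readonly<Record<string, unknown>>;

// Copies via a JSON round trip; assumes values are JSON-serializable.
function cloneJson<T>(value: T): T {
  return JSON.parse(JSON.stringify(value)) as T;
}

class InMemoryContextStore {
  private data = new Map<string, unknown>();

  // Intended for the orchestrator only; agents receive just the read methods.
  set(key: string, value: unknown): void {
    this.data.set(key, cloneJson(value));
  }

  // Returns a private copy, so callers cannot mutate stored state.
  get<T>(key: string): Readonly<T> | undefined {
    const value = this.data.get(key);
    return value === undefined ? undefined : (cloneJson(value) as T);
  }

  // Copies every key in one synchronous pass, so no write can interleave.
  getSnapshot(): Snapshot {
    const copy: Record<string, unknown> = {};
    this.data.forEach((value, key) => {
      copy[key] = cloneJson(value);
    });
    return Object.freeze(copy);
  }
}
```

Copy-on-read costs more than sharing frozen references, but it keeps the guarantee simple: nothing a reader holds is ever aliased to the store.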

1.5 Checkpoint System Changes

MVP (phase-level checkpoints):

typescript
interface Checkpoint {
  phase: PhaseName;
  state: Record<string, unknown>; // One state per phase
}

Post-MVP (agent-level checkpoints):

typescript
interface Checkpoint {
  phase: PhaseName;
  agents: Map<AgentId, AgentState>; // Multiple concurrent agents
  dependencies: DependencyGraph;     // Track which tasks depend on which
  completed: Set<TaskId>;            // Tasks finished before the checkpoint (TaskId is an assumed id type)
}

async function resumeFromCheckpoint(cp: Checkpoint) {
  // Resume every agent whose dependencies had all completed.
  // Map has no filter(), so materialize the values first.
  const resumable = [...cp.agents.values()].filter(a =>
    a.dependencies.every(d => cp.completed.has(d))
  );
  await Promise.all(resumable.map(a => a.resume()));
}

1.6 Estimated Effort

Engineering time: 2-3 weeks
Breaking changes: None (if MVP interfaces are designed correctly)
Components affected:

Orchestrator (major changes)
Checkpoint system (extend schema)
Context bus (add immutability enforcement)
Agents (no changes if stateless)

1.7 Trigger to Build This

Indicators:

Implementation phase takes >10 minutes for tasks that have independent modules
Profiling shows agents spending >50% of their time idle, waiting on sequential dependencies
User requests feature that spans >5 independent modules

Milestone: After 500+ successful sequential pipeline runs with no shared-state bugs.

2. Kubernetes Deployment

Current: Single Bun process on developer machine
Future: Distributed system across K8s cluster

2.1 MVP Requirements to Enable This

MUST separate concerns:

forge/
├── src/
│   ├── orchestrator/    # Coordinator service (K8s Deployment)
│   ├── agents/          # Worker pods (K8s Jobs or Deployments)
│   ├── tools/           # Tool executors (might stay in agent pods)
│   ├── memory/          # Becomes external service (K8s StatefulSet)
│   └── core/            # Shared library

MUST design interfaces for network boundaries:

typescript
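// In the MVP, this is an in-process function call
interface MemoryStore {
  store(memory: Memory): Promise<void>;
  recall(query: Query): Promise<Memory[]>;
}

// Post-MVP, this becomes a gRPC/HTTP API,
// but the interface signature DOES NOT CHANGE
class RemoteMemoryStore implements MemoryStore {
  // HttpClient is a placeholder for whatever HTTP/gRPC client is chosen
  constructor(private httpClient: HttpClient) {}

  async store(memory: Memory): Promise<void> {
    await this.httpClient.post('/memory', memory);
  }

  async recall(query: Query): Promise<Memory[]> {
    // The '/memory/recall' route is illustrative, not a defined endpoint
    return this.httpClient.post('/memory/recall', query);
  }
}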

MUST externalize configuration:

typescript
// MVP: config.ts exports const
export const config = {
  llm: { provider: 'anthropic', apiKey: process.env.ANTHROPIC_API_KEY },
  memory: { dbPath: './.forge/memory.db' },
};

// Post-MVP: config comes from environment/ConfigMap
export const config = {
  llm: {
    provider: process.env.LLM_PROVIDER,
    apiKey: process.env.LLM_API_KEY,
  },
  memory: {
    dbUrl: process.env.DATABASE_URL, // PostgreSQL connection string
  },
};

2.2 MVP Must NOT Do

Forbidden:

typescript
// ❌ BAD: Hardcoded file paths
const dbPath = '/home/user/.forge/memory.db';

// ❌ BAD: Process-global singletons
class Orchestrator {
  private static instance: Orchestrator; // Doesn't work across pods
}

// ❌ BAD: In-memory state that can't be serialized
const cache = new Map(); // Lost when pod restarts

Safe alternatives:

typescript
// ✅ GOOD: Environment-driven paths
const dbPath = process.env.FORGE_DB_PATH || './.forge/memory.db';

// ✅ GOOD: Dependency injection
class Orchestrator {
  constructor(
    private bus: EventBus,
    private memory: MemoryStore,
    private llm: LLMProvider
  ) {}
}

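// ✅ GOOD: External cache (entries written with a TTL) that survives pod restarts
// Assumes the ioredis client, whose constructor accepts a connection URL string
const cache = new Redis(process.env.REDIS_URL!);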
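The environment-driven alternatives above can be gathered into a single loader that fails fast when a required value is missing. The sketch below is one possible shape, not the project's actual config module; `ForgeConfig`, `loadConfig`, and the default provider are assumptions, while the variable names come from this plan.

```typescript
interface ForgeConfig {
  llm: { provider: string; apiKey: string };
  memory: { dbPath: string };
}

type Env = Record<string, string | undefined>;

// Builds the config from an environment map instead of reading globals,
// which keeps the function testable and identical across laptop and pod.
function loadConfig(env: Env): ForgeConfig {
  const apiKey = env['LLM_API_KEY'];
  if (!apiKey) {
    throw new Error('LLM_API_KEY is required');
  }
  return {
    llm: {
      provider: env['LLM_PROVIDER'] ?? 'anthropic', // assumed default
      apiKey,
    },
    memory: {
      dbPath: env['FORGE_DB_PATH'] ?? './.forge/memory.db',
    },
  };
}
```

At startup this would be called once as `loadConfig(process.env)`, and the result passed into constructors rather than imported as a module-level constant.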

2.3 Component Containerization Plan

Orchestrator (Deployment, 1 replica):

yaml
# k8s/orchestrator.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: forge-orchestrator
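spec:
  replicas: 1  # Single coordinator
  selector:    # Required for apps/v1 Deployments; must match the pod labels
    matchLabels:
      app: forge-orchestrator
  template:
    metadata:
      labels:
        app: forge-orchestrator
    spec: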
      containers:
      - name: orchestrator
        image: forge/orchestrator:latest
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: forge-secrets
              key: database-url

Agents (Jobs, ephemeral):

yaml
# Spawned dynamically by orchestrator
apiVersion: batch/v1
kind: Job
metadata:
  name: implementer-{{ .TaskId }}
spec:
  template:
    spec:
      restartPolicy: Never  # Job pods must use Never or OnFailure
      containers:
      - name: implementer
        image: forge/agents:latest
        env:
        - name: AGENT_TYPE
          value: "implementer"
        - name: TASK_INPUT
          value: "{{ .TaskInputJSON }}"

Memory Store (StatefulSet with persistent volume):

yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: forge-memory
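spec:
  serviceName: "memory"
  replicas: 1
  selector:    # Required for apps/v1 StatefulSets; must match the pod labels
    matchLabels:
      app: forge-memory
  template:
    metadata:
      labels:
        app: forge-memory
    spec: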
      containers:
      - name: postgres
        image: postgres:16
        volumeMounts:
        - name: memory-data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: memory-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi

2.4 State Migration

SQLite → PostgreSQL:

sql
-- MVP schema (SQLite)
CREATE TABLE events (
  id TEXT PRIMARY KEY,
  trace_id TEXT NOT NULL,
  -- ...
);

-- Post-MVP schema (PostgreSQL)
CREATE TABLE events (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  trace_id UUID NOT NULL,
  -- Same columns, different types
);

Migration script:

typescript
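// Assumes Bun's built-in SQLite driver and the node-postgres (pg) client
import { Database } from 'bun:sqlite';
import { Client } from 'pg';

async function migrateSQLiteToPostgres() {
  const sqlite = new Database('.forge/memory.db');
  const postgres = new Client({ connectionString: process.env.DATABASE_URL });
  await postgres.connect(); // pg clients must connect before querying
  try {
    const events = sqlite.prepare('SELECT * FROM events').all();
    await postgres.query('BEGIN'); // one transaction: all rows or none
    for (const event of events) {
      await postgres.query(
        'INSERT INTO events (...) VALUES (...)',
        mapSQLiteToPostgres(event)
      );
    }
    await postgres.query('COMMIT');
  } catch (err) {
    await postgres.query('ROLLBACK');
    throw err;
  } finally {
    await postgres.end();
    sqlite.close();
  }
}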

2.5 Event Bus Migration

In-memory → Redis Pub/Sub:

typescript
// MVP: bus.ts
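class InMemoryEventBus implements EventBus {
  private handlers = new Map<string, Set<EventHandler>>();

  on(type: string, handler: EventHandler) {
    if (!this.handlers.has(type)) this.handlers.set(type, new Set());
    this.handlers.get(type)!.add(handler);
  }

  async emit(event: ForgeEvent) {
    const handlers = this.handlers.get(event.type) ?? new Set<EventHandler>();
    handlers.forEach(h => h(event));
  }
}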

// Post-MVP: redis-bus.ts (implements same interface!)
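class RedisEventBus implements EventBus {
  // A Redis connection in subscriber mode cannot publish,
  // so publishing and subscribing use separate connections.
  constructor(private pub: Redis, private sub: Redis) {}

  async emit(event: ForgeEvent) {
    // Publish to the per-type channel that on() subscribes to
    await this.pub.publish(`forge:events:${event.type}`, JSON.stringify(event));
  }

  on(type: string, handler: EventHandler) {
    const channel = `forge:events:${type}`;
    void this.sub.subscribe(channel);
    this.sub.on('message', (ch, message) => {
      if (ch !== channel) return; // 'message' fires for every subscribed channel
      handler(JSON.parse(message));
    });
  }
}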

Interface doesn't change — swap implementation via config.
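One way to swap the implementation via config is a small registry keyed by a config value. The sketch below uses simplified stand-ins for `ForgeEvent`, `EventHandler`, and `EventBus`; `BusKind`, `busFactories`, and `createEventBus` are hypothetical names.

```typescript
// Simplified stand-ins for the real types (assumptions for this sketch)
interface ForgeEvent { type: string; payload?: unknown }
type EventHandler = (event: ForgeEvent) => void;

interface EventBus {
  emit(event: ForgeEvent): Promise<void>;
  on(type: string, handler: EventHandler): void;
}

class InMemoryEventBus implements EventBus {
  private handlers = new Map<string, Set<EventHandler>>();

  async emit(event: ForgeEvent): Promise<void> {
    this.handlers.get(event.type)?.forEach(h => h(event));
  }

  on(type: string, handler: EventHandler): void {
    const set = this.handlers.get(type) ?? new Set<EventHandler>();
    set.add(handler);
    this.handlers.set(type, set);
  }
}

type BusKind = 'memory' | 'redis';

// Registry of implementations. A Redis-backed factory would be added at
// startup by the deployment that actually has a Redis client available.
const busFactories: Partial<Record<BusKind, () => EventBus>> = {
  memory: () => new InMemoryEventBus(),
};

function createEventBus(kind: BusKind): EventBus {
  const factory = busFactories[kind];
  if (!factory) {
    throw new Error(`No event bus registered for "${kind}"`);
  }
  return factory();
}
```

Calling code only ever sees `EventBus`, so moving from `createEventBus('memory')` to `createEventBus('redis')` is a configuration change rather than a code change.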

2.6 Estimated Effort

Engineering time: 4-6 weeks
Breaking changes: None if interfaces are clean
New infrastructure:

Kubernetes cluster setup
PostgreSQL instance
Redis instance
Container registry
Monitoring (Prometheus/Grafana)

2.7 Trigger to Build This

Indicators:

Users running Forge on multiple machines need shared memory
Single-process execution can't scale to team usage
Need deployment isolation per user/team

Milestone: After 1000+ successful runs in single-process mode with no data loss.

3. Vector Database for Memory

Current: Brute-force cosine similarity in SQLite
Future: pgvector / Qdrant / ChromaDB for fast similarity search

3.1 MVP Requirements to Enable This

MUST abstract similarity search behind interface:

typescript
// memory/store.ts (MVP)
interface MemoryStore {
  store(memory: Memory): Promise<void>;

  // This interface stays the same forever
  recall(query: RecallQuery): Promise<Memory[]>;
}

interface RecallQuery {
  context: string;        // Will be embedded
  type?: MemoryType;
  limit?: number;
  minConfidence?: number;
}

MVP implementation (brute force):

typescript
class SQLiteMemoryStore implements MemoryStore {
  async recall(query: RecallQuery): Promise<Memory[]> {
    // Embed query
    const queryEmbedding = await this.llm.embed(query.context);

    // Fetch all memories (brute force!)
    const allMemories = await this.db.select().from(memories);

    // Calculate similarity in-memory
    const withScores = allMemories.map(m => ({
      memory: m,
      score: cosineSimilarity(queryEmbedding, m.embedding)
    }));

    // Sort and filter
    return withScores
      .filter(m => m.score > (query.minConfidence || 0))
      .sort((a, b) => b.score - a.score)
      .slice(0, query.limit || 10)
      .map(m => m.memory);
  }
}

This is O(N) — acceptable for <100K memories.
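The brute-force store relies on a `cosineSimilarity` helper that is not shown. A minimal sketch of one follows, assuming embeddings are equal-length numeric arrays; returning 0 for zero-magnitude vectors is a choice made here, not something the plan specifies.

```typescript
// Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|).
// Returns 0 when either vector has zero magnitude, to avoid dividing by zero.
function cosineSimilarity(a: ArrayLike<number>, b: ArrayLike<number>): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

console.log(cosineSimilarity([1, 0], [0, 1])); // orthogonal vectors: 0
```

Each call is linear in the embedding dimension, so one recall costs roughly N times the dimension in multiply-adds. That cost is what a vector index later avoids.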

3.2 Post-MVP Implementation (vector DB)

typescript
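class PgVectorMemoryStore implements MemoryStore {
  async recall(query: RecallQuery): Promise<Memory[]> {
    const queryEmbedding = await this.llm.embed(query.context);
    // pgvector accepts a '[x,y,...]' text literal cast to vector
    const vec = JSON.stringify(Array.from(queryEmbedding));

    // pgvector does the heavy lifting. <=> is cosine distance, so
    // similarity = 1 - distance. A SELECT alias cannot be referenced in
    // WHERE, so the expression is repeated; ordering by the raw distance
    // lets an HNSW index serve the query.
    const results = await this.db.execute(sql`
      SELECT *, 1 - (embedding <=> ${vec}::vector) AS similarity
      FROM memories
      WHERE 1 - (embedding <=> ${vec}::vector) > ${query.minConfidence ?? 0}
      ORDER BY embedding <=> ${vec}::vector
      LIMIT ${query.limit ?? 10}
    `);

    return results.map(r => this.mapToMemory(r));
  }
}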

With an HNSW index, lookup is approximately O(log N) and returns approximate (not exact) nearest neighbors, which scales to millions of memories.

3.3 MVP Must NOT Do

Forbidden:

typescript
// ❌ BAD: Leaking implementation details
interface RecallQuery {
  embedding: Float32Array; // Forces caller to know about embeddings
}

// ❌ BAD: Vector-DB-specific query syntax
interface RecallQuery {
  hnswQuery: HNSWQuery; // Locks us into HNSW
}

Safe:

typescript
// ✅ GOOD: Abstract, high-level query
interface RecallQuery {
  context: string;        // Store handles embedding internally
  filters?: RecallFilters; // Generic filters, not DB-specific
}

3.4 Migration Path

Step 1: Add vector DB as secondary store (dual-write)

typescript
class DualWriteMemoryStore implements MemoryStore {
  constructor(
    private sqlite: SQLiteMemoryStore,
    private vector: PgVectorMemoryStore
  ) {}

  async store(memory: Memory) {
    await Promise.all([
      this.sqlite.store(memory),
      this.vector.store(memory)  // Write to both
    ]);
  }

  async recall(query: RecallQuery) {
    return this.vector.recall(query); // Read from vector DB
  }
}

Step 2: Backfill historical data

typescript
async function backfillToVectorDB() {
  const allMemories = await sqlite.getAll();
  for (const memory of allMemories) {
    await vectorDB.store(memory);
  }
}

Step 3: Remove SQLite (single-write)

3.5 Performance Threshold

Brute-force is acceptable until:

Memory count > 100,000
Recall latency > 500ms (p95)
Memory usage > 2GB for embeddings

When to migrate:

If recall queries take >1 second
If memory table exceeds 100K rows
If adding an embedding index to SQLite doesn't help
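The memory threshold can be checked with simple arithmetic: count times dimensions times bytes per value. The sketch below assumes float32 storage and a 1536-dimension embedding, which is a hypothetical figure that depends on the embedding model in use.

```typescript
// Approximate bytes needed to hold every embedding in memory.
function embeddingBytes(count: number, dimensions: number, bytesPerValue = 4): number {
  return count * dimensions * bytesPerValue;
}

const ASSUMED_DIMENSIONS = 1536; // assumption: varies by embedding model
const totalBytes = embeddingBytes(100000, ASSUMED_DIMENSIONS);

console.log(`${(totalBytes / 1e9).toFixed(2)} GB`); // prints "0.61 GB"
```

Under these assumptions, 100,000 memories need about 0.61 GB, and a 2 GB budget would hold roughly 325,000 embeddings. The row-count threshold would therefore be reached well before the memory threshold.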

3.6 Estimated Effort

Engineering time: 1-2 weeks
Breaking changes: None (swap implementation)
New infrastructure:

pgvector extension on PostgreSQL, OR
Qdrant instance, OR
ChromaDB instance

3.7 Trigger to Build This

Indicators:

Memory table exceeds 50K rows
Recall queries taking >500ms
Users complaining about slow memory retrieval

Milestone: When brute-force becomes measurably slow (>1s p95).

4. Multi-Repo Intelligence

Current: Memory scoped to single repository
Future: Learn patterns across all codebases a user works on

4.1 MVP Requirements to Enable This

MUST namespace memories by repo:

typescript
// memory/schema.ts
export const memories = sqliteTable('memories', {
  id: text('id').primaryKey(),

  // MVP: Add this field even though we only use one repo
  repoId: text('repo_id').notNull().default('current'),

  type: text('type').notNull(),
  content: text('content').notNull(),
  // ...
});

// Composite index for efficient filtering
// CREATE INDEX idx_repo_type ON memories(repo_id, type);

MUST support repo-scoped and global queries:

typescript
interface RecallQuery {
  context: string;

  // MVP: Always set to 'current', but interface supports multi-repo
  scope?: 'current' | 'global' | `repo:${string}`; // e.g., 'repo:my-service'

  limit?: number;
}

class MemoryStore {
  async recall(query: RecallQuery): Promise<Memory[]> {
    const scope = query.scope || 'current';

    if (scope === 'global') {
      // Search across all repos
      return this.searchAllRepos(query);
    } else if (scope.startsWith('repo:')) {
      // Search specific repo
      const repoId = scope.slice('repo:'.length); // keeps ids that themselves contain ':'
      return this.searchRepo(repoId, query);
    } else {
      // Search current repo (MVP default)
      return this.searchRepo(this.currentRepoId, query);
    }
  }
}

4.2 Privacy Considerations

Pattern extraction vs code storage:

typescript
interface Memory {
  content: string; // High-level pattern, NOT actual code
  context: string; // When this applies

  // ❌ NEVER store actual code from other repos
  codeSnippet?: never;

  // ✅ GOOD: Store abstract patterns
  // Example: "When implementing auth, use JWT middleware pattern"
  // Example: "Timestamp columns should be TIMESTAMPTZ in Postgres"
}

Anonymization:

typescript
async function extractPattern(episode: Episode): Promise<Memory> {
  const pattern = await llm.chat({
    system: `Extract a GENERAL pattern from this execution.
      DO NOT include:
      - Specific variable names from this codebase
      - Business logic details
      - API keys or secrets

      DO include:
      - Architectural patterns
      - Language/framework best practices
      - Common pitfalls and solutions`,
    messages: [{ role: 'user', content: episode.events }]
  });

  return {
    type: 'semantic',
    content: pattern.content,
    context: pattern.applicableWhen,
    repoId: 'global', // This pattern is universal
  };
}

4.3 Portability Classification

Which patterns are portable?

typescript
// Example patterns at each portability level
const patternPortabilityExamples = {
  universal: [
    'error-handling-strategies',
    'testing-patterns',
    'async-await-best-practices',
  ],

  framework_specific: [
    'react-component-patterns',
    'nextjs-routing-patterns',
    'drizzle-migration-patterns',
  ],

  project_specific: [
    'this-api-authentication-flow',
    'this-database-schema-decisions',
    'this-deployment-process',
  ],
} as const;

async function classifyPattern(memory: Memory): Promise<PortabilityLevel> {
  const classification = await llm.chat({
    system: 'Classify this pattern as universal, framework-specific, or project-specific',
    messages: [{ role: 'user', content: memory.content }]
  });

  return classification.level;
}

4.4 MVP Must NOT Do

Forbidden:

typescript
// ❌ BAD: Hardcode single-repo assumption
const allMemories = await db.select().from(memories); // No repo filter

// ❌ BAD: Store repo-specific code as universal pattern
await memory.store({
  type: 'semantic',
  content: 'Use the UserService class from src/services/user.ts', // Too specific
  repoId: 'global' // Wrong!
});

Safe:

typescript
// ✅ GOOD: Always filter by repo
const repoMemories = await db.select()
  .from(memories)
  .where(eq(memories.repoId, currentRepoId));

// ✅ GOOD: Abstract patterns only
await memory.store({
  type: 'semantic',
  content: 'For user management, use a service layer to encapsulate business logic',
  repoId: 'global' // This is actually universal
});

4.5 Multi-Repo Memory Recall Strategy

typescript
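async function recallWithFallback(query: RecallQuery): Promise<Memory[]> {
  const limit = query.limit ?? 5;

  // 1. Try current repo first (most relevant)
  const repoMemories = await recall({ ...query, scope: 'current', limit });
  if (repoMemories.length >= limit) return repoMemories;

  // 2. Not enough: fall back to global, skipping memories already found
  //    (a search across all repos can return current-repo memories again)
  const globalMemories = await recall({ ...query, scope: 'global', limit });
  const seen = new Set(repoMemories.map(m => m.id));
  const extra = globalMemories.filter(m => !seen.has(m.id));
  return [...repoMemories, ...extra].slice(0, limit);
}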

4.6 Estimated Effort

Engineering time: 2-3 weeks
Breaking changes: None if repoId field added in MVP
Components affected:

Memory schema (add repoId column)
Recall queries (add repo filtering)
Pattern extraction (classify portability)
CLI (detect repo context)

4.7 Trigger to Build This

Indicators:

User works on >3 repos with similar tech stacks
Patterns from one repo would benefit another
User requests "use the same pattern as in repo X"

Milestone: After 100+ learnings in a single repo, indicating pattern extraction works well.

5. Autonomous Deployment

Current: Always require human approval for production deploys
Future: Auto-deploy low-risk changes, human approval for high-risk

5.1 Automation Ladder State

MVP:

typescript
interface AutomationConfig {
  level: 0 | 1 | 2 | 3 | 4;

  // Level 0: Human does everything (not our target)
  // Level 1: AI suggests, human decides (MVP)
  // Level 2: AI acts, human reviews (post-MVP)
  // Level 3: AI acts, human notified (post-MVP)
  // Level 4: Full autonomy for low-risk (post-MVP)
}

// MVP: Always Level 1
const config = {
  automation: {
    level: 1,
    allowedActions: ['suggest', 'analyze'],
    requiredApprovals: ['deploy', 'merge', 'security-changes'],
  }
};

5.2 Earning Higher Automation Levels

Metrics required to advance:

typescript
// Thresholds are values, so they live in a const rather than an interface.
// Run counts and success rates are minimums; the other rates and the
// incident counts are maximums.
const automationRequirements = {
  // To reach Level 2 (from 1):
  level2: {
    totalRuns: 50,                    // Minimum experience
    successRate: 0.95,                // 95% of runs succeed
    falsePositiveRate: 0.20,          // <20% review findings dismissed
    humanOverrideRate: 0.05,          // <5% of suggestions rejected
  },

  // To reach Level 3 (from 2):
  level3: {
    totalRuns: 200,
    successRate: 0.98,
    falsePositiveRate: 0.05,          // <5% false positives
    missedCriticalBugs: 0,            // Zero critical bugs missed
    averageHumanEditSize: 0.10,       // Edits are <10% of code
  },

  // To reach Level 4 (from 3):
  level4: {
    totalRuns: 500,
    successRate: 0.99,
    productionIncidents: 0,           // Zero incidents from auto-deploys
    rollbackRate: 0.01,               // <1% of deploys rolled back
    humanInterventionRate: 0.02,      // <2% require human help
  },
} as const;

async function evaluateAutomationLevel(): Promise<AutomationLevel> {
  const metrics = await getSystemMetrics();

  // Check the highest level first and fall through to lower ones
  if (meetsRequirements(metrics, automationRequirements.level4)) {
    return 4;
  } else if (meetsRequirements(metrics, automationRequirements.level3)) {
    return 3;
  } else if (meetsRequirements(metrics, automationRequirements.level2)) {
    return 2;
  } else {
    return 1; // Stay conservative
  }
}
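The requirement tables mix two kinds of threshold: some values must be at least the target (run counts, success rate) and others at most the target (error-style rates, incident counts). A sketch of a `meetsRequirements` check that makes the direction explicit is shown below. The `Bound` encoding, the inclusive upper limits, and the fail-on-missing rule are assumptions.

```typescript
type Bound = { min: number } | { max: number };
type Requirements = Record<string, Bound>;
type Metrics = Record<string, number | undefined>;

// Level 2 thresholds from this plan, with direction made explicit.
// Treating the "<" limits as inclusive maxima is an assumption; it is
// what lets a "zero incidents" requirement be satisfiable as { max: 0 }.
const level2Bounds: Requirements = {
  totalRuns: { min: 50 },
  successRate: { min: 0.95 },
  falsePositiveRate: { max: 0.20 },
  humanOverrideRate: { max: 0.05 },
};

// A missing metric fails the check, so absent data never raises the level.
function meetsRequirements(metrics: Metrics, reqs: Requirements): boolean {
  return Object.keys(reqs).every(name => {
    const bound = reqs[name];
    const value = metrics[name];
    if (value === undefined) return false;
    return 'min' in bound ? value >= bound.min : value <= bound.max;
  });
}
```

Failing closed on missing data matches the plan's conservative stance: the level only rises when every metric is both present and within bounds.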

5.3 Risk-Based Deployment Strategy

typescript
interface DeploymentDecision {
  risk: RiskLevel;
  automation: AutomationLevel;

  // Decision matrix
  strategy: DeploymentStrategy;
}

function selectDeploymentStrategy(
  risk: RiskLevel,
  automation: AutomationLevel
): DeploymentStrategy {
  // At Level 1 (MVP): Everything requires approval
  if (automation === 1) {
    return { type: 'manual', requireApproval: true };
  }

  // At Level 2: Low-risk auto-deploys to staging
  if (automation === 2) {
    if (risk === 'low') {
      return { type: 'auto-staging', requireApproval: false };
    } else {
      return { type: 'manual', requireApproval: true };
    }
  }

  // At Level 3: Auto-deploy low/medium with notification
  if (automation === 3) {
    if (risk === 'low' || risk === 'medium') {
      return { type: 'auto-notify', requireApproval: false, notifyHuman: true };
    } else {
      return { type: 'manual', requireApproval: true };
    }
  }

  // At Level 4: Full autonomy for low-risk, gated for high-risk
  if (automation === 4) {
    if (risk === 'low') {
      return { type: 'fully-auto', requireApproval: false };
    } else if (risk === 'medium') {
      return { type: 'auto-notify', requireApproval: false, notifyHuman: true };
    } else {
      return { type: 'manual', requireApproval: true };
    }
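  }

  // Level 0 or any unrecognized level: fail safe to manual approval
  return { type: 'manual', requireApproval: true };
}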

5.4 Canary Automation

Post-MVP: Auto-promote or auto-rollback based on health:

typescript
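// A class rather than an interface, because interfaces cannot contain method bodies.
// deployToStage, wait, checkHealth, and rollback are omitted for brevity.
class CanaryDeployment {
  constructor(
    private stages: CanaryStage[],
    private healthChecks: HealthCheck[]
  ) {}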

  async deploy(artifact: Artifact): Promise<DeploymentResult> {
    for (const stage of this.stages) {
      // Deploy to stage (e.g., 5% of traffic)
      await this.deployToStage(stage, artifact);

      // Wait for bake time
      await this.wait(stage.bakeTime);

      // Check health
      const health = await this.checkHealth(stage);

      if (!health.healthy) {
        // Auto-rollback
        await this.rollback(stage);
        return { status: 'rolled-back', reason: health.issues };
      }

      // Auto-promote to next stage
    }

    return { status: 'deployed' };
  }
}

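class HealthCheck {
  constructor(
    readonly metric: 'error_rate' | 'latency' | 'throughput',
    readonly baseline: number,
    readonly threshold: number, // allowed ratio vs. baseline, e.g., 1.2
    private measure: (deployment: Deployment) => Promise<number>
  ) {}

  async check(deployment: Deployment): Promise<boolean> {
    const current = await this.measure(deployment);
    // Throughput is higher-is-better, so it must stay above baseline / threshold;
    // error rate and latency must stay below baseline * threshold.
    return this.metric === 'throughput'
      ? current > this.baseline / this.threshold
      : current < this.baseline * this.threshold;
  }
}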

5.5 MVP Must NOT Do

Forbidden:

typescript
// ❌ BAD: Hardcode approval requirements
async function deploy(artifact: Artifact) {
  // This won't work when we add automation levels
  const approved = await requestHumanApproval();
  if (!approved) throw new Error('Not approved');
  await this.executeDeploy(artifact);
}

Safe:

typescript
// ✅ GOOD: Strategy-based deployment
async function deploy(artifact: Artifact, risk: RiskLevel) {
  const strategy = selectDeploymentStrategy(risk, config.automation.level);

  if (strategy.requireApproval) {
    const approved = await requestHumanApproval();
    if (!approved) throw new Error('Not approved');
  }

  if (strategy.notifyHuman) {
    this.notifyHuman('Deploying', artifact);
  }

  await this.executeDeploy(artifact, strategy);
}

5.6 Storing Automation State

typescript
// Add to schema.ts
export const automationState = sqliteTable('automation_state', {
  id: text('id').primaryKey(),
  currentLevel: integer('current_level').notNull().default(1),

  // Track metrics for level advancement
  totalRuns: integer('total_runs').notNull().default(0),
  successfulRuns: integer('successful_runs').notNull().default(0),
  falsePositives: integer('false_positives').notNull().default(0),
  criticalMisses: integer('critical_misses').notNull().default(0),

  // Last level evaluation
  lastEvaluated: integer('last_evaluated', { mode: 'timestamp_ms' }),
  levelAdvancedAt: integer('level_advanced_at', { mode: 'timestamp_ms' }),
});

5.7 Estimated Effort

Engineering time: 3-4 weeks
Breaking changes: None
Components affected:

Deployment strategy selector (new)
Automation metrics tracker (new)
Canary health checks (new)
Human notification system (extend)

5.8 Trigger to Build This

Indicators:

System achieves Level 2 requirements (50+ runs, 95% success rate)
Users request faster deploys for low-risk changes
Manual approval becomes a bottleneck

Milestone: Never auto-enable. Require explicit opt-in even after metrics are met.

6. Natural Language Requirements

Current: Structured task descriptions (e.g., "Add user authentication with JWT")
Future: Plain English feature requests (e.g., "Users should be able to log in")

6.1 MVP Planner Agent Design

MVP: Assume well-formed requirements:

typescript
interface PlannerInput {
  task: string; // Specific, actionable: "Add JWT auth middleware"
  constraints?: string[];
  acceptanceCriteria?: string[];
}

async function plan(input: PlannerInput): Promise<ImplementationPlan> {
  // Assume task is clear and unambiguous
  const architecture = await this.designArchitecture(input.task);
  const tasks = await this.decompose(input.task, architecture);
  return { architecture, tasks };
}

6.2 Post-MVP: Ambiguity Handling

Clarification loop:

typescript
interface PlannerInput {
  task: string; // Vague: "Users should be able to log in"
  constraints?: string[];
  acceptanceCriteria?: string[];
}

async function planWithClarification(input: PlannerInput): Promise<ImplementationPlan> {
  // 1. Analyze for ambiguity
  const ambiguities = await this.detectAmbiguities(input.task);
  if (ambiguities.length === 0) return this.plan(input);

  // 2. Ask clarifying questions
  const questions = ambiguities.map(a => a.question);
  const answers = await this.askHuman(questions);

  // 3. Refine requirements into a new input (the caller's object is not mutated)
  const task = await this.refineRequirements(input.task, answers);

  // 4. Proceed with planning
  return this.plan({ ...input, task });
}

interface Ambiguity {
  aspect: string; // e.g., "authentication method"
  question: string; // "Should we use JWT, OAuth, or session cookies?"
  options: string[];
}

async function detectAmbiguities(task: string): Promise<Ambiguity[]> {
  const analysis = await llm.chat({
    system: `Analyze this task for ambiguities.
      Identify aspects that have multiple valid interpretations.
      Generate clarifying questions.`,
    messages: [{ role: 'user', content: task }]
  });

  return analysis.ambiguities;
}

6.3 Figma/Design Integration

Post-MVP: Visual requirements:

typescript
interface VisualRequirement {
  type: 'figma' | 'screenshot' | 'mockup';
  url?: string;
  image?: Buffer;
}

interface PlannerInput {
  task: string;
  visual?: VisualRequirement;
}

async function planWithVisual(input: PlannerInput): Promise<ImplementationPlan> {
  if (!input.visual) return this.plan(input);

  // Extract requirements from design
  const extracted = await this.extractFromVisual(input.visual);

  // Merge with text requirements into a new input (the caller's object is not mutated)
  const task = this.mergeRequirements(input.task, extracted);
  return this.plan({ ...input, task });
}

async function extractFromVisual(visual: VisualRequirement): Promise<ExtractedRequirements> {
  // Use vision model (GPT-4V, Claude with vision, etc.)
  const analysis = await visionModel.analyze(visual.image, {
    prompt: `Extract UI requirements from this design:
      - Layout structure
      - Components needed
      - Interactions (buttons, forms, etc.)
      - Styling details`
  });

  return {
    components: analysis.components,
    layout: analysis.layout,
    interactions: analysis.interactions,
  };
}

6.4 MVP Must NOT Do

Forbidden:

typescript
// ❌ BAD: Assume requirements are always clear
async function plan(task: string) {
  // No validation — will produce garbage for vague input
  return this.decompose(task);
}

// ❌ BAD: Hardcode structured input format
interface PlannerInput {
  title: string;
  description: string;
  acceptanceCriteria: string[]; // Forces structured input
}

Safe:

typescript
// ✅ GOOD: Accept string, validate quality
// MAX_CLARIFICATION_ROUNDS is an illustrative cap that prevents endless re-asking
async function plan(input: string | PlannerInput, round = 0) {
  const task = typeof input === 'string' ? input : input.task;

  // Validate requirement quality
  const quality = await this.assessRequirementQuality(task);

  if (quality.score < 0.7 && round < MAX_CLARIFICATION_ROUNDS) {
    // Request clarification, then re-assess the clarified task
    const clarified = await this.requestClarification(task, quality.gaps);
    return this.plan(clarified, round + 1);
  }

  return this.executePlanning(task);
}

6.5 Estimated Effort

Engineering time: 3-4 weeks
Breaking changes: None (input type can be string or structured)
Components affected:

Planner agent (add clarification loop)
Human interaction (add Q&A workflow)
Vision model integration (new)

6.6 Trigger to Build This

Indicators:

Users frequently provide vague requirements
Planner produces poor plans due to ambiguity
Users request Figma integration

Milestone: After 200+ successful plans from well-formed requirements.

7. Real-Time Dashboards

Current: CLI output with progress indicators
Future: Web UI with live updates, cost tracking, memory browser

7.1 MVP CLI Requirements

MUST provide rich event stream:

typescript
// The events table already captures everything
// CLI just needs to subscribe and render

class CLIRenderer {
  async watchPipeline(traceId: string) {
    bus.on('*', (event: ForgeEvent) => {
      if (event.traceId === traceId) {
        this.render(event);
      }
    });
  }

  private render(event: ForgeEvent) {
    switch (event.type) {
      case 'phase.entered':
        console.log(`\n▶ Entering phase: ${event.payload.phase}`);
        break;
      case 'agent.iteration':
        process.stdout.write('.');
        break;
      case 'finding.detected':
        console.log(`  ⚠ ${event.payload.message}`);
        break;
      // ...
    }
  }
}

Event stream is the source of truth — CLI and dashboard consume the same data.
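
To illustrate that point, the sketch below attaches two independent consumers to the same stream: one formats lines for a terminal, the other folds events into state a dashboard could display. `TinyBus`, `formatForCli`, and `DashboardState` are hypothetical names, and `StreamEvent` is a reduced stand-in for `ForgeEvent`.

```typescript
// Reduced stand-in for ForgeEvent; the real type carries more fields.
interface StreamEvent {
  traceId: string;
  type: string;
  payload: Record<string, unknown>;
}

type Handler = (event: StreamEvent) => void;

// Minimal synchronous pub/sub with a '*' wildcard, for illustration only.
class TinyBus {
  private handlers = new Map<string, Set<Handler>>();

  on(type: string, handler: Handler): () => void {
    const set = this.handlers.get(type) ?? new Set<Handler>();
    set.add(handler);
    this.handlers.set(type, set);
    return () => { set.delete(handler); };
  }

  emit(event: StreamEvent): void {
    // Deliver to handlers for the exact type, then to wildcard handlers.
    for (const key of [event.type, '*']) {
      this.handlers.get(key)?.forEach((h) => h(event));
    }
  }
}

// Consumer 1: CLI-style text rendering.
function formatForCli(event: StreamEvent): string {
  return event.type === 'phase.entered'
    ? `▶ Entering phase: ${String(event.payload.phase)}`
    : `  ${event.type}`;
}

// Consumer 2: dashboard-style state derived from the same events.
interface DashboardState {
  currentPhase: string | null;
  eventCount: number;
}

function reduceDashboard(state: DashboardState, event: StreamEvent): DashboardState {
  return {
    currentPhase:
      event.type === 'phase.entered' ? String(event.payload.phase) : state.currentPhase,
    eventCount: state.eventCount + 1,
  };
}

const demoBus = new TinyBus();
const cliLines: string[] = [];
let dashboard: DashboardState = { currentPhase: null, eventCount: 0 };

demoBus.on('*', (e) => cliLines.push(formatForCli(e)));
demoBus.on('*', (e) => { dashboard = reduceDashboard(dashboard, e); });

demoBus.emit({ traceId: 't1', type: 'phase.entered', payload: { phase: 'planning' } });
demoBus.emit({ traceId: 't1', type: 'agent.iteration', payload: {} });
```

Neither consumer knows the other exists, which is the property that lets a dashboard be added later without touching the CLI.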

7.2 Post-MVP Web Dashboard Architecture

Tech stack (matches user's preference):

typescript
// Use TanStack Start since it's in the user's stack
forge-dashboard/
├── app/
│   ├── routes/
│   │   ├── index.tsx          // Dashboard home
│   │   ├── runs.$runId.tsx    // Run detail page
│   │   ├── memory.tsx         // Memory browser
│   │   └── metrics.tsx        // Cost & quality trends
│   ├── components/
│   │   ├── pipeline-viz.tsx   // Pipeline state visualization
│   │   ├── event-stream.tsx   // Live event feed
│   │   └── cost-chart.tsx     // Cost over time
│   └── lib/
│       └── api.ts             // API client for Forge backend

WebSocket streaming:

typescript
// Backend: Expose WebSocket endpoint
import { WebSocketServer } from 'ws';

const wss = new WebSocketServer({ port: 3001 });

wss.on('connection', (ws) => {
  // Subscribe client to event bus
  const unsubscribe = bus.on('*', (event) => {
    ws.send(JSON.stringify(event));
  });

  ws.on('close', () => {
    unsubscribe();
  });
});

// Frontend: Subscribe to events
import { useEffect, useState } from 'react';

function useEventStream(traceId: string) {
  const [events, setEvents] = useState<ForgeEvent[]>([]);

  useEffect(() => {
    const ws = new WebSocket('ws://localhost:3001');

    ws.onmessage = (msg) => {
      const event = JSON.parse(msg.data);
      if (event.traceId === traceId) {
        setEvents(prev => [...prev, event]);
      }
    };

    return () => ws.close();
  }, [traceId]);

  return events;
}

7.3 Dashboard Components

Pipeline Status:

typescript
function PipelineVisualization({ traceId }: { traceId: string }) {
  const events = useEventStream(traceId);
  const status = usePipelineStatus(events);

  return (
    <div className="pipeline">
      {['planning', 'implementation', 'review', 'testing', 'deployment'].map(phase => (
        <Phase
          key={phase}
          name={phase}
          status={status[phase]}
          events={events.filter(e => e.phase === phase)}
        />
      ))}
    </div>
  );
}
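
`usePipelineStatus` is not defined in this plan. One way it could work is as a thin hook around a pure function that folds the event list into a per-phase status map. The sketch below shows only that pure function; the `phase.completed` and `phase.failed` event types and the `PhaseStatus` values are assumptions (only `phase.entered` appears in the CLI renderer).

```typescript
type PhaseName = 'planning' | 'implementation' | 'review' | 'testing' | 'deployment';
type PhaseStatus = 'pending' | 'running' | 'done' | 'failed';

// Reduced stand-in for ForgeEvent.
interface PhaseEvent {
  type: string; // 'phase.entered' is from the plan; the other two types are assumed
  phase: PhaseName;
}

function derivePipelineStatus(events: PhaseEvent[]): Record<PhaseName, PhaseStatus> {
  // Start with every phase pending, then replay events in order.
  const status: Record<PhaseName, PhaseStatus> = {
    planning: 'pending',
    implementation: 'pending',
    review: 'pending',
    testing: 'pending',
    deployment: 'pending',
  };

  for (const event of events) {
    if (event.type === 'phase.entered') status[event.phase] = 'running';
    else if (event.type === 'phase.completed') status[event.phase] = 'done';
    else if (event.type === 'phase.failed') status[event.phase] = 'failed';
  }

  return status;
}
```

Because the function is pure, a hook could wrap it in `useMemo` keyed on the event list, and the same function could be reused by the CLI.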

Cost Tracking:

typescript
function CostDashboard() {
  const { data: runs } = useQuery({
    queryKey: ['runs'],
    queryFn: () => fetch('/api/runs').then(r => r.json()),
  });

  const totalCost = runs?.reduce((sum, r) => sum + r.totalCostUsd, 0) || 0;
  const avgCostPerRun = totalCost / (runs?.length || 1);

  return (
    <div>
      <Metric label="Total Cost" value={`$${totalCost.toFixed(2)}`} />
      <Metric label="Avg per Run" value={`$${avgCostPerRun.toFixed(2)}`} />
      <CostChart data={runs} />
    </div>
  );
}

Memory Browser:

typescript
function MemoryBrowser() {
  const [query, setQuery] = useState('');
  const { data: memories } = useQuery({
    queryKey: ['memories', query],
    // Encode the query so spaces, '&', and '#' do not corrupt the URL
    queryFn: () => fetch(`/api/memory/search?q=${encodeURIComponent(query)}`).then(r => r.json()),
    enabled: query.length > 0,
  });

  return (
    <div>
      <SearchInput value={query} onChange={setQuery} />
      <MemoryList memories={memories} />
    </div>
  );
}

7.4 MVP Must NOT Do

Forbidden:

typescript
// ❌ BAD: Render output directly in agent code
class PlannerAgent {
  async execute(input) {
    console.log('Starting planning...'); // Couples agent to CLI
  }
}

// ❌ BAD: Store UI state in agent
class PlannerAgent {
  private progressBar: ProgressBar; // UI concerns in agent
}

Safe:

typescript
// ✅ GOOD: Emit events, let consumers render
class PlannerAgent {
  async execute(input, ctx) {
    // The bus comes from AgentContext; emit() returns a Promise, so await it
    await ctx.bus.emit({ type: 'planning.started', payload: { task: input.task } });
    // Agent doesn't know or care who's listening
  }
}

7.5 Estimated Effort

Engineering time: 2-3 weeks
Breaking changes: None (agents already emit events)
New components:

WebSocket server
TanStack Start dashboard app
Dashboard components

7.6 Trigger to Build This

Indicators:

Users want to monitor long-running pipelines remotely
Teams want shared visibility into agent activity
CLI output is insufficient for debugging

Milestone: After 100+ CLI users, or when first team adopts Forge.

8. ClickHouse / Kafka Migration

Current: SQLite events table, in-memory event bus
Future: ClickHouse for analytics, Kafka for event streaming

8.1 Performance Thresholds

SQLite is fine until:

Events table exceeds 1M rows
Event insertion latency > 50ms (p95)
Analytics queries take > 5 seconds

When to migrate:

If event table grows > 10GB
If multiple processes need to share events (distributed deployment)
If real-time analytics are needed (dashboards querying events table)
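
These thresholds are easy to check mechanically. A minimal sketch, assuming the metrics are already collected somewhere; the `EventStoreMetrics` shape and the `migrationSignals` function are hypothetical names, not part of the design, and 10GB is treated as 10 GiB.

```typescript
interface EventStoreMetrics {
  rowCount: number;
  insertLatencyP95Ms: number;
  slowestAnalyticsQueryMs: number;
  tableSizeBytes: number;
  processCount: number; // processes that need to share the event log
}

const GIB = 1024 ** 3;

// Returns a description of every threshold from this section that is exceeded.
// An empty array means SQLite is still within its comfortable range.
function migrationSignals(m: EventStoreMetrics): string[] {
  const signals: string[] = [];
  if (m.rowCount > 1_000_000) signals.push('events table exceeds 1M rows');
  if (m.insertLatencyP95Ms > 50) signals.push('p95 insert latency above 50ms');
  if (m.slowestAnalyticsQueryMs > 5_000) signals.push('analytics query slower than 5s');
  if (m.tableSizeBytes > 10 * GIB) signals.push('events table larger than 10GB');
  if (m.processCount > 1) signals.push('multiple processes share the event log');
  return signals;
}
```

Running a check like this on a schedule would turn the milestone "when SQLite bottlenecks are measurable" into something observable rather than anecdotal.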

8.2 MVP Event Bus Design

Already abstracted:

typescript
interface EventBus {
  emit(event: Omit<ForgeEvent, 'id' | 'timestamp'>): Promise<void>;
  on(type: string, handler: EventHandler): () => void;
  replay(traceId: string): Promise<ForgeEvent[]>;
}

This interface works for:

In-memory (MVP)
Redis Pub/Sub (K8s deployment)
Kafka (high-volume production)

8.3 Kafka Implementation

typescript
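import { Kafka, Producer, Consumer } from 'kafkajs';
import type { ClickHouseClient } from '@clickhouse/client';
import { ulid } from 'ulid';

class KafkaEventBus implements EventBus {
  private producer: Producer;
  private consumer: Consumer;
  // Event type -> handlers; one consumer loop fans out to all of them
  private handlers = new Map<string, Set<EventHandler>>();

  constructor(kafka: Kafka, private clickhouse: ClickHouseClient, groupId: string) {
    this.producer = kafka.producer();
    this.consumer = kafka.consumer({ groupId });
  }

  // Connect and start the single consumer loop. Call once before emit()/on().
  async start() {
    await this.producer.connect();
    await this.consumer.connect();
    await this.consumer.subscribe({ topic: 'forge-events' });
    await this.consumer.run({
      eachMessage: async ({ message }) => {
        if (!message.value) return; // tombstone or empty message
        const raw = JSON.parse(message.value.toString());
        // JSON turns Date into a string; restore it
        const event: ForgeEvent = { ...raw, timestamp: new Date(raw.timestamp) };
        for (const key of [event.type, '*']) {
          for (const handler of this.handlers.get(key) ?? []) handler(event);
        }
      },
    });
  }

  async emit(event: Omit<ForgeEvent, 'id' | 'timestamp'>) {
    const full: ForgeEvent = {
      ...event,
      id: ulid(),
      timestamp: new Date(),
    };

    // Keying by traceId keeps each trace's events ordered within one partition
    await this.producer.send({
      topic: 'forge-events',
      messages: [{ key: full.traceId, value: JSON.stringify(full) }],
    });
  }

  on(type: string, handler: EventHandler) {
    const set = this.handlers.get(type) ?? new Set<EventHandler>();
    set.add(handler);
    this.handlers.set(type, set);

    // Unsubscribe removes only this handler; the shared consumer keeps running
    return () => { set.delete(handler); };
  }

  async replay(traceId: string): Promise<ForgeEvent[]> {
    // Query ClickHouse instead of SQLite (named, typed query parameters)
    const result = await this.clickhouse.query({
      query: 'SELECT * FROM events WHERE trace_id = {traceId:String} ORDER BY timestamp',
      query_params: { traceId },
      format: 'JSONEachRow',
    });

    // Rows come back with snake_case column names; map them to ForgeEvent as needed
    return result.json();
  }
}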

8.4 ClickHouse Schema

sql
-- ClickHouse (optimized for analytics)
CREATE TABLE events (
  id String,
  trace_id String,
  timestamp DateTime64(3),
  source String,
  type String,
  phase String,
  payload String,  -- JSON
  tokens_used UInt32,
  cost_usd Decimal(10, 4),
  duration_ms UInt32,

  -- ClickHouse-specific optimizations
  date Date MATERIALIZED toDate(timestamp)
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY (trace_id, timestamp);

-- Fast queries:
SELECT type, count(*) FROM events
WHERE date >= today() - 7
GROUP BY type;

SELECT avg(cost_usd), sum(tokens_used) FROM events
WHERE trace_id = '...' ;
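
The schema uses snake_case columns and stores `payload` as a JSON string, so events need a small mapping step before insertion. A sketch of that mapping, assuming `ForgeEvent` has roughly the camelCase fields below (the exact type lives elsewhere in the design) and that rows are inserted in a JSON row format with ClickHouse's default date-time parsing, which expects `YYYY-MM-DD hh:mm:ss.mmm` rather than ISO 8601 with `T` and `Z`. `toClickHouseRow` is a hypothetical helper.

```typescript
// Assumed subset of ForgeEvent; field names are inferred from the column list.
interface ForgeEventLike {
  id: string;
  traceId: string;
  timestamp: Date;
  source: string;
  type: string;
  phase?: string;
  payload: unknown;
  tokensUsed?: number;
  costUsd?: number;
  durationMs?: number;
}

interface ClickHouseEventRow {
  id: string;
  trace_id: string;
  timestamp: string;
  source: string;
  type: string;
  phase: string;
  payload: string;
  tokens_used: number;
  cost_usd: number;
  duration_ms: number;
}

function toClickHouseRow(event: ForgeEventLike): ClickHouseEventRow {
  return {
    id: event.id,
    trace_id: event.traceId,
    // '2026-02-07T12:00:00.123Z' -> '2026-02-07 12:00:00.123' (UTC)
    timestamp: event.timestamp.toISOString().replace('T', ' ').replace('Z', ''),
    source: event.source,
    type: event.type,
    // The columns are non-nullable, so missing values fall back to '' or 0
    phase: event.phase ?? '',
    payload: JSON.stringify(event.payload),
    tokens_used: event.tokensUsed ?? 0,
    cost_usd: event.costUsd ?? 0,
    duration_ms: event.durationMs ?? 0,
  };
}
```

The `date` column is `MATERIALIZED`, so it is computed by ClickHouse and deliberately absent from the row.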

8.5 Migration Path

Step 1: Dual-write

typescript
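class DualWriteEventBus implements EventBus {
  constructor(
    private sqlite: SQLiteEventBus,
    private kafka: KafkaEventBus
  ) {}

  async emit(event: Omit<ForgeEvent, 'id' | 'timestamp'>) {
    // During migration a Kafka outage is logged but does not fail the pipeline;
    // a SQLite failure still rejects.
    // Note: each bus assigns its own id/timestamp, so the two copies of an
    // event differ in those fields unless id generation is moved up here.
    await Promise.all([
      this.sqlite.emit(event),
      this.kafka.emit(event).catch((err) => {
        console.error('Kafka dual-write failed:', err);
      }),
    ]);
  }

  on(type: string, handler: EventHandler) {
    // Deliver from one bus only; subscribing to both would fire handlers twice
    return this.sqlite.on(type, handler);
  }

  // Gradually shift reads from SQLite → ClickHouse
  async replay(traceId: string) {
    try {
      return await this.kafka.replay(traceId); // Prefer new system
    } catch (err) {
      return await this.sqlite.replay(traceId); // Fallback to old
    }
  }
}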

Step 2: Backfill historical events

typescript
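async function backfillToClickHouse(batchSize = 10_000) {
  // Page through SQLite instead of loading the whole table, and insert in
  // batches: ClickHouse favors few large inserts over many single-row inserts
  for (let offset = 0; ; offset += batchSize) {
    const events = await sqlite.query(
      'SELECT * FROM events ORDER BY id LIMIT ? OFFSET ?',
      [batchSize, offset]
    );
    if (events.length === 0) break;

    await clickhouse.insert({ table: 'events', values: events, format: 'JSONEachRow' });
  }
}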

Step 3: Single-write to Kafka/ClickHouse
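
The plan does not say when it is safe to move from Step 2 to Step 3. One plausible gate, sketched here as an assumption rather than a stated requirement, is a parity check that replays the same trace from both stores and compares event counts per type. `replayParity` is a hypothetical helper; it compares by type rather than by `id`, since each bus assigns its own identifiers.

```typescript
// Only the event type matters for this comparison.
interface ReplayedEvent {
  type: string;
}

function countByType(events: ReplayedEvent[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const e of events) counts.set(e.type, (counts.get(e.type) ?? 0) + 1);
  return counts;
}

// Returns one description per mismatch; an empty array means the stores agree.
function replayParity(oldEvents: ReplayedEvent[], newEvents: ReplayedEvent[]): string[] {
  const oldCounts = countByType(oldEvents);
  const newCounts = countByType(newEvents);

  // Union of all event types seen in either store.
  const types = Array.from(
    new Set(Array.from(oldCounts.keys()).concat(Array.from(newCounts.keys())))
  );

  const mismatches: string[] = [];
  for (const type of types) {
    const oldCount = oldCounts.get(type) ?? 0;
    const newCount = newCounts.get(type) ?? 0;
    if (oldCount !== newCount) mismatches.push(`${type}: old=${oldCount} new=${newCount}`);
  }
  return mismatches;
}
```

In practice this would be fed with `sqlite.replay(traceId)` and `kafka.replay(traceId)` for a sample of recent traces.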

8.6 MVP Must NOT Do

Forbidden:

typescript
// ❌ BAD: SQL queries directly in agents
const events = await db.select().from(events).where(eq(events.traceId, traceId));

// ❌ BAD: Assuming events table is in same DB as memories
const result = await db.query(`
  SELECT e.*, m.content FROM events e
  JOIN memories m ON e.source = m.id
`); // Won't work if events move to ClickHouse

Safe:

typescript
// ✅ GOOD: Use EventBus interface
const events = await bus.replay(traceId);

// ✅ GOOD: Keep events and memories separate
const events = await bus.replay(traceId);
const memories = await memory.recall(query);

8.7 Estimated Effort

Engineering time: 3-4 weeks
Breaking changes: None if EventBus interface is used consistently
New infrastructure:

Kafka cluster
ClickHouse instance
Schema registry (for Kafka)

8.8 Trigger to Build This

Indicators:

Events table exceeds 1M rows
Analytics queries taking >5 seconds
Multiple Forge instances need shared event log

Milestone: When SQLite bottlenecks are measurable.

9. Extension Architecture

Current: Hardcoded agents, tools, providers
Future: Plugin system for custom agents, tools, integrations

9.1 MVP Abstraction Boundaries

Already designed for extensibility:

typescript
// 1. Tool interface — anyone can implement a tool
interface Tool<TInput, TOutput> {
  name: string;
  description: string;
  schema: { input: ZodSchema<TInput>; output: ZodSchema<TOutput> };
  execute(input: TInput, ctx: ToolContext): Promise<TOutput>;
}

// 2. Agent interface — anyone can implement an agent
interface Agent {
  id: string;
  type: AgentType;
  execute(input: PhaseInput, ctx: AgentContext): Promise<PhaseOutput>;
}

// 3. LLM provider interface — swap providers
interface LLMProvider {
  chat(request: ChatRequest): Promise<ChatResponse>;
  embed(text: string): Promise<Float32Array>;
}

9.2 Plugin System Design

Plugin manifest:

typescript
// forge-plugin-jira/plugin.json
{
  "name": "forge-plugin-jira",
  "version": "1.0.0",
  "type": "tool",
  "entry": "./dist/index.js",
  "provides": {
    "tools": ["jira_create_issue", "jira_search"],
    "agents": [],
    "integrations": ["jira"]
  }
}

// forge-plugin-jira/src/index.ts
import { definePlugin } from '@forge/plugin-api';
import { z } from 'zod'; // used by the input/output schemas below

export default definePlugin({
  tools: [
    {
      name: 'jira_create_issue',
      description: 'Create a Jira issue',
      schema: {
        input: z.object({
          project: z.string(),
          summary: z.string(),
          description: z.string(),
        }),
        output: z.object({
          issueKey: z.string(),
          url: z.string(),
        }),
      },
      async execute(input, ctx) {
        const jira = new JiraClient(ctx.config.jiraUrl);
        const issue = await jira.createIssue(input);
        return { issueKey: issue.key, url: issue.url };
      },
    },
  ],
});

Plugin loader:

typescript
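import fs from 'node:fs/promises';
import path from 'node:path';
import { pathToFileURL } from 'node:url';

class PluginManager {
  private plugins = new Map<string, Plugin>();

  constructor(
    private toolRegistry: ToolRegistry,
    private agentRegistry: AgentRegistry
  ) {}

  async load(pluginPath: string): Promise<void> {
    // Read manifest (a plain file read avoids ESM JSON-import restrictions)
    const manifestFile = path.join(pluginPath, 'plugin.json');
    const manifest = JSON.parse(await fs.readFile(manifestFile, 'utf8'));

    // Load module; resolve the entry relative to the plugin directory,
    // not relative to this file
    const entryUrl = pathToFileURL(path.resolve(pluginPath, manifest.entry)).href;
    const module = await import(entryUrl);
    const plugin: Plugin = module.default;

    // Register tools (a plugin may omit either list)
    for (const tool of plugin.tools ?? []) {
      this.toolRegistry.register(tool);
    }

    // Register agents
    for (const agent of plugin.agents ?? []) {
      this.agentRegistry.register(agent);
    }

    this.plugins.set(manifest.name, plugin);
  }

  async loadAllPlugins(root = './plugins') {
    // Only directories are plugins; skip stray files such as README.md
    const entries = await fs.readdir(root, { withFileTypes: true });
    const dirs = entries.filter(e => e.isDirectory());
    await Promise.all(dirs.map(d => this.load(path.join(root, d.name))));
  }
}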
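
Because plugins come from outside the core codebase, it may be worth validating the manifest before importing anything. The sketch below is a dependency-free validator for the manifest shape shown in the Jira example; the `PluginManifest` type and `parseManifest` function are illustrative names, and the rule that `entry` must be a relative path inside the plugin directory is an assumption rather than a stated requirement.

```typescript
interface PluginManifest {
  name: string;
  version: string;
  type: string;
  entry: string;
  provides: { tools: string[]; agents: string[]; integrations: string[] };
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((v) => typeof v === 'string');
}

function parseManifest(raw: unknown): PluginManifest {
  if (typeof raw !== 'object' || raw === null) {
    throw new Error('manifest must be an object');
  }
  const m = raw as Record<string, unknown>;

  for (const field of ['name', 'version', 'type', 'entry']) {
    if (typeof m[field] !== 'string' || m[field] === '') {
      throw new Error(`manifest.${field} must be a non-empty string`);
    }
  }

  // Assumption: entries are relative to the plugin directory and may not escape it.
  const entry = m.entry as string;
  if (!entry.startsWith('./') || entry.split('/').includes('..')) {
    throw new Error('manifest.entry must be a relative path inside the plugin');
  }

  // Missing lists are treated as empty, matching plugins that only provide tools or agents.
  const provides = (m.provides ?? {}) as Record<string, unknown>;
  const tools = provides.tools ?? [];
  const agents = provides.agents ?? [];
  const integrations = provides.integrations ?? [];
  if (!isStringArray(tools) || !isStringArray(agents) || !isStringArray(integrations)) {
    throw new Error('manifest.provides lists must be arrays of strings');
  }

  return {
    name: m.name as string,
    version: m.version as string,
    type: m.type as string,
    entry,
    provides: { tools, agents, integrations },
  };
}
```

Validation failures surface as a single descriptive error per plugin, which is easier to report than a crash halfway through registration.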

9.3 Custom Agent Registration

typescript
// forge-plugin-custom-reviewer/src/index.ts
import { definePlugin, Agent } from '@forge/plugin-api';

export default definePlugin({
  agents: [
    {
      id: 'custom-security-reviewer',
      type: 'reviewer',

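      async execute(input, ctx) {
        // Custom security review logic.
        // runSecurityChecks is a helper defined elsewhere in this plugin; calling it
        // via `this` would fail because this object literal has no such method.
        const findings = await runSecurityChecks(input.code);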

        return {
          approved: findings.length === 0,
          findings,
        };
      },
    },
  ],
});

// In forge.config.ts, user selects which reviewer to use
export default defineConfig({
  agents: {
    reviewer: 'custom-security-reviewer', // Instead of default
  },
});

9.4 Webhook System

For external integrations:

typescript
interface WebhookConfig {
  events: string[]; // Which event types to forward
  url: string;
  headers?: Record<string, string>;
  transform?: (event: ForgeEvent) => unknown; // Optional transformation
}

class WebhookManager {
  private webhooks: WebhookConfig[] = [];

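  async init() {
    // Subscribe to all events
    bus.on('*', async (event) => {
      // Forward to matching webhooks
      const matching = this.webhooks.filter(w =>
        w.events.includes(event.type) || w.events.includes('*')
      );

      // allSettled: one failing endpoint must not block the others or
      // surface as an unhandled rejection inside the bus
      const results = await Promise.allSettled(
        matching.map(w => this.sendWebhook(w, event))
      );
      for (const result of results) {
        if (result.status === 'rejected') {
          console.error('Webhook delivery failed:', result.reason);
        }
      }
    });
  }

  private async sendWebhook(webhook: WebhookConfig, event: ForgeEvent) {
    const payload = webhook.transform ? webhook.transform(event) : event;

    const res = await fetch(webhook.url, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        ...webhook.headers,
      },
      body: JSON.stringify(payload),
    });

    // fetch only rejects on network errors, so check the HTTP status explicitly
    if (!res.ok) {
      throw new Error(`Webhook ${webhook.url} responded with ${res.status}`);
    }
  }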
}

// In forge.config.ts
export default defineConfig({
  webhooks: [
    {
      events: ['run.completed', 'deployment.success'],
      url: 'https://hooks.slack.com/services/...',
      transform: (event) => ({
        text: `Forge run completed: ${event.payload.task}`,
      }),
    },
  ],
});

9.5 API Surface for Third-Party Consumers

typescript
// Expose HTTP API for external consumers
import { Hono } from 'hono';

const app = new Hono();

// Start a run
app.post('/api/runs', async (c) => {
  const { task } = await c.req.json();
  const traceId = await orchestrator.startRun(task);
  return c.json({ traceId });
});

// Get run status
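app.get('/api/runs/:traceId', async (c) => {
  const { traceId } = c.req.param();
  // The query returns an array; unwrap the single row and 404 when it is absent
  const [run] = await db.select().from(runs).where(eq(runs.id, traceId));
  if (!run) return c.json({ error: 'Run not found' }, 404);
  return c.json(run);
});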

// Get run events
app.get('/api/runs/:traceId/events', async (c) => {
  const { traceId } = c.req.param();
  const events = await bus.replay(traceId);
  return c.json(events);
});

// Get memories
app.get('/api/memory', async (c) => {
  const { q } = c.req.query();
  const memories = await memory.recall({ context: q });
  return c.json(memories);
});

9.6 MVP Must NOT Do

Forbidden:

typescript
// ❌ BAD: Hardcode tool list
const tools = [gitTool, lintTool, testTool]; // Can't add plugins

// ❌ BAD: Tight coupling to specific implementations
import { PlannerAgent } from './agents/planner';
const planner = new PlannerAgent(); // Can't swap

Safe:

typescript
// ✅ GOOD: Registry-based tool discovery
const tools = toolRegistry.getAll();

// ✅ GOOD: Agent factory
const planner = agentRegistry.get(config.agents.planner);
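
`toolRegistry` and `agentRegistry` are referenced throughout this section but not defined. A minimal sketch of what such a registry could look like, assuming items are keyed by a string and that duplicate or unknown keys should fail loudly (both assumptions). Tools expose `name` and agents expose `id`, so the key is extracted by a caller-supplied function; `NamedTool` and `IdentifiedAgent` are reduced stand-ins for the real interfaces.

```typescript
class Registry<T> {
  private items = new Map<string, T>();

  constructor(private keyOf: (item: T) => string) {}

  register(item: T): void {
    const key = this.keyOf(item);
    if (this.items.has(key)) throw new Error(`Already registered: ${key}`);
    this.items.set(key, item);
  }

  get(key: string): T {
    const item = this.items.get(key);
    if (item === undefined) throw new Error(`Not registered: ${key}`);
    return item;
  }

  getAll(): T[] {
    return Array.from(this.items.values());
  }
}

// Reduced stand-ins for the Tool and Agent interfaces.
interface NamedTool {
  name: string;
}
interface IdentifiedAgent {
  id: string;
  type: string;
}

const sketchToolRegistry = new Registry<NamedTool>((tool) => tool.name);
const sketchAgentRegistry = new Registry<IdentifiedAgent>((agent) => agent.id);

sketchToolRegistry.register({ name: 'git' });
sketchAgentRegistry.register({ id: 'custom-security-reviewer', type: 'reviewer' });
```

Failing on duplicates is a deliberate choice in this sketch: silently replacing a built-in tool with a plugin's tool of the same name would be hard to debug.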

9.7 Estimated Effort

Engineering time: 3-4 weeks
Breaking changes: None if registries already exist
Components:

Plugin loader
Plugin API types
Webhook manager
HTTP API server

9.8 Trigger to Build This

Indicators:

Users request custom tools (e.g., Jira, Linear, Notion)
Users want to customize agents
Third-party tools want to integrate with Forge

Milestone: After core agents/tools are stable (200+ runs).

10. Seams and Abstractions

This section catalogs every abstraction boundary in the MVP and what it future-proofs.

10.1 Tool Abstraction

Interface:

typescript
interface Tool<TInput, TOutput> {
  name: string;
  description: string;
  schema: { input: ZodSchema<TInput>; output: ZodSchema<TOutput> };
  execute(input: TInput, ctx: ToolContext): Promise<TOutput>;
}

Future-proofs:

Plugin tools (9. Extension Architecture)
Remote tools (tools run in separate processes/containers)
Tool versioning (same name, different versions)
Tool authentication (ctx provides credentials)

Must NOT:

Return non-serializable objects (e.g., class instances) — breaks remote tools
Mutate global state — breaks parallel execution
Assume local filesystem — breaks containerization
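
The serializability rule can be enforced at the boundary where tool results are returned. A minimal sketch, assuming "serializable" means plain JSON data (strings, finite numbers, booleans, null, arrays, and plain objects); `assertJsonSerializable` is a hypothetical helper, and circular references are not handled here.

```typescript
function assertJsonSerializable(value: unknown, path = 'result'): void {
  if (value === null) return;

  switch (typeof value) {
    case 'string':
    case 'boolean':
      return;
    case 'number':
      // NaN and Infinity become null in JSON, which silently changes the value
      if (!Number.isFinite(value)) throw new Error(`${path}: non-finite number`);
      return;
    case 'object': {
      if (Array.isArray(value)) {
        value.forEach((item, i) => assertJsonSerializable(item, `${path}[${i}]`));
        return;
      }
      // Only plain objects survive a JSON round-trip with their shape intact;
      // class instances (including Date and Map) lose their prototype.
      const proto = Object.getPrototypeOf(value);
      if (proto !== Object.prototype && proto !== null) {
        throw new Error(`${path}: class instance is not serializable`);
      }
      const record = value as Record<string, unknown>;
      for (const key of Object.keys(record)) {
        assertJsonSerializable(record[key], `${path}.${key}`);
      }
      return;
    }
    default:
      // undefined, function, symbol, bigint
      throw new Error(`${path}: unsupported type ${typeof value}`);
  }
}
```

A tool runner could call this on every `execute` result in development builds, so a violation is caught long before tools are moved to separate processes.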

10.2 LLM Provider Abstraction

Interface:

typescript
interface LLMProvider {
  chat(request: ChatRequest): Promise<ChatResponse>;
  embed(text: string): Promise<Float32Array>;
}

Future-proofs:

Swap Claude → OpenAI → Ollama → custom model
Multi-provider routing (cheap model for simple tasks, expensive for complex)
Provider fallback (if primary fails, try secondary)
Local model support (Ollama, LlamaCPP)

Must NOT:

Use provider-specific features in prompts (e.g., Claude-only XML tags) — breaks portability
Hardcode model names in agents — use config
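
Provider fallback follows naturally from the interface: a wrapper that exposes the same `chat` method can try each underlying provider in order. The sketch below assumes simplified `ChatRequest` and `ChatResponse` shapes and covers only `chat` (so it uses a narrower `ChatProvider` interface instead of the full `LLMProvider`); `FallbackProvider` is a hypothetical name.

```typescript
// Simplified stand-ins; the real request/response types are defined elsewhere.
interface ChatRequest {
  prompt: string;
}
interface ChatResponse {
  text: string;
  provider: string;
}
interface ChatProvider {
  chat(request: ChatRequest): Promise<ChatResponse>;
}

class FallbackProvider implements ChatProvider {
  constructor(private providers: ChatProvider[]) {
    if (providers.length === 0) throw new Error('At least one provider is required');
  }

  async chat(request: ChatRequest): Promise<ChatResponse> {
    const errors: unknown[] = [];

    // Try providers in priority order; the first success wins.
    for (const provider of this.providers) {
      try {
        return await provider.chat(request);
      } catch (err) {
        errors.push(err); // remember the failure and try the next provider
      }
    }

    throw new Error(
      `All ${this.providers.length} providers failed: ${errors.map(String).join('; ')}`
    );
  }
}
```

Agents receive the wrapper through their context exactly as they would a single provider, so adding fallback requires no agent changes.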

10.3 Memory Store Abstraction

Interface:

typescript
interface MemoryStore {
  store(memory: Memory): Promise<void>;
  recall(query: RecallQuery): Promise<Memory[]>;
  update(id: string, updates: Partial<Memory>): Promise<void>;
  consolidate(): Promise<void>;
}

Future-proofs:

SQLite → PostgreSQL (2. Kubernetes)
Brute-force → Vector DB (3. Vector Database)
Single-repo → Multi-repo (4. Multi-Repo Intelligence)
Local → Shared (team memory)

Must NOT:

Expose SQL queries to agents — breaks DB independence
Return DB-specific types (e.g., Drizzle objects) — breaks when we swap DBs

10.4 Event Bus Abstraction

Interface:

typescript
interface EventBus {
  emit(event: Omit<ForgeEvent, 'id' | 'timestamp'>): Promise<void>;
  on(type: string, handler: EventHandler): () => void;
  replay(traceId: string): Promise<ForgeEvent[]>;
}

Future-proofs:

In-memory → Redis → Kafka (8. ClickHouse/Kafka)
Single-process → Distributed (2. Kubernetes)
Synchronous → Asynchronous (buffered writes)
Local → Remote (WebSocket streaming for dashboards)

Must NOT:

Assume synchronous delivery — events might be buffered
Mutate events after emitting — they might be serialized
Store non-serializable payloads — breaks remote bus

10.5 Agent Abstraction

Interface:

typescript
interface Agent {
  id: string;
  type: AgentType;
  execute(input: PhaseInput, ctx: AgentContext): Promise<PhaseOutput>;
}

Future-proofs:

Sequential → Parallel (1. Parallel Execution)
Single implementer → Multiple (swarm development)
Local → Remote (agents run in K8s Jobs)
Default agents → Custom agents (9. Extension Architecture)

Must NOT:

Share mutable state between agents — breaks parallelism
Assume execute() runs in same process as orchestrator — breaks remote execution
Store instance state across execute() calls — breaks stateless scaling

10.6 Safety Controls Abstraction

Interface:

typescript
interface SafetyControls {
  check(state: ExecutionState): BreakerResult;
  configure(config: SafetyConfig): void;
}

interface BreakerResult {
  shouldBreak: boolean;
  reason?: string;
  breaker?: string;
}

Future-proofs:

Static thresholds → Dynamic thresholds (learned from history)
Hardcoded breakers → Configurable breakers
Phase-level → Agent-level (parallel agents have independent budgets)

Must NOT:

Hardcode threshold values in breakers — use config
Assume global state — each agent should have isolated breaker state
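
A sketch of a breaker that follows both rules: thresholds come from configuration, and each instance holds only its own config, so parallel agents can each be given their own. The `ExecutionState` and `SafetyConfig` fields used here (`iterations`, `costUsd`, `maxIterations`, `maxCostUsd`) are assumptions chosen for illustration.

```typescript
// Assumed shapes; the real ExecutionState and SafetyConfig are defined elsewhere.
interface ExecutionState {
  iterations: number;
  costUsd: number;
}
interface SafetyConfig {
  maxIterations: number;
  maxCostUsd: number;
}
interface BreakerResult {
  shouldBreak: boolean;
  reason?: string;
  breaker?: string;
}

class ConfigurableSafetyControls {
  private config: SafetyConfig;

  constructor(config: SafetyConfig) {
    this.config = { ...config }; // copy so later caller mutations have no effect
  }

  configure(config: SafetyConfig): void {
    this.config = { ...config };
  }

  // Pure with respect to `state`: the same state and config always give the same result.
  check(state: ExecutionState): BreakerResult {
    if (state.iterations >= this.config.maxIterations) {
      return {
        shouldBreak: true,
        breaker: 'iterations',
        reason: `Reached ${state.iterations} of ${this.config.maxIterations} iterations`,
      };
    }
    if (state.costUsd >= this.config.maxCostUsd) {
      return {
        shouldBreak: true,
        breaker: 'cost',
        reason: `Spent $${state.costUsd.toFixed(2)} of $${this.config.maxCostUsd.toFixed(2)}`,
      };
    }
    return { shouldBreak: false };
  }
}
```

Giving each parallel agent its own instance is what turns a phase-level budget into an agent-level one.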

10.7 Checkpoint Abstraction

Interface:

typescript
interface CheckpointSystem {
  save(phase: PhaseName, state: State): Promise<Checkpoint>;
  restore(checkpointId: string): Promise<State>;
  list(traceId: string): Promise<Checkpoint[]>;
}

Future-proofs:

Phase-level → Agent-level (parallel agents)
Local files → Remote storage (S3, DB)
Manual resume → Auto-resume (on crash)

Must NOT:

Store non-serializable state — breaks restore in different process
Assume checkpoints are local files — breaks distributed deployment
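
A sketch of a store that honors both constraints by serializing on `save` and parsing on `restore`, so state is always copied across the boundary as plain data. The `Checkpoint` shape, the `traceId` field on `State`, and the in-memory `Map` (standing in for S3 or a database) are assumptions.

```typescript
type PhaseName = string;

// Assumption: state carries its traceId so checkpoints can be listed per trace.
interface State {
  traceId: string;
  [key: string]: unknown;
}

interface Checkpoint {
  id: string;
  traceId: string;
  phase: PhaseName;
  createdAt: string; // ISO 8601
}

class InMemoryCheckpointSystem {
  private blobs = new Map<string, string>(); // checkpoint id -> serialized state
  private index: Checkpoint[] = [];
  private counter = 0;

  async save(phase: PhaseName, state: State): Promise<Checkpoint> {
    this.counter += 1;
    const checkpoint: Checkpoint = {
      id: `${state.traceId}:${phase}:${this.counter}`,
      traceId: state.traceId,
      phase,
      createdAt: new Date().toISOString(),
    };
    // JSON.stringify throws on cycles and BigInt, and drops functions and undefined
    this.blobs.set(checkpoint.id, JSON.stringify(state));
    this.index.push(checkpoint);
    return checkpoint;
  }

  async restore(checkpointId: string): Promise<State> {
    const blob = this.blobs.get(checkpointId);
    if (blob === undefined) throw new Error(`Unknown checkpoint: ${checkpointId}`);
    return JSON.parse(blob) as State; // always a fresh copy, never a shared reference
  }

  async list(traceId: string): Promise<Checkpoint[]> {
    return this.index.filter((c) => c.traceId === traceId);
  }
}
```

Swapping the `Map` for remote storage changes only the two lines that read and write `blobs`; callers are unaffected.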

10.8 Human Gate Abstraction

Interface:

typescript
interface HumanGate {
  id: string;
  condition: (ctx: Context) => boolean;
  request(ctx: Context): Promise<Approval>;
}

interface Approval {
  approved: boolean;
  reason?: string;
  modifications?: unknown;
}

Future-proofs:

CLI approval → Web UI approval (7. Dashboards)
Single approver → Multi-level approval (per topic 10)
Synchronous → Asynchronous (approval can come hours later)
Manual → Conditional automation (5. Autonomous Deployment)

Must NOT:

Assume approver is available immediately — add timeouts
Block forever on approval — escalate after timeout
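
Both rules can be met by racing the approval request against a timer. A minimal sketch, assuming that a timeout should resolve to a non-approval with an explanatory reason (escalating to another approver would be an alternative policy); `requestWithTimeout` is a hypothetical helper.

```typescript
interface Approval {
  approved: boolean;
  reason?: string;
  modifications?: unknown;
}

async function requestWithTimeout(
  request: () => Promise<Approval>,
  timeoutMs: number
): Promise<Approval> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  // Resolves (rather than rejects) so callers handle a timeout like any other denial.
  const timeout = new Promise<Approval>((resolve) => {
    timer = setTimeout(
      () => resolve({ approved: false, reason: `No response within ${timeoutMs}ms` }),
      timeoutMs
    );
  });

  try {
    return await Promise.race([request(), timeout]);
  } finally {
    // Clear the timer so a fast approval does not keep the process alive
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

A gate's `request(ctx)` would be passed in as `() => gate.request(ctx)`; the caller can then decide whether a timed-out approval pauses the run or escalates.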

10.9 Configuration Abstraction

Interface:

typescript
interface ForgeConfig {
  llm: LLMConfig;
  tools: ToolConfig;
  safety: SafetyConfig;
  memory: MemoryConfig;
  github?: GitHubConfig;
  plugins?: PluginConfig[];
}

Future-proofs:

File-based → Environment-based (12-factor app)
Per-project → Per-user → Per-team
Static → Dynamic (config can change without restart)

Must NOT:

Hardcode defaults in code — use config file
Require config values that might not exist — provide defaults
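
One way to reconcile these two rules is layered resolution: defaults kept in a single place, overridden by the config file, overridden in turn by environment variables. The sketch below covers only a small assumed subset of `ForgeConfig` (`llm.model` and `safety.maxCostUsd`); the `FORGE_*` variable names and the placeholder default values are hypothetical.

```typescript
interface ResolvedConfig {
  llm: { model: string };
  safety: { maxCostUsd: number };
}

type PartialConfig = {
  llm?: Partial<ResolvedConfig['llm']>;
  safety?: Partial<ResolvedConfig['safety']>;
};

// Single source of defaults (could equally be a bundled default config file).
// The values are placeholders, not recommendations.
const DEFAULTS: ResolvedConfig = {
  llm: { model: 'default-model' },
  safety: { maxCostUsd: 5 },
};

// Hypothetical environment variable names.
function fromEnv(env: Record<string, string | undefined>): PartialConfig {
  const config: PartialConfig = {};
  if (env.FORGE_LLM_MODEL) config.llm = { model: env.FORGE_LLM_MODEL };
  if (env.FORGE_MAX_COST_USD) {
    const value = Number(env.FORGE_MAX_COST_USD);
    if (!Number.isFinite(value)) throw new Error('FORGE_MAX_COST_USD must be a number');
    config.safety = { maxCostUsd: value };
  }
  return config;
}

// Precedence, lowest to highest: defaults, config file, environment.
function resolveConfig(
  file: PartialConfig,
  env: Record<string, string | undefined>
): ResolvedConfig {
  const fromEnvironment = fromEnv(env);
  return {
    llm: { ...DEFAULTS.llm, ...file.llm, ...fromEnvironment.llm },
    safety: { ...DEFAULTS.safety, ...file.safety, ...fromEnvironment.safety },
  };
}
```

Passing `env` in explicitly, rather than reading `process.env` inside the function, keeps resolution testable and makes the later move to dynamic config a matter of calling `resolveConfig` again.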

10.10 Observability Abstraction

Events as audit trail:

Every decision logged with rationale
Every action attributed to agent
Every cost tracked per execution

Future-proofs:

CLI rendering → Dashboard rendering (same events)
SQLite → ClickHouse (same event schema)
Local analysis → Remote analytics

Must NOT:

Log events with ad-hoc formats — use typed events
Store logs separately from events — events ARE the logs

Summary Table: Deferred Features

Feature	MVP Must Do	MVP Must NOT Do	Effort	Trigger
1. Parallel Agents	Stateless agents, immutable context	Shared mutable state	2-3 weeks	>10min implementation phase
2. Kubernetes	Env-driven config, serializable state	Hardcoded paths, singletons	4-6 weeks	Team usage, shared memory
3. Vector DB	Abstract similarity behind interface	Expose DB-specific queries	1-2 weeks	>50K memories, >500ms recall
4. Multi-Repo	Namespace memories by `repoId`	Store repo-specific code globally	2-3 weeks	>3 repos with similar stacks
5. Autonomous Deploy	Strategy-based deployment	Hardcode approval requirements	3-4 weeks	Level 2 metrics achieved
6. Natural Language	Accept string input, validate quality	Assume structured input	3-4 weeks	Frequent vague requirements
7. Dashboards	Rich event stream	Render in agent code	2-3 weeks	Team adoption, remote monitoring
8. ClickHouse/Kafka	EventBus interface abstraction	SQL queries in agents	3-4 weeks	>1M events, >5s queries
9. Extensions	Tool/Agent registries	Hardcode tool lists	3-4 weeks	Users request custom tools
10. Seams	All interfaces abstract, serializable	Non-serializable state, DB exposure	N/A	Foundation for all above

Conclusion

The MVP is deliberately constrained — sequential, single-process, human-gated. But every interface is designed so that upgrading to parallel, distributed, autonomous execution requires NO breaking changes to calling code. We swap implementations, not interfaces.

Key principle: If an agent can't tell the difference between local and remote, sync and async, single and parallel, then the abstraction is correct.

This plan ensures we can ship fast (8 weeks to MVP) while keeping the door wide open for the future (parallel agents, K8s, autonomous deployment, etc.) without rewriting the core.