February 8, 2026

LLM Provider Abstraction — Implementation Plan


Section 12 of Forge System Design

Overview

This plan details the implementation of a provider-agnostic LLM abstraction layer for Forge. The system must support multiple LLM providers (Anthropic Claude, OpenAI GPT, local Ollama models), handle tool calling workflows, implement cost tracking, and provide resilience through retry/fallback mechanisms.

Design Principles:

  • Provider-agnostic: Swap LLMs without changing agent code
  • Cost-aware: Track and optimize spending per operation
  • Resilient: Handle failures gracefully with retries and fallbacks
  • Observable: Full visibility into LLM usage and costs

1. LLMProvider Interface

The core abstraction that all providers implement.

1.1 Core Interface

typescript
// ─── src/tools/llm.ts ─────────────────────────────────────────

/**
 * Provider-agnostic LLM interface.
 * All LLM providers (Anthropic, OpenAI, Ollama) implement this.
 */
export interface LLMProvider {
  /**
   * Chat completion with optional tool calling.
   * This is the main interface for agent reasoning.
   */
  chat(request: ChatRequest): Promise<ChatResponse>;

  /**
   * Generate embeddings for semantic search.
   * Used by the memory system for similarity search.
   */
  embed(text: string): Promise<Float32Array>;

  /** Provider metadata */
  readonly provider: string; // 'anthropic' | 'openai' | 'ollama'; wrappers may report 'mock', 'fallback'
  readonly model: string;
}

/**
 * Request format for chat completion.
 * Normalized across all providers.
 */
export interface ChatRequest {
  /** System prompt - sets agent behavior and context */
  system: string;

  /** Conversation history */
  messages: Message[];

  /** Available tools the LLM can call (optional) */
  tools?: ToolSchema[];

  /** Temperature (0-1): higher = more creative, lower = more deterministic */
  temperature?: number; // default: 0.7

  /** Maximum tokens to generate */
  maxTokens?: number; // default: 4096

  /** Enable streaming responses (optional) */
  stream?: boolean; // default: false

  /** Provider-specific options (escape hatch for advanced features) */
  providerOptions?: Record<string, unknown>;
}

/**
 * Message types in the conversation.
 */
export type Message =
  | SystemMessage
  | UserMessage
  | AssistantMessage
  | ToolResultMessage;

export interface SystemMessage {
  role: 'system';
  content: string;
}

export interface UserMessage {
  role: 'user';
  content: string;
}

export interface AssistantMessage {
  role: 'assistant';
  content: string;
  toolCalls?: ToolCall[]; // If assistant wants to use tools
}

export interface ToolResultMessage {
  role: 'tool_result';
  toolCallId: string; // Links back to ToolCall
  content: string;    // Result from tool execution
  isError?: boolean;  // Did tool fail?
}

/**
 * Tool call from LLM.
 * The LLM decided to invoke a tool with these parameters.
 */
export interface ToolCall {
  id: string;                     // Unique ID for this call
  name: string;                   // Tool name
  input: Record<string, unknown>; // Parameters as JSON
}

/**
 * Tool schema provided to LLM.
 * Tells LLM what tools are available and how to use them.
 */
export interface ToolSchema {
  name: string;
  description: string; // Clear description of what tool does
  inputSchema: {       // JSON Schema for parameters
    type: 'object';
    properties: Record<string, unknown>;
    required?: string[];
  };
}

/**
 * Response from chat completion.
 */
export interface ChatResponse {
  /** Text response from LLM */
  content: string;

  /** Tool calls requested by LLM (if any) */
  toolCalls?: ToolCall[];

  /** Is the agent done reasoning? */
  done: boolean;

  /** Final result (if done=true and agent returned structured data) */
  result?: unknown;

  /** Token usage for this call */
  usage: {
    promptTokens: number;
    completionTokens: number;
    totalTokens: number;
  };

  /** Cost in USD */
  cost: number;

  /** Model used (may differ from requested if fallback occurred) */
  model: string;

  /** Provider that fulfilled this request */
  provider: string;
}

1.2 Zod Schemas

For runtime validation of LLM inputs/outputs.

typescript
// ─── src/tools/llm-schemas.ts ─────────────────────────────────
import { z } from 'zod';

export const MessageSchema = z.discriminatedUnion('role', [
  z.object({
    role: z.literal('system'),
    content: z.string(),
  }),
  z.object({
    role: z.literal('user'),
    content: z.string(),
  }),
  z.object({
    role: z.literal('assistant'),
    content: z.string(),
    toolCalls: z.array(z.object({
      id: z.string(),
      name: z.string(),
      input: z.record(z.unknown()),
    })).optional(),
  }),
  z.object({
    role: z.literal('tool_result'),
    toolCallId: z.string(),
    content: z.string(),
    isError: z.boolean().optional(),
  }),
]);

export const ChatRequestSchema = z.object({
  system: z.string(),
  messages: z.array(MessageSchema),
  tools: z.array(z.object({
    name: z.string(),
    description: z.string(),
    inputSchema: z.object({
      type: z.literal('object'),
      properties: z.record(z.unknown()),
      required: z.array(z.string()).optional(),
    }),
  })).optional(),
  temperature: z.number().min(0).max(1).optional(),
  maxTokens: z.number().int().positive().optional(),
  stream: z.boolean().optional(),
  providerOptions: z.record(z.unknown()).optional(),
});

export const ChatResponseSchema = z.object({
  content: z.string(),
  toolCalls: z.array(z.object({
    id: z.string(),
    name: z.string(),
    input: z.record(z.unknown()),
  })).optional(),
  done: z.boolean(),
  result: z.unknown().optional(),
  usage: z.object({
    promptTokens: z.number(),
    completionTokens: z.number(),
    totalTokens: z.number(),
  }),
  cost: z.number(),
  model: z.string(),
  provider: z.string(),
});
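
As a usage sketch, a provider wrapper could validate raw responses against these schemas at the boundary before handing them to agents; the validateResponse helper below is illustrative, not part of the plan.

typescript
// Hypothetical boundary check using the schemas above.
import { ChatResponseSchema } from './llm-schemas.ts';
import type { ChatResponse } from './llm.ts';

export function validateResponse(raw: unknown): ChatResponse {
  const parsed = ChatResponseSchema.safeParse(raw);
  if (!parsed.success) {
    // Fail fast instead of letting a malformed response reach agent reasoning.
    throw new Error(`Invalid ChatResponse: ${parsed.error.message}`);
  }
  return parsed.data;
}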

2. Anthropic Provider

Primary provider for Forge. Claude models excel at reasoning and code generation.

2.1 Implementation

typescript
// ─── src/tools/llm-anthropic.ts ───────────────────────────────
import Anthropic from '@anthropic-ai/sdk';
import type { LLMProvider, ChatRequest, ChatResponse, ToolSchema, ToolCall } from './llm.ts';
import { RateLimitError, InvalidRequestError, ProviderError } from './llm-errors.ts';

export class AnthropicProvider implements LLMProvider {
  readonly provider = 'anthropic' as const;
  readonly model: string;

  private client: Anthropic;
  private pricing: ModelPricing;

  constructor(
    apiKey: string,
    model: string = 'claude-sonnet-4-5-20250929'
  ) {
    this.client = new Anthropic({ apiKey });
    this.model = model;
    this.pricing = MODEL_PRICING[model] || MODEL_PRICING['claude-sonnet-4-5-20250929'];
  }

  async chat(request: ChatRequest): Promise<ChatResponse> {
    // Convert our normalized format to Anthropic's format
    const anthropicRequest = this.toAnthropicFormat(request);

    try {
      const response = await this.client.messages.create(anthropicRequest);
      return this.fromAnthropicFormat(response);
    } catch (error) {
      throw this.handleError(error);
    }
  }

  async embed(text: string): Promise<Float32Array> {
    // Anthropic doesn't provide embeddings API
    // Fall back to a local model or third-party service
    throw new Error('Anthropic does not support embeddings. Use OpenAI or local model.');
  }

  /**
   * Convert our format to Anthropic's Messages API format.
   * Protected so caching/streaming variants can reuse or override it.
   */
  protected toAnthropicFormat(request: ChatRequest): Anthropic.MessageCreateParams {
    return {
      model: this.model,
      max_tokens: request.maxTokens ?? 4096,
      temperature: request.temperature ?? 0.7,
      system: request.system,
      messages: request.messages.map(msg => {
        switch (msg.role) {
          case 'user':
            return { role: 'user', content: msg.content };

          case 'assistant':
            // Assistant message with optional tool calls
            if (msg.toolCalls && msg.toolCalls.length > 0) {
              return {
                role: 'assistant',
                content: [
                  { type: 'text', text: msg.content },
                  ...msg.toolCalls.map(tc => ({
                    type: 'tool_use' as const,
                    id: tc.id,
                    name: tc.name,
                    input: tc.input,
                  })),
                ],
              };
            }
            return { role: 'assistant', content: msg.content };

          case 'tool_result':
            return {
              role: 'user',
              content: [
                {
                  type: 'tool_result' as const,
                  tool_use_id: msg.toolCallId,
                  content: msg.content,
                  is_error: msg.isError,
                },
              ],
            };

          default:
            throw new Error(`Unsupported message role: ${(msg as any).role}`);
        }
      }),
      tools: request.tools?.map(tool => ({
        name: tool.name,
        description: tool.description,
        input_schema: tool.inputSchema,
      })),
    };
  }

  /**
   * Convert Anthropic's response to our normalized format
   */
  protected fromAnthropicFormat(response: Anthropic.Message): ChatResponse {
    // Extract text content
    const textBlocks = response.content.filter(
      (block): block is Anthropic.TextBlock => block.type === 'text'
    );
    const content = textBlocks.map(b => b.text).join('\n');

    // Extract tool calls
    const toolUseBlocks = response.content.filter(
      (block): block is Anthropic.ToolUseBlock => block.type === 'tool_use'
    );
    const toolCalls: ToolCall[] = toolUseBlocks.map(block => ({
      id: block.id,
      name: block.name,
      input: block.input as Record<string, unknown>,
    }));

    // Determine if agent is done
    // Agent is done if stop_reason is 'end_turn' and no tool calls
    const done = response.stop_reason === 'end_turn' && toolCalls.length === 0;

    // Calculate cost
    const cost = this.calculateCost(
      response.usage.input_tokens,
      response.usage.output_tokens
    );

    return {
      content,
      toolCalls: toolCalls.length > 0 ? toolCalls : undefined,
      done,
      result: done ? this.extractResult(content) : undefined,
      usage: {
        promptTokens: response.usage.input_tokens,
        completionTokens: response.usage.output_tokens,
        totalTokens: response.usage.input_tokens + response.usage.output_tokens,
      },
      cost,
      model: response.model,
      provider: 'anthropic',
    };
  }

  /**
   * Calculate cost in USD based on token usage
   */
  protected calculateCost(inputTokens: number, outputTokens: number): number {
    const inputCost = (inputTokens / 1_000_000) * this.pricing.inputPer1M;
    const outputCost = (outputTokens / 1_000_000) * this.pricing.outputPer1M;
    return inputCost + outputCost;
  }

  /**
   * Extract structured result from final response
   */
  private extractResult(content: string): unknown {
    // Try to parse JSON if content looks like JSON
    const trimmed = content.trim();
    if (trimmed.startsWith('{') && trimmed.endsWith('}')) {
      try {
        return JSON.parse(trimmed);
      } catch {
        // Not JSON, return as string
      }
    }
    return content;
  }

  /**
   * Handle Anthropic-specific errors
   */
  private handleError(error: unknown): Error {
    if (error instanceof Anthropic.APIError) {
      if (error.status === 429) {
        const retryAfter = error.headers?.['retry-after'];
        return new RateLimitError(
          'Anthropic rate limit exceeded',
          retryAfter ? Number(retryAfter) : undefined
        );
      }
      if (error.status === 400) {
        return new InvalidRequestError(
          `Anthropic API error: ${error.message}`
        );
      }
      if (error.status >= 500) {
        return new ProviderError(
          `Anthropic server error: ${error.message}`,
          true // isRetryable
        );
      }
    }
    return error as Error;
  }
}

/**
 * Model pricing table (as of Feb 2026)
 * Source: https://www.anthropic.com/pricing
 */
const MODEL_PRICING: Record<string, ModelPricing> = {
  'claude-opus-4-6': {
    inputPer1M: 15.00,
    outputPer1M: 75.00,
  },
  'claude-sonnet-4-5-20250929': {
    inputPer1M: 3.00,
    outputPer1M: 15.00,
  },
  'claude-haiku-4-5-20251001': {
    inputPer1M: 0.25,
    outputPer1M: 1.25,
  },
};

export interface ModelPricing {
  inputPer1M: number;  // USD per 1M input tokens
  outputPer1M: number; // USD per 1M output tokens
}

2.2 Streaming Support

For long-running operations where we want incremental output.

typescript
// ─── src/tools/llm-anthropic-stream.ts ────────────────────────
// Streaming methods for AnthropicProvider (shown separately for clarity).

export class AnthropicProvider {
  async chatStream(request: ChatRequest): Promise<AsyncIterator<ChatChunk>> {
    const anthropicRequest = {
      ...this.toAnthropicFormat(request),
      stream: true,
    };

    const stream = await this.client.messages.create(anthropicRequest);
    return this.streamToChunks(stream);
  }

  private async *streamToChunks(
    stream: Anthropic.MessageStream
  ): AsyncIterator<ChatChunk> {
    let accumulatedText = '';
    let accumulatedToolCalls: ToolCall[] = [];

    for await (const event of stream) {
      if (event.type === 'content_block_delta') {
        if (event.delta.type === 'text_delta') {
          accumulatedText += event.delta.text;
          yield {
            type: 'text',
            content: event.delta.text,
            accumulated: accumulatedText,
          };
        }
        if (event.delta.type === 'input_json_delta') {
          // Tool call in progress
          yield {
            type: 'tool_call_delta',
            delta: event.delta.partial_json,
          };
        }
      }

      if (event.type === 'message_stop') {
        // Final event
        yield {
          type: 'done',
          usage: event.message.usage,
          cost: this.calculateCost(
            event.message.usage.input_tokens,
            event.message.usage.output_tokens
          ),
        };
      }
    }
  }
}

interface ChatChunk {
  type: 'text' | 'tool_call_delta' | 'done';
  content?: string;
  accumulated?: string;
  delta?: string;
  usage?: { input_tokens: number; output_tokens: number };
  cost?: number;
}

3. OpenAI Provider

Secondary provider for diversity and fallback.

3.1 Implementation

typescript
// ─── src/tools/llm-openai.ts ──────────────────────────────────
import OpenAI from 'openai';
import type { LLMProvider, ChatRequest, ChatResponse } from './llm.ts';
import type { ModelPricing } from './llm-anthropic.ts';
import { RateLimitError, InvalidRequestError, ProviderError } from './llm-errors.ts';

export class OpenAIProvider implements LLMProvider {
  readonly provider = 'openai' as const;
  readonly model: string;

  private client: OpenAI;
  private pricing: ModelPricing;

  constructor(
    apiKey: string,
    model: string = 'gpt-4o'
  ) {
    this.client = new OpenAI({ apiKey });
    this.model = model;
    this.pricing = OPENAI_PRICING[model] || OPENAI_PRICING['gpt-4o'];
  }

  async chat(request: ChatRequest): Promise<ChatResponse> {
    const openaiRequest = this.toOpenAIFormat(request);

    try {
      const response = await this.client.chat.completions.create(openaiRequest);
      return this.fromOpenAIFormat(response);
    } catch (error) {
      throw this.handleError(error);
    }
  }

  async embed(text: string): Promise<Float32Array> {
    const response = await this.client.embeddings.create({
      model: 'text-embedding-3-small',
      input: text,
    });
    return new Float32Array(response.data[0].embedding);
  }

  private toOpenAIFormat(
    request: ChatRequest
  ): OpenAI.Chat.ChatCompletionCreateParams {
    return {
      model: this.model,
      max_tokens: request.maxTokens ?? 4096,
      temperature: request.temperature ?? 0.7,
      messages: [
        // OpenAI doesn't have a separate system parameter, add as first message
        { role: 'system', content: request.system },
        ...request.messages.map(msg => {
          switch (msg.role) {
            case 'user':
              return { role: 'user' as const, content: msg.content };

            case 'assistant':
              if (msg.toolCalls && msg.toolCalls.length > 0) {
                return {
                  role: 'assistant' as const,
                  content: msg.content,
                  tool_calls: msg.toolCalls.map(tc => ({
                    id: tc.id,
                    type: 'function' as const,
                    function: {
                      name: tc.name,
                      arguments: JSON.stringify(tc.input),
                    },
                  })),
                };
              }
              return { role: 'assistant' as const, content: msg.content };

            case 'tool_result':
              return {
                role: 'tool' as const,
                tool_call_id: msg.toolCallId,
                content: msg.content,
              };

            default:
              throw new Error(`Unsupported message role: ${(msg as any).role}`);
          }
        }),
      ],
      tools: request.tools?.map(tool => ({
        type: 'function' as const,
        function: {
          name: tool.name,
          description: tool.description,
          parameters: tool.inputSchema,
        },
      })),
    };
  }

  private fromOpenAIFormat(
    response: OpenAI.Chat.ChatCompletion
  ): ChatResponse {
    const message = response.choices[0].message;
    const content = message.content || '';

    // Convert function calls to tool calls
    const toolCalls = message.tool_calls?.map(tc => ({
      id: tc.id,
      name: tc.function.name,
      input: JSON.parse(tc.function.arguments) as Record<string, unknown>,
    }));

    const done = response.choices[0].finish_reason === 'stop';

    const cost = this.calculateCost(
      response.usage?.prompt_tokens ?? 0,
      response.usage?.completion_tokens ?? 0
    );

    return {
      content,
      toolCalls,
      done,
      result: done ? this.extractResult(content) : undefined,
      usage: {
        promptTokens: response.usage?.prompt_tokens ?? 0,
        completionTokens: response.usage?.completion_tokens ?? 0,
        totalTokens: response.usage?.total_tokens ?? 0,
      },
      cost,
      model: response.model,
      provider: 'openai',
    };
  }

  private calculateCost(inputTokens: number, outputTokens: number): number {
    const inputCost = (inputTokens / 1_000_000) * this.pricing.inputPer1M;
    const outputCost = (outputTokens / 1_000_000) * this.pricing.outputPer1M;
    return inputCost + outputCost;
  }

  private extractResult(content: string): unknown {
    const trimmed = content.trim();
    if (trimmed.startsWith('{') && trimmed.endsWith('}')) {
      try {
        return JSON.parse(trimmed);
      } catch {}
    }
    return content;
  }

  private handleError(error: unknown): Error {
    if (error instanceof OpenAI.APIError) {
      if (error.status === 429) {
        return new RateLimitError('OpenAI rate limit exceeded');
      }
      if (error.status === 400) {
        return new InvalidRequestError(`OpenAI API error: ${error.message}`);
      }
      if (error.status >= 500) {
        return new ProviderError(`OpenAI server error: ${error.message}`, true);
      }
    }
    return error as Error;
  }
}

const OPENAI_PRICING: Record<string, ModelPricing> = {
  'gpt-4o': {
    inputPer1M: 2.50,
    outputPer1M: 10.00,
  },
  'gpt-4o-mini': {
    inputPer1M: 0.15,
    outputPer1M: 0.60,
  },
};

4. Ollama Provider

Local model support for privacy, cost savings, and offline capability.

4.1 Implementation

typescript
// ─── src/tools/llm-ollama.ts ──────────────────────────────────
import type { LLMProvider, ChatRequest, ChatResponse } from './llm.ts';
import { ProviderError } from './llm-errors.ts';

export class OllamaProvider implements LLMProvider {
  readonly provider = 'ollama' as const;
  readonly model: string;
  private baseURL: string;

  constructor(
    model: string = 'llama3.1:8b',
    baseURL: string = 'http://localhost:11434'
  ) {
    this.model = model;
    this.baseURL = baseURL;
  }

  async chat(request: ChatRequest): Promise<ChatResponse> {
    const ollamaRequest = this.toOllamaFormat(request);

    const response = await fetch(`${this.baseURL}/api/chat`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(ollamaRequest),
    });

    if (!response.ok) {
      throw new ProviderError(
        `Ollama error: ${response.statusText}`,
        response.status >= 500
      );
    }

    const data = await response.json();
    return this.fromOllamaFormat(data);
  }

  async embed(text: string): Promise<Float32Array> {
    const response = await fetch(`${this.baseURL}/api/embeddings`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model: this.model,
        prompt: text,
      }),
    });

    const data = await response.json();
    return new Float32Array(data.embedding);
  }

  /**
   * Ensure model is pulled and available
   */
  async ensureModel(): Promise<void> {
    const response = await fetch(`${this.baseURL}/api/tags`);
    const data = await response.json();

    const available = data.models.some(
      (m: { name: string }) => m.name === this.model
    );

    if (!available) {
      // Pull the model
      await this.pullModel();
    }
  }

  private async pullModel(): Promise<void> {
    const response = await fetch(`${this.baseURL}/api/pull`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ name: this.model }),
    });

    // Stream the pull progress (simplified)
    const reader = response.body?.getReader();
    if (!reader) throw new Error('Failed to pull model');

    while (true) {
      const { done } = await reader.read();
      if (done) break;
    }
  }

  private toOllamaFormat(request: ChatRequest): OllamaRequest {
    return {
      model: this.model,
      messages: [
        { role: 'system', content: request.system },
        ...request.messages.map(msg => ({
          role: msg.role === 'tool_result' ? 'user' : msg.role,
          content: msg.role === 'tool_result'
            ? `Tool result: ${msg.content}`
            : msg.content,
        })),
      ],
      options: {
        temperature: request.temperature ?? 0.7,
        num_predict: request.maxTokens ?? 4096,
      },
      // Ollama doesn't have native tool support yet
      // We'd need to implement tool calling via prompt engineering
      stream: false,
    };
  }

  private fromOllamaFormat(response: OllamaResponse): ChatResponse {
    return {
      content: response.message.content,
      toolCalls: undefined, // Ollama doesn't support native tool calling
      done: response.done,
      result: response.done
        ? this.extractResult(response.message.content)
        : undefined,
      usage: {
        promptTokens: response.prompt_eval_count ?? 0,
        completionTokens: response.eval_count ?? 0,
        totalTokens: (response.prompt_eval_count ?? 0) + (response.eval_count ?? 0),
      },
      cost: 0, // Local models are free
      model: response.model,
      provider: 'ollama',
    };
  }

  private extractResult(content: string): unknown {
    const trimmed = content.trim();
    if (trimmed.startsWith('{') && trimmed.endsWith('}')) {
      try {
        return JSON.parse(trimmed);
      } catch {}
    }
    return content;
  }
}

interface OllamaRequest {
  model: string;
  messages: { role: string; content: string }[];
  options?: {
    temperature?: number;
    num_predict?: number;
  };
  stream: boolean;
}

interface OllamaResponse {
  model: string;
  message: { role: string; content: string };
  done: boolean;
  prompt_eval_count?: number;
  eval_count?: number;
}
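
The provider above notes that Ollama lacks native tool support and that tool calling would have to be emulated through prompt engineering. A minimal sketch of that idea follows, assuming a JSON reply convention; the convention and helper names are illustrative assumptions, not part of the plan.

typescript
// Sketch: emulate tool calling for Ollama by instructing the model to reply
// with a JSON object when it wants a tool. The convention is illustrative.
export function buildToolPrompt(tools: { name: string; description: string }[]): string {
  const list = tools.map(t => `- ${t.name}: ${t.description}`).join('\n');
  return [
    'You may use the following tools:',
    list,
    'To call a tool, reply with ONLY a JSON object of the form:',
    '{"tool": "<name>", "input": { ... }}',
    'Otherwise, answer the user normally.',
  ].join('\n');
}

// Try to interpret a model reply as an emulated tool call.
export function parseEmulatedToolCall(
  content: string
): { tool: string; input: Record<string, unknown> } | null {
  try {
    const parsed = JSON.parse(content.trim());
    if (parsed && typeof parsed.tool === 'string' && typeof parsed.input === 'object' && parsed.input !== null) {
      return { tool: parsed.tool, input: parsed.input as Record<string, unknown> };
    }
  } catch {
    // Not JSON; treat as a normal text answer.
  }
  return null;
}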

4.2 When to Use Local vs Cloud

typescript
// ─── src/tools/llm-selector.ts ────────────────────────────────

/**
 * Decision matrix for choosing local vs cloud models
 */
interface ModelSelectionCriteria {
  // Use local (Ollama) when:
  local: {
    privacySensitive: boolean;    // Data cannot leave premises
    costBudgetExhausted: boolean; // Hit API cost limits
    offlineRequired: boolean;     // No internet connection
    latencyTolerant: boolean;     // Can accept slower inference
  };

  // Use cloud when:
  cloud: {
    needsAdvancedReasoning: boolean; // Complex planning/architecture
    needsToolCalling: boolean;       // Native tool support required
    needsSpeed: boolean;             // Fast response critical
    needsReliability: boolean;       // Production workload
  };
}

function selectProvider(criteria: ModelSelectionCriteria): 'ollama' | 'anthropic' | 'openai' {
  // Privacy always trumps other concerns
  if (criteria.local.privacySensitive) return 'ollama';

  // Offline requirement forces local
  if (criteria.local.offlineRequired) return 'ollama';

  // Tool calling needs cloud (for now)
  if (criteria.cloud.needsToolCalling) return 'anthropic';

  // Advanced reasoning needs strong model
  if (criteria.cloud.needsAdvancedReasoning) return 'anthropic';

  // Cost exhausted but still need to work
  if (criteria.local.costBudgetExhausted) return 'ollama';

  // Default to best cloud provider
  return 'anthropic';
}

5. Tool Use Protocol

How the tool calling loop actually works.

5.1 Tool Schema Conversion

typescript
// ─── src/tools/tool-converter.ts ──────────────────────────────
import { z } from 'zod';
import * as fs from 'node:fs/promises';
import type { Tool } from '../core/types.ts';
import type { ToolSchema as LLMToolSchema } from './llm.ts';
import { zodToJsonSchema } from 'zod-to-json-schema';

/**
 * Convert Forge Tool to LLM provider tool schema
 */
export function toolToLLMSchema(tool: Tool): LLMToolSchema {
  const jsonSchema = zodToJsonSchema(tool.schema.input) as {
    properties?: Record<string, unknown>;
    required?: string[];
  };
  return {
    name: tool.name,
    description: tool.description,
    inputSchema: {
      type: 'object',
      properties: jsonSchema.properties ?? {},
      required: jsonSchema.required ?? [],
    },
  };
}

/**
 * Example: File read tool
 */
const readFileTool: Tool = {
  name: 'read_file',
  description: 'Read contents of a file from the filesystem',
  schema: {
    input: z.object({
      path: z.string().describe('Absolute file path to read'),
      encoding: z.enum(['utf-8', 'binary']).default('utf-8'),
    }),
    output: z.object({
      content: z.string(),
      size: z.number(),
      mtime: z.date(),
    }),
  },
  execute: async (input, ctx) => {
    const content = await fs.readFile(input.path, input.encoding);
    const stats = await fs.stat(input.path);
    return {
      content,
      size: stats.size,
      mtime: stats.mtime,
    };
  },
};

// Converts to LLM schema:
const llmSchema = toolToLLMSchema(readFileTool);
// {
//   name: 'read_file',
//   description: 'Read contents of a file from the filesystem',
//   inputSchema: {
//     type: 'object',
//     properties: {
//       path: { type: 'string', description: 'Absolute file path to read' },
//       encoding: { type: 'string', enum: ['utf-8', 'binary'], default: 'utf-8' }
//     },
//     required: ['path']
//   }
// }

5.2 Tool Call Dispatch Loop

typescript
// ─── src/agents/tool-dispatcher.ts ────────────────────────────
import type { LLMProvider, ChatRequest, ChatResponse, Message, ToolCall } from '../tools/llm.ts';
import type { Tool, ToolContext } from '../core/types.ts';

/**
 * Execute the tool calling loop:
 * 1. Agent reasons and requests tool call
 * 2. We execute the tool
 * 3. Feed result back to agent
 * 4. Agent reasons again (loop until done)
 */
export class ToolDispatcher {
  constructor(
    private llm: LLMProvider,
    private tools: Tool[]
  ) {}

  async runWithTools(
    request: ChatRequest,
    context: ToolContext,
    maxIterations: number = 10
  ): Promise<ChatResponse> {
    const messages: Message[] = [...request.messages];
    let iteration = 0;

    while (iteration < maxIterations) {
      iteration++;

      // Ask LLM what to do next
      const response = await this.llm.chat({
        ...request,
        messages,
      });

      // If done, return final result
      if (response.done || !response.toolCalls || response.toolCalls.length === 0) {
        return response;
      }

      // Execute all tool calls (potentially in parallel)
      const toolResults = await this.executeToolCalls(
        response.toolCalls,
        context
      );

      // Add assistant's tool call request to history
      messages.push({
        role: 'assistant',
        content: response.content,
        toolCalls: response.toolCalls,
      });

      // Add tool results to history
      for (const result of toolResults) {
        messages.push({
          role: 'tool_result',
          toolCallId: result.callId,
          content: result.content,
          isError: result.error,
        });
      }

      // Loop back - LLM will reason about tool results
    }

    throw new Error(`Tool loop exceeded max iterations (${maxIterations})`);
  }

  private async executeToolCalls(
    calls: ToolCall[],
    context: ToolContext
  ): Promise<ToolResult[]> {
    // Execute in parallel (if safe to do so)
    const results = await Promise.all(
      calls.map(async call => {
        try {
          const tool = this.findTool(call.name);
          if (!tool) {
            return {
              callId: call.id,
              content: `Error: Tool "${call.name}" not found`,
              error: true,
            };
          }

          // Validate input
          const validated = tool.schema.input.parse(call.input);

          // Execute
          const result = await tool.execute(validated, context);

          // Serialize result
          return {
            callId: call.id,
            content: JSON.stringify(result),
            error: false,
          };
        } catch (error) {
          return {
            callId: call.id,
            content: `Error executing ${call.name}: ${(error as Error).message}`,
            error: true,
          };
        }
      })
    );

    return results;
  }

  private findTool(name: string): Tool | undefined {
    return this.tools.find(t => t.name === name);
  }
}

interface ToolResult {
  callId: string;
  content: string;
  error: boolean;
}

5.3 Multi-Turn Example

typescript
// Example: Agent using multiple tools to complete a task
const tools = [
  readFileTool,
  writeFileTool,
  runCommandTool,
];

const dispatcher = new ToolDispatcher(llm, tools);

const response = await dispatcher.runWithTools({
  system: 'You are a code reviewer. Find issues and suggest fixes.',
  messages: [{
    role: 'user',
    content: 'Review the authentication module in src/auth.ts',
  }],
  tools: tools.map(toolToLLMSchema),
}, context);

// Under the hood:
// Turn 1: LLM → "I'll read src/auth.ts"
//         Tool call: read_file({ path: 'src/auth.ts' })
//         Result: "function login(user, pass) { ... }"
//
// Turn 2: LLM → "I see a SQL injection vulnerability. Let me read tests."
//         Tool call: read_file({ path: 'src/auth.test.ts' })
//         Result: "test('login with valid creds', ...)"
//
// Turn 3: LLM → "Tests don't cover SQL injection. Here's my review..."
//         Response: { done: true, result: { findings: [...] } }

5.4 Parallel Tool Calls

When the LLM wants to call multiple tools at once.

typescript
// LLM response with parallel tool calls:
{
  toolCalls: [
    { id: '1', name: 'read_file', input: { path: 'src/auth.ts' } },
    { id: '2', name: 'read_file', input: { path: 'src/auth.test.ts' } },
    { id: '3', name: 'read_file', input: { path: 'src/db.ts' } }
  ]
}

// We execute all three in parallel:
const results = await Promise.all([
  executeToolCall(calls[0]),
  executeToolCall(calls[1]),
  executeToolCall(calls[2]),
]);

// Feed all results back to LLM in next turn

6. Model Selection Strategy

Dynamic model selection based on task complexity and budget.

6.1 Model Selector

typescript
// ─── src/tools/model-selector.ts ──────────────────────────────

interface ModelSelector {
  /**
   * Select the best model for a given task and budget
   */
  selectModel(
    taskType: TaskType,
    budget: BudgetState,
    options?: SelectionOptions
  ): ModelSelection;
}

type TaskType =
  | 'planning'       // Architecture, decomposition
  | 'implementation' // Code generation
  | 'review'         // Code review, finding bugs
  | 'testing'        // Test generation, analysis
  | 'reflection'     // Learning from outcomes
  | 'embedding';     // Semantic search

interface BudgetState {
  remaining: number;   // USD remaining in current budget
  allocated: number;   // USD allocated for this task type
  percentUsed: number; // % of total budget used so far
}

interface ModelSelection {
  provider: 'anthropic' | 'openai' | 'ollama';
  model: string;
  estimatedCost: number;
  rationale: string;
}

export class DefaultModelSelector implements ModelSelector {
  selectModel(
    taskType: TaskType,
    budget: BudgetState,
    options?: SelectionOptions
  ): ModelSelection {
    // Emergency: budget exhausted, use local
    if (budget.remaining < 0.1) {
      return {
        provider: 'ollama',
        model: 'llama3.1:8b',
        estimatedCost: 0,
        rationale: 'Budget exhausted, using local model',
      };
    }

    // Warning: budget running low, use cheaper model
    if (budget.percentUsed > 0.8) {
      return this.selectCheapModel(taskType, budget);
    }

    // Normal operation: select optimal model for task
    return this.selectOptimalModel(taskType, budget);
  }

  private selectOptimalModel(
    taskType: TaskType,
    budget: BudgetState
  ): ModelSelection {
    switch (taskType) {
      case 'planning':
        // Planning needs strong reasoning
        return {
          provider: 'anthropic',
          model: 'claude-sonnet-4-5-20250929',
          estimatedCost: 0.50, // Estimate for typical planning task
          rationale: 'Planning requires strong reasoning (Sonnet)',
        };

      case 'implementation':
        // Code generation needs strong model
        return {
          provider: 'anthropic',
          model: 'claude-sonnet-4-5-20250929',
          estimatedCost: 0.75,
          rationale: 'Code generation requires strong model (Sonnet)',
        };

      case 'review':
        // Review can use fast, cheap model
        return {
          provider: 'anthropic',
          model: 'claude-haiku-4-5-20251001',
          estimatedCost: 0.10,
          rationale: 'Review works well with fast model (Haiku)',
        };

      case 'testing':
        // Test generation and analysis - fast model sufficient
        return {
          provider: 'anthropic',
          model: 'claude-haiku-4-5-20251001',
          estimatedCost: 0.15,
          rationale: 'Test analysis works well with fast model (Haiku)',
        };

      case 'reflection':
        // Reflection - fast model sufficient
        return {
          provider: 'anthropic',
          model: 'claude-haiku-4-5-20251001',
          estimatedCost: 0.05,
          rationale: 'Reflection works well with fast model (Haiku)',
        };

      case 'embedding':
        // Embeddings - use local or cheap API
        return {
          provider: 'ollama',
          model: 'nomic-embed-text',
          estimatedCost: 0,
          rationale: 'Embeddings work well with local model',
        };

      default:
        // Default to Sonnet for unknown tasks
        return {
          provider: 'anthropic',
          model: 'claude-sonnet-4-5-20250929',
          estimatedCost: 0.50,
          rationale: 'Unknown task, using default strong model',
        };
    }
  }

  private selectCheapModel(
    taskType: TaskType,
    budget: BudgetState
  ): ModelSelection {
    // Budget warning: downgrade to cheapest viable option
    if (taskType === 'planning' || taskType === 'implementation') {
      // Critical tasks still need cloud, but use cheaper option
      return {
        provider: 'openai',
        model: 'gpt-4o-mini',
        estimatedCost: 0.10,
        rationale: 'Budget warning, using cheaper cloud model (GPT-4o-mini)',
      };
    }

    // Non-critical tasks can use local
    return {
      provider: 'ollama',
      model: 'llama3.1:8b',
      estimatedCost: 0,
      rationale: 'Budget warning, using local model',
    };
  }
}

6.2 Usage in Agent

typescript
// ─── In BaseAgent ─────────────────────────────────────────────
class BaseAgent {
  async execute(input: PhaseInput, ctx: AgentContext): Promise<PhaseOutput> {
    // Select model based on task type and budget
    const selection = ctx.modelSelector.selectModel(
      this.type, // 'planning' | 'implementation' | etc.
      ctx.budget.getState()
    );

    ctx.bus.emit({
      type: 'model.selected',
      payload: selection,
    });

    // Get provider for selected model
    const llm = ctx.llmFactory.getProvider(
      selection.provider,
      selection.model
    );

    // Use it for reasoning
    const response = await llm.chat({
      system: this.systemPrompt,
      messages: this.buildMessages(input),
      tools: this.tools.map(toolToLLMSchema),
    });

    // Track cost
    ctx.budget.recordSpend(response.cost);

    return response.result as PhaseOutput;
  }
}
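
The ctx.llmFactory used above is not defined elsewhere in this plan; a minimal sketch of the assumed shape follows, caching one instance per provider/model pair. The class name, constructor, and key handling are assumptions.

typescript
// Hypothetical factory implied by ctx.llmFactory.getProvider(...) above.
import type { LLMProvider } from '../tools/llm.ts';
import { AnthropicProvider } from '../tools/llm-anthropic.ts';
import { OpenAIProvider } from '../tools/llm-openai.ts';
import { OllamaProvider } from '../tools/llm-ollama.ts';

export class LLMFactory {
  private cache = new Map<string, LLMProvider>();

  constructor(private keys: { anthropic?: string; openai?: string }) {}

  getProvider(provider: 'anthropic' | 'openai' | 'ollama', model: string): LLMProvider {
    const key = `${provider}:${model}`;
    const existing = this.cache.get(key);
    if (existing) return existing;

    // Construct the concrete provider defined earlier in this plan.
    const instance =
      provider === 'anthropic' ? new AnthropicProvider(this.keys.anthropic ?? '', model) :
      provider === 'openai' ? new OpenAIProvider(this.keys.openai ?? '', model) :
      new OllamaProvider(model);

    this.cache.set(key, instance);
    return instance;
  }
}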

7. Cost Tracking

Per-call cost calculation and accumulation.

7.1 Cost Calculator

typescript
// ─── src/safety/cost-calculator.ts ────────────────────────────
export class CostCalculator {
  private pricing: Map<string, ModelPricing>;

  constructor() {
    this.pricing = new Map([
      // Anthropic
      ['claude-opus-4-6', { inputPer1M: 15.00, outputPer1M: 75.00 }],
      ['claude-sonnet-4-5-20250929', { inputPer1M: 3.00, outputPer1M: 15.00 }],
      ['claude-haiku-4-5-20251001', { inputPer1M: 0.25, outputPer1M: 1.25 }],
      // OpenAI
      ['gpt-4o', { inputPer1M: 2.50, outputPer1M: 10.00 }],
      ['gpt-4o-mini', { inputPer1M: 0.15, outputPer1M: 0.60 }],
      // Embeddings
      ['text-embedding-3-small', { inputPer1M: 0.02, outputPer1M: 0 }],
      ['text-embedding-3-large', { inputPer1M: 0.13, outputPer1M: 0 }],
    ]);
  }

  /**
   * Calculate cost for a completion
   */
  calculateCost(
    model: string,
    inputTokens: number,
    outputTokens: number
  ): number {
    const pricing = this.pricing.get(model);
    if (!pricing) {
      console.warn(`Unknown model pricing: ${model}, using default`);
      return 0;
    }

    const inputCost = (inputTokens / 1_000_000) * pricing.inputPer1M;
    const outputCost = (outputTokens / 1_000_000) * pricing.outputPer1M;
    return inputCost + outputCost;
  }

  /**
   * Estimate cost for a request (before making it)
   */
  estimateCost(
    model: string,
    estimatedInputTokens: number,
    estimatedOutputTokens: number
  ): number {
    return this.calculateCost(model, estimatedInputTokens, estimatedOutputTokens);
  }

  /**
   * Update pricing (if rates change)
   */
  updatePricing(model: string, pricing: ModelPricing): void {
    this.pricing.set(model, pricing);
  }
}

interface ModelPricing {
  inputPer1M: number;
  outputPer1M: number;
}

7.2 Cost Tracker (CostBreaker Integration)

typescript
// ─── src/safety/cost-tracker.ts ───────────────────────────────
import { ulid } from 'ulid';
import { eq, and, gte } from 'drizzle-orm';
import { db } from '../memory/db.ts';
import { events } from '../memory/schema.ts';
import { CostCalculator } from './cost-calculator.ts';

export class CostTracker {
  constructor(
    private calculator: CostCalculator
  ) {}

  /**
   * Record cost for an LLM call
   */
  async recordCost(
    traceId: string,
    phase: string,
    model: string,
    inputTokens: number,
    outputTokens: number
  ): Promise<CostRecord> {
    const cost = this.calculator.calculateCost(model, inputTokens, outputTokens);

    // Store in events table
    await db.insert(events).values({
      id: ulid(),
      traceId,
      timestamp: Date.now(),
      source: 'llm',
      type: 'llm.call',
      phase,
      payload: { model, inputTokens, outputTokens },
      tokensUsed: inputTokens + outputTokens,
      costUsd: cost,
    });

    return {
      cost,
      tokens: inputTokens + outputTokens,
      model,
    };
  }

  /**
   * Get total cost for a run
   */
  async getRunCost(traceId: string): Promise<number> {
    const costs = await db
      .select({ cost: events.costUsd })
      .from(events)
      .where(
        and(
          eq(events.traceId, traceId),
          eq(events.type, 'llm.call')
        )
      );

    return costs.reduce((sum, c) => sum + (c.cost ?? 0), 0);
  }

  /**
   * Get cost breakdown by phase
   */
  async getPhaseCosts(traceId: string): Promise<Record<string, number>> {
    const costs = await db
      .select({ phase: events.phase, cost: events.costUsd })
      .from(events)
      .where(
        and(
          eq(events.traceId, traceId),
          eq(events.type, 'llm.call')
        )
      );

    return costs.reduce((acc, c) => {
      const phase = c.phase ?? 'unknown';
      acc[phase] = (acc[phase] ?? 0) + (c.cost ?? 0);
      return acc;
    }, {} as Record<string, number>);
  }

  /**
   * Get daily spend
   */
  async getDailySpend(): Promise<number> {
    const today = new Date();
    today.setHours(0, 0, 0, 0);

    const costs = await db
      .select({ cost: events.costUsd })
      .from(events)
      .where(
        and(
          gte(events.timestamp, today.getTime()),
          eq(events.type, 'llm.call')
        )
      );

    return costs.reduce((sum, c) => sum + (c.cost ?? 0), 0);
  }

  /**
   * Check if budget limit is exceeded
   */
  async checkBudget(
    traceId: string,
    phase: string,
    limits: BudgetLimits
  ): Promise<BudgetCheck> {
    const runCost = await this.getRunCost(traceId);
    const dailyCost = await this.getDailySpend();
    const phaseCosts = await this.getPhaseCosts(traceId);
    const phaseCost = phaseCosts[phase] ?? 0;

    const violations: BudgetViolation[] = [];

    if (phaseCost >= limits.perPhase[phase]) {
      violations.push({
        type: 'phase',
        limit: limits.perPhase[phase],
        actual: phaseCost,
        severity: 'critical',
      });
    }

    if (runCost >= limits.perRun) {
      violations.push({
        type: 'run',
        limit: limits.perRun,
        actual: runCost,
        severity: 'critical',
      });
    }

    if (dailyCost >= limits.perDay) {
      violations.push({
        type: 'daily',
        limit: limits.perDay,
        actual: dailyCost,
        severity: 'critical',
      });
    }

    return {
      withinBudget: violations.length === 0,
      violations,
      current: { run: runCost, phase: phaseCost, daily: dailyCost },
      limits,
    };
  }
}

export interface BudgetLimits {
  perPhase: Record<string, number>;
  perRun: number;
  perDay: number;
}

export interface BudgetCheck {
  withinBudget: boolean;
  violations: BudgetViolation[];
  current: { run: number; phase: number; daily: number };
  limits: BudgetLimits;
}

export interface BudgetViolation {
  type: 'phase' | 'run' | 'daily';
  limit: number;
  actual: number;
  severity: 'warning' | 'critical';
}

export interface CostRecord {
  cost: number;
  tokens: number;
  model: string;
}
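
A usage sketch of the checkBudget API above, wired as a gate before a phase starts. The limits mirror the budget values in Section 13; the gatePhase helper is illustrative, not part of the plan.

typescript
// Illustrative wiring only: stop a phase before it overspends.
const tracker = new CostTracker(new CostCalculator());

const limits: BudgetLimits = {
  perPhase: { planning: 5.00, implementation: 10.00, review: 2.00, testing: 3.00, deployment: 2.00 },
  perRun: 50.00,
  perDay: 200.00,
};

async function gatePhase(traceId: string, phase: string): Promise<void> {
  const check = await tracker.checkBudget(traceId, phase, limits);
  if (!check.withinBudget) {
    // Violations carry the limit and the actual spend for reporting.
    const detail = check.violations
      .map(v => `${v.type} ${v.actual.toFixed(2)}/${v.limit.toFixed(2)} USD`)
      .join(', ');
    throw new Error(`Budget exceeded before ${phase}: ${detail}`);
  }
}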

8. Retry & Error Handling

LLM API resilience.

8.1 Error Types

typescript
// ─── src/tools/llm-errors.ts ──────────────────────────────────
export class RateLimitError extends Error {
  constructor(
    message: string,
    public retryAfter?: number // seconds
  ) {
    super(message);
    this.name = 'RateLimitError';
  }
}

export class InvalidRequestError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'InvalidRequestError';
  }
}

export class ProviderError extends Error {
  constructor(
    message: string,
    public isRetryable: boolean
  ) {
    super(message);
    this.name = 'ProviderError';
  }
}

export class ContentFilterError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'ContentFilterError';
  }
}

export class TimeoutError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'TimeoutError';
  }
}

8.2 Retry Logic

typescript
// ─── src/tools/llm-retry.ts ───────────────────────────────────
import type { LLMProvider, ChatRequest, ChatResponse } from './llm.ts';
import { RateLimitError, InvalidRequestError, ProviderError, ContentFilterError, TimeoutError } from './llm-errors.ts';

export class RetryableProvider implements LLMProvider {
  constructor(
    private inner: LLMProvider,
    private config: RetryConfig = DEFAULT_RETRY_CONFIG
  ) {}

  // Expose the wrapped provider's metadata
  get provider(): string { return this.inner.provider; }
  get model(): string { return this.inner.model; }

  async chat(request: ChatRequest): Promise<ChatResponse> {
    return this.withRetry(() => this.inner.chat(request));
  }

  async embed(text: string): Promise<Float32Array> {
    return this.withRetry(() => this.inner.embed(text));
  }

  private async withRetry<T>(
    fn: () => Promise<T>,
    attempt: number = 1
  ): Promise<T> {
    try {
      // Add timeout to prevent hanging
      return await this.withTimeout(fn, this.config.timeout);
    } catch (error) {
      // Don't retry if max attempts reached
      if (attempt >= this.config.maxRetries) {
        throw error;
      }

      // Check if error is retryable
      const shouldRetry = this.isRetryable(error);
      if (!shouldRetry) {
        throw error;
      }

      // Calculate backoff delay
      const delay = this.calculateBackoff(attempt, error);

      // Log retry
      console.warn(`LLM call failed (attempt ${attempt}), retrying in ${delay}ms`, error);

      // Wait and retry
      await this.sleep(delay);
      return this.withRetry(fn, attempt + 1);
    }
  }

  private isRetryable(error: unknown): boolean {
    if (error instanceof RateLimitError) return true;
    if (error instanceof TimeoutError) return true;
    if (error instanceof ProviderError) return error.isRetryable;
    if (error instanceof InvalidRequestError) return false;
    if (error instanceof ContentFilterError) return false;
    // Unknown errors are retryable by default
    return true;
  }

  private calculateBackoff(attempt: number, error: unknown): number {
    // Rate limit errors: respect retry-after header
    if (error instanceof RateLimitError && error.retryAfter) {
      return error.retryAfter * 1000;
    }

    // Exponential backoff: 1s, 2s, 4s, 8s, ...
    const exponential = Math.pow(2, attempt - 1) * 1000;

    // Add jitter to prevent thundering herd
    const jitter = Math.random() * 1000;

    return Math.min(exponential + jitter, this.config.maxBackoff);
  }

  private async withTimeout<T>(
    fn: () => Promise<T>,
    timeout: number
  ): Promise<T> {
    return Promise.race([
      fn(),
      this.sleep(timeout).then(() => {
        throw new TimeoutError(`LLM call timed out after ${timeout}ms`);
      }),
    ]);
  }

  private sleep(ms: number): Promise<void> {
    return new Promise(resolve => setTimeout(resolve, ms));
  }
}

interface RetryConfig {
  maxRetries: number;
  timeout: number;    // ms
  maxBackoff: number; // ms
}

const DEFAULT_RETRY_CONFIG: RetryConfig = {
  maxRetries: 3,
  timeout: 30_000,    // 30 seconds
  maxBackoff: 60_000, // 1 minute
};

8.3 Fallback Chain

typescript
// ─── src/tools/llm-fallback.ts ────────────────────────────────
export class FallbackProvider implements LLMProvider {
  constructor(
    private providers: LLMProvider[]
  ) {
    if (providers.length === 0) {
      throw new Error('FallbackProvider requires at least one provider');
    }
  }

  readonly provider = 'fallback' as const;

  get model(): string {
    return this.providers[0].model;
  }

  async chat(request: ChatRequest): Promise<ChatResponse> {
    let lastError: Error | undefined;

    for (const provider of this.providers) {
      try {
        const response = await provider.chat(request);

        // If not the first provider, log that we fell back
        if (provider !== this.providers[0]) {
          console.warn(`Fell back to ${provider.provider}/${provider.model}`);
        }

        return response;
      } catch (error) {
        lastError = error as Error;
        console.error(`Provider ${provider.provider}/${provider.model} failed:`, error);
        // Try next provider
      }
    }

    // All providers failed
    throw new Error(`All providers failed. Last error: ${lastError?.message}`);
  }

  async embed(text: string): Promise<Float32Array> {
    let lastError: Error | undefined;

    for (const provider of this.providers) {
      try {
        return await provider.embed(text);
      } catch (error) {
        lastError = error as Error;
      }
    }

    throw new Error(`All embedding providers failed. Last error: ${lastError?.message}`);
  }
}

// Example usage:
const primaryProvider = new AnthropicProvider(apiKey, 'claude-sonnet-4-5-20250929');
const cheaperProvider = new AnthropicProvider(apiKey, 'claude-haiku-4-5-20251001');
const localProvider = new OllamaProvider('llama3.1:8b');

const fallbackChain = new FallbackProvider([
  new RetryableProvider(primaryProvider), // Try Sonnet first with retries
  new RetryableProvider(cheaperProvider), // Fall back to Haiku if Sonnet fails
  localProvider,                          // Fall back to local if all cloud fails
]);

9. Caching

Reduce costs by caching responses.

9.1 Prompt Caching (Anthropic Feature)

typescript
// ─── src/tools/llm-prompt-cache.ts ────────────────────────────

/**
 * Anthropic's prompt caching feature.
 * Cache frequently used context (system prompt, code context).
 * Subsequent requests with same cached content pay reduced token cost.
 *
 * Hooks into the base class's request conversion so the cache markers
 * actually reach the API request.
 */
export class AnthropicProviderWithCache extends AnthropicProvider {
  protected toAnthropicFormat(request: ChatRequest): Anthropic.MessageCreateParams {
    const anthropicRequest = super.toAnthropicFormat(request);

    // If system prompt is large, enable caching
    if (request.system.length > 1000) {
      anthropicRequest.system = [
        {
          type: 'text',
          text: request.system,
          cache_control: { type: 'ephemeral' }, // Cache for 5 minutes
        },
      ];
    }

    // Large code context in user messages could be marked with
    // cache_control breakpoints in the same way.

    return anthropicRequest;
  }
}

9.2 Response Caching

typescript
// ─── src/tools/llm-cache.ts ───────────────────────────────────
import { createHash } from 'crypto';
import type { LLMProvider, ChatRequest, ChatResponse } from './llm.ts';

/**
 * Cache LLM responses for identical inputs.
 * Useful for deterministic tasks like linting, formatting.
 */
export class CachedProvider implements LLMProvider {
  private cache = new Map<string, ChatResponse>();
  private embedCache = new Map<string, Float32Array>();

  constructor(
    private inner: LLMProvider,
    private ttl: number = 60 * 60 * 1000 // 1 hour default
  ) {}

  get provider(): string { return this.inner.provider; }
  get model(): string { return this.inner.model; }

  async chat(request: ChatRequest): Promise<ChatResponse> {
    // Generate cache key from request
    const key = this.generateKey(request);

    // Check cache
    const cached = this.cache.get(key);
    if (cached) {
      console.debug('Cache hit for LLM request');
      return { ...cached, cost: 0 }; // Cached response costs nothing
    }

    // Cache miss - call provider
    const response = await this.inner.chat(request);

    // Store in cache
    this.cache.set(key, response);

    // Evict after TTL
    setTimeout(() => this.cache.delete(key), this.ttl);

    return response;
  }

  async embed(text: string): Promise<Float32Array> {
    const key = `embed:${text}`;

    const cached = this.embedCache.get(key);
    if (cached) {
      return cached;
    }

    const result = await this.inner.embed(text);
    this.embedCache.set(key, result);
    setTimeout(() => this.embedCache.delete(key), this.ttl);

    return result;
  }

  private generateKey(request: ChatRequest): string {
    // Hash the request for deterministic key
    const normalized = {
      system: request.system,
      messages: request.messages,
      tools: request.tools,
      temperature: request.temperature ?? 0.7,
      maxTokens: request.maxTokens ?? 4096,
    };

    const hash = createHash('sha256')
      .update(JSON.stringify(normalized))
      .digest('hex');

    return `${this.model}:${hash}`;
  }

  clear(): void {
    this.cache.clear();
    this.embedCache.clear();
  }
}

9.3 Embedding Cache (Persistent)

typescript
// ─── src/memory/embedding-cache.ts ────────────────────────────
import { db } from './db.ts';
import { sqliteTable, text, blob, integer } from 'drizzle-orm/sqlite-core';
import { eq, lt } from 'drizzle-orm';
import { createHash } from 'crypto';

const embeddingCache = sqliteTable('embedding_cache', {
  key: text('key').primaryKey(),
  embedding: blob('embedding').notNull(),
  model: text('model').notNull(),
  createdAt: integer('created_at', { mode: 'timestamp_ms' }).notNull(),
});

export class EmbeddingCache {
  async get(text: string, model: string): Promise<Float32Array | null> {
    const key = this.generateKey(text);

    const result = await db
      .select({ embedding: embeddingCache.embedding })
      .from(embeddingCache)
      .where(eq(embeddingCache.key, key))
      .limit(1);

    if (result.length === 0) return null;

    // Deserialize embedding
    return new Float32Array(result[0].embedding as ArrayBuffer);
  }

  async set(text: string, model: string, embedding: Float32Array): Promise<void> {
    const key = this.generateKey(text);

    await db.insert(embeddingCache).values({
      key,
      model,
      embedding: Buffer.from(embedding.buffer),
      createdAt: new Date(),
    }).onConflictDoNothing();
  }

  private generateKey(text: string): string {
    return createHash('sha256').update(text).digest('hex');
  }

  async prune(olderThan: Date): Promise<number> {
    const result = await db
      .delete(embeddingCache)
      .where(lt(embeddingCache.createdAt, olderThan));

    return result.rowsAffected;
  }
}

10. Testing

MockLLMProvider for deterministic tests.

10.1 Mock Provider

typescript
// ─── src/tools/llm-mock.ts ────────────────────────────────────
import type { LLMProvider, ChatRequest, ChatResponse } from './llm.ts';

export class MockLLMProvider implements LLMProvider {
  readonly provider = 'mock' as const;
  readonly model = 'mock-model';

  private responses: ChatResponse[] = [];
  private embeddings: Map<string, Float32Array> = new Map();
  private callLog: ChatRequest[] = [];

  /**
   * Configure canned responses for testing
   */
  addResponse(response: ChatResponse): void {
    this.responses.push(response);
  }

  /**
   * Configure canned embeddings
   */
  addEmbedding(text: string, embedding: Float32Array): void {
    this.embeddings.set(text, embedding);
  }

  async chat(request: ChatRequest): Promise<ChatResponse> {
    // Log the call
    this.callLog.push(request);

    // Return next canned response
    if (this.responses.length === 0) {
      throw new Error('MockLLMProvider: No more canned responses');
    }
    return this.responses.shift()!;
  }

  async embed(text: string): Promise<Float32Array> {
    const embedding = this.embeddings.get(text);
    if (!embedding) {
      throw new Error(`MockLLMProvider: No embedding configured for: ${text}`);
    }
    return embedding;
  }

  /**
   * Get all calls made to this provider (for assertions)
   */
  getCalls(): ChatRequest[] {
    return [...this.callLog];
  }

  /**
   * Reset state
   */
  reset(): void {
    this.responses = [];
    this.embeddings.clear();
    this.callLog = [];
  }
}

// Example usage in tests:
describe('PlannerAgent', () => {
  it('should decompose task into subtasks', async () => {
    const mockLLM = new MockLLMProvider();

    // Configure expected response
    mockLLM.addResponse({
      content: 'Task decomposition...',
      done: true,
      result: {
        tasks: [
          { id: '1', description: 'Create schema' },
          { id: '2', description: 'Implement endpoints' },
        ],
      },
      usage: { promptTokens: 100, completionTokens: 50, totalTokens: 150 },
      cost: 0.001,
      model: 'mock-model',
      provider: 'mock',
    });

    const planner = new PlannerAgent(mockLLM, tools);
    const result = await planner.execute({ task: 'Add user auth' }, ctx);

    expect(result.tasks).toHaveLength(2);
    expect(mockLLM.getCalls()).toHaveLength(1);
  });
});

10.2 Recording/Playback Mode

typescript
// ─── src/tools/llm-recorder.ts ────────────────────────────────
import * as fs from 'node:fs';
import type { LLMProvider, ChatRequest, ChatResponse } from './llm.ts';

/**
 * Record LLM interactions for playback in tests.
 * Useful for integration tests without hitting live API.
 */
export class RecordingProvider implements LLMProvider {
  private recordings: Recording[] = [];

  constructor(
    private inner: LLMProvider,
    private mode: 'record' | 'playback',
    private recordingPath?: string
  ) {
    if (mode === 'playback' && recordingPath) {
      this.loadRecordings(recordingPath);
    }
  }

  get provider(): string { return this.inner.provider; }
  get model(): string { return this.inner.model; }

  async chat(request: ChatRequest): Promise<ChatResponse> {
    if (this.mode === 'playback') {
      return this.playback(request);
    }

    // Record mode: call real provider and record
    const response = await this.inner.chat(request);
    this.recordings.push({
      request,
      response,
      timestamp: new Date().toISOString(),
    });
    return response;
  }

  async embed(text: string): Promise<Float32Array> {
    if (this.mode === 'playback') {
      const recording = this.recordings.find(
        r => 'type' in r.request && r.request.type === 'embed' && r.request.text === text
      );
      if (!recording) {
        throw new Error('No recording found for embedding request');
      }
      return recording.response as Float32Array;
    }

    const embedding = await this.inner.embed(text);
    this.recordings.push({
      request: { type: 'embed', text },
      response: embedding,
      timestamp: new Date().toISOString(),
    });
    return embedding;
  }

  private playback(request: ChatRequest): ChatResponse {
    // Find matching recording
    const recording = this.recordings.find(r =>
      this.requestsMatch(r.request as ChatRequest, request)
    );

    if (!recording) {
      throw new Error('No recording found for request');
    }

    return recording.response as ChatResponse;
  }

  private requestsMatch(a: ChatRequest, b: ChatRequest): boolean {
    // Simplified matching - could be more sophisticated
    return a.system === b.system && a.messages.length === b.messages.length;
  }

  saveRecordings(path: string): void {
    const json = JSON.stringify(this.recordings, null, 2);
    fs.writeFileSync(path, json);
  }

  private loadRecordings(path: string): void {
    const json = fs.readFileSync(path, 'utf-8');
    this.recordings = JSON.parse(json);
  }
}

interface Recording {
  request: ChatRequest | { type: 'embed'; text: string };
  response: ChatResponse | Float32Array;
  timestamp: string;
}

10.3 Token Counting Simulation

typescript
// ─── src/tools/llm-token-counter.ts ───────────────────────────

/**
 * Approximate token counting for budget testing.
 * Uses Claude's tokenizer approximation.
 */
export class TokenCounter {
  /**
   * Estimate tokens in text (Claude approximation)
   * Rule of thumb: ~4 characters per token for English text
   */
  estimateTokens(text: string): number {
    // This is a rough approximation
    // For accurate counting, use tiktoken or provider-specific tokenizers
    return Math.ceil(text.length / 4);
  }

  /**
   * Estimate tokens in a chat request
   */
  estimateRequestTokens(request: ChatRequest): number {
    let total = this.estimateTokens(request.system);

    for (const message of request.messages) {
      total += this.estimateTokens(message.content);

      if (message.role === 'assistant' && message.toolCalls) {
        for (const call of message.toolCalls) {
          total += this.estimateTokens(JSON.stringify(call.input));
        }
      }
    }

    if (request.tools) {
      for (const tool of request.tools) {
        total += this.estimateTokens(JSON.stringify(tool));
      }
    }

    return total;
  }
}

// Usage in MockLLMProvider:
export class MockLLMProvider {
  private tokenCounter = new TokenCounter();

  async chat(request: ChatRequest): Promise<ChatResponse> {
    const response = this.responses.shift()!;

    // Simulate realistic token counts
    const inputTokens = this.tokenCounter.estimateRequestTokens(request);
    const outputTokens = this.tokenCounter.estimateTokens(response.content);

    return {
      ...response,
      usage: {
        promptTokens: inputTokens,
        completionTokens: outputTokens,
        totalTokens: inputTokens + outputTokens,
      },
    };
  }
}

11. Implementation Checklist

Week 1: Foundation

  • Define core types (LLMProvider, ChatRequest, ChatResponse)
  • Implement Zod schemas for validation
  • Create AnthropicProvider with basic chat support
  • Add error types (RateLimitError, ProviderError, etc.)
  • Write unit tests for Anthropic provider

Week 2: Tool Calling

  • Implement tool schema conversion (toolToLLMSchema)
  • Build ToolDispatcher with multi-turn loop
  • Add support for parallel tool calls
  • Test tool calling with mock tools
  • Add OpenAI provider for diversity

Week 3: Cost & Selection

  • Create CostCalculator with pricing table
  • Implement CostTracker with database persistence
  • Build ModelSelector with task-based selection
  • Add budget checking integration
  • Test cost tracking end-to-end

Week 4: Resilience

  • Implement RetryableProvider with exponential backoff
  • Add FallbackProvider for provider chaining
  • Implement timeout handling
  • Test retry behavior with mock errors
  • Add rate limit handling

Week 5: Caching & Optimization

  • Implement response caching (CachedProvider)
  • Add embedding cache with SQLite persistence
  • Support Anthropic prompt caching
  • Test cache hit rates
  • Add cache metrics

Week 6: Local Models

  • Implement OllamaProvider
  • Add model management (pull, list)
  • Test local embeddings
  • Document local vs cloud tradeoffs
  • Add selection criteria for local models

Week 7: Testing Infrastructure

  • Create MockLLMProvider for unit tests
  • Implement recording/playback mode
  • Add token counting simulation
  • Write comprehensive test suite
  • Document testing patterns

Week 8: Polish & Integration

  • Integrate with agent base class
  • Add streaming support for long operations
  • Comprehensive error handling documentation
  • Performance benchmarks
  • Production readiness review

12. Usage Examples

Example 1: Simple Chat

typescript
const llm = new AnthropicProvider(apiKey);

const response = await llm.chat({
  system: 'You are a helpful coding assistant.',
  messages: [{
    role: 'user',
    content: 'Write a TypeScript function to reverse a string',
  }],
});

console.log(response.content);
// function reverse(str: string): string {
//   return str.split('').reverse().join('');
// }

Example 2: Tool Calling

typescript
const tools = [
  {
    name: 'read_file',
    description: 'Read a file from disk',
    inputSchema: {
      type: 'object' as const,
      properties: {
        path: { type: 'string', description: 'File path' },
      },
      required: ['path'],
    },
  },
];

const dispatcher = new ToolDispatcher(llm, [readFileTool]);

const response = await dispatcher.runWithTools({
  system: 'You can read files to help answer questions.',
  messages: [{
    role: 'user',
    content: 'What does package.json contain?',
  }],
  tools,
}, toolContext);

// Agent will:
// 1. Call read_file({ path: 'package.json' })
// 2. Get result
// 3. Answer: "The package.json contains dependencies like..."

Example 3: Cost-Aware Selection

typescript
const selector = new DefaultModelSelector();

const budget = {
  remaining: 5.00,
  allocated: 10.00,
  percentUsed: 0.5,
};

const selection = selector.selectModel('implementation', budget);
// {
//   provider: 'anthropic',
//   model: 'claude-sonnet-4-5-20250929',
//   estimatedCost: 0.75,
//   rationale: 'Code generation requires strong model'
// }

// Budget low? Switches to cheaper model
budget.percentUsed = 0.85;
const cheapSelection = selector.selectModel('review', budget);
// {
//   provider: 'anthropic',
//   model: 'claude-haiku-4-5-20251001',
//   estimatedCost: 0.10,
//   rationale: 'Review works well with fast model'
// }

Example 4: Fallback Chain

typescript
const primary = new RetryableProvider(
  new AnthropicProvider(apiKey, 'claude-sonnet-4-5-20250929')
);
const secondary = new RetryableProvider(
  new OpenAIProvider(openaiKey, 'gpt-4o-mini')
);
const local = new OllamaProvider('llama3.1:8b');

const robust = new FallbackProvider([primary, secondary, local]);

// Will try Anthropic, then OpenAI, then Ollama if both fail
const response = await robust.chat(request);

13. Configuration

typescript
// ─── forge.config.ts ──────────────────────────────────────────
export default defineConfig({
  llm: {
    // Primary provider
    provider: 'anthropic',
    apiKey: process.env.ANTHROPIC_API_KEY,

    // Model selection
    models: {
      planning: 'claude-sonnet-4-5-20250929',
      implementation: 'claude-sonnet-4-5-20250929',
      review: 'claude-haiku-4-5-20251001',
      testing: 'claude-haiku-4-5-20251001',
      reflection: 'claude-haiku-4-5-20251001',
    },

    // Fallback providers
    fallback: [
      {
        provider: 'openai',
        apiKey: process.env.OPENAI_API_KEY,
        model: 'gpt-4o-mini',
      },
      {
        provider: 'ollama',
        baseURL: 'http://localhost:11434',
        model: 'llama3.1:8b',
      },
    ],

    // Retry configuration
    retry: {
      maxRetries: 3,
      timeout: 30000,
      maxBackoff: 60000,
    },

    // Caching
    cache: {
      enabled: true,
      ttl: 3600000, // 1 hour
    },

    // Cost limits
    budget: {
      perPhase: {
        planning: 5.00,
        implementation: 10.00,
        review: 2.00,
        testing: 3.00,
        deployment: 2.00,
      },
      perRun: 50.00,
      perDay: 200.00,
    },
  },
});

14. Observability

Metrics to Track

typescript
// LLM-specific metrics
const llmMetrics = {
  // Call counts
  'llm.calls.total': counter({ labels: ['provider', 'model', 'phase'] }),
  'llm.calls.errors': counter({ labels: ['provider', 'model', 'error_type'] }),

  // Performance
  'llm.latency': histogram({ labels: ['provider', 'model'], buckets: [100, 500, 1000, 5000, 10000] }),
  'llm.tokens': histogram({ labels: ['provider', 'model', 'type'], buckets: [100, 500, 1000, 5000, 10000] }),

  // Cost
  'llm.cost': histogram({ labels: ['provider', 'model', 'phase'], buckets: [0.01, 0.1, 0.5, 1, 5, 10] }),
  'llm.cost.daily': gauge({ labels: ['provider'] }),

  // Cache
  'llm.cache.hits': counter({ labels: ['cache_type'] }),
  'llm.cache.misses': counter({ labels: ['cache_type'] }),

  // Retries & Fallbacks
  'llm.retries': counter({ labels: ['provider', 'reason'] }),
  'llm.fallbacks': counter({ labels: ['from_provider', 'to_provider'] }),
};

Event Emissions

typescript
// Events emitted by LLM layer
bus.emit({ type: 'llm.call.start', payload: { provider, model, phase } });
bus.emit({ type: 'llm.call.complete', payload: { provider, model, tokens, cost } });
bus.emit({ type: 'llm.call.error', payload: { provider, model, error } });
bus.emit({ type: 'llm.retry', payload: { provider, attempt, reason } });
bus.emit({ type: 'llm.fallback', payload: { from, to, reason } });
bus.emit({ type: 'llm.cache.hit', payload: { key } });
bus.emit({ type: 'llm.budget.warning', payload: { current, limit, scope } });
bus.emit({ type: 'llm.budget.exceeded', payload: { current, limit, scope } });

Dependencies

json
{ "dependencies": { "@anthropic-ai/sdk": "^0.32.0", "openai": "^4.77.0", "zod": "^3.24.1", "zod-to-json-schema": "^3.24.1", "drizzle-orm": "^0.36.4" }, "devDependencies": { "@types/node": "^22.10.5" } }

End of Implementation Plan

This plan provides TypeScript implementations for all 10 components of the LLM Provider Abstraction, covering the error handling, cost tracking, caching, and fallback mechanisms needed for production use.