# PRD: N8N → Vercel AI SDK Migration
## Executive Summary
Migrate from n8n webhooks to a consolidated Vercel AI SDK backend to enable native streaming and tool-call support, eliminate the external n8n dependency, and streamline agent configuration. A single `/api/chat` endpoint replaces multiple n8n workflows.
**Model Provider:** OpenRouter (gpt-oss-120b)
**Framework:** Vercel AI SDK
**Deployment:** Cloudflare Workers (existing)
**Frontend Changes:** Minimal (streaming enabled, no UI/UX changes)
---
## Problem Statement
Current n8n architecture has three pain points:
1. **Streaming + Tool Calls:** n8n's response model doesn't naturally support streaming structured tool calls; requires fragile JSON parsing workarounds
2. **External Dependency:** Every chat request depends on n8n availability and response format consistency
3. **Morgan Complexity:** Custom agent creation routed through n8n visual workflows, adding friction to the "Agent Forge" experience
---
## Solution Overview
### Architecture Changes
```
[Frontend Chat Interface]
        ↓
[POST /api/chat (NEW)]
 ├─ Extracts agentId, message, sessionId, images
 ├─ Routes to unified agent handler
 └─ Returns Server-Sent Events stream
        ↓
[Agent Factory]
 ├─ Standard agents (agent-1, agent-2, etc.)
 │   └─ Pre-configured with system prompts + tools
 ├─ Custom agents (custom-{uuid})
 │   └─ Loaded from localStorage/KV, same config pattern
 └─ Morgan agent (special standard agent)
        ↓
[Vercel AI SDK]
 ├─ generateText() or streamText() for each agent
 ├─ LLM: OpenRouter (gpt-oss-120b)
 ├─ Tools: RAG (Qdrant), knowledge retrieval, etc.
 └─ Native streaming + structured tool call events
        ↓
[External Services]
 ├─ OpenRouter API (LLM)
 └─ Qdrant (RAG vector DB)
```
### Key Differences from N8N
| Aspect | N8N | Vercel AI SDK |
|--------|-----|--------------|
| **Tool Calls** | JSON strings in response text | Native message events (type: "tool-call") |
| **Streaming** | Text chunks (fragile with structured data) | Proper SSE with typed events |
| **Agent Config** | Visual workflows | Code-based definitions |
| **Custom Agents** | N8N workflows per agent | Loaded JSON configs + shared logic |
| **Dependencies** | External n8n instance | In-process (Cloudflare Worker) |
---
## Detailed Design
### 1. Agent System Architecture
#### Standard Agents (Pre-configured)
```typescript
// src/lib/agents/definitions.ts
interface AgentDefinition {
  id: string            // "agent-1", "agent-2", etc.
  name: string
  description: string
  systemPrompt: string
  tools: AgentTool[]    // Qdrant RAG, knowledge retrieval, etc.
  temperature?: number
  maxTokens?: number
  // Note: model is set globally via OPENROUTER_MODEL environment variable
}

export const STANDARD_AGENTS: Record<string, AgentDefinition> = {
  'agent-1': {
    id: 'agent-1',
    name: 'Research Assistant',
    description: 'Helps with research and analysis',
    systemPrompt: '...',
    tools: [qdrantRagTool(), ...],
    temperature: 0.7,
    maxTokens: 4096
  },
  'agent-2': {
    id: 'agent-2',
    name: 'Morgan - Agent Architect',
    description: 'Creates custom agents based on your needs',
    systemPrompt: '...',
    tools: [createAgentPackageTool()],
    temperature: 0.8,
    maxTokens: 2048
  },
  // ... more agents
}
```
#### Custom Agents (User-created via Morgan)
Custom agents are stored in localStorage (browser) and optionally in Workers KV (for server-side persistence):
```typescript
interface CustomAgent extends AgentDefinition {
  agentId: `custom-${string}`  // UUID format
  pinnedAt: string             // ISO timestamp
  note?: string
}

// Storage: localStorage.pinned-agents (existing structure)
// Optional: Workers KV for server-side persistence
```
Morgan outputs a `create_agent_package` tool call with this same structure. On the frontend, user actions (Use Now / Pin for Later) persist the agent to localStorage, as in the sketch below; the backend can sync to KV if needed.
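A minimal persistence sketch, assuming the existing `pinned-agents` localStorage key holds a JSON array of `CustomAgent` entries (the `pinAgent` helper name is illustrative, not part of the current codebase):
```typescript
// Hypothetical frontend helper; assumes `pinned-agents` stores an array of
// CustomAgent entries, matching the existing structure noted above.
function pinAgent(agent: Omit<CustomAgent, 'pinnedAt'>): void {
  const raw = localStorage.getItem('pinned-agents')
  const pinned: CustomAgent[] = raw ? JSON.parse(raw) : []
  pinned.push({ ...agent, pinnedAt: new Date().toISOString() })
  localStorage.setItem('pinned-agents', JSON.stringify(pinned))
}
```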
#### Agent Factory (Runtime)
```typescript
// src/lib/agents/factory.ts
async function getAgentDefinition(agentId: string): Promise<AgentDefinition> {
  // Standard agent
  if (STANDARD_AGENTS[agentId]) {
    return STANDARD_AGENTS[agentId]
  }
  // Custom agent - load from request context or KV
  if (agentId.startsWith('custom-')) {
    const customAgent = await loadCustomAgent(agentId)
    return customAgent
  }
  throw new Error(`Agent not found: ${agentId}`)
}
```
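The `loadCustomAgent` helper is referenced but not defined in this PRD. A minimal sketch, assuming custom agent configs are synced to a Workers KV namespace keyed by agentId (the KV binding and key scheme are assumptions; per the comment above, the config could equally arrive in the request context):
```typescript
// Hypothetical sketch; KVNamespace comes from @cloudflare/workers-types.
// Assumes custom agent configs are stored in KV keyed by agentId.
async function loadCustomAgent(
  agentId: string,
  kv: KVNamespace, // e.g. an AGENTS_KV binding threaded through from the route
): Promise<AgentDefinition> {
  const stored = await kv.get<AgentDefinition>(agentId, 'json')
  if (!stored) {
    throw new Error(`Custom agent not found: ${agentId}`)
  }
  return stored
}
```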
---
### 2. Chat API (`/api/chat`)
**Endpoint:** `POST /api/chat`
**Request:**
```typescript
interface ChatRequest {
  message: string
  agentId: string      // "agent-1", "custom-{uuid}", etc.
  sessionId: string    // "session-{agentId}-{timestamp}-{random}"
  images?: string[]    // Base64 encoded
  timestamp: number
}
```
**Response:** Server-Sent Events (SSE)
```
event: text
data: {"content":"Hello, I'm here to help..."}

event: tool-call
data: {"toolName":"qdrant_search","toolInput":{"query":"...","topK":5}}

event: tool-result
data: {"toolName":"qdrant_search","result":[...]}

event: finish
data: {"stopReason":"end_turn"}
```
**Implementation (sketch):**
```typescript
// src/app/api/chat/route.ts
import { NextRequest } from 'next/server'
import { streamText } from 'ai'
import { createOpenRouter } from '@openrouter/ai-sdk-provider'
import { getAgentDefinition } from '@/lib/agents/factory'

const openRouter = createOpenRouter({
  apiKey: process.env.OPENROUTER_API_KEY,
})

export async function POST(request: NextRequest) {
  const { message, agentId, sessionId, images } = await request.json()

  // Get agent definition
  const agent = await getAgentDefinition(agentId)

  // Prepare messages (history lives in localStorage per agent - front-end handles)
  const messages = [{ role: 'user' as const, content: message }]

  // Get model from environment variable
  const modelId = process.env.OPENROUTER_MODEL || 'openai/gpt-oss-120b'

  // Stream response
  const result = await streamText({
    model: openRouter(modelId),
    system: agent.systemPrompt,
    tools: agent.tools,
    messages,
    temperature: agent.temperature,
    maxTokens: agent.maxTokens,
  })

  // Return the SSE stream
  return result.toDataStreamResponse()
}
```
---
### 3. Morgan Agent (Custom Agent Creation)
Morgan is a standard agent (`agent-2`) with special tooling.
**Tool Definition:**
```typescript
import { tool } from 'ai'
import { z } from 'zod'
import { v4 as uuidv4 } from 'uuid'

const createAgentPackageTool = tool({
  description: 'Create a new AI agent with custom prompt and capabilities',
  parameters: z.object({
    displayName: z.string(),
    summary: z.string(),
    systemPrompt: z.string().describe('Web Agent Bundle formatted prompt'),
    tags: z.array(z.string()),
    recommendedIcon: z.string(),
    whenToUse: z.string(),
  }),
  execute: async (params) => {
    // Return structured data; frontend handles persistence
    return {
      success: true,
      agentId: `custom-${uuidv4()}`,
      ...params,
    }
  },
})
```
**Frontend Behavior (unchanged):**
- Detects tool call with `name: "create_agent_package"` (dispatch sketch after this list)
- Displays `AgentForgeCard` with reveal animation
- User clicks "Use Now" → calls `/api/agents/create` to register
- User clicks "Pin for Later" → saves to localStorage `pinned-agents`
- **Streaming now works naturally** (no more fragile JSON parsing)
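A minimal dispatch sketch for the detection step above, using the `toolName` field from the SSE format in section 2 (the `showAgentForgeCard` helper is hypothetical):
```typescript
// Hypothetical frontend dispatch for tool-call SSE events.
function handleToolCall(data: { toolName: string; toolInput?: unknown }): void {
  if (data.toolName === 'create_agent_package') {
    // Render the AgentForgeCard reveal with the structured agent package
    showAgentForgeCard(data.toolInput)
  }
}
```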
---
### 4. RAG Integration (Qdrant)
Define RAG tools as Vercel AI SDK tools:
```typescript
// src/lib/agents/tools/qdrant.ts
import { embed, tool } from 'ai'
import { z } from 'zod'
import { createOpenRouter } from '@openrouter/ai-sdk-provider'
import { QdrantClient } from '@qdrant/js-client-rest'

const openRouter = createOpenRouter({
  apiKey: process.env.OPENROUTER_API_KEY,
})

const qdrantRagTool = tool({
  description: 'Search knowledge base for relevant information',
  parameters: z.object({
    query: z.string(),
    topK: z.number().default(5),
    threshold: z.number().default(0.7),
  }),
  execute: async ({ query, topK, threshold }) => {
    // Get embedding via OpenRouter (text-embedding-3-large)
    const { embedding } = await embed({
      model: openRouter.textEmbeddingModel('openai/text-embedding-3-large'),
      value: query,
    })

    // Search Qdrant
    const client = new QdrantClient({
      url: process.env.QDRANT_URL,
      apiKey: process.env.QDRANT_API_KEY,
    })
    const results = await client.search('documents', {
      vector: embedding,
      limit: topK,
      score_threshold: threshold,
    })

    return results.map(r => ({
      content: r.payload.text,
      score: r.score,
      source: r.payload.source,
    }))
  },
})
```
---
### 5. Environment Configuration
**wrangler.jsonc updates:**
```jsonc
{
  "vars": {
    // LLM Configuration
    "OPENROUTER_API_KEY": "sk-or-...",
    "OPENROUTER_MODEL": "openai/gpt-oss-120b",

    // RAG Configuration
    "QDRANT_URL": "https://qdrant-instance.example.com",
    "QDRANT_API_KEY": "qdrant-key-...",

    // Feature Flags (existing)
    "IMAGE_UPLOADS_ENABLED": "true",
    "DIFF_TOOL_ENABLED": "true"
  }
}
```
**Notes:**
- `OPENROUTER_API_KEY` - Used for both the LLM (gpt-oss-120b) and embeddings (text-embedding-3-large). The inline value above is illustrative only; in practice, set API keys as encrypted secrets via `wrangler secret put` rather than plain `vars`
- `OPENROUTER_MODEL` - Controls the model for all agents; can be changed without redeploying agent definitions
- Feature flags: No changes needed (they work as-is)
---
### 6. Frontend Integration
**Minimal changes:**
1. **`/api/chat` now streams SSE events:**
- Client detects `event: text` → append to message
- Client detects `event: tool-call` → handle Morgan tool calls
- Client detects `event: finish` → mark message complete
2. **Message format stays the same:**
- Still stored in localStorage per agent
- sessionId management unchanged
- Image handling unchanged
3. **Morgan integration:**
- Tool calls parsed from SSE events (not JSON strings)
- `AgentForgeCard` display logic unchanged
- Pinned agents drawer unchanged
**Example streaming handler (pseudo-code):**
```typescript
const response = await fetch('/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(chatRequest), // a ChatRequest payload (see section 2)
})
const reader = response.body!.getReader()
const decoder = new TextDecoder()

let assistantMessage = ''
let currentEvent = ''
let buffer = ''

while (true) {
  const { done, value } = await reader.read()
  if (done) break

  // Buffer chunks so events split across reads still parse correctly
  buffer += decoder.decode(value, { stream: true })
  const lines = buffer.split('\n')
  buffer = lines.pop() ?? '' // keep any partial trailing line

  for (const line of lines) {
    if (line.startsWith('event:')) {
      currentEvent = line.slice(6).trim()
    } else if (line.startsWith('data:')) {
      const data = JSON.parse(line.slice(5))
      if (currentEvent === 'text') {
        assistantMessage += data.content
        setStreamingMessage(assistantMessage)
      } else if (currentEvent === 'tool-call') {
        handleToolCall(data)
      }
    }
  }
}
```
---
## Migration Plan
### Phase 1: Setup (1-2 days)
- [ ] Set up Vercel AI SDK in Next.js app
- [ ] Configure OpenRouter API key
- [ ] Create agent definitions structure
- [ ] Implement agent factory
### Phase 2: Core Chat Endpoint (2-3 days)
- [ ] Build `/api/chat` with Vercel `streamText()`
- [ ] Test streaming with standard agents
- [ ] Implement RAG tool with Qdrant
- [ ] Test tool calls + streaming together
### Phase 3: Morgan Agent (1-2 days)
- [ ] Define `create_agent_package` tool
- [ ] Test Morgan custom agent creation
- [ ] Verify frontend AgentForgeCard still works
### Phase 4: Frontend Streaming (1 day)
- [ ] Update chat interface to handle SSE events
- [ ] Test streaming message display
- [ ] Verify tool call handling
### Phase 5: Testing & Deployment (1 day)
- [ ] Unit tests for agent factory + tools
- [ ] Integration tests for chat endpoint
- [ ] Deploy to Cloudflare
- [ ] Smoke test all agents
### Phase 6: Cleanup (1 day)
- [ ] Remove n8n webhook references
- [ ] Update environment variable docs
- [ ] Archive old API routes
**Total Estimate:** 1-1.5 weeks
---
## Success Criteria
- [ ] All standard agents stream responses naturally
- [ ] Tool calls appear as first-class events (not JSON strings)
- [ ] Morgan creates custom agents with streaming
- [ ] Frontend displays streaming text + tool calls without jank
- [ ] RAG queries return relevant results
- [ ] Custom agents persist across page reloads
- [ ] Deployment to Cloudflare Workers succeeds
- [ ] No performance regression vs. n8n (ideally faster)
---
## Design Decisions (Locked)
1. **Custom Agent Storage:** localStorage only
- Future: Can migrate to Cloudflare KV for persistence/multi-device sync
- For now: Simple, no server-side state needed
2. **Model Selection:** Single model configured via environment variable
- All agents use `OPENROUTER_MODEL` (default: `openai/gpt-oss-120b`)
- Easy to change globally without redeploying agent definitions
- Per-agent model selection not needed at launch
3. **Embedding Model:** OpenRouter's `text-embedding-3-large`
- Used for Qdrant RAG queries
- Routed through OpenRouter API (same auth key as LLM)
- Verify OpenRouter has this model available
## Open Questions
1. **Error Handling:** How to handle OpenRouter rate limits or timeouts?
- **Recommendation:** Graceful error responses plus message queuing in localStorage (sketch below)
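A minimal sketch of the localStorage-queue half of that recommendation (the `pending-messages` key and helper names are assumptions; `ChatRequest` is the type from section 2):
```typescript
// Hypothetical client-side queue for messages that failed upstream.
function queuePendingMessage(req: ChatRequest): void {
  const raw = localStorage.getItem('pending-messages')
  const queue: ChatRequest[] = raw ? JSON.parse(raw) : []
  queue.push(req)
  localStorage.setItem('pending-messages', JSON.stringify(queue))
}

// Retry queued messages later (e.g. on reconnect); keep any that still fail.
async function flushPendingMessages(): Promise<void> {
  const raw = localStorage.getItem('pending-messages')
  const queue: ChatRequest[] = raw ? JSON.parse(raw) : []
  const remaining: ChatRequest[] = []
  for (const req of queue) {
    const res = await fetch('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(req),
    })
    if (!res.ok) remaining.push(req)
  }
  localStorage.setItem('pending-messages', JSON.stringify(remaining))
}
```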
---
## Dependencies
- `ai` (Vercel AI SDK) - Core agent framework
- `@openrouter/ai-sdk-provider` (OpenRouter provider for the Vercel AI SDK)
- `zod` (tool parameters validation)
- `@qdrant/js-client-rest` (Qdrant vector DB client)
- `next` 15.5.4 (existing)
- `uuid` (for custom agent IDs)
---
## Risks & Mitigations
| Risk | Mitigation |
|------|-----------|
| OpenRouter API key exposure | Store as Cloudflare Workers secrets (`wrangler secret put`), never shipped client-side |
| Token limit errors from large messages | Implement message compression + context window management |
| Qdrant downtime breaks RAG | Graceful fallback (agent responds without RAG context; sketch below) |
| Breaking streaming changes | Comprehensive integration tests before deployment |
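For the Qdrant-downtime row, a minimal fallback sketch (the `searchQdrant` helper is hypothetical shorthand for the embed-plus-search logic in the RAG tool from section 4):
```typescript
// Graceful RAG degradation: on a vector-DB outage, return an empty context
// set so the agent still answers from the model alone instead of erroring.
async function searchWithFallback(
  query: string,
  topK: number,
  threshold: number,
): Promise<Array<{ content: string; score: number; source: string }>> {
  try {
    return await searchQdrant(query, topK, threshold) // hypothetical helper
  } catch (err) {
    console.error('Qdrant unavailable; continuing without RAG context', err)
    return []
  }
}
```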