PRD: N8N → Vercel AI SDK Migration

Executive Summary

Migrate from n8n webhooks to a consolidated Vercel AI SDK backend to enable native streaming + tool-call support, eliminate the external service dependency, and streamline agent configuration. A single /api/chat endpoint replaces multiple n8n workflows.

Model Provider: OpenRouter (gpt-oss-120b)
Framework: Vercel AI SDK
Deployment: Cloudflare Workers (existing)
Frontend Changes: Minimal (streaming enabled, no UI/UX changes)


Problem Statement

Current n8n architecture has three pain points:

  1. Streaming + Tool Calls: n8n's response model doesn't naturally support streaming structured tool calls, requiring fragile JSON-parsing workarounds
  2. External Dependency: Every chat request depends on n8n availability and response format consistency
  3. Morgan Complexity: Custom agent creation routed through n8n visual workflows, adding friction to the "Agent Forge" experience

Solution Overview

Architecture Changes

[Frontend Chat Interface]
         ↓
[POST /api/chat (NEW)]
    ├─ Extracts agentId, message, sessionId, images
    ├─ Routes to unified agent handler
    └─ Returns Server-Sent Events stream
         ↓
[Agent Factory]
    ├─ Standard agents (agent-1, agent-2, etc.)
    │   └─ Pre-configured with system prompts + tools
    ├─ Custom agents (custom-{uuid})
    │   └─ Loaded from localStorage/KV, same config pattern
    └─ Morgan agent (special standard agent)
         ↓
[Vercel AI SDK]
    ├─ generateText() or streamText() for each agent
    ├─ LLM: OpenRouter (gpt-oss-120b)
    ├─ Tools: RAG (Qdrant), knowledge retrieval, etc.
    └─ Native streaming + structured tool call events
         ↓
[External Services]
    ├─ OpenRouter API (LLM)
    └─ Qdrant (RAG vector DB)

Key Differences from N8N

| Aspect | N8N | Vercel AI SDK |
| --- | --- | --- |
| Tool Calls | JSON strings in response text | Native message events (type: "tool-call") |
| Streaming | Text chunks (fragile with structured data) | Proper SSE with typed events |
| Agent Config | Visual workflows | Code-based definitions |
| Custom Agents | N8N workflows per agent | Loaded JSON configs + shared logic |
| Dependencies | External n8n instance | In-process (Cloudflare Worker) |

Detailed Design

1. Agent System Architecture

Standard Agents (Pre-configured)

// src/lib/agents/definitions.ts
interface AgentDefinition {
  id: string                          // "agent-1", "agent-2", etc.
  name: string
  description: string
  systemPrompt: string
  tools: AgentTool[]                  // Qdrant RAG, knowledge retrieval, etc.
  temperature?: number
  maxTokens?: number
  // Note: model is set globally via OPENROUTER_MODEL environment variable
}

export const STANDARD_AGENTS: Record<string, AgentDefinition> = {
  'agent-1': {
    id: 'agent-1',
    name: 'Research Assistant',
    description: 'Helps with research and analysis',
    systemPrompt: '...',
    tools: [qdrantRagTool, ...],
    temperature: 0.7,
    maxTokens: 4096
  },
  'agent-2': {
    id: 'agent-2',
    name: 'Morgan - Agent Architect',
    description: 'Creates custom agents based on your needs',
    systemPrompt: '...',
    tools: [createAgentPackageTool],
    temperature: 0.8,
    maxTokens: 2048
  },
  // ... more agents
}

Custom Agents (User-created via Morgan)

Custom agents are stored in localStorage (browser) and optionally in Workers KV (server-side persistence):

interface CustomAgent extends AgentDefinition {
  agentId: `custom-${string}`         // UUID format
  pinnedAt: string                    // ISO timestamp
  note?: string
}

// Storage: localStorage.pinned-agents (existing structure)
// Optional: Workers KV for server-side persistence

Morgan outputs a create_agent_package tool call with the same structure. On frontend, user actions (Use Now / Pin for Later) persist to localStorage; backend can sync to KV if needed.

Agent Factory (Runtime)

// src/lib/agents/factory.ts
async function getAgentDefinition(agentId: string): Promise<AgentDefinition> {
  // Standard agent
  if (STANDARD_AGENTS[agentId]) {
    return STANDARD_AGENTS[agentId]
  }

  // Custom agent - load from request context or KV
  if (agentId.startsWith('custom-')) {
    const customAgent = await loadCustomAgent(agentId)
    return customAgent
  }

  throw new Error(`Agent not found: ${agentId}`)
}
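
For the loadCustomAgent call above, a minimal sketch against Workers KV, assuming a JSON-blob-per-agent key scheme and that the KV binding is passed in explicitly (the factory sketch elides how the binding is threaded through; both details are illustrative, not existing config):

// src/lib/agents/custom.ts (sketch - key scheme and signatures are assumptions)
import type { AgentDefinition } from './definitions'

export async function saveCustomAgent(
  kv: KVNamespace,                    // Cloudflare Workers KV binding
  agent: AgentDefinition,
): Promise<void> {
  // One JSON blob per agent, keyed by its "custom-{uuid}" ID
  await kv.put(agent.id, JSON.stringify(agent))
}

export async function loadCustomAgent(
  kv: KVNamespace,
  agentId: string,
): Promise<AgentDefinition> {
  const raw = await kv.get(agentId)
  if (!raw) {
    throw new Error(`Custom agent not found: ${agentId}`)
  }
  return JSON.parse(raw) as AgentDefinition
}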

2. Chat API (/api/chat)

Endpoint: POST /api/chat

Request:

interface ChatRequest {
  message: string
  agentId: string                     // "agent-1", "custom-{uuid}", etc.
  sessionId: string                   // "session-{agentId}-{timestamp}-{random}"
  images?: string[]                   // Base64 encoded
  timestamp: number
}
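
For illustration, a matching client-side request (all values are made up):

// Example request (illustrative values only)
const res = await fetch('/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    message: 'Summarize our Q3 research notes',
    agentId: 'agent-1',
    sessionId: 'session-agent-1-1731792771000-x7k2',  // "session-{agentId}-{timestamp}-{random}"
    timestamp: Date.now(),
  }),
})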

Response: Server-Sent Events (SSE)

event: text
data: {"content":"Hello, I'm here to help..."}

event: tool-call
data: {"toolName":"qdrant_search","toolInput":{"query":"...","topK":5}}

event: tool-result
data: {"toolName":"qdrant_search","result":[...]}

event: finish
data: {"stopReason":"end_turn"}

Implementation (sketch):

// src/app/api/chat/route.ts
import { NextRequest } from 'next/server'
import { streamText } from 'ai'
import { createOpenRouter } from '@openrouter/ai-sdk-provider'
import { getAgentDefinition } from '@/lib/agents/factory'

const openRouter = createOpenRouter({ apiKey: process.env.OPENROUTER_API_KEY })

export async function POST(request: NextRequest) {
  const { message, agentId, sessionId, images } = await request.json()

  // Get agent definition
  const agent = await getAgentDefinition(agentId)

  // Prepare messages (history lives in localStorage per agent - frontend sends it)
  const messages = [{ role: 'user' as const, content: message }]

  // Get model from environment variable
  const modelId = process.env.OPENROUTER_MODEL || 'openai/gpt-oss-120b'

  // Stream response
  const result = streamText({
    model: openRouter(modelId),
    system: agent.systemPrompt,
    tools: agent.tools,
    messages,
    temperature: agent.temperature,
    maxTokens: agent.maxTokens,
  })

  // Return the stream as an SSE HTTP response
  return result.toDataStreamResponse()
}

3. Morgan Agent (Custom Agent Creation)

Morgan is a standard agent (agent-2) with special tooling.

Tool Definition:

import { tool } from 'ai'
import { z } from 'zod'
import { v4 as uuidv4 } from 'uuid'

const createAgentPackageTool = tool({
  description: 'Create a new AI agent with custom prompt and capabilities',
  parameters: z.object({
    displayName: z.string(),
    summary: z.string(),
    systemPrompt: z.string().describe('Web Agent Bundle formatted prompt'),
    tags: z.array(z.string()),
    recommendedIcon: z.string(),
    whenToUse: z.string(),
  }),
  execute: async (params) => {
    // Return structured data; frontend handles persistence
    return {
      success: true,
      agentId: `custom-${uuidv4()}`,
      ...params,
    }
  },
})

Frontend Behavior (unchanged):

  • Detects tool call with name: "create_agent_package" (see the handler sketch after this list)
  • Displays AgentForgeCard with reveal animation
  • User clicks "Use Now" → calls /api/agents/create to register
  • User clicks "Pin for Later" → saves to localStorage pinned-agents
  • Streaming now works naturally (no more fragile JSON parsing)
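
A minimal sketch of that detect-and-pin flow, assuming the tool-result payload mirrors create_agent_package's return value and that showAgentForgeCard is the entry point to the existing card component (both are assumptions):

// Frontend handling of Morgan's tool result (sketch)
interface AgentPackage {
  agentId: string                     // "custom-{uuid}"
  displayName: string
  summary: string
  systemPrompt: string
  tags: string[]
  recommendedIcon: string
  whenToUse: string
}

// Assumed to exist: the component that runs the reveal animation
declare function showAgentForgeCard(pkg: AgentPackage): void

function handleToolResult(event: { toolName: string; result: AgentPackage }) {
  if (event.toolName !== 'create_agent_package') return
  showAgentForgeCard(event.result)
}

function pinForLater(pkg: AgentPackage) {
  // "Pin for Later" appends to the existing pinned-agents structure
  const pinned = JSON.parse(localStorage.getItem('pinned-agents') ?? '[]')
  pinned.push({ ...pkg, pinnedAt: new Date().toISOString() })
  localStorage.setItem('pinned-agents', JSON.stringify(pinned))
}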

4. RAG Integration (Qdrant)

Define RAG tools as Vercel AI SDK tools:

// src/lib/agents/tools/qdrant.ts
import { embed, tool } from 'ai'
import { z } from 'zod'
import { createOpenRouter } from '@openrouter/ai-sdk-provider'
import { QdrantClient } from '@qdrant/js-client-rest'

const openRouter = createOpenRouter({ apiKey: process.env.OPENROUTER_API_KEY })

const qdrantRagTool = tool({
  description: 'Search knowledge base for relevant information',
  parameters: z.object({
    query: z.string(),
    topK: z.number().default(5),
    threshold: z.number().default(0.7),
  }),
  execute: async ({ query, topK, threshold }) => {
    // Get embedding via OpenRouter (text-embedding-3-large)
    const { embedding } = await embed({
      model: openRouter.textEmbeddingModel('openai/text-embedding-3-large'),
      value: query,
    })

    // Search Qdrant
    const client = new QdrantClient({
      url: process.env.QDRANT_URL,
      apiKey: process.env.QDRANT_API_KEY,
    })

    const results = await client.search('documents', {
      vector: embedding,
      limit: topK,
      score_threshold: threshold,
    })

    return results.map(r => ({
      content: r.payload.text,
      score: r.score,
      source: r.payload.source,
    }))
  },
})

5. Environment Configuration

wrangler.jsonc updates:

{
  "vars": {
    // LLM Configuration
    "OPENROUTER_API_KEY": "sk-or-...",
    "OPENROUTER_MODEL": "openai/gpt-oss-120b",

    // RAG Configuration
    "QDRANT_URL": "https://qdrant-instance.example.com",
    "QDRANT_API_KEY": "qdrant-key-...",

    // Feature Flags (existing)
    "IMAGE_UPLOADS_ENABLED": "true",
    "DIFF_TOOL_ENABLED": "true"
  }
}

Notes:

  • OPENROUTER_API_KEY / QDRANT_API_KEY - Values above are placeholders; in practice, store these as Worker secrets (wrangler secret put) rather than plaintext vars
  • OPENROUTER_API_KEY - Used for both the LLM (gpt-oss-120b) and embeddings (text-embedding-3-large)
  • OPENROUTER_MODEL - Controls the model for all agents; can be changed without redeploying agent definitions
  • Feature flags: No changes needed (still work as-is)

6. Frontend Integration

Minimal changes:

  1. /api/chat now streams SSE events:

    • Client detects event: text → append to message
    • Client detects event: tool-call → handle Morgan tool calls
    • Client detects event: finish → mark message complete
  2. Message format stays the same:

    • Still stored in localStorage per agent
    • sessionId management unchanged
    • Image handling unchanged
  3. Morgan integration:

    • Tool calls parsed from SSE events (not JSON strings)
    • AgentForgeCard display logic unchanged
    • Pinned agents drawer unchanged

Example streaming handler (pseudo-code):

const response = await fetch('/api/chat', { method: 'POST', body: ... })
const reader = response.body.getReader()
const decoder = new TextDecoder()
let assistantMessage = ''
let eventType = ''
let buffer = ''

while (true) {
  const { done, value } = await reader.read()
  if (done) break

  // Buffer partial lines - SSE events can split across reads
  buffer += decoder.decode(value, { stream: true })
  const lines = buffer.split('\n')
  buffer = lines.pop() ?? ''

  for (const line of lines) {
    if (line.startsWith('event:')) {
      eventType = line.slice(6).trim()
    } else if (line.startsWith('data:')) {
      const data = JSON.parse(line.slice(5))

      if (eventType === 'text') {
        assistantMessage += data.content
        setStreamingMessage(assistantMessage)
      } else if (eventType === 'tool-call') {
        handleToolCall(data)
      }
    }
  }
}

Migration Plan

Phase 1: Setup (1-2 days)

  • Set up Vercel AI SDK in Next.js app
  • Configure OpenRouter API key
  • Create agent definitions structure
  • Implement agent factory

Phase 2: Core Chat Endpoint (2-3 days)

  • Build /api/chat with Vercel streamText()
  • Test streaming with standard agents
  • Implement RAG tool with Qdrant
  • Test tool calls + streaming together

Phase 3: Morgan Agent (1-2 days)

  • Define create_agent_package tool
  • Test Morgan custom agent creation
  • Verify frontend AgentForgeCard still works

Phase 4: Frontend Streaming (1 day)

  • Update chat interface to handle SSE events
  • Test streaming message display
  • Verify tool call handling

Phase 5: Testing & Deployment (1 day)

  • Unit tests for agent factory + tools
  • Integration tests for chat endpoint
  • Deploy to Cloudflare
  • Smoke test all agents

Phase 6: Cleanup (1 day)

  • Remove n8n webhook references
  • Update environment variable docs
  • Archive old API routes

Total Estimate: 7-10 working days (roughly 1.5-2 weeks)


Success Criteria

  • All standard agents stream responses naturally
  • Tool calls appear as first-class events (not JSON strings)
  • Morgan creates custom agents with streaming
  • Frontend displays streaming text + tool calls without jank
  • RAG queries return relevant results
  • Custom agents persist across page reloads
  • Deployment to Cloudflare Workers succeeds
  • No performance regression vs. n8n (ideally faster)

Design Decisions (Locked)

  1. Custom Agent Storage: localStorage only

    • Future: Can migrate to Cloudflare KV for persistence/multi-device sync
    • For now: Simple, no server-side state needed
  2. Model Selection: Single model configured via environment variable

    • All agents use OPENROUTER_MODEL (default: openai/gpt-oss-120b)
    • Easy to change globally without redeploying agent definitions
    • Per-agent model selection not needed at launch
  3. Embedding Model: OpenRouter's text-embedding-3-large

    • Used for Qdrant RAG queries
    • Routed through OpenRouter API (same auth key as LLM)
    • Verify OpenRouter has this model available

Open Questions

  1. Error Handling: How to handle OpenRouter rate limits or timeouts?
    • Recommendation: Graceful error responses with retry/backoff (sketched below), plus message queuing in localStorage
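
A minimal sketch of the retry half of that recommendation; the attempt count and delays are illustrative:

// Retry transient OpenRouter failures with exponential backoff (sketch)
async function withRetries<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn()
    } catch (err) {
      lastError = err
      // Back off: 500ms, 1s, 2s, ...
      await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** attempt))
    }
  }
  throw lastError
}

The chat route could wrap its model call in this helper, while the client queues unsent messages in localStorage and retries on reconnect.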

Dependencies

  • ai (Vercel AI SDK) - Core agent framework
  • @openrouter/ai-sdk-provider (OpenRouter community provider for the Vercel AI SDK)
  • zod (tool parameters validation)
  • @qdrant/js-client-rest (Qdrant vector DB client)
  • next 15.5.4 (existing)
  • uuid (for custom agent IDs)

Risks & Mitigations

| Risk | Mitigation |
| --- | --- |
| OpenRouter API key exposure | Store as a Cloudflare Worker secret (wrangler secret put); never expose client-side |
| Token limit errors from large messages | Message compression + context window management (see sketch below) |
| Qdrant downtime breaks RAG | Graceful fallback (agent responds without RAG context) |
| Breaking streaming changes | Comprehensive integration tests before deployment |
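
For the context-window row above, a minimal trimming sketch; the token budget and the 4-characters-per-token heuristic are assumptions:

// Keep the newest messages under an approximate token budget (sketch)
interface ChatMessage {
  role: 'user' | 'assistant'
  content: string
}

function trimToBudget(messages: ChatMessage[], maxTokens = 6000): ChatMessage[] {
  const approxTokens = (s: string) => Math.ceil(s.length / 4)  // rough heuristic
  const kept: ChatMessage[] = []
  let used = 0
  // Walk newest to oldest, keeping whatever fits in the budget
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = approxTokens(messages[i].content)
    if (used + cost > maxTokens) break
    kept.unshift(messages[i])
    used += cost
  }
  return kept
}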