diff --git a/WEBSOCKET-DEBUG-STATUS.md b/WEBSOCKET-DEBUG-STATUS.md
new file mode 100644
index 0000000..5205ee7
--- /dev/null
+++ b/WEBSOCKET-DEBUG-STATUS.md
@@ -0,0 +1,164 @@
+# WebSocket Debugging Status
+
+## ✅ What's Working
+
+1. **App loads without errors** - Fixed `__name is not defined` with a polyfill in `layout.tsx`
+2. **Model selection** - Dropdown populated with OpenRouter models
+3. **HTTP API routes** - All working:
+   - `/api/agent/[runId]/start` → 200 ✅
+   - `/api/agent/[runId]/status` → 200 ✅
+   - `/api/agent/[runId]/pause` → 200 ✅
+   - `/api/agent/[runId]/resume` → 200 ✅
+4. **Durable Object HTTP** - DO responds to HTTP requests correctly
+5. **UI state updates** - Status changes from IDLE → RUNNING and the agent message appears
+
+## ❌ What's Broken
+
+**WebSocket connection fails with a 500 error during the handshake**
+
+### Error Details
+```
+WebSocket connection to 'wss://bandit-runner-app.nicholaivogelfilms.workers.dev/api/agent/run-XXX/ws'
+failed: Error during WebSocket handshake: Unexpected response code: 500
+```
+
+### Test Results
+
+| Test | Result | Details |
+|------|--------|---------|
+| curl with WS headers | 426 | Returns "Expected Upgrade: websocket" |
+| Browser WebSocket | 500 | Handshake fails |
+| DO `/status` endpoint | 200 | DO is accessible |
+
+## Code Analysis
+
+### `/ws` Route (`src/app/api/agent/[runId]/ws/route.ts`)
+- ✅ Checks for `Upgrade: websocket` header
+- ✅ Gets DO stub correctly
+- ✅ Forwards request to DO
+- ⚠️ **curl gets 426, browser gets 500** - different behavior!
+
+### Durable Object WebSocket Code
+```javascript
+// In patch-worker.js (deployed to .open-next/worker.js)
+if (request.headers.get("Upgrade") === "websocket") {
+  const pair = new WebSocketPair();
+  const [client, server] = Object.values(pair);
+  this.ctx.acceptWebSocket(server);  // ✅ Modern Hibernatable API
+  return new Response(null, { status: 101, webSocket: client });
+}
+
+// WebSocket handler methods exist:
+async webSocketMessage(ws, message) { ... }
+async webSocketClose(ws, code, reason, wasClean) { ... }
+async webSocketError(ws, error) { ... }
+```
+
+### Verified Deployed Code
+- ✅ Polyfill at top of worker.js
+- ✅ `BanditAgentDO` class exported
+- ✅ WebSocket handling using Hibernatable API
+- ✅ Handler methods present
+
+## Possible Causes
+
+### 1. **Next.js/OpenNext Middleware Interception**
+- OpenNext may be intercepting WebSocket upgrades before they reach the route
+- Middleware might be stripping headers or modifying the request
+
+### 2. **Request Object Compatibility**
+- `NextRequest` forwarded to DO might not be compatible with DO's `fetch()`
+- Headers may be lost or modified during forwarding
+
+### 3. **Deployment Issue**
+- Even though the source looks correct, the deployed worker may differ
+- The bundling step may be altering the WebSocket code
+
+### 4. **Missing Secret**
+- `OPENROUTER_API_KEY` not set (though this shouldn't affect the WebSocket upgrade)
+
+## Next Steps to Try
+
+### Option A: Bypass Next.js Route Entirely
+Create a direct Worker route handler that doesn't go through Next.js:
+
+1. Add to `wrangler.jsonc`:
+```json
+{
+  "routes": [
+    {
+      "pattern": "*/ws/*",
+      "custom_domain": false,
+      "zone_name": "your-domain.com"
+    }
+  ]
+}
+```
+
+2. Create a Worker-native WebSocket handler
+
+### Option B: Use Service Bindings
+Instead of routing through Next.js, bind the app to a separate Worker that owns the DO (Service Bindings target Workers, not Durable Objects):
+
+```json
+{
+  "services": [
+    {
+      "binding": "WS_SERVICE",
+      "service": "websocket-handler",
+      "environment": "production"
+    }
+  ]
+}
+```
+
+### Option C: Deploy Separate DO Worker (RECOMMENDED)
+As outlined in the plan, this takes Next.js out of the WebSocket path entirely:
+
+1. Deploy the standalone DO worker:
+```bash
+cd workers/bandit-agent-do
+wrangler deploy
+```
+
+2. Update the main `wrangler.jsonc`:
+```jsonc
+{
+  "durable_objects": {
+    "bindings": [{
+      "name": "BANDIT_AGENT",
+      "class_name": "BanditAgentDO",
+      "script_name": "bandit-agent-do"  // External worker
+    }]
+  }
+}
+```
+
+3. Remove the patch script from the deploy process
+
+### Option D: Add Debug Logging and Re-test
+- Deploy with comprehensive logging
+- Use `wrangler tail` to capture actual request/response
+- Identify exact failure point
+
+## Current Theory
+
+**Most Likely**: Next.js/OpenNext is incompatible with WebSocket upgrades in API routes. The framework expects HTTP responses, not protocol upgrades, so a `101 Switching Protocols` response carrying a `webSocket` may not survive the trip back through it.
+
+**Evidence**:
+- curl gets 426, and that body (`Expected Upgrade: websocket`) comes from the route's own check, so the request reached the Next.js route without an `Upgrade` value equal to `websocket`
+- Browser (going through the full Next.js stack) gets 500
+- HTTP routes work fine (standard request/response)
+- WebSocket routes fail (protocol upgrade)
+
+## Recommendation
+
+**Proceed with Option C** (Separate DO Worker) as it:
+1. Completely bypasses Next.js/OpenNext
+2. Uses Cloudflare's recommended architecture
+3. Matches the plan we already created
+4. Eliminates all bundling/compatibility issues
+5. Provides independent deployment and debugging
+
+The inline DO + patch script approach was worth trying, but WebSocket upgrades likely need a native Worker environment, not a Next.js API route.
+
diff --git a/bandit-runner-app/package.json b/bandit-runner-app/package.json
index 9910741..dbc2226 100644
--- a/bandit-runner-app/package.json
+++ b/bandit-runner-app/package.json
@@ -7,7 +7,7 @@
     "build": "next build",
     "start": "next start",
     "lint": "next lint",
-    "deploy": "opennextjs-cloudflare build && node scripts/patch-worker.js && opennextjs-cloudflare deploy",
+    "deploy": "pnpm --filter bandit-agent-do run deploy && opennextjs-cloudflare build && node scripts/patch-worker.js && opennextjs-cloudflare deploy",
     "preview": "opennextjs-cloudflare build && node scripts/patch-worker.js && opennextjs-cloudflare preview",
     "cf-typegen": "wrangler types --env-interface CloudflareEnv ./cloudflare-env.d.ts"
   },
diff --git a/bandit-runner-app/pnpm-lock.yaml b/bandit-runner-app/pnpm-lock.yaml
index 09c26ad..3f52054 100644
--- a/bandit-runner-app/pnpm-lock.yaml
+++ b/bandit-runner-app/pnpm-lock.yaml
@@ -178,6 +178,18 @@ importers:
         specifier: ^4.42.1
         version: 4.42.1(@cloudflare/workers-types@4.20251008.0)
 
+  workers/bandit-agent-do:
+    devDependencies:
+      '@cloudflare/workers-types':
+        specifier: ^4.20251008.0
+        version: 4.20251008.0
+      typescript:
+        specifier: ^5
+        version: 5.9.3
+      wrangler:
+        specifier: ^4.42.1
+        version: 4.42.1(@cloudflare/workers-types@4.20251008.0)
+
 packages:
 
   '@ai-sdk/gateway@1.0.35':
diff --git a/bandit-runner-app/pnpm-workspace.yaml b/bandit-runner-app/pnpm-workspace.yaml
new file mode 100644
index 0000000..bfc7aa5
--- /dev/null
+++ b/bandit-runner-app/pnpm-workspace.yaml
@@ -0,0 +1,3 @@
+packages:
+  - 'workers/*'
+
diff --git a/bandit-runner-app/scripts/patch-worker.js b/bandit-runner-app/scripts/patch-worker.js
index 5ac4eeb..05a5cdf 100644
--- a/bandit-runner-app/scripts/patch-worker.js
+++ b/bandit-runner-app/scripts/patch-worker.js
@@ -1,288 +1,92 @@
 #!/usr/bin/env node
 
 /**
- * Patch the OpenNext worker to export Durable Objects
- * Directly inlines the DO code into the worker
+ * Patch the OpenNext worker to add WebSocket handling
+ * Intercepts WebSocket requests before they reach Next.js
  */
 
 const fs = require('fs')
 const path = require('path')
 
-console.log('🔨 Patching worker to export Durable Object...')
+console.log('🔨 Patching worker to add WebSocket handler...')
 
 const workerPath = path.join(__dirname, '../.open-next/worker.js')
-const doPath = path.join(__dirname, '../src/lib/durable-objects/BanditAgentDO.ts')
 
 if (!fs.existsSync(workerPath)) {
   console.error('❌ Worker file not found at:', workerPath)
   process.exit(1)
 }
 
-if (!fs.existsSync(doPath)) {
-  console.error('❌ Durable Object file not found at:', doPath)
-  process.exit(1)
-}
-
 // Read worker file
 let workerContent = fs.readFileSync(workerPath, 'utf-8')
 
 // Check if already patched
-if (workerContent.includes('export class BanditAgentDO')) {
+if (workerContent.includes('// WebSocket Intercept Handler')) {
   console.log('✅ Worker already patched, skipping')
   process.exit(0)
 }
 
-// Read the DO source (not used, but keep for reference)
-const doSource = fs.readFileSync(doPath, 'utf-8')
-
-// Create the DO class inline (minimal working version)
-const doCode = `
-// ===== Durable Object: BanditAgentDO =====
-
-export class BanditAgentDO {
-  constructor(ctx, env) {
-    this.ctx = ctx;
-    this.env = env;
-    this.state = null;
-    this.isRunning = false;
-  }
-
-  async fetch(request) {
-    try {
-      const url = new URL(request.url);
-      const pathname = url.pathname;
-
-      // Handle WebSocket upgrade using Hibernatable WebSockets API
-      if (request.headers.get("Upgrade") === "websocket") {
-        const pair = new WebSocketPair();
-        const [client, server] = Object.values(pair);
-
-        // Use modern Hibernatable WebSockets API
-        this.ctx.acceptWebSocket(server);
-
-        return new Response(null, { status: 101, webSocket: client });
-      }
-
-      // Handle HTTP requests
-      if (pathname.endsWith('/start')) {
-        const body = await request.json();
-
-        // Initialize state
-        this.state = {
-          runId: body.runId,
-          modelName: body.modelName,
-          status: 'running',
-          currentLevel: body.startLevel || 0,
-          targetLevel: body.endLevel || 33
-        };
-
-        // Save to storage
-        await this.ctx.storage.put('state', this.state);
-
-        // Broadcast to WebSocket clients
-        this.broadcast({
-          type: 'agent_message',
-          data: {
-            content: \`Run started: \${body.modelName} - Levels \${body.startLevel}-\${body.endLevel}\`,
-          },
-          timestamp: new Date().toISOString()
-        });
-
-        // Start agent execution in background
-        this.runAgent().catch(err => console.error('Agent error:', err));
-
-        return new Response(JSON.stringify({
-          success: true,
-          runId: body.runId,
-          state: this.state
-        }), {
-          headers: { 'Content-Type': 'application/json' }
-        });
-      }
-
-      if (pathname.endsWith('/pause')) {
-        if (this.state) {
-          this.state.status = 'paused';
-          this.isRunning = false;
-          await this.ctx.storage.put('state', this.state);
-        }
-        return new Response(JSON.stringify({ success: true, state: this.state }), {
-          headers: { 'Content-Type': 'application/json' }
-        });
-      }
-
-      if (pathname.endsWith('/resume')) {
-        if (this.state) {
-          this.state.status = 'running';
-          this.isRunning = true;
-          await this.ctx.storage.put('state', this.state);
-          this.runAgent().catch(err => console.error('Agent error:', err));
-        }
-        return new Response(JSON.stringify({ success: true, state: this.state }), {
-          headers: { 'Content-Type': 'application/json' }
-        });
-      }
-
-      if (pathname.endsWith('/status')) {
-        return new Response(JSON.stringify({
-          state: this.state,
-          isRunning: this.isRunning,
-          connectedClients: this.ctx.getWebSockets().length
-        }), {
-          headers: { 'Content-Type': 'application/json' }
-        });
-      }
-
-      return new Response('Not found', { status: 404 });
-    } catch (error) {
-      console.error('DO fetch error:', error);
-      return new Response(JSON.stringify({ error: error.message }), {
-        status: 500,
-        headers: { 'Content-Type': 'application/json' }
-      });
+// Create WebSocket intercept handler
+const wsInterceptCode = `
+// WebSocket Intercept Handler
+function handleWebSocketUpgrade(request, env) {
+  const url = new URL(request.url);
+  const upgradeHeader = request.headers.get('Upgrade');
+
+  // Check if this is a WebSocket upgrade for agent endpoints
+  if (upgradeHeader === 'websocket' && url.pathname.includes('/api/agent/') && url.pathname.endsWith('/ws')) {
+    // Extract runId from path: /api/agent/{runId}/ws
+    const pathParts = url.pathname.split('/');
+    const runIdIndex = pathParts.indexOf('agent') + 1;
+    const runId = pathParts[runIdIndex];
+
+    if (runId && env.BANDIT_AGENT) {
+      // Forward directly to Durable Object
+      const id = env.BANDIT_AGENT.idFromName(runId);
+      const stub = env.BANDIT_AGENT.get(id);
+      return stub.fetch(request);
     }
   }
-
-  // Hibernatable WebSockets API handlers
-  async webSocketMessage(ws, message) {
-    try {
-      if (typeof message !== 'string') return;
-      const data = JSON.parse(message);
-      if (data.type === 'ping') {
-        ws.send(JSON.stringify({ type: 'pong', timestamp: new Date().toISOString() }));
-      }
-    } catch (error) {
-      console.error('WebSocket message error:', error);
-    }
-  }
-
-  async webSocketClose(ws, code, reason, wasClean) {
-    console.log(\`WebSocket closed: Code \${code}, Reason: \${reason}, Clean: \${wasClean}\`);
-  }
-
-  async webSocketError(ws, error) {
-    console.error('WebSocket error:', error);
-  }
-
-  async runAgent() {
-    if (!this.state) return;
-    this.isRunning = true;
-
-    try {
-      // Call SSH proxy agent endpoint
-      const response = await fetch(\`\${this.env.SSH_PROXY_URL}/agent/run\`, {
-        method: 'POST',
-        headers: { 'Content-Type': 'application/json' },
-        body: JSON.stringify({
-          runId: this.state.runId,
-          modelName: this.state.modelName,
-          startLevel: this.state.currentLevel,
-          endLevel: this.state.targetLevel,
-          apiKey: this.env.OPENROUTER_API_KEY
-        })
-      });
-
-      // Stream agent events
-      const reader = response.body.getReader();
-      const decoder = new TextDecoder();
-
-      while (true) {
-        const { done, value } = await reader.read();
-        if (done) break;
-
-        const chunk = decoder.decode(value);
-        const lines = chunk.split('\\n').filter(l => l.trim());
-
-        for (const line of lines) {
-          try {
-            const event = JSON.parse(line);
-            this.broadcast(event);
-
-            // Update state based on events
-            if (event.type === 'level_complete') {
-              this.state.currentLevel = event.data.level + 1;
-            }
-            if (event.type === 'run_complete') {
-              this.state.status = 'complete';
-              this.isRunning = false;
-            }
-            if (event.type === 'error') {
-              this.state.status = 'failed';
-              this.state.error = event.data.content;
-              this.isRunning = false;
-            }
-          } catch (e) {
-            // Ignore parse errors
-          }
-        }
-      }
-    } catch (error) {
-      this.state.status = 'failed';
-      this.state.error = error.message;
-      this.isRunning = false;
-      this.broadcast({
-        type: 'error',
-        data: { content: error.message },
-        timestamp: new Date().toISOString()
-      });
-    }
-  }
-
-  broadcast(event) {
-    const message = JSON.stringify(event);
-    const sockets = this.ctx.getWebSockets();
-    console.log(\`Broadcasting \${event.type} to \${sockets.length} clients\`);
-    for (const socket of sockets) {
-      try {
-        socket.send(message);
-      } catch (error) {
-        console.error('Broadcast error:', error);
-      }
-    }
-  }
-
-  async alarm() {
-    // Cleanup after 2 hours
-    if (!this.isRunning && this.state) {
-      const startedAt = new Date(this.state.startedAt || 0).getTime();
-      if (Date.now() - startedAt > 2 * 60 * 60 * 1000) {
-        await this.ctx.storage.deleteAll();
-        this.state = null;
-      }
-    }
-    await this.ctx.storage.setAlarm(Date.now() + 60 * 60 * 1000);
-  }
+
+  return null; // Not a WebSocket request, continue normal handling
 }
-// ===== End Durable Object =====
-`
+`;
 
-// Insert DO code right after the other DO exports
-// Find the line with "export { BucketCachePurge }"
-const bucketCacheLine = 'export { BucketCachePurge } from "./.build/durable-objects/bucket-cache-purge.js";'
-const insertIndex = workerContent.indexOf(bucketCacheLine)
-
-if (insertIndex === -1) {
-  console.error('❌ Could not find insertion point in worker.js')
-  process.exit(1)
+// Find where to inject the WebSocket intercept
+const fetchFunctionStart = workerContent.indexOf('export default {');
+if (fetchFunctionStart === -1) {
+  console.error('❌ Could not find export default in worker.js');
+  process.exit(1);
 }
 
-// Insert right after that line
-const insertPosition = insertIndex + bucketCacheLine.length
+// Find the async fetch function
+const asyncFetchStart = workerContent.indexOf('async fetch(request, env, ctx) {', fetchFunctionStart);
+if (asyncFetchStart === -1) {
+  console.error('❌ Could not find async fetch function in worker.js');
+  process.exit(1);
+}
 
-// Add __name polyfill at the very beginning
-const polyfill = `
-// Polyfill for esbuild __name helper
-globalThis.__name = globalThis.__name || function(fn, name) { return fn };
-`
+// Find the opening brace of the fetch function
+const fetchBodyStart = workerContent.indexOf('{', asyncFetchStart) + 1;
+
+// Find the first return statement in the fetch body
+const returnStatement = workerContent.indexOf('return', fetchBodyStart);
 
+// Insert WebSocket intercept at the beginning of fetch, before the return
 const patchedContent =
-  polyfill + '\n' +
-  workerContent.slice(0, insertPosition) +
-  '\n' + doCode + '\n' +
-  workerContent.slice(insertPosition)
+  workerContent.slice(0, fetchBodyStart) +
+  wsInterceptCode +
+  `
+  // Check for WebSocket upgrades first (before Next.js)
+  const wsResponse = handleWebSocketUpgrade(request, env);
+  if (wsResponse) {
+    return wsResponse;
+  }
+
+  ` +
+  workerContent.slice(fetchBodyStart);
 
 // Write back
-fs.writeFileSync(workerPath, patchedContent, 'utf-8')
-
-console.log('✅ Worker patched successfully - BanditAgentDO exported')
-console.log('📝 Note: Using stub DO implementation. Full LangGraph integration via SSH proxy.')
+fs.writeFileSync(workerPath, patchedContent, 'utf-8');
+console.log('✅ Worker patched successfully - WebSocket handler added');
+console.log('📝 Note: WebSocket requests now bypass Next.js and go directly to DO');
diff --git a/bandit-runner-app/src/app/api/agent/[runId]/ws/route.ts b/bandit-runner-app/src/app/api/agent/[runId]/ws/route.ts
index 125b843..b7d009d 100644
--- a/bandit-runner-app/src/app/api/agent/[runId]/ws/route.ts
+++ b/bandit-runner-app/src/app/api/agent/[runId]/ws/route.ts
@@ -20,9 +20,13 @@ export async function GET(
   { params }: { params: { runId: string } }
 ) {
   const runId = params.runId
+  console.log('[WS Route] Incoming request for runId:', runId)
+  console.log('[WS Route] Headers:', Object.fromEntries(request.headers.entries()))
+
   const { env } = await getCloudflareContext()
 
   if (!env?.BANDIT_AGENT) {
+    console.error('[WS Route] Durable Object binding not found')
     return new Response("Durable Object binding not found", { status: 500 })
   }
 
@@ -32,14 +36,20 @@ export async function GET(
 
     // Create a new request with WebSocket upgrade headers
     const upgradeHeader = request.headers.get('Upgrade')
+    console.log('[WS Route] Upgrade header:', upgradeHeader)
+
     if (!upgradeHeader || upgradeHeader !== 'websocket') {
+      console.log('[WS Route] Invalid upgrade header, returning 426')
       return new Response('Expected Upgrade: websocket', { status: 426 })
     }
 
+    console.log('[WS Route] Forwarding to DO...')
     // Forward the request to DO
-    return await stub.fetch(request)
+    const response = await stub.fetch(request)
+    console.log('[WS Route] DO response status:', response.status)
+    return response
   } catch (error) {
-    console.error('WebSocket upgrade error:', error)
+    console.error('[WS Route] WebSocket upgrade error:', error)
     return new Response(
       error instanceof Error ? error.message : 'Unknown error',
       { status: 500 }
diff --git a/bandit-runner-app/workers/bandit-agent-do/package.json b/bandit-runner-app/workers/bandit-agent-do/package.json
new file mode 100644
index 0000000..ef34f22
--- /dev/null
+++ b/bandit-runner-app/workers/bandit-agent-do/package.json
@@ -0,0 +1,16 @@
+{
+  "name": "bandit-agent-do",
+  "version": "1.0.0",
+  "private": true,
+  "type": "module",
+  "scripts": {
+    "deploy": "wrangler deploy",
+    "tail": "wrangler tail"
+  },
+  "devDependencies": {
+    "@cloudflare/workers-types": "^4.20251008.0",
+    "typescript": "^5",
+    "wrangler": "^4.42.1"
+  }
+}
+
diff --git a/bandit-runner-app/workers/bandit-agent-do/tsconfig.json b/bandit-runner-app/workers/bandit-agent-do/tsconfig.json
new file mode 100644
index 0000000..cd977b4
--- /dev/null
+++ b/bandit-runner-app/workers/bandit-agent-do/tsconfig.json
@@ -0,0 +1,14 @@
+{
+  "compilerOptions": {
+    "target": "ES2020",
+    "module": "ES2020",
+    "lib": ["ES2020"],
+    "moduleResolution": "node",
+    "types": ["@cloudflare/workers-types"],
+    "strict": true,
+    "skipLibCheck": true,
+    "esModuleInterop": true
+  },
+  "include": ["src/**/*"]
+}
+
diff --git a/bandit-runner-app/workers/bandit-agent-do/wrangler.toml b/bandit-runner-app/workers/bandit-agent-do/wrangler.toml
new file mode 100644
index 0000000..4af88d8
--- /dev/null
+++ b/bandit-runner-app/workers/bandit-agent-do/wrangler.toml
@@ -0,0 +1,19 @@
+name = "bandit-agent-do"
+main = "src/index.ts"
+compatibility_date = "2024-01-01"
+account_id = "a19f770b9be1b20e78b8d25bdcfd3bbd"
+
+[durable_objects]
+bindings = [
+  { name = "BANDIT_AGENT", class_name = "BanditAgentDO" }
+]
+
+[[migrations]]
+tag = "v1"
+new_sqlite_classes = ["BanditAgentDO"]
+
+[vars]
+SSH_PROXY_URL = "https://bandit-ssh-proxy.fly.dev"
+MAX_RUN_DURATION_MINUTES = "60"
+MAX_RETRIES_PER_LEVEL = "3"
+
diff --git a/bandit-runner-app/wrangler.jsonc b/bandit-runner-app/wrangler.jsonc
index 53015bc..cc3883a 100644
--- a/bandit-runner-app/wrangler.jsonc
+++ b/bandit-runner-app/wrangler.jsonc
@@ -27,16 +27,11 @@
     "bindings": [
       {
         "name": "BANDIT_AGENT",
-        "class_name": "BanditAgentDO"
+        "class_name": "BanditAgentDO",
+        "script_name": "bandit-agent-do"
       }
     ]
   },
-  "migrations": [
-    {
-      "tag": "v1",
-      "new_sqlite_classes": ["BanditAgentDO"]
-    }
-  ],
   /**
    * Environment Variables
    * https://developers.cloudflare.com/workers/wrangler/configuration/#environment-variables
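
A note on the upgrade checks in this change: `route.ts`, the injected `handleWebSocketUpgrade()`, and the old inline DO all compare the header with `=== 'websocket'`. RFC 6455 (§4.2.1) has the server treat that value as ASCII case-insensitive, so a client sending `Upgrade: WebSocket` would get the route's 426 even though its handshake is valid. Browsers typically send lowercase, so this is unlikely to explain the browser's 500. The sketch below shows stricter helpers under stated assumptions: `isWebSocketUpgrade`, `extractRunId`, and `shouldForwardToDO` are hypothetical names that do not exist in the repo, and the `/api/agent/{runId}/ws` path shape is taken from the patch script.

```typescript
// Hypothetical helpers (not part of this change): stricter versions of the
// checks that route.ts and the injected handleWebSocketUpgrade() do inline.

// RFC 6455 §4.2.1: the Upgrade value "websocket" is compared ASCII
// case-insensitively; the header may also list several protocols.
function isWebSocketUpgrade(upgradeHeader: string | null): boolean {
  if (upgradeHeader === null) return false;
  return upgradeHeader
    .split(",")
    .some((token) => token.trim().toLowerCase() === "websocket");
}

// Accepts exactly /api/agent/{runId}/ws and returns the raw (still
// percent-encoded) runId segment, or null for any other path.
const WS_PATH = /^\/api\/agent\/([^/]+)\/ws$/;

function extractRunId(pathname: string): string | null {
  const match = WS_PATH.exec(pathname);
  return match?.[1] ?? null;
}

// Combines both checks: returns the runId to route to the DO, or null to
// fall through to normal (Next.js) handling.
function shouldForwardToDO(pathname: string, upgradeHeader: string | null): string | null {
  if (!isWebSocketUpgrade(upgradeHeader)) return null;
  return extractRunId(pathname);
}
```

Anchoring the regex rejects paths that merely contain `/api/agent/` (which `includes()` accepts), and returning the raw segment keeps the value passed to `idFromName(runId)` the same as in the injected handler.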