# WebSocket Debugging Status

## ✅ What's Working

1. **App loads without errors** - Fixed `__name is not defined` with polyfill in layout.tsx
2. **Model selection** - Dropdown populated with OpenRouter models
3. **HTTP API routes** - All working:
   - `/api/agent/[runId]/start` → 200 ✅
   - `/api/agent/[runId]/status` → 200 ✅
   - `/api/agent/[runId]/pause` → 200 ✅
   - `/api/agent/[runId]/resume` → 200 ✅
4. **Durable Object HTTP** - DO responds to HTTP requests correctly
5. **UI state updates** - Status changes from IDLE → RUNNING, agent message appears

## ❌ What's Broken

**WebSocket connection fails with 500 error during handshake**

### Error Details

```
WebSocket connection to 'wss://bandit-runner-app.nicholaivogelfilms.workers.dev/api/agent/run-XXX/ws' failed: Error during WebSocket handshake: Unexpected response code: 500
```

### Test Results

| Test | Result | Details |
|------|--------|---------|
| curl with WS headers | 426 | Returns "Expected Upgrade: websocket" |
| Browser WebSocket | 500 | Handshake fails |
| DO `/status` endpoint | 200 | DO is accessible |

## Code Analysis

### /ws Route (`src/app/api/agent/[runId]/ws/route.ts`)

- ✅ Checks for `Upgrade: websocket` header
- ✅ Gets DO stub correctly
- ✅ Forwards request to DO
- ⚠️ **curl gets 426, browser gets 500** - different behavior!

### Durable Object WebSocket Code

```javascript
// In patch-worker.js (deployed to .open-next/worker.js)
if (request.headers.get("Upgrade") === "websocket") {
  const pair = new WebSocketPair();
  const [client, server] = Object.values(pair);

  this.ctx.acceptWebSocket(server); // ✅ Modern Hibernatable API

  return new Response(null, { status: 101, webSocket: client });
}

// WebSocket handler methods exist:
async webSocketMessage(ws, message) { ... }
async webSocketClose(ws, code, reason, wasClean) { ... }
async webSocketError(ws, error) { ... }
```

### Verified Deployed Code

- ✅ Polyfill at top of worker.js
- ✅ `BanditAgentDO` class exported
- ✅ WebSocket handling using Hibernatable API
- ✅ Handler methods present

## Possible Causes

### 1. **Next.js/OpenNext Middleware Interception**

- OpenNext may be intercepting WebSocket upgrades before they reach the route
- Middleware might be stripping headers or modifying the request

### 2. **Request Object Compatibility**

- `NextRequest` forwarded to DO might not be compatible with DO's `fetch()`
- Headers may be lost/modified during forwarding

### 3. **Deployment Issue**

- Despite code looking correct, deployed worker may differ
- Bundling process may be corrupting WebSocket code

### 4. **Missing Secret**

- `OPENROUTER_API_KEY` not set (though this shouldn't affect WS upgrade)

## Next Steps to Try

### Option A: Bypass Next.js Route Entirely

Create a direct Worker route handler that doesn't go through Next.js:

1. Add to `wrangler.jsonc`:

   ```json
   {
     "routes": [
       {
         "pattern": "*/ws/*",
         "custom_domain": false,
         "zone_name": "your-domain.com"
       }
     ]
   }
   ```

2. Create Worker-native WebSocket handler

### Option B: Use Service Bindings

Instead of routing through Next.js, create a Service Binding to the DO:

```json
{
  "services": [
    {
      "binding": "WS_SERVICE",
      "service": "websocket-handler",
      "environment": "production"
    }
  ]
}
```

### Option C: Deploy Separate DO Worker (RECOMMENDED)

As outlined in the plan - this guarantees no Next.js interference:

```bash
# 1. Deploy standalone DO worker
cd workers/bandit-agent-do
wrangler deploy

# 2. Update main wrangler.jsonc
{
  "durable_objects": {
    "bindings": [{
      "name": "BANDIT_AGENT",
      "class_name": "BanditAgentDO",
      "script_name": "bandit-agent-do"  // External worker
    }]
  }
}

# 3. Remove patch script from deploy process
```

### Option D: Add Debug Logging and Re-test

- Deploy with comprehensive logging
- Use `wrangler tail` to capture actual request/response
- Identify exact failure point

## Current Theory

**Most Likely**: Next.js/OpenNext is incompatible with WebSocket upgrades in API routes. The framework expects HTTP responses, not protocol upgrades. This is a known limitation in serverless environments.

**Evidence**:

- curl (bypassing Next.js routing somehow) gets 426
- Browser (going through full Next.js stack) gets 500
- HTTP routes work fine (standard request/response)
- WebSocket routes fail (protocol upgrade)

## Recommendation

**Proceed with Option C** (Separate DO Worker) as it:

1. Completely bypasses Next.js/OpenNext
2. Uses Cloudflare's recommended architecture
3. Matches the plan we already created
4. Eliminates all bundling/compatibility issues
5. Provides independent deployment and debugging

The inline DO + patch script approach was worth trying, but WebSocket upgrades likely need a native Worker environment, not a Next.js API route.
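
### Sketch: Worker-native entry point for Option C

To make the recommendation more concrete, here is a minimal sketch of how a native Worker could route `/api/agent/[runId]/ws` upgrades straight to the DO without passing through Next.js. It is not the deployed code. The function names (`extractRunId`, `isWebSocketUpgrade`, `handleRequest`) are hypothetical, and addressing the DO with `idFromName(runId)` is an assumption about how runs map to instances. The `BANDIT_AGENT` binding name and the path shape are taken from the sections above.

```javascript
// Hypothetical entry point for a standalone Worker (illustrative names only).
// Path shape taken from the existing route: /api/agent/[runId]/ws
const WS_PATH = /^\/api\/agent\/([^/]+)\/ws$/;

// Returns the runId for a WebSocket path, or null if the path does not match.
function extractRunId(pathname) {
  const match = WS_PATH.exec(pathname);
  return match ? match[1] : null;
}

// The Upgrade header value is compared case-insensitively here,
// which is slightly more lenient than the strict === check in the DO.
function isWebSocketUpgrade(headers) {
  return (headers.get("Upgrade") || "").toLowerCase() === "websocket";
}

async function handleRequest(request, env) {
  const runId = extractRunId(new URL(request.url).pathname);
  if (runId === null) {
    return new Response("Not found", { status: 404 });
  }
  if (!isWebSocketUpgrade(request.headers)) {
    return new Response("Expected Upgrade: websocket", { status: 426 });
  }
  // Assumption: one DO instance per runId, addressed by name.
  const id = env.BANDIT_AGENT.idFromName(runId);
  // Forward the original request untouched so the DO sees the Upgrade header.
  return env.BANDIT_AGENT.get(id).fetch(request);
}

// In a Worker module this would be exposed as:
//   export default { fetch: handleRequest };
```

Forwarding the original `request` object, rather than rebuilding it, is deliberate: it avoids the header loss suspected under "Request Object Compatibility". Pairing this with `wrangler tail` (Option D) should show whether the upgrade reaches the DO.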