# WebSocket Debugging Status

## ✅ What's Working

1. **App loads without errors** - Fixed `__name is not defined` with polyfill in `layout.tsx`
2. **Model selection** - Dropdown populated with OpenRouter models
3. **HTTP API routes** - All working:
   - `/api/agent/[runId]/start` → 200 ✅
   - `/api/agent/[runId]/status` → 200 ✅
   - `/api/agent/[runId]/pause` → 200 ✅
   - `/api/agent/[runId]/resume` → 200 ✅
4. **Durable Object HTTP** - DO responds to HTTP requests correctly
5. **UI state updates** - Status changes from IDLE → RUNNING, agent message appears

## ❌ What's Broken

**WebSocket connection fails with 500 error during handshake**

### Error Details
```
WebSocket connection to 'wss://bandit-runner-app.nicholaivogelfilms.workers.dev/api/agent/run-XXX/ws'
failed: Error during WebSocket handshake: Unexpected response code: 500
```

### Test Results

| Test | Result | Details |
|------|--------|---------|
| curl with WS headers | 426 | Returns "Expected Upgrade: websocket" |
| Browser WebSocket | 500 | Handshake fails |
| DO `/status` endpoint | 200 | DO is accessible |
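
## Code Analysis

### /ws Route (`src/app/api/agent/[runId]/ws/route.ts`)
- ✅ Checks for `Upgrade: websocket` header
- ✅ Gets DO stub correctly
- ✅ Forwards request to DO
- ⚠️ **curl gets 426, browser gets 500** - the two clients fail at different points: the 426 means curl's request reached a handler that did not see `Upgrade: websocket`, while the browser's handshake fails with a server error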
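
The `Upgrade` check listed above can be sketched roughly as follows. This is an illustration only, not the contents of `route.ts`; `isWebSocketUpgrade` and `upgradeRequired` are hypothetical names. The comparison is case-insensitive, which is slightly more lenient than the strict `=== "websocket"` comparison used in the DO excerpt.

```javascript
// Hypothetical helpers; not the actual contents of route.ts.

// True when the headers ask for a WebSocket upgrade.
// The header value is compared case-insensitively.
function isWebSocketUpgrade(headers) {
  const upgrade = headers.get("Upgrade");
  return upgrade !== null && upgrade.toLowerCase() === "websocket";
}

// The same kind of 426 response that the curl test received.
function upgradeRequired() {
  return new Response("Expected Upgrade: websocket", { status: 426 });
}

// Example: a request without the header is rejected, one with it is accepted.
const plain = new Headers({ Accept: "*/*" });
const ws = new Headers({ Upgrade: "WebSocket", Connection: "Upgrade" });
console.log(isWebSocketUpgrade(plain)); // false
console.log(isWebSocketUpgrade(ws)); // true
console.log(upgradeRequired().status); // 426
```

If the route returns 426 for a client that did send the header, the header was probably lost before the check ran.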

### Durable Object WebSocket Code
```javascript
// In patch-worker.js (deployed to .open-next/worker.js)
if (request.headers.get("Upgrade") === "websocket") {
  const pair = new WebSocketPair();
  const [client, server] = Object.values(pair);
  this.ctx.acceptWebSocket(server); // ✅ Modern Hibernatable API
  return new Response(null, { status: 101, webSocket: client });
}

// WebSocket handler methods exist:
async webSocketMessage(ws, message) { ... }
async webSocketClose(ws, code, reason, wasClean) { ... }
async webSocketError(ws, error) { ... }
```

### Verified Deployed Code
- ✅ Polyfill at top of worker.js
- ✅ `BanditAgentDO` class exported
- ✅ WebSocket handling using Hibernatable API
- ✅ Handler methods present
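
The checklist above could be made repeatable with a small script that looks for the expected identifiers in the bundle text. This is a minimal sketch: `missingMarkers` and `REQUIRED_MARKERS` are hypothetical names, and the marker list is an assumption drawn from the DO excerpt in this document.

```javascript
// Identifiers taken from the DO excerpt and checklist in this document.
const REQUIRED_MARKERS = [
  "BanditAgentDO",
  "WebSocketPair",
  "acceptWebSocket",
  "webSocketMessage",
  "webSocketClose",
  "webSocketError",
];

// Returns the markers that do not appear anywhere in the bundle text.
function missingMarkers(bundleSource) {
  return REQUIRED_MARKERS.filter((marker) => !bundleSource.includes(marker));
}

// Example with an inline stand-in for the bundle text.
const sample =
  "class BanditAgentDO { fetch() { new WebSocketPair(); this.ctx.acceptWebSocket(s); } }";
console.log(missingMarkers(sample));
// [ 'webSocketMessage', 'webSocketClose', 'webSocketError' ]
```

For a real check, pass in the contents of `.open-next/worker.js`. A string match only shows that the code was bundled locally; it does not prove that the deployed worker is identical or that the code path runs, and a minifier may rename some identifiers.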

## Possible Causes

### 1. **Next.js/OpenNext Middleware Interception**
- OpenNext may be intercepting WebSocket upgrades before they reach the route
- Middleware might be stripping headers or modifying the request

### 2. **Request Object Compatibility**
- `NextRequest` forwarded to DO might not be compatible with DO's `fetch()`
- Headers may be lost/modified during forwarding
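
One way to test the header-loss hypothesis is to snapshot the handshake headers at each hop (in the route and again in the DO) and compare them. The sketch below is an assumption about how that could look; `handshakeSnapshot` and `lostOrChanged` are hypothetical helpers, and the header list follows the WebSocket handshake in RFC 6455.

```javascript
// Headers that matter for a WebSocket handshake (RFC 6455).
const HANDSHAKE_HEADERS = [
  "upgrade",
  "connection",
  "sec-websocket-key",
  "sec-websocket-version",
];

// Snapshot the handshake headers as a plain object suitable for logging.
function handshakeSnapshot(headers) {
  const snapshot = {};
  for (const name of HANDSHAKE_HEADERS) {
    snapshot[name] = headers.get(name); // null when the header is absent
  }
  return snapshot;
}

// Names of handshake headers present in `before` but missing or changed in `after`.
function lostOrChanged(before, after) {
  return HANDSHAKE_HEADERS.filter(
    (name) => before.get(name) !== null && before.get(name) !== after.get(name)
  );
}

// Example: the forwarded request kept Upgrade but dropped the key.
const incoming = new Headers({
  Upgrade: "websocket",
  Connection: "Upgrade",
  "Sec-WebSocket-Key": "dGhlIHNhbXBsZSBub25jZQ==",
  "Sec-WebSocket-Version": "13",
});
const forwarded = new Headers({
  Upgrade: "websocket",
  Connection: "Upgrade",
  "Sec-WebSocket-Version": "13",
});
console.log(lostOrChanged(incoming, forwarded)); // [ 'sec-websocket-key' ]
```

Logging `handshakeSnapshot(request.headers)` in both places would show whether the headers survive the hop from the route to the DO.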

### 3. **Deployment Issue**
- Despite code looking correct, deployed worker may differ
- Bundling process may be corrupting WebSocket code

### 4. **Missing Secret**
- `OPENROUTER_API_KEY` not set (though this shouldn't affect WS upgrade)

## Next Steps to Try

### Option A: Bypass Next.js Route Entirely
Create a direct Worker route handler that doesn't go through Next.js:

1. Add to `wrangler.jsonc`:
```json
{
  "routes": [
    {
      "pattern": "*/ws/*",
      "custom_domain": false,
      "zone_name": "your-domain.com"
    }
  ]
}
```

Route patterns cannot contain wildcards in the middle of the path, so `*/ws/*` only matches paths that start with `/ws/`. The current endpoint, `/api/agent/[runId]/ws`, would need either a broader pattern such as `your-domain.com/api/agent/*` or a new `/ws/...` path.

2. Create Worker-native WebSocket handler
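
A Worker-native handler for step 2 might look like the sketch below. It is written under several assumptions: `createWorker`, `matchRunId`, and `nextHandler` are hypothetical names, `nextHandler` stands in for the OpenNext-generated fetch handler, the `BANDIT_AGENT` binding name is taken from the Option C config, and each run is assumed to map to one DO instance via `idFromName(runId)`.

```javascript
// Matches /api/agent/<runId>/ws and captures the run ID.
const WS_PATH = /^\/api\/agent\/([^/]+)\/ws$/;

// Returns the runId for a WebSocket path, or null for any other path.
function matchRunId(pathname) {
  const match = WS_PATH.exec(pathname);
  return match ? decodeURIComponent(match[1]) : null;
}

// Builds a Worker entry point that handles WebSocket paths itself and
// passes everything else to `nextHandler` (assumed: the OpenNext handler).
function createWorker(nextHandler) {
  return {
    async fetch(request, env, ctx) {
      const runId = matchRunId(new URL(request.url).pathname);
      if (runId === null) {
        return nextHandler.fetch(request, env, ctx);
      }
      const upgrade = request.headers.get("Upgrade");
      if (!upgrade || upgrade.toLowerCase() !== "websocket") {
        return new Response("Expected Upgrade: websocket", { status: 426 });
      }
      // Assumption: one DO instance per run, addressed by name.
      const id = env.BANDIT_AGENT.idFromName(runId);
      // Forward the original request untouched so the handshake headers survive.
      return env.BANDIT_AGENT.get(id).fetch(request);
    },
  };
}

console.log(matchRunId("/api/agent/run-123/ws")); // run-123
console.log(matchRunId("/api/agent/run-123/status")); // null
```

In the Worker module, the result of `createWorker(...)` would be the default export. Because the upgrade request never enters a Next.js route, the DO's `101` response is returned directly by the Worker runtime.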

### Option B: Use Service Bindings
Instead of routing through Next.js, create a Service Binding to a dedicated WebSocket Worker (`websocket-handler` below) that talks to the DO:

```json
{
  "services": [
    {
      "binding": "WS_SERVICE",
      "service": "websocket-handler",
      "environment": "production"
    }
  ]
}
```

### Option C: Deploy Separate DO Worker (RECOMMENDED)
As outlined in the plan, this moves the DO into its own Worker so that Next.js/OpenNext cannot interfere with it:

1. Deploy the standalone DO worker:
```bash
cd workers/bandit-agent-do
wrangler deploy
```

2. Update the main `wrangler.jsonc`:
```jsonc
{
  "durable_objects": {
    "bindings": [{
      "name": "BANDIT_AGENT",
      "class_name": "BanditAgentDO",
      "script_name": "bandit-agent-do" // External worker
    }]
  }
}
```

3. Remove the patch script from the deploy process

### Option D: Add Debug Logging and Re-test
- Deploy with comprehensive logging
- Use `wrangler tail` to capture actual request/response
- Identify exact failure point
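
The logging step could be done with a wrapper around each fetch-style handler, so that `wrangler tail` shows what arrived, what was returned, and any exception behind the 500. This is a minimal sketch; `withDebugLogging` is a hypothetical helper and the log format is an assumption.

```javascript
// Wraps a fetch-style handler so that each request, its response status,
// and any thrown error are logged. `log` defaults to console.log.
function withDebugLogging(label, handler, log = console.log) {
  return async (request, ...rest) => {
    const upgrade = request.headers.get("Upgrade");
    log(`[${label}] ${request.method} ${request.url} upgrade=${upgrade}`);
    try {
      const response = await handler(request, ...rest);
      log(`[${label}] -> ${response.status}`);
      return response;
    } catch (error) {
      // An uncaught exception may be what surfaces to the browser as a bare 500.
      log(`[${label}] threw: ${error && error.stack ? error.stack : error}`);
      return new Response(`[${label}] ${error}`, { status: 500 });
    }
  };
}
```

Wrapping both the route handler and the DO's `fetch` with different labels would show which layer produces the 500. The error text in the response body is for debugging only and should be removed afterwards.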

## Current Theory

**Most likely**: Next.js/OpenNext is incompatible with WebSocket upgrades in API routes. The framework expects a route to return an ordinary HTTP response, not a protocol upgrade, so the DO's `101` response probably cannot pass back through the route handler. This has not been confirmed yet.

**Evidence**:
- curl gets 426, so its request was rejected for a missing `Upgrade: websocket` header rather than failing like the browser's (possibly because curl negotiated HTTP/2, which does not carry `Upgrade`; retesting with `--http1.1` would confirm)
- Browser (going through full Next.js stack) gets 500
- HTTP routes work fine (standard request/response)
- WebSocket routes fail (protocol upgrade)

## Recommendation

**Proceed with Option C** (Separate DO Worker) as it:
1. Completely bypasses Next.js/OpenNext
2. Uses Cloudflare's recommended architecture
3. Matches the plan we already created
4. Avoids the bundling/compatibility issues of the patch script
5. Provides independent deployment and debugging

The inline DO + patch script approach was worth trying, but WebSocket upgrades likely need a native Worker environment, not a Next.js API route.