bandit-runner/docs/development_documentation/CORE-FUNCTIONALITY-STATUS.md
2025-10-09 22:03:37 -06:00

Core Functionality Implementation Status

Date: 2025-10-09
Priority: CRITICAL PATH - Making the app actually work

🎯 Goal

Enable end-to-end agent execution: User clicks START → WebSocket connects → Agent runs → SSH commands execute → Terminal and Chat show real output

Completed

1. Durable Object WebSocket Handling

File: bandit-runner-app/src/lib/durable-objects/BanditAgentDO.ts

Changes Made:

  • Accepts WebSocket upgrades properly
  • Manages WebSocket connections in a Set
  • Calls SSH proxy /agent/run endpoint via HTTP
  • Streams JSONL events from SSH proxy
  • Broadcasts events to all connected WebSocket clients
  • Updates DO state based on events (level_complete, error, run_complete)
  • Removed broken LangGraph-in-DO code
  • Clean separation: DO = coordinator, SSH Proxy = executor

Key Implementation:

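private async runAgentViaProxy(config: RunConfig) {
  // Call the SSH proxy, which runs the agent and streams JSONL events back
  const response = await fetch(`${SSH_PROXY_URL}/agent/run`, {...})
  if (!response.ok || !response.body) {
    throw new Error(`SSH proxy request failed: ${response.status}`)
  }

  // Stream JSONL events: one JSON object per line
  const reader = response.body.getReader()
  const decoder = new TextDecoder()
  let buffer = ''
  while (true) {
    const { done, value } = await reader.read()
    if (done) break  // proxy closed the stream
    buffer += decoder.decode(value, { stream: true })
    const lines = buffer.split('\n')
    buffer = lines.pop() ?? ''  // keep any partial trailing line for the next chunk
    for (const line of lines) {
      if (!line.trim()) continue
      // Broadcast each parsed event to WebSocket clients
      this.broadcast(JSON.parse(line))
    }
  }
}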
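
The broadcast and state-update steps listed above can be sketched in isolation. This is a minimal sketch, not the code in BanditAgentDO.ts: the `AgentEvent` and `RunState` shapes, the `SocketLike` interface, and the choice to mark a run as failed on an `error` event are all assumptions made for illustration.

```typescript
// Hypothetical shapes: the real event and state types live in the app.
interface AgentEvent { type: string; data?: { level?: number; message?: string } }
interface RunState { status: 'running' | 'complete' | 'failed'; currentLevel: number; lastError?: string }

// Minimal socket surface so the sketch does not depend on the Workers runtime.
interface SocketLike { send(data: string): void }

// Send one event to every connected client; drop sockets whose send() throws.
function broadcast(clients: Set<SocketLike>, event: AgentEvent): void {
  const payload = JSON.stringify(event)
  for (const ws of Array.from(clients)) {
    try {
      ws.send(payload)
    } catch {
      clients.delete(ws) // assume a throwing socket is closed
    }
  }
}

// Pure reducer for the three event types the DO reacts to.
function applyEvent(state: RunState, event: AgentEvent): RunState {
  switch (event.type) {
    case 'level_complete':
      // Assumption: the event reports the level that was just finished.
      return { ...state, currentLevel: (event.data?.level ?? state.currentLevel) + 1 }
    case 'error':
      return { ...state, status: 'failed', lastError: event.data?.message ?? 'unknown error' }
    case 'run_complete':
      return { ...state, status: 'complete' }
    default:
      return state // other events are only forwarded to clients
  }
}
```

Keeping the state update as a pure function makes it testable without a Durable Object instance; the DO would call it once per parsed event before broadcasting.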

2. SSH Connection in Agent

File: ssh-proxy/agent.ts

Changes Made:

  • Added SSH connection logic in planLevel node
  • Connects to bandit.labs.overthewire.org:2220
  • Uses correct username (bandit0, bandit1, etc.)
  • Stores connection ID in state
  • Reuses connection across commands

Key Code:

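if (!sshConnectionId) {
  // Open the SSH session once; later commands reuse the stored connection ID
  const connectResponse = await fetch(`${sshProxyUrl}/ssh/connect`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      host: 'bandit.labs.overthewire.org',
      port: 2220,
      username: `bandit${currentLevel}`,  // bandit0, bandit1, ...
      password: currentPassword,
    }),
  })
  if (!connectResponse.ok) {
    throw new Error(`SSH connect failed: ${connectResponse.status}`)
  }
  // Store connectionId in state
}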
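
The connect-once, reuse-afterwards behaviour can be sketched as a small helper. This is an illustration under assumptions rather than the code in ssh-proxy/agent.ts: `SshState`, `ConnectFn`, and `ensureConnection` are hypothetical names, and the connect call is injected so the sketch runs without a network.

```typescript
// Hypothetical slice of the agent state; only the fields this sketch needs.
interface SshState {
  currentLevel: number
  currentPassword: string
  sshConnectionId?: string
}

interface ConnectParams { host: string; port: number; username: string; password: string }

// Injected connector. In the agent this would wrap the POST to /ssh/connect
// and return the connection ID from the response (response shape assumed).
type ConnectFn = (params: ConnectParams) => Promise<string>

// Return a state that is guaranteed to carry a connection ID,
// connecting only when no ID is stored yet.
async function ensureConnection(state: SshState, connect: ConnectFn): Promise<SshState> {
  if (state.sshConnectionId) return state // reuse the existing session
  const sshConnectionId = await connect({
    host: 'bandit.labs.overthewire.org',
    port: 2220,
    username: `bandit${state.currentLevel}`,
    password: state.currentPassword,
  })
  return { ...state, sshConnectionId }
}
```

Calling `ensureConnection` before every command keeps the reuse rule in one place instead of repeating the `if (!sshConnectionId)` check at each call site.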

3. WebSocket Route

File: bandit-runner-app/src/app/api/agent/[runId]/ws/route.ts

Status: Already correct

  • Forwards WebSocket upgrades to Durable Object
  • Passes all headers through
  • Error handling in place

4. Worker Patch Script

File: bandit-runner-app/scripts/patch-worker.js

Status: Already has correct implementation

  • Inlines DO code into .open-next/worker.js
  • Includes runAgent() method that streams from SSH proxy
  • Broadcasts events to WebSocket clients
  • Exports BanditAgentDO class

5. Event Handlers

Files:

  • bandit-runner-app/src/lib/websocket/agent-events.ts
  • bandit-runner-app/src/hooks/useAgentWebSocket.ts

Status: Already implemented

  • handleAgentEvent processes all event types
  • Terminal lines updated from terminal_output events
  • Chat messages updated from agent_message and thinking events
  • ANSI rendering ready with dangerouslySetInnerHTML
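
The mapping from event types to UI state can be sketched as a reducer. This is a minimal sketch and not the actual handleAgentEvent: the `content` field, the `UiState` shape, and the name `reduceAgentEvent` are assumptions, and the real events may carry different field names.

```typescript
// Hypothetical event shape; only `type` values come from the document.
interface AgentUiEvent { type: string; content?: string }
interface ChatMessage { kind: 'message' | 'thinking'; text: string }
interface UiState { terminalLines: string[]; chatMessages: ChatMessage[] }

// Route each event to the panel it belongs to; ignore everything else.
function reduceAgentEvent(state: UiState, event: AgentUiEvent): UiState {
  const content = event.content ?? ''
  switch (event.type) {
    case 'terminal_output':
      // One terminal line per line of command output
      return { ...state, terminalLines: [...state.terminalLines, ...content.split('\n')] }
    case 'agent_message':
      return { ...state, chatMessages: [...state.chatMessages, { kind: 'message', text: content }] }
    case 'thinking':
      return { ...state, chatMessages: [...state.chatMessages, { kind: 'thinking', text: content }] }
    default:
      return state
  }
}
```

In a React hook, a function like this could be passed to `useReducer` or applied inside a `setState` callback for each incoming WebSocket message.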

🚧 In Progress / Needs Testing

1. Deploy and Test

Next Steps:

cd bandit-runner-app
pnpm run deploy  # Builds, patches worker, deploys

What to Test:

  1. Open https://bandit-runner-app.nicholaivogelfilms.workers.dev/
  2. Click START button
  3. Check browser DevTools → Network → WS tab
  4. Verify WebSocket connection established
  5. Watch for events flowing
  6. Check Terminal panel for SSH output
  7. Check Chat panel for LLM reasoning

2. SSH Proxy Environment Variable

File: ssh-proxy/agent.ts

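Observation: executeCommand() calls the SSH proxy at http://localhost:3001
Resolution: No change needed. The agent runs inside the SSH proxy service, so localhost is its own server.

Current code:

// In ssh-proxy/agent.ts executeCommand():
const sshProxyUrl = 'http://localhost:3001'  // Same service!

The SSH proxy calls its own /ssh/connect and /ssh/exec endpoints, so this URL is correct as long as the service listens on port 3001 (the PORT value listed under Environment Variables Required).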
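
If the port ever changes, the hard-coded URL and the PORT variable could drift apart. One option, sketched here under assumptions, is to derive the URL from the same variable; `selfBaseUrl` is a hypothetical helper and not part of the current agent.ts.

```typescript
// Build the proxy's own base URL from its environment.
// `env` is passed in (instead of reading process.env directly) so the
// function is easy to test; 3001 is the default used in this document.
function selfBaseUrl(env: Record<string, string | undefined>): string {
  const raw = env.PORT ?? '3001'
  const port = Number(raw)
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${raw}`)
  }
  return `http://localhost:${port}`
}
```

In the service this would be called as `selfBaseUrl(process.env)`, so the agent and the HTTP server always agree on the port.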

Known Issues

1. Model Search Filtering

File: bandit-runner-app/src/components/agent-control-panel.tsx

Issue: Search box doesn't filter models
Priority: Low (UI polish, not critical path)
Fix: Add keywords prop to CommandItem
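
A sketch of how the keyword list might be built, assuming the panel uses a cmdk-based CommandItem (which accepts a `keywords` string array for filtering) and that each model has an `id` and a display `name`. The `ModelOption` shape and `modelKeywords` helper are hypothetical.

```typescript
// Hypothetical model shape; the real one lives in agent-control-panel.tsx.
interface ModelOption { id: string; name: string }

// Build lowercase search keywords from the display name, the full ID,
// and the ID's individual segments (split on "/", ":", "_" and "-").
function modelKeywords(model: ModelOption): string[] {
  const candidates = [model.name, model.id, ...model.id.split(/[\/:_-]+/)]
  const seen: string[] = []
  for (const candidate of candidates) {
    const keyword = candidate.trim().toLowerCase()
    if (keyword !== '' && seen.indexOf(keyword) === -1) seen.push(keyword)
  }
  return seen
}
```

The result would be passed as `keywords={modelKeywords(model)}` on each item, so typing either the provider or part of the model name matches.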

2. Missing Error Recovery

File: ssh-proxy/agent.ts

Issue: No retry logic in agent
Priority: Medium
Impact: Agent will fail on transient errors

Solution Needed:

  • Add retry count tracking
  • Exponential backoff
  • Max retries per level (already in state)
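
The three items above can be combined into one helper. This is a minimal sketch of retry with exponential backoff, not existing code: `withRetry`, `RetryOptions`, and the injectable `sleep` are hypothetical names, and the delay doubles after each failed attempt starting from `baseDelayMs`.

```typescript
interface RetryOptions {
  maxRetries: number   // retries after the first attempt
  baseDelayMs: number  // delay before the first retry
  sleep?: (ms: number) => Promise<void> // injectable for tests
}

// Run fn until it succeeds or the retry budget is exhausted.
// Delays follow baseDelayMs * 2^attempt: base, 2x base, 4x base, ...
async function withRetry<T>(fn: (attempt: number) => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep = opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)))
  let lastError: unknown
  for (let attempt = 0; attempt <= opts.maxRetries; attempt++) {
    try {
      return await fn(attempt)
    } catch (err) {
      lastError = err
      if (attempt === opts.maxRetries) break // budget exhausted, give up
      await sleep(opts.baseDelayMs * 2 ** attempt)
    }
  }
  throw lastError
}
```

An SSH command could then be wrapped as `withRetry(() => executeCommand(cmd), { maxRetries: 3, baseDelayMs: 500 })`, with `maxRetries` taken from the per-level value already in state. This sketch retries every error; distinguishing transient from permanent failures is left open.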

📋 Testing Checklist

Critical Path (MUST WORK)

  • User clicks START
  • WebSocket connects (check DevTools)
  • SSH connection established (check terminal for connection message)
  • LLM generates reasoning (check chat panel)
  • SSH command executes (check terminal for $ cat readme)
  • Command output appears (check terminal for readme contents)
  • Password extracted
  • Level advances

Nice to Have

  • ANSI colors render correctly
  • Manual mode works
  • Pause/resume works
  • Error messages display properly

🏗️ Architecture Flow

1. User clicks START
   ↓
2. Frontend: handleStartRun() → fetch('/api/agent/run-123/start')
   ↓
3. API Route: → DO.fetch('/start')
   ↓
4. Durable Object: 
   - Initialize state
   - runAgentViaProxy()
   - fetch('https://bandit-ssh-proxy.fly.dev/agent/run')
   ↓
5. SSH Proxy (/agent/run):
   - Create BanditAgent
   - agent.run() starts LangGraph
   - Stream JSONL events back
   ↓
6. Durable Object:
   - Read JSONL stream
   - broadcast(event) to WebSocket clients
   ↓
7. Frontend WebSocket:
   - Receive events
   - handleAgentEvent()
   - Update terminal lines
   - Update chat messages
   ↓
8. User sees:
   - Terminal: "$ cat readme" + output
   - Chat: "Planning: [LLM reasoning]"
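
Steps 5 and 6 depend on JSONL framing: the proxy writes one JSON object per line, and the Durable Object must reassemble lines that arrive split across network chunks. A minimal sketch of both sides follows; `encodeJsonl` and `JsonlDecoder` are hypothetical names, and chunks are modelled as already-decoded strings.

```typescript
// Proxy side: serialize one event as a single line.
// JSON.stringify escapes newlines inside strings, so an event never spans lines.
function encodeJsonl(event: unknown): string {
  return JSON.stringify(event) + '\n'
}

// DO side: accumulate chunks and emit only complete lines.
class JsonlDecoder {
  private buffer = ''

  // Feed one chunk; returns every event completed by this chunk.
  push(chunk: string): unknown[] {
    this.buffer += chunk
    const lines = this.buffer.split('\n')
    this.buffer = lines.pop() ?? '' // trailing partial line waits for more data
    return lines.filter((line) => line.trim() !== '').map((line) => JSON.parse(line))
  }
}
```

For example, if the first chunk ends in the middle of an event, `push` returns an empty array for it and delivers the event once the rest of the line arrives.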

🔧 Environment Variables Required

Frontend (.dev.vars)

OPENROUTER_API_KEY=sk-or-...
SSH_PROXY_URL=https://bandit-ssh-proxy.fly.dev

SSH Proxy (.env or Fly.io secrets)

PORT=3001

🚀 Deployment Commands

Deploy Frontend

cd bandit-runner-app
pnpm run deploy  # OpenNext build + patch + deploy

Deploy SSH Proxy (if needed)

cd ssh-proxy
flyctl deploy

📊 Success Metrics

The app is working when you see this flow:

  1. Click START
  2. Chat: "Starting run - Level 0 to 5 using openai/gpt-4o-mini"
  3. Chat: "Planning: I need to read the readme file..."
  4. Terminal: "$ cat readme"
  5. Terminal: "Congratulations on your first steps into..."
  6. Chat: "Password found: [32-char password]"
  7. Terminal: "$ ssh bandit1@bandit.labs.overthewire.org"
  8. Chat: "Planning: Now on level 1..."

If you see all 8 steps, the core functionality is WORKING 🎉

🐛 Debugging

WebSocket Not Connecting

  1. Check browser DevTools → Network → WS filter
  2. Look for /api/agent/run-xxx/ws
  3. Check status: should be 101 Switching Protocols
  4. If 500: Check Durable Object is exported
  5. If 404: Check route.ts exists

No Terminal Output

  1. Open browser console
  2. Look for WebSocket messages
  3. Check if events are being received
  4. Check useAgentWebSocket is processing events
  5. Check wsTerminalLines is being rendered

No Chat Messages

  1. Follow the steps under No Terminal Output first
  2. Check agent_message and thinking events
  3. Check wsChatMessages state
  4. Verify handleAgentEvent case statements

SSH Connection Fails

  1. Check SSH proxy logs: flyctl logs -a bandit-ssh-proxy
  2. Verify password is correct (bandit0 for level 0)
  3. Check Bandit server is accessible
  4. Test manually: ssh bandit0@bandit.labs.overthewire.org -p 2220

📝 Next Steps

  1. Deploy and test - Most critical
  2. Fix any deployment issues
  3. Test end-to-end flow
  4. Add error recovery - Medium priority
  5. Polish UI - Low priority (model search, etc.)

💡 Key Insights

What Changed from Original Plan:

  • LangGraph cannot run inside the DO (it needs Node.js APIs that are unavailable there)
  • SSH Proxy runs full LangGraph agent
  • DO is lightweight coordinator + WebSocket server
  • JSONL streaming over HTTP works great
  • Architecture is correct and deployable

Why This Works:

  • Durable Objects are perfect for WebSocket management
  • SSH Proxy (Node.js on Fly.io) can run LangGraph
  • HTTP streaming is simpler than complex DO↔Worker communication
  • Clean separation of concerns

Status: Ready for deployment and testing
Risk: Medium (untested in production)
Confidence: High (architecture is sound)