bandit-runner/docs/development_documentation/CORE-FUNCTIONALITY-STATUS.md

# Core Functionality Implementation Status

**Date**: 2025-10-09
**Priority**: CRITICAL PATH - Making the app actually work

## 🎯 Goal

Enable end-to-end agent execution: User clicks START → WebSocket connects → Agent runs → SSH commands execute → Terminal and Chat show real output

## ✅ Completed

### 1. Durable Object WebSocket Handling
**File**: `bandit-runner-app/src/lib/durable-objects/BanditAgentDO.ts`

**Changes Made**:
- ✅ Accepts WebSocket upgrades properly
- ✅ Manages WebSocket connections in a Set
- ✅ Calls SSH proxy `/agent/run` endpoint via HTTP
- ✅ Streams JSONL events from SSH proxy
- ✅ Broadcasts events to all connected WebSocket clients
- ✅ Updates DO state based on events (level_complete, error, run_complete)
- ✅ Removed broken LangGraph-in-DO code
- ✅ Clean separation: DO = coordinator, SSH Proxy = executor
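
The connection set and broadcast items above could look roughly like the sketch below. It is an illustration, not the code in `BanditAgentDO.ts`: `SocketLike` and the standalone `broadcast` function are hypothetical stand-ins, and dropping a socket whose `send()` throws is an assumed policy.

```typescript
// Minimal stand-in for a WebSocket; only the method broadcast needs.
interface SocketLike {
  send(data: string): void;
}

// Hypothetical helper: send one event to every client and forget sockets
// whose send() throws (for example, because the peer already disconnected).
// Returns how many clients received the event.
function broadcast(clients: Set<SocketLike>, event: unknown): number {
  const payload = JSON.stringify(event);
  let delivered = 0;
  // Copy to an array first so deleting during iteration is safe.
  for (const client of Array.from(clients)) {
    try {
      client.send(payload);
      delivered += 1;
    } catch {
      clients.delete(client);
    }
  }
  return delivered;
}
```

Serializing the event once, outside the loop, keeps the cost per client to a single `send()` call.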

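**Key Implementation** (simplified):
```typescript
private async runAgentViaProxy(config: RunConfig) {
  // Ask the SSH proxy to start the agent run
  const response = await fetch(`${SSH_PROXY_URL}/agent/run`, {...})
  const reader = response.body?.getReader()
  if (!reader) throw new Error('SSH proxy returned no response body')

  // Stream JSONL events: one JSON object per line
  const decoder = new TextDecoder()
  let buffer = ''
  while (true) {
    const { done, value } = await reader.read()
    if (done) break  // stream closed by the proxy
    buffer += decoder.decode(value, { stream: true })
    const lines = buffer.split('\n')
    buffer = lines.pop() ?? ''  // keep any partial trailing line for the next chunk
    for (const line of lines) {
      if (!line.trim()) continue
      // Broadcast each parsed event to WebSocket clients
      this.broadcast(JSON.parse(line))
    }
  }
}
```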
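
Alongside broadcasting, the DO updates its state from `level_complete`, `error`, and `run_complete` events. One way to keep that logic testable is a pure reducer, sketched below. Only the three event names come from this document; the payload fields (`level`, `message`), the `RunState` shape, and the `applyEvent` name are assumptions made for illustration.

```typescript
// Assumed event shapes; only the three `type` values appear in this document.
type AgentEvent =
  | { type: "level_complete"; level: number }
  | { type: "error"; message: string }
  | { type: "run_complete" };

// Hypothetical DO state; the real BanditAgentDO may track more fields.
interface RunState {
  status: "running" | "failed" | "complete";
  currentLevel: number;
  lastError?: string;
}

// Pure function: returns the next state and never mutates its input.
function applyEvent(state: RunState, event: AgentEvent): RunState {
  switch (event.type) {
    case "level_complete":
      // Assumes `level` is the level that was just solved.
      return { ...state, currentLevel: event.level + 1 };
    case "error":
      return { ...state, status: "failed", lastError: event.message };
    case "run_complete":
      return { ...state, status: "complete" };
  }
}
```

Because the reducer has no side effects, it can be exercised without a Durable Object runtime or a live stream.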

### 2. SSH Connection in Agent
**File**: `ssh-proxy/agent.ts`

**Changes Made**:
- ✅ Added SSH connection logic in `planLevel` node
- ✅ Connects to `bandit.labs.overthewire.org:2220`
- ✅ Uses correct username (`bandit0`, `bandit1`, etc.)
- ✅ Stores connection ID in state
- ✅ Reuses connection across commands

**Key Code**:
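```typescript
// Connect only once; later commands reuse the stored connection ID
if (!sshConnectionId) {
  const connectResponse = await fetch(`${sshProxyUrl}/ssh/connect`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      host: 'bandit.labs.overthewire.org',
      port: 2220,
      username: `bandit${currentLevel}`,
      password: currentPassword,
    }),
  })
  if (!connectResponse.ok) {
    throw new Error(`SSH connect failed with HTTP ${connectResponse.status}`)
  }
  // Store connectionId in state
}
```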
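
To make the per-level username rule easy to check in isolation, the request could be assembled by a small pure helper, as in the sketch below. `buildConnectRequest` and `ConnectRequest` are hypothetical names, and the level validation and trailing-slash handling are assumptions, not behaviour confirmed for `ssh-proxy/agent.ts`.

```typescript
// Hypothetical description of the HTTP call; it can be passed to fetch(url, init).
interface ConnectRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

const BANDIT_HOST = "bandit.labs.overthewire.org";
const BANDIT_PORT = 2220;

// Builds the /ssh/connect request for one Bandit level.
function buildConnectRequest(sshProxyUrl: string, level: number, password: string): ConnectRequest {
  if (!Number.isInteger(level) || level < 0) {
    throw new RangeError(`invalid Bandit level: ${level}`);
  }
  return {
    // Strip trailing slashes so the path is joined with exactly one "/".
    url: `${sshProxyUrl.replace(/\/+$/, "")}/ssh/connect`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        host: BANDIT_HOST,
        port: BANDIT_PORT,
        username: `bandit${level}`, // bandit0, bandit1, ...
        password,
      }),
    },
  };
}
```

A caller would then run `fetch(req.url, req.init)` and store the returned connection ID, as in the snippet above.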

### 3. WebSocket Route
**File**: `bandit-runner-app/src/app/api/agent/[runId]/ws/route.ts`

**Status**: ✅ Already correct
- Forwards WebSocket upgrades to Durable Object
- Passes all headers through
- Error handling in place

### 4. Worker Patch Script
**File**: `bandit-runner-app/scripts/patch-worker.js`

**Status**: ✅ Already has correct implementation
- Inlines DO code into `.open-next/worker.js`
- Includes `runAgent()` method that streams from SSH proxy
- Broadcasts events to WebSocket clients
- Exports `BanditAgentDO` class

### 5. Event Handlers
**Files**:
- `bandit-runner-app/src/lib/websocket/agent-events.ts`
- `bandit-runner-app/src/hooks/useAgentWebSocket.ts`

**Status**: ✅ Already implemented
- `handleAgentEvent` processes all event types
- Terminal lines updated from `terminal_output` events
- Chat messages updated from `agent_message` and `thinking` events
- ANSI rendering ready with `dangerouslySetInnerHTML`
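
The mapping from events to the Terminal and Chat panels can be pictured as a reducer like the one below. This is a sketch under stated assumptions, not the actual `handleAgentEvent`: the payload fields (`data`, `content`), the `UiState` shape, and the `reduceUiEvent` name are invented for illustration; only the three event names come from this document.

```typescript
// Assumed payload shapes; only the event names appear in this document.
type UiEvent =
  | { type: "terminal_output"; data: string }
  | { type: "agent_message"; content: string }
  | { type: "thinking"; content: string };

interface ChatMessage {
  kind: "agent_message" | "thinking";
  content: string;
}

// Hypothetical UI state: what the Terminal and Chat panels render.
interface UiState {
  terminalLines: string[];
  chatMessages: ChatMessage[];
}

// Returns a new state object so React-style change detection sees the update.
function reduceUiEvent(state: UiState, event: UiEvent): UiState {
  switch (event.type) {
    case "terminal_output":
      return { ...state, terminalLines: [...state.terminalLines, event.data] };
    case "agent_message":
    case "thinking":
      return {
        ...state,
        chatMessages: [...state.chatMessages, { kind: event.type, content: event.content }],
      };
  }
}
```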

## 🚧 In Progress / Needs Testing

### 1. Deploy and Test
**Next Steps**:
```bash
cd bandit-runner-app
pnpm run deploy  # Builds, patches worker, deploys
```

**What to Test**:
1. Open https://bandit-runner-app.nicholaivogelfilms.workers.dev/
2. Click START button
3. Check browser DevTools → Network → WS tab
4. Verify WebSocket connection established
5. Watch for events flowing
6. Check Terminal panel for SSH output
7. Check Chat panel for LLM reasoning

### 2. SSH Proxy Environment Variable
**File**: `ssh-proxy/agent.ts`

**Observation**: The agent calls `http://localhost:3001` to reach the SSH proxy.
**Assessment**: No fix needed. The agent runs inside the SSH proxy service, so it is calling its own `/ssh/connect` and `/ssh/exec` endpoints.

**Current code**:
```typescript
// In ssh-proxy/agent.ts executeCommand():
const sshProxyUrl = 'http://localhost:3001'  // Same service!
```

The hard-coded port must match the `PORT` the SSH proxy listens on (3001).

## ❌ Known Issues

### 1. Model Search Filtering
**File**: `bandit-runner-app/src/components/agent-control-panel.tsx`

**Issue**: Search box doesn't filter models
**Priority**: Low (UI polish, not critical path)
**Fix**: Add `keywords` prop to CommandItem

### 2. Missing Error Recovery
**File**: `ssh-proxy/agent.ts`

**Issue**: No retry logic in agent
**Priority**: Medium
**Impact**: Agent will fail on transient errors

**Solution Needed**:
- Add retry count tracking
- Exponential backoff
- Max retries per level (already in state)
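
A minimal sketch of the retry behaviour listed above, assuming a 500 ms base delay and an 8 s cap (both values are illustrative, not project settings). `backoffDelayMs` and `withRetry` are hypothetical names. The `sleep` function is injected so the wrapper can be tested without real timers.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
function backoffDelayMs(attempt: number, baseMs: number = 500, capMs: number = 8000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Hypothetical wrapper: runs `task`, retrying up to `maxRetries` times after
// the first failure, and rethrows the last error once retries are exhausted.
async function withRetry<T>(
  task: () => Promise<T>,
  maxRetries: number,
  sleep: (ms: number) => Promise<void>,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt += 1) {
    try {
      return await task();
    } catch (err) {
      lastError = err;
      // Wait only if another attempt will follow.
      if (attempt < maxRetries) {
        await sleep(backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}
```

In the agent, `maxRetries` would come from the per-level limit already kept in state, and `sleep` could be `(ms) => new Promise((resolve) => setTimeout(resolve, ms))`.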

## 📋 Testing Checklist

### Critical Path (MUST WORK)
- [ ] User clicks START
- [ ] WebSocket connects (check DevTools)
- [ ] SSH connection established (check terminal for connection message)
- [ ] LLM generates reasoning (check chat panel)
- [ ] SSH command executes (check terminal for `$ cat readme`)
- [ ] Command output appears (check terminal for readme contents)
- [ ] Password extracted
- [ ] Level advances

### Nice to Have
- [ ] ANSI colors render correctly
- [ ] Manual mode works
- [ ] Pause/resume works
- [ ] Error messages display properly

## 🏗️ Architecture Flow

```
1. User clicks START
   ↓
2. Frontend: handleStartRun() → fetch('/api/agent/run-123/start')
   ↓
3. API Route: → DO.fetch('/start')
   ↓
4. Durable Object:
   - Initialize state
   - runAgentViaProxy()
   - fetch('https://bandit-ssh-proxy.fly.dev/agent/run')
   ↓
5. SSH Proxy (/agent/run):
   - Create BanditAgent
   - agent.run() starts LangGraph
   - Stream JSONL events back
   ↓
6. Durable Object:
   - Read JSONL stream
   - broadcast(event) to WebSocket clients
   ↓
7. Frontend WebSocket:
   - Receive events
   - handleAgentEvent()
   - Update terminal lines
   - Update chat messages
   ↓
8. User sees:
   - Terminal: "$ cat readme" + output
   - Chat: "Planning: [LLM reasoning]"
```

## 🔧 Environment Variables Required

### Frontend (.dev.vars)
```env
OPENROUTER_API_KEY=sk-or-...
SSH_PROXY_URL=https://bandit-ssh-proxy.fly.dev
```

### SSH Proxy (.env or Fly.io secrets)
```env
PORT=3001
```
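
A startup check for the variables above might look like the following sketch. The function and type names are hypothetical, and defaulting `PORT` to 3001 when it is unset is an assumption. The environment is passed in as a plain object so the same code works with Worker bindings or `process.env`.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical config shape; the field names are illustrative only.
interface FrontendConfig {
  openRouterApiKey: string;
  sshProxyUrl: string;
}

// Fail fast with one message that lists every missing frontend variable.
function loadFrontendConfig(env: Env): FrontendConfig {
  const required = ["OPENROUTER_API_KEY", "SSH_PROXY_URL"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    openRouterApiKey: env["OPENROUTER_API_KEY"] as string,
    // Drop trailing slashes so `${sshProxyUrl}/agent/run` has exactly one slash.
    sshProxyUrl: (env["SSH_PROXY_URL"] as string).replace(/\/+$/, ""),
  };
}

// Parse the SSH proxy's PORT; the 3001 default is an assumption.
function parseProxyPort(env: Env): number {
  const raw = env["PORT"] ?? "3001";
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`PORT must be an integer between 1 and 65535, got "${raw}"`);
  }
  return port;
}
```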

## 🚀 Deployment Commands

### Deploy Frontend
```bash
cd bandit-runner-app
pnpm run deploy  # OpenNext build + patch + deploy
```

### Deploy SSH Proxy (if needed)
```bash
cd ssh-proxy
flyctl deploy
```

## 📊 Success Metrics

**The app is working when you see this flow**:

1. Click START
2. Chat: "Starting run - Level 0 to 5 using openai/gpt-4o-mini"
3. Chat: "Planning: I need to read the readme file..."
4. Terminal: "$ cat readme"
5. Terminal: "Congratulations on your first steps into..."
6. Chat: "Password found: [32-char password]"
7. Terminal: "$ ssh bandit1@bandit.labs.overthewire.org"
8. Chat: "Planning: Now on level 1..."

**If you see all 8 steps, the core functionality is WORKING** 🎉

## 🐛 Debugging

### WebSocket Not Connecting
1. Check browser DevTools → Network → WS filter
2. Look for `/api/agent/run-xxx/ws`
3. Check status: should be 101 Switching Protocols
4. If 500: Check Durable Object is exported
5. If 404: Check route.ts exists

### No Terminal Output
1. Open browser console
2. Look for WebSocket messages
3. Check if events are being received
4. Check `useAgentWebSocket` is processing events
5. Check `wsTerminalLines` is being rendered

### No Chat Messages
1. Same as terminal debugging
2. Check `agent_message` and `thinking` events
3. Check `wsChatMessages` state
4. Verify `handleAgentEvent` case statements

### SSH Connection Fails
1. Check SSH proxy logs: `flyctl logs -a bandit-ssh-proxy`
2. Verify the password is correct (the level 0 password is `bandit0`)
3. Check Bandit server is accessible
4. Test manually: `ssh bandit0@bandit.labs.overthewire.org -p 2220`

## 📝 Next Steps

1. **Deploy and test** - Most critical
2. **Fix any deployment issues**
3. **Test end-to-end flow**
4. **Add error recovery** - Medium priority
5. **Polish UI** - Low priority (model search, etc.)

## 💡 Key Insights

**What Changed from Original Plan**:
- ❌ Running LangGraph inside the DO doesn't work (it needs Node.js APIs that are unavailable there)
- ✅ SSH Proxy runs full LangGraph agent
- ✅ DO is lightweight coordinator + WebSocket server
- ✅ JSONL streaming over HTTP works great
- ✅ Architecture is correct and deployable

**Why This Works**:
- Durable Objects are perfect for WebSocket management
- SSH Proxy (Node.js on Fly.io) can run LangGraph
- HTTP streaming is simpler than complex DO↔Worker communication
- Clean separation of concerns

---

**Status**: Ready for deployment and testing
**Risk**: Medium (untested in production)
**Confidence**: High (architecture is sound)