# Core Functionality Implementation Status

**Date**: 2025-10-09
**Priority**: CRITICAL PATH - Making the app actually work

## 🎯 Goal

Enable end-to-end agent execution: User clicks START → WebSocket connects → Agent runs → SSH commands execute → Terminal and Chat show real output

## ✅ Completed

### 1. Durable Object WebSocket Handling
**File**: `bandit-runner-app/src/lib/durable-objects/BanditAgentDO.ts`

**Changes Made**:
- ✅ Accepts WebSocket upgrades properly
- ✅ Manages WebSocket connections in a Set
- ✅ Calls SSH proxy `/agent/run` endpoint via HTTP
- ✅ Streams JSONL events from SSH proxy
- ✅ Broadcasts events to all connected WebSocket clients
- ✅ Updates DO state based on events (level_complete, error, run_complete)
- ✅ Removed broken LangGraph-in-DO code
- ✅ Clean separation: DO = coordinator, SSH Proxy = executor

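The "connections in a Set" and "broadcasts events" items above could be sketched roughly as follows. This is an illustration, not the code in `BanditAgentDO.ts`: `SocketLike` and the standalone `broadcast` function are hypothetical names, and the only assumption is that each connection exposes `send(data: string)`, as a standard `WebSocket` does.

```typescript
// Hypothetical minimal shape of a connection; a standard WebSocket satisfies it.
interface SocketLike {
  send(data: string): void
}

// Send one event to every connected client and drop clients whose send() throws.
// Returns the number of clients that received the event.
function broadcast(clients: Set<SocketLike>, event: unknown): number {
  const payload = JSON.stringify(event)
  let delivered = 0
  for (const client of Array.from(clients)) {
    try {
      client.send(payload)
      delivered++
    } catch {
      // A closed or broken socket: forget it so later broadcasts skip it.
      clients.delete(client)
    }
  }
  return delivered
}
```

Pruning on a failed `send` is one possible design choice; it keeps the Set from accumulating dead connections when a close event is missed.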
**Key Implementation**:
```typescript
private async runAgentViaProxy(config: RunConfig) {
  // Call SSH proxy
  const response = await fetch(`${SSH_PROXY_URL}/agent/run`, {...})

  // Stream JSONL events
  const reader = response.body?.getReader()
  if (!reader) return // no response body to stream
  while (true) {
    const { done, value } = await reader.read()
    if (done) break // proxy closed the stream
    // Parse JSONL lines
    // Broadcast to WebSocket clients
    this.broadcast(event)
  }
}
```

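The "Parse JSONL lines" step is elided in the excerpt above. One way to handle it, since a network chunk can end in the middle of a line, is to buffer text until a newline arrives. The `JsonlBuffer` class below is a hypothetical helper written for illustration, not the project's code, and it assumes one JSON object per line.

```typescript
// Hypothetical helper: accumulates decoded text chunks and yields complete JSONL records.
class JsonlBuffer {
  private pending = ''

  // Feed one decoded chunk; returns every event whose line is now complete.
  push(chunk: string): unknown[] {
    this.pending += chunk
    const lines = this.pending.split('\n')
    // The last piece has no trailing newline yet, so keep it for the next chunk.
    this.pending = lines.pop() ?? ''
    const events: unknown[] = []
    for (const line of lines) {
      const trimmed = line.trim()
      if (trimmed === '') continue // tolerate blank lines between records
      events.push(JSON.parse(trimmed))
    }
    return events
  }
}
```

Inside the read loop, each `value` (a `Uint8Array`) would first be turned into text, for example with `new TextDecoder().decode(value, { stream: true })`, and every event returned by `push` would then be broadcast.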
### 2. SSH Connection in Agent
**File**: `ssh-proxy/agent.ts`

**Changes Made**:
- ✅ Added SSH connection logic in `planLevel` node
- ✅ Connects to `bandit.labs.overthewire.org:2220`
- ✅ Uses correct username (`bandit0`, `bandit1`, etc.)
- ✅ Stores connection ID in state
- ✅ Reuses connection across commands

**Key Code**:
```typescript
if (!sshConnectionId) {
  const connectResponse = await fetch(`${sshProxyUrl}/ssh/connect`, {
    method: 'POST',
    body: JSON.stringify({
      host: 'bandit.labs.overthewire.org',
      port: 2220,
      username: `bandit${currentLevel}`,
      password: currentPassword,
    }),
  })
  // Store connectionId in state
}
```

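The connect-or-reuse decision in the excerpt above can be isolated into a small pure function, which makes the username rule and the reuse rule easy to check on their own. The `SshState` interface and `buildConnectBody` are hypothetical names for this sketch; the field names mirror the variables in the excerpt, but the real state shape in `agent.ts` may differ.

```typescript
// Hypothetical slice of agent state; only these three fields matter here.
interface SshState {
  currentLevel: number
  currentPassword: string
  sshConnectionId?: string
}

// Build the JSON body for POST /ssh/connect, or return null when a connection
// already exists and should be reused.
function buildConnectBody(state: SshState): string | null {
  if (state.sshConnectionId) return null
  return JSON.stringify({
    host: 'bandit.labs.overthewire.org',
    port: 2220,
    username: `bandit${state.currentLevel}`, // level N logs in as banditN
    password: state.currentPassword,
  })
}
```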
### 3. WebSocket Route
**File**: `bandit-runner-app/src/app/api/agent/[runId]/ws/route.ts`

**Status**: ✅ Already correct
- Forwards WebSocket upgrades to Durable Object
- Passes all headers through
- Error handling in place

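For readers unfamiliar with the forwarding pattern, the sketch below shows its general shape using the Durable Object namespace methods `idFromName` and `get`, followed by `fetch` on the returned stub. It is not the contents of `route.ts`: the `StubLike` and `NamespaceLike` types are stand-ins so the snippet compiles without Cloudflare's type package, and keying the object by `runId` is an assumption.

```typescript
// Minimal structural types so the sketch compiles without Cloudflare's type package.
interface StubLike<Req, Res> {
  fetch(request: Req): Promise<Res>
}
interface NamespaceLike<Req, Res> {
  idFromName(name: string): unknown
  get(id: unknown): StubLike<Req, Res>
}

// Forward the incoming upgrade request, untouched, to the Durable Object for this run.
function forwardToRunObject<Req, Res>(
  namespace: NamespaceLike<Req, Res>,
  runId: string,
  request: Req,
): Promise<Res> {
  const id = namespace.idFromName(runId) // the same runId always maps to the same object
  return namespace.get(id).fetch(request)
}
```

Passing the original request object through unchanged is what keeps the `Upgrade` and other headers intact.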
### 4. Worker Patch Script
**File**: `bandit-runner-app/scripts/patch-worker.js`

**Status**: ✅ Already has correct implementation
- Inlines DO code into `.open-next/worker.js`
- Includes `runAgent()` method that streams from SSH proxy
- Broadcasts events to WebSocket clients
- Exports `BanditAgentDO` class

### 5. Event Handlers
**Files**:
- `bandit-runner-app/src/lib/websocket/agent-events.ts`
- `bandit-runner-app/src/hooks/useAgentWebSocket.ts`

**Status**: ✅ Already implemented
- `handleAgentEvent` processes all event types
- Terminal lines updated from `terminal_output` events
- Chat messages updated from `agent_message` and `thinking` events
- ANSI rendering ready with `dangerouslySetInnerHTML`

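The event-to-UI mapping described above can be pictured as a pure reducer. The sketch below is not the actual `handleAgentEvent`: the payload field `content`, the `UiState` shape, and the name `applyAgentEvent` are assumptions; only the three event type names come from this document.

```typescript
// Hypothetical event shapes; only the `type` names come from this document.
type AgentEvent =
  | { type: 'terminal_output'; content: string }
  | { type: 'agent_message'; content: string }
  | { type: 'thinking'; content: string }

interface UiState {
  terminalLines: string[]
  chatMessages: { role: 'agent' | 'thinking'; text: string }[]
}

// Pure reducer: returns a new state object instead of mutating the old one.
function applyAgentEvent(state: UiState, event: AgentEvent): UiState {
  switch (event.type) {
    case 'terminal_output':
      return { ...state, terminalLines: [...state.terminalLines, event.content] }
    case 'agent_message':
      return { ...state, chatMessages: [...state.chatMessages, { role: 'agent', text: event.content }] }
    case 'thinking':
      return { ...state, chatMessages: [...state.chatMessages, { role: 'thinking', text: event.content }] }
  }
}
```

Returning fresh objects rather than pushing into existing arrays is what lets React state setters notice the update.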
## 🚧 In Progress / Needs Testing

### 1. Deploy and Test
**Next Steps**:
```bash
cd bandit-runner-app
pnpm run deploy  # Builds, patches worker, deploys
```

**What to Test**:
1. Open https://bandit-runner-app.nicholaivogelfilms.workers.dev/
2. Click START button
3. Check browser DevTools → Network → WS tab
4. Verify WebSocket connection established
5. Watch for events flowing
6. Check Terminal panel for SSH output
7. Check Chat panel for LLM reasoning

### 2. SSH Proxy Environment Variable
**File**: `ssh-proxy/agent.ts`

**Issue**: Calls `http://localhost:3001` for SSH proxy
**Fix Needed**: None. The agent should call its own endpoints (they're in the same service), and it already does

**Current Code**:
```typescript
// In ssh-proxy/agent.ts executeCommand():
const sshProxyUrl = 'http://localhost:3001' // Same service!
```

This is correct as written: the agent runs inside the SSH proxy, so it calls that service's own `/ssh/connect` and `/ssh/exec` endpoints over localhost. The hardcoded port only has to match the `PORT` value the proxy listens on (3001).

## ❌ Known Issues

### 1. Model Search Filtering
**File**: `bandit-runner-app/src/components/agent-control-panel.tsx`

**Issue**: Search box doesn't filter models
**Priority**: Low (UI polish, not critical path)
**Fix**: Add `keywords` prop to CommandItem

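The `keywords` fix needs an array of search terms per model. A helper along the following lines could produce it. The `ModelOption` shape and the `modelKeywords` name are hypothetical, and the choice to split the id on `/`, `-` and `:` is an assumption about what users are likely to type.

```typescript
// Hypothetical model record; the real shape in agent-control-panel.tsx may differ.
interface ModelOption {
  id: string // e.g. "openai/gpt-4o-mini"
  name: string // human-readable label
}

// Collect lowercase search terms: the full id, the label, and each id segment.
function modelKeywords(model: ModelOption): string[] {
  const parts = model.id.split(/[\/\-:]/)
  const terms = [model.id, model.name, ...parts]
    .map((term) => term.toLowerCase())
    .filter((term) => term !== '')
  // De-duplicate while keeping first-seen order.
  return Array.from(new Set(terms))
}
```

The result would then be passed as `keywords={modelKeywords(model)}` on each `CommandItem`.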
### 2. Missing Error Recovery
**File**: `ssh-proxy/agent.ts`

**Issue**: No retry logic in agent
**Priority**: Medium
**Impact**: Agent will fail on transient errors

**Solution Needed**:
- Add retry count tracking
- Exponential backoff
- Max retries per level (already in state)

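The retry pieces listed above could fit together as in the sketch below. Nothing here exists in `agent.ts` yet: `backoffDelayMs`, `withRetry`, and the default timings (500 ms base, 8 s cap) are illustrative assumptions, and `maxRetries` stands in for the per-level limit that the state already carries.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt)
}

// Run `task` until it succeeds or `maxRetries` retries have been used,
// sleeping with exponential backoff between attempts.
async function withRetry<T>(task: () => Promise<T>, maxRetries: number): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task()
    } catch (err) {
      // Out of retries: surface the last error to the caller.
      if (attempt >= maxRetries) throw err
      await new Promise<void>((resolve) => setTimeout(resolve, backoffDelayMs(attempt)))
    }
  }
}
```

A call such as `withRetry(() => fetch(url), 3)` would make up to four attempts in total. Retrying is only safe for operations that can be repeated without side effects, so it may suit connection setup better than arbitrary shell commands.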
## 📋 Testing Checklist

### Critical Path (MUST WORK)
- [ ] User clicks START
- [ ] WebSocket connects (check DevTools)
- [ ] SSH connection established (check terminal for connection message)
- [ ] LLM generates reasoning (check chat panel)
- [ ] SSH command executes (check terminal for `$ cat readme`)
- [ ] Command output appears (check terminal for readme contents)
- [ ] Password extracted
- [ ] Level advances

### Nice to Have
- [ ] ANSI colors render correctly
- [ ] Manual mode works
- [ ] Pause/resume works
- [ ] Error messages display properly

## 🏗️ Architecture Flow

```
1. User clicks START
   ↓
2. Frontend: handleStartRun() → fetch('/api/agent/run-123/start')
   ↓
3. API Route: → DO.fetch('/start')
   ↓
4. Durable Object:
   - Initialize state
   - runAgentViaProxy()
   - fetch('https://bandit-ssh-proxy.fly.dev/agent/run')
   ↓
5. SSH Proxy (/agent/run):
   - Create BanditAgent
   - agent.run() starts LangGraph
   - Stream JSONL events back
   ↓
6. Durable Object:
   - Read JSONL stream
   - broadcast(event) to WebSocket clients
   ↓
7. Frontend WebSocket:
   - Receive events
   - handleAgentEvent()
   - Update terminal lines
   - Update chat messages
   ↓
8. User sees:
   - Terminal: "$ cat readme" + output
   - Chat: "Planning: [LLM reasoning]"
```

## 🔧 Environment Variables Required

### Frontend (.dev.vars)
```env
OPENROUTER_API_KEY=sk-or-...
SSH_PROXY_URL=https://bandit-ssh-proxy.fly.dev
```

### SSH Proxy (.env or Fly.io secrets)
```env
PORT=3001
```

## 🚀 Deployment Commands

### Deploy Frontend
```bash
cd bandit-runner-app
pnpm run deploy  # OpenNext build + patch + deploy
```

### Deploy SSH Proxy (if needed)
```bash
cd ssh-proxy
flyctl deploy
```

## 📊 Success Metrics

**The app is working when you see this flow**:

1. Click START
2. Chat: "Starting run - Level 0 to 5 using openai/gpt-4o-mini"
3. Chat: "Planning: I need to read the readme file..."
4. Terminal: "$ cat readme"
5. Terminal: "Congratulations on your first steps into..."
6. Chat: "Password found: [32-char password]"
7. Terminal: "$ ssh bandit1@bandit.labs.overthewire.org"
8. Chat: "Planning: Now on level 1..."

**If you see all 8 steps, the core functionality is WORKING** 🎉

## 🐛 Debugging

### WebSocket Not Connecting
1. Check browser DevTools → Network → WS filter
2. Look for `/api/agent/run-xxx/ws`
3. Check status: should be 101 Switching Protocols
4. If 500: Check Durable Object is exported
5. If 404: Check route.ts exists

### No Terminal Output
1. Open browser console
2. Look for WebSocket messages
3. Check if events are being received
4. Check `useAgentWebSocket` is processing events
5. Check `wsTerminalLines` is being rendered

### No Chat Messages
1. Same as terminal debugging
2. Check `agent_message` and `thinking` events
3. Check `wsChatMessages` state
4. Verify `handleAgentEvent` case statements

### SSH Connection Fails
1. Check SSH proxy logs: `flyctl logs -a bandit-ssh-proxy`
2. Verify password is correct (bandit0 for level 0)
3. Check Bandit server is accessible
4. Test manually: `ssh bandit0@bandit.labs.overthewire.org -p 2220`

## 📝 Next Steps

1. **Deploy and test** - Most critical
2. **Fix any deployment issues**
3. **Test end-to-end flow**
4. **Add error recovery** - Medium priority
5. **Polish UI** - Low priority (model search, etc.)

## 💡 Key Insights

**What Changed from Original Plan**:
- ❌ Running LangGraph in DO doesn't work (Node.js APIs needed)
- ✅ SSH Proxy runs full LangGraph agent
- ✅ DO is lightweight coordinator + WebSocket server
- ✅ JSONL streaming over HTTP works great
- ✅ Architecture is correct and deployable

**Why This Works**:
- Durable Objects are perfect for WebSocket management
- SSH Proxy (Node.js on Fly.io) can run LangGraph
- HTTP streaming is simpler than complex DO↔Worker communication
- Clean separation of concerns

---

**Status**: Ready for deployment and testing
**Risk**: Medium (untested in production)
**Confidence**: High (architecture is sound)