docs: comprehensive README update with accurate architecture and deployment guide

- Add detailed Features section with LangGraph agent, WebSocket streaming, ANSI terminal
- Document real architecture: Browser → Workers → DO → SSH Proxy → Bandit
- Add complete Prerequisites section (accounts, software, API keys)
- Provide step-by-step Installation instructions for local dev
- Add comprehensive 4-step Deployment guide (SSH Proxy, DO, Main, Verify)
- Update Usage section with actual UI features and manual mode
- Add Troubleshooting section for common issues
- Update Roadmap to reflect completed features
- Remove outdated D1/R2 architecture references
- Add LangGraph badge and update Built With section
- Document monorepo structure and component responsibilities
- Include data flow diagram and event streaming details
2025-10-09 19:27:58 -06:00 · 046ef202cb
commit 046ef202cb
parent cff4af5b92
3 changed files with 876 additions and 116 deletions
--- a/README.md
+++ b/README.md
@@ -66,20 +66,25 @@

 [![Product Screenshot][product-screenshot]](#)

-**Bandit Runner** is a public, deterministic evaluation harness for large language models.  
-It transforms AI models into autonomous operators tasked with completing the **OverTheWire Bandit** wargame via SSH — entirely on Cloudflare Workers.
+**Bandit Runner** is an autonomous AI agent testing framework that evaluates large language models by having them solve the **OverTheWire Bandit** wargame challenges via SSH.

 **Why it matters**
- Provides a real-world, hands-on benchmark for autonomous reasoning and command execution.
- Tests tool use (SSH), planning, error handling, and persistence under real network conditions.
- Generates reproducible, privacy-safe logs for research or public leaderboards.
+- Provides a real-world, hands-on benchmark for autonomous reasoning and command execution
+- Tests tool use (SSH), planning, error handling, and persistence under real network conditions
+- Generates reproducible logs for research and public leaderboards
+- Demonstrates LLM capabilities in a sandboxed, deterministic environment

-### Core Concepts
- **Agent Role:** Each run instantiates an LLM as “BanditRunner” — a scripted, deterministic persona following a strict system prompt and command allow-list.
- **Environment:** Next.js frontend + OpenNext build → Cloudflare Workers backend (Durable Objects + D1 + R2).
- **Security:** Hard-scoped to `bandit.labs.overthewire.org:2220`.  
-  All discovered passwords are redacted in logs and sealed in short-lived encrypted blobs.
- **Goal:** Advance from Level 0 → final level autonomously while documenting every decision.
+### Features
+
+- 🤖 **LangGraph.js Autonomous Agent** - State machine with tool use and planning
+- 🔌 **Real-Time WebSocket Streaming** - Live updates with <50ms latency
+- 🖥️ **Full Terminal Output** - Complete SSH session with ANSI colors and formatting
+- 💬 **Agent Reasoning Display** - See the LLM's thinking process and command selection
+- 🎯 **OpenRouter Integration** - Access 100+ models with search and filtering
+- 🎨 **Beautiful Retro UI** - Terminal-inspired interface with shadcn/ui components
+- 🔒 **Security Boundaries** - Sandboxed SSH access to read-only Bandit server
+- 📊 **Level Progression** - Automatic password extraction and advancement
+- 🛠️ **Manual Mode** - Optional human intervention for debugging

 <p align="right">(<a href="#readme-top">back to top</a>)</p>

@@ -87,14 +92,14 @@ It transforms AI models into autonomous operators tasked with completing the **O

 ### Built With

-* [![Next.js][Next.js]][Next-url]
-* [![React][React.js]][React-url]
-* [![Cloudflare][Cloudflare-badge]][Cloudflare-url]
-* [![OpenNext][OpenNext-badge]][OpenNext-url]
-* [![Shadcn/UI][Shadcn-badge]][Shadcn-url]
-* [![TypeScript][TypeScript-badge]][TypeScript-url]
-* [![Drizzle ORM][Drizzle-badge]][Drizzle-url]
-* [![pnpm][pnpm-badge]][pnpm-url]
+* [![Next.js][Next.js]][Next-url] - Frontend framework (v15 with App Router)
+* [![React][React.js]][React-url] - UI library (v19)
+* [![Cloudflare][Cloudflare-badge]][Cloudflare-url] - Edge compute + Durable Objects
+* [![OpenNext][OpenNext-badge]][OpenNext-url] - Next.js adapter for Cloudflare Workers
+* [![LangGraph][LangGraph-badge]][LangGraph-url] - Agent orchestration framework
+* [![Shadcn/UI][Shadcn-badge]][Shadcn-url] - Component library
+* [![TypeScript][TypeScript-badge]][TypeScript-url] - Type safety
+* [![pnpm][pnpm-badge]][pnpm-url] - Package manager

 <p align="right">(<a href="#readme-top">back to top</a>)</p>

@@ -104,55 +109,140 @@ It transforms AI models into autonomous operators tasked with completing the **O

 ### Prerequisites

-You need:
+#### Required Accounts
+* **Cloudflare** account (free tier works, needs Durable Objects enabled)
+* **Fly.io** account (free tier works)
+* **OpenRouter** account + API key ([get one here](https://openrouter.ai))
+
+#### Required Software
 * **Node.js ≥ 20**
-* **pnpm**
+* **pnpm** package manager
  ```bash
-  npm i -g pnpm
-    ```
-
-* **Wrangler 3 CLI**
-
-  ```bash
-  npm i -g wrangler
+  npm install -g pnpm
+  ```
+* **Wrangler CLI** (Cloudflare)
+  ```bash
+  npm install -g wrangler
+  ```
+* **flyctl CLI** (Fly.io)
+  ```bash
+  curl -L https://fly.io/install.sh | sh
  ```
-* A Cloudflare account with access to:
-
-  * Durable Objects
-  * D1 Database
-  * R2 Storage

 ### Installation

-1. Clone the repo
+#### 1. Clone Repository
+```bash
+git clone https://git.biohazardvfx.com/Nicholai/bandit-runner.git
+cd bandit-runner
+```

-   ```bash
-   git clone https://git.biohazardvfx.com/Nicholai/bandit-runner.git
-   cd bandit-runner
-   ```
-2. Install dependencies
+#### 2. Install Dependencies
+```bash
+# Frontend + Cloudflare Workers
+cd bandit-runner-app
+pnpm install

-   ```bash
-   pnpm install
-   ```
-3. Copy and configure environment
+# SSH Proxy (LangGraph agent)
+cd ../ssh-proxy
+npm install
+```

-   ```bash
-   cp .env.example .env.local
-   ```
-4. Build and run locally
+#### 3. Configure Environment

-   ```bash
-   pnpm dev
-   # or
-   wrangler dev
-   ```
-5. Deploy preview
+**Cloudflare DO Worker:**
+```bash
+cd bandit-runner-app/workers/bandit-agent-do
+# Create .env.local with your OpenRouter API key
+echo "OPENROUTER_API_KEY=your_key_here" > .env.local
+```

-   ```bash
-   pnpm build
-   wrangler deploy --env preview
-   ```
+**Main Worker:**  
+No `.env` needed - configuration is in `wrangler.jsonc`
+
+#### 4. Local Development
+
+**Option A: Frontend Only** (requires deployed backend)
+```bash
+cd bandit-runner-app
+pnpm dev
+# Open http://localhost:3000
+```
+
+**Option B: Full Stack** (all components local)
+```bash
+# Terminal 1: SSH Proxy
+cd ssh-proxy
+npm run dev
+# Runs on http://localhost:3001
+
+# Terminal 2: Main App
+cd bandit-runner-app
+# Update wrangler.jsonc: "SSH_PROXY_URL": "http://localhost:3001"
+pnpm dev
+```
+
+<p align="right">(<a href="#readme-top">back to top</a>)</p>
+
+---
+
+## Deployment
+
+### Step 1: Deploy SSH Proxy to Fly.io
+
+```bash
+cd ssh-proxy
+
+# Login (opens browser)
+flyctl auth login
+
+# Deploy (creates app on first run)
+flyctl deploy
+
+# Note your URL: https://bandit-ssh-proxy.fly.dev
+```
+
+### Step 2: Deploy Durable Object Worker
+
+```bash
+cd ../bandit-runner-app/workers/bandit-agent-do
+
+# Set OpenRouter API key as secret
+wrangler secret put OPENROUTER_API_KEY
+# Paste your key when prompted
+
+# Deploy
+wrangler deploy
+
+# Verify (optional)
+wrangler tail
+```
+
+### Step 3: Deploy Main Application
+
+```bash
+cd ../..  # Back to bandit-runner-app root
+
+# Verify wrangler.jsonc has correct SSH_PROXY_URL
+# Should be: "SSH_PROXY_URL": "https://bandit-ssh-proxy.fly.dev"
+
+# Build and deploy
+pnpm run deploy
+
+# Your app is live at:
+# https://bandit-runner-app.<your-subdomain>.workers.dev
+```
+
+### Step 4: Verify Deployment
+
+```bash
+# Test SSH Proxy
+curl https://bandit-ssh-proxy.fly.dev/ssh/health
+# Should return: {"status":"ok","activeConnections":0}
+
+# Open your app
+open https://bandit-runner-app.<your-subdomain>.workers.dev
+```
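The health check above can also be scripted. The following is a minimal sketch, assuming the response body has the `{"status":"ok","activeConnections":0}` shape shown in the comment; `parseHealth` and `isHealthy` are hypothetical helper names, not part of the project.

```typescript
// Shape taken from the example health response; treat it as an assumption.
interface HealthResponse {
  status: string;
  activeConnections: number;
}

// Parse a response body, returning null for anything that is not the expected JSON shape.
function parseHealth(body: string): HealthResponse | null {
  try {
    const parsed: unknown = JSON.parse(body);
    if (typeof parsed !== "object" || parsed === null) return null;
    const record = parsed as Record<string, unknown>;
    const status = record["status"];
    const activeConnections = record["activeConnections"];
    if (typeof status !== "string") return null;
    if (typeof activeConnections !== "number") return null;
    return { status, activeConnections };
  } catch {
    // Not JSON at all (for example an HTML error page from a proxy).
    return null;
  }
}

// True only when the body parses and reports status "ok".
function isHealthy(body: string): boolean {
  const health = parseHealth(body);
  return health !== null && health.status === "ok";
}
```

In a script, the body would typically come from `fetch(url).then((r) => r.text())` before being passed to `isHealthy`.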

 <p align="right">(<a href="#readme-top">back to top</a>)</p>

@@ -160,21 +250,37 @@ You need:

 ## Usage

-Once deployed, visit `/runs/new` to start a new evaluation.
-Provide a model endpoint (OpenAI, OpenRouter, or self-hosted) and initiate a Bandit Run.
+### Starting a Run

-Each run:
+1. **Open the application** in your browser
+2. **Select Model**: Click the dropdown to choose from 100+ models
+   - Search by name
+   - Filter by provider (OpenAI, Anthropic, Google, Meta, etc.)
+   - Filter by price range
+   - Filter by context length
+3. **Set Target Level**: Choose how far the agent should progress (default: 5)
+4. **Select Output Mode**:
+   - **Selective** (default): LLM reasoning + tool calls
+   - **Everything**: Reasoning + tool calls + full command outputs
+5. **Click START**

-* Spawns a Durable Object → “Run Coordinator”
-* Connects to `bandit.labs.overthewire.org:2220`
-* Executes controlled `ssh.connect` / `ssh.exec` / `ssh.close` operations
-* Streams JSONL logs and commentary to the Live Viewer
+### Monitoring Progress

-Developers can extend:
+The interface provides real-time feedback:

-* Scoring rules (`lib/scoring/verdicts.ts`)
-* Level validators (`lib/scoring/validators.ts`)
-* Model interfaces (`lib/ssh/tool-adapter.ts`)
+- **Left Panel (Terminal)**: Full SSH session with ANSI colors and formatting
+- **Right Panel (Agent Chat)**: LLM reasoning, planning, and discoveries
+- **Top Bar**: Current status, level, and active model
+- **Control Panel**: Pause/Resume/Stop controls
+
+### Manual Mode (Debugging)
+
+Toggle **"Manual Mode"** in the terminal footer to:
+- Type commands directly into the SSH session
+- Assist the agent when stuck
+- Debug connection or logic issues
+
+⚠️ **Note**: Manual intervention disqualifies runs from leaderboards

 <p align="right">(<a href="#readme-top">back to top</a>)</p>

@@ -182,28 +288,189 @@ Developers can extend:

 ## Architecture

+### System Overview
+
 ```text
-Next.js (App Router)
-│
-├── UI (Shadcn/UI)
-│   ├─ LiveLog
-│   └─ LevelCard
-│
-├── Edge API Routes (OpenNext)
-│   ├─ /api/startRun
-│   ├─ /api/toolInvoke
-│   └─ /api/stream
-│
-└── Cloudflare Worker
-    ├─ Durable Object: RunCoordinator
-    │   ├─ TCP connect() to Bandit
-    │   ├─ State machine (levels, caps, timers)
-    │   └─ Writes logs → R2
-    ├─ D1 (metadata)
-    └─ R2 (artifacts)
+┌─────────────┐
+│   Browser   │ WebSocket (wss://)
+└──────┬──────┘
+       │
+       ↓
+┌─────────────────────────────┐
+│  Cloudflare Worker (Main)   │ Next.js + OpenNext
+│  bandit-runner-app          │ Intercepts WebSocket upgrades
+└──────┬──────────────────────┘
+       │
+       ↓
+┌─────────────────────────────┐
+│  Durable Object Worker      │ WebSocket connection manager
+│  bandit-agent-do            │ Run state coordinator
+└──────┬──────────────────────┘
+       │ HTTP (JSONL streaming)
+       ↓
+┌─────────────────────────────┐
+│  SSH Proxy (Fly.io)         │ LangGraph.js autonomous agent
+│  bandit-ssh-proxy.fly.dev   │ SSH2 client
+└──────┬──────────────────────┘
+       │ SSH Protocol
+       ↓
+┌─────────────────────────────┐
+│  Bandit SSH Server          │ CTF challenges
+│  overthewire.org:2220       │ Real command execution
+└─────────────────────────────┘
 ```

-*See `docs/ADR-001-architecture.md` for the detailed decision record.*
+### Component Responsibilities
+
+**Frontend (Next.js 15)**
+- React UI with shadcn/ui components
+- Real-time terminal with ANSI rendering
+- Agent chat display
+- Model selection and configuration
+
+**Main Worker (Cloudflare)**
+- Serves Next.js application
+- Intercepts WebSocket upgrades before Next.js routing
+- Forwards WS connections to Durable Object
+
+**Durable Object (Separate Worker)**
+- Manages WebSocket connections from browsers
+- Maintains run state (level, status, passwords)
+- Calls SSH proxy HTTP endpoints
+- Streams JSONL events to connected clients
+
+**SSH Proxy (Fly.io)**
+- Runs LangGraph.js agent state machine
+- Maintains SSH2 connections to Bandit server
+- Executes commands via PTY (full terminal capture)
+- Extracts passwords using regex patterns
+- Streams events back to DO via HTTP response
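As an illustration of the regex-based password extraction listed above, here is a minimal sketch. It assumes Bandit passwords are 32-character alphanumeric tokens; the pattern and the `extractPasswordCandidates` name are hypothetical and may differ from what `ssh-proxy` actually uses.

```typescript
// Assumption: a password is a standalone run of exactly 32 alphanumeric characters.
const PASSWORD_PATTERN = /\b[A-Za-z0-9]{32}\b/g;

// Return the unique candidate passwords found in a chunk of terminal output,
// in order of first appearance.
function extractPasswordCandidates(output: string): string[] {
  const matches: string[] = output.match(PASSWORD_PATTERN) ?? [];
  return matches.filter((candidate, index) => matches.indexOf(candidate) === index);
}
```

A pattern this loose can match other 32-character tokens (hashes, IDs), so a real agent would likely confirm a candidate by attempting the next-level login.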
+
+### Data Flow
+
+1. User clicks START in browser
+2. Browser establishes WebSocket to Main Worker
+3. Worker forwards to Durable Object
+4. DO sends HTTP POST to SSH Proxy `/agent/run`
+5. SSH Proxy runs LangGraph agent
+6. Agent connects via SSH to Bandit server
+7. Commands executed, passwords extracted
+8. Events stream: SSH Proxy → DO → WebSocket → Browser
+9. UI updates in real-time
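Step 8 relies on JSONL: one JSON object per line, arriving in arbitrary chunks. The sketch below shows one way a consumer might reassemble events from such a stream. The `AgentEvent` shape and the `JsonlBuffer` class are assumptions for illustration; the actual event fields are not specified here.

```typescript
// Hypothetical event shape; the real field names may differ.
interface AgentEvent {
  type: string;
  data?: unknown;
}

// Accumulates text chunks and yields one parsed event per complete line.
class JsonlBuffer {
  private pending = "";

  push(chunk: string): AgentEvent[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    // The last element is an unfinished line (or "" if the chunk ended on a newline).
    this.pending = lines.pop() ?? "";

    const events: AgentEvent[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue;
      try {
        events.push(JSON.parse(trimmed) as AgentEvent);
      } catch {
        // Skip a malformed line rather than aborting the whole stream.
      }
    }
    return events;
  }
}
```

Buffering the trailing partial line is the important design point: a network chunk can end in the middle of an event, and parsing it early would fail.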
+
+### Monorepo Structure
+
+```
+bandit-runner/
+├── bandit-runner-app/          # Frontend + Cloudflare Workers
+│   ├── src/
+│   │   ├── app/                # Next.js pages + API routes
+│   │   ├── components/         # React components
+│   │   ├── hooks/              # useAgentWebSocket
+│   │   └── lib/                # Utilities, types
+│   ├── workers/
+│   │   └── bandit-agent-do/    # Standalone Durable Object
+│   ├── scripts/
+│   │   └── patch-worker.js     # Injects WebSocket intercept
+│   └── wrangler.jsonc          # Main worker config
+├── ssh-proxy/                   # LangGraph agent runtime
+│   ├── agent.ts                # State machine definition
+│   ├── server.ts               # Express HTTP server
+│   ├── Dockerfile              # Container image
+│   └── fly.toml                # Fly.io deployment config
+└── docs/                        # Documentation
+```
+
+<p align="right">(<a href="#readme-top">back to top</a>)</p>
+
+---
+
+## Troubleshooting
+
+### WebSocket Connection Failed
+
+**Symptoms**: "WebSocket connection failed" or "NS_ERROR_WEBSOCKET_CONNECTION_REFUSED"
+
+**Solutions**:
+```bash
+# Check DO worker is deployed
+wrangler deployments list
+
+# Check browser console for specific errors
+# Verify DO binding in wrangler.jsonc
+
+# Test DO directly
+curl https://bandit-runner-app.<your-subdomain>.workers.dev/api/agent/test-run/status
+```
+
+### SSH Proxy Not Responding
+
+**Symptoms**: Agent starts but no terminal output appears
+
+**Solutions**:
+```bash
+# Check Fly.io app status
+flyctl status
+
+# View real-time logs
+flyctl logs
+
+# Test health endpoint
+curl https://bandit-ssh-proxy.fly.dev/ssh/health
+# Should return: {"status":"ok","activeConnections":0}
+
+# Restart if needed
+flyctl apps restart bandit-ssh-proxy
+```
+
+### Agent Not Starting
+
+**Symptoms**: Run status stays "IDLE" or errors in console
+
+**Solutions**:
+```bash
+# Verify OpenRouter API key is set
+wrangler secret list
+
+# Check DO worker logs
+wrangler tail bandit-agent-do
+
+# Ensure SSH_PROXY_URL is correct in wrangler.jsonc
+# Should be: https://bandit-ssh-proxy.fly.dev (not http://)
+```
+
+### Commands Not Executing
+
+**Symptoms**: Agent thinks but no SSH commands run
+
+**Solutions**:
+```bash
+# Check SSH proxy logs
+flyctl logs
+
+# Verify Bandit server is accessible
+ssh bandit0@bandit.labs.overthewire.org -p 2220
+# Password: bandit0
+
+# Review agent reasoning in chat panel
+# Look for error messages or failed attempts
+```
+
+### Build or Deploy Errors
+
+**Common issues**:
+```bash
+# "Cannot find module" during build
+cd bandit-runner-app && pnpm install
+cd ../ssh-proxy && npm install
+
+# "Account ID not found"
+# Add account_id to workers/bandit-agent-do/wrangler.toml
+
+# "Script not found: bandit-agent-do"
+# Deploy DO worker first (run from bandit-runner-app/):
+cd workers/bandit-agent-do && wrangler deploy
+```

 <p align="right">(<a href="#readme-top">back to top</a>)</p>

@@ -211,16 +478,33 @@ Next.js (App Router)

 ## Roadmap

-* [x] Core runner architecture
-* [x] JSONL log streaming
-* [x] SSH tool scaffolding
-* [ ] Add live leaderboard
-* [ ] Add mock SSH server for tests
-* [ ] Expand scoring heuristics
-* [ ] Implement model-agnostic adapter layer
-* [ ] Public demo page
+### Completed ✅
+* [x] LangGraph.js autonomous agent framework
+* [x] Real-time WebSocket streaming (JSONL events)
+* [x] Full terminal output with ANSI colors
+* [x] Agent reasoning and thinking display
+* [x] OpenRouter model selection with search/filters
+* [x] Level 0 → N automatic progression
+* [x] Password extraction and advancement
+* [x] Manual mode for debugging
+* [x] Durable Object state management

-See the [open issues](https://git.biohazardvfx.com/Nicholai/bandit-runner/issues) for the full roadmap.
+### In Progress 🚧
+* [ ] Retry logic with exponential backoff
+* [ ] Token usage and cost tracking UI
+* [ ] Persistent run history (D1 database)
+* [ ] Log storage and analysis (R2 bucket)
+
+### Planned 📋
+* [ ] Public leaderboard (fastest completions)
+* [ ] Multi-agent comparison mode
+* [ ] Custom system prompts and strategies
+* [ ] Agent performance analytics dashboard
+* [ ] Mock SSH server for testing
+* [ ] Expanded CTF support (Natas, Krypton, etc.)
+* [ ] Run replay and debugging tools
+
+See the [open issues](https://git.biohazardvfx.com/Nicholai/bandit-runner/issues) for discussion and feature requests.

 <p align="right">(<a href="#readme-top">back to top</a>)</p>

@@ -268,14 +552,14 @@ Project Link: [https://git.biohazardvfx.com/Nicholai/bandit-runner](https://git.

 ## Acknowledgments

-* [OverTheWire Bandit](https://overthewire.org/wargames/bandit/) — for the wargame challenge itself
-* [Cloudflare Workers Docs](https://developers.cloudflare.com/workers/)
-* [OpenNext](https://opennext.js.org/)
-* [Shadcn/UI](https://ui.shadcn.com)
-* [Drizzle ORM](https://orm.drizzle.team)
-* [Choose a License](https://choosealicense.com)
-* [Img Shields](https://shields.io)
-* [Contrib.rocks](https://contrib.rocks)
+* [OverTheWire Bandit](https://overthewire.org/wargames/bandit/) - CTF wargame challenges
+* [LangGraph.js](https://langchain-ai.github.io/langgraphjs/) - Agent orchestration framework
+* [Cloudflare Workers](https://developers.cloudflare.com/workers/) - Edge compute platform
+* [Fly.io](https://fly.io) - Global application hosting
+* [OpenRouter](https://openrouter.ai) - Unified LLM API access
+* [OpenNext](https://opennext.js.org/) - Next.js adapter for Cloudflare
+* [shadcn/ui](https://ui.shadcn.com) - Beautiful component library
+* [ANSI-to-HTML](https://github.com/rburns/ansi-to-html) - Terminal color rendering

 <p align="right">(<a href="#readme-top">back to top</a>)</p>

@@ -304,12 +588,12 @@ Project Link: [https://git.biohazardvfx.com/Nicholai/bandit-runner](https://git.
 [Cloudflare-url]: https://developers.cloudflare.com/workers/
 [OpenNext-badge]: https://img.shields.io/badge/OpenNext-18181B?style=for-the-badge&logo=vercel&logoColor=white
 [OpenNext-url]: https://opennext.js.org/
+[LangGraph-badge]: https://img.shields.io/badge/LangGraph-1C3C3C?style=for-the-badge&logo=langchain&logoColor=white
+[LangGraph-url]: https://langchain-ai.github.io/langgraphjs/
 [Shadcn-badge]: https://img.shields.io/badge/Shadcn%2FUI-000000?style=for-the-badge&logo=react&logoColor=white
 [Shadcn-url]: https://ui.shadcn.com
 [TypeScript-badge]: https://img.shields.io/badge/TypeScript-3178C6?style=for-the-badge&logo=typescript&logoColor=white
 [TypeScript-url]: https://www.typescriptlang.org/
-[Drizzle-badge]: https://img.shields.io/badge/Drizzle%20ORM-3E63DD?style=for-the-badge&logo=sqlite&logoColor=white
-[Drizzle-url]: https://orm.drizzle.team
 [pnpm-badge]: https://img.shields.io/badge/pnpm-F69220?style=for-the-badge&logo=pnpm&logoColor=white
 [pnpm-url]: https://pnpm.io
 [conventional-commits-badge]: https://img.shields.io/badge/Conventional%20Commits-1.0.0-%23FE5196?style=for-the-badge&logo=conventionalcommits&logoColor=white
--- a/bandit-runner-app/src/components/terminal-chat-interface.tsx
+++ b/bandit-runner-app/src/components/terminal-chat-interface.tsx
@@ -74,8 +74,8 @@ export function TerminalChatInterface() {
  const [mounted, setMounted] = useState(false)
  const [manualMode, setManualMode] = useState(false)
  
-  const terminalEndRef = useRef<HTMLDivElement>(null)
-  const chatEndRef = useRef<HTMLDivElement>(null)
+  const terminalScrollRef = useRef<HTMLDivElement>(null)
+  const chatScrollRef = useRef<HTMLDivElement>(null)
  const terminalInputRef = useRef<HTMLInputElement>(null)
  const chatInputRef = useRef<HTMLInputElement>(null)
  
@@ -119,11 +119,21 @@ export function TerminalChatInterface() {
  }, [])

  useEffect(() => {
-    terminalEndRef.current?.scrollIntoView({ behavior: "smooth" })
+    if (terminalScrollRef.current) {
+      const viewport = terminalScrollRef.current.querySelector('[data-radix-scroll-area-viewport]')
+      if (viewport) {
+        viewport.scrollTop = viewport.scrollHeight
+      }
+    }
  }, [wsTerminalLines])

  useEffect(() => {
-    chatEndRef.current?.scrollIntoView({ behavior: "smooth" })
+    if (chatScrollRef.current) {
+      const viewport = chatScrollRef.current.querySelector('[data-radix-scroll-area-viewport]')
+      if (viewport) {
+        viewport.scrollTop = viewport.scrollHeight
+      }
+    }
  }, [wsChatMessages])

  const formatTimestamp = (date: Date) => {
@@ -399,8 +409,8 @@ export function TerminalChatInterface() {
              </div>

              {/* Terminal content */}
-              <ScrollArea className="flex-1 p-4 relative z-10">
-                <div className="space-y-1">
+              <ScrollArea ref={terminalScrollRef} className="flex-1 relative z-10 h-full">
+                <div className="p-4 space-y-1">
                  {wsTerminalLines.map((line, idx) => (
                    <div
                      key={idx}
@@ -427,7 +437,6 @@ export function TerminalChatInterface() {
                      )}
                    </div>
                  ))}
-                  <div ref={terminalEndRef} />
                </div>
              </ScrollArea>

@@ -506,8 +515,8 @@ export function TerminalChatInterface() {
              </div>

              {/* Messages */}
-              <ScrollArea className="flex-1 p-4 relative z-10">
-                <div className="space-y-4">
+              <ScrollArea ref={chatScrollRef} className="flex-1 relative z-10 h-full">
+                <div className="p-4 space-y-4">
                  {wsChatMessages.map((msg, idx) => (
                    <div key={idx} className="space-y-1">
                      <div className="flex items-center gap-2 text-[10px]">
@@ -551,7 +560,6 @@ export function TerminalChatInterface() {
                      </div>
                    </div>
                  )}
-                  <div ref={chatEndRef} />
                </div>
              </ScrollArea>

--- a/readme-improvement.plan.md
+++ b/readme-improvement.plan.md
@@ -0,0 +1,468 @@
+# README Improvement Plan
+
+## Problem
+
+The current README is outdated and doesn't reflect the actual architecture or provide clear setup/deployment instructions. It references old concepts (D1, R2, mock setup) that aren't implemented, and doesn't explain the real LangGraph + SSH Proxy architecture that's currently working.
+
+## Goals
+
+1. **Clear Project Overview**: Explain what Bandit Runner actually does and why it matters
+2. **Accurate Architecture**: Document the real system (Next.js → Cloudflare Workers → Durable Objects → SSH Proxy → Bandit Server)
+3. **Step-by-Step Setup**: Complete local development setup instructions
+4. **Deployment Guide**: Clear instructions for deploying all components
+5. **Feature Documentation**: What the system actually does (model selection, real-time terminal, agent reasoning, etc.)
+6. **Remove Outdated Content**: Clean up references to unimplemented features
+
+## Current Issues
+
+### Inaccurate Architecture Description
+- README shows old architecture with D1/R2 that aren't implemented
+- Doesn't mention LangGraph.js agent framework
+- Doesn't explain SSH Proxy on Fly.io
+- Missing Durable Object standalone worker details
+
+### Missing Prerequisites
+- No mention of OpenRouter API key requirement
+- Missing Fly.io account for SSH proxy
+- No Node.js version specification (needs 20+)
+- Missing pnpm workspace setup
+
+### Incomplete Setup Instructions
+- Doesn't explain monorepo structure
+- No `.env` file examples or required variables
+- Missing SSH proxy deployment steps
+- No explanation of DO worker vs main worker
+- No Wrangler secrets configuration
+
+### Vague Usage Section
+- Doesn't explain the actual UI (control panel, terminal, chat)
+- No screenshots or feature descriptions
+- Missing model selection details
+- No explanation of manual mode
+
+### Outdated Roadmap
+- Lists items as incomplete that are done
+- Missing current features (WebSocket streaming, ANSI colors, etc.)
+
+## New README Structure
+
+```markdown
+# Bandit Runner
+
+## About
+- What it is: Autonomous AI agent testing framework
+- What it does: Solves OverTheWire Bandit CTF challenges
+- Why it matters: Real-world benchmark for LLM autonomous capabilities
+
+## Features
+- 🤖 LangGraph.js autonomous agent with tool use
+- 🔌 Real-time WebSocket streaming
+- 🖥️ Full terminal output with ANSI colors
+- 💬 Live agent reasoning/thinking display
+- 🎯 OpenRouter model selection (100+ models)
+- 🎨 Beautiful retro-terminal UI
+- 🔒 SSH security boundaries (read-only Bandit server)
+- 📊 Level progression tracking
+
+## Architecture
+
+### System Components
+1. **Frontend** - Next.js 15 with App Router + shadcn/ui
+2. **Backend** - Cloudflare Workers with OpenNext
+3. **Coordinator** - Durable Objects (separate worker)
+4. **Agent Runtime** - SSH Proxy on Fly.io (LangGraph.js)
+5. **Target** - OverTheWire Bandit SSH server
+
+### Data Flow
+Browser → WebSocket → Main Worker → DO Worker → HTTP → SSH Proxy → LangGraph Agent → SSH → Bandit Server
+
+### Tech Stack
+- Next.js 15 + React 19
+- Cloudflare Workers + Durable Objects
+- LangGraph.js for agent orchestration
+- OpenRouter for LLM access
+- Fly.io for SSH proxy hosting
+- shadcn/ui for components
+- ANSI-to-HTML for terminal rendering
+
+## Prerequisites
+
+### Required Accounts
+- Cloudflare account (free tier OK, needs Durable Objects enabled)
+- Fly.io account (free tier OK)
+- OpenRouter account + API key
+
+### Required Software
+- Node.js 20+
+- pnpm (package manager)
+- Wrangler CLI (Cloudflare)
+- flyctl CLI (Fly.io)
+- Git
+
+## Installation
+
+### 1. Clone Repository
+```bash
+git clone https://git.biohazardvfx.com/Nicholai/bandit-runner.git
+cd bandit-runner
+```
+
+### 2. Install Dependencies
+```bash
+# Install pnpm if you don't have it
+npm install -g pnpm
+
+# Install all workspace dependencies
+cd bandit-runner-app
+pnpm install
+
+cd ../ssh-proxy
+npm install
+```
+
+### 3. Configure Environment
+
+#### Cloudflare (Main Worker)
+No `.env` needed - uses `wrangler.jsonc` vars
+
+#### Cloudflare (DO Worker)
+```bash
+cd bandit-runner-app/workers/bandit-agent-do
+cp .env.example .env.local
+# Add your OpenRouter API key
+```
+
+#### SSH Proxy (Fly.io)
+No configuration needed - OpenRouter key passed via HTTP headers
+
+### 4. Local Development
+
+#### Option A: Run Frontend Only (with deployed backend)
+```bash
+cd bandit-runner-app
+pnpm dev
+# Open http://localhost:3000
+```
+
+#### Option B: Run Full Stack Locally
+```bash
+# Terminal 1: SSH Proxy
+cd ssh-proxy
+npm run dev
+# Runs on http://localhost:3001
+
+# Terminal 2: Frontend (with local proxy)
+cd bandit-runner-app
+# Update wrangler.jsonc SSH_PROXY_URL to http://localhost:3001
+pnpm dev
+```
+
+## Deployment
+
+### Step 1: Deploy SSH Proxy to Fly.io
+
+```bash
+cd ssh-proxy
+
+# Login to Fly.io (opens browser)
+flyctl auth login
+
+# Deploy (first time creates app)
+flyctl deploy
+
+# Note the URL: https://bandit-ssh-proxy.fly.dev
+```
+
+### Step 2: Deploy Durable Object Worker
+
+```bash
+cd ../bandit-runner-app/workers/bandit-agent-do
+
+# Set your OpenRouter API key
+wrangler secret put OPENROUTER_API_KEY
+# Paste your key when prompted
+
+# Deploy the DO worker
+wrangler deploy
+
+# Verify deployment
+wrangler tail
+```
+
+### Step 3: Deploy Main Application
+
+```bash
+cd ../..  # Back to bandit-runner-app root
+
+# Verify SSH_PROXY_URL in wrangler.jsonc points to your Fly.io URL
+# Should be: "SSH_PROXY_URL": "https://bandit-ssh-proxy.fly.dev"
+
+# Build and deploy
+pnpm run deploy
+
+# Your app will be live at:
+# https://bandit-runner-app.<your-subdomain>.workers.dev
+```
+
+### Step 4: Verify Deployment
+
+```bash
+# Test SSH Proxy health
+curl https://bandit-ssh-proxy.fly.dev/ssh/health
+
+# Should return: {"status":"ok","activeConnections":0}
+
+# Test main app
+open https://bandit-runner-app.<your-subdomain>.workers.dev
+```
+
+## Usage
+
+### Starting a Run
+
+1. Open the application in your browser
+2. **Select Model**: Click the model dropdown to choose from 100+ models
+   - Search by name
+   - Filter by provider (OpenAI, Anthropic, Google, etc.)
+   - Filter by price range
+   - Filter by context length
+3. **Set Target Level**: Choose how far the agent should attempt (default: 5)
+4. **Select Output Mode**: 
+   - **Selective** (default): Shows LLM reasoning + tool calls
+   - **Everything**: Shows reasoning + tool calls + full command outputs
+5. **Click START**
+
+### Monitoring Progress
+
+The interface shows:
+- **Left Panel (Terminal)**: Full SSH session output with colors
+- **Right Panel (Agent Chat)**: LLM reasoning and discovered information
+- **Top Status Bar**: Current status, level, and model info
+- **Control Panel**: Pause/Resume/Stop controls
+
+### Manual Mode (Debugging)
+
+Toggle "Manual Mode" in the terminal to:
+- Type commands directly into the SSH session
+- Assist the agent when it gets stuck
+- Debug connection issues
+
+⚠️ **Note**: Runs with manual intervention are disqualified from leaderboards
+
+## Features Explained
+
+### Real-Time Terminal
+- Full PTY session capture
+- ANSI color support (green prompts, red errors, etc.)
+- Timestamps for all output
+- Automatic scrolling
+- Read-only by default (manual mode available)
+
+### Agent Reasoning Display
+- LLM "thinking" messages (what command to try next)
+- Planning output (strategy for current level)
+- Password discovery announcements
+- Level completion summaries
+
+### Model Selection
+- 100+ models from OpenRouter
+- Real-time search
+- Filter by provider, price, context length
+- Shows token costs
+- Supports all major providers
+
+### Level Progression
+- Always starts at Level 0
+- Automatic password extraction
+- Automatic SSH re-login for next level
+- Configurable target level cap
+- Retry logic for failed attempts
+
+## Architecture Details
+
+### Monorepo Structure
+```
+bandit-runner/
+├── bandit-runner-app/          # Next.js frontend + Cloudflare Workers
+│   ├── src/
+│   │   ├── app/                # Next.js pages + API routes
+│   │   ├── components/         # React components (shadcn/ui)
+│   │   ├── hooks/              # React hooks (WebSocket)
+│   │   └── lib/                # Utilities, agents, storage
+│   ├── workers/
+│   │   └── bandit-agent-do/    # Standalone Durable Object worker
+│   ├── scripts/
+│   │   └── patch-worker.js     # WebSocket intercept injector
+│   └── wrangler.jsonc          # Main worker config
+├── ssh-proxy/                   # LangGraph agent on Fly.io
+│   ├── agent.ts                # LangGraph state machine
+│   ├── server.ts               # Express HTTP server
+│   ├── Dockerfile              # Container definition
+│   └── fly.toml                # Fly.io config
+├── docs/                        # Documentation
+└── public/                      # Assets
+```
+
+### WebSocket Intercept Pattern
+
+The main worker intercepts WebSocket upgrade requests **before** they reach Next.js:
+
+```javascript
+// Injected by scripts/patch-worker.js
+if (request.headers.get('Upgrade') === 'websocket') {
+  // Forward directly to the DO, bypassing Next.js; `id` is a DurableObjectId obtained beforehand, e.g. via env.BANDIT_AGENT.idFromName(name)
+  return env.BANDIT_AGENT.get(id).fetch(request);
+}
+```
+
+This is necessary because Next.js API routes don't support WebSocket upgrades.
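The routing decision behind this intercept can be sketched as two small pure functions. This is an illustrative assumption, not the code that `scripts/patch-worker.js` injects: the helper names are hypothetical, and the `/api/agent/<runId>/...` path layout is inferred from the status URL used in the README troubleshooting section.

```typescript
// Minimal structural type so the sketch runs without Cloudflare or DOM typings.
interface HeaderReader {
  get(name: string): string | null;
}

// Treat the request as a WebSocket upgrade regardless of header value casing
// (clients may send "websocket" or "WebSocket").
function isWebSocketUpgrade(headers: HeaderReader): boolean {
  return (headers.get("Upgrade") ?? "").toLowerCase() === "websocket";
}

// Extract the run identifier from a path such as /api/agent/<runId>/ws.
// The path layout is an assumption for illustration.
function runIdFromPath(pathname: string): string | null {
  const match = /^\/api\/agent\/([^/]+)\//.exec(pathname);
  return match ? match[1] : null;
}

// In a Worker fetch handler these would typically be combined as:
//   if (isWebSocketUpgrade(request.headers)) {
//     const runId = runIdFromPath(new URL(request.url).pathname);
//     if (runId) {
//       const id = env.BANDIT_AGENT.idFromName(runId);
//       return env.BANDIT_AGENT.get(id).fetch(request);
//     }
//   }
```

Deriving the Durable Object ID from the run identifier with `idFromName` means every client watching the same run reaches the same object instance.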
+
+### Durable Object Responsibilities
+
+- Accept WebSocket connections from browsers
+- Maintain run state (level, passwords, status)
+- Call SSH proxy HTTP endpoints
+- Stream JSONL events from SSH proxy to WebSocket clients
+- Handle START/PAUSE/RESUME/STOP actions
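
The START/PAUSE/RESUME/STOP handling listed above can be pictured as a small state machine. What follows is a minimal sketch under stated assumptions, not the project's Durable Object code: the status names (`idle`, `running`, `paused`, `stopped`) and the `applyAction` helper are hypothetical.

```javascript
// Hypothetical run-status transitions. The status names are illustrative
// and are not taken from the Bandit Runner source.
const TRANSITIONS = {
  idle: { START: 'running' },
  running: { PAUSE: 'paused', STOP: 'stopped' },
  paused: { RESUME: 'running', STOP: 'stopped' },
  stopped: { START: 'running' },
};

const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

// Returns the next status, or the current status unchanged when the action
// is not valid in that status (for example, PAUSE while idle).
function applyAction(status, action) {
  if (!hasOwn(TRANSITIONS, status)) return status;
  const allowed = TRANSITIONS[status];
  return hasOwn(allowed, action) ? allowed[action] : status;
}
```

Ignoring invalid actions instead of throwing keeps a stray or duplicated client message from corrupting run state. Whether the real worker does the same is not stated in this README.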
+
+### SSH Proxy Responsibilities
+
+- Run LangGraph.js agent state machine
+- Maintain SSH2 connections to Bandit server
+- Execute commands via PTY
+- Extract passwords using regex
+- Stream events back to DO via HTTP response body (JSONL)
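
As an illustration of the regex-based extraction step, here is a minimal sketch. It assumes a password appears in command output as a standalone 32-character alphanumeric token, which is typical of Bandit but is an assumption here. The `extractPassword` helper and both patterns are hypothetical and may differ from what `agent.ts` does.

```javascript
// Assumption: a password is a standalone run of exactly 32 ASCII letters or digits.
const PASSWORD_PATTERN = /(?<![A-Za-z0-9])[A-Za-z0-9]{32}(?![A-Za-z0-9])/g;

// Matches ANSI color/style sequences such as ESC[32m or ESC[0m.
const ANSI_PATTERN = /\x1b\[[0-9;]*[A-Za-z]/g;

// Returns the last candidate token in the output, or null when none is found.
// ANSI codes are stripped first: the trailing letter of a color code would
// otherwise sit next to the token and stretch it past 32 characters.
function extractPassword(output) {
  const clean = output.replace(ANSI_PATTERN, '');
  const matches = clean.match(PASSWORD_PATTERN);
  return matches ? matches[matches.length - 1] : null;
}
```

Taking the last match is a heuristic. It suits commands like `cat` whose output ends with the password, but it can pick up unrelated 32-character strings such as MD5 hashes, so a real agent would likely confirm a candidate by attempting the next login.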
+
+### Event Flow
+
+1. User clicks START in browser
+2. Browser → WebSocket → Main Worker → DO Worker
+3. DO → HTTP POST `/agent/run` → SSH Proxy
+4. SSH Proxy → SSH Connection → Bandit Server
+5. Agent executes commands, extracts passwords
+6. SSH Proxy → JSONL events → DO
+7. DO → WebSocket → Browser
+8. UI updates in real time
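
Steps 6 and 7 depend on splitting the proxy's streamed response body into complete JSON lines before relaying them. A minimal sketch of that buffering follows. `createJsonlParser` is a hypothetical helper, and the event fields in the example are invented for illustration, not taken from the project's event schema.

```javascript
// Buffers text chunks and emits one parsed object per complete line.
// A chunk boundary may fall mid-line, so the trailing partial line is kept
// until a later chunk (or flush) completes it.
function createJsonlParser(onEvent) {
  let buffer = '';

  function emit(line) {
    const trimmed = line.trim();
    if (trimmed === '') return; // skip blank lines
    // JSON.parse throws on malformed input; a real relay would likely
    // catch that and log the offending line instead.
    onEvent(JSON.parse(trimmed));
  }

  return {
    push(chunk) {
      buffer += chunk;
      const lines = buffer.split('\n');
      buffer = lines.pop(); // last element is the incomplete remainder
      lines.forEach(emit);
    },
    flush() {
      const rest = buffer;
      buffer = '';
      emit(rest);
    },
  };
}

// Example: a line split across two chunks is still delivered once, whole.
const events = [];
const parser = createJsonlParser((event) => events.push(event));
parser.push('{"type":"terminal","data":"ls"}\n{"type":"thi');
parser.push('nking","data":"plan"}\n');
// `events` now holds two objects, in arrival order.
```

In the Durable Object, each parsed event would then be forwarded to connected clients, for instance with `socket.send(JSON.stringify(event))`. Decoding the response bytes to text before calling `push` is left out of the sketch.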
+
+## Troubleshooting
+
+### WebSocket Connection Failed
+- Check that the DO worker is deployed: `wrangler deployments list`
+- Verify the DO binding in the main worker config (`wrangler.jsonc`)
+- Check browser console for specific errors
+
+### SSH Proxy Not Responding
+- Check Fly.io status: `flyctl status`
+- View logs: `flyctl logs`
+- Test health endpoint: `curl https://your-proxy.fly.dev/ssh/health`
+
+### Agent Not Starting
+- Verify the OpenRouter API key is set: `wrangler secret list` (lists secret names only, not values)
+- Check DO worker logs: `wrangler tail --name bandit-agent-do`
+- Ensure `SSH_PROXY_URL` is correct in `wrangler.jsonc`
+
+### Commands Not Executing
+- Check SSH proxy logs: `flyctl logs`
+- Verify Bandit server is accessible: `ssh bandit0@bandit.labs.overthewire.org -p 2220`
+- Review agent reasoning in chat panel for errors
+
+## Roadmap
+
+### Completed ✅
+- [x] LangGraph autonomous agent framework
+- [x] Real-time WebSocket streaming
+- [x] Full terminal output with ANSI colors
+- [x] Agent reasoning display
+- [x] OpenRouter model selection with search/filters
+- [x] Level 0 → N progression
+- [x] Manual mode for debugging
+- [x] Password extraction and advancement
+
+### In Progress 🚧
+- [ ] Retry logic with exponential backoff
+- [ ] Token usage and cost tracking UI
+- [ ] Persistent run history (D1)
+- [ ] Log storage (R2)
+
+### Planned 📋
+- [ ] Leaderboard (fastest completions)
+- [ ] Multi-agent comparison mode
+- [ ] Custom prompts and strategies
+- [ ] Agent performance analytics
+- [ ] Mock SSH server for testing
+- [ ] Expanded CTF support (beyond Bandit)
+
+## Contributing
+
+Contributions welcome! Please:
+
+1. Fork the repository
+2. Create a feature branch (`git checkout -b feat/amazing-feature`)
+3. Commit with Conventional Commits (`feat:`, `fix:`, `docs:`)
+4. Push and open a Pull Request
+
+See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.
+
+## License
+
+GNU GPLv3 - See [COPYING.txt](COPYING.txt)
+
+## Contact
+
+**Nicholai Vogel**  
+[Website](https://nicholai.work) • [LinkedIn](https://linkedin.com/in/nicholai-vogel)
+
+## Acknowledgments
+
+- [OverTheWire Bandit](https://overthewire.org/wargames/bandit/) - CTF challenges
+- [LangGraph.js](https://langchain-ai.github.io/langgraphjs/) - Agent framework
+- [Cloudflare Workers](https://developers.cloudflare.com/workers/) - Edge compute
+- [Fly.io](https://fly.io) - Global app hosting
+- [OpenRouter](https://openrouter.ai) - Unified LLM API
+- [shadcn/ui](https://ui.shadcn.com) - Component library
+- [OpenNext](https://opennext.js.org/) - Next.js on Workers
+```
+
+## Key Changes
+
+### What's Being Added
+1. **Accurate architecture diagram** - Shows real components (DO, SSH Proxy, LangGraph)
+2. **Complete prerequisite list** - All accounts, software, and setup requirements
+3. **Monorepo structure docs** - Explains the workspace layout
+4. **Step-by-step deployment** - 4 clear deployment phases
+5. **Feature documentation** - What each UI component does
+6. **Troubleshooting section** - Common issues and solutions
+7. **Current roadmap** - Reflects actual completion status
+
+### What's Being Removed
+1. Mock architecture references (unimplemented D1 and R2 features)
+2. Incorrect usage instructions
+3. Outdated screenshots/descriptions
+4. Placeholder content from template
+
+### What's Being Updated
+1. Tech stack badges - Add LangGraph, Fly.io, OpenRouter
+2. Built With section - Reflect actual dependencies
+3. Installation steps - Real commands that work
+4. Usage examples - Match actual UI
+5. Contact links - Keep current
+
+## Implementation Notes
+
+- Keep existing badge links/formatting style
+- Maintain table of contents structure
+- Add new sections: Features, Architecture Details, Troubleshooting
+- Update all code blocks with real, tested commands
+- Add architecture ASCII diagram for clarity
+- Include actual file paths and structure
+- Reference real configuration files (wrangler.jsonc, etc.)
+
+## Success Criteria
+
+✅ README accurately describes working system  
+✅ User can deploy from scratch following instructions  
+✅ No references to unimplemented features  
+✅ Clear troubleshooting for common issues  
+✅ Architecture matches production code  
+✅ All commands are tested and accurate
+