# README Improvement Plan

## Problem
The current README is outdated and doesn't reflect the actual architecture or provide clear setup/deployment instructions. It references old concepts (D1, R2, mock setup) that aren't implemented, and doesn't explain the real LangGraph + SSH Proxy architecture that's currently working.
## Goals
- Clear Project Overview: Explain what Bandit Runner actually does and why it matters
- Accurate Architecture: Document the real system (Next.js → Cloudflare Workers → Durable Objects → SSH Proxy → Bandit Server)
- Step-by-Step Setup: Complete local development setup instructions
- Deployment Guide: Clear instructions for deploying all components
- Feature Documentation: What the system actually does (model selection, real-time terminal, agent reasoning, etc.)
- Remove Outdated Content: Clean up references to unimplemented features
## Current Issues

### Inaccurate Architecture Description
- README shows old architecture with D1/R2 that aren't implemented
- Doesn't mention LangGraph.js agent framework
- Doesn't explain SSH Proxy on Fly.io
- Missing Durable Object standalone worker details
### Missing Prerequisites
- No mention of OpenRouter API key requirement
- Missing Fly.io account for SSH proxy
- No Node.js version specification (needs 20+)
- Missing pnpm workspace setup
### Incomplete Setup Instructions
- Doesn't explain monorepo structure
- No `.env` file examples or required variables
- Missing SSH proxy deployment steps
- No explanation of DO worker vs main worker
- No Wrangler secrets configuration
### Vague Usage Section
- Doesn't explain the actual UI (control panel, terminal, chat)
- No screenshots or feature descriptions
- Missing model selection details
- No explanation of manual mode
### Outdated Roadmap
- Lists items as incomplete that are done
- Missing current features (WebSocket streaming, ANSI colors, etc.)
## New README Structure
# Bandit Runner
## About
- What it is: Autonomous AI agent testing framework
- What it does: Solves OverTheWire Bandit CTF challenges
- Why it matters: Real-world benchmark for LLM autonomous capabilities
## Features
- 🤖 LangGraph.js autonomous agent with tool use
- 🔌 Real-time WebSocket streaming
- 🖥️ Full terminal output with ANSI colors
- 💬 Live agent reasoning/thinking display
- 🎯 OpenRouter model selection (100+ models)
- 🎨 Beautiful retro-terminal UI
- 🔒 SSH security boundaries (read-only Bandit server)
- 📊 Level progression tracking
## Architecture
### System Components
1. **Frontend** - Next.js 15 with App Router + shadcn/ui
2. **Backend** - Cloudflare Workers with OpenNext
3. **Coordinator** - Durable Objects (separate worker)
4. **Agent Runtime** - SSH Proxy on Fly.io (LangGraph.js)
5. **Target** - OverTheWire Bandit SSH server
### Data Flow
Browser → WebSocket → Main Worker → DO Worker → HTTP → SSH Proxy → LangGraph Agent → SSH → Bandit Server
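To make the first hop concrete, here is a minimal browser-side sketch; the WebSocket path and message fields are placeholders for illustration, not the actual frontend contract:

```ts
// Sketch of the Browser → WebSocket hop only. The /ws/agent path, the action
// message shape, and the model name are hypothetical placeholders.
const ws = new WebSocket(`wss://${location.host}/ws/agent`);

ws.addEventListener("open", () => {
  // Ask the coordinator (Durable Object behind the main worker) to start a run
  ws.send(JSON.stringify({ action: "START", model: "openai/gpt-4o-mini", targetLevel: 5 }));
});

ws.addEventListener("message", (e) => {
  const event = JSON.parse(e.data as string); // one JSONL event relayed from the SSH proxy
  // route to the terminal panel or the agent chat panel based on event.type
});
```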
### Tech Stack
- Next.js 15 + React 19
- Cloudflare Workers + Durable Objects
- LangGraph.js for agent orchestration
- OpenRouter for LLM access
- Fly.io for SSH proxy hosting
- shadcn/ui for components
- ANSI-to-HTML for terminal rendering
## Prerequisites
### Required Accounts
- Cloudflare account (free tier OK, needs Durable Objects enabled)
- Fly.io account (free tier OK)
- OpenRouter account + API key
### Required Software
- Node.js 20+
- pnpm (package manager)
- Wrangler CLI (Cloudflare)
- flyctl CLI (Fly.io)
- Git
## Installation
### 1. Clone Repository
```bash
git clone https://git.biohazardvfx.com/Nicholai/bandit-runner.git
cd bandit-runner
```

### 2. Install Dependencies

```bash
# Install pnpm if you don't have it
npm install -g pnpm

# Install all workspace dependencies
cd bandit-runner-app
pnpm install

cd ../ssh-proxy
npm install
```
### 3. Configure Environment

#### Cloudflare (Main Worker)

No `.env` needed - uses `wrangler.jsonc` vars.

#### Cloudflare (DO Worker)

```bash
cd bandit-runner-app/workers/bandit-agent-do
cp .env.example .env.local
# Add your OpenRouter API key
```

#### SSH Proxy (Fly.io)

No configuration needed - the OpenRouter key is passed via HTTP headers.
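Since the key is forwarded per request rather than stored on the proxy, here is a minimal sketch of how the DO worker might attach it when calling the proxy; the header name and body fields are assumptions, not the real contract:

```ts
// Hypothetical sketch: the Durable Object forwarding the OpenRouter key to the
// SSH proxy via an HTTP header. Header name and body shape are assumptions.
async function startAgentRun(env: { SSH_PROXY_URL: string; OPENROUTER_API_KEY: string }) {
  // The response body is a JSONL event stream (see Event Flow below)
  return fetch(`${env.SSH_PROXY_URL}/agent/run`, {
    method: "POST",
    headers: {
      "content-type": "application/json",
      "x-openrouter-key": env.OPENROUTER_API_KEY, // hypothetical header name
    },
    body: JSON.stringify({ model: "openai/gpt-4o-mini", targetLevel: 5 }), // assumed fields
  });
}
```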
### 4. Local Development

#### Option A: Run Frontend Only (with deployed backend)

```bash
cd bandit-runner-app
pnpm dev
# Open http://localhost:3000
```

#### Option B: Run Full Stack Locally

```bash
# Terminal 1: SSH Proxy
cd ssh-proxy
npm run dev
# Runs on http://localhost:3001

# Terminal 2: Frontend (with local proxy)
cd bandit-runner-app
# Update wrangler.jsonc SSH_PROXY_URL to http://localhost:3001
pnpm dev
```
## Deployment

### Step 1: Deploy SSH Proxy to Fly.io

```bash
cd ssh-proxy

# Login to Fly.io (opens browser)
flyctl auth login

# Deploy (first time creates app)
flyctl deploy

# Note the URL: https://bandit-ssh-proxy.fly.dev
```
### Step 2: Deploy Durable Object Worker

```bash
cd ../bandit-runner-app/workers/bandit-agent-do

# Set your OpenRouter API key
wrangler secret put OPENROUTER_API_KEY
# Paste your key when prompted

# Deploy the DO worker
wrangler deploy

# Verify deployment
wrangler tail
```
### Step 3: Deploy Main Application

```bash
cd ../..  # Back to bandit-runner-app root

# Verify SSH_PROXY_URL in wrangler.jsonc points to your Fly.io URL
# Should be: "SSH_PROXY_URL": "https://bandit-ssh-proxy.fly.dev"

# Build and deploy
pnpm run deploy

# Your app will be live at:
# https://bandit-runner-app.<your-subdomain>.workers.dev
```
### Step 4: Verify Deployment

```bash
# Test SSH proxy health
curl https://bandit-ssh-proxy.fly.dev/ssh/health
# Should return: {"status":"ok","activeConnections":0}

# Test main app
open https://bandit-runner-app.<your-subdomain>.workers.dev
```
## Usage

### Starting a Run

- Open the application in your browser
- Select Model: Click the model dropdown to choose from 100+ models
  - Search by name
  - Filter by provider (OpenAI, Anthropic, Google, etc.)
  - Filter by price range
  - Filter by context length
- Set Target Level: Choose the highest level the agent should attempt (default: 5)
- Select Output Mode:
  - Selective (default): Shows LLM reasoning + tool calls
  - Everything: Shows reasoning + tool calls + full command outputs
- Click START
### Monitoring Progress
The interface shows:
- Left Panel (Terminal): Full SSH session output with colors
- Right Panel (Agent Chat): LLM reasoning and discovered information
- Top Status Bar: Current status, level, and model info
- Control Panel: Pause/Resume/Stop controls
### Manual Mode (Debugging)
Toggle "Manual Mode" in the terminal to:
- Type commands directly into the SSH session
- Assist the agent when it gets stuck
- Debug connection issues
⚠️ Note: Runs with manual intervention are disqualified from leaderboards
## Features Explained

### Real-Time Terminal
- Full PTY session capture
- ANSI color support (green prompts, red errors, etc.); see the rendering sketch after this list
- Timestamps for all output
- Automatic scrolling
- Read-only by default (manual mode available)
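As an illustration of the rendering step, here is a small sketch assuming the `ansi-to-html` npm package; the plan only names "ANSI-to-HTML", so the exact library is an assumption:

```ts
// Sketch: converting captured PTY output into HTML for the terminal panel.
// Assumes the `ansi-to-html` package; the repo may use a different renderer.
import Convert from "ansi-to-html";

const convert = new Convert({ newline: true, escapeXML: true });

// A green prompt followed by a red error, as it arrives from the PTY stream
const raw = "\x1b[32mbandit0@bandit:~$\x1b[0m ls\r\n\x1b[31mPermission denied\x1b[0m";

// toHtml() escapes the text and wraps colored runs in styled <span> elements
const html = convert.toHtml(raw);
```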
### Agent Reasoning Display
- LLM "thinking" messages (what command to try next)
- Planning output (strategy for current level)
- Password discovery announcements
- Level completion summaries
### Model Selection
- 100+ models from OpenRouter
- Real-time search
- Filter by provider, price, context length
- Shows token costs
- Supports all major providers
### Level Progression
- Always starts at Level 0
- Automatic password extraction (see the regex sketch after this list)
- Automatic SSH re-login for the next level
- Configurable target level cap
- Retry logic for failed attempts
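A minimal sketch of what the password extraction step could look like; it assumes Bandit passwords are 32-character alphanumeric tokens, and the real logic in `ssh-proxy/agent.ts` may be stricter:

```ts
// Hypothetical helper: pull a Bandit-style password out of command output.
// Assumes a 32-character alphanumeric token; not the actual regex from agent.ts.
const PASSWORD_RE = /\b[A-Za-z0-9]{32}\b/;

export function extractPassword(output: string): string | null {
  const match = output.match(PASSWORD_RE);
  return match ? match[0] : null;
}

// e.g. extractPassword("The password is <32-char token>") returns the token,
// which the agent then uses to SSH into the next level.
```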
## Architecture Details

### Monorepo Structure

```
bandit-runner/
├── bandit-runner-app/        # Next.js frontend + Cloudflare Workers
│   ├── src/
│   │   ├── app/              # Next.js pages + API routes
│   │   ├── components/       # React components (shadcn/ui)
│   │   ├── hooks/            # React hooks (WebSocket)
│   │   └── lib/              # Utilities, agents, storage
│   ├── workers/
│   │   └── bandit-agent-do/  # Standalone Durable Object worker
│   ├── scripts/
│   │   └── patch-worker.js   # WebSocket intercept injector
│   └── wrangler.jsonc        # Main worker config
├── ssh-proxy/                # LangGraph agent on Fly.io
│   ├── agent.ts              # LangGraph state machine
│   ├── server.ts             # Express HTTP server
│   ├── Dockerfile            # Container definition
│   └── fly.toml              # Fly.io config
├── docs/                     # Documentation
└── public/                   # Assets
```
### WebSocket Intercept Pattern

The main worker intercepts WebSocket upgrade requests before they reach Next.js:

```ts
// Injected by scripts/patch-worker.js
if (request.headers.get('Upgrade') === 'websocket') {
  // Forward directly to DO, bypass Next.js
  return env.BANDIT_AGENT.get(id).fetch(request);
}
```
This is necessary because Next.js API routes don't support WebSocket upgrades.
### Durable Object Responsibilities
- Accept WebSocket connections from browsers
- Maintain run state (level, passwords, status)
- Call SSH proxy HTTP endpoints
- Stream JSONL events from SSH proxy to WebSocket clients
- Handle START/PAUSE/RESUME/STOP actions
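The following is only a rough sketch of that shape, assuming the Workers WebSocket API; class, method, and message names are illustrative, not taken from the actual DO worker:

```ts
// Illustrative sketch of the Durable Object's role; not the actual implementation.
export class BanditAgent {
  private sockets = new Set<WebSocket>();

  constructor(private ctx: DurableObjectState, private env: { SSH_PROXY_URL: string }) {}

  async fetch(request: Request): Promise<Response> {
    // Accept the WebSocket upgrade forwarded by the main worker
    const pair = new WebSocketPair();
    const [client, server] = [pair[0], pair[1]];
    server.accept();
    this.sockets.add(server);
    server.addEventListener("message", (e) => this.onAction(JSON.parse(e.data as string)));
    return new Response(null, { status: 101, webSocket: client });
  }

  private async onAction(msg: { action: "START" | "PAUSE" | "RESUME" | "STOP" }) {
    if (msg.action !== "START") return; // PAUSE/RESUME/STOP handling elided in this sketch
    // Call the SSH proxy and relay its JSONL event stream to every connected client
    const res = await fetch(`${this.env.SSH_PROXY_URL}/agent/run`, { method: "POST" });
    const reader = res.body!.pipeThrough(new TextDecoderStream()).getReader();
    let buffer = "";
    for (;;) {
      const { value, done } = await reader.read();
      if (done) break;
      buffer += value;
      let idx: number;
      while ((idx = buffer.indexOf("\n")) >= 0) {
        const line = buffer.slice(0, idx); // one JSONL event
        buffer = buffer.slice(idx + 1);
        for (const ws of this.sockets) ws.send(line);
      }
    }
  }
}
```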
### SSH Proxy Responsibilities
- Run LangGraph.js agent state machine
- Maintain SSH2 connections to Bandit server
- Execute commands via PTY
- Extract passwords using regex
- Stream events back to DO via HTTP response body (JSONL)
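A minimal sketch of the streaming side, assuming the Express server described above; the `/agent/run` route comes from the event flow below, while the event payloads are placeholders:

```ts
// Sketch of server.ts streaming JSONL events over the open HTTP response.
// Event fields are illustrative; only the /agent/run route comes from this doc.
import express from "express";

const app = express();
app.use(express.json());

app.post("/agent/run", async (_req, res) => {
  res.setHeader("Content-Type", "application/x-ndjson");
  const emit = (event: Record<string, unknown>) => res.write(JSON.stringify(event) + "\n");

  emit({ type: "status", status: "starting", level: 0 });
  // ...run the LangGraph agent here, calling emit() for terminal output,
  // reasoning messages, and discovered passwords...
  emit({ type: "status", status: "done" });
  res.end();
});

app.listen(3001); // same port used for local development above
```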
### Event Flow

- User clicks START in the browser
- Browser → WebSocket → Main Worker → DO Worker
- DO → HTTP POST `/agent/run` → SSH Proxy
- SSH Proxy → SSH connection → Bandit Server
- Agent executes commands, extracts passwords
- SSH Proxy → JSONL events → DO
- DO → WebSocket → Browser
- UI updates in real time
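For reference, the kinds of events implied by this flow might look like the following; the field names and the exact set of event types are assumptions, not the real schema:

```ts
// Hypothetical JSONL event shapes; the actual schema may differ.
type AgentEvent =
  | { type: "terminal"; data: string; timestamp: number }  // raw PTY output for the terminal panel
  | { type: "reasoning"; message: string }                  // LLM thinking / planning for the chat panel
  | { type: "password"; level: number; password: string }   // extracted credential
  | { type: "status"; status: "starting" | "running" | "paused" | "done" | "error"; level?: number };

// Each event travels as one line: JSON.stringify(event) + "\n"
```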
## Troubleshooting

### WebSocket Connection Failed

- Check the DO worker is deployed: `wrangler deployments list`
- Verify the DO binding in the main worker config
- Check the browser console for specific errors

### SSH Proxy Not Responding

- Check Fly.io status: `flyctl status`
- View logs: `flyctl logs`
- Test the health endpoint: `curl https://your-proxy.fly.dev/ssh/health`

### Agent Not Starting

- Verify the OpenRouter API key: `wrangler secret list`
- Check DO worker logs: `wrangler tail --name bandit-agent-do`
- Ensure `SSH_PROXY_URL` is correct in `wrangler.jsonc`

### Commands Not Executing

- Check SSH proxy logs: `flyctl logs`
- Verify the Bandit server is accessible: `ssh bandit0@bandit.labs.overthewire.org -p 2220`
- Review agent reasoning in the chat panel for errors
## Roadmap

### Completed ✅
- LangGraph autonomous agent framework
- Real-time WebSocket streaming
- Full terminal output with ANSI colors
- Agent reasoning display
- OpenRouter model selection with search/filters
- Level 0 → N progression
- Manual mode for debugging
- Password extraction and advancement
### In Progress 🚧
- Retry logic with exponential backoff
- Token usage and cost tracking UI
- Persistent run history (D1)
- Log storage (R2)
### Planned 📋
- Leaderboard (fastest completions)
- Multi-agent comparison mode
- Custom prompts and strategies
- Agent performance analytics
- Mock SSH server for testing
- Expanded CTF support (beyond Bandit)
## Contributing

Contributions welcome! Please:

- Fork the repository
- Create a feature branch (`git checkout -b feat/amazing-feature`)
- Commit with Conventional Commits (`feat:`, `fix:`, `docs:`)
- Push and open a Pull Request
See CONTRIBUTING.md for detailed guidelines.
## License
GNU GPLv3 - See COPYING.txt
## Contact
Nicholai Vogel
Website • LinkedIn
## Acknowledgments
- OverTheWire Bandit - CTF challenges
- LangGraph.js - Agent framework
- Cloudflare Workers - Edge compute
- Fly.io - Global app hosting
- OpenRouter - Unified LLM API
- shadcn/ui - Component library
- OpenNext - Next.js on Workers
## Key Changes
### What's Being Added
1. **Accurate architecture diagram** - Shows real components (DO, SSH Proxy, LangGraph)
2. **Complete prerequisite list** - All accounts, software, and setup requirements
3. **Monorepo structure docs** - Explains the workspace layout
4. **Step-by-step deployment** - 4 clear deployment phases
5. **Feature documentation** - What each UI component does
6. **Troubleshooting section** - Common issues and solutions
7. **Current roadmap** - Reflects actual completion status
### What's Being Removed
1. Mock architecture references (D1, R2 unimplemented features)
2. Incorrect usage instructions
3. Outdated screenshots/descriptions
4. Placeholder content from template
### What's Being Updated
1. Tech stack badges - Add LangGraph, Fly.io, OpenRouter
2. Built With section - Reflect actual dependencies
3. Installation steps - Real commands that work
4. Usage examples - Match actual UI
5. Contact links - Keep current
## Implementation Notes
- Keep existing badge links/formatting style
- Maintain table of contents structure
- Add new sections: Features, Architecture Details, Troubleshooting
- Update all code blocks with real, tested commands
- Add architecture ASCII diagram for clarity
- Include actual file paths and structure
- Reference real configuration files (wrangler.jsonc, etc.)
## Success Criteria
- ✅ README accurately describes working system
- ✅ User can deploy from scratch following instructions
- ✅ No references to unimplemented features
- ✅ Clear troubleshooting for common issues
- ✅ Architecture matches production code
- ✅ All commands are tested and accurate