# README Improvement Plan

## Problem

The current README is outdated and doesn't reflect the actual architecture or provide clear setup/deployment instructions. It references old concepts (D1, R2, mock setup) that aren't implemented, and doesn't explain the real LangGraph + SSH Proxy architecture that's currently working.

## Goals

1. **Clear Project Overview**: Explain what Bandit Runner actually does and why it matters
2. **Accurate Architecture**: Document the real system (Next.js → Cloudflare Workers → Durable Objects → SSH Proxy → Bandit Server)
3. **Step-by-Step Setup**: Complete local development setup instructions
4. **Deployment Guide**: Clear instructions for deploying all components
5. **Feature Documentation**: What the system actually does (model selection, real-time terminal, agent reasoning, etc.)
6. **Remove Outdated Content**: Clean up references to unimplemented features

## Current Issues

### Inaccurate Architecture Description

- README shows old architecture with D1/R2 that aren't implemented
- Doesn't mention LangGraph.js agent framework
- Doesn't explain SSH Proxy on Fly.io
- Missing Durable Object standalone worker details

### Missing Prerequisites

- No mention of OpenRouter API key requirement
- Missing Fly.io account for SSH proxy
- No Node.js version specification (needs 20+)
- Missing pnpm workspace setup

### Incomplete Setup Instructions

- Doesn't explain monorepo structure
- No `.env` file examples or required variables
- Missing SSH proxy deployment steps
- No explanation of DO worker vs main worker
- No Wrangler secrets configuration

### Vague Usage Section

- Doesn't explain the actual UI (control panel, terminal, chat)
- No screenshots or feature descriptions
- Missing model selection details
- No explanation of manual mode

### Outdated Roadmap

- Lists items as incomplete that are done
- Missing current features (WebSocket streaming, ANSI colors, etc.)

## New README Structure

```markdown
# Bandit Runner

## About

- What it is: Autonomous AI agent testing framework
- What it does: Solves OverTheWire Bandit CTF challenges
- Why it matters: Real-world benchmark for LLM autonomous capabilities

## Features

- 🤖 LangGraph.js autonomous agent with tool use
- 🔌 Real-time WebSocket streaming
- 🖥️ Full terminal output with ANSI colors
- 💬 Live agent reasoning/thinking display
- 🎯 OpenRouter model selection (100+ models)
- 🎨 Beautiful retro-terminal UI
- 🔒 SSH security boundaries (read-only Bandit server)
- 📊 Level progression tracking

## Architecture

### System Components

1. **Frontend** - Next.js 15 with App Router + shadcn/ui
2. **Backend** - Cloudflare Workers with OpenNext
3. **Coordinator** - Durable Objects (separate worker)
4. **Agent Runtime** - SSH Proxy on Fly.io (LangGraph.js)
5. **Target** - OverTheWire Bandit SSH server

### Data Flow

Browser → WebSocket → Main Worker → DO Worker → HTTP → SSH Proxy → LangGraph Agent → SSH → Bandit Server

### Tech Stack

- Next.js 15 + React 19
- Cloudflare Workers + Durable Objects
- LangGraph.js for agent orchestration
- OpenRouter for LLM access
- Fly.io for SSH proxy hosting
- shadcn/ui for components
- ANSI-to-HTML for terminal rendering

## Prerequisites

### Required Accounts

- Cloudflare account (free tier OK, needs Durable Objects enabled)
- Fly.io account (free tier OK)
- OpenRouter account + API key

### Required Software

- Node.js 20+
- pnpm (package manager)
- Wrangler CLI (Cloudflare)
- flyctl CLI (Fly.io)
- Git

## Installation

### 1. Clone Repository

```bash
git clone https://git.biohazardvfx.com/Nicholai/bandit-runner.git
cd bandit-runner
```

### 2. Install Dependencies

```bash
# Install pnpm if you don't have it
npm install -g pnpm

# Install all workspace dependencies
cd bandit-runner-app
pnpm install
cd ../ssh-proxy
npm install
```

### 3. Configure Environment

#### Cloudflare (Main Worker)

No `.env` needed - uses `wrangler.jsonc` vars

#### Cloudflare (DO Worker)

```bash
cd bandit-runner-app/workers/bandit-agent-do
cp .env.example .env.local
# Add your OpenRouter API key
```

#### SSH Proxy (Fly.io)

No configuration needed - OpenRouter key passed via HTTP headers

### 4. Local Development

#### Option A: Run Frontend Only (with deployed backend)

```bash
cd bandit-runner-app
pnpm dev
# Open http://localhost:3000
```

#### Option B: Run Full Stack Locally

```bash
# Terminal 1: SSH Proxy
cd ssh-proxy
npm run dev
# Runs on http://localhost:3001

# Terminal 2: Frontend (with local proxy)
cd bandit-runner-app
# Update wrangler.jsonc SSH_PROXY_URL to http://localhost:3001
pnpm dev
```

## Deployment

### Step 1: Deploy SSH Proxy to Fly.io

```bash
cd ssh-proxy

# Login to Fly.io (opens browser)
flyctl auth login

# Deploy (first time creates app)
flyctl deploy

# Note the URL: https://bandit-ssh-proxy.fly.dev
```

### Step 2: Deploy Durable Object Worker

```bash
cd ../bandit-runner-app/workers/bandit-agent-do

# Set your OpenRouter API key
wrangler secret put OPENROUTER_API_KEY
# Paste your key when prompted

# Deploy the DO worker
wrangler deploy

# Verify deployment
wrangler tail
```

### Step 3: Deploy Main Application

```bash
cd ../..  # Back to bandit-runner-app root

# Verify SSH_PROXY_URL in wrangler.jsonc points to your Fly.io URL
# Should be: "SSH_PROXY_URL": "https://bandit-ssh-proxy.fly.dev"

# Build and deploy
pnpm run deploy

# Your app will be live at:
# https://bandit-runner-app..workers.dev
```

### Step 4: Verify Deployment

```bash
# Test SSH Proxy health
curl https://bandit-ssh-proxy.fly.dev/ssh/health
# Should return: {"status":"ok","activeConnections":0}

# Test main app
open https://bandit-runner-app..workers.dev
```

## Usage

### Starting a Run

1. Open the application in your browser
2. **Select Model**: Click the model dropdown to choose from 100+ models
   - Search by name
   - Filter by provider (OpenAI, Anthropic, Google, etc.)
   - Filter by price range
   - Filter by context length
3. **Set Target Level**: Choose how far the agent should attempt (default: 5)
4. **Select Output Mode**:
   - **Selective** (default): Shows LLM reasoning + tool calls
   - **Everything**: Shows reasoning + tool calls + full command outputs
5. **Click START**

### Monitoring Progress

The interface shows:

- **Left Panel (Terminal)**: Full SSH session output with colors
- **Right Panel (Agent Chat)**: LLM reasoning and discovered information
- **Top Status Bar**: Current status, level, and model info
- **Control Panel**: Pause/Resume/Stop controls

### Manual Mode (Debugging)

Toggle "Manual Mode" in the terminal to:

- Type commands directly into the SSH session
- Assist the agent when it gets stuck
- Debug connection issues

⚠️ **Note**: Runs with manual intervention are disqualified from leaderboards

## Features Explained

### Real-Time Terminal

- Full PTY session capture
- ANSI color support (green prompts, red errors, etc.)
- Timestamps for all output
- Automatic scrolling
- Read-only by default (manual mode available)

### Agent Reasoning Display

- LLM "thinking" messages (what command to try next)
- Planning output (strategy for current level)
- Password discovery announcements
- Level completion summaries

### Model Selection

- 100+ models from OpenRouter
- Real-time search
- Filter by provider, price, context length
- Shows token costs
- Supports all major providers

### Level Progression

- Always starts at Level 0
- Automatic password extraction
- Automatic SSH re-login for next level
- Configurable target level cap
- Retry logic for failed attempts

## Architecture Details

### Monorepo Structure

```
bandit-runner/
├── bandit-runner-app/          # Next.js frontend + Cloudflare Workers
│   ├── src/
│   │   ├── app/                # Next.js pages + API routes
│   │   ├── components/         # React components (shadcn/ui)
│   │   ├── hooks/              # React hooks (WebSocket)
│   │   └── lib/                # Utilities, agents, storage
│   ├── workers/
│   │   └── bandit-agent-do/    # Standalone Durable Object worker
│   ├── scripts/
│   │   └── patch-worker.js     # WebSocket intercept injector
│   └── wrangler.jsonc          # Main worker config
├── ssh-proxy/                  # LangGraph agent on Fly.io
│   ├── agent.ts                # LangGraph state machine
│   ├── server.ts               # Express HTTP server
│   ├── Dockerfile              # Container definition
│   └── fly.toml                # Fly.io config
├── docs/                       # Documentation
└── public/                     # Assets
```

### WebSocket Intercept Pattern

The main worker intercepts WebSocket upgrade requests **before** they reach Next.js:

```javascript
// Injected by scripts/patch-worker.js
// Assumes `id` (the Durable Object ID) was resolved earlier in the handler
if (request.headers.get('Upgrade') === 'websocket') {
  // Forward directly to DO, bypass Next.js
  return env.BANDIT_AGENT.get(id).fetch(request);
}
```

This is necessary because Next.js API routes don't support WebSocket upgrades.
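
The intercept above compares the `Upgrade` header to the exact string `websocket`. The header value is case-insensitive and can list more than one protocol, so a slightly more tolerant check may be worth considering. The following is only a sketch: `HeaderSource`, `isWebSocketUpgrade`, and `fakeRequest` are hypothetical names used for illustration and are not taken from the project's code.

```typescript
// Minimal structural type so the sketch runs without Cloudflare's type packages.
interface HeaderSource {
  headers: { get(name: string): string | null };
}

// Returns true when the request asks for a WebSocket upgrade.
// The Upgrade value is compared case-insensitively, token by token,
// because the header may carry a comma-separated list of protocols.
function isWebSocketUpgrade(request: HeaderSource): boolean {
  const upgrade = request.headers.get("Upgrade");
  if (upgrade === null) return false;
  return upgrade
    .split(",")
    .map((token) => token.trim().toLowerCase())
    .includes("websocket");
}

// Hypothetical stand-in for a real Request, for illustration only.
function fakeRequest(upgrade: string | null): HeaderSource {
  return {
    headers: {
      get: (name) => (name.toLowerCase() === "upgrade" ? upgrade : null),
    },
  };
}

console.log(isWebSocketUpgrade(fakeRequest("WebSocket"))); // true
console.log(isWebSocketUpgrade(fakeRequest(null))); // false
```

A real `Request` object satisfies `HeaderSource` structurally, so the same function could be called from the worker's fetch handler if the project chooses to adopt it.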

### Durable Object Responsibilities

- Accept WebSocket connections from browsers
- Maintain run state (level, passwords, status)
- Call SSH proxy HTTP endpoints
- Stream JSONL events from SSH proxy to WebSocket clients
- Handle START/PAUSE/RESUME/STOP actions

### SSH Proxy Responsibilities

- Run LangGraph.js agent state machine
- Maintain SSH2 connections to Bandit server
- Execute commands via PTY
- Extract passwords using regex
- Stream events back to DO via HTTP response body (JSONL)

### Event Flow

1. User clicks START in browser
2. Browser → WebSocket → Main Worker → DO Worker
3. DO → HTTP POST `/agent/run` → SSH Proxy
4. SSH Proxy → SSH Connection → Bandit Server
5. Agent executes commands, extracts passwords
6. SSH Proxy → JSONL events → DO
7. DO → WebSocket → Browser
8. UI updates in real-time

## Troubleshooting

### WebSocket Connection Failed

- Check DO worker is deployed: `wrangler deployments list`
- Verify DO binding in main worker config
- Check browser console for specific errors

### SSH Proxy Not Responding

- Check Fly.io status: `flyctl status`
- View logs: `flyctl logs`
- Test health endpoint: `curl https://your-proxy.fly.dev/ssh/health`

### Agent Not Starting

- Verify OpenRouter API key: `wrangler secret list`
- Check DO worker logs: `wrangler tail --name bandit-agent-do`
- Ensure SSH_PROXY_URL is correct in wrangler.jsonc

### Commands Not Executing

- Check SSH proxy logs: `flyctl logs`
- Verify Bandit server is accessible: `ssh bandit0@bandit.labs.overthewire.org -p 2220`
- Review agent reasoning in chat panel for errors

## Roadmap

### Completed ✅

- [x] LangGraph autonomous agent framework
- [x] Real-time WebSocket streaming
- [x] Full terminal output with ANSI colors
- [x] Agent reasoning display
- [x] OpenRouter model selection with search/filters
- [x] Level 0 → N progression
- [x] Manual mode for debugging
- [x] Password extraction and advancement

### In Progress 🚧

- [ ] Retry logic with exponential backoff
- [ ] Token usage and cost tracking UI
- [ ] Persistent run history (D1)
- [ ] Log storage (R2)

### Planned 📋

- [ ] Leaderboard (fastest completions)
- [ ] Multi-agent comparison mode
- [ ] Custom prompts and strategies
- [ ] Agent performance analytics
- [ ] Mock SSH server for testing
- [ ] Expanded CTF support (beyond Bandit)

## Contributing

Contributions welcome! Please:

1. Fork the repository
2. Create a feature branch (`git checkout -b feat/amazing-feature`)
3. Commit with Conventional Commits (`feat:`, `fix:`, `docs:`)
4. Push and open a Pull Request

See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.

## License

GNU GPLv3 - See [COPYING.txt](COPYING.txt)

## Contact

**Nicholai Vogel**

[Website](https://nicholai.work) • [LinkedIn](https://linkedin.com/in/nicholai-vogel)

## Acknowledgments

- [OverTheWire Bandit](https://overthewire.org/wargames/bandit/) - CTF challenges
- [LangGraph.js](https://langchain-ai.github.io/langgraphjs/) - Agent framework
- [Cloudflare Workers](https://developers.cloudflare.com/workers/) - Edge compute
- [Fly.io](https://fly.io) - Global app hosting
- [OpenRouter](https://openrouter.ai) - Unified LLM API
- [shadcn/ui](https://ui.shadcn.com) - Component library
- [OpenNext](https://opennext.js.org/) - Next.js on Workers
```

## Key Changes

### What's Being Added

1. **Accurate architecture diagram** - Shows real components (DO, SSH Proxy, LangGraph)
2. **Complete prerequisite list** - All accounts, software, and setup requirements
3. **Monorepo structure docs** - Explains the workspace layout
4. **Step-by-step deployment** - 4 clear deployment phases
5. **Feature documentation** - What each UI component does
6. **Troubleshooting section** - Common issues and solutions
7. **Current roadmap** - Reflects actual completion status

### What's Being Removed

1. Mock architecture references (D1 and R2 described as if they were implemented)
2. Incorrect usage instructions
3. Outdated screenshots/descriptions
4. Placeholder content from template

### What's Being Updated

1. Tech stack badges - Add LangGraph, Fly.io, OpenRouter
2. Built With section - Reflect actual dependencies
3. Installation steps - Real commands that work
4. Usage examples - Match actual UI
5. Contact links - Keep current

## Implementation Notes

- Keep existing badge links/formatting style
- Maintain table of contents structure
- Add new sections: Features, Architecture Details, Troubleshooting
- Update all code blocks with real, tested commands
- Add architecture ASCII diagram for clarity
- Include actual file paths and structure
- Reference real configuration files (wrangler.jsonc, etc.)

## Success Criteria

- ✅ README accurately describes working system
- ✅ User can deploy from scratch following instructions
- ✅ No references to unimplemented features outside the Roadmap
- ✅ Clear troubleshooting for common issues
- ✅ Architecture matches production code
- ✅ All commands are tested and accurate
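
To help check the "architecture matches production code" criterion, the JSONL event stream described in the proposed Architecture Details section (SSH proxy → DO → WebSocket) could be illustrated with a small sketch like the one below. It is an assumption-laden example, not the project's implementation: `AgentEvent`, `JsonlBuffer`, and the sample event fields are hypothetical, and the real event shape emitted by the SSH proxy may differ.

```typescript
// Hypothetical event shape; the real fields emitted by the SSH proxy may differ.
interface AgentEvent {
  type: string;
  [key: string]: unknown;
}

// Buffers partial chunks and emits one parsed event per complete line.
class JsonlBuffer {
  private pending = "";

  // Feed a decoded text chunk; returns the events completed by this chunk.
  push(chunk: string): AgentEvent[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    // The last element is an incomplete line ("" if the chunk ended on a newline).
    this.pending = lines.pop() ?? "";
    const events: AgentEvent[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue; // skip blank lines
      events.push(JSON.parse(trimmed) as AgentEvent);
    }
    return events;
  }
}

// A JSON object split across two network chunks is only emitted once complete.
const jsonl = new JsonlBuffer();
const first = jsonl.push('{"type":"thinking"}\n{"type":"ter');
const second = jsonl.push('minal","data":"ls"}\n');
console.log(first.length, second[0].type); // 1 terminal
```

The point of the buffer is that HTTP response chunks do not align with line boundaries, so a consumer has to hold back the trailing partial line before parsing.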