[![Contributors][contributors-shield]][contributors-url] [![Forks][forks-shield]][forks-url] [![Stargazers][stars-shield]][stars-url] [![Issues][issues-shield]][issues-url] [![GPLv3 License][license-shield]][license-url] [![Conventional Commits][conventional-commits-badge]](https://conventionalcommits.org) [![LinkedIn][linkedin-shield]][linkedin-url]

# Bandit Runner

A deterministic AI testing rig for LLMs-as-agents, built on Next.js, OpenNext, and Cloudflare Workers.
Explore the docs »

View Demo · Report Bug · Request Feature

---
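## Table of Contents

1. [About The Project](#about-the-project)
2. [Getting Started](#getting-started)
3. [Deployment](#deployment)
4. [Usage](#usage)
5. [Architecture](#architecture)
6. [Troubleshooting](#troubleshooting)
7. [Roadmap](#roadmap)
8. [Contributing](#contributing)
9. [License](#license)
10. [Contact](#contact)
11. [Acknowledgments](#acknowledgments)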

---

## About The Project

[![Product Screenshot][product-screenshot]](#)

**Bandit Runner** is an autonomous AI agent testing framework that evaluates large language models by having them solve the **OverTheWire Bandit** wargame challenges via SSH.

**Why it matters**

- Provides a real-world, hands-on benchmark for autonomous reasoning and command execution
- Tests tool use (SSH), planning, error handling, and persistence under real network conditions
- Generates reproducible logs for research and public leaderboards
- Demonstrates LLM capabilities in a sandboxed, deterministic environment

### Features

- 🤖 **LangGraph.js Autonomous Agent** - State machine with tool use and planning
- 🔌 **Real-Time WebSocket Streaming** - Live updates with <50ms latency
- 🖥️ **Full Terminal Output** - Complete SSH session with ANSI colors and formatting
- 💬 **Agent Reasoning Display** - See the LLM's thinking process and command selection
- 🎯 **OpenRouter Integration** - Access 100+ models with search and filtering
- 🎨 **Beautiful Retro UI** - Terminal-inspired interface with shadcn/ui components
- 🔒 **Security Boundaries** - Sandboxed SSH access to read-only Bandit server
- 📊 **Level Progression** - Automatic password extraction and advancement
- 🛠️ **Manual Mode** - Optional human intervention for debugging


---

### Built With

* [![Next.js][Next.js]][Next-url] - Frontend framework (v15 with App Router)
* [![React][React.js]][React-url] - UI library (v19)
* [![Cloudflare][Cloudflare-badge]][Cloudflare-url] - Edge compute + Durable Objects
* [![OpenNext][OpenNext-badge]][OpenNext-url] - Next.js adapter for Cloudflare Workers
* [![LangGraph][LangGraph-badge]][LangGraph-url] - Agent orchestration framework
* [![Shadcn/UI][Shadcn-badge]][Shadcn-url] - Component library
* [![TypeScript][TypeScript-badge]][TypeScript-url] - Type safety
* [![pnpm][pnpm-badge]][pnpm-url] - Package manager


---

## Getting Started

### Prerequisites

#### Required Accounts

* **Cloudflare** account (free tier works, needs Durable Objects enabled)
* **Fly.io** account (free tier works)
* **OpenRouter** account + API key ([get one here](https://openrouter.ai))

#### Required Software

* **Node.js ≥ 20**
* **pnpm** package manager

  ```bash
  npm install -g pnpm
  ```

* **Wrangler CLI** (Cloudflare)

  ```bash
  npm install -g wrangler
  ```

* **flyctl CLI** (Fly.io)

  ```bash
  curl -L https://fly.io/install.sh | sh
  ```

### Installation

#### 1. Clone Repository

```bash
git clone https://git.biohazardvfx.com/Nicholai/bandit-runner.git
cd bandit-runner
```

#### 2. Install Dependencies

```bash
# Frontend + Cloudflare Workers
cd bandit-runner-app
pnpm install

# SSH Proxy (LangGraph agent)
cd ../ssh-proxy
npm install
```

#### 3. Configure Environment

**Cloudflare DO Worker:**

```bash
# From the repository root
cd bandit-runner-app/workers/bandit-agent-do

# Create .env.local with your OpenRouter API key
echo "OPENROUTER_API_KEY=your_key_here" > .env.local
```

**Main Worker:** No `.env` needed; configuration is in `wrangler.jsonc`.

#### 4. Local Development

**Option A: Frontend Only** (requires deployed backend)

```bash
cd bandit-runner-app
pnpm dev
# Open http://localhost:3000
```

**Option B: Full Stack** (all components local)

```bash
# Terminal 1: SSH Proxy
cd ssh-proxy
npm run dev  # Runs on http://localhost:3001

# Terminal 2: Main App
cd bandit-runner-app
# Update wrangler.jsonc: "SSH_PROXY_URL": "http://localhost:3001"
pnpm dev
```


---

## Deployment

### Step 1: Deploy SSH Proxy to Fly.io

```bash
cd ssh-proxy

# Login (opens browser)
flyctl auth login

# Deploy (creates app on first run)
flyctl deploy

# Note your URL: https://bandit-ssh-proxy.fly.dev
```

### Step 2: Deploy Durable Object Worker

```bash
cd ../bandit-runner-app/workers/bandit-agent-do

# Set OpenRouter API key as secret
wrangler secret put OPENROUTER_API_KEY
# Paste your key when prompted

# Deploy
wrangler deploy

# Verify (optional)
wrangler tail
```

### Step 3: Deploy Main Application

```bash
cd ../..  # Back to bandit-runner-app root

# Verify wrangler.jsonc has correct SSH_PROXY_URL
# Should be: "SSH_PROXY_URL": "https://bandit-ssh-proxy.fly.dev"

# Build and deploy
pnpm run deploy

# Your app is live at:
# https://bandit-runner-app.<your-subdomain>.workers.dev
```

### Step 4: Verify Deployment

```bash
# Test SSH Proxy
curl https://bandit-ssh-proxy.fly.dev/ssh/health
# Should return: {"status":"ok","activeConnections":0}

# Open your app (macOS; use xdg-open on Linux)
open https://bandit-runner-app.<your-subdomain>.workers.dev
```
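If you script the Step 4 check (for example in CI), it helps to validate the response body rather than only the HTTP status. The sketch below is a minimal TypeScript type guard for the shape shown above, `{"status":"ok","activeConnections":0}`. The `ProxyHealth` interface and `parseHealth` helper are hypothetical names, not part of the repository, and the real proxy may return additional fields (which this guard tolerates).

```typescript
// Hypothetical shape of the SSH proxy health response, inferred from the
// example output: {"status":"ok","activeConnections":0}
interface ProxyHealth {
  status: "ok";
  activeConnections: number;
}

// Narrow an unknown parsed JSON value to ProxyHealth; extra fields are ignored.
function isProxyHealth(value: unknown): value is ProxyHealth {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    v.status === "ok" &&
    typeof v.activeConnections === "number" &&
    Number.isInteger(v.activeConnections) &&
    v.activeConnections >= 0
  );
}

// Parse a raw response body; returns null for malformed JSON or an unexpected shape.
function parseHealth(body: string): ProxyHealth | null {
  try {
    const parsed: unknown = JSON.parse(body);
    return isProxyHealth(parsed) ? parsed : null;
  } catch {
    return null;
  }
}
```

Against the live proxy, you would pass the body text of `https://bandit-ssh-proxy.fly.dev/ssh/health` (for example `await res.text()` after a `fetch`) into `parseHealth` and fail the check when it returns `null`.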


---

## Usage

### Starting a Run

1. **Open the application** in your browser
2. **Select Model**: Click the dropdown to choose from 100+ models
   - Search by name
   - Filter by provider (OpenAI, Anthropic, Google, Meta, etc.)
   - Filter by price range
   - Filter by context length
3. **Set Target Level**: Choose how far the agent should progress (default: 5)
4. **Select Output Mode**:
   - **Selective** (default): LLM reasoning + tool calls
   - **Everything**: Reasoning + tool calls + full command outputs
5. **Click START**

### Monitoring Progress

The interface provides real-time feedback:

- **Left Panel (Terminal)**: Full SSH session with ANSI colors and formatting
- **Right Panel (Agent Chat)**: LLM reasoning, planning, and discoveries
- **Top Bar**: Current status, level, and active model
- **Control Panel**: Pause/Resume/Stop controls

### Manual Mode (Debugging)

Toggle **"Manual Mode"** in the terminal footer to:

- Type commands directly into the SSH session
- Assist the agent when stuck
- Debug connection or logic issues

⚠️ **Note**: Manual intervention disqualifies runs from leaderboards.
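The two output modes differ only in whether full command output is shown alongside the reasoning and tool calls. A minimal sketch of that filtering, assuming a simplified event type with hypothetical `thinking`, `tool_call`, `terminal_output`, and `status` kinds; the project's real JSONL event schema may differ, and the filter could equally live in the Durable Object or in the browser.

```typescript
type OutputMode = "selective" | "everything";

// Hypothetical event kinds for illustration; the real JSONL schema may differ.
type AgentEventType = "thinking" | "tool_call" | "terminal_output" | "status";

interface AgentEvent {
  type: AgentEventType;
  payload: string;
}

// Selective mode drops full command output; Everything forwards all events.
function shouldForward(event: AgentEvent, mode: OutputMode): boolean {
  if (event.type === "terminal_output") {
    return mode === "everything";
  }
  // Reasoning, tool calls, and status updates are shown in both modes.
  return true;
}

function filterForChat(events: AgentEvent[], mode: OutputMode): AgentEvent[] {
  return events.filter((event) => shouldForward(event, mode));
}
```

Keeping the rule in a single predicate means any new event kind is shown in both modes by default, which errs on the side of visibility.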


---

## Architecture

### System Overview

```text
┌─────────────┐
│   Browser   │
└──────┬──────┘
       │ WebSocket (wss://)
       ↓
┌────────────────────────────────────┐
│  Cloudflare Worker (Main)          │  Next.js + OpenNext
│  bandit-runner-app                 │  Intercepts WebSocket upgrades
└──────┬─────────────────────────────┘
       │
       ↓
┌────────────────────────────────────┐
│  Durable Object Worker             │  WebSocket connection manager
│  bandit-agent-do                   │  Run state coordinator
└──────┬─────────────────────────────┘
       │ HTTP (JSONL streaming)
       ↓
┌────────────────────────────────────┐
│  SSH Proxy (Fly.io)                │  LangGraph.js autonomous agent
│  bandit-ssh-proxy.fly.dev          │  SSH2 client
└──────┬─────────────────────────────┘
       │ SSH Protocol
       ↓
┌────────────────────────────────────┐
│  Bandit SSH Server                 │  CTF challenges
│  bandit.labs.overthewire.org:2220  │  Real command execution
└────────────────────────────────────┘
```

### Component Responsibilities

**Frontend (Next.js 15)**

- React UI with shadcn/ui components
- Real-time terminal with ANSI rendering
- Agent chat display
- Model selection and configuration

**Main Worker (Cloudflare)**

- Serves the Next.js application
- Intercepts WebSocket upgrades before Next.js routing
- Forwards WebSocket connections to the Durable Object

**Durable Object (Separate Worker)**

- Manages WebSocket connections from browsers
- Maintains run state (level, status, passwords)
- Calls SSH proxy HTTP endpoints
- Streams JSONL events to connected clients

**SSH Proxy (Fly.io)**

- Runs the LangGraph.js agent state machine
- Maintains SSH2 connections to the Bandit server
- Executes commands via PTY (full terminal capture)
- Extracts passwords using regex patterns
- Streams events back to the DO via the HTTP response

### Data Flow

1. User clicks START in browser
2. Browser establishes WebSocket to Main Worker
3. Worker forwards to Durable Object
4. DO sends HTTP POST to SSH Proxy `/agent/run`
5. SSH Proxy runs LangGraph agent
6. Agent connects via SSH to Bandit server
7. Commands executed, passwords extracted
8. Events stream: SSH Proxy → DO → WebSocket → Browser
9. UI updates in real-time

### Monorepo Structure

```
bandit-runner/
├── bandit-runner-app/          # Frontend + Cloudflare Workers
│   ├── src/
│   │   ├── app/                # Next.js pages + API routes
│   │   ├── components/         # React components
│   │   ├── hooks/              # useAgentWebSocket
│   │   └── lib/                # Utilities, types
│   ├── workers/
│   │   └── bandit-agent-do/    # Standalone Durable Object
│   ├── scripts/
│   │   └── patch-worker.js     # Injects WebSocket intercept
│   └── wrangler.jsonc          # Main worker config
├── ssh-proxy/                  # LangGraph agent runtime
│   ├── agent.ts                # State machine definition
│   ├── server.ts               # Express HTTP server
│   ├── Dockerfile              # Container image
│   └── fly.toml                # Fly.io deployment config
└── docs/                       # Documentation
```
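Step 8 of the data flow relies on the proxy writing one JSON event per line to a streaming HTTP response. Network chunks do not respect line boundaries, so the consumer has to buffer partial lines between reads. The sketch below is a minimal, hypothetical `JsonlDecoder` in TypeScript; it is not the Durable Object's actual code, and the event fields used with it are assumptions.

```typescript
// Incremental JSONL (newline-delimited JSON) decoder. A chunk may end in the
// middle of a line, so any incomplete trailing text is kept until the next push.
class JsonlDecoder<T = unknown> {
  private buffer = "";

  // Feed one text chunk; returns every JSON value whose line it completes.
  push(chunk: string): T[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // Last element is "" (chunk ended on a newline) or a partial line to keep.
    this.buffer = lines.pop() ?? "";
    return lines
      .map((line) => line.trim()) // also drops a trailing "\r"
      .filter((line) => line.length > 0)
      .map((line) => JSON.parse(line) as T);
  }

  // Flush a final line that was not newline-terminated when the stream closed.
  end(): T[] {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest.length > 0 ? [JSON.parse(rest) as T] : [];
  }
}
```

In a Worker, one plausible wiring is to read `response.body` with `getReader()`, turn each chunk into text with a `TextDecoder` via `decode(value, { stream: true })`, call `push()` on the result, and forward each decoded event to connected sockets with `ws.send(JSON.stringify(event))`.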

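The SSH proxy is described as extracting passwords with regex patterns, but the exact patterns are not documented here. The following is only a sketch under two assumptions: Bandit passwords are 32-character alphanumeric strings, and a password usually appears alone on a line after a command such as `cat`. Matching whole lines, after stripping ANSI escape sequences from the PTY output, avoids false positives such as `md5sum` output, which is also 32 characters but is followed by a filename. `extractPassword` is a hypothetical helper, not the proxy's implementation.

```typescript
// Strip ANSI CSI escape sequences (colors, cursor movement) from PTY output.
const ANSI_CSI = /\x1b\[[0-9;?]*[A-Za-z]/g;

// Assumption: a Bandit password is exactly 32 alphanumeric characters on its own line.
const PASSWORD_LINE = /^[A-Za-z0-9]{32}$/;

function extractPassword(ptyOutput: string): string | null {
  const lines = ptyOutput
    .replace(ANSI_CSI, "")
    .split(/\r?\n/)
    .map((line) => line.trim());
  // Scan from the end: the most recent candidate is the most likely answer,
  // since earlier lines in the buffer may hold stale output.
  for (let i = lines.length - 1; i >= 0; i--) {
    const line = lines[i];
    if (line !== undefined && PASSWORD_LINE.test(line)) return line;
  }
  return null;
}
```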

---

## Troubleshooting

### WebSocket Connection Failed

**Symptoms**: "WebSocket connection failed" or "NS_ERROR_WEBSOCKET_CONNECTION_REFUSED"

**Solutions**:

```bash
# Check DO worker is deployed (run from workers/bandit-agent-do)
wrangler deployments list

# Check browser console for specific errors
# Verify DO binding in wrangler.jsonc

# Test DO directly
curl https://bandit-runner-app.<your-subdomain>.workers.dev/api/agent/test-run/status
```

### SSH Proxy Not Responding

**Symptoms**: Agent starts but no terminal output appears

**Solutions**:

```bash
# Check Fly.io app status (run from ssh-proxy so flyctl finds fly.toml)
flyctl status

# View real-time logs
flyctl logs

# Test health endpoint
curl https://bandit-ssh-proxy.fly.dev/ssh/health
# Should return: {"status":"ok","activeConnections":0}

# Restart if needed
flyctl restart
```

### Agent Not Starting

**Symptoms**: Run status stays "IDLE" or errors appear in the console

**Solutions**:

```bash
# Verify OpenRouter API key is set (run from workers/bandit-agent-do)
wrangler secret list

# Check DO worker logs
wrangler tail bandit-agent-do

# Ensure SSH_PROXY_URL is correct in wrangler.jsonc
# Should be: https://bandit-ssh-proxy.fly.dev (not http://)
```

### Commands Not Executing

**Symptoms**: Agent thinks but no SSH commands run

**Solutions**:

```bash
# Check SSH proxy logs
flyctl logs

# Verify Bandit server is accessible
ssh bandit0@bandit.labs.overthewire.org -p 2220
# Password: bandit0

# Review agent reasoning in chat panel
# Look for error messages or failed attempts
```

### Build or Deploy Errors

**Common issues**:

```bash
# "Cannot find module" during build (run from the repository root)
(cd bandit-runner-app && pnpm install)
(cd ssh-proxy && npm install)

# "Account ID not found"
# Add account_id to workers/bandit-agent-do/wrangler.toml

# "Script not found: bandit-agent-do"
# Deploy DO worker first (from bandit-runner-app):
cd workers/bandit-agent-do && wrangler deploy
```
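The "Agent Not Starting" checklist says `SSH_PROXY_URL` must use `https://`, while the full-stack local setup points it at `http://localhost:3001`. A small pre-deploy check can encode both rules. `checkProxyUrl` below is a hypothetical helper (not part of the repository) that returns an error message, or `null` when the value looks acceptable.

```typescript
// Plain HTTP is only expected for a proxy on this machine (e.g. `npm run dev` on :3001).
const LOCAL_HTTP = /^http:\/\/(localhost|127\.0\.0\.1)(:\d+)?(\/.*)?$/;

// Returns an error message, or null when the URL looks usable.
function checkProxyUrl(raw: string): string | null {
  const url = raw.trim();
  // Deployed proxy, e.g. https://bandit-ssh-proxy.fly.dev (requires a host part).
  if (/^https:\/\/[^\s/]+/.test(url)) return null;
  // Local full-stack development.
  if (LOCAL_HTTP.test(url)) return null;
  if (url.startsWith("http://")) {
    return "Use https:// for a deployed SSH proxy; http:// is only expected for localhost";
  }
  return "SSH_PROXY_URL must start with https:// (or http://localhost for local development)";
}
```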


---

## Roadmap

### Completed ✅

* [x] LangGraph.js autonomous agent framework
* [x] Real-time WebSocket streaming (JSONL events)
* [x] Full terminal output with ANSI colors
* [x] Agent reasoning and thinking display
* [x] OpenRouter model selection with search/filters
* [x] Level 0 → N automatic progression
* [x] Password extraction and advancement
* [x] Manual mode for debugging
* [x] Durable Object state management

### In Progress 🚧

* [ ] Retry logic with exponential backoff
* [ ] Token usage and cost tracking UI
* [ ] Persistent run history (D1 database)
* [ ] Log storage and analysis (R2 bucket)

### Planned 📋

* [ ] Public leaderboard (fastest completions)
* [ ] Multi-agent comparison mode
* [ ] Custom system prompts and strategies
* [ ] Agent performance analytics dashboard
* [ ] Mock SSH server for testing
* [ ] Expanded CTF support (Natas, Krypton, etc.)
* [ ] Run replay and debugging tools

See the [open issues](https://git.biohazardvfx.com/Nicholai/bandit-runner/issues) for discussion and feature requests.


---

## Contributing

Contributions are welcome.

1. Fork the Project
2. Create your Feature Branch (`git checkout -b feat/amazing`)
3. Commit your changes with `pnpm commit`, following Conventional Commits
4. Push (`git push origin feat/amazing`)
5. Open a Pull Request

### Top Contributors

[Contributors][contributors-url]


---

## License

Distributed under the **GNU GPLv3** License. See `LICENSE` for details.


---

## Contact

**Nicholai Vogel**

[Website](https://nicholai.work) • [LinkedIn](https://linkedin.com/in/nicholai-vogel) • [Instagram](https://instagram.com/nicholai.exe)

Project Link: [https://git.biohazardvfx.com/Nicholai/bandit-runner](https://git.biohazardvfx.com/Nicholai/bandit-runner)


---

## Acknowledgments

* [OverTheWire Bandit](https://overthewire.org/wargames/bandit/) - CTF wargame challenges
* [LangGraph.js](https://langchain-ai.github.io/langgraphjs/) - Agent orchestration framework
* [Cloudflare Workers](https://developers.cloudflare.com/workers/) - Edge compute platform
* [Fly.io](https://fly.io) - Global application hosting
* [OpenRouter](https://openrouter.ai) - Unified LLM API access
* [OpenNext](https://opennext.js.org/) - Next.js adapter for Cloudflare
* [shadcn/ui](https://ui.shadcn.com) - Beautiful component library
* [ANSI-to-HTML](https://github.com/rburns/ansi-to-html) - Terminal color rendering


---

[contributors-shield]: https://img.shields.io/github/contributors/Nicholai/bandit-runner.svg?style=for-the-badge
[contributors-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/graphs/contributors
[forks-shield]: https://img.shields.io/github/forks/Nicholai/bandit-runner.svg?style=for-the-badge
[forks-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/network/members
[stars-shield]: https://img.shields.io/github/stars/Nicholai/bandit-runner.svg?style=for-the-badge
[stars-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/stargazers
[issues-shield]: https://img.shields.io/github/issues/Nicholai/bandit-runner.svg?style=for-the-badge
[issues-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/issues
[license-shield]: https://img.shields.io/github/license/Nicholai/bandit-runner.svg?style=for-the-badge
[license-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/blob/main/COPYING.txt
[linkedin-shield]: https://img.shields.io/badge/-LinkedIn-black.svg?style=for-the-badge&logo=linkedin&colorB=555
[linkedin-url]: https://linkedin.com/in/nicholai-vogel
[product-screenshot]: public/screenshot.png
[Next.js]: https://img.shields.io/badge/Next.js-000000?style=for-the-badge&logo=nextdotjs&logoColor=white
[Next-url]: https://nextjs.org/
[React.js]: https://img.shields.io/badge/React-20232A?style=for-the-badge&logo=react&logoColor=61DAFB
[React-url]: https://react.dev/
[Cloudflare-badge]: https://img.shields.io/badge/Cloudflare%20Workers-F38020?style=for-the-badge&logo=cloudflare&logoColor=white
[Cloudflare-url]: https://developers.cloudflare.com/workers/
[OpenNext-badge]: https://img.shields.io/badge/OpenNext-18181B?style=for-the-badge&logo=vercel&logoColor=white
[OpenNext-url]: https://opennext.js.org/
[LangGraph-badge]: https://img.shields.io/badge/LangGraph-1C3C3C?style=for-the-badge&logo=langchain&logoColor=white
[LangGraph-url]: https://langchain-ai.github.io/langgraphjs/
[Shadcn-badge]: https://img.shields.io/badge/Shadcn%2FUI-000000?style=for-the-badge&logo=react&logoColor=white
[Shadcn-url]: https://ui.shadcn.com
[TypeScript-badge]: https://img.shields.io/badge/TypeScript-3178C6?style=for-the-badge&logo=typescript&logoColor=white
[TypeScript-url]: https://www.typescriptlang.org/
[pnpm-badge]: https://img.shields.io/badge/pnpm-F69220?style=for-the-badge&logo=pnpm&logoColor=white
[pnpm-url]: https://pnpm.io
[conventional-commits-badge]: https://img.shields.io/badge/Conventional%20Commits-1.0.0-%23FE5196?style=for-the-badge&logo=conventionalcommits&logoColor=white