[![Contributors][contributors-shield]][contributors-url]
[![Forks][forks-shield]][forks-url]
[![Stargazers][stars-shield]][stars-url]
[![Issues][issues-shield]][issues-url]
[![GPLv3 License][license-shield]][license-url]
[![Conventional Commits][conventional-commits-badge]](https://conventionalcommits.org)
[![LinkedIn][linkedin-shield]][linkedin-url]
---
Table of Contents
- About The Project
- Getting Started
- Deployment
- Usage
- Architecture
- Troubleshooting
- Roadmap
- Contributing
- License
- Contact
- Acknowledgments
---
## About The Project
[![Product Screenshot][product-screenshot]](#)
**Bandit Runner** is an autonomous AI agent testing framework that evaluates large language models by having them solve the **OverTheWire Bandit** wargame challenges via SSH.
**Why it matters**
- Provides a real-world, hands-on benchmark for autonomous reasoning and command execution
- Tests tool use (SSH), planning, error handling, and persistence under real network conditions
- Generates reproducible logs for research and public leaderboards
- Demonstrates LLM capabilities in a sandboxed, deterministic environment
### Features
- **LangGraph.js Autonomous Agent** - State machine with tool use and planning
- **Real-Time WebSocket Streaming** - Live updates with <50ms latency
- **Full Terminal Output** - Complete SSH session with ANSI colors and formatting
- **Agent Reasoning Display** - See the LLM's thinking process and command selection
- **OpenRouter Integration** - Access 100+ models with search and filtering
- **Beautiful Retro UI** - Terminal-inspired interface with shadcn/ui components
- **Security Boundaries** - Sandboxed SSH access to read-only Bandit server
- **Level Progression** - Automatic password extraction and advancement
- **Manual Mode** - Optional human intervention for debugging
---
### Built With
* [![Next.js][Next.js]][Next-url] - Frontend framework (v15 with App Router)
* [![React][React.js]][React-url] - UI library (v19)
* [![Cloudflare][Cloudflare-badge]][Cloudflare-url] - Edge compute + Durable Objects
* [![OpenNext][OpenNext-badge]][OpenNext-url] - Next.js adapter for Cloudflare Workers
* [![LangGraph][LangGraph-badge]][LangGraph-url] - Agent orchestration framework
* [![Shadcn/UI][Shadcn-badge]][Shadcn-url] - Component library
* [![TypeScript][TypeScript-badge]][TypeScript-url] - Type safety
* [![pnpm][pnpm-badge]][pnpm-url] - Package manager
---
## Getting Started
### Prerequisites
#### Required Accounts
* **Cloudflare** account (free tier works, needs Durable Objects enabled)
* **Fly.io** account (free tier works)
* **OpenRouter** account + API key ([get one here](https://openrouter.ai))
#### Required Software
* **Node.js ≥ 20**
* **pnpm** package manager
  ```bash
  npm install -g pnpm
  ```
* **Wrangler CLI** (Cloudflare)
  ```bash
  npm install -g wrangler
  ```
* **flyctl CLI** (Fly.io)
  ```bash
  curl -L https://fly.io/install.sh | sh
  ```
### Installation
#### 1. Clone Repository
```bash
git clone https://git.biohazardvfx.com/Nicholai/bandit-runner.git
cd bandit-runner
```
#### 2. Install Dependencies
```bash
# Frontend + Cloudflare Workers
cd bandit-runner-app
pnpm install
# SSH Proxy (LangGraph agent)
cd ../ssh-proxy
npm install
```
#### 3. Configure Environment
**Cloudflare DO Worker:**
```bash
# From the repository root
cd bandit-runner-app/workers/bandit-agent-do
# Create .env.local with your OpenRouter API key
echo "OPENROUTER_API_KEY=your_key_here" > .env.local
```
**Main Worker:**
No `.env` file is needed; configuration lives in `wrangler.jsonc`.
#### 4. Local Development
**Option A: Frontend Only** (requires a deployed backend)
```bash
cd bandit-runner-app
pnpm dev
# Open http://localhost:3000
```
**Option B: Full Stack** (all components local)
```bash
# Terminal 1: SSH Proxy
cd ssh-proxy
npm run dev
# Runs on http://localhost:3001
# Terminal 2: Main App
cd bandit-runner-app
# Update wrangler.jsonc: "SSH_PROXY_URL": "http://localhost:3001"
pnpm dev
```
---
## Deployment
### Step 1: Deploy SSH Proxy to Fly.io
```bash
cd ssh-proxy
# Login (opens browser)
flyctl auth login
# Deploy (creates app on first run)
flyctl deploy
# Note your URL: https://bandit-ssh-proxy.fly.dev
```
### Step 2: Deploy Durable Object Worker
```bash
cd ../bandit-runner-app/workers/bandit-agent-do
# Set OpenRouter API key as secret
wrangler secret put OPENROUTER_API_KEY
# Paste your key when prompted
# Deploy
wrangler deploy
# Verify (optional)
wrangler tail
```
### Step 3: Deploy Main Application
```bash
cd ../.. # Back to bandit-runner-app root
# Verify wrangler.jsonc has correct SSH_PROXY_URL
# Should be: "SSH_PROXY_URL": "https://bandit-ssh-proxy.fly.dev"
# Build and deploy
pnpm run deploy
# Your app is live at:
# https://bandit-runner-app.<your-subdomain>.workers.dev
```
### Step 4: Verify Deployment
```bash
# Test SSH Proxy
curl https://bandit-ssh-proxy.fly.dev/ssh/health
# Should return: {"status":"ok","activeConnections":0}
# Open your app (macOS; use xdg-open on Linux)
open https://bandit-runner-app.<your-subdomain>.workers.dev
```
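To script this check (for example in CI), the health payload can be validated in code. The sketch below is a minimal TypeScript example, assuming the endpoint returns the `{"status":"ok","activeConnections":0}` shape shown above; `isHealthy`, `checkProxy`, and `FetchLike` are hypothetical names, not part of the project.

```typescript
// Hypothetical helpers: assumes the /ssh/health payload shape shown above.
interface HealthResponse {
  status: string;
  activeConnections: number;
}

// Minimal fetch signature, so any fetch implementation (or a test stub) fits.
type FetchLike = (
  url: string,
) => Promise<{ ok: boolean; json(): Promise<unknown> }>;

// Type guard: accepts only a payload reporting status "ok" with a numeric
// connection count.
function isHealthy(body: unknown): body is HealthResponse {
  if (typeof body !== "object" || body === null) return false;
  const b = body as Record<string, unknown>;
  return b.status === "ok" && typeof b.activeConnections === "number";
}

// Calls the health endpoint; resolves false on a non-2xx status or an
// unexpected payload. Network errors still reject.
async function checkProxy(
  fetchFn: FetchLike,
  baseUrl = "https://bandit-ssh-proxy.fly.dev",
): Promise<boolean> {
  const res = await fetchFn(`${baseUrl}/ssh/health`);
  if (!res.ok) return false;
  return isHealthy(await res.json());
}
```

With Node.js 18+ (which provides a global `fetch`), `checkProxy(fetch).then((ok) => console.log(ok ? "proxy healthy" : "proxy down"))` runs the check against the deployed proxy. Passing `fetch` in as a parameter keeps the function easy to test with a stub.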
---
## Usage
### Starting a Run
1. **Open the application** in your browser
2. **Select Model**: Click the dropdown to choose from 100+ models
   - Search by name
   - Filter by provider (OpenAI, Anthropic, Google, Meta, etc.)
   - Filter by price range
   - Filter by context length
3. **Set Target Level**: Choose how far the agent should progress (default: 5)
4. **Select Output Mode**:
   - **Selective** (default): LLM reasoning + tool calls
   - **Everything**: Reasoning + tool calls + full command outputs
5. **Click START**
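The two output modes differ only in which kinds of events reach the chat panel. A minimal TypeScript sketch of that rule, assuming hypothetical event type names (`thinking`, `tool_call`, `terminal_output`); the project's real event schema, and whether filtering happens in the browser or upstream, may differ:

```typescript
// Hypothetical names: the project's real event schema may differ.
type OutputMode = "selective" | "everything";

type AgentEvent =
  | { type: "thinking"; content: string } // LLM reasoning
  | { type: "tool_call"; command: string } // command the agent chose to run
  | { type: "terminal_output"; data: string }; // raw command output

type EventType = AgentEvent["type"];

// Event types shown in the chat panel for each mode, per the list above.
const VISIBLE: Record<OutputMode, ReadonlySet<EventType>> = {
  selective: new Set<EventType>(["thinking", "tool_call"]),
  everything: new Set<EventType>(["thinking", "tool_call", "terminal_output"]),
};

// Keeps only the events the chosen mode displays, preserving order.
function filterForMode(events: AgentEvent[], mode: OutputMode): AgentEvent[] {
  return events.filter((event) => VISIBLE[mode].has(event.type));
}
```

In this sketch the mode only decides what is forwarded to the chat stream; the terminal panel would keep showing the full SSH session either way.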
### Monitoring Progress
The interface provides real-time feedback:
- **Left Panel (Terminal)**: Full SSH session with ANSI colors and formatting
- **Right Panel (Agent Chat)**: LLM reasoning, planning, and discoveries
- **Top Bar**: Current status, level, and active model
- **Control Panel**: Pause/Resume/Stop controls
### Manual Mode (Debugging)
Toggle **"Manual Mode"** in the terminal footer to:
- Type commands directly into the SSH session
- Assist the agent when stuck
- Debug connection or logic issues
⚠️ **Note**: Manual intervention disqualifies runs from leaderboards.
---
## Architecture
### System Overview
```text
┌─────────────┐
│   Browser   │  WebSocket (wss://)
└──────┬──────┘
       │
       ▼
┌───────────────────────────────────┐
│  Cloudflare Worker (Main)         │  Next.js + OpenNext
│  bandit-runner-app                │  Intercepts WebSocket upgrades
└──────┬────────────────────────────┘
       │
       ▼
┌───────────────────────────────────┐
│  Durable Object Worker            │  WebSocket connection manager
│  bandit-agent-do                  │  Run state coordinator
└──────┬────────────────────────────┘
       │ HTTP (JSONL streaming)
       ▼
┌───────────────────────────────────┐
│  SSH Proxy (Fly.io)               │  LangGraph.js autonomous agent
│  bandit-ssh-proxy.fly.dev         │  SSH2 client
└──────┬────────────────────────────┘
       │ SSH Protocol
       ▼
┌───────────────────────────────────┐
│  Bandit SSH Server                │  CTF challenges
│  bandit.labs.overthewire.org:2220 │  Real command execution
└───────────────────────────────────┘
```
### Component Responsibilities
**Frontend (Next.js 15)**
- React UI with shadcn/ui components
- Real-time terminal with ANSI rendering
- Agent chat display
- Model selection and configuration
**Main Worker (Cloudflare)**
- Serves Next.js application
- Intercepts WebSocket upgrades before Next.js routing
- Forwards WS connections to Durable Object
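The intercept-and-forward step can be sketched as follows. This is an illustration, not the code injected by `scripts/patch-worker.js`: it assumes one Durable Object per run, a `/api/agent/<runId>/...` path scheme (suggested by the status URL used under Troubleshooting), and uses small structural interfaces in place of the Workers `Request` and `DurableObjectNamespace` types so the example stands alone. `idFromName`, `get`, and `stub.fetch` are the standard Durable Object namespace and stub methods.

```typescript
// Structural stand-ins for the Workers Request / DurableObjectNamespace /
// DurableObjectStub types, so the sketch compiles without @cloudflare/workers-types.
interface RequestLike {
  url: string;
  headers: { get(name: string): string | null };
}
interface StubLike<R> {
  fetch(request: RequestLike): Promise<R>;
}
interface NamespaceLike<R> {
  idFromName(name: string): unknown;
  get(id: unknown): StubLike<R>;
}

// Assumed path scheme: /api/agent/<runId>/... (hypothetical).
function runIdFromPath(pathname: string): string | null {
  const match = /^\/api\/agent\/([^/]+)/.exec(pathname);
  return match?.[1] ?? null;
}

// Forwards WebSocket upgrade requests to the run's Durable Object.
// Returns null for anything else so the caller can fall through to Next.js.
async function interceptUpgrade<R>(
  request: RequestLike,
  agents: NamespaceLike<R>,
): Promise<R | null> {
  if (request.headers.get("Upgrade")?.toLowerCase() !== "websocket") return null;
  const runId = runIdFromPath(new URL(request.url).pathname);
  if (runId === null) return null;
  // idFromName is deterministic: the same run ID always maps to the same object.
  const stub = agents.get(agents.idFromName(runId));
  return stub.fetch(request);
}
```

In a Worker's `fetch` handler this would run before the OpenNext handler, e.g. `const ws = await interceptUpgrade(request, env.BANDIT_AGENT); if (ws) return ws;`, where `BANDIT_AGENT` is a hypothetical binding name (check `wrangler.jsonc` for the real one). Keying on the run ID means every browser tab watching the same run reaches the same Durable Object instance.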
**Durable Object (Separate Worker)**
- Manages WebSocket connections from browsers
- Maintains run state (level, status, passwords)
- Calls SSH proxy HTTP endpoints
- Streams JSONL events to connected clients
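The run-state bookkeeping described above can be modeled as a pure reducer over incoming events. A minimal sketch, assuming hypothetical event names and treating a run as complete once the password for the target level is known; the real Durable Object may store more, or different, fields:

```typescript
// Hypothetical shapes: the Durable Object's real state and events may differ.
type RunStatus = "idle" | "running" | "paused" | "complete" | "failed";

interface RunState {
  status: RunStatus;
  level: number; // level the agent is currently working on
  targetLevel: number;
  passwords: Record<number, string>; // level -> password used to log in to it
}

type RunEvent =
  | { type: "started"; targetLevel: number }
  | { type: "level_complete"; level: number; nextPassword: string }
  | { type: "paused" }
  | { type: "resumed" }
  | { type: "error"; message: string };

const initialState: RunState = { status: "idle", level: 0, targetLevel: 0, passwords: {} };

// Pure reducer: returns a new state and never mutates the old one.
function reduce(state: RunState, event: RunEvent): RunState {
  switch (event.type) {
    case "started":
      // Level 0 credentials are public: bandit0 / bandit0.
      return { status: "running", level: 0, targetLevel: event.targetLevel, passwords: { 0: "bandit0" } };
    case "level_complete": {
      const next = event.level + 1;
      return {
        ...state,
        level: next,
        passwords: { ...state.passwords, [next]: event.nextPassword },
        // Assumption: the run ends once the target level's password is known.
        status: next >= state.targetLevel ? "complete" : state.status,
      };
    }
    case "paused":
      return state.status === "running" ? { ...state, status: "paused" } : state;
    case "resumed":
      return state.status === "paused" ? { ...state, status: "running" } : state;
    case "error":
      return { ...state, status: "failed" };
    default:
      return state;
  }
}
```

Keeping the transition logic pure means the same function could rebuild state from a stored event log, which would fit the planned run history and replay features.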
**SSH Proxy (Fly.io)**
- Runs LangGraph.js agent state machine
- Maintains SSH2 connections to Bandit server
- Executes commands via PTY (full terminal capture)
- Extracts passwords using regex patterns
- Streams events back to DO via HTTP response
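Password extraction can be sketched as a regex pass over ANSI-stripped PTY output. This assumes Bandit passwords are 32-character alphanumeric strings; `PASSWORD_RE`, `ANSI_RE`, and `extractPasswordCandidates` are hypothetical, and the proxy's actual patterns are not shown here and may differ.

```typescript
// Assumption: Bandit passwords are 32-character alphanumeric strings.
// Illustrative pattern only; not the proxy's actual regex.
const PASSWORD_RE = /\b[A-Za-z0-9]{32}\b/g;

// Common ANSI CSI sequences (colors, cursor movement) emitted by a PTY.
const ANSI_RE = /\x1b\[[0-9;?]*[A-Za-z]/g;

// Returns unique password candidates in order of first appearance.
function extractPasswordCandidates(ptyOutput: string): string[] {
  const clean = ptyOutput.replace(ANSI_RE, "");
  return Array.from(new Set(clean.match(PASSWORD_RE) ?? []));
}
```

Matches are only candidates: a 32-character hex hash in a file would match too, so a candidate should be confirmed by logging in to the next level before the run advances.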
### Data Flow
1. User clicks START in browser
2. Browser establishes WebSocket to Main Worker
3. Worker forwards to Durable Object
4. DO sends HTTP POST to SSH Proxy `/agent/run`
5. SSH Proxy runs LangGraph agent
6. Agent connects via SSH to Bandit server
7. Commands executed, passwords extracted
8. Events stream: SSH Proxy → DO → WebSocket → Browser
9. UI updates in real-time
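Step 8 relies on JSONL (one JSON object per line) carried over a streaming HTTP response, where a network chunk can end in the middle of a line. A minimal sketch of an incremental parser that buffers partial lines; `JsonlParser` is a hypothetical name, and event payloads are left as `unknown` because the event schema is not documented here:

```typescript
// Incrementally parses a JSONL (newline-delimited JSON) stream whose chunks
// may split a line anywhere.
class JsonlParser {
  private buffer = "";

  // Feed one decoded text chunk; returns every event completed by it.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last piece may be an incomplete line; keep it for the next chunk.
    this.buffer = lines.pop() ?? "";
    return lines
      .map((line) => line.trim()) // also drops a trailing "\r"
      .filter((line) => line.length > 0)
      .map((line) => JSON.parse(line) as unknown);
  }

  // Call at end of stream to parse a final line that lacked a trailing newline.
  flush(): unknown[] {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest ? [JSON.parse(rest) as unknown] : [];
  }
}
```

In the Durable Object, each chunk read from the proxy's response body would be decoded with a `TextDecoder` (calling `decode(chunk, { stream: true })` so multi-byte characters split across chunks survive), passed to `push`, and each returned event sent to every connected WebSocket. A malformed line makes `JSON.parse` throw, so a production version would likely catch and report that per line.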
### Monorepo Structure
```text
bandit-runner/
├── bandit-runner-app/          # Frontend + Cloudflare Workers
│   ├── src/
│   │   ├── app/                # Next.js pages + API routes
│   │   ├── components/         # React components
│   │   ├── hooks/              # useAgentWebSocket
│   │   └── lib/                # Utilities, types
│   ├── workers/
│   │   └── bandit-agent-do/    # Standalone Durable Object
│   ├── scripts/
│   │   └── patch-worker.js     # Injects WebSocket intercept
│   └── wrangler.jsonc          # Main worker config
├── ssh-proxy/                  # LangGraph agent runtime
│   ├── agent.ts                # State machine definition
│   ├── server.ts               # Express HTTP server
│   ├── Dockerfile              # Container image
│   └── fly.toml                # Fly.io deployment config
└── docs/                       # Documentation
```
---
## Troubleshooting
### WebSocket Connection Failed
**Symptoms**: "WebSocket connection failed" or "NS_ERROR_WEBSOCKET_CONNECTION_REFUSED"
**Solutions**:
```bash
# Check DO worker is deployed
wrangler deployments list
# Check browser console for specific errors
# Verify DO binding in wrangler.jsonc
# Test DO directly
curl https://bandit-runner-app.<your-subdomain>.workers.dev/api/agent/test-run/status
```
### SSH Proxy Not Responding
**Symptoms**: Agent starts but no terminal output appears
**Solutions**:
```bash
# Check Fly.io app status
flyctl status
# View real-time logs
flyctl logs
# Test health endpoint
curl https://bandit-ssh-proxy.fly.dev/ssh/health
# Should return: {"status":"ok","activeConnections":0}
# Restart if needed
flyctl restart
```
### Agent Not Starting
**Symptoms**: Run status stays "IDLE" or errors in console
**Solutions**:
```bash
# Verify OpenRouter API key is set
wrangler secret list --name bandit-agent-do
# Check DO worker logs
wrangler tail --name bandit-agent-do
# Ensure SSH_PROXY_URL is correct in wrangler.jsonc
# Should be: https://bandit-ssh-proxy.fly.dev (not http://)
```
### Commands Not Executing
**Symptoms**: Agent thinks but no SSH commands run
**Solutions**:
```bash
# Check SSH proxy logs
flyctl logs
# Verify Bandit server is accessible
ssh bandit0@bandit.labs.overthewire.org -p 2220
# Password: bandit0
# Review agent reasoning in chat panel
# Look for error messages or failed attempts
```
### Build or Deploy Errors
**Common issues**:
```bash
# "Cannot find module" during build (run from the repository root)
(cd bandit-runner-app && pnpm install)
(cd ssh-proxy && npm install)
# "Account ID not found"
# Add account_id to bandit-runner-app/workers/bandit-agent-do/wrangler.toml
# "Script not found: bandit-agent-do"
# Deploy the DO worker first:
(cd bandit-runner-app/workers/bandit-agent-do && wrangler deploy)
```
---
## Roadmap
### Completed
* [x] LangGraph.js autonomous agent framework
* [x] Real-time WebSocket streaming (JSONL events)
* [x] Full terminal output with ANSI colors
* [x] Agent reasoning and thinking display
* [x] OpenRouter model selection with search/filters
* [x] Level 0 → N automatic progression
* [x] Password extraction and advancement
* [x] Manual mode for debugging
* [x] Durable Object state management
### In Progress
* [ ] Retry logic with exponential backoff
* [ ] Token usage and cost tracking UI
* [ ] Persistent run history (D1 database)
* [ ] Log storage and analysis (R2 bucket)
### Planned
* [ ] Public leaderboard (fastest completions)
* [ ] Multi-agent comparison mode
* [ ] Custom system prompts and strategies
* [ ] Agent performance analytics dashboard
* [ ] Mock SSH server for testing
* [ ] Expanded CTF support (Natas, Krypton, etc.)
* [ ] Run replay and debugging tools
See the [open issues](https://git.biohazardvfx.com/Nicholai/bandit-runner/issues) for discussion and feature requests.
---
## Contributing
Contributions are welcome.
1. Fork the Project
2. Create your Feature Branch (`git checkout -b feat/amazing`)
3. Commit (`pnpm commit`) using Conventional Commits
4. Push (`git push origin feat/amazing`)
5. Open a Pull Request
---
## License
Distributed under the **GNU GPLv3** License.
See `LICENSE` for details.
---
## Contact
**Nicholai Vogel**
[Website](https://nicholai.work) • [LinkedIn](https://linkedin.com/in/nicholai-vogel) • [Instagram](https://instagram.com/nicholai.exe)
Project Link: [https://git.biohazardvfx.com/Nicholai/bandit-runner](https://git.biohazardvfx.com/Nicholai/bandit-runner)
---
## Acknowledgments
* [OverTheWire Bandit](https://overthewire.org/wargames/bandit/) - CTF wargame challenges
* [LangGraph.js](https://langchain-ai.github.io/langgraphjs/) - Agent orchestration framework
* [Cloudflare Workers](https://developers.cloudflare.com/workers/) - Edge compute platform
* [Fly.io](https://fly.io) - Global application hosting
* [OpenRouter](https://openrouter.ai) - Unified LLM API access
* [OpenNext](https://opennext.js.org/) - Next.js adapter for Cloudflare
* [shadcn/ui](https://ui.shadcn.com) - Beautiful component library
* [ANSI-to-HTML](https://github.com/rburns/ansi-to-html) - Terminal color rendering
---
[contributors-shield]: https://img.shields.io/github/contributors/Nicholai/bandit-runner.svg?style=for-the-badge
[contributors-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/graphs/contributors
[forks-shield]: https://img.shields.io/github/forks/Nicholai/bandit-runner.svg?style=for-the-badge
[forks-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/network/members
[stars-shield]: https://img.shields.io/github/stars/Nicholai/bandit-runner.svg?style=for-the-badge
[stars-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/stargazers
[issues-shield]: https://img.shields.io/github/issues/Nicholai/bandit-runner.svg?style=for-the-badge
[issues-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/issues
[license-shield]: https://img.shields.io/github/license/Nicholai/bandit-runner.svg?style=for-the-badge
[license-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/blob/main/COPYING.txt
[linkedin-shield]: https://img.shields.io/badge/-LinkedIn-black.svg?style=for-the-badge&logo=linkedin&colorB=555
[linkedin-url]: https://linkedin.com/in/nicholai-vogel
[product-screenshot]: public/screenshot.png
[Next.js]: https://img.shields.io/badge/Next.js-000000?style=for-the-badge&logo=nextdotjs&logoColor=white
[Next-url]: https://nextjs.org/
[React.js]: https://img.shields.io/badge/React-20232A?style=for-the-badge&logo=react&logoColor=61DAFB
[React-url]: https://react.dev/
[Cloudflare-badge]: https://img.shields.io/badge/Cloudflare%20Workers-F38020?style=for-the-badge&logo=cloudflare&logoColor=white
[Cloudflare-url]: https://developers.cloudflare.com/workers/
[OpenNext-badge]: https://img.shields.io/badge/OpenNext-18181B?style=for-the-badge&logo=vercel&logoColor=white
[OpenNext-url]: https://opennext.js.org/
[LangGraph-badge]: https://img.shields.io/badge/LangGraph-1C3C3C?style=for-the-badge&logo=langchain&logoColor=white
[LangGraph-url]: https://langchain-ai.github.io/langgraphjs/
[Shadcn-badge]: https://img.shields.io/badge/Shadcn%2FUI-000000?style=for-the-badge&logo=react&logoColor=white
[Shadcn-url]: https://ui.shadcn.com
[TypeScript-badge]: https://img.shields.io/badge/TypeScript-3178C6?style=for-the-badge&logo=typescript&logoColor=white
[TypeScript-url]: https://www.typescriptlang.org/
[pnpm-badge]: https://img.shields.io/badge/pnpm-F69220?style=for-the-badge&logo=pnpm&logoColor=white
[pnpm-url]: https://pnpm.io
[conventional-commits-badge]: https://img.shields.io/badge/Conventional%20Commits-1.0.0-%23FE5196?style=for-the-badge&logo=conventionalcommits&logoColor=white