Bandit Runner
A deterministic AI testing rig for LLMs-as-agents — built on Next.js, OpenNext, and Cloudflare Workers.
About The Project
Bandit Runner is an autonomous AI agent testing framework that evaluates large language models by having them solve the OverTheWire Bandit wargame challenges via SSH.
Why it matters
- Provides a real-world, hands-on benchmark for autonomous reasoning and command execution
- Tests tool use (SSH), planning, error handling, and persistence under real network conditions
- Generates reproducible logs for research and public leaderboards
- Demonstrates LLM capabilities in a sandboxed, deterministic environment
Features
- 🤖 LangGraph.js Autonomous Agent - State machine with tool use and planning
- 🔌 Real-Time WebSocket Streaming - Live updates with <50ms latency
- 🖥️ Full Terminal Output - Complete SSH session with ANSI colors and formatting
- 💬 Agent Reasoning Display - See the LLM's thinking process and command selection
- 🎯 OpenRouter Integration - Access 100+ models with search and filtering
- 🎨 Beautiful Retro UI - Terminal-inspired interface with shadcn/ui components
- 🔒 Security Boundaries - Sandboxed SSH access to read-only Bandit server
- 📊 Level Progression - Automatic password extraction and advancement
- 🛠️ Manual Mode - Optional human intervention for debugging
Built With
- Next.js - Frontend framework (v15 with App Router)
- React - UI library (v19)
- Cloudflare Workers - Edge compute + Durable Objects
- OpenNext - Next.js adapter for Cloudflare Workers
- LangGraph.js - Agent orchestration framework
- shadcn/ui - Component library
- TypeScript - Type safety
- pnpm - Package manager
Getting Started
Prerequisites
Required Accounts
- Cloudflare account (free tier works, needs Durable Objects enabled)
- Fly.io account (free tier works)
- OpenRouter account + API key
Required Software
- Node.js ≥ 20
- pnpm package manager
npm install -g pnpm
- Wrangler CLI (Cloudflare)
npm install -g wrangler
- flyctl CLI (Fly.io)
curl -L https://fly.io/install.sh | sh
Installation
1. Clone Repository
git clone https://git.biohazardvfx.com/Nicholai/bandit-runner.git
cd bandit-runner
2. Install Dependencies
# Frontend + Cloudflare Workers
cd bandit-runner-app
pnpm install
# SSH Proxy (LangGraph agent)
cd ../ssh-proxy
npm install
3. Configure Environment
Cloudflare DO Worker:
cd bandit-runner-app/workers/bandit-agent-do
# Create .env.local with your OpenRouter API key
echo "OPENROUTER_API_KEY=your_key_here" > .env.local
Main Worker:
No .env needed - configuration is in wrangler.jsonc
4. Local Development
Option A: Frontend Only (requires deployed backend)
cd bandit-runner-app
pnpm dev
# Open http://localhost:3000
Option B: Full Stack (all components local)
# Terminal 1: SSH Proxy
cd ssh-proxy
npm run dev
# Runs on http://localhost:3001
# Terminal 2: Main App
cd bandit-runner-app
# Update wrangler.jsonc: "SSH_PROXY_URL": "http://localhost:3001"
pnpm dev
Deployment
Step 1: Deploy SSH Proxy to Fly.io
cd ssh-proxy
# Login (opens browser)
flyctl auth login
# Deploy (creates app on first run)
flyctl deploy
# Note your URL: https://bandit-ssh-proxy.fly.dev
Step 2: Deploy Durable Object Worker
cd ../bandit-runner-app/workers/bandit-agent-do
# Set OpenRouter API key as secret
wrangler secret put OPENROUTER_API_KEY
# Paste your key when prompted
# Deploy
wrangler deploy
# Verify (optional)
wrangler tail
Step 3: Deploy Main Application
cd ../.. # Back to bandit-runner-app root
# Verify wrangler.jsonc has correct SSH_PROXY_URL
# Should be: "SSH_PROXY_URL": "https://bandit-ssh-proxy.fly.dev"
# Build and deploy
pnpm run deploy
# Your app is live at:
# https://bandit-runner-app.<your-subdomain>.workers.dev
Step 4: Verify Deployment
# Test SSH Proxy
curl https://bandit-ssh-proxy.fly.dev/ssh/health
# Should return: {"status":"ok","activeConnections":0}
# Open your app
open https://bandit-runner-app.<your-subdomain>.workers.dev
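The health check can also be scripted. The following is a minimal sketch that validates a response body against the shape shown above (`status` and `activeConnections`); the function names `parseHealth` and `isHealthy` are hypothetical and not part of the project.

```typescript
// Hypothetical helper: validate the body returned by /ssh/health.
// The expected shape is taken from the sample response in this README.
interface HealthStatus {
  status: string;
  activeConnections: number;
}

function parseHealth(body: string): HealthStatus | null {
  let value: unknown;
  try {
    value = JSON.parse(body);
  } catch {
    return null; // Not JSON at all (e.g. an HTML error page).
  }
  if (typeof value !== "object" || value === null) return null;
  const record = value as Record<string, unknown>;
  const status = record["status"];
  const activeConnections = record["activeConnections"];
  if (typeof status !== "string" || typeof activeConnections !== "number") {
    return null;
  }
  return { status, activeConnections };
}

// Treat the proxy as healthy only when it reports status "ok".
function isHealthy(body: string): boolean {
  const health = parseHealth(body);
  return health !== null && health.status === "ok";
}
```

Pass the response text from the curl command (or any HTTP client) to `isHealthy`; anything other than a well-formed `"status":"ok"` body is treated as unhealthy.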
Usage
Starting a Run
1. Open the application in your browser
2. Select Model: Click the dropdown to choose from 100+ models
   - Search by name
   - Filter by provider (OpenAI, Anthropic, Google, Meta, etc.)
   - Filter by price range
   - Filter by context length
3. Set Target Level: Choose how far the agent should progress (default: 5)
4. Select Output Mode:
   - Selective (default): LLM reasoning + tool calls
   - Everything: Reasoning + tool calls + full command outputs
5. Click START
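The model search and filters described above can be sketched as a single pure function. This is an illustration only: the `ModelInfo` and `ModelFilter` field names, the price unit, and the sample values are assumptions, not the app's actual types. The only thing taken as given is that OpenRouter model ids have the form `<provider>/<model>`.

```typescript
// Hypothetical model record; field names are illustrative.
interface ModelInfo {
  id: string; // "<provider>/<model>"
  name: string;
  promptPricePerMTok: number; // assumed unit: USD per million prompt tokens
  contextLength: number; // tokens
}

interface ModelFilter {
  search?: string;
  provider?: string;
  maxPromptPrice?: number;
  minContextLength?: number;
}

// Every filter that is set must match; unset filters are ignored.
function filterModels(models: ModelInfo[], filter: ModelFilter): ModelInfo[] {
  const needle = (filter.search ?? "").trim().toLowerCase();
  const provider = filter.provider?.toLowerCase();
  return models.filter((model) => {
    const modelProvider = (model.id.split("/")[0] ?? "").toLowerCase();
    if (needle !== "" && model.name.toLowerCase().indexOf(needle) === -1) return false;
    if (provider !== undefined && modelProvider !== provider) return false;
    if (filter.maxPromptPrice !== undefined && model.promptPricePerMTok > filter.maxPromptPrice) return false;
    if (filter.minContextLength !== undefined && model.contextLength < filter.minContextLength) return false;
    return true;
  });
}
```

For example, `filterModels(models, { provider: "anthropic", minContextLength: 100000 })` would keep only long-context models from that provider.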
Monitoring Progress
The interface provides real-time feedback:
- Left Panel (Terminal): Full SSH session with ANSI colors and formatting
- Right Panel (Agent Chat): LLM reasoning, planning, and discoveries
- Top Bar: Current status, level, and active model
- Control Panel: Pause/Resume/Stop controls
Manual Mode (Debugging)
Toggle "Manual Mode" in the terminal footer to:
- Type commands directly into the SSH session
- Assist the agent when stuck
- Debug connection or logic issues
⚠️ Note: Manual intervention disqualifies runs from leaderboards
Architecture
System Overview
┌─────────────┐
│ Browser │ WebSocket (wss://)
└──────┬──────┘
│
↓
┌─────────────────────────────┐
│ Cloudflare Worker (Main) │ Next.js + OpenNext
│ bandit-runner-app │ Intercepts WebSocket upgrades
└──────┬──────────────────────┘
│
↓
┌─────────────────────────────┐
│ Durable Object Worker │ WebSocket connection manager
│ bandit-agent-do │ Run state coordinator
└──────┬──────────────────────┘
│ HTTP (JSONL streaming)
↓
┌─────────────────────────────┐
│ SSH Proxy (Fly.io) │ LangGraph.js autonomous agent
│ bandit-ssh-proxy.fly.dev │ SSH2 client
└──────┬──────────────────────┘
│ SSH Protocol
↓
┌─────────────────────────────┐
│ Bandit SSH Server │ CTF challenges
│ overthewire.org:2220 │ Real command execution
└─────────────────────────────┘
Component Responsibilities
Frontend (Next.js 15)
- React UI with shadcn/ui components
- Real-time terminal with ANSI rendering
- Agent chat display
- Model selection and configuration
Main Worker (Cloudflare)
- Serves Next.js application
- Intercepts WebSocket upgrades before Next.js routing
- Forwards WS connections to Durable Object
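A rough sketch of the interception decision follows. The path shape `/api/agent/<runId>/<action>` is inferred from the status URL used in Troubleshooting; the `ws` action in the usage note and all function names are hypothetical, and the real logic injected by `patch-worker.js` may differ.

```typescript
// Hypothetical sketch: decide whether a request should bypass Next.js
// routing and be forwarded to the Durable Object.
interface AgentRoute {
  runId: string;
  action: string;
}

// Assumed path shape: /api/agent/<runId>/<action>
function matchAgentRoute(pathname: string): AgentRoute | null {
  const match = /^\/api\/agent\/([^/]+)\/([^/]+)\/?$/.exec(pathname);
  if (match === null) return null;
  const runId = match[1];
  const action = match[2];
  if (runId === undefined || action === undefined) return null;
  return { runId, action };
}

// A WebSocket handshake carries "Upgrade: websocket" (case-insensitive).
function isWebSocketUpgrade(upgradeHeader: string | null): boolean {
  return upgradeHeader !== null && upgradeHeader.toLowerCase() === "websocket";
}

function shouldForwardToDurableObject(
  pathname: string,
  upgradeHeader: string | null,
): boolean {
  return isWebSocketUpgrade(upgradeHeader) && matchAgentRoute(pathname) !== null;
}
```

In a Worker `fetch` handler, this check would run on `new URL(request.url).pathname` and `request.headers.get("Upgrade")` before the request is handed to the Next.js handler, for example on a path such as `/api/agent/<runId>/ws`.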
Durable Object (Separate Worker)
- Manages WebSocket connections from browsers
- Maintains run state (level, status, passwords)
- Calls SSH proxy HTTP endpoints
- Streams JSONL events to connected clients
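The run state the Durable Object maintains can be modelled as a small reducer. This is a sketch under assumptions: only the `IDLE` status and the Pause/Resume/Stop controls appear in this README, so the other status names, the action names, and the state fields are hypothetical.

```typescript
// Hypothetical run-state model; only "IDLE" is named in this README.
type RunStatus = "IDLE" | "RUNNING" | "PAUSED" | "STOPPED" | "COMPLETE";

interface RunState {
  status: RunStatus;
  level: number;
  targetLevel: number;
  passwords: Record<number, string>; // password for each reached level
}

type RunAction =
  | { type: "start"; targetLevel: number }
  | { type: "pause" }
  | { type: "resume" }
  | { type: "stop" }
  | { type: "levelComplete"; nextPassword: string };

const initialState: RunState = { status: "IDLE", level: 0, targetLevel: 0, passwords: {} };

// Invalid transitions leave the state unchanged.
function reduce(state: RunState, action: RunAction): RunState {
  switch (action.type) {
    case "start":
      return state.status === "IDLE"
        ? { ...state, status: "RUNNING", targetLevel: action.targetLevel }
        : state;
    case "pause":
      return state.status === "RUNNING" ? { ...state, status: "PAUSED" } : state;
    case "resume":
      return state.status === "PAUSED" ? { ...state, status: "RUNNING" } : state;
    case "stop":
      return state.status === "RUNNING" || state.status === "PAUSED"
        ? { ...state, status: "STOPPED" }
        : state;
    case "levelComplete": {
      if (state.status !== "RUNNING") return state;
      const level = state.level + 1;
      return {
        ...state,
        level,
        passwords: { ...state.passwords, [level]: action.nextPassword },
        status: level >= state.targetLevel ? "COMPLETE" : "RUNNING",
      };
    }
  }
}
```

Keeping transitions in one pure function makes it straightforward to replay a run from its event log and to reject control messages that arrive in the wrong state.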
SSH Proxy (Fly.io)
- Runs LangGraph.js agent state machine
- Maintains SSH2 connections to Bandit server
- Executes commands via PTY (full terminal capture)
- Extracts passwords using regex patterns
- Streams events back to DO via HTTP response
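Password extraction from PTY output might look like the sketch below. It assumes a Bandit password is a standalone 32-character alphanumeric token; that pattern and the function names are illustrative, and the project's actual regex patterns are not shown in this README. ANSI color codes are stripped first because an escape sequence ending in a letter would otherwise touch the token and defeat the word-boundary match.

```typescript
// Assumption: a password is a standalone 32-character alphanumeric token.
const PASSWORD_PATTERN = /\b[A-Za-z0-9]{32}\b/g;

// Matches common ANSI CSI sequences such as "\x1b[32m" and "\x1b[0m".
const ANSI_PATTERN = /\x1b\[[0-9;]*[A-Za-z]/g;

function stripAnsi(text: string): string {
  return text.replace(ANSI_PATTERN, "");
}

// Returns unique candidates in first-seen order; empty if none are found.
function extractPasswordCandidates(output: string): string[] {
  const matches = stripAnsi(output).match(PASSWORD_PATTERN);
  if (matches === null) return [];
  return matches.filter((candidate, index) => matches.indexOf(candidate) === index);
}
```

A pattern this loose can produce false positives (any 32-character hash would match), so a real agent would likely confirm a candidate by attempting the next level's login.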
Data Flow
- User clicks START in browser
- Browser establishes WebSocket to Main Worker
- Worker forwards to Durable Object
- DO sends HTTP POST to SSH Proxy /agent/run
- SSH Proxy runs LangGraph agent
- Agent connects via SSH to Bandit server
- Commands executed, passwords extracted
- Events stream: SSH Proxy → DO → WebSocket → Browser
- UI updates in real-time
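Because events travel as JSONL over a streamed HTTP response, a chunk can end mid-line. The sketch below shows one way to buffer partial lines before parsing; the `AgentEvent` shape (a `type` string plus arbitrary fields) and the class name are assumptions, not the project's actual event schema.

```typescript
// Hypothetical event shape: each JSONL line is an object with a string "type".
interface AgentEvent {
  type: string;
  [key: string]: unknown;
}

function isAgentEvent(value: unknown): value is AgentEvent {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { type?: unknown }).type === "string"
  );
}

class JsonlBuffer {
  private pending = "";

  // Feed one decoded text chunk; returns the events completed by it.
  push(chunk: string): AgentEvent[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    // The last element is an incomplete line, or "" if the chunk ended on "\n".
    this.pending = lines.pop() ?? "";
    const events: AgentEvent[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue;
      try {
        const parsed: unknown = JSON.parse(trimmed);
        if (isAgentEvent(parsed)) events.push(parsed);
      } catch {
        // Skip malformed lines rather than abort the whole stream.
      }
    }
    return events;
  }
}
```

Each event returned by `push` could then be re-serialized and sent to every connected WebSocket client.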
Monorepo Structure
bandit-runner/
├── bandit-runner-app/ # Frontend + Cloudflare Workers
│ ├── src/
│ │ ├── app/ # Next.js pages + API routes
│ │ ├── components/ # React components
│ │ ├── hooks/ # useAgentWebSocket
│ │ └── lib/ # Utilities, types
│ ├── workers/
│ │ └── bandit-agent-do/ # Standalone Durable Object
│ ├── scripts/
│ │ └── patch-worker.js # Injects WebSocket intercept
│ └── wrangler.jsonc # Main worker config
├── ssh-proxy/ # LangGraph agent runtime
│ ├── agent.ts # State machine definition
│ ├── server.ts # Express HTTP server
│ ├── Dockerfile # Container image
│ └── fly.toml # Fly.io deployment config
└── docs/ # Documentation
Troubleshooting
WebSocket Connection Failed
Symptoms: "WebSocket connection failed" or "NS_ERROR_WEBSOCKET_CONNECTION_REFUSED"
Solutions:
# Check DO worker is deployed
wrangler deployments list
# Check browser console for specific errors
# Verify DO binding in wrangler.jsonc
# Test DO directly
curl https://bandit-runner-app.<your-subdomain>.workers.dev/api/agent/test-run/status
SSH Proxy Not Responding
Symptoms: Agent starts but no terminal output appears
Solutions:
# Check Fly.io app status
flyctl status
# View real-time logs
flyctl logs
# Test health endpoint
curl https://bandit-ssh-proxy.fly.dev/ssh/health
# Should return: {"status":"ok","activeConnections":0}
# Restart if needed
flyctl apps restart bandit-ssh-proxy
Agent Not Starting
Symptoms: Run status stays "IDLE" or errors in console
Solutions:
# Verify OpenRouter API key is set
wrangler secret list
# Check DO worker logs
wrangler tail --name bandit-agent-do
# Ensure SSH_PROXY_URL is correct in wrangler.jsonc
# Should be: https://bandit-ssh-proxy.fly.dev (not http://)
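The URL rule above can be checked mechanically. This is a hypothetical helper, not part of the project: it accepts `https://` everywhere and allows plain `http://` only for local hosts, matching the local development setup that uses `http://localhost:3001`.

```typescript
// Hypothetical check for SSH_PROXY_URL. Returns null when the value looks
// fine, otherwise a short description of the problem.
function checkProxyUrl(url: string): string | null {
  const match = /^(https?):\/\/([^/:]+)(?::\d+)?(?:\/.*)?$/.exec(url.trim());
  if (match === null) return "not an http(s) URL";
  const scheme = match[1];
  const host = match[2];
  const isLocal = host === "localhost" || host === "127.0.0.1";
  if (scheme === "http" && !isLocal) return "use https:// for deployed proxies";
  return null;
}
```

Running a check like this at startup would surface a misconfigured URL before the first run is attempted.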
Commands Not Executing
Symptoms: Agent thinks but no SSH commands run
Solutions:
# Check SSH proxy logs
flyctl logs
# Verify Bandit server is accessible
ssh bandit0@bandit.labs.overthewire.org -p 2220
# Password: bandit0
# Review agent reasoning in chat panel
# Look for error messages or failed attempts
Build or Deploy Errors
Common issues:
# "Cannot find module" during build
cd bandit-runner-app && pnpm install
cd ../ssh-proxy && npm install
# "Account ID not found"
# Add account_id to workers/bandit-agent-do/wrangler.toml
# "Script not found: bandit-agent-do"
# Deploy DO worker first:
cd workers/bandit-agent-do && wrangler deploy
Roadmap
Completed ✅
- LangGraph.js autonomous agent framework
- Real-time WebSocket streaming (JSONL events)
- Full terminal output with ANSI colors
- Agent reasoning and thinking display
- OpenRouter model selection with search/filters
- Level 0 → N automatic progression
- Password extraction and advancement
- Manual mode for debugging
- Durable Object state management
In Progress 🚧
- Retry logic with exponential backoff
- Token usage and cost tracking UI
- Persistent run history (D1 database)
- Log storage and analysis (R2 bucket)
Planned 📋
- Public leaderboard (fastest completions)
- Multi-agent comparison mode
- Custom system prompts and strategies
- Agent performance analytics dashboard
- Mock SSH server for testing
- Expanded CTF support (Natas, Krypton, etc.)
- Run replay and debugging tools
See the open issues for discussion and feature requests.
Contributing
Contributions are welcome.
- Fork the Project
- Create your Feature Branch (git checkout -b feat/amazing)
- Commit (pnpm commit) using Conventional Commits
- Push (git push origin feat/amazing)
- Open a Pull Request
License
Distributed under the GNU GPLv3 License.
See LICENSE for details.
Contact
Nicholai Vogel Website • LinkedIn • Instagram
Project Link: https://git.biohazardvfx.com/Nicholai/bandit-runner
Acknowledgments
- OverTheWire Bandit - CTF wargame challenges
- LangGraph.js - Agent orchestration framework
- Cloudflare Workers - Edge compute platform
- Fly.io - Global application hosting
- OpenRouter - Unified LLM API access
- OpenNext - Next.js adapter for Cloudflare
- shadcn/ui - Beautiful component library
- ANSI-to-HTML - Terminal color rendering
