<a id="readme-top"></a>

<!-- PROJECT SHIELDS -->
[![Contributors][contributors-shield]][contributors-url]
[![Forks][forks-shield]][forks-url]
[![Stargazers][stars-shield]][stars-url]
[![Issues][issues-shield]][issues-url]
[![GPLv3 License][license-shield]][license-url]
[![Conventional Commits][conventional-commits-badge]](https://conventionalcommits.org)
[![LinkedIn][linkedin-shield]][linkedin-url]

<!-- PROJECT LOGO -->
<br />
<div align="center">
  <a href="https://git.biohazardvfx.com/Nicholai/bandit-runner">
    <img src="public/bandit-logo.png" alt="Logo" width="100" height="100">
  </a>

  <h3 align="center">Bandit Runner</h3>

  <p align="center">
    A deterministic AI testing rig for LLMs-as-agents — built on Next.js, OpenNext, and Cloudflare Workers.
    <br />
    <a href="https://git.biohazardvfx.com/Nicholai/bandit-runner"><strong>Explore the docs »</strong></a>
    <br />
    <br />
    <a href="#">View Demo</a>
    &middot;
    <a href="https://git.biohazardvfx.com/Nicholai/bandit-runner/issues/new?labels=bug">Report Bug</a>
    &middot;
    <a href="https://git.biohazardvfx.com/Nicholai/bandit-runner/issues/new?labels=enhancement">Request Feature</a>
  </p>
</div>

---

<!-- TABLE OF CONTENTS -->
<details>
  <summary>Table of Contents</summary>
  <ol>
    <li><a href="#about-the-project">About The Project</a>
      <ul>
        <li><a href="#features">Features</a></li>
        <li><a href="#built-with">Built With</a></li>
      </ul>
    </li>
    <li><a href="#getting-started">Getting Started</a>
      <ul>
        <li><a href="#prerequisites">Prerequisites</a></li>
        <li><a href="#installation">Installation</a></li>
      </ul>
    </li>
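    <li><a href="#deployment">Deployment</a></li>
    <li><a href="#usage">Usage</a></li>
    <li><a href="#architecture">Architecture</a></li>
    <li><a href="#troubleshooting">Troubleshooting</a></li>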
    <li><a href="#roadmap">Roadmap</a></li>
    <li><a href="#contributing">Contributing</a></li>
    <li><a href="#license">License</a></li>
    <li><a href="#contact">Contact</a></li>
    <li><a href="#acknowledgments">Acknowledgments</a></li>
  </ol>
</details>

---

## About The Project

[![Product Screenshot][product-screenshot]](#)

**Bandit Runner** is an autonomous AI agent testing framework that evaluates large language models by having them solve the **OverTheWire Bandit** wargame challenges via SSH.

**Why it matters**
- Provides a real-world, hands-on benchmark for autonomous reasoning and command execution
- Tests tool use (SSH), planning, error handling, and persistence under real network conditions
- Generates reproducible logs for research and public leaderboards
- Demonstrates LLM capabilities in a sandboxed, deterministic environment

### Features

- 🤖 **LangGraph.js Autonomous Agent** - State machine with tool use and planning
- 🔌 **Real-Time WebSocket Streaming** - Live updates with <50ms latency
- 🖥️ **Full Terminal Output** - Complete SSH session with ANSI colors and formatting
- 💬 **Agent Reasoning Display** - See the LLM's thinking process and command selection
- 🎯 **OpenRouter Integration** - Access 100+ models with search and filtering
- 🎨 **Beautiful Retro UI** - Terminal-inspired interface with shadcn/ui components
- 🔒 **Security Boundaries** - Sandboxed SSH access to read-only Bandit server
- 📊 **Level Progression** - Automatic password extraction and advancement
- 🛠️ **Manual Mode** - Optional human intervention for debugging

<p align="right">(<a href="#readme-top">back to top</a>)</p>

---

### Built With

* [![Next.js][Next.js]][Next-url] - Frontend framework (v15 with App Router)
* [![React][React.js]][React-url] - UI library (v19)
* [![Cloudflare][Cloudflare-badge]][Cloudflare-url] - Edge compute + Durable Objects
* [![OpenNext][OpenNext-badge]][OpenNext-url] - Next.js adapter for Cloudflare Workers
* [![LangGraph][LangGraph-badge]][LangGraph-url] - Agent orchestration framework
* [![Shadcn/UI][Shadcn-badge]][Shadcn-url] - Component library
* [![TypeScript][TypeScript-badge]][TypeScript-url] - Type safety
* [![pnpm][pnpm-badge]][pnpm-url] - Package manager

<p align="right">(<a href="#readme-top">back to top</a>)</p>

---

## Getting Started

### Prerequisites

#### Required Accounts
* **Cloudflare** account (free tier works, needs Durable Objects enabled)
* **Fly.io** account (free tier works)
* **OpenRouter** account + API key ([get one here](https://openrouter.ai))

#### Required Software
* **Node.js ≥ 20**
* **pnpm** package manager
  ```bash
  npm install -g pnpm
  ```
* **Wrangler CLI** (Cloudflare)
  ```bash
  npm install -g wrangler
  ```
* **flyctl CLI** (Fly.io)
  ```bash
  curl -L https://fly.io/install.sh | sh
  ```

### Installation

#### 1. Clone Repository
```bash
git clone https://git.biohazardvfx.com/Nicholai/bandit-runner.git
cd bandit-runner
```

#### 2. Install Dependencies
```bash
# Frontend + Cloudflare Workers
cd bandit-runner-app
pnpm install

# SSH Proxy (LangGraph agent)
cd ../ssh-proxy
npm install
```

#### 3. Configure Environment

**Cloudflare DO Worker:**
```bash
cd bandit-runner-app/workers/bandit-agent-do
# Create .env.local with your OpenRouter API key
echo "OPENROUTER_API_KEY=your_key_here" > .env.local
```

**Main Worker:**
No `.env` file is needed; configuration lives in `wrangler.jsonc`.
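
A small pre-flight check can catch the most common configuration mistakes before a run starts. The sketch below is illustrative only: `OPENROUTER_API_KEY` and `SSH_PROXY_URL` are the names used in this README, while `AgentSettings` and `checkSettings` are hypothetical helpers and not part of the codebase.

```typescript
// Hypothetical container for the two settings this README mentions.
interface AgentSettings {
  OPENROUTER_API_KEY?: string;
  SSH_PROXY_URL?: string;
}

// Returns a list of human-readable problems; an empty list means the
// settings look usable.
function checkSettings(settings: AgentSettings): string[] {
  const problems: string[] = [];

  const key = settings.OPENROUTER_API_KEY;
  if (!key || key === "your_key_here") {
    problems.push("OPENROUTER_API_KEY is missing or still the placeholder");
  }

  if (!settings.SSH_PROXY_URL) {
    problems.push("SSH_PROXY_URL is missing");
    return problems;
  }

  let parsed: URL | undefined;
  try {
    parsed = new URL(settings.SSH_PROXY_URL);
  } catch {
    problems.push("SSH_PROXY_URL is not a valid URL");
  }

  // Plain http is only expected for a proxy running on localhost.
  if (parsed && parsed.protocol === "http:" && parsed.hostname !== "localhost") {
    problems.push("SSH_PROXY_URL should use https outside local development");
  }
  return problems;
}
```

Calling `checkSettings({ OPENROUTER_API_KEY: "...", SSH_PROXY_URL: "http://localhost:3001" })` returns an empty list, which matches the local full-stack setup described in the next step.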

#### 4. Local Development

**Option A: Frontend Only** (requires deployed backend)
```bash
cd bandit-runner-app
pnpm dev
# Open http://localhost:3000
```

**Option B: Full Stack** (all components local)
```bash
# Terminal 1: SSH Proxy
cd ssh-proxy
npm run dev
# Runs on http://localhost:3001

# Terminal 2: Main App
cd bandit-runner-app
# Update wrangler.jsonc: "SSH_PROXY_URL": "http://localhost:3001"
pnpm dev
```

<p align="right">(<a href="#readme-top">back to top</a>)</p>

---

## Deployment

### Step 1: Deploy SSH Proxy to Fly.io

```bash
cd ssh-proxy

# Login (opens browser)
flyctl auth login

# Deploy (if the app does not exist yet, create it first with `flyctl launch`)
flyctl deploy

# Note your URL: https://bandit-ssh-proxy.fly.dev
```

### Step 2: Deploy Durable Object Worker

```bash
cd ../bandit-runner-app/workers/bandit-agent-do

# Set OpenRouter API key as secret
wrangler secret put OPENROUTER_API_KEY
# Paste your key when prompted

# Deploy
wrangler deploy

# Verify (optional)
wrangler tail
```

### Step 3: Deploy Main Application

```bash
cd ../..  # Back to bandit-runner-app root

# Verify wrangler.jsonc has correct SSH_PROXY_URL
# Should be: "SSH_PROXY_URL": "https://bandit-ssh-proxy.fly.dev"

# Build and deploy
pnpm run deploy

# Your app is live at:
# https://bandit-runner-app.<your-subdomain>.workers.dev
```

### Step 4: Verify Deployment

```bash
# Test SSH Proxy
curl https://bandit-ssh-proxy.fly.dev/ssh/health
# Should return: {"status":"ok","activeConnections":0}

# Open your app (macOS; use xdg-open on Linux)
open "https://bandit-runner-app.<your-subdomain>.workers.dev"
```
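
The same health check can be scripted, for example in a deploy pipeline. This is a minimal sketch that assumes the response shape shown in the `curl` example above and the global `fetch` available in Node.js 20; `isProxyHealthy` and `checkProxy` are hypothetical names.

```typescript
// Shape taken from the sample response shown in this README.
interface HealthResponse {
  status: string;
  activeConnections: number;
}

// Parses a response body and reports whether the proxy looks healthy.
function isProxyHealthy(body: string): boolean {
  try {
    const parsed = JSON.parse(body) as Partial<HealthResponse>;
    return parsed.status === "ok" && typeof parsed.activeConnections === "number";
  } catch {
    return false; // body was not JSON, or not an object
  }
}

// Fetches /ssh/health from the proxy; baseUrl is the URL Fly.io assigned.
async function checkProxy(baseUrl: string): Promise<boolean> {
  const response = await fetch(new URL("/ssh/health", baseUrl).href);
  return response.ok && isProxyHealthy(await response.text());
}
```

Keeping the parsing separate from the network call makes the check easy to unit test without a deployed proxy.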

<p align="right">(<a href="#readme-top">back to top</a>)</p>

---

## Usage

### Starting a Run

1. **Open the application** in your browser
2. **Select Model**: Click the dropdown to choose from 100+ models
   - Search by name
   - Filter by provider (OpenAI, Anthropic, Google, Meta, etc.)
   - Filter by price range
   - Filter by context length
3. **Set Target Level**: Choose how far the agent should progress (default: 5)
4. **Select Output Mode**:
   - **Selective** (default): LLM reasoning + tool calls
   - **Everything**: Reasoning + tool calls + full command outputs
5. **Click START**
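
The options chosen in the steps above can be pictured as a small configuration object. The sketch below is an assumption about how this could be modelled, not the app's actual types: the defaults (target level 5, selective output) come from this README, while `RunOptions`, `AgentEventKind`, and the event kind names are hypothetical.

```typescript
type OutputMode = "selective" | "everything";

// Hypothetical event kinds. The README only states that "Selective" shows
// reasoning and tool calls, and "Everything" adds full command outputs.
type AgentEventKind = "reasoning" | "tool_call" | "command_output";

interface RunOptions {
  model: string;
  targetLevel: number;
  outputMode: OutputMode;
}

// Applies the documented defaults: target level 5, selective output.
function withDefaults(
  model: string,
  overrides: Partial<Omit<RunOptions, "model">> = {},
): RunOptions {
  return { model, targetLevel: 5, outputMode: "selective", ...overrides };
}

// Decides whether an event of the given kind is shown in the given mode.
function isVisible(kind: AgentEventKind, mode: OutputMode): boolean {
  if (mode === "everything") return true;
  return kind !== "command_output";
}
```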

### Monitoring Progress

The interface provides real-time feedback:

- **Left Panel (Terminal)**: Full SSH session with ANSI colors and formatting
- **Right Panel (Agent Chat)**: LLM reasoning, planning, and discoveries
- **Top Bar**: Current status, level, and active model
- **Control Panel**: Pause/Resume/Stop controls

### Manual Mode (Debugging)

Toggle **"Manual Mode"** in the terminal footer to:
- Type commands directly into the SSH session
- Assist the agent when stuck
- Debug connection or logic issues

⚠️ **Note**: Manual intervention disqualifies runs from leaderboards

<p align="right">(<a href="#readme-top">back to top</a>)</p>

---

## Architecture

### System Overview

```text
┌─────────────┐
│   Browser   │ WebSocket (wss://)
└──────┬──────┘
       │
       ↓
┌─────────────────────────────┐
│  Cloudflare Worker (Main)   │ Next.js + OpenNext
│  bandit-runner-app          │ Intercepts WebSocket upgrades
└──────┬──────────────────────┘
       │
       ↓
┌─────────────────────────────┐
│  Durable Object Worker      │ WebSocket connection manager
│  bandit-agent-do            │ Run state coordinator
└──────┬──────────────────────┘
       │ HTTP (JSONL streaming)
       ↓
┌─────────────────────────────┐
│  SSH Proxy (Fly.io)         │ LangGraph.js autonomous agent
│  bandit-ssh-proxy.fly.dev   │ SSH2 client
└──────┬──────────────────────┘
       │ SSH Protocol
       ↓
┌─────────────────────────────┐
│  Bandit SSH Server          │ CTF challenges
│  overthewire.org:2220       │ Real command execution
└─────────────────────────────┘
```

### Component Responsibilities

**Frontend (Next.js 15)**
- React UI with shadcn/ui components
- Real-time terminal with ANSI rendering
- Agent chat display
- Model selection and configuration

**Main Worker (Cloudflare)**
- Serves Next.js application
- Intercepts WebSocket upgrades before Next.js routing
- Forwards WS connections to Durable Object
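
The interception step can be sketched as a routing decision made before the request reaches Next.js. This is a simplified illustration under stated assumptions: the `/api/agent/<runId>/ws` path and the `routeRequest` helper are hypothetical, and the structural `HeaderReader` type stands in for the `Headers` object a real Worker receives.

```typescript
// Minimal structural type so the sketch runs outside Cloudflare.
// A real Worker would pass request.headers here.
interface HeaderReader {
  get(name: string): string | null;
}

// True when the client asks to switch protocols to WebSocket.
function isWebSocketUpgrade(headers: HeaderReader): boolean {
  return (headers.get("Upgrade") ?? "").toLowerCase() === "websocket";
}

type Route = { target: "durable-object"; runId: string } | { target: "next" };

// Hypothetical path layout: /api/agent/<runId>/ws. The actual WebSocket
// path is not stated in this README.
function routeRequest(pathname: string, headers: HeaderReader): Route {
  const runId = /^\/api\/agent\/([^/]+)\/ws$/.exec(pathname)?.[1];
  if (runId !== undefined && isWebSocketUpgrade(headers)) {
    return { target: "durable-object", runId };
  }
  // Everything else falls through to the Next.js handler.
  return { target: "next" };
}
```

In a deployed Worker, the `durable-object` branch would typically resolve a stub from the Durable Object binding and forward the original request to it.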

**Durable Object (Separate Worker)**
- Manages WebSocket connections from browsers
- Maintains run state (level, status, passwords)
- Calls SSH proxy HTTP endpoints
- Streams JSONL events to connected clients
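
One way to picture the run state is as a value that is updated by each incoming event. The sketch below assumes a reducer-style design; the field names mirror the "level, status, passwords" listed above, but `RunEvent`, the status values, and `applyEvent` are hypothetical and may not match the real Durable Object.

```typescript
type RunStatus = "idle" | "running" | "paused" | "complete" | "failed";

interface RunState {
  status: RunStatus;
  level: number; // level the agent is currently working on
  passwords: Record<number, string>; // level number -> password for that level
}

// Hypothetical event shapes, for illustration only.
type RunEvent =
  | { type: "started" }
  | { type: "paused" }
  | { type: "resumed" }
  | { type: "level_complete"; level: number; password: string }
  | { type: "finished"; success: boolean };

const initialRunState: RunState = { status: "idle", level: 0, passwords: {} };

// Returns a new state; the previous state object is never mutated.
function applyEvent(state: RunState, event: RunEvent): RunState {
  switch (event.type) {
    case "started":
    case "resumed":
      return { ...state, status: "running" };
    case "paused":
      return { ...state, status: "paused" };
    case "level_complete": {
      // Solving level N yields the password for level N + 1.
      const next = event.level + 1;
      return {
        ...state,
        level: next,
        passwords: { ...state.passwords, [next]: event.password },
      };
    }
    case "finished":
      return { ...state, status: event.success ? "complete" : "failed" };
  }
}
```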

**SSH Proxy (Fly.io)**
- Runs LangGraph.js agent state machine
- Maintains SSH2 connections to Bandit server
- Executes commands via PTY (full terminal capture)
- Extracts passwords using regex patterns
- Streams events back to DO via HTTP response
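
As an illustration of regex-based extraction, the sketch below looks for standalone 32-character alphanumeric strings in terminal output. Both the pattern and the function name are assumptions: the project's actual patterns are not shown in this README.

```typescript
// Assumption: a password is a standalone run of exactly 32 ASCII letters
// and digits. Longer or shorter runs are ignored.
const PASSWORD_PATTERN = /\b[A-Za-z0-9]{32}\b/g;

// Returns every candidate password found in a chunk of terminal output.
function extractPasswordCandidates(output: string): string[] {
  // Strip ANSI colour sequences first, so the trailing "m" of a colour
  // code cannot fuse with an adjacent password and hide the match.
  const plain = output.replace(/\x1b\[[0-9;]*m/g, "");
  return plain.match(PASSWORD_PATTERN) ?? [];
}
```

A pattern this broad can also match hashes or other tokens of the same length, so a real agent would likely confirm a candidate by attempting the next login.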

### Data Flow

1. User clicks START in browser
2. Browser establishes WebSocket to Main Worker
3. Worker forwards to Durable Object
4. DO sends HTTP POST to SSH Proxy `/agent/run`
5. SSH Proxy runs LangGraph agent
6. Agent connects via SSH to Bandit server
7. The agent executes commands and extracts passwords
8. Events stream back: SSH Proxy → DO → WebSocket → Browser
9. The UI updates in real time
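
Because events travel as JSONL over a streamed HTTP response, the receiving side has to reassemble lines that may be split across chunks. The following is a minimal sketch of that buffering, assuming one JSON object per line; `JsonlBuffer` is a hypothetical helper and the event fields used in the usage note are made up for illustration.

```typescript
// Buffers incoming text chunks and returns one parsed value per complete
// line. Chunk boundaries do not have to line up with line boundaries.
class JsonlBuffer {
  private pending = "";

  // Feeds the next chunk and returns the events it completed.
  // Throws if a complete line is not valid JSON.
  push(chunk: string): unknown[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    // The last element is an unfinished line ("" if the chunk ended in \n).
    this.pending = lines.pop() ?? "";

    const events: unknown[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue; // tolerate blank lines
      events.push(JSON.parse(trimmed));
    }
    return events;
  }
}
```

For example, pushing `'{"type":"a"}\n{"ty'` returns one event, and a later push of `'pe":"b"}\n'` returns the second once its line is complete.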

### Monorepo Structure

```text
bandit-runner/
├── bandit-runner-app/          # Frontend + Cloudflare Workers
│   ├── src/
│   │   ├── app/                # Next.js pages + API routes
│   │   ├── components/         # React components
│   │   ├── hooks/              # useAgentWebSocket
│   │   └── lib/                # Utilities, types
│   ├── workers/
│   │   └── bandit-agent-do/    # Standalone Durable Object
│   ├── scripts/
│   │   └── patch-worker.js     # Injects WebSocket intercept
│   └── wrangler.jsonc          # Main worker config
├── ssh-proxy/                   # LangGraph agent runtime
│   ├── agent.ts                # State machine definition
│   ├── server.ts               # Express HTTP server
│   ├── Dockerfile              # Container image
│   └── fly.toml                # Fly.io deployment config
└── docs/                        # Documentation
```

<p align="right">(<a href="#readme-top">back to top</a>)</p>

---

## Troubleshooting

### WebSocket Connection Failed

**Symptoms**: "WebSocket connection failed" or "NS_ERROR_WEBSOCKET_CONNECTION_REFUSED"

**Solutions**:
```bash
# Check DO worker is deployed
wrangler deployments list

# Check browser console for specific errors
# Verify DO binding in wrangler.jsonc

# Test DO directly
curl "https://bandit-runner-app.<your-subdomain>.workers.dev/api/agent/test-run/status"
```
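
A frequent cause of refused connections is a scheme mismatch: a page served over HTTPS must connect with `wss://`, since browsers block `ws://` from secure pages as mixed content. The sketch below derives the socket URL from the page URL; the `/api/agent/<runId>/ws` path and the `agentSocketUrl` name are hypothetical and should be adjusted to the app's real route.

```typescript
// Builds the WebSocket URL for a run from the page's own URL, so the
// scheme always matches: https -> wss, http -> ws.
function agentSocketUrl(pageUrl: string, runId: string): string {
  // Hypothetical route; the actual WebSocket path is not documented here.
  const url = new URL(`/api/agent/${encodeURIComponent(runId)}/ws`, pageUrl);
  url.protocol = url.protocol === "https:" ? "wss:" : "ws:";
  return url.toString();
}
```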

### SSH Proxy Not Responding

**Symptoms**: Agent starts but no terminal output appears

**Solutions**:
```bash
# Check Fly.io app status
flyctl status

# View real-time logs
flyctl logs

# Test health endpoint
curl https://bandit-ssh-proxy.fly.dev/ssh/health
# Should return: {"status":"ok","activeConnections":0}

# Restart if needed (app name taken from the URL above)
flyctl apps restart bandit-ssh-proxy
```

### Agent Not Starting

**Symptoms**: Run status stays "IDLE" or errors in console

**Solutions**:
```bash
# Verify OpenRouter API key is set
wrangler secret list

# Check DO worker logs
wrangler tail bandit-agent-do

# Ensure SSH_PROXY_URL is correct in wrangler.jsonc
# Should be: https://bandit-ssh-proxy.fly.dev (not http://)
```

### Commands Not Executing

**Symptoms**: Agent thinks but no SSH commands run

**Solutions**:
```bash
# Check SSH proxy logs
flyctl logs

# Verify Bandit server is accessible
ssh bandit0@bandit.labs.overthewire.org -p 2220
# Password: bandit0

# Review agent reasoning in chat panel
# Look for error messages or failed attempts
```

### Build or Deploy Errors

**Common issues**:
```bash
# "Cannot find module" during build (run from the repository root)
(cd bandit-runner-app && pnpm install)
(cd ssh-proxy && npm install)

# "Account ID not found"
# Add account_id to workers/bandit-agent-do/wrangler.toml

# "Script not found: bandit-agent-do"
# Deploy DO worker first (from the repository root):
(cd bandit-runner-app/workers/bandit-agent-do && wrangler deploy)
```

<p align="right">(<a href="#readme-top">back to top</a>)</p>

---

## Roadmap

### Completed ✅
* [x] LangGraph.js autonomous agent framework
* [x] Real-time WebSocket streaming (JSONL events)
* [x] Full terminal output with ANSI colors
* [x] Agent reasoning and thinking display
* [x] OpenRouter model selection with search/filters
* [x] Level 0 → N automatic progression
* [x] Password extraction and advancement
* [x] Manual mode for debugging
* [x] Durable Object state management

### In Progress 🚧
* [ ] Retry logic with exponential backoff
* [ ] Token usage and cost tracking UI
* [ ] Persistent run history (D1 database)
* [ ] Log storage and analysis (R2 bucket)

### Planned 📋
* [ ] Public leaderboard (fastest completions)
* [ ] Multi-agent comparison mode
* [ ] Custom system prompts and strategies
* [ ] Agent performance analytics dashboard
* [ ] Mock SSH server for testing
* [ ] Expanded CTF support (Natas, Krypton, etc.)
* [ ] Run replay and debugging tools

See the [open issues](https://git.biohazardvfx.com/Nicholai/bandit-runner/issues) for discussion and feature requests.

<p align="right">(<a href="#readme-top">back to top</a>)</p>

---

## Contributing

Contributions are welcome.

1. Fork the Project
2. Create your Feature Branch (`git checkout -b feat/amazing`)
3. Commit (`pnpm commit`) using Conventional Commits
4. Push (`git push origin feat/amazing`)
5. Open a Pull Request

### Top Contributors

<a href="https://git.biohazardvfx.com/Nicholai/bandit-runner/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=Nicholai/bandit-runner" alt="Contributors" />
</a>

<p align="right">(<a href="#readme-top">back to top</a>)</p>

---

## License

Distributed under the **GNU GPLv3** License.
See `LICENSE` for details.

<p align="right">(<a href="#readme-top">back to top</a>)</p>

---

## Contact

**Nicholai Vogel**
[Website](https://nicholai.work) • [LinkedIn](https://linkedin.com/in/nicholai-vogel) • [Instagram](https://instagram.com/nicholai.exe)

Project Link: [https://git.biohazardvfx.com/Nicholai/bandit-runner](https://git.biohazardvfx.com/Nicholai/bandit-runner)

<p align="right">(<a href="#readme-top">back to top</a>)</p>

---

## Acknowledgments

* [OverTheWire Bandit](https://overthewire.org/wargames/bandit/) - CTF wargame challenges
* [LangGraph.js](https://langchain-ai.github.io/langgraphjs/) - Agent orchestration framework
* [Cloudflare Workers](https://developers.cloudflare.com/workers/) - Edge compute platform
* [Fly.io](https://fly.io) - Global application hosting
* [OpenRouter](https://openrouter.ai) - Unified LLM API access
* [OpenNext](https://opennext.js.org/) - Next.js adapter for Cloudflare
* [shadcn/ui](https://ui.shadcn.com) - Beautiful component library
* [ANSI-to-HTML](https://github.com/rburns/ansi-to-html) - Terminal color rendering

<p align="right">(<a href="#readme-top">back to top</a>)</p>

---

<!-- MARKDOWN LINKS & IMAGES -->

[contributors-shield]: https://img.shields.io/github/contributors/Nicholai/bandit-runner.svg?style=for-the-badge
[contributors-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/graphs/contributors
[forks-shield]: https://img.shields.io/github/forks/Nicholai/bandit-runner.svg?style=for-the-badge
[forks-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/network/members
[stars-shield]: https://img.shields.io/github/stars/Nicholai/bandit-runner.svg?style=for-the-badge
[stars-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/stargazers
[issues-shield]: https://img.shields.io/github/issues/Nicholai/bandit-runner.svg?style=for-the-badge
[issues-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/issues
[license-shield]: https://img.shields.io/github/license/Nicholai/bandit-runner.svg?style=for-the-badge
[license-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/blob/main/COPYING.txt
[linkedin-shield]: https://img.shields.io/badge/-LinkedIn-black.svg?style=for-the-badge&logo=linkedin&colorB=555
[linkedin-url]: https://linkedin.com/in/nicholai-vogel
[product-screenshot]: public/screenshot.png
[Next.js]: https://img.shields.io/badge/Next.js-000000?style=for-the-badge&logo=nextdotjs&logoColor=white
[Next-url]: https://nextjs.org/
[React.js]: https://img.shields.io/badge/React-20232A?style=for-the-badge&logo=react&logoColor=61DAFB
[React-url]: https://react.dev/
[Cloudflare-badge]: https://img.shields.io/badge/Cloudflare%20Workers-F38020?style=for-the-badge&logo=cloudflare&logoColor=white
[Cloudflare-url]: https://developers.cloudflare.com/workers/
[OpenNext-badge]: https://img.shields.io/badge/OpenNext-18181B?style=for-the-badge&logo=vercel&logoColor=white
[OpenNext-url]: https://opennext.js.org/
[LangGraph-badge]: https://img.shields.io/badge/LangGraph-1C3C3C?style=for-the-badge&logo=langchain&logoColor=white
[LangGraph-url]: https://langchain-ai.github.io/langgraphjs/
[Shadcn-badge]: https://img.shields.io/badge/Shadcn%2FUI-000000?style=for-the-badge&logo=react&logoColor=white
[Shadcn-url]: https://ui.shadcn.com
[TypeScript-badge]: https://img.shields.io/badge/TypeScript-3178C6?style=for-the-badge&logo=typescript&logoColor=white
[TypeScript-url]: https://www.typescriptlang.org/
[pnpm-badge]: https://img.shields.io/badge/pnpm-F69220?style=for-the-badge&logo=pnpm&logoColor=white
[pnpm-url]: https://pnpm.io
[conventional-commits-badge]: https://img.shields.io/badge/Conventional%20Commits-1.0.0-%23FE5196?style=for-the-badge&logo=conventionalcommits&logoColor=white