bandit-runner/README.md
nicholai 046ef202cb docs: comprehensive README update with accurate architecture and deployment guide
2025-10-09 19:27:58 -06:00

<a id="readme-top"></a>
<!-- PROJECT SHIELDS -->
[![Contributors][contributors-shield]][contributors-url]
[![Forks][forks-shield]][forks-url]
[![Stargazers][stars-shield]][stars-url]
[![Issues][issues-shield]][issues-url]
[![GPLv3 License][license-shield]][license-url]
[![Conventional Commits][conventional-commits-badge]](https://conventionalcommits.org)
[![LinkedIn][linkedin-shield]][linkedin-url]
<!-- PROJECT LOGO -->
<br />
<div align="center">
<a href="https://git.biohazardvfx.com/Nicholai/bandit-runner">
<img src="public/bandit-logo.png" alt="Logo" width="100" height="100">
</a>
<h3 align="center">Bandit Runner</h3>
<p align="center">
A deterministic AI testing rig for LLMs-as-agents — built on Next.js, OpenNext, and Cloudflare Workers.
<br />
<a href="https://git.biohazardvfx.com/Nicholai/bandit-runner"><strong>Explore the docs »</strong></a>
<br />
<br />
<a href="#">View Demo</a>
&middot;
<a href="https://git.biohazardvfx.com/Nicholai/bandit-runner/issues/new?labels=bug">Report Bug</a>
&middot;
<a href="https://git.biohazardvfx.com/Nicholai/bandit-runner/issues/new?labels=enhancement">Request Feature</a>
</p>
</div>
---
<!-- TABLE OF CONTENTS -->
<details>
<summary>Table of Contents</summary>
<ol>
<li><a href="#about-the-project">About The Project</a>
<ul>
<li><a href="#features">Features</a></li>
<li><a href="#built-with">Built With</a></li>
</ul>
</li>
<li><a href="#getting-started">Getting Started</a>
<ul>
<li><a href="#prerequisites">Prerequisites</a></li>
<li><a href="#installation">Installation</a></li>
</ul>
</li>
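<li><a href="#deployment">Deployment</a></li>
<li><a href="#usage">Usage</a></li>
<li><a href="#architecture">Architecture</a></li>
<li><a href="#troubleshooting">Troubleshooting</a></li>
<li><a href="#roadmap">Roadmap</a></li>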
<li><a href="#contributing">Contributing</a></li>
<li><a href="#license">License</a></li>
<li><a href="#contact">Contact</a></li>
<li><a href="#acknowledgments">Acknowledgments</a></li>
</ol>
</details>
---
## About The Project
[![Product Screenshot][product-screenshot]](#)
**Bandit Runner** is an autonomous AI agent testing framework that evaluates large language models by having them solve the **OverTheWire Bandit** wargame challenges via SSH.
**Why it matters**
- Provides a real-world, hands-on benchmark for autonomous reasoning and command execution
- Tests tool use (SSH), planning, error handling, and persistence under real network conditions
- Generates reproducible logs for research and public leaderboards
- Demonstrates LLM capabilities in a sandboxed, deterministic environment
### Features
- 🤖 **LangGraph.js Autonomous Agent** - State machine with tool use and planning
- 🔌 **Real-Time WebSocket Streaming** - Live updates with <50ms latency
- 🖥 **Full Terminal Output** - Complete SSH session with ANSI colors and formatting
- 💬 **Agent Reasoning Display** - See the LLM's thinking process and command selection
- 🎯 **OpenRouter Integration** - Access 100+ models with search and filtering
- 🎨 **Beautiful Retro UI** - Terminal-inspired interface with shadcn/ui components
- 🔒 **Security Boundaries** - Sandboxed SSH access to read-only Bandit server
- 📊 **Level Progression** - Automatic password extraction and advancement
- 🛠 **Manual Mode** - Optional human intervention for debugging
<p align="right">(<a href="#readme-top">back to top</a>)</p>
---
### Built With
* [![Next.js][Next.js]][Next-url] - Frontend framework (v15 with App Router)
* [![React][React.js]][React-url] - UI library (v19)
* [![Cloudflare][Cloudflare-badge]][Cloudflare-url] - Edge compute + Durable Objects
* [![OpenNext][OpenNext-badge]][OpenNext-url] - Next.js adapter for Cloudflare Workers
* [![LangGraph][LangGraph-badge]][LangGraph-url] - Agent orchestration framework
* [![Shadcn/UI][Shadcn-badge]][Shadcn-url] - Component library
* [![TypeScript][TypeScript-badge]][TypeScript-url] - Type safety
* [![pnpm][pnpm-badge]][pnpm-url] - Package manager
<p align="right">(<a href="#readme-top">back to top</a>)</p>
---
## Getting Started
### Prerequisites
#### Required Accounts
* **Cloudflare** account (free tier works, needs Durable Objects enabled)
* **Fly.io** account (free tier works)
* **OpenRouter** account + API key ([get one here](https://openrouter.ai))
#### Required Software
* **Node.js ≥ 20**
* **pnpm** package manager
```bash
npm install -g pnpm
```
* **Wrangler CLI** (Cloudflare)
```bash
npm install -g wrangler
```
* **flyctl CLI** (Fly.io)
```bash
curl -L https://fly.io/install.sh | sh
```
### Installation
#### 1. Clone Repository
```bash
git clone https://git.biohazardvfx.com/Nicholai/bandit-runner.git
cd bandit-runner
```
#### 2. Install Dependencies
```bash
# Frontend + Cloudflare Workers
cd bandit-runner-app
pnpm install
# SSH Proxy (LangGraph agent)
cd ../ssh-proxy
npm install
```
#### 3. Configure Environment
**Cloudflare DO Worker:**
```bash
cd bandit-runner-app/workers/bandit-agent-do
# Create .env.local with your OpenRouter API key
echo "OPENROUTER_API_KEY=your_key_here" > .env.local
```
**Main Worker:**
No `.env` needed - configuration is in `wrangler.jsonc`
#### 4. Local Development
**Option A: Frontend Only** (requires deployed backend)
```bash
cd bandit-runner-app
pnpm dev
# Open http://localhost:3000
```
**Option B: Full Stack** (all components local)
```bash
# Terminal 1: SSH Proxy
cd ssh-proxy
npm run dev
# Runs on http://localhost:3001
# Terminal 2: Main App
cd bandit-runner-app
# Update wrangler.jsonc: "SSH_PROXY_URL": "http://localhost:3001"
pnpm dev
```
<p align="right">(<a href="#readme-top">back to top</a>)</p>
---
## Deployment
### Step 1: Deploy SSH Proxy to Fly.io
```bash
cd ssh-proxy
# Login (opens browser)
flyctl auth login
# Deploy (creates app on first run)
flyctl deploy
# Note your URL: https://bandit-ssh-proxy.fly.dev
```
### Step 2: Deploy Durable Object Worker
```bash
cd ../bandit-runner-app/workers/bandit-agent-do
# Set OpenRouter API key as secret
wrangler secret put OPENROUTER_API_KEY
# Paste your key when prompted
# Deploy
wrangler deploy
# Verify (optional)
wrangler tail
```
### Step 3: Deploy Main Application
```bash
cd ../.. # Back to bandit-runner-app root
# Verify wrangler.jsonc has correct SSH_PROXY_URL
# Should be: "SSH_PROXY_URL": "https://bandit-ssh-proxy.fly.dev"
# Build and deploy
pnpm run deploy
# Your app is live at:
# https://bandit-runner-app.<your-subdomain>.workers.dev
```
### Step 4: Verify Deployment
```bash
# Test SSH Proxy
curl https://bandit-ssh-proxy.fly.dev/ssh/health
# Should return: {"status":"ok","activeConnections":0}
# Open your app (`open` is macOS-only; use `xdg-open` on Linux)
open https://bandit-runner-app.<your-subdomain>.workers.dev
```
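
The health check can also be scripted. The following is a minimal sketch, assuming the response shape shown in the sample above (`status` and `activeConnections`). The names `ProxyHealth`, `isHealthy`, and `checkProxy` are hypothetical and are not part of the repository.

```typescript
// Shape assumed from the sample response: {"status":"ok","activeConnections":0}
interface ProxyHealth {
  status: string;
  activeConnections: number;
}

// Hypothetical helper: true only when the payload matches the documented shape
// and reports status "ok".
function isHealthy(payload: unknown): payload is ProxyHealth {
  if (typeof payload !== "object" || payload === null) return false;
  const p = payload as Record<string, unknown>;
  return p.status === "ok" && typeof p.activeConnections === "number";
}

// Hypothetical helper: requests <baseUrl>/ssh/health and validates the body.
// Requires a runtime with a global fetch (Node.js 18+ or a browser).
async function checkProxy(baseUrl: string): Promise<boolean> {
  try {
    const res = await fetch(`${baseUrl}/ssh/health`);
    if (!res.ok) return false;
    return isHealthy(await res.json());
  } catch {
    return false; // network failure or a non-JSON body
  }
}
```

Calling `checkProxy("https://bandit-ssh-proxy.fly.dev")` would resolve to `true` only when the proxy answers with the payload shown above.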
<p align="right">(<a href="#readme-top">back to top</a>)</p>
---
## Usage
### Starting a Run
1. **Open the application** in your browser
2. **Select Model**: Click the dropdown to choose from 100+ models
- Search by name
- Filter by provider (OpenAI, Anthropic, Google, Meta, etc.)
- Filter by price range
- Filter by context length
3. **Set Target Level**: Choose how far the agent should progress (default: 5)
4. **Select Output Mode**:
- **Selective** (default): LLM reasoning + tool calls
- **Everything**: Reasoning + tool calls + full command outputs
5. **Click START**
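
The two output modes amount to a filter over the agent's event stream. The sketch below illustrates the idea under assumed event names: `reasoning`, `tool_call`, and `command_output` are hypothetical, since the project's actual event schema is not documented here.

```typescript
type OutputMode = "selective" | "everything";

// Hypothetical event kinds; the project's real event types may differ.
type AgentEventKind = "reasoning" | "tool_call" | "command_output";

interface AgentEvent {
  kind: AgentEventKind;
  text: string;
}

// Selective shows reasoning and tool calls only;
// Everything also shows full command outputs.
function isVisible(event: AgentEvent, mode: OutputMode): boolean {
  if (mode === "everything") return true;
  return event.kind !== "command_output";
}

// Returns the events a client in the given mode would display.
function filterEvents(events: AgentEvent[], mode: OutputMode): AgentEvent[] {
  return events.filter((e) => isVisible(e, mode));
}
```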
### Monitoring Progress
The interface provides real-time feedback:
- **Left Panel (Terminal)**: Full SSH session with ANSI colors and formatting
- **Right Panel (Agent Chat)**: LLM reasoning, planning, and discoveries
- **Top Bar**: Current status, level, and active model
- **Control Panel**: Pause/Resume/Stop controls
### Manual Mode (Debugging)
Toggle **"Manual Mode"** in the terminal footer to:
- Type commands directly into the SSH session
- Assist the agent when stuck
- Debug connection or logic issues
⚠️ **Note**: Manual intervention disqualifies runs from leaderboards
<p align="right">(<a href="#readme-top">back to top</a>)</p>
---
## Architecture
### System Overview
```text
┌─────────────┐
│ Browser │ WebSocket (wss://)
└──────┬──────┘
┌─────────────────────────────┐
│ Cloudflare Worker (Main) │ Next.js + OpenNext
│ bandit-runner-app │ Intercepts WebSocket upgrades
└──────┬──────────────────────┘
┌─────────────────────────────┐
│ Durable Object Worker │ WebSocket connection manager
│ bandit-agent-do │ Run state coordinator
└──────┬──────────────────────┘
│ HTTP (JSONL streaming)
┌─────────────────────────────┐
│ SSH Proxy (Fly.io) │ LangGraph.js autonomous agent
│ bandit-ssh-proxy.fly.dev │ SSH2 client
└──────┬──────────────────────┘
│ SSH Protocol
┌─────────────────────────────┐
│ Bandit SSH Server │ CTF challenges
│ bandit.labs.overthewire.org:2220 │ Real command execution
└─────────────────────────────┘
```
### Component Responsibilities
**Frontend (Next.js 15)**
- React UI with shadcn/ui components
- Real-time terminal with ANSI rendering
- Agent chat display
- Model selection and configuration
**Main Worker (Cloudflare)**
- Serves Next.js application
- Intercepts WebSocket upgrades before Next.js routing
- Forwards WS connections to Durable Object
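
The interception step can be pictured as a routing decision made before the request reaches Next.js. This is a sketch only: the `/api/agent/` prefix is an assumption based on the status URL used in Troubleshooting, and `HeaderReader`, `isWebSocketUpgrade`, and `chooseRoute` are hypothetical names. The actual logic is injected by `scripts/patch-worker.js`.

```typescript
// Minimal structural type so the sketch runs outside the Workers runtime;
// the standard Headers object satisfies it.
interface HeaderReader {
  get(name: string): string | null;
}

// The Upgrade header value is compared case-insensitively.
function isWebSocketUpgrade(headers: HeaderReader): boolean {
  return (headers.get("Upgrade") ?? "").toLowerCase() === "websocket";
}

type Route = "durable-object" | "nextjs";

// Assumed path prefix for agent routes (hypothetical).
const WS_PATH_PREFIX = "/api/agent/";

// WebSocket upgrades on agent paths go to the Durable Object;
// everything else falls through to the Next.js handler.
function chooseRoute(pathname: string, headers: HeaderReader): Route {
  const isAgentPath = pathname.indexOf(WS_PATH_PREFIX) === 0;
  return isAgentPath && isWebSocketUpgrade(headers) ? "durable-object" : "nextjs";
}
```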
**Durable Object (Separate Worker)**
- Manages WebSocket connections from browsers
- Maintains run state (level, status, passwords)
- Calls SSH proxy HTTP endpoints
- Streams JSONL events to connected clients
**SSH Proxy (Fly.io)**
- Runs LangGraph.js agent state machine
- Maintains SSH2 connections to Bandit server
- Executes commands via PTY (full terminal capture)
- Extracts passwords using regex patterns
- Streams events back to DO via HTTP response
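
As an illustration of regex-based password extraction, the sketch below assumes that a Bandit password is a standalone run of exactly 32 letters and digits. That pattern is an assumption, not the project's actual regex. Because PTY output carries ANSI color codes, the sketch strips CSI escape sequences first; otherwise a trailing `m` from a color code would fuse with the password and defeat the word-boundary match. All names are hypothetical.

```typescript
// Matches CSI escape sequences such as ESC[0m or ESC[1;32m.
const ANSI_CSI_PATTERN = /\x1b\[[0-9;]*[A-Za-z]/g;

// Assumption: a password is a standalone run of exactly 32 letters and digits.
const PASSWORD_PATTERN = /\b[A-Za-z0-9]{32}\b/g;

function stripAnsi(text: string): string {
  return text.replace(ANSI_CSI_PATTERN, "");
}

// Returns every candidate password in the output, in order of first
// appearance, without duplicates.
function extractPasswordCandidates(terminalOutput: string): string[] {
  const clean = stripAnsi(terminalOutput);
  const matches: string[] = clean.match(PASSWORD_PATTERN) ?? [];
  return matches.filter((m, i) => matches.indexOf(m) === i);
}
```

A pattern this loose can also match unrelated 32-character tokens, so a real implementation would likely confirm a candidate by attempting the next level's login.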
### Data Flow
1. User clicks START in browser
2. Browser establishes WebSocket to Main Worker
3. Worker forwards to Durable Object
4. DO sends HTTP POST to SSH Proxy `/agent/run`
5. SSH Proxy runs LangGraph agent
6. Agent connects via SSH to Bandit server
7. Commands executed, passwords extracted
8. Events stream: SSH Proxy → DO → WebSocket → Browser
9. UI updates in real-time
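
Step 8 relies on JSONL: one JSON object per line in a streamed HTTP response. Network chunks do not necessarily end on line boundaries, so a consumer has to buffer partial lines. The sketch below shows one way to do that; `JsonlBuffer` is a hypothetical name, and the `type` field in the usage note is illustrative rather than the project's actual event schema.

```typescript
// Accumulates streamed text and emits one parsed value per complete line.
class JsonlBuffer {
  private pending = "";

  // Feed a chunk of text; returns the events completed by this chunk.
  push(chunk: string): unknown[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    // The last element is an incomplete line, or "" if the chunk ended on a newline.
    this.pending = lines.pop() ?? "";
    const events: unknown[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue;
      try {
        events.push(JSON.parse(trimmed));
      } catch {
        // Skip malformed lines rather than abort the whole stream.
      }
    }
    return events;
  }
}
```

For example, pushing `'{"type":"thinking"}\n{"type":"ter'` yields one event and holds back the partial line; a following push of `'minal"}\n'` completes it and yields the second event.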
### Monorepo Structure
```text
bandit-runner/
├── bandit-runner-app/ # Frontend + Cloudflare Workers
│ ├── src/
│ │ ├── app/ # Next.js pages + API routes
│ │ ├── components/ # React components
│ │ ├── hooks/ # useAgentWebSocket
│ │ └── lib/ # Utilities, types
│ ├── workers/
│ │ └── bandit-agent-do/ # Standalone Durable Object
│ ├── scripts/
│ │ └── patch-worker.js # Injects WebSocket intercept
│ └── wrangler.jsonc # Main worker config
├── ssh-proxy/ # LangGraph agent runtime
│ ├── agent.ts # State machine definition
│ ├── server.ts # Express HTTP server
│ ├── Dockerfile # Container image
│ └── fly.toml # Fly.io deployment config
└── docs/ # Documentation
```
<p align="right">(<a href="#readme-top">back to top</a>)</p>
---
## Troubleshooting
### WebSocket Connection Failed
**Symptoms**: "WebSocket connection failed" or "NS_ERROR_WEBSOCKET_CONNECTION_REFUSED"
**Solutions**:
```bash
# Check DO worker is deployed
wrangler deployments list
# Check browser console for specific errors
# Verify DO binding in wrangler.jsonc
# Test DO directly
curl https://bandit-runner-app.<your-subdomain>.workers.dev/api/agent/test-run/status
```
### SSH Proxy Not Responding
**Symptoms**: Agent starts but no terminal output appears
**Solutions**:
```bash
# Check Fly.io app status
flyctl status
# View real-time logs
flyctl logs
# Test health endpoint
curl https://bandit-ssh-proxy.fly.dev/ssh/health
# Should return: {"status":"ok","activeConnections":0}
# Restart if needed
flyctl apps restart bandit-ssh-proxy
```
### Agent Not Starting
**Symptoms**: Run status stays "IDLE" or errors in console
**Solutions**:
```bash
# Verify OpenRouter API key is set (run from bandit-runner-app/workers/bandit-agent-do)
wrangler secret list
# Check DO worker logs
wrangler tail bandit-agent-do
# Ensure SSH_PROXY_URL is correct in wrangler.jsonc
# Should be: https://bandit-ssh-proxy.fly.dev (not http://)
```
### Commands Not Executing
**Symptoms**: Agent thinks but no SSH commands run
**Solutions**:
```bash
# Check SSH proxy logs
flyctl logs
# Verify Bandit server is accessible
ssh bandit0@bandit.labs.overthewire.org -p 2220
# Password: bandit0
# Review agent reasoning in chat panel
# Look for error messages or failed attempts
```
### Build or Deploy Errors
**Common issues**:
```bash
# "Cannot find module" during build (run from the repository root)
(cd bandit-runner-app && pnpm install)
(cd ssh-proxy && npm install)
# "Account ID not found"
# Add account_id to bandit-runner-app/workers/bandit-agent-do/wrangler.toml
# "Script not found: bandit-agent-do"
# Deploy DO worker first:
(cd bandit-runner-app/workers/bandit-agent-do && wrangler deploy)
```
<p align="right">(<a href="#readme-top">back to top</a>)</p>
---
## Roadmap
### Completed ✅
* [x] LangGraph.js autonomous agent framework
* [x] Real-time WebSocket streaming (JSONL events)
* [x] Full terminal output with ANSI colors
* [x] Agent reasoning and thinking display
* [x] OpenRouter model selection with search/filters
* [x] Level 0 → N automatic progression
* [x] Password extraction and advancement
* [x] Manual mode for debugging
* [x] Durable Object state management
### In Progress 🚧
* [ ] Retry logic with exponential backoff
* [ ] Token usage and cost tracking UI
* [ ] Persistent run history (D1 database)
* [ ] Log storage and analysis (R2 bucket)
### Planned 📋
* [ ] Public leaderboard (fastest completions)
* [ ] Multi-agent comparison mode
* [ ] Custom system prompts and strategies
* [ ] Agent performance analytics dashboard
* [ ] Mock SSH server for testing
* [ ] Expanded CTF support (Natas, Krypton, etc.)
* [ ] Run replay and debugging tools
See the [open issues](https://git.biohazardvfx.com/Nicholai/bandit-runner/issues) for discussion and feature requests.
<p align="right">(<a href="#readme-top">back to top</a>)</p>
---
## Contributing
Contributions are welcome.
1. Fork the Project
2. Create your Feature Branch (`git checkout -b feat/amazing`)
3. Commit (`pnpm commit`) using Conventional Commits
4. Push (`git push origin feat/amazing`)
5. Open a Pull Request
### Top Contributors
<a href="https://git.biohazardvfx.com/Nicholai/bandit-runner/graphs/contributors">
<img src="https://contrib.rocks/image?repo=Nicholai/bandit-runner" alt="Contributors" />
</a>
<p align="right">(<a href="#readme-top">back to top</a>)</p>
---
## License
Distributed under the **GNU GPLv3** License.
See `LICENSE` for details.
<p align="right">(<a href="#readme-top">back to top</a>)</p>
---
## Contact
**Nicholai Vogel**
[Website](https://nicholai.work) • [LinkedIn](https://linkedin.com/in/nicholai-vogel) • [Instagram](https://instagram.com/nicholai.exe)
Project Link: [https://git.biohazardvfx.com/Nicholai/bandit-runner](https://git.biohazardvfx.com/Nicholai/bandit-runner)
<p align="right">(<a href="#readme-top">back to top</a>)</p>
---
## Acknowledgments
* [OverTheWire Bandit](https://overthewire.org/wargames/bandit/) - CTF wargame challenges
* [LangGraph.js](https://langchain-ai.github.io/langgraphjs/) - Agent orchestration framework
* [Cloudflare Workers](https://developers.cloudflare.com/workers/) - Edge compute platform
* [Fly.io](https://fly.io) - Global application hosting
* [OpenRouter](https://openrouter.ai) - Unified LLM API access
* [OpenNext](https://opennext.js.org/) - Next.js adapter for Cloudflare
* [shadcn/ui](https://ui.shadcn.com) - Beautiful component library
* [ANSI-to-HTML](https://github.com/rburns/ansi-to-html) - Terminal color rendering
<p align="right">(<a href="#readme-top">back to top</a>)</p>
---
<!-- MARKDOWN LINKS & IMAGES -->
[contributors-shield]: https://img.shields.io/github/contributors/Nicholai/bandit-runner.svg?style=for-the-badge
[contributors-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/graphs/contributors
[forks-shield]: https://img.shields.io/github/forks/Nicholai/bandit-runner.svg?style=for-the-badge
[forks-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/network/members
[stars-shield]: https://img.shields.io/github/stars/Nicholai/bandit-runner.svg?style=for-the-badge
[stars-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/stargazers
[issues-shield]: https://img.shields.io/github/issues/Nicholai/bandit-runner.svg?style=for-the-badge
[issues-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/issues
[license-shield]: https://img.shields.io/github/license/Nicholai/bandit-runner.svg?style=for-the-badge
[license-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/blob/main/COPYING.txt
[linkedin-shield]: https://img.shields.io/badge/-LinkedIn-black.svg?style=for-the-badge&logo=linkedin&colorB=555
[linkedin-url]: https://linkedin.com/in/nicholai-vogel
[product-screenshot]: public/screenshot.png
[Next.js]: https://img.shields.io/badge/Next.js-000000?style=for-the-badge&logo=nextdotjs&logoColor=white
[Next-url]: https://nextjs.org/
[React.js]: https://img.shields.io/badge/React-20232A?style=for-the-badge&logo=react&logoColor=61DAFB
[React-url]: https://react.dev/
[Cloudflare-badge]: https://img.shields.io/badge/Cloudflare%20Workers-F38020?style=for-the-badge&logo=cloudflare&logoColor=white
[Cloudflare-url]: https://developers.cloudflare.com/workers/
[OpenNext-badge]: https://img.shields.io/badge/OpenNext-18181B?style=for-the-badge&logo=vercel&logoColor=white
[OpenNext-url]: https://opennext.js.org/
[LangGraph-badge]: https://img.shields.io/badge/LangGraph-1C3C3C?style=for-the-badge&logo=langchain&logoColor=white
[LangGraph-url]: https://langchain-ai.github.io/langgraphjs/
[Shadcn-badge]: https://img.shields.io/badge/Shadcn%2FUI-000000?style=for-the-badge&logo=react&logoColor=white
[Shadcn-url]: https://ui.shadcn.com
[TypeScript-badge]: https://img.shields.io/badge/TypeScript-3178C6?style=for-the-badge&logo=typescript&logoColor=white
[TypeScript-url]: https://www.typescriptlang.org/
[pnpm-badge]: https://img.shields.io/badge/pnpm-F69220?style=for-the-badge&logo=pnpm&logoColor=white
[pnpm-url]: https://pnpm.io
[conventional-commits-badge]: https://img.shields.io/badge/Conventional%20Commits-1.0.0-%23FE5196?style=for-the-badge&logo=conventionalcommits&logoColor=white