bandit-runner/docs/development_documentation/readme-improvement.plan.md
2025-10-09 22:03:37 -06:00

README Improvement Plan

Problem

The current README is outdated and doesn't reflect the actual architecture or provide clear setup/deployment instructions. It references old concepts (D1, R2, mock setup) that aren't implemented, and doesn't explain the real LangGraph + SSH Proxy architecture that's currently working.

Goals

  1. Clear Project Overview: Explain what Bandit Runner actually does and why it matters
  2. Accurate Architecture: Document the real system (Next.js → Cloudflare Workers → Durable Objects → SSH Proxy → Bandit Server)
  3. Step-by-Step Setup: Complete local development setup instructions
  4. Deployment Guide: Clear instructions for deploying all components
  5. Feature Documentation: What the system actually does (model selection, real-time terminal, agent reasoning, etc.)
  6. Remove Outdated Content: Clean up references to unimplemented features

Current Issues

Inaccurate Architecture Description

  • README shows old architecture with D1/R2 that aren't implemented
  • Doesn't mention LangGraph.js agent framework
  • Doesn't explain SSH Proxy on Fly.io
  • Missing Durable Object standalone worker details

Missing Prerequisites

  • No mention of OpenRouter API key requirement
  • No mention of the Fly.io account needed for the SSH proxy
  • No Node.js version specification (needs 20+)
  • Missing pnpm workspace setup

Incomplete Setup Instructions

  • Doesn't explain monorepo structure
  • No .env file examples or required variables
  • Missing SSH proxy deployment steps
  • No explanation of DO worker vs main worker
  • No Wrangler secrets configuration

Vague Usage Section

  • Doesn't explain the actual UI (control panel, terminal, chat)
  • No screenshots or feature descriptions
  • Missing model selection details
  • No explanation of manual mode

Outdated Roadmap

  • Lists completed items as incomplete
  • Missing current features (WebSocket streaming, ANSI colors, etc.)

New README Structure

# Bandit Runner

## About
- What it is: Autonomous AI agent testing framework
- What it does: Solves OverTheWire Bandit CTF challenges
- Why it matters: Real-world benchmark for LLM autonomous capabilities

## Features
- 🤖 LangGraph.js autonomous agent with tool use
- 🔌 Real-time WebSocket streaming
- 🖥️ Full terminal output with ANSI colors
- 💬 Live agent reasoning/thinking display
- 🎯 OpenRouter model selection (100+ models)
- 🎨 Beautiful retro-terminal UI
- 🔒 SSH security boundaries (read-only Bandit server)
- 📊 Level progression tracking

## Architecture

### System Components
1. **Frontend** - Next.js 15 with App Router + shadcn/ui
2. **Backend** - Cloudflare Workers with OpenNext
3. **Coordinator** - Durable Objects (separate worker)
4. **Agent Runtime** - SSH Proxy on Fly.io (LangGraph.js)
5. **Target** - OverTheWire Bandit SSH server

### Data Flow
Browser → WebSocket → Main Worker → DO Worker → HTTP → SSH Proxy → LangGraph Agent → SSH → Bandit Server

### Tech Stack
- Next.js 15 + React 19
- Cloudflare Workers + Durable Objects
- LangGraph.js for agent orchestration
- OpenRouter for LLM access
- Fly.io for SSH proxy hosting
- shadcn/ui for components
- ANSI-to-HTML for terminal rendering

## Prerequisites

### Required Accounts
- Cloudflare account (free tier OK, needs Durable Objects enabled)
- Fly.io account (check current pricing; free allowances may not be available to new accounts)
- OpenRouter account + API key

### Required Software
- Node.js 20+
- pnpm (package manager)
- Wrangler CLI (Cloudflare)
- flyctl CLI (Fly.io)
- Git

## Installation

### 1. Clone Repository
```bash
git clone https://git.biohazardvfx.com/Nicholai/bandit-runner.git
cd bandit-runner
```

2. Install Dependencies

# Install pnpm if you don't have it
npm install -g pnpm

# Install all workspace dependencies
cd bandit-runner-app
pnpm install

cd ../ssh-proxy
npm install

3. Configure Environment

Cloudflare (Main Worker)

No .env needed - uses wrangler.jsonc vars

Cloudflare (DO Worker)

cd bandit-runner-app/workers/bandit-agent-do
cp .env.example .env.local
# Add your OpenRouter API key

SSH Proxy (Fly.io)

No configuration needed - OpenRouter key passed via HTTP headers

4. Local Development

Option A: Run Frontend Only (with deployed backend)

cd bandit-runner-app
pnpm dev
# Open http://localhost:3000

Option B: Run Full Stack Locally

# Terminal 1: SSH Proxy
cd ssh-proxy
npm run dev
# Runs on http://localhost:3001

# Terminal 2: Frontend (with local proxy)
cd bandit-runner-app
# Update wrangler.jsonc SSH_PROXY_URL to http://localhost:3001
pnpm dev

Deployment

Step 1: Deploy SSH Proxy to Fly.io

cd ssh-proxy

# Login to Fly.io (opens browser)
flyctl auth login

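# Deploy. If the app does not exist yet, create it first with `flyctl launch`;
# `flyctl deploy` expects an existing app.
flyctl deploy

# Note the URL: https://<your-app-name>.fly.dev (e.g. https://bandit-ssh-proxy.fly.dev)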

Step 2: Deploy Durable Object Worker

cd ../bandit-runner-app/workers/bandit-agent-do

# Set your OpenRouter API key
wrangler secret put OPENROUTER_API_KEY
# Paste your key when prompted

# Deploy the DO worker
wrangler deploy

# Verify deployment
wrangler tail

Step 3: Deploy Main Application

cd ../..  # Back to bandit-runner-app root

# Verify SSH_PROXY_URL in wrangler.jsonc points to your Fly.io URL
# Should be: "SSH_PROXY_URL": "https://bandit-ssh-proxy.fly.dev"

# Build and deploy
pnpm run deploy

# Your app will be live at:
# https://bandit-runner-app.<your-subdomain>.workers.dev

Step 4: Verify Deployment

# Test SSH Proxy health
curl https://bandit-ssh-proxy.fly.dev/ssh/health

# Should return: {"status":"ok","activeConnections":0}

# Test main app (`open` is macOS-only; use xdg-open on Linux)
open https://bandit-runner-app.<your-subdomain>.workers.dev
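
The health check can also be scripted. The following is a minimal sketch that validates a response body against the payload shape quoted in Step 4; the `ProxyHealth` type and `isProxyHealthy` function are illustrative names, not part of the project.

```typescript
// Shape of the health payload quoted above; the type name is illustrative.
interface ProxyHealth {
  status: string;
  activeConnections: number;
}

// Parse a raw response body and report whether the proxy looks healthy.
function isProxyHealthy(body: string): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return false; // body was not JSON (for example an HTML error page)
  }
  if (typeof parsed !== "object" || parsed === null) return false;
  const health = parsed as Partial<ProxyHealth>;
  return health.status === "ok" && typeof health.activeConnections === "number";
}
```

A deployment script could pass the text returned by the `/ssh/health` endpoint to `isProxyHealthy` and stop early when it returns `false`.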

Usage

Starting a Run

  1. Open the application in your browser
  2. Select Model: Click the model dropdown to choose from 100+ models
    • Search by name
    • Filter by provider (OpenAI, Anthropic, Google, etc.)
    • Filter by price range
    • Filter by context length
  3. Set Target Level: Choose how far the agent should attempt (default: 5)
  4. Select Output Mode:
    • Selective (default): Shows LLM reasoning + tool calls
    • Everything: Shows reasoning + tool calls + full command outputs
  5. Click START
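
The difference between the two output modes can be sketched as a filter over agent events. The event kinds and the `visibleEvents` name below are assumptions made for illustration; the real event schema may differ.

```typescript
// Hypothetical event model: the real schema is not documented here.
type OutputMode = "selective" | "everything";
type EventKind = "reasoning" | "tool_call" | "command_output";

interface AgentEvent {
  kind: EventKind;
  text: string;
}

// Selective hides full command outputs; Everything shows all events.
function visibleEvents(events: AgentEvent[], mode: OutputMode): AgentEvent[] {
  if (mode === "everything") return events;
  return events.filter((event) => event.kind !== "command_output");
}
```

Filtering at display time, rather than at capture time, would let a user switch modes mid-run without losing data, though the project may do this differently.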

Monitoring Progress

The interface shows:

  • Left Panel (Terminal): Full SSH session output with colors
  • Right Panel (Agent Chat): LLM reasoning and discovered information
  • Top Status Bar: Current status, level, and model info
  • Control Panel: Pause/Resume/Stop controls

Manual Mode (Debugging)

Toggle "Manual Mode" in the terminal to:

  • Type commands directly into the SSH session
  • Assist the agent when it gets stuck
  • Debug connection issues

⚠️ Note: Runs with manual intervention will be disqualified from the planned leaderboard

Features Explained

Real-Time Terminal

  • Full PTY session capture
  • ANSI color support (green prompts, red errors, etc.)
  • Timestamps for all output
  • Automatic scrolling
  • Read-only by default (manual mode available)
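
To illustrate what ANSI color support involves, here is a deliberately small sketch that converts a few SGR color codes to HTML spans. It is not the ANSI-to-HTML library listed in the tech stack; the color table and function names are assumptions, and only four foreground colors are handled.

```typescript
// Small subset of SGR foreground colors; a real renderer covers many more.
const SGR_COLORS: Record<number, string> = {
  31: "red",
  32: "green",
  33: "yellow",
  34: "blue",
};

// Escape text so terminal output cannot inject markup.
function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Convert SGR sequences to <span> elements. Simplification: every SGR
// sequence closes the current span before optionally opening a new one.
function ansiToHtml(input: string): string {
  const sgr = /\x1b\[([0-9;]*)m/g;
  let html = "";
  let last = 0;
  let spanOpen = false;
  let match: RegExpExecArray | null;
  while ((match = sgr.exec(input)) !== null) {
    html += escapeHtml(input.slice(last, match.index));
    last = match.index + match[0].length;
    if (spanOpen) {
      html += "</span>";
      spanOpen = false;
    }
    for (const code of match[1].split(";").map(Number)) {
      const color = SGR_COLORS[code];
      if (color !== undefined) {
        html += '<span style="color:' + color + '">';
        spanOpen = true;
        break;
      }
    }
  }
  html += escapeHtml(input.slice(last));
  if (spanOpen) html += "</span>";
  return html;
}
```

Escaping the text before wrapping it matters here because the terminal panel displays output from a remote server.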

Agent Reasoning Display

  • LLM "thinking" messages (what command to try next)
  • Planning output (strategy for current level)
  • Password discovery announcements
  • Level completion summaries

Model Selection

  • 100+ models from OpenRouter
  • Real-time search
  • Filter by provider, price, context length
  • Shows token costs
  • Supports all major providers
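
The search and filter behavior listed above can be sketched as a single predicate over a model list. The `ModelInfo` fields and the price unit are assumptions for illustration; only the `provider/model` ID format is taken from how OpenRouter names models.

```typescript
// Hypothetical model record; field names and the price unit are assumptions.
interface ModelInfo {
  id: string; // "provider/model"
  name: string;
  promptPricePerMTok: number; // USD per million prompt tokens (assumed unit)
  contextLength: number;
}

interface ModelFilter {
  search?: string;
  provider?: string;
  maxPrice?: number;
  minContext?: number;
}

// Keep only models that satisfy every filter that is set.
function filterModels(models: ModelInfo[], filter: ModelFilter): ModelInfo[] {
  const needle = filter.search?.trim().toLowerCase();
  return models.filter((model) => {
    if (needle) {
      const haystack = (model.name + " " + model.id).toLowerCase();
      if (haystack.indexOf(needle) === -1) return false;
    }
    if (filter.provider && model.id.split("/")[0] !== filter.provider) return false;
    if (filter.maxPrice !== undefined && model.promptPricePerMTok > filter.maxPrice) return false;
    if (filter.minContext !== undefined && model.contextLength < filter.minContext) return false;
    return true;
  });
}
```

For example, `filterModels(models, { provider: "provider-a", maxPrice: 1 })` would keep only that provider's models priced at or below 1 USD per million prompt tokens.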

Level Progression

  • Always starts at Level 0
  • Automatic password extraction
  • Automatic SSH re-login for next level
  • Configurable target level cap
  • Retry logic for failed attempts
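
Level progression can be sketched as two small helpers: one builds the SSH target for a level, the other decides whether to advance. The host and port match the Bandit server address used in Troubleshooting, and the `banditN` username pattern follows OverTheWire's convention; the function names are illustrative.

```typescript
interface SshTarget {
  host: string;
  port: number;
  username: string;
}

const BANDIT_HOST = "bandit.labs.overthewire.org";
const BANDIT_PORT = 2220;

// Build the login target for a level: level 0 logs in as bandit0, and so on.
function targetForLevel(level: number): SshTarget {
  if (!Number.isInteger(level) || level < 0) {
    throw new RangeError("level must be a non-negative integer");
  }
  return { host: BANDIT_HOST, port: BANDIT_PORT, username: "bandit" + level };
}

// Return the next level to attempt, or null once the target cap is reached.
function nextLevel(current: number, targetLevel: number): number | null {
  return current < targetLevel ? current + 1 : null;
}
```

With the default target level of 5, `nextLevel(4, 5)` yields 5 and `nextLevel(5, 5)` yields `null`, which is where a run would stop.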

Architecture Details

Monorepo Structure

bandit-runner/
├── bandit-runner-app/          # Next.js frontend + Cloudflare Workers
│   ├── src/
│   │   ├── app/                # Next.js pages + API routes
│   │   ├── components/         # React components (shadcn/ui)
│   │   ├── hooks/              # React hooks (WebSocket)
│   │   └── lib/                # Utilities, agents, storage
│   ├── workers/
│   │   └── bandit-agent-do/    # Standalone Durable Object worker
│   ├── scripts/
│   │   └── patch-worker.js     # WebSocket intercept injector
│   └── wrangler.jsonc          # Main worker config
├── ssh-proxy/                   # LangGraph agent on Fly.io
│   ├── agent.ts                # LangGraph state machine
│   ├── server.ts               # Express HTTP server
│   ├── Dockerfile              # Container definition
│   └── fly.toml                # Fly.io config
├── docs/                        # Documentation
└── public/                      # Assets

WebSocket Intercept Pattern

The main worker intercepts WebSocket upgrade requests before they reach Next.js:

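// Injected by scripts/patch-worker.js
// The Upgrade header value is case-insensitive, so normalize before comparing
if ((request.headers.get('Upgrade') || '').toLowerCase() === 'websocket') {
  // `id` must be a DurableObjectId obtained beforehand, e.g. via
  // env.BANDIT_AGENT.idFromName(...); how the run ID is chosen is not shown here
  // Forward directly to DO, bypass Next.js
  return env.BANDIT_AGENT.get(id).fetch(request);
}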

This is necessary because Next.js API routes don't support WebSocket upgrades.
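
The routing decision itself can be isolated into a pure function, which makes it testable without a Workers runtime. This is a sketch under assumptions: `HeaderReader`, `routeFor`, and `headersFrom` are hypothetical names, and the actual injected code may be structured differently.

```typescript
// Minimal view of a headers object; a real Request.headers satisfies it.
interface HeaderReader {
  get(name: string): string | null;
}

// True when the request asks for a WebSocket upgrade.
// The header value is compared case-insensitively.
function isWebSocketUpgrade(headers: HeaderReader): boolean {
  const upgrade = headers.get("Upgrade");
  return upgrade !== null && upgrade.toLowerCase() === "websocket";
}

// Hypothetical routing decision: upgrades go to the Durable Object,
// everything else continues to Next.js.
function routeFor(headers: HeaderReader): "durable-object" | "nextjs" {
  return isWebSocketUpgrade(headers) ? "durable-object" : "nextjs";
}

// Small stand-in for examples and tests (header names are case-insensitive).
function headersFrom(values: Record<string, string>): HeaderReader {
  const lower: Record<string, string> = {};
  for (const key of Object.keys(values)) lower[key.toLowerCase()] = values[key];
  return {
    get: (name: string) => {
      const value = lower[name.toLowerCase()];
      return value === undefined ? null : value;
    },
  };
}
```

Keeping the check separate from the forwarding call means the header logic can be unit-tested while the Durable Object binding is exercised only in integration tests.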

Durable Object Responsibilities

  • Accept WebSocket connections from browsers
  • Maintain run state (level, passwords, status)
  • Call SSH proxy HTTP endpoints
  • Stream JSONL events from SSH proxy to WebSocket clients
  • Handle START/PAUSE/RESUME/STOP actions
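
The action handling can be sketched as a small state machine. The action names come from the list above; the status names and the transition table are assumptions, since the plan does not spell them out.

```typescript
// Status names are hypothetical; action names match the list above.
type RunStatus = "idle" | "running" | "paused" | "stopped";
type RunAction = "START" | "PAUSE" | "RESUME" | "STOP";

// Allowed transitions; any action not listed for a status is ignored.
const TRANSITIONS: Record<RunStatus, Partial<Record<RunAction, RunStatus>>> = {
  idle: { START: "running" },
  running: { PAUSE: "paused", STOP: "stopped" },
  paused: { RESUME: "running", STOP: "stopped" },
  stopped: { START: "running" },
};

// Return the next status, or the current one when the action does not apply.
function applyAction(status: RunStatus, action: RunAction): RunStatus {
  const next = TRANSITIONS[status][action];
  return next === undefined ? status : next;
}
```

Ignoring invalid actions instead of throwing keeps a stray or duplicated client message from crashing the coordinator, though rejecting them with an error event is an equally reasonable design.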

SSH Proxy Responsibilities

  • Run LangGraph.js agent state machine
  • Maintain SSH2 connections to Bandit server
  • Execute commands via PTY
  • Extract passwords using regex
  • Stream events back to DO via HTTP response body (JSONL)
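
Password extraction by regex might look like the sketch below. It assumes passwords are 32-character alphanumeric strings, which matches the passwords Bandit levels have typically used; the actual pattern in `agent.ts` is not shown in this plan, so treat both the pattern and the function name as illustrative.

```typescript
// Assumption: a password is exactly 32 alphanumeric characters
// bounded by non-word characters on both sides.
const PASSWORD_PATTERN = /\b[A-Za-z0-9]{32}\b/g;

// Return unique password candidates found in command output, in order.
function extractPasswordCandidates(output: string): string[] {
  const found = output.match(PASSWORD_PATTERN) ?? [];
  return Array.from(new Set(found));
}
```

Because other 32-character tokens (hashes, for instance) would also match, a candidate is best confirmed by attempting the next-level login rather than trusted outright.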

Event Flow

  1. User clicks START in browser
  2. Browser → WebSocket → Main Worker → DO Worker
  3. DO → HTTP POST /agent/run → SSH Proxy
  4. SSH Proxy → SSH Connection → Bandit Server
  5. Agent executes commands, extracts passwords
  6. SSH Proxy → JSONL events → DO
  7. DO → WebSocket → Browser
  8. UI updates in real-time
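
Steps 6 and 7 depend on splitting a streamed HTTP body into complete JSONL events, even when a chunk ends mid-line. The class below is a minimal sketch of that buffering; `JsonlBuffer` is a hypothetical name and the event shape is left as `unknown` because the schema is not specified here.

```typescript
// Accumulates streamed text and yields one parsed value per complete line.
class JsonlBuffer {
  private pending = "";

  // Feed a decoded chunk; returns the events completed by this chunk.
  // Throws SyntaxError if a complete line is not valid JSON.
  push(chunk: string): unknown[] {
    this.pending += chunk;
    const lines = this.pending.split("\n");
    // The last element is an incomplete line (or "" if the chunk ended on \n).
    this.pending = lines.pop() ?? "";
    const events: unknown[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed === "") continue; // skip blank keep-alive lines
      events.push(JSON.parse(trimmed));
    }
    return events;
  }
}
```

Each value returned by `push` could then be forwarded to connected WebSocket clients as its own message.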

Troubleshooting

WebSocket Connection Failed

  • Check DO worker is deployed: wrangler deployments list
  • Verify DO binding in main worker config
  • Check browser console for specific errors

SSH Proxy Not Responding

  • Check Fly.io status: flyctl status
  • View logs: flyctl logs
  • Test health endpoint: curl https://your-proxy.fly.dev/ssh/health

Agent Not Starting

  • Verify the OpenRouter API key is set: wrangler secret list (run from workers/bandit-agent-do; it lists secret names, not values)
  • Check DO worker logs: wrangler tail bandit-agent-do
  • Ensure SSH_PROXY_URL is correct in wrangler.jsonc

Commands Not Executing

  • Check SSH proxy logs: flyctl logs
  • Verify Bandit server is accessible: ssh bandit0@bandit.labs.overthewire.org -p 2220
  • Review agent reasoning in chat panel for errors

Roadmap

Completed

  • LangGraph autonomous agent framework
  • Real-time WebSocket streaming
  • Full terminal output with ANSI colors
  • Agent reasoning display
  • OpenRouter model selection with search/filters
  • Level 0 → N progression
  • Manual mode for debugging
  • Password extraction and advancement

In Progress 🚧

  • Retry logic with exponential backoff
  • Token usage and cost tracking UI
  • Persistent run history (D1)
  • Log storage (R2)

Planned 📋

  • Leaderboard (fastest completions)
  • Multi-agent comparison mode
  • Custom prompts and strategies
  • Agent performance analytics
  • Mock SSH server for testing
  • Expanded CTF support (beyond Bandit)

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feat/amazing-feature)
  3. Commit with Conventional Commits (feat:, fix:, docs:)
  4. Push and open a Pull Request

See CONTRIBUTING.md for detailed guidelines.

License

GNU GPLv3 - See COPYING.txt

Contact

Nicholai Vogel
Website | LinkedIn

Acknowledgments


## Key Changes

### What's Being Added
1. **Accurate architecture diagram** - Shows real components (DO, SSH Proxy, LangGraph)
2. **Complete prerequisite list** - All accounts, software, and setup requirements
3. **Monorepo structure docs** - Explains the workspace layout
4. **Step-by-step deployment** - 4 clear deployment phases
5. **Feature documentation** - What each UI component does
6. **Troubleshooting section** - Common issues and solutions
7. **Current roadmap** - Reflects actual completion status

### What's Being Removed
1. References to unimplemented architecture (D1, R2, mock setup)
2. Incorrect usage instructions
3. Outdated screenshots/descriptions
4. Placeholder content from template

### What's Being Updated
1. Tech stack badges - Add LangGraph, Fly.io, OpenRouter
2. Built With section - Reflect actual dependencies
3. Installation steps - Real commands that work
4. Usage examples - Match actual UI
5. Contact links - Keep as they are

## Implementation Notes

- Keep existing badge links/formatting style
- Maintain table of contents structure
- Add new sections: Features, Architecture Details, Troubleshooting
- Update all code blocks with real, tested commands
- Add architecture ASCII diagram for clarity
- Include actual file paths and structure
- Reference real configuration files (wrangler.jsonc, etc.)

## Success Criteria

✅ README accurately describes working system  
✅ User can deploy from scratch following instructions  
✅ No references to unimplemented features  
✅ Clear troubleshooting for common issues  
✅ Architecture matches production code  
✅ All commands are tested and accurate