# README Improvement Plan

## Problem

The current README is outdated and doesn't reflect the actual architecture or provide clear setup/deployment instructions. It references old concepts (D1, R2, mock setup) that aren't implemented, and doesn't explain the real LangGraph + SSH Proxy architecture that's currently working.

## Goals

1. **Clear Project Overview**: Explain what Bandit Runner actually does and why it matters
2. **Accurate Architecture**: Document the real system (Next.js → Cloudflare Workers → Durable Objects → SSH Proxy → Bandit Server)
3. **Step-by-Step Setup**: Complete local development setup instructions
4. **Deployment Guide**: Clear instructions for deploying all components
5. **Feature Documentation**: What the system actually does (model selection, real-time terminal, agent reasoning, etc.)
6. **Remove Outdated Content**: Clean up references to unimplemented features

## Current Issues

### Inaccurate Architecture Description
- README shows old architecture with D1/R2 that aren't implemented
- Doesn't mention LangGraph.js agent framework
- Doesn't explain SSH Proxy on Fly.io
- Missing Durable Object standalone worker details

### Missing Prerequisites
- No mention of OpenRouter API key requirement
- Missing Fly.io account for SSH proxy
- No Node.js version specification (needs 20+)
- Missing pnpm workspace setup
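
The Node.js 20+ requirement could also be checked by a setup script rather than only documented. The following is a minimal sketch under that assumption; `meetsMinimumNode` is a hypothetical helper, not something that exists in the repository.

```javascript
// Hypothetical helper: checks a Node.js version string against a minimum major version.
function meetsMinimumNode(version, minimumMajor = 20) {
  // process.version looks like "v20.11.1"; drop the optional "v" and read the major part.
  const major = Number.parseInt(version.replace(/^v/, "").split(".")[0], 10);
  return Number.isInteger(major) && major >= minimumMajor;
}

console.log(meetsMinimumNode("v20.11.1")); // true
console.log(meetsMinimumNode("v18.19.0")); // false
```

A setup script could call `meetsMinimumNode(process.version)` and exit early with a clear message when it returns `false`.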

### Incomplete Setup Instructions
- Doesn't explain monorepo structure
- No `.env` file examples or required variables
- Missing SSH proxy deployment steps
- No explanation of DO worker vs main worker
- No Wrangler secrets configuration
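
To make the "required variables" gap concrete, here is a small sketch of a check that reports which settings are missing. The two names come from this plan (`OPENROUTER_API_KEY` as a Wrangler secret, `SSH_PROXY_URL` as a `wrangler.jsonc` var); treating them as one flat list, and the helper `missingSettings` itself, are assumptions made for illustration.

```javascript
// Names taken from this plan. Which component needs which value is an assumption here.
const REQUIRED_SETTINGS = ["OPENROUTER_API_KEY", "SSH_PROXY_URL"];

// Hypothetical helper: returns the required names that are absent or blank in `env`.
function missingSettings(env, required = REQUIRED_SETTINGS) {
  return required.filter((name) => {
    const value = env[name];
    return typeof value !== "string" || value.trim() === "";
  });
}

const missing = missingSettings({ SSH_PROXY_URL: "http://localhost:3001" });
console.log(missing); // [ 'OPENROUTER_API_KEY' ]
```

Listing the missing names, rather than failing on the first one, lets a new contributor fix their configuration in one pass.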

### Vague Usage Section
- Doesn't explain the actual UI (control panel, terminal, chat)
- No screenshots or feature descriptions
- Missing model selection details
- No explanation of manual mode

### Outdated Roadmap
- Lists items as incomplete that are done
- Missing current features (WebSocket streaming, ANSI colors, etc.)

## New README Structure

```markdown
# Bandit Runner

## About
- What it is: Autonomous AI agent testing framework
- What it does: Solves OverTheWire Bandit CTF challenges
- Why it matters: Real-world benchmark for LLM autonomous capabilities

## Features
- 🤖 LangGraph.js autonomous agent with tool use
- 🔌 Real-time WebSocket streaming
- 🖥️ Full terminal output with ANSI colors
- 💬 Live agent reasoning/thinking display
- 🎯 OpenRouter model selection (100+ models)
- 🎨 Beautiful retro-terminal UI
- 🔒 SSH security boundaries (read-only Bandit server)
- 📊 Level progression tracking

## Architecture

### System Components
1. **Frontend** - Next.js 15 with App Router + shadcn/ui
2. **Backend** - Cloudflare Workers with OpenNext
3. **Coordinator** - Durable Objects (separate worker)
4. **Agent Runtime** - SSH Proxy on Fly.io (LangGraph.js)
5. **Target** - OverTheWire Bandit SSH server

### Data Flow
Browser → WebSocket → Main Worker → DO Worker → HTTP → SSH Proxy → LangGraph Agent → SSH → Bandit Server

### Tech Stack
- Next.js 15 + React 19
- Cloudflare Workers + Durable Objects
- LangGraph.js for agent orchestration
- OpenRouter for LLM access
- Fly.io for SSH proxy hosting
- shadcn/ui for components
- ANSI-to-HTML for terminal rendering

## Prerequisites

### Required Accounts
- Cloudflare account (free tier OK, needs Durable Objects enabled)
- Fly.io account (free tier OK)
- OpenRouter account + API key

### Required Software
- Node.js 20+
- pnpm (package manager)
- Wrangler CLI (Cloudflare)
- flyctl CLI (Fly.io)
- Git

## Installation

### 1. Clone Repository
```bash
git clone https://git.biohazardvfx.com/Nicholai/bandit-runner.git
cd bandit-runner
```

### 2. Install Dependencies
```bash
# Install pnpm if you don't have it
npm install -g pnpm

# Install the app workspace dependencies (pnpm)
cd bandit-runner-app
pnpm install

# Install the SSH proxy dependencies (npm)
cd ../ssh-proxy
npm install
```

### 3. Configure Environment

#### Cloudflare (Main Worker)
No `.env` needed - uses `wrangler.jsonc` vars

#### Cloudflare (DO Worker)
```bash
# From the repository root
cd bandit-runner-app/workers/bandit-agent-do
cp .env.example .env.local
# Add your OpenRouter API key to .env.local
```

#### SSH Proxy (Fly.io)
No configuration needed - OpenRouter key passed via HTTP headers

### 4. Local Development

#### Option A: Run Frontend Only (with deployed backend)
```bash
# From the repository root
cd bandit-runner-app
pnpm dev
# Open http://localhost:3000
```

#### Option B: Run Full Stack Locally
```bash
# Terminal 1: SSH Proxy (from the repository root)
cd ssh-proxy
npm run dev
# Runs on http://localhost:3001

# Terminal 2: Frontend with local proxy (from the repository root)
cd bandit-runner-app
# Update wrangler.jsonc SSH_PROXY_URL to http://localhost:3001
pnpm dev
```

## Deployment

### Step 1: Deploy SSH Proxy to Fly.io

```bash
# From the repository root
cd ssh-proxy

# Login to Fly.io (opens browser)
flyctl auth login

# Deploy (on first use you may need `flyctl launch` to create the app)
flyctl deploy

# Note the app URL, e.g. https://bandit-ssh-proxy.fly.dev
```

### Step 2: Deploy Durable Object Worker

```bash
cd ../bandit-runner-app/workers/bandit-agent-do

# Set your OpenRouter API key
wrangler secret put OPENROUTER_API_KEY
# Paste your key when prompted

# Deploy the DO worker
wrangler deploy

# Stream live logs to confirm the worker is running (Ctrl+C to stop)
wrangler tail
```

### Step 3: Deploy Main Application

```bash
cd ../..  # Back to bandit-runner-app root

# Verify SSH_PROXY_URL in wrangler.jsonc points to your Fly.io URL
# Should be: "SSH_PROXY_URL": "https://bandit-ssh-proxy.fly.dev"

# Build and deploy
pnpm run deploy

# Your app will be live at:
# https://bandit-runner-app.<your-subdomain>.workers.dev
```

### Step 4: Verify Deployment

```bash
# Test SSH Proxy health
curl https://bandit-ssh-proxy.fly.dev/ssh/health

# Should return: {"status":"ok","activeConnections":0}

# Test the main app by opening it in your browser:
# https://bandit-runner-app.<your-subdomain>.workers.dev
```

## Usage

### Starting a Run

1. Open the application in your browser
2. **Select Model**: Click the model dropdown to choose from 100+ models
   - Search by name
   - Filter by provider (OpenAI, Anthropic, Google, etc.)
   - Filter by price range
   - Filter by context length
3. **Set Target Level**: Choose how far the agent should attempt (default: 5)
4. **Select Output Mode**:
   - **Selective** (default): Shows LLM reasoning + tool calls
   - **Everything**: Shows reasoning + tool calls + full command outputs
5. **Click START**

### Monitoring Progress

The interface shows:
- **Left Panel (Terminal)**: Full SSH session output with colors
- **Right Panel (Agent Chat)**: LLM reasoning and discovered information
- **Top Status Bar**: Current status, level, and model info
- **Control Panel**: Pause/Resume/Stop controls

### Manual Mode (Debugging)

Toggle "Manual Mode" in the terminal to:
- Type commands directly into the SSH session
- Assist the agent when it gets stuck
- Debug connection issues

⚠️ **Note**: Runs with manual intervention are disqualified from leaderboards

## Features Explained

### Real-Time Terminal
- Full PTY session capture
- ANSI color support (green prompts, red errors, etc.)
- Timestamps for all output
- Automatic scrolling
- Read-only by default (manual mode available)

### Agent Reasoning Display
- LLM "thinking" messages (what command to try next)
- Planning output (strategy for current level)
- Password discovery announcements
- Level completion summaries

### Model Selection
- 100+ models from OpenRouter
- Real-time search
- Filter by provider, price, context length
- Shows token costs
- Supports all major providers

### Level Progression
- Always starts at Level 0
- Automatic password extraction
- Automatic SSH re-login for next level
- Configurable target level cap
- Retry logic for failed attempts

## Architecture Details

### Monorepo Structure
```
bandit-runner/
├── bandit-runner-app/           # Next.js frontend + Cloudflare Workers
│   ├── src/
│   │   ├── app/                 # Next.js pages + API routes
│   │   ├── components/          # React components (shadcn/ui)
│   │   ├── hooks/               # React hooks (WebSocket)
│   │   └── lib/                 # Utilities, agents, storage
│   ├── workers/
│   │   └── bandit-agent-do/     # Standalone Durable Object worker
│   ├── scripts/
│   │   └── patch-worker.js      # WebSocket intercept injector
│   └── wrangler.jsonc           # Main worker config
├── ssh-proxy/                   # LangGraph agent on Fly.io
│   ├── agent.ts                 # LangGraph state machine
│   ├── server.ts                # Express HTTP server
│   ├── Dockerfile               # Container definition
│   └── fly.toml                 # Fly.io config
├── docs/                        # Documentation
└── public/                      # Assets
```

### WebSocket Intercept Pattern

The main worker intercepts WebSocket upgrade requests **before** they reach Next.js:

```javascript
// Injected by scripts/patch-worker.js
if (request.headers.get('Upgrade') === 'websocket') {
  // Forward directly to the DO, bypassing Next.js.
  // `id` is the Durable Object ID for this run (how it is derived is not shown here).
  return env.BANDIT_AGENT.get(id).fetch(request);
}
```

This is necessary because Next.js API routes don't support WebSocket upgrades.

### Durable Object Responsibilities

- Accept WebSocket connections from browsers
- Maintain run state (level, passwords, status)
- Call SSH proxy HTTP endpoints
- Stream JSONL events from SSH proxy to WebSocket clients
- Handle START/PAUSE/RESUME/STOP actions

### SSH Proxy Responsibilities

- Run LangGraph.js agent state machine
- Maintain SSH2 connections to Bandit server
- Execute commands via PTY
- Extract passwords using regex
- Stream events back to DO via HTTP response body (JSONL)

### Event Flow

1. User clicks START in browser
2. Browser → WebSocket → Main Worker → DO Worker
3. DO → HTTP POST `/agent/run` → SSH Proxy
4. SSH Proxy → SSH Connection → Bandit Server
5. Agent executes commands, extracts passwords
6. SSH Proxy → JSONL events → DO
7. DO → WebSocket → Browser
8. UI updates in real-time

## Troubleshooting

### WebSocket Connection Failed
- Check DO worker is deployed: `wrangler deployments list`
- Verify DO binding in main worker config
- Check browser console for specific errors

### SSH Proxy Not Responding
- Check Fly.io status: `flyctl status`
- View logs: `flyctl logs`
- Test health endpoint: `curl https://your-proxy.fly.dev/ssh/health`

### Agent Not Starting
- Verify OpenRouter API key: `wrangler secret list`
- Check DO worker logs: `wrangler tail --name bandit-agent-do`
- Ensure SSH_PROXY_URL is correct in wrangler.jsonc

### Commands Not Executing
- Check SSH proxy logs: `flyctl logs`
- Verify Bandit server is accessible: `ssh bandit0@bandit.labs.overthewire.org -p 2220`
- Review agent reasoning in chat panel for errors

## Roadmap

### Completed ✅
- [x] LangGraph autonomous agent framework
- [x] Real-time WebSocket streaming
- [x] Full terminal output with ANSI colors
- [x] Agent reasoning display
- [x] OpenRouter model selection with search/filters
- [x] Level 0 → N progression
- [x] Manual mode for debugging
- [x] Password extraction and advancement

### In Progress 🚧
- [ ] Retry logic with exponential backoff
- [ ] Token usage and cost tracking UI
- [ ] Persistent run history (D1)
- [ ] Log storage (R2)

### Planned 📋
- [ ] Leaderboard (fastest completions)
- [ ] Multi-agent comparison mode
- [ ] Custom prompts and strategies
- [ ] Agent performance analytics
- [ ] Mock SSH server for testing
- [ ] Expanded CTF support (beyond Bandit)

## Contributing

Contributions welcome! Please:

1. Fork the repository
2. Create a feature branch (`git checkout -b feat/amazing-feature`)
3. Commit with Conventional Commits (`feat:`, `fix:`, `docs:`)
4. Push and open a Pull Request

See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.

## License

GNU GPLv3 - See [COPYING.txt](COPYING.txt)

## Contact

**Nicholai Vogel**
[Website](https://nicholai.work) • [LinkedIn](https://linkedin.com/in/nicholai-vogel)

## Acknowledgments

- [OverTheWire Bandit](https://overthewire.org/wargames/bandit/) - CTF challenges
- [LangGraph.js](https://langchain-ai.github.io/langgraphjs/) - Agent framework
- [Cloudflare Workers](https://developers.cloudflare.com/workers/) - Edge compute
- [Fly.io](https://fly.io) - Global app hosting
- [OpenRouter](https://openrouter.ai) - Unified LLM API
- [shadcn/ui](https://ui.shadcn.com) - Component library
- [OpenNext](https://opennext.js.org/) - Next.js on Workers
```
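
The draft describes the SSH proxy streaming JSONL events over an HTTP response body, with the Durable Object relaying them to WebSocket clients. Because network chunks do not line up with event boundaries, the relay has to buffer partial lines. The sketch below shows only that line-splitting step; `createJsonlParser` and the event shapes in the example are hypothetical, not taken from the codebase.

```javascript
// Hypothetical helper: turns arbitrary text chunks into parsed JSONL events.
function createJsonlParser(onEvent) {
  let buffer = "";
  const emit = (line) => {
    const trimmed = line.trim();
    if (trimmed === "") return; // ignore blank lines between events
    onEvent(JSON.parse(trimmed)); // throws on malformed JSON; callers may want to catch
  };
  return {
    // Feed a decoded text chunk; complete lines are emitted, the tail is kept.
    push(chunk) {
      buffer += chunk;
      const lines = buffer.split("\n");
      buffer = lines.pop(); // last element is the (possibly empty) incomplete line
      lines.forEach(emit);
    },
    // Call once the stream ends to emit a final line that had no trailing newline.
    flush() {
      const rest = buffer;
      buffer = "";
      emit(rest);
    },
  };
}

// Example: one event arrives split across two chunks (event fields are made up).
const events = [];
const parser = createJsonlParser((event) => events.push(event));
parser.push('{"type":"thinking","text":"try ls"}\n{"type":"term');
parser.push('inal","data":"readme"}\n');
parser.flush();
console.log(events.length); // 2
```

In a relay, `onEvent` would presumably forward each event to the connected WebSocket clients, and the chunks would come from decoding the proxy's response stream.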

## Key Changes

### What's Being Added
1. **Accurate architecture diagram** - Shows real components (DO, SSH Proxy, LangGraph)
2. **Complete prerequisite list** - All accounts, software, and setup requirements
3. **Monorepo structure docs** - Explains the workspace layout
4. **Step-by-step deployment** - 4 clear deployment phases
5. **Feature documentation** - What each UI component does
6. **Troubleshooting section** - Common issues and solutions
7. **Current roadmap** - Reflects actual completion status

### What's Being Removed
1. Mock architecture references (D1, R2 unimplemented features)
2. Incorrect usage instructions
3. Outdated screenshots/descriptions
4. Placeholder content from template

### What's Being Updated
1. Tech stack badges - Add LangGraph, Fly.io, OpenRouter
2. Built With section - Reflect actual dependencies
3. Installation steps - Real commands that work
4. Usage examples - Match actual UI
5. Contact links - Keep current

## Implementation Notes

- Keep existing badge links/formatting style
- Maintain table of contents structure
- Add new sections: Features, Architecture Details, Troubleshooting
- Update all code blocks with real, tested commands
- Add architecture ASCII diagram for clarity
- Include actual file paths and structure
- Reference real configuration files (wrangler.jsonc, etc.)
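
One code block that would benefit from the "real, tested" treatment is the WebSocket intercept snippet, which uses `id` without showing where it comes from. The sketch below is one way to make it self-contained. `env.BANDIT_AGENT` is the binding named in the draft and `idFromName` is the standard Durable Object namespace method, but deriving the ID from a `runId` query parameter, and the helper names, are assumptions made for illustration.

```javascript
// Hypothetical helper: true when the request asks for a WebSocket upgrade.
function isWebSocketUpgrade(request) {
  // Headers.get matches names case-insensitively; compare the value that way too.
  const upgrade = request.headers.get("Upgrade");
  return upgrade !== null && upgrade.toLowerCase() === "websocket";
}

// Hypothetical routing step: upgrades go to the Durable Object, everything else to `fallback`.
function routeRequest(request, env, fallback) {
  if (!isWebSocketUpgrade(request)) return fallback(request);
  // Assumption: the run is identified by a `runId` query parameter.
  const runId = new URL(request.url).searchParams.get("runId") ?? "default";
  const id = env.BANDIT_AGENT.idFromName(runId);
  return env.BANDIT_AGENT.get(id).fetch(request);
}
```

Keeping the check in a small function makes it testable with a stubbed `env`, without deploying a worker.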

## Success Criteria

✅ README accurately describes working system
✅ User can deploy from scratch following instructions
✅ No references to unimplemented features outside the Roadmap
✅ Clear troubleshooting for common issues
✅ Architecture matches production code
✅ All commands are tested and accurate
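
Part of "tested and accurate" can be automated. As a sketch, the expected `/ssh/health` response quoted in the draft (`{"status":"ok","activeConnections":0}`) can be validated by a small function; `isHealthyProxy` is a hypothetical name, and the exact fields the real endpoint returns should be confirmed against the proxy code.

```javascript
// Hypothetical check for the /ssh/health response body quoted in the draft.
function isHealthyProxy(body) {
  let payload;
  try {
    payload = JSON.parse(body);
  } catch {
    return false; // not JSON at all, e.g. an HTML error page
  }
  return (
    payload !== null &&
    typeof payload === "object" &&
    payload.status === "ok" &&
    Number.isInteger(payload.activeConnections) &&
    payload.activeConnections >= 0
  );
}

console.log(isHealthyProxy('{"status":"ok","activeConnections":0}')); // true
console.log(isHealthyProxy("Bad Gateway")); // false
```

A verification script could fetch the health URL, pass the response text to this function, and fail the check when it returns `false`.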