bandit-runner/docs/bandit-runner.md

# ADR 001: Bandit Runner architecture on Next.js + Cloudflare Workers

Status: Proposed
Date: 2025-10-08
Decision drivers:
- Run long-lived evals safely on Workers with Durable Objects
- Deterministic scoring and anti-abuse
- Cheap to run, easy to reason about

Context:
- We need an LLM test rig that drives SSH sessions against OverTheWire Bandit and no other host
- The Workers runtime supports outbound TCP sockets via `connect()` from `cloudflare:sockets`
- We require per-run state, timeouts, logs, and verification before advancing levels
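
The last requirement (per-run state, timeouts, and verification before advancing) can be sketched as a small pure state transition. This is a minimal illustration, not the planned implementation: `RunState`, `newRun`, and `advance` are hypothetical names, and the `verified` flag stands in for whatever validator ADR 003 ends up defining.

```typescript
// Hypothetical run-state model; every name here is illustrative, not from the ADR.
type RunStatus = "running" | "timed_out";

interface RunState {
  level: number;      // current Bandit level
  status: RunStatus;
  startedAt: number;  // epoch milliseconds
  deadlineMs: number; // maximum wall-clock duration for the whole run
  log: string[];      // append-only event log
}

function newRun(now: number, deadlineMs: number): RunState {
  return { level: 0, status: "running", startedAt: now, deadlineMs, log: [] };
}

// Advance only when the run is still live and the validator accepted the level.
function advance(state: RunState, verified: boolean, now: number): RunState {
  if (state.status !== "running") {
    return state; // a finished run never changes again
  }
  if (now - state.startedAt > state.deadlineMs) {
    return {
      ...state,
      status: "timed_out",
      log: [...state.log, `timeout at level ${state.level}`],
    };
  }
  if (!verified) {
    return { ...state, log: [...state.log, `level ${state.level} not verified`] };
  }
  return {
    ...state,
    level: state.level + 1,
    log: [...state.log, `level ${state.level} verified`],
  };
}
```

Keeping the transition pure means the Durable Object only has to persist the returned state, and the rule "no level change without verification" can be unit-tested without a socket.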

Options:
A) Next.js on Workers + Durable Objects + D1 + R2
B) Same as A, but relay SSH through a small TCP proxy we control
C) Traditional Node server on Fly/Render with WebSockets, no Workers

Decision:
- Choose A as primary. Keep B as a fallback in case the available SSH libraries are incompatible with the Workers runtime.

Implications:
- The Durable Object (DO) holds the socket and run state. API routes are thin. The UI subscribes via WebSocket.
- Storage split: D1 for metadata, R2 for JSONL logs and artifacts.
- Strict command and network allow-lists enforced inside DO.
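
A rough sketch of what the allow-list checks inside the DO might look like. The host and port are an assumption based on the public Bandit SSH endpoint and should be confirmed before hardcoding; the command list and the function names `isAllowedTarget` and `isAllowedCommand` are hypothetical placeholders.

```typescript
// Assumed target: the public Bandit SSH endpoint. Confirm before hardcoding.
const ALLOWED_HOST = "bandit.labs.overthewire.org";
const ALLOWED_PORT = 2220;

// Hypothetical command allow-list; the real list is still to be decided.
const ALLOWED_COMMANDS: ReadonlySet<string> = new Set([
  "ls", "cat", "cd", "find", "file", "grep", "sort", "uniq", "strings", "base64", "tr",
]);

// Network allow-list: exactly one host and one port.
function isAllowedTarget(host: string, port: number): boolean {
  return host.toLowerCase() === ALLOWED_HOST && port === ALLOWED_PORT;
}

// Command allow-list: reject shell metacharacters outright, then check the program name.
function isAllowedCommand(line: string): boolean {
  const trimmed = line.trim();
  if (trimmed === "" || /[;&|`$<>\n\\]/.test(trimmed)) {
    return false;
  }
  const program = trimmed.split(/\s+/)[0] ?? "";
  return ALLOWED_COMMANDS.has(program);
}
```

This version is deliberately strict and rejects pipes and redirection. Several Bandit levels are easier with pipelines, so a real validator would probably need to split on `|` and check each segment rather than refuse the whole line.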

Security:
- Hardcode target host and port
- Redact secrets in the UI; store raw values in a sealed R2 object with a short expiry (for example, via an R2 object lifecycle rule)
- Rate-limit run creation and enforce per-level caps
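
The redaction and rate-limit items above could be sketched as follows. Two assumptions are baked in: that Bandit passwords appear as 32-character alphanumeric tokens, and that a fixed-window counter is an acceptable limiter. `redact` and `FixedWindowLimiter` are hypothetical names, and the in-memory map stands in for whatever storage the DO would actually use.

```typescript
// Assumption: secrets look like 32-character alphanumeric tokens.
const SECRET_PATTERN = /\b[A-Za-z0-9]{32}\b/g;

// Replace anything that looks like a secret before it reaches the UI.
function redact(text: string): string {
  return text.replace(SECRET_PATTERN, "[REDACTED]");
}

interface WindowState {
  start: number; // epoch milliseconds when the window opened
  count: number; // requests seen in this window
}

// Fixed-window rate limiter keyed by caller; state is in memory for the sketch only.
class FixedWindowLimiter {
  private readonly windows = new Map<string, WindowState>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, now: number): boolean {
    const current = this.windows.get(key);
    if (current === undefined || now - current.start >= this.windowMs) {
      // No window yet, or the old one expired: open a new one.
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (current.count >= this.limit) {
      return false;
    }
    current.count += 1;
    return true;
  }
}
```

Pattern-based redaction will also hide harmless 32-character strings such as some hashes. For a test rig that is probably an acceptable trade, but it is a judgment call for ADR 004.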

Operations:
- One DO namespace per environment
- D1 schema migrations applied with `wrangler d1 migrations apply`
- Logpush or JSONL export for analysis
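
For the JSONL export path, a minimal sketch of the encoding is shown here. The `LogEvent` shape is a guess at what a run log entry might contain and is not defined anywhere in this ADR; `toJsonl` and `fromJsonl` are hypothetical helpers.

```typescript
// Hypothetical log event shape; field names are illustrative.
interface LogEvent {
  runId: string;
  level: number;
  ts: number; // epoch milliseconds
  kind: "command" | "output" | "verdict";
  data: string;
}

// One JSON object per line. JSON.stringify escapes embedded newlines,
// so multi-line command output still occupies a single JSONL line.
function toJsonl(events: readonly LogEvent[]): string {
  if (events.length === 0) {
    return "";
  }
  return events.map((event) => JSON.stringify(event)).join("\n") + "\n";
}

// Parse JSONL back into events, skipping blank lines.
function fromJsonl(text: string): LogEvent[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as LogEvent);
}
```

Because each line is independent, a log written this way can be appended to incrementally and analysed with line-oriented tools without loading the whole file.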

Follow-ups:
- ADR 002: SSH client choice for Workers
- ADR 003: Scoring and validator rules per level
- ADR 004: Data retention policy