ADR 001: Bandit Runner architecture on Next.js + Cloudflare Workers

Status: Proposed
Date: 2025-10-08

Decision drivers:

Run long-lived evals safely on Workers with Durable Objects
Deterministic scoring and anti-abuse
Cheap to run, easy to reason about

Context:

We need an LLM test rig that controls SSH to OverTheWire Bandit only
The Workers runtime supports outbound TCP sockets via connect() from the cloudflare:sockets module
We require per-run state, timeouts, logs, and verification before advancing levels
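The per-run requirements above (state, timeouts, verification before advancing) could be modelled roughly as follows. This is a minimal sketch, not the decided design: the RunState shape, the Verifier signature, and the tryAdvance helper are all hypothetical names introduced for illustration, and real validator rules are deferred to ADR 003.

```typescript
// Hypothetical per-run state. Field names are illustrative, not taken from the ADR.
interface RunState {
  runId: string;
  level: number;      // current Bandit level
  startedAt: number;  // epoch milliseconds when the run began
  deadlineMs: number; // hard timeout for the whole run
  log: string[];      // human-readable event log
}

// Hypothetical validator: returns true if the candidate solves the given level.
type Verifier = (level: number, candidate: string) => boolean;

function isExpired(run: RunState, now: number): boolean {
  return now - run.startedAt >= run.deadlineMs;
}

// Returns a new state. The level only increases when the run is still within
// its deadline AND the verifier accepts the candidate.
function tryAdvance(
  run: RunState,
  candidate: string,
  verify: Verifier,
  now: number,
): RunState {
  if (isExpired(run, now)) {
    return { ...run, log: [...run.log, `timeout at level ${run.level}`] };
  }
  if (!verify(run.level, candidate)) {
    return { ...run, log: [...run.log, `rejected at level ${run.level}`] };
  }
  const next = run.level + 1;
  return { ...run, level: next, log: [...run.log, `advanced to level ${next}`] };
}
```

Passing the clock in as `now` keeps the function pure, which makes scoring easier to replay deterministically.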

Options:

A) Next.js on Workers + Durable Objects + D1 + R2
B) Same as A, but relay SSH via a tiny TCP proxy we control
C) Traditional Node server on Fly/Render with WebSockets, no Workers

Decision:

Choose A as the primary option. Keep B as a fallback if SSH libraries prove incompatible with the Workers runtime.

Implications:

The Durable Object (DO) holds the socket and run state. API routes are thin. The UI subscribes via WebSocket.
Storage split: D1 for metadata, R2 for JSONL logs and artifacts.
Strict command and network allow-lists enforced inside DO.
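The allow-list checks the DO would enforce might look like the sketch below. Everything here is an assumption for illustration: the hostname and port are believed to be the public Bandit endpoint and should be confirmed before hardcoding, the command set is a placeholder, and the helper names are hypothetical. The metacharacter check is deliberately conservative; it also rejects pipes, which some levels may need, so the real rule set would have to be tuned.

```typescript
// Assumed target: believed to be the public Bandit SSH endpoint. Verify before use.
const ALLOWED_TARGET = { hostname: "bandit.labs.overthewire.org", port: 2220 } as const;

// Placeholder command set, not a vetted list.
const ALLOWED_COMMANDS: ReadonlySet<string> = new Set([
  "ls", "cat", "cd", "file", "find", "grep", "sort", "uniq", "strings", "base64",
]);

// Network allow-list: exactly one host and port may be dialled.
function isAllowedTarget(hostname: string, port: number): boolean {
  return hostname === ALLOWED_TARGET.hostname && port === ALLOWED_TARGET.port;
}

// Command allow-list: reject shell metacharacters that chain, redirect, or
// substitute, then require the first token to be an allowed command.
function isAllowedCommand(line: string): boolean {
  if (/[;&|`$<>\n\r]/.test(line)) return false;
  const first = line.trim().split(/\s+/)[0];
  return first !== undefined && ALLOWED_COMMANDS.has(first);
}
```

Doing both checks inside the DO, rather than in API routes, keeps a single enforcement point next to the socket.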

Security:

Hardcode target host and port
Redact secrets in the UI; store raw output in a sealed R2 object with a short TTL
Rate-limit run creation and enforce per-level caps
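Two of the controls above, UI redaction and rate limiting, can be sketched as below. Both are illustrative only. The redaction pattern assumes secrets look like 32-character alphanumeric tokens, which is a guess about the password format and would need checking against real output. The fixed-window limiter is one simple choice among several; its class name, limits, and window size are hypothetical.

```typescript
// Assumption: secrets are standalone 32-character alphanumeric tokens.
const SECRET_PATTERN = /\b[A-Za-z0-9]{32}\b/g;

// Replace anything that looks like a secret before it reaches the UI.
function redactForUi(line: string): string {
  return line.replace(SECRET_PATTERN, "[REDACTED]");
}

// Fixed-window limiter: at most `limit` events per `windowMs` milliseconds.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private windowStart = 0;
  private count = 0;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // `now` is epoch milliseconds, passed in so behaviour is deterministic.
  allow(now: number): boolean {
    if (now - this.windowStart >= this.windowMs) {
      this.windowStart = now; // start a fresh window
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count += 1;
    return true;
  }
}
```

The same limiter shape could serve both run creation (keyed per client) and per-level caps (keyed per run and level), with different limits for each.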

Operations:

One DO namespace per env
Migrations via wrangler for D1
Logpush or JSONL export for analysis
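For the JSONL export path, one possible log format is one JSON object per line, as sketched here. The LogEvent fields and the helper names are assumptions for illustration; the ADR does not specify a schema.

```typescript
// Hypothetical log event shape. The ADR does not define these fields.
interface LogEvent {
  ts: string;                              // ISO 8601 timestamp
  runId: string;
  level: number;
  kind: "command" | "output" | "verdict";
  data: string;
}

// Serialize events as JSONL: one JSON object per line, newline-terminated.
// JSON.stringify escapes embedded newlines, so each event stays on one line.
function toJsonl(events: readonly LogEvent[]): string {
  if (events.length === 0) return "";
  return events.map((e) => JSON.stringify(e)).join("\n") + "\n";
}

// Parse JSONL back into events, skipping blank lines.
// No schema validation is performed in this sketch.
function fromJsonl(text: string): LogEvent[] {
  return text
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as LogEvent);
}
```

Because each line is independent, a run's log can be appended incrementally and analysed with line-oriented tools without loading the whole object.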

Follow-ups:

ADR 002: SSH client choice for Workers
ADR 003: Scoring and validator rules per level
ADR 004: Data retention policy
