# ADR 001: Bandit Runner architecture on Next.js + Cloudflare Workers

Status: Proposed
Date: 2025-10-08
Decision drivers:
- Run long-lived evals safely on Workers with Durable Objects
- Deterministic scoring and anti-abuse
- Cheap to run, easy to reason about

Context:
- We need an LLM test rig whose only SSH target is OverTheWire Bandit
- The Workers runtime supports outbound TCP via `connect()` from `cloudflare:sockets`
- We require per-run state, timeouts, logs, and verification before a run advances to the next level
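
The per-run state, timeout, and verify-before-advance requirements can be sketched as a small pure state transition. This is a minimal illustration, not the project's implementation: the `RunState` fields, the status names, and the `Verifier` signature are all assumptions, and a real verifier would likely be asynchronous (for example, an SSH login attempt against the next level).

```typescript
// Hypothetical run states; the ADR does not define these names.
type RunStatus = "running" | "succeeded" | "failed" | "timed_out";

interface RunState {
  runId: string;
  level: number;      // Bandit level currently being attempted
  maxLevel: number;   // last level this run should attempt
  startedAt: number;  // epoch milliseconds
  deadlineMs: number; // wall-clock budget for the whole run
  status: RunStatus;
}

// Decides whether a candidate password really solves the given level.
// Synchronous stand-in for what would probably be an async check.
type Verifier = (level: number, candidate: string) => boolean;

function advance(
  state: RunState,
  candidate: string,
  verify: Verifier,
  now: number,
): RunState {
  // Terminal states are sticky: nothing changes once a run has ended.
  if (state.status !== "running") return state;

  // Enforce the run-level timeout before doing any other work.
  if (now - state.startedAt > state.deadlineMs) {
    return { ...state, status: "timed_out" };
  }

  // Unverified candidates never move the run forward.
  if (!verify(state.level, candidate)) return state;

  // Verified on the final level: the run is complete.
  if (state.level >= state.maxLevel) {
    return { ...state, status: "succeeded" };
  }

  // Verified on an intermediate level: move to the next one.
  return { ...state, level: state.level + 1 };
}
```

Keeping the transition pure makes it easy to unit-test outside the Workers runtime; the Durable Object would only need to persist the returned state.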

Options:
A) Next.js on Workers + Durable Objects + D1 + R2
B) Same as A, but relay SSH through a small TCP proxy we control
C) Traditional Node server on Fly/Render with WebSockets, no Workers

Decision:
- Choose A as the primary option. Keep B as a fallback in case SSH libraries prove incompatible with the Workers runtime.

Implications:
- The Durable Object (DO) holds the socket and run state. API routes are thin. The UI subscribes via WebSocket.
- Storage split: D1 for metadata, R2 for JSONL logs and artifacts.
- Strict command and network allow-lists are enforced inside the DO.
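
One way the command and network allow-lists could look is sketched below. Everything specific here is an assumption for illustration: the host and port are the public Bandit endpoint rather than values taken from this ADR, and the command list and metacharacter policy are placeholders that ADR 003 or a later ADR would need to settle. Pipelines are permitted because several Bandit levels rely on them, but every stage must start with an allowed program.

```typescript
// Assumed target (public Bandit endpoint); not specified in this ADR.
const ALLOWED_TARGET = { host: "bandit.labs.overthewire.org", port: 2220 };

// Illustrative command allow-list; the real list is an open question.
const ALLOWED_COMMANDS = new Set([
  "ls", "cat", "cd", "file", "find", "grep",
  "sort", "uniq", "strings", "base64", "tr",
]);

// Characters that enable chaining, substitution, or redirection.
// The pipe character is handled separately so pipelines can be checked per stage.
const FORBIDDEN = /[;&`$<>\n\r]/;

function isAllowedTarget(host: string, port: number): boolean {
  return host === ALLOWED_TARGET.host && port === ALLOWED_TARGET.port;
}

function isAllowedCommand(line: string): boolean {
  const trimmed = line.trim();
  if (trimmed.length === 0 || FORBIDDEN.test(trimmed)) return false;

  // Every pipeline stage must begin with an allow-listed program.
  // An empty stage (as produced by "||") is rejected because "" is not in the set.
  return trimmed.split("|").every((stage) => {
    const [program = ""] = stage.trim().split(/\s+/);
    return ALLOWED_COMMANDS.has(program);
  });
}
```

Checking only the first word of each stage is deliberately coarse; arguments such as `find -exec` would need their own rules in a real policy.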

Security:
- Hardcode the target host and port
- Redact secrets in the UI; store the raw values in a sealed R2 object with a short TTL
- Rate-limit run creation and enforce per-level caps
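
The redaction and rate-limiting items might be sketched as follows. Both pieces are assumptions rather than the chosen design: `redact` replaces exact occurrences of secrets the run already knows about (no pattern guessing), and `FixedWindowLimiter` is a simple in-memory fixed-window counter whose limit and window size are placeholders. In a Durable Object the counters would need to live in DO storage to survive eviction.

```typescript
// Replace every exact occurrence of each known secret before text reaches the UI.
function redact(text: string, secrets: readonly string[]): string {
  let out = text;
  for (const secret of secrets) {
    if (secret.length === 0) continue; // an empty secret would match everywhere
    out = out.split(secret).join("[REDACTED]");
  }
  return out;
}

// Fixed-window limiter: at most `limit` acquisitions per key per window.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly counts = new Map<string, { windowStart: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  tryAcquire(key: string, now: number): boolean {
    if (this.limit <= 0) return false;

    const entry = this.counts.get(key);
    // No entry yet, or the previous window has expired: start a new window.
    if (entry === undefined || now - entry.windowStart >= this.windowMs) {
      this.counts.set(key, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}
```

The same limiter shape could serve both run creation (keyed by client) and per-level caps (keyed by run and level), though the ADR leaves those keys undefined.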

Operations:
- One DO namespace per environment
- D1 migrations via wrangler
- Logpush or JSONL export for analysis
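
For the JSONL export, a minimal serialization sketch is shown below. The `LogEvent` shape and its field names are hypothetical; the ADR only states that logs are stored as JSONL. Because `JSON.stringify` escapes embedded newlines, each event stays on exactly one line even when command output spans several lines.

```typescript
// Hypothetical log event shape; field names are illustrative only.
interface LogEvent {
  runId: string;
  seq: number; // monotonically increasing within a run
  ts: string;  // ISO 8601 timestamp
  kind: "command" | "output" | "verdict";
  data: string;
}

// Serialize events as one JSON document per line, newline-terminated.
function toJsonl(events: readonly LogEvent[]): string {
  if (events.length === 0) return "";
  return events.map((e) => JSON.stringify(e)).join("\n") + "\n";
}

// Parse JSONL back into events, skipping blank lines.
function fromJsonl(text: string): LogEvent[] {
  return text
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as LogEvent);
}
```

The parser trusts its input and does not validate field types; an analysis pipeline reading untrusted exports would want a schema check on top.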

Follow-ups:
- ADR 002: SSH client choice for Workers
- ADR 003: Scoring and validator rules per level
- ADR 004: Data retention policy