[![Contributors][contributors-shield]][contributors-url] [![Forks][forks-shield]][forks-url] [![Stargazers][stars-shield]][stars-url] [![Issues][issues-shield]][issues-url] [![GPLv3 License][license-shield]][license-url] [![Conventional Commits][conventional-commits-badge]](https://conventionalcommits.org) [![LinkedIn][linkedin-shield]][linkedin-url]

Bandit Runner

A deterministic AI testing rig for LLMs-as-agents — built on Next.js, OpenNext, and Cloudflare Workers.

---
Table of Contents
  1. About The Project
  2. Getting Started
  3. Usage
  4. Architecture
  5. Roadmap
  6. Contributing
  7. License
  8. Contact
  9. Acknowledgments
---

## About The Project

[![Product Screenshot][product-screenshot]](#)

**Bandit Runner** is a public, deterministic evaluation harness for large language models. It turns AI models into autonomous operators tasked with completing the **OverTheWire Bandit** wargame over SSH, with the entire harness running on Cloudflare Workers.

**Why it matters**

- Provides a real-world, hands-on benchmark for autonomous reasoning and command execution.
- Tests tool use (SSH), planning, error handling, and persistence under real network conditions.
- Generates reproducible, privacy-safe logs for research or public leaderboards.

### Core Concepts

- **Agent Role:** Each run instantiates an LLM as “BanditRunner”, a scripted, deterministic persona that follows a strict system prompt and a command allow-list.
- **Environment:** Next.js frontend + OpenNext build → Cloudflare Workers backend (Durable Objects + D1 + R2).
- **Security:** Hard-scoped to `bandit.labs.overthewire.org:2220`. All discovered passwords are redacted in logs and sealed in short-lived encrypted blobs.
- **Goal:** Advance autonomously from Level 0 to the final level while documenting every decision.
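The **Security** concept can be pictured as two small guards: one that refuses any SSH target other than the Bandit endpoint, and one that scrubs already-discovered passwords from a log line before it is written. The following is a minimal TypeScript sketch under stated assumptions: `assertInScope`, `redactSecrets`, and the `[REDACTED]` placeholder are hypothetical names, not code from this repository, and the sketch covers only redaction, not the encrypted sealing of passwords.

```typescript
// Hypothetical constants: the README hard-scopes every run to this host and port.
const ALLOWED_HOST = "bandit.labs.overthewire.org";
const ALLOWED_PORT = 2220;

/** Throws unless the requested SSH target is exactly the Bandit endpoint. */
function assertInScope(host: string, port: number): void {
  // Normalize case and a trailing root dot so equivalent spellings still match.
  const normalized = host.trim().toLowerCase().replace(/\.$/, "");
  if (normalized !== ALLOWED_HOST || port !== ALLOWED_PORT) {
    throw new Error(`Refusing out-of-scope SSH target ${host}:${port}`);
  }
}

/** Replaces every already-discovered password in a log line with a placeholder. */
function redactSecrets(line: string, secrets: readonly string[]): string {
  // Longest first, so a secret that contains a shorter one is removed whole.
  const ordered = secrets
    .filter((s) => s.length > 0)
    .sort((a, b) => b.length - a.length);
  let out = line;
  for (const secret of ordered) {
    // split/join replaces all occurrences without regex-escaping the secret.
    out = out.split(secret).join("[REDACTED]");
  }
  return out;
}
```

Redacting by exact match against passwords the run has already seen avoids guessing at a password format; the trade-off is that a secret is scrubbed only once the runner knows it.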


---

### Built With

* [![Next.js][Next.js]][Next-url]
* [![React][React.js]][React-url]
* [![Cloudflare][Cloudflare-badge]][Cloudflare-url]
* [![OpenNext][OpenNext-badge]][OpenNext-url]
* [![Shadcn/UI][Shadcn-badge]][Shadcn-url]
* [![TypeScript][TypeScript-badge]][TypeScript-url]
* [![Drizzle ORM][Drizzle-badge]][Drizzle-url]
* [![pnpm][pnpm-badge]][pnpm-url]


---

## Getting Started

### Prerequisites

You need:

* **Node.js ≥ 20**
* **pnpm**
  ```bash
  npm i -g pnpm
  ```
* **Wrangler 3 CLI**
  ```bash
  npm i -g wrangler
  ```
* A Cloudflare account with access to:
  * Durable Objects
  * D1 Database
  * R2 Storage

### Installation

1. Clone the repo
   ```bash
   git clone https://git.biohazardvfx.com/Nicholai/bandit-runner.git
   cd bandit-runner
   ```
2. Install dependencies
   ```bash
   pnpm install
   ```
3. Copy and configure the environment
   ```bash
   cp .env.example .env.local
   ```
4. Build and run locally
   ```bash
   pnpm dev
   # or
   wrangler dev
   ```
5. Deploy a preview
   ```bash
   pnpm build
   wrangler deploy --env preview
   ```


---

## Usage

Once deployed, visit `/runs/new` to start a new evaluation. Provide a model endpoint (OpenAI, OpenRouter, or self-hosted) and start a Bandit run.

Each run:

* Spawns a Durable Object (the `RunCoordinator`)
* Connects to `bandit.labs.overthewire.org:2220`
* Executes controlled `ssh.connect` / `ssh.exec` / `ssh.close` operations
* Streams JSONL logs and commentary to the Live Viewer

Developers can extend:

* Scoring rules (`lib/scoring/verdicts.ts`)
* Level validators (`lib/scoring/validators.ts`)
* Model interfaces (`lib/ssh/tool-adapter.ts`)
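A run's tool calls can be checked before they reach the SSH session, with each decision appended to the log as one JSON object per line. The sketch below is a hypothetical illustration, not the contents of `lib/ssh/tool-adapter.ts`: the `ToolCall` shape, the `ALLOWED_BINARIES` set, the chaining rule, and the `toJsonlLine` helper are all assumptions chosen for the example.

```typescript
// Hypothetical tool-call shapes for the three SSH operations named in the README.
type ToolCall =
  | { tool: "ssh.connect"; level: number }
  | { tool: "ssh.exec"; command: string }
  | { tool: "ssh.close" };

interface Verdict {
  allowed: boolean;
  reason?: string;
}

// Hypothetical allow-list; the project's actual list is not reproduced here.
const ALLOWED_BINARIES = new Set([
  "ls", "cat", "file", "find", "grep", "strings",
  "sort", "uniq", "head", "tail", "wc", "du", "base64", "tr",
]);

function checkToolCall(call: ToolCall): Verdict {
  switch (call.tool) {
    case "ssh.connect":
      return Number.isInteger(call.level) && call.level >= 0
        ? { allowed: true }
        : { allowed: false, reason: "invalid level" };
    case "ssh.exec": {
      // Refuse command chaining and substitution outright; plain pipes are allowed.
      if (/[;&`]|\$\(/.test(call.command)) {
        return { allowed: false, reason: "chaining or substitution not allowed" };
      }
      // Every pipeline stage must start with an allow-listed binary.
      for (const stage of call.command.split("|")) {
        const binary = stage.trim().split(/\s+/)[0] ?? "";
        if (!ALLOWED_BINARIES.has(binary)) {
          return { allowed: false, reason: `binary not allowed: ${binary || "(empty)"}` };
        }
      }
      return { allowed: true };
    }
    case "ssh.close":
      return { allowed: true };
  }
}

// One event per tool call; the timestamp is passed in so output is reproducible.
function toJsonlLine(runId: string, call: ToolCall, verdict: Verdict, ts: string): string {
  return JSON.stringify({ ts, runId, call, ...verdict }) + "\n";
}
```

`JSON.stringify` escapes any newline inside a command, so each event always occupies exactly one line of the JSONL stream.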


---

## Architecture

```text
Next.js (App Router)
│
├── UI (Shadcn/UI)
│   ├─ LiveLog
│   └─ LevelCard
│
├── Edge API Routes (OpenNext)
│   ├─ /api/startRun
│   ├─ /api/toolInvoke
│   └─ /api/stream
│
└── Cloudflare Worker
    ├─ Durable Object: RunCoordinator
    │   ├─ TCP connect() to Bandit
    │   ├─ State machine (levels, caps, timers)
    │   └─ Writes logs → R2
    ├─ D1 (metadata)
    └─ R2 (artifacts)
```

*See `docs/ADR-001-architecture.md` for the detailed decision record.*
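The `RunCoordinator` state machine (levels, caps, timers) could be modeled as a pure transition function; passing the current time in as data rather than reading a clock keeps a run replayable. This is a minimal sketch, not the project's implementation: `RunState`, `RunEvent`, `transition`, and the status names are hypothetical, and a Durable Object version would still need to persist the state between requests.

```typescript
// Hypothetical run state for one RunCoordinator instance.
type RunStatus = "running" | "completed" | "step_cap" | "timed_out";

interface RunState {
  level: number;      // current Bandit level (starts at 0)
  finalLevel: number; // last level the run must clear
  steps: number;      // tool calls used so far
  maxSteps: number;   // cap on tool calls per run
  deadlineMs: number; // absolute wall-clock deadline in epoch milliseconds
  status: RunStatus;
}

type RunEvent =
  | { type: "tool_call"; nowMs: number }
  | { type: "level_passed"; nowMs: number };

function transition(state: RunState, event: RunEvent): RunState {
  // Terminal states ignore further events.
  if (state.status !== "running") return state;
  // The timer is checked first, using the time carried by the event.
  if (event.nowMs >= state.deadlineMs) return { ...state, status: "timed_out" };
  switch (event.type) {
    case "tool_call": {
      // The call that reaches the cap is counted, then the run stops.
      const steps = state.steps + 1;
      return { ...state, steps, status: steps >= state.maxSteps ? "step_cap" : "running" };
    }
    case "level_passed": {
      const level = state.level + 1;
      return { ...state, level, status: level > state.finalLevel ? "completed" : "running" };
    }
  }
}
```

Because `transition` never mutates its input, the same event log replayed from the same initial state always yields the same final state.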


---

## Roadmap

* [x] Core runner architecture
* [x] JSONL log streaming
* [x] SSH tool scaffolding
* [ ] Add live leaderboard
* [ ] Add mock SSH server for tests
* [ ] Expand scoring heuristics
* [ ] Implement model-agnostic adapter layer
* [ ] Public demo page

See the [open issues](https://git.biohazardvfx.com/Nicholai/bandit-runner/issues) for the full roadmap.


---

## Contributing

Contributions are welcome.

1. Fork the project
2. Create your feature branch (`git checkout -b feat/amazing`)
3. Commit your changes with `pnpm commit`, following Conventional Commits
4. Push the branch (`git push origin feat/amazing`)
5. Open a pull request

### Top Contributors

See the [contributors graph][contributors-url].
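The contribution steps ask for Conventional Commits, whose header format is `<type>[optional scope][!]: <description>`; a `!` before the colon or a `BREAKING CHANGE:` footer marks a breaking change. The parser below is a simplified sketch of that header rule for illustration only. It is not what `pnpm commit` runs (that script's tooling is not described here), `parseCommitHeader` is a hypothetical name, and it does not restrict which types are accepted.

```typescript
interface CommitHeader {
  type: string;
  scope?: string;
  breaking: boolean;
  description: string;
}

// `<type>[(scope)][!]: <description>` on the first line (simplified).
const HEADER_RE = /^([A-Za-z]+)(?:\(([^()\r\n]+)\))?(!)?: (\S.*)$/;

function parseCommitHeader(message: string): CommitHeader | null {
  const firstLine = message.split(/\r?\n/, 1)[0] ?? "";
  const match = HEADER_RE.exec(firstLine);
  if (match === null) return null;
  const [, type = "", scope, bang, description = ""] = match;
  return {
    // The spec treats types case-insensitively; normalize for comparison.
    type: type.toLowerCase(),
    scope,
    // A "!" before the colon or a BREAKING CHANGE footer marks a breaking change.
    breaking: bang === "!" || /^BREAKING[ -]CHANGE: /m.test(message),
    description,
  };
}
```

For example, `feat(runner): add mock SSH server` parses with type `feat` and scope `runner`, while `update stuff` returns `null` because it lacks the required `type: ` prefix.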


---

## License

Distributed under the **GNU GPLv3** license. See `LICENSE` for details.


---

## Contact

**Nicholai Vogel**

[Website](https://nicholai.work) • [LinkedIn](https://linkedin.com/in/nicholai-vogel) • [Instagram](https://instagram.com/nicholai.exe)

Project Link: [https://git.biohazardvfx.com/Nicholai/bandit-runner](https://git.biohazardvfx.com/Nicholai/bandit-runner)


---

## Acknowledgments

* [OverTheWire Bandit](https://overthewire.org/wargames/bandit/): for the wargame challenge itself
* [Cloudflare Workers Docs](https://developers.cloudflare.com/workers/)
* [OpenNext](https://opennext.js.org/)
* [Shadcn/UI](https://ui.shadcn.com)
* [Drizzle ORM](https://orm.drizzle.team)
* [Choose a License](https://choosealicense.com)
* [Img Shields](https://shields.io)
* [Contrib.rocks](https://contrib.rocks)


---

[contributors-shield]: https://img.shields.io/github/contributors/Nicholai/bandit-runner.svg?style=for-the-badge
[contributors-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/graphs/contributors
[forks-shield]: https://img.shields.io/github/forks/Nicholai/bandit-runner.svg?style=for-the-badge
[forks-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/network/members
[stars-shield]: https://img.shields.io/github/stars/Nicholai/bandit-runner.svg?style=for-the-badge
[stars-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/stargazers
[issues-shield]: https://img.shields.io/github/issues/Nicholai/bandit-runner.svg?style=for-the-badge
[issues-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/issues
[license-shield]: https://img.shields.io/github/license/Nicholai/bandit-runner.svg?style=for-the-badge
[license-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/blob/main/COPYING.txt
[linkedin-shield]: https://img.shields.io/badge/-LinkedIn-black.svg?style=for-the-badge&logo=linkedin&colorB=555
[linkedin-url]: https://linkedin.com/in/nicholai-vogel
[product-screenshot]: public/screenshot.png
[Next.js]: https://img.shields.io/badge/Next.js-000000?style=for-the-badge&logo=nextdotjs&logoColor=white
[Next-url]: https://nextjs.org/
[React.js]: https://img.shields.io/badge/React-20232A?style=for-the-badge&logo=react&logoColor=61DAFB
[React-url]: https://react.dev/
[Cloudflare-badge]: https://img.shields.io/badge/Cloudflare%20Workers-F38020?style=for-the-badge&logo=cloudflare&logoColor=white
[Cloudflare-url]: https://developers.cloudflare.com/workers/
[OpenNext-badge]: https://img.shields.io/badge/OpenNext-18181B?style=for-the-badge&logo=vercel&logoColor=white
[OpenNext-url]: https://opennext.js.org/
[Shadcn-badge]: https://img.shields.io/badge/Shadcn%2FUI-000000?style=for-the-badge&logo=react&logoColor=white
[Shadcn-url]: https://ui.shadcn.com
[TypeScript-badge]: https://img.shields.io/badge/TypeScript-3178C6?style=for-the-badge&logo=typescript&logoColor=white
[TypeScript-url]: https://www.typescriptlang.org/
[Drizzle-badge]: https://img.shields.io/badge/Drizzle%20ORM-3E63DD?style=for-the-badge&logo=sqlite&logoColor=white
[Drizzle-url]: https://orm.drizzle.team
[pnpm-badge]: https://img.shields.io/badge/pnpm-F69220?style=for-the-badge&logo=pnpm&logoColor=white
[pnpm-url]: https://pnpm.io
[conventional-commits-badge]: https://img.shields.io/badge/Conventional%20Commits-1.0.0-%23FE5196?style=for-the-badge&logo=conventionalcommits&logoColor=white