[![Contributors][contributors-shield]][contributors-url]
[![Forks][forks-shield]][forks-url]
[![Stargazers][stars-shield]][stars-url]
[![Issues][issues-shield]][issues-url]
[![GPLv3 License][license-shield]][license-url]
[![Conventional Commits][conventional-commits-badge]](https://conventionalcommits.org)
[![LinkedIn][linkedin-shield]][linkedin-url]
---
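Table of Contents
- [About The Project](#about-the-project)
- [Getting Started](#getting-started)
- [Usage](#usage)
- [Architecture](#architecture)
- [Roadmap](#roadmap)
- [Contributing](#contributing)
- [License](#license)
- [Contact](#contact)
- [Acknowledgments](#acknowledgments)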
---
## About The Project
[![Product Screenshot][product-screenshot]](#)
**Bandit Runner** is a public, deterministic evaluation harness for large language models.
It transforms AI models into autonomous operators tasked with completing the **OverTheWire Bandit** wargame via SSH — entirely on Cloudflare Workers.
**Why it matters**
- Provides a real-world, hands-on benchmark for autonomous reasoning and command execution.
- Tests tool use (SSH), planning, error handling, and persistence under real network conditions.
- Generates reproducible, privacy-safe logs for research or public leaderboards.
### Core Concepts
- **Agent Role:** Each run instantiates an LLM as “BanditRunner” — a scripted, deterministic persona following a strict system prompt and command allow-list.
- **Environment:** Next.js frontend + OpenNext build → Cloudflare Workers backend (Durable Objects + D1 + R2).
- **Security:** Hard-scoped to `bandit.labs.overthewire.org:2220`.
All discovered passwords are redacted in logs and sealed in short-lived encrypted blobs.
- **Goal:** Advance from Level 0 → final level autonomously while documenting every decision.
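The host scoping, command allow-list, and log redaction described above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `ALLOWED_COMMANDS`, `isAllowedTarget`, `isAllowedCommand`, and `redact` are hypothetical, and the command list is a placeholder since the real allow-list is not shown in this README.

```typescript
// Hypothetical sketch: every identifier below is an assumption for illustration.
const ALLOWED_HOST = "bandit.labs.overthewire.org";
const ALLOWED_PORT = 2220;

// Placeholder allow-list; the project's real list may differ.
const ALLOWED_COMMANDS = new Set<string>([
  "ls", "cat", "file", "find", "grep", "sort", "uniq", "strings", "base64", "tr",
]);

// Only the single Bandit endpoint is accepted as an SSH target.
function isAllowedTarget(host: string, port: number): boolean {
  return host.toLowerCase() === ALLOWED_HOST && port === ALLOWED_PORT;
}

// Accept a command line only if it has no chaining, substitution, or
// redirection, and every pipeline stage starts with an allow-listed program.
function isAllowedCommand(commandLine: string): boolean {
  if (/[;&`<>\n]|\$\(/.test(commandLine)) return false;
  const stages = commandLine.split("|");
  return stages.every((stage) => {
    const program = stage.trim().split(/\s+/)[0] ?? "";
    return ALLOWED_COMMANDS.has(program);
  });
}

// Replace every occurrence of each known secret before a line is logged.
function redact(text: string, secrets: readonly string[]): string {
  let out = text;
  for (const secret of secrets) {
    if (secret.length === 0) continue; // avoid splitting on the empty string
    out = out.split(secret).join("[REDACTED]");
  }
  return out;
}
```

Checking each pipeline stage separately keeps commands such as `sort data.txt | uniq -u` usable while still rejecting `;`, `&&`, and command substitution.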
---
### Built With
* [![Next.js][Next.js]][Next-url]
* [![React][React.js]][React-url]
* [![Cloudflare][Cloudflare-badge]][Cloudflare-url]
* [![OpenNext][OpenNext-badge]][OpenNext-url]
* [![Shadcn/UI][Shadcn-badge]][Shadcn-url]
* [![TypeScript][TypeScript-badge]][TypeScript-url]
* [![Drizzle ORM][Drizzle-badge]][Drizzle-url]
* [![pnpm][pnpm-badge]][pnpm-url]
---
## Getting Started
### Prerequisites
You need:
* **Node.js ≥ 20**
* **pnpm**
```bash
npm i -g pnpm
```
* **Wrangler 3 CLI**
```bash
npm i -g wrangler@3
```
* A Cloudflare account with access to:
  * Durable Objects
  * D1 Database
  * R2 Storage
### Installation
1. Clone the repo
```bash
git clone https://git.biohazardvfx.com/Nicholai/bandit-runner.git
cd bandit-runner
```
2. Install dependencies
```bash
pnpm install
```
3. Copy and configure environment
```bash
cp .env.example .env.local
```
4. Run locally
```bash
# run the project's dev script
pnpm dev
# or run the Worker locally with Wrangler
wrangler dev
```
5. Deploy preview
```bash
pnpm build
wrangler deploy --env preview
```
---
## Usage
Once deployed, visit `/runs/new` to start a new evaluation.
Provide a model endpoint (OpenAI, OpenRouter, or self-hosted) and initiate a Bandit Run.
Each run:
* Spawns a Durable Object → “Run Coordinator”
* Connects to `bandit.labs.overthewire.org:2220`
* Executes controlled `ssh.connect` / `ssh.exec` / `ssh.close` operations
* Streams JSONL logs and commentary to the Live Viewer
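As a rough illustration of the run steps above, the three SSH operations can be modelled as a tagged union and each call written as one JSON object per line (JSONL). The field names (`runId`, `seq`, `ts`, `call`) and the helpers `toJsonl` and `parseJsonl` are assumptions for this sketch; the project's actual log schema is not documented here.

```typescript
// Hypothetical shapes: the real tool and log schemas may differ.
type ToolCall =
  | { tool: "ssh.connect"; level: number }
  | { tool: "ssh.exec"; command: string }
  | { tool: "ssh.close" };

interface LogEvent {
  runId: string; // identifier of the run (assumed field)
  seq: number;   // increasing sequence number within the run
  ts: string;    // ISO 8601 timestamp
  call: ToolCall;
}

// Serialize events as JSONL: one JSON object per line, newline-terminated.
function toJsonl(events: readonly LogEvent[]): string {
  if (events.length === 0) return "";
  return events.map((event) => JSON.stringify(event)).join("\n") + "\n";
}

// Parse JSONL back into events, skipping blank lines.
function parseJsonl(text: string): LogEvent[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as LogEvent);
}
```

Because every line is a complete JSON document, a viewer can render events as they arrive without waiting for the run to finish.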
Developers can extend:
* Scoring rules (`lib/scoring/verdicts.ts`)
* Level validators (`lib/scoring/validators.ts`)
* Model interfaces (`lib/ssh/tool-adapter.ts`)
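One way an extension point like the level validators might look is a per-level registry with a default format check. This is only a sketch under stated assumptions: `Verdict`, `LevelValidator`, `registerValidator`, and `validate` are hypothetical names, and the default check assumes a password is a 32-character alphanumeric string, which may not hold for every level.

```typescript
// Hypothetical extension-point sketch; not the contents of lib/scoring/validators.ts.
type Verdict = { ok: true } | { ok: false; reason: string };
type LevelValidator = (candidate: string) => Verdict;

// Default check. Assumption: passwords are 32 alphanumeric characters.
const looksLikePassword: LevelValidator = (candidate) =>
  /^[A-Za-z0-9]{32}$/.test(candidate)
    ? { ok: true }
    : { ok: false, reason: "expected 32 alphanumeric characters" };

// Per-level overrides, keyed by level number.
const validators = new Map<number, LevelValidator>();

function registerValidator(level: number, validator: LevelValidator): void {
  validators.set(level, validator);
}

// Use the level-specific validator if one is registered, else the default.
function validate(level: number, candidate: string): Verdict {
  const validator = validators.get(level) ?? looksLikePassword;
  return validator(candidate.trim());
}
```

A format check like this only filters out obviously wrong candidates; confirming a password would still require a successful login to the next level.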
---
## Architecture
```text
Next.js (App Router)
│
├── UI (Shadcn/UI)
│   ├─ LiveLog
│   └─ LevelCard
│
├── Edge API Routes (OpenNext)
│   ├─ /api/startRun
│   ├─ /api/toolInvoke
│   └─ /api/stream
│
└── Cloudflare Worker
    ├─ Durable Object: RunCoordinator
    │   ├─ TCP connect() to Bandit
    │   ├─ State machine (levels, caps, timers)
    │   └─ Writes logs → R2
    ├─ D1 (metadata)
    └─ R2 (artifacts)
```
*See `docs/ADR-001-architecture.md` for the detailed decision record.*
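The "State machine (levels, caps, timers)" box in the diagram could be modelled as a pure transition function along these lines. It is a sketch only: `RunState`, `Limits`, `RunEvent`, and `step` are hypothetical names, and the specific limits (a per-level command cap and an overall run deadline) are assumptions about what "caps" and "timers" mean here.

```typescript
// Hypothetical state machine sketch; the real RunCoordinator may differ.
type Status = "running" | "succeeded" | "failed";

interface RunState {
  level: number;             // current Bandit level
  commandsThisLevel: number; // commands issued since the level began
  startedAtMs: number;       // run start time in milliseconds
  status: Status;
  reason?: string;           // set when the run fails
}

interface Limits {
  finalLevel: number;           // reaching this level ends the run successfully
  maxCommandsPerLevel: number;  // cap on commands per level
  maxRunMs: number;             // overall deadline for the run
}

type RunEvent =
  | { type: "command"; atMs: number }
  | { type: "levelSolved"; atMs: number };

// Pure transition: returns the next state without mutating the input.
function step(state: RunState, event: RunEvent, limits: Limits): RunState {
  if (state.status !== "running") return state; // terminal states are sticky
  if (event.atMs - state.startedAtMs > limits.maxRunMs) {
    return { ...state, status: "failed", reason: "time limit exceeded" };
  }
  if (event.type === "levelSolved") {
    const level = state.level + 1;
    const status: Status = level >= limits.finalLevel ? "succeeded" : "running";
    return { ...state, level, commandsThisLevel: 0, status };
  }
  const used = state.commandsThisLevel + 1;
  if (used > limits.maxCommandsPerLevel) {
    return { ...state, commandsThisLevel: used, status: "failed", reason: "command cap exceeded" };
  }
  return { ...state, commandsThisLevel: used };
}
```

Keeping the transition pure means timestamps are passed in rather than read from a clock, so the same sequence of events always produces the same final state.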
---
## Roadmap
* [x] Core runner architecture
* [x] JSONL log streaming
* [x] SSH tool scaffolding
* [ ] Add live leaderboard
* [ ] Add mock SSH server for tests
* [ ] Expand scoring heuristics
* [ ] Implement model-agnostic adapter layer
* [ ] Public demo page
See the [open issues](https://git.biohazardvfx.com/Nicholai/bandit-runner/issues) for the full roadmap.
---
## Contributing
Contributions are welcome.
1. Fork the Project
2. Create your Feature Branch (`git checkout -b feat/amazing`)
3. Commit (`pnpm commit`) using Conventional Commits
4. Push (`git push origin feat/amazing`)
5. Open a Pull Request
### Top Contributors
---
## License
Distributed under the **GNU GPLv3** License.
See `LICENSE` for details.
---
## Contact
**Nicholai Vogel**
[Website](https://nicholai.work) • [LinkedIn](https://linkedin.com/in/nicholai-vogel) • [Instagram](https://instagram.com/nicholai.exe)
Project Link: [https://git.biohazardvfx.com/Nicholai/bandit-runner](https://git.biohazardvfx.com/Nicholai/bandit-runner)
---
## Acknowledgments
* [OverTheWire Bandit](https://overthewire.org/wargames/bandit/) — for the wargame challenge itself
* [Cloudflare Workers Docs](https://developers.cloudflare.com/workers/)
* [OpenNext](https://opennext.js.org/)
* [Shadcn/UI](https://ui.shadcn.com)
* [Drizzle ORM](https://orm.drizzle.team)
* [Choose a License](https://choosealicense.com)
* [Img Shields](https://shields.io)
* [Contrib.rocks](https://contrib.rocks)
---
[contributors-shield]: https://img.shields.io/github/contributors/Nicholai/bandit-runner.svg?style=for-the-badge
[contributors-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/graphs/contributors
[forks-shield]: https://img.shields.io/github/forks/Nicholai/bandit-runner.svg?style=for-the-badge
[forks-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/network/members
[stars-shield]: https://img.shields.io/github/stars/Nicholai/bandit-runner.svg?style=for-the-badge
[stars-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/stargazers
[issues-shield]: https://img.shields.io/github/issues/Nicholai/bandit-runner.svg?style=for-the-badge
[issues-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/issues
[license-shield]: https://img.shields.io/github/license/Nicholai/bandit-runner.svg?style=for-the-badge
[license-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/blob/main/COPYING.txt
[linkedin-shield]: https://img.shields.io/badge/-LinkedIn-black.svg?style=for-the-badge&logo=linkedin&colorB=555
[linkedin-url]: https://linkedin.com/in/nicholai-vogel
[product-screenshot]: public/screenshot.png
[Next.js]: https://img.shields.io/badge/Next.js-000000?style=for-the-badge&logo=nextdotjs&logoColor=white
[Next-url]: https://nextjs.org/
[React.js]: https://img.shields.io/badge/React-20232A?style=for-the-badge&logo=react&logoColor=61DAFB
[React-url]: https://react.dev/
[Cloudflare-badge]: https://img.shields.io/badge/Cloudflare%20Workers-F38020?style=for-the-badge&logo=cloudflare&logoColor=white
[Cloudflare-url]: https://developers.cloudflare.com/workers/
[OpenNext-badge]: https://img.shields.io/badge/OpenNext-18181B?style=for-the-badge&logo=vercel&logoColor=white
[OpenNext-url]: https://opennext.js.org/
[Shadcn-badge]: https://img.shields.io/badge/Shadcn%2FUI-000000?style=for-the-badge&logo=react&logoColor=white
[Shadcn-url]: https://ui.shadcn.com
[TypeScript-badge]: https://img.shields.io/badge/TypeScript-3178C6?style=for-the-badge&logo=typescript&logoColor=white
[TypeScript-url]: https://www.typescriptlang.org/
[Drizzle-badge]: https://img.shields.io/badge/Drizzle%20ORM-3E63DD?style=for-the-badge&logo=sqlite&logoColor=white
[Drizzle-url]: https://orm.drizzle.team
[pnpm-badge]: https://img.shields.io/badge/pnpm-F69220?style=for-the-badge&logo=pnpm&logoColor=white
[pnpm-url]: https://pnpm.io
[conventional-commits-badge]: https://img.shields.io/badge/Conventional%20Commits-1.0.0-%23FE5196?style=for-the-badge&logo=conventionalcommits&logoColor=white