<a id="readme-top"></a>
<!-- PROJECT SHIELDS -->
[![Contributors][contributors-shield]][contributors-url]
[![Forks][forks-shield]][forks-url]
[![Stargazers][stars-shield]][stars-url]
[![Issues][issues-shield]][issues-url]
[![GPLv3 License][license-shield]][license-url]
[![Conventional Commits][conventional-commits-badge]](https://conventionalcommits.org)
[![LinkedIn][linkedin-shield]][linkedin-url]
<!-- PROJECT LOGO -->
<br />
<div align="center">
<a href="https://git.biohazardvfx.com/Nicholai/bandit-runner">
<img src="public/bandit-logo.png" alt="Logo" width="100" height="100">
</a>
<h3 align="center">Bandit Runner</h3>
<p align="center">
A deterministic AI testing rig for LLMs-as-agents — built on Next.js, OpenNext, and Cloudflare Workers.
<br />
<a href="https://git.biohazardvfx.com/Nicholai/bandit-runner"><strong>Explore the docs »</strong></a>
<br />
<br />
<a href="#">View Demo</a>
&middot;
<a href="https://git.biohazardvfx.com/Nicholai/bandit-runner/issues/new?labels=bug">Report Bug</a>
&middot;
<a href="https://git.biohazardvfx.com/Nicholai/bandit-runner/issues/new?labels=enhancement">Request Feature</a>
</p>
</div>
---
<!-- TABLE OF CONTENTS -->
<details>
<summary>Table of Contents</summary>
<ol>
<li><a href="#about-the-project">About The Project</a>
<ul>
<li><a href="#core-concepts">Core Concepts</a></li>
<li><a href="#built-with">Built With</a></li>
</ul>
</li>
<li><a href="#getting-started">Getting Started</a>
<ul>
<li><a href="#prerequisites">Prerequisites</a></li>
<li><a href="#installation">Installation</a></li>
</ul>
</li>
<li><a href="#usage">Usage</a></li>
<li><a href="#architecture">Architecture</a></li>
<li><a href="#roadmap">Roadmap</a></li>
<li><a href="#contributing">Contributing</a></li>
<li><a href="#license">License</a></li>
<li><a href="#contact">Contact</a></li>
<li><a href="#acknowledgments">Acknowledgments</a></li>
</ol>
</details>
---
## About The Project
[![Product Screenshot][product-screenshot]](#)
**Bandit Runner** is a public, deterministic evaluation harness for large language models.
It transforms AI models into autonomous operators tasked with completing the **OverTheWire Bandit** wargame via SSH — entirely on Cloudflare Workers.
**Why it matters**
- Provides a real-world, hands-on benchmark for autonomous reasoning and command execution.
- Tests tool use (SSH), planning, error handling, and persistence under real network conditions.
- Generates reproducible, privacy-safe logs for research or public leaderboards.
### Core Concepts
- **Agent Role:** Each run instantiates an LLM as “BanditRunner” — a scripted, deterministic persona following a strict system prompt and command allow-list.
- **Environment:** Next.js frontend + OpenNext build → Cloudflare Workers backend (Durable Objects + D1 + R2).
- **Security:** Hard-scoped to `bandit.labs.overthewire.org:2220`.
All discovered passwords are redacted in logs and sealed in short-lived encrypted blobs.
- **Goal:** Advance from Level 0 → final level autonomously while documenting every decision.
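
The scoping, allow-list, and redaction ideas above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the project's actual code: the function names, the contents of the allow-list, and the `[REDACTED]` marker are all assumptions made for the example. Only the host and port come from this README.

```typescript
// Hypothetical sketch: none of these names are taken from the bandit-runner codebase.

// The only target the runner may reach (host and port from the README).
const ALLOWED_TARGET = { host: "bandit.labs.overthewire.org", port: 2220 } as const;

// Illustrative allow-list of command names (assumption; the real list may differ).
const ALLOWED_COMMANDS = new Set(["ls", "cat", "cd", "file", "find", "grep"]);

// True only for the single permitted host/port pair.
function isAllowedTarget(host: string, port: number): boolean {
  return host.toLowerCase() === ALLOWED_TARGET.host && port === ALLOWED_TARGET.port;
}

// True when the command line starts with an allow-listed command and
// contains no shell metacharacters that could chain a second command.
function isAllowedCommand(commandLine: string): boolean {
  const trimmed = commandLine.trim();
  if (trimmed === "") return false;
  if (/[;&|`$<>\n]/.test(trimmed)) return false;
  const [name = ""] = trimmed.split(/\s+/);
  return ALLOWED_COMMANDS.has(name);
}

// Replaces every occurrence of each known secret with a fixed marker
// before the line is written to a log.
function redactSecrets(line: string, secrets: readonly string[]): string {
  let out = line;
  for (const secret of secrets) {
    if (secret.length === 0) continue; // an empty secret would match everywhere
    out = out.split(secret).join("[REDACTED]");
  }
  return out;
}
```

The metacharacter check here is deliberately conservative and also rejects pipes and redirects; a real allow-list would likely need to parse pipelines and validate each stage rather than refuse them outright.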
<p align="right">(<a href="#readme-top">back to top</a>)</p>
---
### Built With
* [![Next.js][Next.js]][Next-url]
* [![React][React.js]][React-url]
* [![Cloudflare][Cloudflare-badge]][Cloudflare-url]
* [![OpenNext][OpenNext-badge]][OpenNext-url]
* [![Shadcn/UI][Shadcn-badge]][Shadcn-url]
* [![TypeScript][TypeScript-badge]][TypeScript-url]
* [![Drizzle ORM][Drizzle-badge]][Drizzle-url]
* [![pnpm][pnpm-badge]][pnpm-url]
<p align="right">(<a href="#readme-top">back to top</a>)</p>
---
## Getting Started
### Prerequisites
You need:
* **Node.js ≥ 20**
* **pnpm**
```bash
npm i -g pnpm
```
* **Wrangler 3 CLI**
```bash
npm i -g wrangler@3
```
* A Cloudflare account with access to:
  * Durable Objects
  * D1 Database
  * R2 Storage
### Installation
1. Clone the repo
```bash
git clone https://git.biohazardvfx.com/Nicholai/bandit-runner.git
cd bandit-runner
```
2. Install dependencies
```bash
pnpm install
```
3. Copy and configure environment
```bash
cp .env.example .env.local
```
4. Run locally
```bash
pnpm dev
# or
wrangler dev
```
5. Deploy preview
```bash
pnpm build
wrangler deploy --env preview
```
<p align="right">(<a href="#readme-top">back to top</a>)</p>
---
## Usage
Once deployed, visit `/runs/new` to start a new evaluation.
Provide a model endpoint (OpenAI, OpenRouter, or self-hosted) and initiate a Bandit Run.
Each run:
* Spawns a `RunCoordinator` Durable Object
* Connects to `bandit.labs.overthewire.org:2220`
* Executes controlled `ssh.connect` / `ssh.exec` / `ssh.close` operations
* Streams JSONL logs and commentary to the Live Viewer
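
To make the JSONL log stream concrete, here is a small sketch of how run events could be serialized one per line and read back. The `RunEvent` shape and its field names are assumptions for illustration; the README does not specify the actual log schema. Only the three operation names (`ssh.connect`, `ssh.exec`, `ssh.close`) come from the list above.

```typescript
// Hypothetical event shape: field names are illustrative, not the project's schema.
type RunEvent =
  | { type: "ssh.connect"; level: number; ts: string }
  | { type: "ssh.exec"; level: number; ts: string; command: string; exitCode: number }
  | { type: "ssh.close"; level: number; ts: string };

// Serializes events as JSON Lines: one JSON object per line, each newline-terminated.
// JSON.stringify escapes embedded newlines, so every event stays on a single line.
function toJsonl(events: readonly RunEvent[]): string {
  return events.map((event) => JSON.stringify(event) + "\n").join("");
}

// Parses JSON Lines text back into events, skipping blank lines.
function parseJsonl(text: string): RunEvent[] {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as RunEvent);
}
```

Because each line is a complete JSON document, a viewer can render events as they arrive without waiting for the run to finish.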
Developers can extend:
* Scoring rules (`lib/scoring/verdicts.ts`)
* Level validators (`lib/scoring/validators.ts`)
* Model interfaces (`lib/ssh/tool-adapter.ts`)
<p align="right">(<a href="#readme-top">back to top</a>)</p>
---
## Architecture
```text
Next.js (App Router)
├── UI (Shadcn/UI)
│ ├─ LiveLog
│ └─ LevelCard
├── Edge API Routes (OpenNext)
│ ├─ /api/startRun
│ ├─ /api/toolInvoke
│ └─ /api/stream
└── Cloudflare Worker
├─ Durable Object: RunCoordinator
│ ├─ TCP connect() to Bandit
│ ├─ State machine (levels, caps, timers)
│ └─ Writes logs → R2
├─ D1 (metadata)
└─ R2 (artifacts)
```
*See `docs/ADR-001-architecture.md` for the detailed decision record.*
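
The "state machine (levels, caps, timers)" inside `RunCoordinator` could look roughly like the pure transition function below. This is a sketch under stated assumptions: the state fields, action names, and limit parameters are hypothetical and are not taken from the ADR or the codebase.

```typescript
// Hypothetical sketch of a run state machine; all names are illustrative.
type RunStatus = "running" | "completed" | "failed";

interface RunState {
  level: number;             // current Bandit level
  commandsThisLevel: number; // commands issued since the level started
  status: RunStatus;
  reason?: string;           // set when the run fails
}

interface RunLimits {
  finalLevel: number;          // reaching this level completes the run
  maxCommandsPerLevel: number; // per-level command cap
}

type RunAction =
  | { kind: "command" }     // the agent issued one command
  | { kind: "levelSolved" } // a validator accepted the level
  | { kind: "timeout" };    // a level or run timer fired

// Pure transition function: returns the next state without mutating the input.
function step(state: RunState, action: RunAction, limits: RunLimits): RunState {
  if (state.status !== "running") return state; // terminal states absorb all actions
  switch (action.kind) {
    case "command": {
      const used = state.commandsThisLevel + 1;
      if (used > limits.maxCommandsPerLevel) {
        return { ...state, commandsThisLevel: used, status: "failed", reason: "command cap exceeded" };
      }
      return { ...state, commandsThisLevel: used };
    }
    case "levelSolved": {
      const next = state.level + 1;
      const status: RunStatus = next >= limits.finalLevel ? "completed" : "running";
      return { ...state, level: next, commandsThisLevel: 0, status };
    }
    case "timeout":
      return { ...state, status: "failed", reason: "timer expired" };
  }
}
```

Keeping the transition pure makes it straightforward to replay a recorded action sequence and check that it reproduces the same final state, which fits the goal of reproducible runs.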
<p align="right">(<a href="#readme-top">back to top</a>)</p>
---
## Roadmap
* [x] Core runner architecture
* [x] JSONL log streaming
* [x] SSH tool scaffolding
* [ ] Add live leaderboard
* [ ] Add mock SSH server for tests
* [ ] Expand scoring heuristics
* [ ] Implement model-agnostic adapter layer
* [ ] Public demo page
See the [open issues](https://git.biohazardvfx.com/Nicholai/bandit-runner/issues) for the full roadmap.
<p align="right">(<a href="#readme-top">back to top</a>)</p>
---
## Contributing
Contributions are welcome.
1. Fork the Project
2. Create your Feature Branch (`git checkout -b feat/amazing`)
3. Commit (`pnpm commit`) using Conventional Commits
4. Push (`git push origin feat/amazing`)
5. Open a Pull Request
### Top Contributors
<a href="https://git.biohazardvfx.com/Nicholai/bandit-runner/graphs/contributors">
<img src="https://contrib.rocks/image?repo=Nicholai/bandit-runner" alt="Contributors" />
</a>
<p align="right">(<a href="#readme-top">back to top</a>)</p>
---
## License
Distributed under the **GNU GPLv3** License.
See `LICENSE` for details.
<p align="right">(<a href="#readme-top">back to top</a>)</p>
---
## Contact
**Nicholai Vogel**
[Website](https://nicholai.work) • [LinkedIn](https://linkedin.com/in/nicholai-vogel) • [Instagram](https://instagram.com/nicholai.exe)
Project Link: [https://git.biohazardvfx.com/Nicholai/bandit-runner](https://git.biohazardvfx.com/Nicholai/bandit-runner)
<p align="right">(<a href="#readme-top">back to top</a>)</p>
---
## Acknowledgments
* [OverTheWire Bandit](https://overthewire.org/wargames/bandit/) — for the wargame challenge itself
* [Cloudflare Workers Docs](https://developers.cloudflare.com/workers/)
* [OpenNext](https://opennext.js.org/)
* [Shadcn/UI](https://ui.shadcn.com)
* [Drizzle ORM](https://orm.drizzle.team)
* [Choose a License](https://choosealicense.com)
* [Img Shields](https://shields.io)
* [Contrib.rocks](https://contrib.rocks)
<p align="right">(<a href="#readme-top">back to top</a>)</p>
---
<!-- MARKDOWN LINKS & IMAGES -->
[contributors-shield]: https://img.shields.io/github/contributors/Nicholai/bandit-runner.svg?style=for-the-badge
[contributors-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/graphs/contributors
[forks-shield]: https://img.shields.io/github/forks/Nicholai/bandit-runner.svg?style=for-the-badge
[forks-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/network/members
[stars-shield]: https://img.shields.io/github/stars/Nicholai/bandit-runner.svg?style=for-the-badge
[stars-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/stargazers
[issues-shield]: https://img.shields.io/github/issues/Nicholai/bandit-runner.svg?style=for-the-badge
[issues-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/issues
[license-shield]: https://img.shields.io/github/license/Nicholai/bandit-runner.svg?style=for-the-badge
[license-url]: https://git.biohazardvfx.com/Nicholai/bandit-runner/blob/main/COPYING.txt
[linkedin-shield]: https://img.shields.io/badge/-LinkedIn-black.svg?style=for-the-badge&logo=linkedin&colorB=555
[linkedin-url]: https://linkedin.com/in/nicholai-vogel
[product-screenshot]: public/screenshot.png
[Next.js]: https://img.shields.io/badge/Next.js-000000?style=for-the-badge&logo=nextdotjs&logoColor=white
[Next-url]: https://nextjs.org/
[React.js]: https://img.shields.io/badge/React-20232A?style=for-the-badge&logo=react&logoColor=61DAFB
[React-url]: https://react.dev/
[Cloudflare-badge]: https://img.shields.io/badge/Cloudflare%20Workers-F38020?style=for-the-badge&logo=cloudflare&logoColor=white
[Cloudflare-url]: https://developers.cloudflare.com/workers/
[OpenNext-badge]: https://img.shields.io/badge/OpenNext-18181B?style=for-the-badge&logo=vercel&logoColor=white
[OpenNext-url]: https://opennext.js.org/
[Shadcn-badge]: https://img.shields.io/badge/Shadcn%2FUI-000000?style=for-the-badge&logo=react&logoColor=white
[Shadcn-url]: https://ui.shadcn.com
[TypeScript-badge]: https://img.shields.io/badge/TypeScript-3178C6?style=for-the-badge&logo=typescript&logoColor=white
[TypeScript-url]: https://www.typescriptlang.org/
[Drizzle-badge]: https://img.shields.io/badge/Drizzle%20ORM-3E63DD?style=for-the-badge&logo=sqlite&logoColor=white
[Drizzle-url]: https://orm.drizzle.team
[pnpm-badge]: https://img.shields.io/badge/pnpm-F69220?style=for-the-badge&logo=pnpm&logoColor=white
[pnpm-url]: https://pnpm.io
[conventional-commits-badge]: https://img.shields.io/badge/Conventional%20Commits-1.0.0-%23FE5196?style=for-the-badge&logo=conventionalcommits&logoColor=white