nicholai e934d047b0 added example runs

2025-10-09 22:03:37 -06:00


Quick Start Guide - Bandit Runner LangGraph Agent

TL;DR - Get Running in 5 Minutes

1. Install Dependencies

cd bandit-runner-app
pnpm install

✅ LangGraph.js, LangChain, and zod are already listed as dependencies, so this one command installs everything.

2. Set Environment Variables

Create .env.local:

OPENROUTER_API_KEY=sk-or-v1-your-key-here
SSH_PROXY_URL=http://localhost:3001

Get OpenRouter API key: https://openrouter.ai/keys
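A missing or mistyped variable tends to surface later as an opaque runtime error, so it can help to validate both values at startup. The following is a minimal sketch, not code from the repo: `RunnerConfig` and `readConfig` are hypothetical names, and only the two variable names come from this guide.

```typescript
// Hypothetical helper, not part of the repo: validates the two
// variables from .env.local before the agent starts.
interface RunnerConfig {
  openRouterApiKey: string;
  sshProxyUrl: string;
}

function readConfig(env: Record<string, string | undefined>): RunnerConfig {
  const openRouterApiKey = env["OPENROUTER_API_KEY"]?.trim();
  if (!openRouterApiKey) {
    throw new Error("OPENROUTER_API_KEY is not set (see .env.local)");
  }

  const rawUrl = env["SSH_PROXY_URL"]?.trim();
  if (!rawUrl) {
    throw new Error("SSH_PROXY_URL is not set (see .env.local)");
  }

  // new URL() throws a TypeError on malformed input, so typos fail early.
  const sshProxyUrl = new URL(rawUrl).origin;

  return { openRouterApiKey, sshProxyUrl };
}
```

In a Node.js context this could be called as `readConfig(process.env)`; the parameter is a plain record so the function is easy to exercise with hand-written values.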

3. Build SSH Proxy (Separate Terminal)

# In a new directory
mkdir ../ssh-proxy
cd ../ssh-proxy

# Follow SSH-PROXY-README.md, or use the quick version:
npm init -y
npm install express ssh2 cors tsx
# Copy the server code from SSH-PROXY-README.md
# Note: npm init -y does not create a "dev" script; add one to package.json first
npm run dev

Alternatively, deploy the proxy to Fly.io for production (see SSH-PROXY-README.md).

4. Configure Durable Object

The framework is ready, but the Durable Object (DO) binding still has to be declared. Create or update:

bandit-runner-app/worker-configuration.d.ts:

interface Env {
  BANDIT_AGENT: DurableObjectNamespace
}
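Once the binding exists, a Worker typically reaches a Durable Object by deriving an ID from a name and asking the namespace for a stub. The sketch below illustrates that pattern with `idFromName` and `get`, which are part of the Durable Object namespace API. Everything else is an assumption for illustration: `agentForRun` is a hypothetical helper, keying by run ID is a guess at the routing scheme, and the `...Like` interfaces are local stand-ins so the snippet compiles without `@cloudflare/workers-types`.

```typescript
// Local stand-ins for the Cloudflare types (assumption: the real project
// gets these from its generated worker-configuration.d.ts).
interface DurableObjectIdLike {
  toString(): string;
}
interface DurableObjectStubLike {
  fetch(url: string): Promise<{ status: number }>;
}
interface DurableObjectNamespaceLike {
  idFromName(name: string): DurableObjectIdLike;
  get(id: DurableObjectIdLike): DurableObjectStubLike;
}
interface Env {
  BANDIT_AGENT: DurableObjectNamespaceLike;
}

// Hypothetical routing helper: the same runId always maps to the same
// Durable Object instance, so one run keeps one agent state.
function agentForRun(env: Env, runId: string): DurableObjectStubLike {
  const id = env.BANDIT_AGENT.idFromName(runId);
  return env.BANDIT_AGENT.get(id);
}
```

Deriving the ID from a stable name, rather than generating a fresh one per request, is what lets a reconnecting client land on the same agent instance.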

5. Run Development Server

cd bandit-runner-app
pnpm dev

Open http://localhost:3000

6. Start Your First Run

Select model: GPT-4o Mini (fast and cheap)
Set levels: 0 to 2 (test run)
Click START
Watch the magic happen! ✨
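The level range is the one input a user can easily get wrong, so a small check before starting a run can save a wasted session. This is a hedged sketch, not the app's actual validation: `parseLevelRange` and `MAX_LEVEL` are hypothetical names, and the upper bound of 33 is taken from the full-challenge range mentioned in Next Steps.

```typescript
// Assumption: 33 is the highest level, per the "0-33" range in this guide.
const MAX_LEVEL = 33;

interface LevelRange {
  start: number;
  end: number;
}

// Hypothetical validator for the start/end levels chosen in the UI.
function parseLevelRange(start: number, end: number): LevelRange {
  if (!Number.isInteger(start) || !Number.isInteger(end)) {
    throw new RangeError("Levels must be integers");
  }
  if (start < 0 || end > MAX_LEVEL) {
    throw new RangeError(`Levels must be within 0-${MAX_LEVEL}`);
  }
  if (start > end) {
    throw new RangeError("Start level must not exceed end level");
  }
  return { start, end };
}
```

For the test run suggested above, `parseLevelRange(0, 2)` passes, while a reversed or out-of-bounds range throws a `RangeError` before any model call is made.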

What You'll See

Terminal (Left Panel):

$ ls -la
total 24
drwxr-xr-x    2 root   root   4096 ... .
...
$ cat readme
boJ9jbbUNNfktd78OOpsqOltutMc3MY1

Agent Chat (Right Panel):

AGENT: Planning next command for level 0...
AGENT: Executing 'ls -la' to explore the directory
AGENT: Found readme file, reading contents...
AGENT: Password extracted: boJ9jbbUNNfktd78OOpsqOltutMc3MY1
AGENT: Validating password for level 1...
AGENT: ✓ Level 0 → 1 complete!

Troubleshooting

WebSocket Not Connecting

Check that the SSH proxy is running on port 3001
Verify that SSH_PROXY_URL in .env.local points at it
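Both checks can be folded into one small probe against the proxy's health endpoint. The sketch below assumes the `/ssh/health` path used by the curl command in Getting Help; `checkSshProxy` and `FetchLike` are hypothetical names, and the fetch function is passed in so the probe can be exercised without a live proxy.

```typescript
// Narrow view of fetch: only the fields this probe reads.
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number }>;

// Hypothetical probe: reports whether the SSH proxy answers its health route.
async function checkSshProxy(baseUrl: string, fetchImpl: FetchLike): Promise<string> {
  const url = new URL("/ssh/health", baseUrl).toString();
  try {
    const res = await fetchImpl(url);
    return res.ok
      ? `SSH proxy reachable at ${url}`
      : `SSH proxy responded with HTTP ${res.status} at ${url}`;
  } catch (err) {
    // Network-level failures (connection refused, DNS) end up here.
    const reason = err instanceof Error ? err.message : String(err);
    return `SSH proxy unreachable at ${url}: ${reason}`;
  }
}
```

On Node.js 18 or later, the built-in `fetch` can be passed directly as the second argument, with the value of SSH_PROXY_URL as the first.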

LangGraph Errors

Make sure OPENROUTER_API_KEY is set
Check console for specific errors
Try with a simpler model first (GPT-4o Mini)

Durable Object Errors

Ensure wrangler.jsonc declares the Durable Object binding (BANDIT_AGENT)
If the binding is still unavailable, run wrangler dev instead of pnpm dev; Durable Objects need the Workers runtime

Advanced Usage

Pause and Intervene

Click PAUSE during a run
Type manual commands in the terminal
Send the agent hints in the chat
Click RESUME to continue
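The pause and resume flow above can be read as a small state machine. The sketch below is one possible model of it, not the app's actual implementation: the status names, action names, and `nextStatus` are all hypothetical, chosen to mirror the START, PAUSE, and RESUME buttons.

```typescript
type RunStatus = "idle" | "running" | "paused" | "complete";
type RunAction = "START" | "PAUSE" | "RESUME" | "FINISH";

// Allowed transitions; anything not listed here is treated as a no-op.
const transitions: Record<RunStatus, Partial<Record<RunAction, RunStatus>>> = {
  idle: { START: "running" },
  running: { PAUSE: "paused", FINISH: "complete" },
  paused: { RESUME: "running" },
  complete: {},
};

// Hypothetical reducer: returns the next status, or the current one
// when the action is not valid in that state (e.g. PAUSE while idle).
function nextStatus(current: RunStatus, action: RunAction): RunStatus {
  return transitions[current][action] ?? current;
}
```

Keeping the table explicit makes it clear that manual commands and chat hints belong to the `paused` state only, and that a stray click cannot move a finished run back to `running`.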

Test Different Models

GPT-4o Mini    → Fast, cheap, good for testing
Claude 3 Haiku → Fast, accurate
GPT-4o         → Best reasoning
Claude 3.5     → Most capable (expensive)

Debug Mode

Watch the browser console for:

WebSocket events
LangGraph state transitions
Tool executions
Error details

Next Steps

✅ Get basic run working (Level 0-2)
📝 Deploy SSH proxy to production
🗄️ Set up D1 database for persistence
📦 Configure R2 for log storage
🚀 Deploy to Cloudflare Workers
🎯 Run full Bandit challenge (0-33)

Useful Commands

# Development
pnpm dev                    # Next.js dev server
wrangler dev                # Workers runtime with DO support

# Build
pnpm build                  # Production build
pnpm deploy                 # Deploy to Cloudflare

# Database
wrangler d1 create bandit-runs              # Create D1 database
wrangler d1 execute bandit-runs --file=schema.sql

# Secrets
wrangler secret put OPENROUTER_API_KEY
wrangler secret put ENCRYPTION_KEY

# Logs
wrangler tail                               # Live logs

Resources

Implementation Summary: IMPLEMENTATION-SUMMARY.md
SSH Proxy Guide: SSH-PROXY-README.md
Architecture Doc: docs/bandit-runner.md
System Prompt: docs/bandit/system-prompt.md

Getting Help

Check browser console for errors
Review IMPLEMENTATION-SUMMARY.md for architecture
Test SSH proxy separately: curl http://localhost:3001/ssh/health
Verify OpenRouter API key: https://openrouter.ai/activity

Success Metrics

You'll know it's working when:

✅ WebSocket shows "CONNECTED"
✅ Terminal shows agent commands
✅ Chat shows agent reasoning
✅ Level advances automatically
✅ No errors in console

Happy agent testing! 🎉
