Testing Guide - Bandit Runner LangGraph Agent
✅ Current Status
What's Working
- ✅ Build successful - no TypeScript errors
- ✅ Dev server starts on port 3002
- ✅ SSH proxy running on port 3001
- ✅ All components installed and configured
- ✅ Beautiful UI fully functional
- ✅ WebSocket infrastructure ready
What Needs Configuration
- ⚠️ OpenRouter API key (required for LLM)
- ⚠️ Durable Object export (works in production, limited in dev)
🚀 Quick Start Testing
1. Set Your OpenRouter API Key
Edit .dev.vars:
OPENROUTER_API_KEY=sk-or-v1-YOUR-ACTUAL-KEY-HERE
Get a key from: https://openrouter.ai/keys
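A quick sanity check on the key can catch the most common setup mistake (leaving the placeholder in place) before any run is started. The sketch below is a minimal, hypothetical helper and not part of the project: `parseDevVars` and `checkOpenRouterKey` are invented names, and the `sk-or-v1-` prefix test is an assumption based only on the example value above.

```typescript
// Parses KEY=VALUE lines such as those in .dev.vars (hypothetical helper).
function parseDevVars(text: string): Record<string, string> {
  const vars: Record<string, string> = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // no "=" at all, or nothing before it
    vars[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return vars;
}

// Returns a problem description, or null when the key looks plausible.
function checkOpenRouterKey(vars: Record<string, string>): string | null {
  const key = vars["OPENROUTER_API_KEY"];
  if (key === undefined || key === "") return "OPENROUTER_API_KEY is missing";
  if (key.includes("YOUR-ACTUAL-KEY-HERE")) {
    return "OPENROUTER_API_KEY still holds the placeholder";
  }
  // Assumption: real keys share the sk-or-v1- prefix shown in the example.
  if (!key.startsWith("sk-or-v1-")) return "OPENROUTER_API_KEY has an unexpected prefix";
  return null;
}
```

This only checks the shape of the value; whether the key is actually valid and funded can only be confirmed by a real request to OpenRouter.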
2. Start the Application
cd bandit-runner-app
pnpm dev
The server starts on http://localhost:3002 (port 3000 was already in use).
3. Test the UI
What You'll See:
- Beautiful retro terminal interface with control panel
- Model selection dropdown (GPT-4o, Claude, etc.)
- Level range selector (0-33)
- START/PAUSE/RESUME buttons
- Connection status indicators
Try These Actions:
- Select a model - Choose "GPT-4o Mini" (cheapest for testing)
- Set level range - Start with 0-2 (quick test)
- Click START - This will attempt to create a run
4. Expected Behavior (Current State)
⚠️ Known Limitation:
The Durable Object binding doesn't work in local dev mode (next dev). You'll see:
POST /api/agent/run-xxx/start - 500 (Durable Object binding not found)
This is expected! The warning message tells us:
"internal Durable Objects... will not work in local development, but they should work in production"
5. Testing Options
Option A: Test UI Without Backend (Current)
- UI works perfectly
- Control panel functional
- Model selection works
- WebSocket connection is attempted (fails gracefully)
- You can type commands and messages in the interface
Option B: Use Wrangler Dev (Full Testing)
# Install wrangler globally if needed
npm i -g wrangler
# Run with Workers runtime
wrangler dev
# This gives you:
# ✅ Full Durable Object support
# ✅ Real WebSocket connections
# ✅ Actual agent runs
Option C: Deploy to Cloudflare (Production Testing)
# Build
pnpm build
# Deploy
wrangler deploy
# Test on:
# https://bandit-runner-app.your-account.workers.dev
🧪 Manual Testing Checklist
UI Testing (Works Now)
- Control panel displays correctly
- Model dropdown shows all options
- Level selectors work (0-33)
- Streaming mode toggle functional
- START button enabled when idle
- Status indicators show correct state
- Terminal panel renders
- Agent chat panel renders
- Command input accepts text
- Chat input accepts text
- Keyboard shortcuts work (Ctrl+K/J, ESC, arrow keys)
- Theme toggle works
- Retro styling (scan lines, grid) visible
Backend Testing (Requires Wrangler Dev)
- Start run creates Durable Object
- WebSocket connection established
- Agent begins planning
- SSH commands execute via proxy
- Terminal shows command output
- Chat shows agent thoughts
- Pause button stops execution
- Resume button continues
- Manual commands work when paused
- Level advancement works
- Run completes successfully
- Error handling works
- Retry logic functions
SSH Proxy Integration
Test your SSH proxy directly:
# Test connection
curl -X POST http://localhost:3001/ssh/connect \
-H "Content-Type: application/json" \
-d '{
"host":"bandit.labs.overthewire.org",
"port":2220,
"username":"bandit0",
"password":"bandit0"
}'
# Should return:
# {"connectionId":"conn-xxx","success":true,"message":"Connected successfully"}
# Test command execution
curl -X POST http://localhost:3001/ssh/exec \
-H "Content-Type: application/json" \
-d '{
"connectionId":"conn-xxx",
"command":"cat readme"
}'
# Should return:
# {"output":"boJ9jbbUNNfktd78OOpsqOltutMc3MY1\n","exitCode":0,"success":true}
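The same two calls can be scripted from TypeScript. The sketch below only builds the requests and validates the exec response, so it stays free of network access. The endpoint paths and field names are taken from the curl examples above, while `ProxyRequest`, `buildConnectRequest`, `buildExecRequest`, and `parseExecResult` are hypothetical names introduced for illustration.

```typescript
interface ProxyRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

interface ExecResult {
  output: string;
  exitCode: number;
  success: boolean;
}

// Wraps a payload as a JSON POST, mirroring the curl flags used above.
function jsonPost(url: string, payload: unknown): ProxyRequest {
  return {
    url,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}

// Connection request for Bandit level 0 (same credentials as the curl example).
function buildConnectRequest(baseUrl: string): ProxyRequest {
  return jsonPost(`${baseUrl}/ssh/connect`, {
    host: "bandit.labs.overthewire.org",
    port: 2220,
    username: "bandit0",
    password: "bandit0",
  });
}

function buildExecRequest(baseUrl: string, connectionId: string, command: string): ProxyRequest {
  return jsonPost(`${baseUrl}/ssh/exec`, { connectionId, command });
}

// Checks the /ssh/exec response shape before the caller relies on it.
function parseExecResult(raw: unknown): ExecResult {
  if (typeof raw !== "object" || raw === null) throw new Error("exec response is not an object");
  const fields = raw as Record<string, unknown>;
  const output = fields["output"];
  const exitCode = fields["exitCode"];
  const success = fields["success"];
  if (typeof output !== "string" || typeof exitCode !== "number" || typeof success !== "boolean") {
    throw new Error("exec response has unexpected fields");
  }
  return { output, exitCode, success };
}
```

Each `ProxyRequest` can be handed to `fetch(req.url, req)`, and the JSON body of the reply passed through `parseExecResult`. Keeping the builders pure makes them easy to check without the proxy running.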
🐛 Known Issues & Workarounds
Issue 1: Durable Object Not Found (Local Dev)
Error:
Durable Object binding not found
Cause: next dev uses the standard Node.js runtime, not the Workers runtime
Solutions:
- Use wrangler dev instead of pnpm dev
- Deploy to Cloudflare for full testing
- Test UI functionality only in local dev
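One way to make this failure easier to recognise is to probe for the binding before using it and return a descriptive error. The following is a sketch under stated assumptions, not the project's actual route code: `AGENT_RUN` is a hypothetical binding name and `checkDurableObjectBinding` is an invented helper. The only real API it relies on is that a Durable Object namespace exposes an `idFromName()` method.

```typescript
// Result of probing the environment for a Durable Object namespace.
interface BindingCheck {
  ok: boolean;
  status: number;
  message: string;
}

// bindingName is whatever name the project configures; AGENT_RUN in the
// usage below is a hypothetical placeholder.
function checkDurableObjectBinding(
  env: Record<string, unknown>,
  bindingName: string,
): BindingCheck {
  const binding = env[bindingName];
  // A usable namespace is an object with an idFromName() method.
  const usable =
    typeof binding === "object" &&
    binding !== null &&
    typeof (binding as { idFromName?: unknown }).idFromName === "function";
  if (usable) {
    return { ok: true, status: 200, message: "Durable Object binding available" };
  }
  return {
    ok: false,
    status: 500,
    message: `Durable Object binding not found (${bindingName}); use wrangler dev or deploy`,
  };
}
```

A route handler could call `checkDurableObjectBinding(env, "AGENT_RUN")` first and return the message and status directly when `ok` is false, so the 500 in local dev points straight at the fix.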
Issue 2: WebSocket Connection Failed
Error:
WebSocket connection error
connectionState: 'error'
Cause: Durable Object not available in local dev
Solution: Use wrangler dev or deploy to production
Issue 3: OpenRouter API Errors
Error:
401 Unauthorized / Invalid API key
Solution:
- Check .dev.vars has correct API key
- Verify key at https://openrouter.ai/activity
- Ensure key has credits
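When surfacing these errors in the UI or logs, it can help to translate the HTTP status into the matching fix. The mapping below is a rough sketch: only the 401 case comes from this guide, and the 402 (insufficient credits) and 429 (rate limit) cases are assumptions that should be checked against OpenRouter's error documentation. `explainOpenRouterStatus` is a hypothetical helper name.

```typescript
// Maps an HTTP status from the LLM call to a troubleshooting hint.
function explainOpenRouterStatus(status: number): string {
  switch (status) {
    case 401:
      return "Invalid or missing API key: check OPENROUTER_API_KEY in .dev.vars";
    case 402:
      // Assumed meaning: the account or key has run out of credits.
      return "Insufficient credits: add credits or pick a cheaper model";
    case 429:
      // Assumed meaning: too many requests in a short window.
      return "Rate limited: wait and retry";
    default:
      return status >= 500
        ? "Upstream or provider error: retry later"
        : `Unexpected status ${status}`;
  }
}
```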
📊 Test Scenarios
Scenario 1: Simple Level Test (0-1)
Setup:
- Model: GPT-4o Mini
- Levels: 0 to 1
- Max retries: 3
Expected:
- Agent connects as bandit0
- Executes ls -la
- Finds readme file
- Executes cat readme
- Extracts password: boJ9jbbUNNfktd78OOpsqOltutMc3MY1
- Validates password
- Advances to level 1
- Completes successfully
Duration: ~30 seconds
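The password extraction step in this scenario can be approximated with a simple pattern match, which is handy for asserting on proxy output in a test. This is an illustrative sketch, not the agent's actual logic: `extractPassword` is a hypothetical name, and the rule "a standalone run of exactly 32 letters and digits" is an assumption inferred from the sample password above.

```typescript
// Returns the first standalone 32-character alphanumeric token, or null.
function extractPassword(output: string): string | null {
  // \b on both sides rejects longer alphanumeric runs that merely contain 32 characters.
  const match = output.match(/\b[A-Za-z0-9]{32}\b/);
  return match ? match[0] : null;
}
```

For example, the `cat readme` output from the SSH proxy section yields the password itself, while a directory listing from `ls -la` yields `null`.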
Scenario 2: Multi-Level Test (0-5)
Setup:
- Model: Claude 3 Haiku or GPT-4o
- Levels: 0 to 5
- Max retries: 3
Expected:
- Each level solved systematically
- SSH connections maintained
- Checkpoints saved
- Total time: ~3-5 minutes
Scenario 3: Pause/Resume Test
Setup:
- Model: Any
- Levels: 0 to 3
- Pause after level 1
Expected:
- Start run
- Complete level 0-1
- Click PAUSE
- Type manual command: pwd
- See output in terminal
- Click RESUME
- Agent continues from level 1
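The button behaviour exercised here can be thought of as a small state machine, which also gives a compact way to decide when manual commands are allowed. The sketch below is a hypothetical model, not the app's real store: the state and action names are assumptions chosen to mirror the START/PAUSE/RESUME buttons.

```typescript
type RunState = "idle" | "running" | "paused" | "complete";
type RunAction = "START" | "PAUSE" | "RESUME" | "FINISH";

// Applies an action; transitions that make no sense leave the state unchanged.
function nextRunState(state: RunState, action: RunAction): RunState {
  switch (action) {
    case "START":
      return state === "idle" ? "running" : state;
    case "PAUSE":
      return state === "running" ? "paused" : state;
    case "RESUME":
      return state === "paused" ? "running" : state;
    case "FINISH":
      return state === "running" ? "complete" : state;
    default:
      return state;
  }
}

// Manual commands are only accepted while the agent is paused.
function canSendManualCommand(state: RunState): boolean {
  return state === "paused";
}
```

Walking the scenario through this model gives idle, running, paused (manual `pwd` allowed), then running again after RESUME.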
Scenario 4: Error Recovery Test
Setup:
- Model: GPT-4o Mini
- Levels: 0 to 10
- Intentionally disconnect SSH mid-run
Expected:
- Agent detects error
- Retry logic kicks in
- Re-establishes connection
- Continues execution
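The retry behaviour expected in this scenario could look roughly like capped exponential backoff. This is a sketch of that general technique and an assumption about how retries might be spaced; the guide does not say which strategy the agent actually uses. `backoffDelays` and `withRetry` are hypothetical names, and the sleep function is injected so the logic does not depend on any particular timer API.

```typescript
// Delay before each retry: baseMs, 2*baseMs, 4*baseMs, ... capped at capMs.
function backoffDelays(maxRetries: number, baseMs: number, capMs: number): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(Math.min(capMs, baseMs * 2 ** attempt));
  }
  return delays;
}

// Runs op once, then retries after each delay; rethrows the last error
// when every attempt has failed.
async function withRetry<T>(
  op: () => Promise<T>,
  delays: number[],
  sleep: (ms: number) => Promise<void>,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= delays.length; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      const delay = delays[attempt];
      if (delay === undefined) break; // retries exhausted
      await sleep(delay);
    }
  }
  throw lastError;
}
```

With the "Max retries: 3" setting used in the earlier scenarios, `backoffDelays(3, 500, 4000)` spaces the reconnect attempts at 500 ms, 1000 ms, and 2000 ms; the base and cap values here are illustrative only.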
📈 Success Criteria
Minimum Viable Test
- ✅ UI loads without errors
- ✅ SSH proxy connects to Bandit server
- ✅ Can start a run (even if it fails)
- ✅ WebSocket attempts connection
- ✅ Terminal displays messages
Full Integration Test
- ✅ Complete level 0-1 successfully
- ✅ Agent reasoning visible in chat
- ✅ Commands executed via SSH proxy
- ✅ Password validation works
- ✅ Level advancement automatic
- ✅ Pause/resume functional
- ✅ Manual intervention works
Production Ready
- ✅ Complete levels 0-10 reliably
- ✅ Error recovery working
- ✅ Cost tracking accurate
- ✅ Logs saved to R2 (when configured)
- ✅ Multiple concurrent runs supported
- ✅ All models work via OpenRouter
🔍 Debugging Tips
Check SSH Proxy Logs
# In your ssh-proxy terminal
# Should see connection requests
Check Browser Console
// Open DevTools (F12)
// Look for:
// - WebSocket connection attempts
// - API call results
// - Error messages
Check Network Tab
- API calls to /api/agent/[runId]/start
- WebSocket upgrade to /api/agent/[runId]/ws
- Response status codes
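To know exactly which URLs to look for in the Network tab, the two routes can be derived from the page origin and the run ID. The helper below is a small sketch: the `/api/agent/[runId]/start` and `/api/agent/[runId]/ws` paths come from this guide, while `agentEndpoints` and the `run-123` ID in the usage note are hypothetical.

```typescript
// Builds the start and WebSocket URLs for one run.
function agentEndpoints(origin: string, runId: string): { start: string; ws: string } {
  const id = encodeURIComponent(runId);
  // http -> ws and https -> wss, so the socket scheme matches the page.
  const wsOrigin = origin.replace(/^http/, "ws");
  return {
    start: `${origin}/api/agent/${id}/start`,
    ws: `${wsOrigin}/api/agent/${id}/ws`,
  };
}
```

For the local dev server, `agentEndpoints("http://localhost:3002", "run-123")` gives `http://localhost:3002/api/agent/run-123/start` and `ws://localhost:3002/api/agent/run-123/ws`.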
Check Wrangler Logs
# If using wrangler dev
# Ctrl+C to stop, logs show:
# - Durable Object creation
# - WebSocket messages
# - LangGraph execution
🎯 Next Steps
For Local Testing:
- ✅ SSH proxy running (you have this!)
- ✅ Set OpenRouter API key in .dev.vars
- ⏳ Switch to wrangler dev for full testing
- 🎉 Test complete run (level 0-2)
For Production:
- Create Cloudflare account
- Deploy with wrangler deploy
- Set secrets: wrangler secret put OPENROUTER_API_KEY
- Test on live URL
- Optional: Set up D1 and R2
🎨 Current UI Features You Can Test
Even without the backend, you can test:
- Theme toggle - Dark/light mode
- Panel switching - Ctrl+K/J or ESC
- Command history - Arrow up/down
- Model selection - All 10+ models listed
- Level range - Any combination 0-33
- Control buttons - START/PAUSE/RESUME visual states
- Status indicators - Connection and run state
- Retro effects - Scan lines, grid, CRT glow
- Responsive layout - Desktop and mobile
- Terminal styling - Monospace, colors, timestamps
- Chat formatting - User/agent message differentiation
📝 Test Results Template
## Test Run - [Date]
**Configuration:**
- Model: GPT-4o Mini
- Levels: 0-2
- Runtime: Wrangler Dev
**Results:**
- ✅ UI loaded correctly
- ✅ SSH proxy connected
- ✅ Agent started
- ✅ Level 0 completed (30s)
- ✅ Level 1 completed (45s)
- ❌ Level 2 failed (wrong command)
- Total time: 2m 15s
- Cost: $0.003
**Issues Found:**
- Agent confused by file with spaces in name
- Retry logic worked correctly
- Manual intervention successful
**Notes:**
- Claude 3 Haiku performed better on level 2
- Should increase timeout for decompression
🚀 Ready to Test!
You're all set! The implementation is complete. Start with UI testing, then move to wrangler dev for full integration testing.
Good luck! 🎉