nicholai e934d047b0 added example runs

2025-10-09 22:03:37 -06:00


LangGraph Agent Framework - Implementation Summary

Overview

This document summarizes a LangGraph.js-based agent framework implemented for the Bandit Runner application. The framework is designed to run entirely in Cloudflare Durable Objects and provides a retro terminal UI for interacting with autonomous agents that solve the OverTheWire Bandit wargame. Several integration steps, including real SSH tool execution, are still outstanding and are listed under "What Still Needs to Be Done".

✅ What Was Implemented

1. Core Backend Components

State Management (`src/lib/agents/bandit-state.ts`)

Comprehensive TypeScript interfaces for agent state
Level goals for all 34 Bandit levels
Command and thought log tracking
Checkpoint system for pause/resume functionality
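
As a rough illustration of the state and checkpoint ideas listed above, the following sketch shows one possible shape. The type names, field names, and the `createCheckpoint`/`restoreCheckpoint` helpers are assumptions made for this example; the actual definitions in `bandit-state.ts` may differ.

```typescript
// Hypothetical shapes for illustration; not copied from bandit-state.ts.
type RunStatus = "idle" | "running" | "paused" | "complete" | "failed";

interface CommandLogEntry {
  command: string;
  output: string;
  level: number;
  timestamp: string; // ISO 8601 string
}

interface BanditAgentState {
  runId: string;
  status: RunStatus;
  currentLevel: number;
  targetLevel: number;
  commandLog: CommandLogEntry[];
  thoughts: string[];
}

// A checkpoint here is just a serialized snapshot of the state,
// so it can be written to any string-based store.
function createCheckpoint(state: BanditAgentState): string {
  return JSON.stringify(state);
}

// Restoring parses the snapshot and marks the run as running again,
// which models "resume from checkpoint".
function restoreCheckpoint(serialized: string): BanditAgentState {
  const parsed = JSON.parse(serialized) as BanditAgentState;
  return { ...parsed, status: "running" };
}
```

Serializing to a string keeps the sketch independent of any particular storage API; a real implementation could equally store the object directly if the storage layer supports structured values.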

LLM Provider Layer (`src/lib/agents/llm-provider.ts`)

OpenRouter integration supporting multiple models:
- OpenAI (GPT-4o, GPT-4o Mini)
- Anthropic (Claude 3.5 Sonnet, Claude 3 Haiku)
- Meta (Llama 3.1)
- DeepSeek, Gemini, Mistral, and more
Streaming and non-streaming response modes
Abstraction layer for easy provider switching

SSH Tool Wrappers (`src/lib/agents/tools.ts`)

ssh_connect - Establish SSH connections
ssh_exec - Execute commands with safety allowlist
validate_password - Test passwords via SSH
ssh_disconnect - Close connections
Command validation and security checks
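
The safety allowlist can be pictured as a small validation function that runs before any command reaches the SSH proxy. The sketch below is an assumption about how such a check might look: the allowlist contents, the `validateCommand` name, and the rejected metacharacters are all illustrative and are not taken from `tools.ts`.

```typescript
// Hypothetical allowlist; the real list in tools.ts may be longer or different.
const ALLOWED_COMMANDS = new Set([
  "ls", "cat", "cd", "pwd", "file", "find", "grep",
  "sort", "uniq", "strings", "base64", "tr", "xxd",
]);

// Metacharacters that could chain, background, or substitute extra commands.
// A single pipe is permitted and handled stage by stage below.
const FORBIDDEN_PATTERN = /[;&`$<>\n]|\|\|/;

interface ValidationResult {
  allowed: boolean;
  reason?: string;
}

function validateCommand(command: string): ValidationResult {
  const trimmed = command.trim();
  if (trimmed.length === 0) {
    return { allowed: false, reason: "empty command" };
  }
  if (FORBIDDEN_PATTERN.test(trimmed)) {
    return { allowed: false, reason: "forbidden shell metacharacter" };
  }
  // Check the program name of every stage in a pipeline.
  for (const stage of trimmed.split("|")) {
    const program = stage.trim().split(/\s+/)[0] ?? "";
    if (!ALLOWED_COMMANDS.has(program)) {
      return { allowed: false, reason: `command not allowed: ${program}` };
    }
  }
  return { allowed: true };
}
```

Checking every pipeline stage, rather than only the first word, prevents an allowed command from being used as a prefix for a disallowed one.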

LangGraph State Machine (`src/lib/agents/graph.ts`)

State graph with nodes:
- plan_level - LLM plans next command
- execute_command - Runs SSH command
- validate_result - Checks for password
- advance_level - Moves to next level
Conditional edges based on agent status
Integration with LangChain tools
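
The conditional edges can be thought of as pure routing functions that inspect the state and name the next node. The sketch below uses plain TypeScript rather than the LangGraph.js API so that it stays self-contained; the `GraphState` fields, the status values, and both routing functions are assumptions for illustration, while the node names come from the list above.

```typescript
type NodeName = "plan_level" | "execute_command" | "validate_result" | "advance_level";
type NextStep = NodeName | "END";

// Hypothetical subset of the agent state that routing depends on.
interface GraphState {
  status: "running" | "paused" | "complete" | "failed";
  currentLevel: number;
  targetLevel: number;
  passwordFound: boolean;
}

// Conditional edge after validate_result:
// stop if the run was paused or failed, advance if a password was found,
// otherwise go back to planning another command.
function routeAfterValidation(state: GraphState): NextStep {
  if (state.status === "paused" || state.status === "failed") return "END";
  return state.passwordFound ? "advance_level" : "plan_level";
}

// Conditional edge after advance_level:
// finish once the target level is reached, otherwise plan the next level.
function routeAfterAdvance(state: GraphState): NextStep {
  return state.currentLevel >= state.targetLevel ? "END" : "plan_level";
}
```

In LangGraph.js, functions of this shape are the kind of callback that `addConditionalEdges` accepts, with the library's `END` constant taking the place of the `"END"` string used here.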

Error Handling (`src/lib/agents/error-handler.ts`)

Error classification (network, SSH, timeout, API)
Retry strategies with exponential backoff
Cost tracking for LLM API calls
Spending limit enforcement
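
To make the classification and retry strategy concrete, here is a minimal sketch. The matching rules, the default delays, and the choice of which error kinds are retryable are assumptions for this example and may not match `error-handler.ts`.

```typescript
type ErrorKind = "network" | "ssh" | "timeout" | "api" | "unknown";

// Very rough classification based on message text (illustrative only).
function classifyError(message: string): ErrorKind {
  const text = message.toLowerCase();
  if (text.includes("timed out") || text.includes("timeout")) return "timeout";
  if (text.includes("ssh") || text.includes("permission denied")) return "ssh";
  if (text.includes("429") || text.includes("rate limit")) return "api";
  if (text.includes("econnreset") || text.includes("fetch failed") || text.includes("network")) {
    return "network";
  }
  return "unknown";
}

// Exponential backoff: baseMs, 2*baseMs, 4*baseMs, ... capped at maxMs.
// attempt is zero-based.
function backoffDelayMs(attempt: number, baseMs = 1000, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Assumption: only transient failures are retried; SSH errors such as a
// wrong password are treated as non-retryable.
function shouldRetry(kind: ErrorKind, attempt: number, maxAttempts = 3): boolean {
  const retryable = kind === "network" || kind === "timeout" || kind === "api";
  return retryable && attempt < maxAttempts;
}
```

A caller would typically classify the error, check `shouldRetry`, and wait `backoffDelayMs(attempt)` before the next attempt. Adding random jitter to the delay is a common refinement that is left out here for clarity.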

Storage Layer (`src/lib/storage/run-storage.ts`)

Durable Object storage interface
D1 database schema for run metadata
R2 storage for JSONL logs
Password vault with encryption
Data lifecycle management (DO → D1 → R2)
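
JSONL stores one JSON object per line, which suits append-only run logs. The sketch below shows the serialization step only; the `LogRecord` fields and both helper names are assumptions for illustration, and the actual upload to R2 is out of scope here.

```typescript
// Hypothetical log record; the real schema in run-storage.ts may differ.
interface LogRecord {
  runId: string;
  level: number;
  type: string;
  content: string;
  timestamp: string;
}

// Serialize records as JSONL: one JSON document per line, newline-terminated.
function toJsonl(records: LogRecord[]): string {
  if (records.length === 0) return "";
  return records.map((record) => JSON.stringify(record)).join("\n") + "\n";
}

// Parse JSONL back into records, skipping blank lines.
function parseJsonl(text: string): LogRecord[] {
  return text
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as LogRecord);
}
```

Because `JSON.stringify` escapes newlines inside string values, multi-line command output still occupies a single line in the log.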

2. Durable Object Implementation

BanditAgentDO (`src/lib/durable-objects/BanditAgentDO.ts`)

Runs LangGraph state machine
WebSocket server for real-time streaming
HTTP endpoints for agent control:
- /start - Start new run
- /pause - Pause execution
- /resume - Resume from checkpoint
- /command - Manual command injection
- /retry - Retry current level
- /status - Get current state
Alarm-based auto-cleanup after 2 hours
State persistence in DO storage
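
One way to picture the control endpoints is as a lookup from request method and path to an agent action. The sketch below is an assumption about how that dispatch could be organized: the `ROUTES` table, the `resolveAction` helper, and the method assigned to each path are illustrative and are not taken from `BanditAgentDO.ts`.

```typescript
type AgentAction = "start" | "pause" | "resume" | "command" | "retry" | "status";

interface RouteEntry {
  method: "GET" | "POST";
  action: AgentAction;
}

// Assumed mapping: state-changing actions use POST, status uses GET.
const ROUTES: Record<string, RouteEntry> = {
  "/start": { method: "POST", action: "start" },
  "/pause": { method: "POST", action: "pause" },
  "/resume": { method: "POST", action: "resume" },
  "/command": { method: "POST", action: "command" },
  "/retry": { method: "POST", action: "retry" },
  "/status": { method: "GET", action: "status" },
};

// Returns the action for a request, or null when the path or method
// does not match a known endpoint.
function resolveAction(method: string, url: string): AgentAction | null {
  const { pathname } = new URL(url);
  const route = ROUTES[pathname];
  if (!route || route.method !== method.toUpperCase()) return null;
  return route.action;
}
```

A Durable Object's `fetch` handler could call a helper like this first and return a 404 or 405 response when it yields `null`, keeping the per-action logic separate from URL parsing.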

3. API Routes

Agent Lifecycle (`src/app/api/agent/[runId]/route.ts`)

POST endpoints for all agent actions
GET endpoint for status queries
Durable Object proxy layer
Error handling and validation

WebSocket Route (`src/app/api/agent/[runId]/ws/route.ts`)

WebSocket upgrade handling
Bidirectional communication with Durable Object
Real-time event streaming

4. Frontend Components

WebSocket Hook (`src/hooks/useAgentWebSocket.ts`)

React hook for WebSocket management
Auto-reconnect with exponential backoff
Event handlers for terminal and chat updates
Connection state tracking
Ping/pong keep-alive

Agent Control Panel (`src/components/agent-control-panel.tsx`)

Model selection dropdown
Level range selector (0-33)
Streaming mode toggle
Start/Pause/Resume/Stop buttons
Status indicators (idle/running/paused/complete/failed)
Connection status display

Enhanced Terminal Interface (`src/components/terminal-chat-interface.tsx`)

Integrated with WebSocket for real-time updates
Split-pane layout (terminal left, agent chat right)
Command history with arrow keys
Panel switching (Ctrl+K/J, ESC)
Support for manual intervention when paused
Beautiful retro styling with scan lines and grid patterns
System messages for agent events
Thinking indicators

5. WebSocket Event System

Event Handlers (`src/lib/websocket/agent-events.ts`)

Standardized event types:
- terminal_output - Command execution
- agent_message - Agent commentary
- thinking - Agent reasoning
- tool_call - Tool execution
- level_complete - Level advancement
- run_complete - Full run completion
- error - Error messages
Event routing to terminal and chat displays
Timestamp and metadata tracking
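
The event types above lend themselves to a string-literal union plus a routing function. In the sketch below the event type names come from the list, while the `AgentEvent` fields and the decision about which panel shows which event are assumptions; `agent-events.ts` may route differently.

```typescript
type AgentEventType =
  | "terminal_output"
  | "agent_message"
  | "thinking"
  | "tool_call"
  | "level_complete"
  | "run_complete"
  | "error";

// Hypothetical event envelope.
interface AgentEvent {
  type: AgentEventType;
  content: string;
  timestamp: string;
  metadata?: Record<string, unknown>;
}

type Panel = "terminal" | "chat";

// Decide which UI panels should display an event (assumed mapping).
function routeEvent(event: AgentEvent): Panel[] {
  switch (event.type) {
    case "terminal_output":
    case "tool_call":
      return ["terminal"];
    case "agent_message":
    case "thinking":
      return ["chat"];
    case "level_complete":
    case "run_complete":
    case "error":
      return ["terminal", "chat"];
  }
}
```

Because the switch covers every member of the union, the TypeScript compiler will flag `routeEvent` if a new event type is added without a routing decision.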

6. Configuration

Wrangler (`wrangler.jsonc`)

Durable Object bindings configured
Environment variables for SSH proxy
Placeholders for D1 and R2 (ready to uncomment)
Secret management instructions

TypeScript (`src/types/env.d.ts`)

Environment type declarations
Cloudflare binding types
Type safety for all env variables

7. Dependencies Installed

{
  "@langchain/langgraph": "latest",
  "@langchain/core": "latest",
  "@langchain/openai": "latest",
  "zod": "latest"
}

🚧 What Still Needs to Be Done

1. SSH Proxy Service (CRITICAL)

Build the Node.js SSH proxy (see SSH-PROXY-README.md)
Deploy to Fly.io/Railway/Render
Update SSH_PROXY_URL in wrangler.jsonc

2. Durable Object Export

Export BanditAgentDO in worker entry point
Configure migration tag for DO deployment

3. LangGraph Integration Refinement

Test graph execution in Workers environment
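May need to use the @langchain/langgraph/web entry point, which avoids the Node.js async_hooks dependency of the default entry point
Pass config manually into nested calls inside graph nodes, since automatic config propagation relies on async_hooks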
Integrate actual tool execution (currently mocked)

4. D1 and R2 Setup

Create D1 database: wrangler d1 create bandit-runs
Run schema migrations
Create R2 bucket: wrangler r2 bucket create bandit-logs
Uncomment bindings in wrangler.jsonc

5. Secrets Configuration

wrangler secret put OPENROUTER_API_KEY
wrangler secret put ENCRYPTION_KEY

6. Testing

Unit tests for graph nodes
Integration tests for WebSocket
Mock SSH proxy for development
Load testing for concurrent runs

7. Advanced Features (Future)

Run history and comparison UI
Export functionality (JSONL, CSV)
Keyboard shortcuts reference modal
Run templates and presets
Cost analytics dashboard
Multi-run leaderboard

📝 Key Files Created

bandit-runner-app/src/
├── lib/
│   ├── agents/
│   │   ├── bandit-state.ts        (State schema, level goals)
│   │   ├── llm-provider.ts        (OpenRouter integration)
│   │   ├── tools.ts               (SSH tool wrappers)
│   │   ├── graph.ts               (LangGraph state machine)
│   │   └── error-handler.ts       (Retry logic, cost tracking)
│   ├── durable-objects/
│   │   └── BanditAgentDO.ts       (Durable Object implementation)
│   ├── storage/
│   │   └── run-storage.ts         (DO/D1/R2 storage layer)
│   └── websocket/
│       └── agent-events.ts        (Event handlers)
├── app/api/agent/[runId]/
│   ├── route.ts                   (HTTP API routes)
│   └── ws/
│       └── route.ts               (WebSocket route)
├── components/
│   ├── agent-control-panel.tsx    (Control panel UI)
│   └── terminal-chat-interface.tsx (Enhanced terminal)
├── hooks/
│   └── useAgentWebSocket.ts       (WebSocket React hook)
└── types/
    └── env.d.ts                   (Environment types)

🎯 How to Use

1. Local Development

cd bandit-runner-app
pnpm install
pnpm dev

2. Configure SSH Proxy

Build and deploy the SSH proxy service (see SSH-PROXY-README.md), then point the app at the deployed URL:

export SSH_PROXY_URL=https://your-proxy.fly.dev

3. Set API Key

export OPENROUTER_API_KEY=sk-or-...

4. Start a Run

Open http://localhost:3000
Select a model (e.g., GPT-4o Mini)
Choose level range (e.g., 0-5)
Click START
Watch the agent work in real time

5. Manual Intervention

Click PAUSE to stop the agent
Type commands in the terminal (left pane)
Message the agent in chat (right pane)
Click RESUME to continue

🏗️ Architecture Highlights

Execution Flow

User clicks START
    ↓
POST /api/agent/{runId}/start
    ↓
Durable Object spawned/retrieved
    ↓
LangGraph state machine initialized
    ↓
WebSocket connection established
    ↓
Graph executes: plan → execute → validate → advance
    ↓
Events streamed to UI in real-time
    ↓
State checkpointed in DO storage
    ↓
On completion: metadata saved to D1, logs to R2

WebSocket Event Flow

Durable Object
    ↓ (WebSocket)
API Route /ws
    ↓
useAgentWebSocket hook
    ↓ (handleAgentEvent)
Terminal/Chat UI updates

State Persistence

Active State: Durable Object (in-memory)
Checkpoints: Durable Object storage
Metadata: D1 Database (when configured)
Logs: R2 Bucket (when configured)

🎨 UI Features

Retro Terminal Aesthetic: Scan lines, grid patterns, CRT-style
Dual Panels: Terminal (left) + Agent Chat (right)
Real-time Updates: WebSocket streaming
Status Indicators: Connection, run state, level progress
Model Selection: 10+ LLM models via OpenRouter
Manual Control: Pause, resume, manual commands
Keyboard Navigation: Ctrl+K/J panel switching, arrow keys for history

🔐 Security

SSH target hardcoded to bandit.labs.overthewire.org:2220
Command allowlist enforcement
Password redaction in logs
R2 encryption for sensitive data
Rate limiting (to be implemented)
Automatic cleanup of stale runs
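
Password redaction can be sketched as a filter applied to each log line before it is stored or streamed. The example assumes that level passwords look like 32-character alphanumeric tokens and that already-known passwords can be passed in explicitly; both the pattern and the `redactSecrets` helper are illustrative rather than the project's actual implementation.

```typescript
// Assumption: passwords appear as standalone 32-character alphanumeric tokens.
const PASSWORD_LIKE = /\b[A-Za-z0-9]{32}\b/g;

const REDACTION_MARKER = "[REDACTED]";

// Replace known passwords first, then anything that merely looks like one.
function redactSecrets(line: string, knownPasswords: string[] = []): string {
  let redacted = line;
  for (const password of knownPasswords) {
    if (password.length > 0) {
      // split/join replaces every occurrence without regex escaping concerns.
      redacted = redacted.split(password).join(REDACTION_MARKER);
    }
  }
  return redacted.replace(PASSWORD_LIKE, REDACTION_MARKER);
}
```

A pattern-based filter can also hide harmless strings that happen to be 32 alphanumeric characters long, such as MD5 hashes, so a stricter deployment might rely on the known-password list alone.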

📊 Next Steps

Deploy SSH Proxy - Build from SSH-PROXY-README.md
Test Integration - Run end-to-end test with a simple level
Refine LangGraph - Ensure Workers compatibility
Add D1/R2 - Set up persistent storage
Production Deploy - Deploy to Cloudflare Workers
Monitor & Iterate - Track performance, costs, success rates

🎉 What's Amazing

LangGraph.js state machine hosted in Cloudflare Durable Objects (Workers compatibility still to be verified)
Multi-LLM Support via OpenRouter (10+ models)
Beautiful UI with retro terminal aesthetic
Real-time Streaming via WebSocket
Pause/Resume with state checkpointing
Manual Intervention for debugging
Extensible architecture for future features

🙏 Acknowledgments

Built on Next.js, OpenNext, Cloudflare Workers
Powered by LangGraph.js and LangChain
UI components from shadcn/ui
Inspired by the OverTheWire Bandit wargame
