LangGraph Agent Framework - Implementation Summary
Overview
This summary describes the LangGraph.js-based agent framework implemented for the Bandit Runner application. The agent loop runs in Cloudflare Durable Objects and drives a retro terminal UI for interacting with autonomous agents that solve the OverTheWire Bandit wargame. SSH access depends on a separate proxy service that still needs to be built (see "What Still Needs to Be Done").
✅ What Was Implemented
1. Core Backend Components
State Management (src/lib/agents/bandit-state.ts)
- Comprehensive TypeScript interfaces for agent state
- Level goals for all 34 Bandit levels
- Command and thought log tracking
- Checkpoint system for pause/resume functionality
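As a rough illustration of the state schema and checkpointing described above, the sketch below models a run as a plain serializable object. The field names (`currentLevel`, `commandLog`, `thoughtLog`, and so on) and the two helper functions are assumptions for illustration, not the contents of bandit-state.ts; only the status values come from this summary.

```typescript
// Hypothetical state shape; field names are illustrative, not taken from bandit-state.ts.
type AgentStatus = "idle" | "running" | "paused" | "complete" | "failed";

interface CommandLogEntry {
  command: string;
  output: string;
  timestamp: number;
}

interface BanditAgentState {
  runId: string;
  currentLevel: number;
  targetLevel: number;
  status: AgentStatus;
  commandLog: CommandLogEntry[];
  thoughtLog: string[];
}

// A checkpoint here is a JSON snapshot of the state that a paused run can resume from.
function createCheckpoint(state: BanditAgentState): string {
  return JSON.stringify(state);
}

// Restores a snapshot; a real implementation would validate the parsed value (for example with zod).
function restoreCheckpoint(serialized: string): BanditAgentState {
  return JSON.parse(serialized) as BanditAgentState;
}
```

Keeping the state JSON-serializable is what makes pause/resume straightforward, since the same snapshot can be written to Durable Object storage and read back later.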
LLM Provider Layer (src/lib/agents/llm-provider.ts)
- OpenRouter integration supporting multiple models:
  - OpenAI (GPT-4o, GPT-4o Mini)
  - Anthropic (Claude 3.5 Sonnet, Claude 3 Haiku)
  - Meta (Llama 3.1)
  - DeepSeek, Gemini, Mistral, and more
- Streaming and non-streaming response modes
- Abstraction layer for easy provider switching
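To make the provider layer more concrete, here is a minimal sketch of building an OpenRouter chat request from a short model key. OpenRouter exposes an OpenAI-compatible chat completions endpoint; the `MODEL_IDS` map, the `buildChatRequest` helper, and the specific model slugs are assumptions for illustration and should be checked against the OpenRouter model list rather than read as the contents of llm-provider.ts.

```typescript
// Hypothetical model registry; verify slugs against OpenRouter's model list.
const MODEL_IDS = {
  "gpt-4o": "openai/gpt-4o",
  "gpt-4o-mini": "openai/gpt-4o-mini",
  "claude-3.5-sonnet": "anthropic/claude-3.5-sonnet",
  "claude-3-haiku": "anthropic/claude-3-haiku",
} as const;

type ModelKey = keyof typeof MODEL_IDS;

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Builds the request without sending it, so the caller decides how to fetch
// and whether to read the response as a stream.
function buildChatRequest(
  model: ModelKey,
  messages: ChatMessage[],
  apiKey: string,
  stream: boolean,
): ChatRequest {
  return {
    url: "https://openrouter.ai/api/v1/chat/completions",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: MODEL_IDS[model], messages, stream }),
  };
}
```

Switching providers then amounts to changing the model key, which is one plausible reading of the "easy provider switching" goal above.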
SSH Tool Wrappers (src/lib/agents/tools.ts)
- ssh_connect - Establish SSH connections
- ssh_exec - Execute commands with safety allowlist
- validate_password - Test passwords via SSH
- ssh_disconnect - Close connections
- Command validation and security checks
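The command allowlist mentioned above could be enforced along the following lines. This is a sketch under stated assumptions: the `ALLOWED_COMMANDS` set, the forbidden-character pattern, and the `validateCommand` helper are hypothetical and do not reproduce the actual rules in tools.ts.

```typescript
// Hypothetical allowlist; the real list in tools.ts may differ.
const ALLOWED_COMMANDS = new Set([
  "ls", "cat", "find", "grep", "file", "strings", "base64", "sort", "uniq", "tr",
]);

// Characters that allow command chaining or substitution (assumed policy).
const FORBIDDEN_PATTERN = /[;&`$\n]/;

interface ValidationResult {
  allowed: boolean;
  reason?: string;
}

function validateCommand(command: string): ValidationResult {
  const trimmed = command.trim();
  if (trimmed.length === 0) {
    return { allowed: false, reason: "empty command" };
  }
  if (FORBIDDEN_PATTERN.test(trimmed)) {
    return { allowed: false, reason: "forbidden shell metacharacter" };
  }
  // Every stage of a pipeline must start with an allowlisted binary.
  for (const stage of trimmed.split("|")) {
    const binary = stage.trim().split(/\s+/)[0] ?? "";
    if (!ALLOWED_COMMANDS.has(binary)) {
      return { allowed: false, reason: `command not allowed: ${binary}` };
    }
  }
  return { allowed: true };
}
```

Checking each pipeline stage separately means `ls | grep pass` passes while `ls | rm file` is rejected, and an empty stage (as produced by `||`) fails because the empty string is not on the allowlist.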
LangGraph State Machine (src/lib/agents/graph.ts)
- State graph with nodes:
  - plan_level - LLM plans next command
  - execute_command - Runs SSH command
  - validate_result - Checks for password
  - advance_level - Moves to next level
- Conditional edges based on agent status
- Integration with LangChain tools
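The routing between the four nodes can be sketched as a pure function. This is plain TypeScript rather than the LangGraph.js API, and the `RouteState` fields and the exact transition rules are assumptions for illustration; in LangGraph.js a function of this kind would typically be registered on the graph with `addConditionalEdges`.

```typescript
type GraphNode = "plan_level" | "execute_command" | "validate_result" | "advance_level" | "end";
type RunStatus = "idle" | "running" | "paused" | "complete" | "failed";

// Hypothetical subset of agent state that routing depends on.
interface RouteState {
  status: RunStatus;
  currentLevel: number;
  targetLevel: number;
  passwordFound: boolean;
}

// Decides which node runs after `current`, given the state that node produced.
function nextNode(current: GraphNode, state: RouteState): GraphNode {
  // Any non-running status (paused, failed, complete, idle) stops the loop.
  if (state.status !== "running") {
    return "end";
  }
  switch (current) {
    case "plan_level":
      return "execute_command";
    case "execute_command":
      return "validate_result";
    case "validate_result":
      // No password yet: go back and plan another command for the same level.
      return state.passwordFound ? "advance_level" : "plan_level";
    case "advance_level":
      return state.currentLevel >= state.targetLevel ? "end" : "plan_level";
    case "end":
      return "end";
  }
}
```

Keeping routing free of side effects makes it easy to unit test the graph's control flow without an LLM or an SSH connection.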
Error Handling (src/lib/agents/error-handler.ts)
- Error classification (network, SSH, timeout, API)
- Retry strategies with exponential backoff
- Cost tracking for LLM API calls
- Spending limit enforcement
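A minimal sketch of the three ideas above (classification, exponential backoff, and a spending cap) follows. The substring heuristics, the default delays, and the `CostTracker` class are assumptions chosen for illustration, not the logic in error-handler.ts.

```typescript
type ErrorKind = "network" | "ssh" | "timeout" | "api" | "unknown";

// Hypothetical substring-based classifier; real code would likely inspect error codes.
function classifyError(message: string): ErrorKind {
  const lower = message.toLowerCase();
  if (lower.includes("timed out") || lower.includes("timeout")) return "timeout";
  if (lower.includes("econnrefused") || lower.includes("econnreset") || lower.includes("network")) return "network";
  if (lower.includes("ssh") || lower.includes("permission denied")) return "ssh";
  if (lower.includes("rate limit") || lower.includes("api")) return "api";
  return "unknown";
}

// Delay doubles with each attempt (attempt 0 = first retry) and is capped at maxMs.
function backoffDelayMs(attempt: number, baseMs: number = 1000, maxMs: number = 30000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Accumulates LLM spend for a run and reports when the configured limit is reached.
class CostTracker {
  private spentUsd = 0;
  private readonly limitUsd: number;

  constructor(limitUsd: number) {
    this.limitUsd = limitUsd;
  }

  record(costUsd: number): void {
    this.spentUsd += costUsd;
  }

  get total(): number {
    return this.spentUsd;
  }

  isOverLimit(): boolean {
    return this.spentUsd >= this.limitUsd;
  }
}
```

A caller would check `isOverLimit()` before each LLM call and wait `backoffDelayMs(attempt)` between retries of a retryable error kind.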
Storage Layer (src/lib/storage/run-storage.ts)
- Durable Object storage interface
- D1 database schema for run metadata
- R2 storage for JSONL logs
- Password vault with encryption
- Data lifecycle management (DO → D1 → R2)
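For the R2 stage of the lifecycle, logs are stored as JSONL, meaning one JSON object per line. The sketch below shows that serialization; the `LogEntry` shape and the `logObjectKey` key layout are hypothetical and not taken from run-storage.ts.

```typescript
// Hypothetical log entry shape for illustration.
interface LogEntry {
  timestamp: number;
  level: number;
  kind: "command" | "thought";
  text: string;
}

// Serializes entries as JSONL: one JSON document per line, newline-terminated.
function toJsonl(entries: LogEntry[]): string {
  if (entries.length === 0) return "";
  return entries.map((entry) => JSON.stringify(entry)).join("\n") + "\n";
}

// Parses JSONL back into entries, skipping blank lines.
function fromJsonl(jsonl: string): LogEntry[] {
  return jsonl
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as LogEntry);
}

// Assumed object key layout for a run's log in the bucket.
function logObjectKey(runId: string): string {
  return `runs/${runId}/log.jsonl`;
}
```

JSONL suits this use because entries can be appended during a run and read back line by line without parsing the whole file.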
2. Durable Object Implementation
BanditAgentDO (src/lib/durable-objects/BanditAgentDO.ts)
- Runs LangGraph state machine
- WebSocket server for real-time streaming
- HTTP endpoints for agent control:
  - /start - Start new run
  - /pause - Pause execution
  - /resume - Resume from checkpoint
  - /command - Manual command injection
  - /retry - Retry current level
  - /status - Get current state
- Alarm-based auto-cleanup after 2 hours
- State persistence in DO storage
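The endpoint dispatch and the two-hour cleanup window can be sketched as two small pure helpers. The names `parseAction` and `cleanupDeadline` are hypothetical, and the sketch does not reproduce BanditAgentDO.ts; it only shows one way to map a request path to an action and to compute the alarm time.

```typescript
type AgentAction = "start" | "pause" | "resume" | "command" | "retry" | "status";

const ACTIONS: readonly AgentAction[] = ["start", "pause", "resume", "command", "retry", "status"];

// Maps the last path segment (e.g. "/api/agent/abc/pause") to a known action, or null.
function parseAction(pathname: string): AgentAction | null {
  const last = pathname.split("/").filter((segment) => segment.length > 0).pop();
  const match = ACTIONS.find((action) => action === last);
  return match ?? null;
}

// Two hours, matching the auto-cleanup window described above.
const CLEANUP_AFTER_MS = 2 * 60 * 60 * 1000;

// Returns the epoch-millisecond time at which an idle run should be cleaned up.
function cleanupDeadline(lastActivityMs: number): number {
  return lastActivityMs + CLEANUP_AFTER_MS;
}
```

Inside a Durable Object, the deadline would typically be passed to the storage `setAlarm` method, with the cleanup itself done in the object's `alarm()` handler.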
3. API Routes
Agent Lifecycle (src/app/api/agent/[runId]/route.ts)
- POST endpoints for all agent actions
- GET endpoint for status queries
- Durable Object proxy layer
- Error handling and validation
WebSocket Route (src/app/api/agent/[runId]/ws/route.ts)
- WebSocket upgrade handling
- Bidirectional communication with Durable Object
- Real-time event streaming
4. Frontend Components
WebSocket Hook (src/hooks/useAgentWebSocket.ts)
- React hook for WebSocket management
- Auto-reconnect with exponential backoff
- Event handlers for terminal and chat updates
- Connection state tracking
- Ping/pong keep-alive
Agent Control Panel (src/components/agent-control-panel.tsx)
- Model selection dropdown
- Level range selector (0-33)
- Streaming mode toggle
- Start/Pause/Resume/Stop buttons
- Status indicators (idle/running/paused/complete/failed)
- Connection status display
Enhanced Terminal Interface (src/components/terminal-chat-interface.tsx)
- Integrated with WebSocket for real-time updates
- Split-pane layout (terminal left, agent chat right)
- Command history with arrow keys
- Panel switching (Ctrl+K/J, ESC)
- Support for manual intervention when paused
- Beautiful retro styling with scan lines and grid patterns
- System messages for agent events
- Thinking indicators
5. WebSocket Event System
Event Handlers (src/lib/websocket/agent-events.ts)
- Standardized event types:
  - terminal_output - Command execution
  - agent_message - Agent commentary
  - thinking - Agent reasoning
  - tool_call - Tool execution
  - level_complete - Level advancement
  - run_complete - Full run completion
  - error - Error messages
- Event routing to terminal and chat displays
- Timestamp and metadata tracking
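One way to picture the event system is a typed event plus a routing function that decides which pane displays it. The event type names come from the list above; the `AgentEvent` fields, the `parseEvent` helper, and the particular mapping of events to panes are assumptions for illustration rather than the behavior of agent-events.ts.

```typescript
type AgentEventType =
  | "terminal_output"
  | "agent_message"
  | "thinking"
  | "tool_call"
  | "level_complete"
  | "run_complete"
  | "error";

const EVENT_TYPES: readonly AgentEventType[] = [
  "terminal_output", "agent_message", "thinking", "tool_call",
  "level_complete", "run_complete", "error",
];

// Hypothetical wire format for a single event.
interface AgentEvent {
  type: AgentEventType;
  content: string;
  timestamp: number;
}

type Pane = "terminal" | "chat";

// Assumed routing: command activity goes to the terminal, commentary to chat,
// and milestones or errors to both.
function routeEvent(event: AgentEvent): Pane[] {
  switch (event.type) {
    case "terminal_output":
    case "tool_call":
      return ["terminal"];
    case "agent_message":
    case "thinking":
      return ["chat"];
    case "level_complete":
    case "run_complete":
    case "error":
      return ["terminal", "chat"];
  }
}

// Parses a raw WebSocket message, returning null for malformed or unknown events.
function parseEvent(raw: string): AgentEvent | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const candidate = value as Partial<AgentEvent>;
  const knownType = EVENT_TYPES.find((t) => t === candidate.type);
  if (knownType === undefined || typeof candidate.content !== "string") return null;
  const timestamp = typeof candidate.timestamp === "number" ? candidate.timestamp : Date.now();
  return { type: knownType, content: candidate.content, timestamp };
}
```

Rejecting unknown event types at the parsing boundary keeps the UI handlers exhaustive over a closed union.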
6. Configuration
Wrangler (wrangler.jsonc)
- Durable Object bindings configured
- Environment variables for SSH proxy
- Placeholders for D1 and R2 (ready to uncomment)
- Secret management instructions
TypeScript (src/types/env.d.ts)
- Environment type declarations
- Cloudflare binding types
- Type safety for all env variables
7. Dependencies Installed
{
"@langchain/langgraph": "latest",
"@langchain/core": "latest",
"@langchain/openai": "latest",
"zod": "latest"
}
🚧 What Still Needs to Be Done
1. SSH Proxy Service (CRITICAL)
- Build the Node.js SSH proxy (see SSH-PROXY-README.md)
- Deploy to Fly.io/Railway/Render
- Update SSH_PROXY_URL in wrangler.jsonc
2. Durable Object Export
- Export BanditAgentDO in worker entry point
- Configure migration tag for DO deployment
3. LangGraph Integration Refinement
- Test graph execution in Workers environment
- May need to use @langchain/langgraph/web entry point
- Add manual config passing to avoid async_hooks issues
- Integrate actual tool execution (currently mocked)
4. D1 and R2 Setup
- Create D1 database: wrangler d1 create bandit-runs
- Run schema migrations
- Create R2 bucket: wrangler r2 bucket create bandit-logs
- Uncomment bindings in wrangler.jsonc
5. Secrets Configuration
wrangler secret put OPENROUTER_API_KEY
wrangler secret put ENCRYPTION_KEY
6. Testing
- Unit tests for graph nodes
- Integration tests for WebSocket
- Mock SSH proxy for development
- Load testing for concurrent runs
7. Advanced Features (Future)
- Run history and comparison UI
- Export functionality (JSONL, CSV)
- Keyboard shortcuts reference modal
- Run templates and presets
- Cost analytics dashboard
- Multi-run leaderboard
📝 Key Files Created
bandit-runner-app/src/
├── lib/
│ ├── agents/
│ │ ├── bandit-state.ts (State schema, level goals)
│ │ ├── llm-provider.ts (OpenRouter integration)
│ │ ├── tools.ts (SSH tool wrappers)
│ │ ├── graph.ts (LangGraph state machine)
│ │ └── error-handler.ts (Retry logic, cost tracking)
│ ├── durable-objects/
│ │ └── BanditAgentDO.ts (Durable Object implementation)
│ ├── storage/
│ │ └── run-storage.ts (DO/D1/R2 storage layer)
│ └── websocket/
│ └── agent-events.ts (Event handlers)
├── app/api/agent/[runId]/
│ ├── route.ts (HTTP API routes)
│ └── ws/
│ └── route.ts (WebSocket route)
├── components/
│ ├── agent-control-panel.tsx (Control panel UI)
│ └── terminal-chat-interface.tsx (Enhanced terminal)
├── hooks/
│ └── useAgentWebSocket.ts (WebSocket React hook)
└── types/
└── env.d.ts (Environment types)
🎯 How to Use
1. Local Development
cd bandit-runner-app
pnpm install
pnpm dev
2. Configure SSH Proxy
Build and deploy the SSH proxy service (see SSH-PROXY-README.md), then update:
export SSH_PROXY_URL=https://your-proxy.fly.dev
3. Set API Key
export OPENROUTER_API_KEY=sk-or-...
4. Start a Run
- Open http://localhost:3000
- Select a model (e.g., GPT-4o Mini)
- Choose level range (e.g., 0-5)
- Click START
- Watch the agent work in real-time!
5. Manual Intervention
- Click PAUSE to stop the agent
- Type commands in the terminal (left pane)
- Message the agent in chat (right pane)
- Click RESUME to continue
🏗️ Architecture Highlights
Execution Flow
User clicks START
↓
POST /api/agent/{runId}/start
↓
Durable Object spawned/retrieved
↓
LangGraph state machine initialized
↓
WebSocket connection established
↓
Graph executes: plan → execute → validate → advance
↓
Events streamed to UI in real-time
↓
State checkpointed in DO storage
↓
On completion: metadata saved to D1, logs to R2
WebSocket Event Flow
Durable Object
↓ (WebSocket)
API Route /ws
↓
useAgentWebSocket hook
↓ (handleAgentEvent)
Terminal/Chat UI updates
State Persistence
Active State: Durable Object (in-memory)
Checkpoints: Durable Object storage
Metadata: D1 Database (when configured)
Logs: R2 Bucket (when configured)
🎨 UI Features
- Retro Terminal Aesthetic: Scan lines, grid patterns, CRT-style
- Dual Panels: Terminal (left) + Agent Chat (right)
- Real-time Updates: WebSocket streaming
- Status Indicators: Connection, run state, level progress
- Model Selection: 10+ LLM models via OpenRouter
- Manual Control: Pause, resume, manual commands
- Keyboard Navigation: Ctrl+K/J panel switching, arrow keys for history
🔐 Security
- SSH target hardcoded to bandit.labs.overthewire.org:2220
- Command allowlist enforcement
- Password redaction in logs
- R2 encryption for sensitive data
- Rate limiting (to be implemented)
- Automatic cleanup of stale runs
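Password redaction in logs could be approximated with a pattern match before anything is written out. The sketch assumes that Bandit passwords are 32-character alphanumeric tokens; that assumption, the `PASSWORD_PATTERN` regex, and the `redactPasswords` helper are illustrative and may not match the project's actual redaction rules.

```typescript
// Assumption: Bandit passwords are 32-character alphanumeric tokens.
const PASSWORD_PATTERN = /\b[A-Za-z0-9]{32}\b/g;

// Replaces anything that looks like a password before the text is logged.
function redactPasswords(text: string): string {
  return text.replace(PASSWORD_PATTERN, "[REDACTED]");
}
```

A pattern-based approach can over-redact (for example, 32-character hashes in command output), so a stricter variant might redact only values already known to the password vault.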
📊 Next Steps
- Deploy SSH Proxy - Build from SSH-PROXY-README.md
- Test Integration - Run end-to-end test with a simple level
- Refine LangGraph - Ensure Workers compatibility
- Add D1/R2 - Set up persistent storage
- Production Deploy - Deploy to Cloudflare Workers
- Monitor & Iterate - Track performance, costs, success rates
🎉 What's Amazing
- Full LangGraph.js in Cloudflare Durable Objects
- Multi-LLM Support via OpenRouter (10+ models)
- Beautiful UI with retro terminal aesthetic
- Real-time Streaming via WebSocket
- Pause/Resume with state checkpointing
- Manual Intervention for debugging
- Extensible architecture for future features
🙏 Acknowledgments
- Built on Next.js, OpenNext, Cloudflare Workers
- Powered by LangGraph.js and LangChain
- UI components from shadcn/ui
- Inspired by the OverTheWire Bandit wargame