bandit-runner/RETRY-FUNCTIONALITY-STATUS.md
2025-10-13 10:21:50 -06:00

182 lines
5.9 KiB
Markdown

# Retry Functionality Implementation Status
## Date: 2025-10-10
## Summary
The max-retries modal implementation is **95% complete**. The modal appears correctly, but the retry button functionality has one remaining bug.
## ✅ What Works
1. **Modal Appears Correctly**
- Agent hits max retries at any level
- `paused_for_user_action` status is emitted from SSH proxy
- DO detects the status and emits `user_action_required` event
- Frontend displays the modal with three options: Stop, Intervene, Continue
2. **Agent Flow**
- Successfully completes Level 0
- Advances to Level 1 automatically
- Hits max retries on Level 1 (as expected - the password file has a special character)
- Pauses and shows modal
3. **UI/UX**
- Terminal shows all commands and output
- Chat panel shows thinking messages
- Token count and cost tracking working
- Modal message is clear and actionable
## ❌ What's Broken
### The `/retry` Endpoint Returns 400
**Symptom:**
- When user clicks "Continue" in the modal, the frontend makes a POST to `/api/agent/run-{id}/retry`
- The DO's `retryLevel()` method returns `400: "No paused run to resume"`
**Root Cause:**
The `run_complete` event from the SSH proxy is setting `this.state.status` back to `'complete'` even though we added protection in `updateStateFromEvent`. The issue is timing:
1. SSH proxy emits `paused_for_user_action` → DO sets `status = 'paused'`
2. SSH proxy ends the graph → emits `run_complete`
3. DO receives `run_complete``updateStateFromEvent` runs
4. Even though we check `if (this.state.status !== 'paused')`, something is still overriding it
**Code Context:**
```typescript:bandit-runner-app/workers/bandit-agent-do/src/index.ts
// In retryLevel():
if (!this.state) {
return new Response(JSON.stringify({ error: "No active run" }), {
status: 400,
})
}
// This check passes, but then something happens that makes the retry fail
```
## Files Modified (Complete List)
### SSH Proxy
1. `ssh-proxy/agent.ts`
- Added `'paused_for_user_action'` to status type
- Modified `validateResult` to return `paused_for_user_action` instead of `failed` on max retries
- Modified `shouldContinue` to handle `paused_for_user_action`
- Modified `run` method to accept `initialState` parameter for rehydration
2. `ssh-proxy/server.ts`
- Modified `/agent/run` endpoint to accept `initialState` in request body
- Pass `initialState` to `agent.run()`
### Frontend (bandit-runner-app)
1. `src/lib/agents/bandit-state.ts`
- Added `'paused_for_user_action'` to status type
2. `src/app/api/agent/[runId]/retry/route.ts`
- **NEW FILE**: Created route handler for retry endpoint
3. `src/components/terminal-chat-interface.tsx`
- Reverted visual styling to match original design
### Durable Object
1. `workers/bandit-agent-do/src/index.ts`
- Added `'paused_for_user_action'` to BanditAgentState status type
- Added `initialState?: Partial<BanditAgentState>` to RunConfig interface
- Modified `startRun` to persist full state after initialization
- Modified `runAgentViaProxy` to pass `initialState` in request body
- Added explicit detection for `paused_for_user_action` in event stream loop
- Modified `updateStateFromEvent` to not override `'paused'` status on `run_complete` or `error` events
- Modified `retryLevel` to include `initialState` in RunConfig
- Modified `resumeRun` to include `initialState` in RunConfig
- Fixed `handlePost` to correctly handle endpoints with/without request bodies
## Next Steps to Fix
### Option 1: Add a "retry pending" flag
Add a flag that prevents status changes after retry is clicked:
```typescript
private retryPending: boolean = false
// In retryLevel():
this.retryPending = true
this.state.status = 'planning'
// ... rest of retry logic
// In updateStateFromEvent():
if (this.retryPending) return // Don't update state during retry transition
```
### Option 2: Check for `initialState` presence instead of status
Modify `retryLevel` to not check status at all, just check if state exists:
```typescript
private async retryLevel(): Promise<Response> {
if (!this.state || !this.state.runId) {
return new Response(JSON.stringify({ error: "No active run" }), {
status: 400,
})
}
// Don't check status - just proceed with retry
this.state.retryCount = 0
this.state.status = 'planning'
//... rest
}
```
### Option 3: Use a separate "retryable" field
Add a field to track if retry is allowed:
```typescript
interface BanditAgentState {
// ... existing fields
retryable: boolean // Set to true when max retries hit
}
// In retryLevel():
if (!this.state || !this.state.retryable) {
return new Response(JSON.stringify({ error: "No retryable run" }), {
status: 400,
})
}
```
## Test Results
### Successful Test Flow
1. ✅ Start run with GPT-4o-mini
2. ✅ Agent completes Level 0 (finds password in readme)
3. ✅ Agent advances to Level 1
4. ✅ Agent tries multiple commands: `cat ./-`, `cat < -`, `cat -`
5. ✅ Max retries reached after 3 failed attempts
6. ✅ Modal appears with correct message
7. ❌ Click "Continue" → 400 error
### Modal Content (Verified Correct)
```
Max Retries Reached
The agent has reached the maximum retry limit (3) for Level 1.
Max retries reached for level 1
What would you like to do?
• Stop: End the run completely
• Intervene: Enable manual mode to help the agent
• Continue: Reset retry count and let the agent try again
[Stop] [Intervene] [Continue]
```
## Deployment Status
All changes have been deployed:
- ✅ SSH Proxy deployed to Fly.io
- ✅ Main app deployed to Cloudflare Workers
- ✅ Durable Object worker deployed separately
- ✅ `/retry` route exists and routes correctly to DO
## Recommendation
Implement **Option 2** (remove status check) as the quickest fix. The presence of `this.state` with a valid `runId` is sufficient validation. The status will be set to `'planning'` immediately anyway, so checking for `'paused'` status is unnecessary and causes the race condition.