bandit-runner/RETRY-FUNCTIONALITY-STATUS.md

# Retry Functionality Implementation Status

## Date: 2025-10-10

## Summary

The max-retries modal implementation is **95% complete**. The modal appears correctly, but the retry button functionality has one remaining bug.

## ✅ What Works

1. **Modal Appears Correctly**
   - Agent hits max retries at any level
   - `paused_for_user_action` status is emitted from SSH proxy
   - DO detects the status and emits `user_action_required` event
   - Frontend displays the modal with three options: Stop, Intervene, Continue

2. **Agent Flow**
   - Successfully completes Level 0
   - Advances to Level 1 automatically
   - Hits max retries on Level 1 (as expected - the password file has a special character)
   - Pauses and shows modal

3. **UI/UX**
   - Terminal shows all commands and output
   - Chat panel shows thinking messages
   - Token count and cost tracking working
   - Modal message is clear and actionable

## ❌ What's Broken

### The `/retry` Endpoint Returns 400

**Symptom:**
- When user clicks "Continue" in the modal, the frontend makes a POST to `/api/agent/run-{id}/retry`
- The DO's `retryLevel()` method returns `400: "No paused run to resume"`

**Root Cause:**
The `run_complete` event from the SSH proxy is setting `this.state.status` back to `'complete'` even though we added protection in `updateStateFromEvent`. The issue is timing:

1. SSH proxy emits `paused_for_user_action` → DO sets `status = 'paused'`
2. SSH proxy ends the graph → emits `run_complete`
3. DO receives `run_complete` → `updateStateFromEvent` runs
4. Even though we check `if (this.state.status !== 'paused')`, something is still overriding it

**Code Context:**

```typescript:bandit-runner-app/workers/bandit-agent-do/src/index.ts
// In retryLevel():
if (!this.state) {
  return new Response(JSON.stringify({ error: "No active run" }), {
    status: 400,
  })
}
// This check passes, but then something happens that makes the retry fail
```

## Files Modified (Complete List)

### SSH Proxy
1. `ssh-proxy/agent.ts`
   - Added `'paused_for_user_action'` to status type
   - Modified `validateResult` to return `paused_for_user_action` instead of `failed` on max retries
   - Modified `shouldContinue` to handle `paused_for_user_action`
   - Modified `run` method to accept `initialState` parameter for rehydration

2. `ssh-proxy/server.ts`
   - Modified `/agent/run` endpoint to accept `initialState` in request body
   - Pass `initialState` to `agent.run()`

### Frontend (bandit-runner-app)
1. `src/lib/agents/bandit-state.ts`
   - Added `'paused_for_user_action'` to status type

2. `src/app/api/agent/[runId]/retry/route.ts`
   - **NEW FILE**: Created route handler for retry endpoint

3. `src/components/terminal-chat-interface.tsx`
   - Reverted visual styling to match original design

### Durable Object
1. `workers/bandit-agent-do/src/index.ts`
   - Added `'paused_for_user_action'` to BanditAgentState status type
   - Added `initialState?: Partial<BanditAgentState>` to RunConfig interface
   - Modified `startRun` to persist full state after initialization
   - Modified `runAgentViaProxy` to pass `initialState` in request body
   - Added explicit detection for `paused_for_user_action` in event stream loop
   - Modified `updateStateFromEvent` to not override `'paused'` status on `run_complete` or `error` events
   - Modified `retryLevel` to include `initialState` in RunConfig
   - Modified `resumeRun` to include `initialState` in RunConfig
   - Fixed `handlePost` to correctly handle endpoints with/without request bodies

## Next Steps to Fix

### Option 1: Add a "retry pending" flag
Add a flag that prevents status changes after retry is clicked:

```typescript
private retryPending: boolean = false

// In retryLevel():
this.retryPending = true
this.state.status = 'planning'
// ... rest of retry logic

// In updateStateFromEvent():
if (this.retryPending) return // Don't update state during retry transition
```

### Option 2: Check for `initialState` presence instead of status
Modify `retryLevel` to not check status at all, just check if state exists:

```typescript
private async retryLevel(): Promise<Response> {
  if (!this.state || !this.state.runId) {
    return new Response(JSON.stringify({ error: "No active run" }), {
      status: 400,
    })
  }
  // Don't check status - just proceed with retry
  this.state.retryCount = 0
  this.state.status = 'planning'
  //... rest
}
```

### Option 3: Use a separate "retryable" field
Add a field to track if retry is allowed:

```typescript
interface BanditAgentState {
  // ... existing fields
  retryable: boolean // Set to true when max retries hit
}

// In retryLevel():
if (!this.state || !this.state.retryable) {
  return new Response(JSON.stringify({ error: "No retryable run" }), {
    status: 400,
  })
}
```

## Test Results

### Successful Test Flow
1. ✅ Start run with GPT-4o-mini
2. ✅ Agent completes Level 0 (finds password in readme)
3. ✅ Agent advances to Level 1
4. ✅ Agent tries multiple commands: `cat ./-`, `cat < -`, `cat -`
5. ✅ Max retries reached after 3 failed attempts
6. ✅ Modal appears with correct message
7. ❌ Click "Continue" → 400 error

### Modal Content (Verified Correct)
```
Max Retries Reached

The agent has reached the maximum retry limit (3) for Level 1.

Max retries reached for level 1

What would you like to do?
• Stop: End the run completely
• Intervene: Enable manual mode to help the agent
• Continue: Reset retry count and let the agent try again

[Stop] [Intervene] [Continue]
```

## Deployment Status

All changes have been deployed:
- ✅ SSH Proxy deployed to Fly.io
- ✅ Main app deployed to Cloudflare Workers
- ✅ Durable Object worker deployed separately
- ✅ `/retry` route exists and routes correctly to DO

## Recommendation

Implement **Option 2** (remove status check) as the quickest fix. The presence of `this.state` with a valid `runId` is sufficient validation. The status will be set to `'planning'` immediately anyway, so checking for `'paused'` status is unnecessary and causes the race condition.