bandit-runner/RETRY-FUNCTIONALITY-STATUS.md
2025-10-13 10:21:50 -06:00

Retry Functionality Implementation Status

Date: 2025-10-10

Summary

The max-retries modal implementation is 95% complete. The modal appears correctly, but the retry button functionality has one remaining bug.

What Works

  1. Modal Appears Correctly

    • Agent hits max retries at any level
    • paused_for_user_action status is emitted from SSH proxy
    • The Durable Object (DO) detects the status and emits a user_action_required event
    • Frontend displays the modal with three options: Stop, Intervene, Continue
  2. Agent Flow

    • Successfully completes Level 0
    • Advances to Level 1 automatically
    • Hits max retries on Level 1 (as expected: the password file's name is a special character, -)
    • Pauses and shows modal
  3. UI/UX

    • Terminal shows all commands and output
    • Chat panel shows thinking messages
    • Token count and cost tracking working
    • Modal message is clear and actionable
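
The pause flow in item 1 above (proxy status → DO event → modal) can be sketched as a small translation function. This is a minimal sketch, not the DO's actual code: the event shapes `ProxyEvent` and `ClientEvent`, the `options` field, and the function name `toClientEvent` are all assumptions made for illustration. Only the names paused_for_user_action and user_action_required come from this document.

```typescript
// Hypothetical event shapes; the real SSH proxy and DO payloads may differ.
type ProxyEvent =
  | { type: "status"; status: string; level: number; message?: string }
  | { type: "run_complete" };

type ClientEvent = {
  type: "user_action_required";
  level: number;
  message: string;
  options: ["stop", "intervene", "continue"];
};

// Translate a proxy event into the event the frontend modal listens for.
// Returns null for events that should not open the modal.
function toClientEvent(event: ProxyEvent): ClientEvent | null {
  if (event.type === "status" && event.status === "paused_for_user_action") {
    return {
      type: "user_action_required",
      level: event.level,
      message: event.message ?? `Max retries reached for level ${event.level}`,
      options: ["stop", "intervene", "continue"],
    };
  }
  return null;
}

const paused = toClientEvent({
  type: "status",
  status: "paused_for_user_action",
  level: 1,
});
const ignored = toClientEvent({ type: "run_complete" });
```

Keeping the translation in one pure function makes the "modal appears" behavior testable without a live SSH proxy.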

What's Broken

The /retry Endpoint Returns 400

Symptom:

  • When user clicks "Continue" in the modal, the frontend makes a POST to /api/agent/run-{id}/retry
  • The DO's retryLevel() method returns 400: "No paused run to resume"

Root Cause: The run_complete event from the SSH proxy appears to set this.state.status back to 'complete', even though updateStateFromEvent now guards against this. The problem is one of ordering:

  1. SSH proxy emits paused_for_user_action → DO sets status = 'paused'
  2. SSH proxy ends the graph → emits run_complete
  3. DO receives run_complete → updateStateFromEvent runs
  4. The guard if (this.state.status !== 'paused') is in place, yet something still overwrites the status
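
Because the write that moves the status away from 'paused' is not yet known, one way to find it is to route every status change through a single setter that records its caller. The sketch below replays the sequence above under assumptions: `TrackedStatus`, `StatusTransition`, and the source labels are hypothetical, and the final "stream-end cleanup" write is an invented example of a second write path, not a confirmed cause.

```typescript
type Status = "planning" | "executing" | "paused" | "complete" | "failed";

interface StatusTransition {
  from: Status;
  to: Status;
  source: string; // free-form label naming the code path that wrote the status
}

class TrackedStatus {
  private current: Status;
  readonly history: StatusTransition[] = [];

  constructor(initial: Status) {
    this.current = initial;
  }

  get value(): Status {
    return this.current;
  }

  // Every write names its caller, so an unexpected overwrite is attributable.
  set(next: Status, source: string): void {
    this.history.push({ from: this.current, to: next, source });
    this.current = next;
  }

  // All writes that moved the status away from 'paused'.
  leftPaused(): StatusTransition[] {
    return this.history.filter((t) => t.from === "paused" && t.to !== "paused");
  }
}

// The guard described in step 4.
function onRunComplete(s: TrackedStatus): void {
  if (s.value !== "paused") {
    s.set("complete", "updateStateFromEvent:run_complete");
  }
}

const tracked = new TrackedStatus("executing");
tracked.set("paused", "event:paused_for_user_action"); // step 1
onRunComplete(tracked); // steps 2-3: the guard holds, nothing is written
// Hypothetical second write path that bypasses the guard:
tracked.set("complete", "hypothetical:stream-end cleanup");

const culprits = tracked.leftPaused();
```

After a failing run, `culprits` lists exactly the writes that need a guard, which narrows the search to one or two call sites.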

Code Context:

// In retryLevel():
if (!this.state) {
  return new Response(JSON.stringify({ error: "No active run" }), {
    status: 400,
  })
}
// This check passes; the 400 ("No paused run to resume") comes from a later check in the same method

Files Modified (Complete List)

SSH Proxy

  1. ssh-proxy/agent.ts

    • Added 'paused_for_user_action' to status type
    • Modified validateResult to return paused_for_user_action instead of failed on max retries
    • Modified shouldContinue to handle paused_for_user_action
    • Modified run method to accept initialState parameter for rehydration
  2. ssh-proxy/server.ts

    • Modified /agent/run endpoint to accept initialState in request body
    • Pass initialState to agent.run()
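
The rehydration that initialState enables can be sketched as a merge of defaults with whatever state the caller supplies. This is a minimal sketch under assumptions: the `AgentState` fields, `defaultState`, and `rehydrate` are hypothetical stand-ins, and the real agent state in ssh-proxy/agent.ts likely carries more fields.

```typescript
// Hypothetical subset of the agent state, for illustration only.
interface AgentState {
  currentLevel: number;
  retryCount: number;
  status: "planning" | "paused_for_user_action" | "complete" | "failed";
  commandHistory: string[];
}

function defaultState(): AgentState {
  return { currentLevel: 0, retryCount: 0, status: "planning", commandHistory: [] };
}

// Build the starting state: defaults first, then any fields the caller
// supplied. Omitting initialState yields a fresh run.
function rehydrate(initialState?: Partial<AgentState>): AgentState {
  return { ...defaultState(), ...initialState };
}

const fresh = rehydrate();
const resumed = rehydrate({
  currentLevel: 1,
  retryCount: 0,
  commandHistory: ["cat ./-"],
});
```

With this shape, a retry resumes at the level where the agent paused instead of starting again from Level 0, while any field the caller omits falls back to its default.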

Frontend (bandit-runner-app)

  1. src/lib/agents/bandit-state.ts

    • Added 'paused_for_user_action' to status type
  2. src/app/api/agent/[runId]/retry/route.ts

    • NEW FILE: Created route handler for retry endpoint
  3. src/components/terminal-chat-interface.tsx

    • Reverted visual styling to match original design

Durable Object

  1. workers/bandit-agent-do/src/index.ts
    • Added 'paused_for_user_action' to BanditAgentState status type
    • Added initialState?: Partial<BanditAgentState> to RunConfig interface
    • Modified startRun to persist full state after initialization
    • Modified runAgentViaProxy to pass initialState in request body
    • Added explicit detection for paused_for_user_action in event stream loop
    • Modified updateStateFromEvent to not override 'paused' status on run_complete or error events
    • Modified retryLevel to include initialState in RunConfig
    • Modified resumeRun to include initialState in RunConfig
    • Fixed handlePost to correctly handle endpoints with/without request bodies

Next Steps to Fix

Option 1: Add a "retry pending" flag

Add a flag that prevents status changes after retry is clicked:

private retryPending = false

// In retryLevel():
this.retryPending = true
this.state.status = 'planning'
// ... rest of retry logic
// Clear retryPending once the new run is underway; otherwise every later event is ignored

// In updateStateFromEvent():
if (this.retryPending) return // Don't update state during retry transition
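
The flag needs a point at which it is cleared, or the DO would stop tracking the new run's events. A minimal sketch of the full lifecycle follows; the class `RetryGate` and the method `retryStarted` are hypothetical names, and where the real DO would call `retryStarted` (for example, once the new proxy stream is open) is an assumption.

```typescript
type Status = "planning" | "paused" | "complete";

class RetryGate {
  private retryPending = false;
  status: Status = "paused";

  // User clicked "Continue": freeze status updates until the new run starts.
  retryLevel(): void {
    this.retryPending = true;
    this.status = "planning";
  }

  // Hypothetical hook: call once the new run's event stream is open.
  retryStarted(): void {
    this.retryPending = false;
  }

  updateStateFromEvent(next: Status): void {
    if (this.retryPending) return; // ignore stale events from the old run
    this.status = next;
  }
}

const gate = new RetryGate();
gate.retryLevel();
gate.updateStateFromEvent("complete"); // stale run_complete, ignored
const duringRetry = gate.status;

gate.retryStarted();
gate.updateStateFromEvent("complete"); // event from the new run, applied
const afterRetry = gate.status;
```

Note that the flag only protects the window after "Continue" is clicked; an overwrite that happens before the click would still need a separate guard.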

Option 2: Check for state presence instead of status

Modify retryLevel to not check status at all, just check if state exists:

private async retryLevel(): Promise<Response> {
  if (!this.state || !this.state.runId) {
    return new Response(JSON.stringify({ error: "No active run" }), {
      status: 400,
    })
  }
  // Don't check status - just proceed with retry
  this.state.retryCount = 0
  this.state.status = 'planning'
  //... rest
}
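
To see the effect of dropping the status check, the same logic can be exercised outside the DO. This sketch uses plain objects in place of Response so it runs anywhere; `RunState`, `Result`, and the 200 body are assumptions for illustration.

```typescript
interface RunState {
  runId: string;
  status: string;
  retryCount: number;
}

interface Result {
  status: number;
  body: { error?: string; ok?: boolean };
}

// Stand-in for the DO method: validates only that a run exists.
function retryLevel(state: RunState | null): Result {
  if (!state || !state.runId) {
    return { status: 400, body: { error: "No active run" } };
  }
  // No status check: proceed even if the status was overwritten.
  state.retryCount = 0;
  state.status = "planning";
  return { status: 200, body: { ok: true } };
}

// The failing case from this report: status already flipped to 'complete'.
const overwritten: RunState = { runId: "run-1", status: "complete", retryCount: 3 };
const ok = retryLevel(overwritten);
const missing = retryLevel(null);
```

The trade-off is that a retry request for a run that genuinely finished is also accepted, since nothing distinguishes it from a paused run whose status was overwritten.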

Option 3: Use a separate "retryable" field

Add a field to track if retry is allowed:

interface BanditAgentState {
  // ... existing fields
  retryable: boolean // Set to true when max retries hit
}

// In retryLevel():
if (!this.state || !this.state.retryable) {
  return new Response(JSON.stringify({ error: "No retryable run" }), {
    status: 400,
  })
}
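
For the retryable field to help, it has to survive the run_complete event and be cleared once a retry is accepted. A minimal sketch of that lifecycle, with assumptions stated: `State`, `applyEvent`, and the plain-object return value are hypothetical, and the real DO would set the field inside its event stream loop.

```typescript
interface State {
  status: string;
  retryable: boolean;
  retryCount: number;
}

type EventType = "paused_for_user_action" | "run_complete";

function applyEvent(state: State, eventType: EventType): void {
  if (eventType === "paused_for_user_action") {
    state.status = "paused";
    state.retryable = true; // max retries hit: a retry is now allowed
  } else {
    state.status = "complete"; // retryable is deliberately left untouched
  }
}

function retryLevel(state: State | null): { status: number; error?: string } {
  if (!state || !state.retryable) {
    return { status: 400, error: "No retryable run" };
  }
  state.retryable = false; // consume the permission so a double click fails
  state.retryCount = 0;
  state.status = "planning";
  return { status: 200 };
}

const run: State = { status: "executing", retryable: false, retryCount: 3 };
applyEvent(run, "paused_for_user_action");
applyEvent(run, "run_complete"); // status is overwritten, retryable survives
const firstRetry = retryLevel(run);
const secondRetry = retryLevel(run);
```

Unlike removing the check entirely, this keeps a rejection path for runs that finished without ever hitting max retries.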

Test Results

Successful Test Flow

  1. Start run with GPT-4o-mini
  2. Agent completes Level 0 (finds password in readme)
  3. Agent advances to Level 1
  4. Agent tries multiple commands: cat ./-, cat < -, cat -
  5. Max retries reached after 3 failed attempts
  6. Modal appears with correct message
  7. Click "Continue" → 400 error

Modal Content (Verified Correct)

Max Retries Reached

The agent has reached the maximum retry limit (3) for Level 1.

Max retries reached for level 1

What would you like to do?
• Stop: End the run completely
• Intervene: Enable manual mode to help the agent
• Continue: Reset retry count and let the agent try again

[Stop] [Intervene] [Continue]

Deployment Status

All changes have been deployed:

  • SSH Proxy deployed to Fly.io
  • Main app deployed to Cloudflare Workers
  • Durable Object worker deployed separately
  • /retry route exists and routes correctly to DO

Recommendation

Implement Option 2 (remove the status check) as the quickest fix. The presence of this.state with a valid runId is sufficient validation. The status is set to 'planning' immediately anyway, so checking for 'paused' is unnecessary, and it is this check that exposes retryLevel to the race condition.