2025-10-13 10:21:50 -06:00

5.9 KiB

Raw Permalink Blame History

Retry Functionality Implementation Status

Date: 2025-10-10

Summary

The max-retries modal implementation is 95% complete. The modal appears correctly, but the retry button functionality has one remaining bug.

✅ What Works

Modal Appears Correctly
- Agent hits max retries at any level
- paused_for_user_action status is emitted from SSH proxy
- DO detects the status and emits user_action_required event
- Frontend displays the modal with three options: Stop, Intervene, Continue
Agent Flow
- Successfully completes Level 0
- Advances to Level 1 automatically
- Hits max retries on Level 1 (as expected - the password file has a special character)
- Pauses and shows modal
UI/UX
- Terminal shows all commands and output
- Chat panel shows thinking messages
- Token count and cost tracking working
- Modal message is clear and actionable

❌ What's Broken

The `/retry` Endpoint Returns 400

Symptom:

When user clicks "Continue" in the modal, the frontend makes a POST to /api/agent/run-{id}/retry
The DO's retryLevel() method returns 400: "No paused run to resume"

Root Cause: The run_complete event from the SSH proxy is setting this.state.status back to 'complete' even though we added protection in updateStateFromEvent. The issue is timing:

SSH proxy emits paused_for_user_action → DO sets status = 'paused'
SSH proxy ends the graph → emits run_complete
DO receives run_complete → updateStateFromEvent runs
Even though we check if (this.state.status !== 'paused'), something is still overriding it

Code Context:

// In retryLevel():
if (!this.state) {
  return new Response(JSON.stringify({ error: "No active run" }), {
    status: 400,
  })
}
// This check passes, but then something happens that makes the retry fail

Files Modified (Complete List)

SSH Proxy

ssh-proxy/agent.ts
- Added 'paused_for_user_action' to status type
- Modified validateResult to return paused_for_user_action instead of failed on max retries
- Modified shouldContinue to handle paused_for_user_action
- Modified run method to accept initialState parameter for rehydration
ssh-proxy/server.ts
- Modified /agent/run endpoint to accept initialState in request body
- Pass initialState to agent.run()

Frontend (bandit-runner-app)

src/lib/agents/bandit-state.ts
- Added 'paused_for_user_action' to status type
src/app/api/agent/[runId]/retry/route.ts
- NEW FILE: Created route handler for retry endpoint
src/components/terminal-chat-interface.tsx
- Reverted visual styling to match original design

Durable Object

workers/bandit-agent-do/src/index.ts
- Added 'paused_for_user_action' to BanditAgentState status type
- Added initialState?: Partial<BanditAgentState> to RunConfig interface
- Modified startRun to persist full state after initialization
- Modified runAgentViaProxy to pass initialState in request body
- Added explicit detection for paused_for_user_action in event stream loop
- Modified updateStateFromEvent to not override 'paused' status on run_complete or error events
- Modified retryLevel to include initialState in RunConfig
- Modified resumeRun to include initialState in RunConfig
- Fixed handlePost to correctly handle endpoints with/without request bodies

Next Steps to Fix

Option 1: Add a "retry pending" flag

Add a flag that prevents status changes after retry is clicked:

private retryPending: boolean = false

// In retryLevel():
this.retryPending = true
this.state.status = 'planning'
// ... rest of retry logic

// In updateStateFromEvent():
if (this.retryPending) return // Don't update state during retry transition

Option 2: Check for `initialState` presence instead of status

Modify retryLevel to not check status at all, just check if state exists:

private async retryLevel(): Promise<Response> {
  if (!this.state || !this.state.runId) {
    return new Response(JSON.stringify({ error: "No active run" }), {
      status: 400,
    })
  }
  // Don't check status - just proceed with retry
  this.state.retryCount = 0
  this.state.status = 'planning'
  //... rest
}

Option 3: Use a separate "retryable" field

Add a field to track if retry is allowed:

interface BanditAgentState {
  // ... existing fields
  retryable: boolean // Set to true when max retries hit
}

// In retryLevel():
if (!this.state || !this.state.retryable) {
  return new Response(JSON.stringify({ error: "No retryable run" }), {
    status: 400,
  })
}

Test Results

Successful Test Flow

✅ Start run with GPT-4o-mini
✅ Agent completes Level 0 (finds password in readme)
✅ Agent advances to Level 1
✅ Agent tries multiple commands: cat ./-, cat < -, cat -
✅ Max retries reached after 3 failed attempts
✅ Modal appears with correct message
❌ Click "Continue" → 400 error

Max Retries Reached

The agent has reached the maximum retry limit (3) for Level 1.

Max retries reached for level 1

What would you like to do?
• Stop: End the run completely
• Intervene: Enable manual mode to help the agent
• Continue: Reset retry count and let the agent try again

[Stop] [Intervene] [Continue]

Deployment Status

All changes have been deployed:

✅ SSH Proxy deployed to Fly.io
✅ Main app deployed to Cloudflare Workers
✅ Durable Object worker deployed separately
✅ /retry route exists and routes correctly to DO

Recommendation

Implement Option 2 (remove status check) as the quickest fix. The presence of this.state with a valid runId is sufficient validation. The status will be set to 'planning' immediately anyway, so checking for 'paused' status is unnecessary and causes the race condition.

5.9 KiB Raw Permalink Blame History