182 lines
5.9 KiB
Markdown
182 lines
5.9 KiB
Markdown
# Retry Functionality Implementation Status
|
|
|
|
## Date: 2025-10-10
|
|
|
|
## Summary
|
|
|
|
The max-retries modal implementation is **95% complete**. The modal appears correctly, but the retry button functionality has one remaining bug.
|
|
|
|
## ✅ What Works
|
|
|
|
1. **Modal Appears Correctly**
|
|
- Agent hits max retries at any level
|
|
- `paused_for_user_action` status is emitted from SSH proxy
|
|
- DO detects the status and emits `user_action_required` event
|
|
- Frontend displays the modal with three options: Stop, Intervene, Continue
|
|
|
|
2. **Agent Flow**
|
|
- Successfully completes Level 0
|
|
- Advances to Level 1 automatically
|
|
- Hits max retries on Level 1 (as expected - the password file has a special character)
|
|
- Pauses and shows modal
|
|
|
|
3. **UI/UX**
|
|
- Terminal shows all commands and output
|
|
- Chat panel shows thinking messages
|
|
- Token count and cost tracking working
|
|
- Modal message is clear and actionable
|
|
|
|
## ❌ What's Broken
|
|
|
|
### The `/retry` Endpoint Returns 400
|
|
|
|
**Symptom:**
|
|
- When user clicks "Continue" in the modal, the frontend makes a POST to `/api/agent/run-{id}/retry`
|
|
- The DO's `retryLevel()` method returns `400: "No paused run to resume"`
|
|
|
|
**Root Cause:**
|
|
The `run_complete` event from the SSH proxy is setting `this.state.status` back to `'complete'` even though we added protection in `updateStateFromEvent`. The issue is timing:
|
|
|
|
1. SSH proxy emits `paused_for_user_action` → DO sets `status = 'paused'`
|
|
2. SSH proxy ends the graph → emits `run_complete`
|
|
3. DO receives `run_complete` → `updateStateFromEvent` runs
|
|
4. Even though we check `if (this.state.status !== 'paused')`, something is still overriding it
|
|
|
|
**Code Context:**
|
|
|
|
```typescript:bandit-runner-app/workers/bandit-agent-do/src/index.ts
|
|
// In retryLevel():
|
|
if (!this.state) {
|
|
return new Response(JSON.stringify({ error: "No active run" }), {
|
|
status: 400,
|
|
})
|
|
}
|
|
// This check passes, but then something happens that makes the retry fail
|
|
```
|
|
|
|
## Files Modified (Complete List)
|
|
|
|
### SSH Proxy
|
|
1. `ssh-proxy/agent.ts`
|
|
- Added `'paused_for_user_action'` to status type
|
|
- Modified `validateResult` to return `paused_for_user_action` instead of `failed` on max retries
|
|
- Modified `shouldContinue` to handle `paused_for_user_action`
|
|
- Modified `run` method to accept `initialState` parameter for rehydration
|
|
|
|
2. `ssh-proxy/server.ts`
|
|
- Modified `/agent/run` endpoint to accept `initialState` in request body
|
|
- Pass `initialState` to `agent.run()`
|
|
|
|
### Frontend (bandit-runner-app)
|
|
1. `src/lib/agents/bandit-state.ts`
|
|
- Added `'paused_for_user_action'` to status type
|
|
|
|
2. `src/app/api/agent/[runId]/retry/route.ts`
|
|
- **NEW FILE**: Created route handler for retry endpoint
|
|
|
|
3. `src/components/terminal-chat-interface.tsx`
|
|
- Reverted visual styling to match original design
|
|
|
|
### Durable Object
|
|
1. `workers/bandit-agent-do/src/index.ts`
|
|
- Added `'paused_for_user_action'` to BanditAgentState status type
|
|
- Added `initialState?: Partial<BanditAgentState>` to RunConfig interface
|
|
- Modified `startRun` to persist full state after initialization
|
|
- Modified `runAgentViaProxy` to pass `initialState` in request body
|
|
- Added explicit detection for `paused_for_user_action` in event stream loop
|
|
- Modified `updateStateFromEvent` to not override `'paused'` status on `run_complete` or `error` events
|
|
- Modified `retryLevel` to include `initialState` in RunConfig
|
|
- Modified `resumeRun` to include `initialState` in RunConfig
|
|
- Fixed `handlePost` to correctly handle endpoints with/without request bodies
|
|
|
|
## Next Steps to Fix
|
|
|
|
### Option 1: Add a "retry pending" flag
|
|
Add a flag that prevents status changes after retry is clicked:
|
|
|
|
```typescript
|
|
private retryPending: boolean = false
|
|
|
|
// In retryLevel():
|
|
this.retryPending = true
|
|
this.state.status = 'planning'
|
|
// ... rest of retry logic
|
|
|
|
// In updateStateFromEvent():
|
|
if (this.retryPending) return // Don't update state during retry transition
|
|
```
|
|
|
|
### Option 2: Check for `initialState` presence instead of status
|
|
Modify `retryLevel` to not check status at all, just check if state exists:
|
|
|
|
```typescript
|
|
private async retryLevel(): Promise<Response> {
|
|
if (!this.state || !this.state.runId) {
|
|
return new Response(JSON.stringify({ error: "No active run" }), {
|
|
status: 400,
|
|
})
|
|
}
|
|
// Don't check status - just proceed with retry
|
|
this.state.retryCount = 0
|
|
this.state.status = 'planning'
|
|
//... rest
|
|
}
|
|
```
|
|
|
|
### Option 3: Use a separate "retryable" field
|
|
Add a field to track if retry is allowed:
|
|
|
|
```typescript
|
|
interface BanditAgentState {
|
|
// ... existing fields
|
|
retryable: boolean // Set to true when max retries hit
|
|
}
|
|
|
|
// In retryLevel():
|
|
if (!this.state || !this.state.retryable) {
|
|
return new Response(JSON.stringify({ error: "No retryable run" }), {
|
|
status: 400,
|
|
})
|
|
}
|
|
```
|
|
|
|
## Test Results
|
|
|
|
### Successful Test Flow
|
|
1. ✅ Start run with GPT-4o-mini
|
|
2. ✅ Agent completes Level 0 (finds password in readme)
|
|
3. ✅ Agent advances to Level 1
|
|
4. ✅ Agent tries multiple commands: `cat ./-`, `cat < -`, `cat -`
|
|
5. ✅ Max retries reached after 3 failed attempts
|
|
6. ✅ Modal appears with correct message
|
|
7. ❌ Click "Continue" → 400 error
|
|
|
|
### Modal Content (Verified Correct)
|
|
```
|
|
Max Retries Reached
|
|
|
|
The agent has reached the maximum retry limit (3) for Level 1.
|
|
|
|
Max retries reached for level 1
|
|
|
|
What would you like to do?
|
|
• Stop: End the run completely
|
|
• Intervene: Enable manual mode to help the agent
|
|
• Continue: Reset retry count and let the agent try again
|
|
|
|
[Stop] [Intervene] [Continue]
|
|
```
|
|
|
|
## Deployment Status
|
|
|
|
All changes have been deployed:
|
|
- ✅ SSH Proxy deployed to Fly.io
|
|
- ✅ Main app deployed to Cloudflare Workers
|
|
- ✅ Durable Object worker deployed separately
|
|
- ✅ `/retry` route exists and routes correctly to DO
|
|
|
|
## Recommendation
|
|
|
|
Implement **Option 2** (remove status check) as the quickest fix. The presence of `this.state` with a valid `runId` is sufficient validation. The status will be set to `'planning'` immediately anyway, so checking for `'paused'` status is unnecessary and causes the race condition.
|
|
|