# Max-Retries Modal - Root Cause Analysis ## Test Results **Status**: ❌ Modal does NOT appear **Error Seen**: "ERROR: Max retries reached for level 0" (in terminal and chat) **Modal Shown**: NO ## Root Cause The `user_action_required` event is **never emitted** from the Durable Object. ### Why? Looking at `BanditAgentDO.ts`: ```typescript private updateStateFromEvent(event: AgentEvent) { if (!this.state) return switch (event.type) { case 'error': const errorContent = event.data.content || '' if (errorContent.includes('Max retries')) { // Emit user_action_required event this.broadcast({ type: 'user_action_required', data: { ... } }) } } } ``` **The Problem**: `updateStateFromEvent()` is only called when processing events FROM the SSH proxy. But by the time we see the `error` event here, the proxy has already ended its stream with `run_complete`. The `error` event from the proxy goes: 1. SSH Proxy emits `error: Max retries...` 2. DO receives it via `runAgentViaProxy()` stream 3. DO calls `updateStateFromEvent(event)` 4. DO tries to `broadcast()` the `user_action_required` 5. **BUT** - we're inside the proxy stream handler, and immediately after this the proxy sends `run_complete` and ends the stream 6. The frontend never gets the `user_action_required` because it's racing with `run_complete` ## The Real Fix We need to **pause BEFORE emitting the final error**, not after. ### Option 1: Fix in SSH Proxy (Recommended) In `ssh-proxy/agent.ts`, when `validateResult` hits max retries, instead of returning status `'failed'`, return status `'paused_for_user_action'`: ```typescript // In validateResult() if (state.retryCount >= state.maxRetries) { return { status: 'paused_for_user_action' as const, // New status error: `Max retries reached for level ${state.currentLevel}`, } } ``` Then in the graph conditional routing: ```typescript function shouldContinue(state: BanditAgentState): string { if (state.status === 'paused_for_user_action') { return END // Stop graph execution } // ... rest of routing } ``` And in the DO, when we see this status, emit the user action event: ```typescript case 'node_update': if (nodeOutput.status === 'paused_for_user_action') { this.broadcast({ type: 'user_action_required', data: { reason: 'max_retries', level: this.state.currentLevel, // ... } }) this.state.status = 'paused' } ``` ### Option 2: Fix in DO (Simpler but less clean) Before broadcasting the error event, check if it's a max-retries error and emit `user_action_required` FIRST: ```typescript // In runAgentViaProxy(), when processing events: if (agentEvent.type === 'error' && agentEvent.data.content?.includes('Max retries')) { // Emit user_action_required FIRST this.broadcast({ type: 'user_action_required', data: { ... } }) this.state.status = 'paused' await this.storage.saveState(this.state) } // Then broadcast the error normally this.broadcast(agentEvent) ``` ## Why Current Code Doesn't Work The current code tries to detect the error in `updateStateFromEvent()` which is called too late in the event processing pipeline. By the time we try to emit `user_action_required`, the proxy stream has already ended and the frontend has moved on to `run_complete`. ## Recommended Fix **Option 1** is cleaner because it makes the agent's state machine explicit about needing user action. This also prevents the `run_complete` event from firing prematurely. ## Testing Plan 1. Implement Option 1 in `ssh-proxy/agent.ts` 2. Add new status to type definitions 3. Update DO to recognize this status and emit event 4. Test with GPT-4o Mini, wait for Level 1 max retries 5. Verify logs show: - Agent graph ends with `paused_for_user_action` - DO emits `user_action_required` - Frontend receives event and shows modal 6. Test Continue button → retry count resets, agent resumes ## Files to Modify 1. `ssh-proxy/agent.ts`: - Update `BanditState` annotation to include `paused_for_user_action` status - Modify `validateResult` to return this status instead of `'failed'` - Update `shouldContinue` routing 2. `bandit-runner-app/src/lib/agents/bandit-state.ts`: - Add `'paused_for_user_action'` to status union type 3. `bandit-runner-app/src/lib/durable-objects/BanditAgentDO.ts`: - In `runAgentViaProxy()`, detect `paused_for_user_action` status - Emit `user_action_required` when detected - Remove detection from `updateStateFromEvent()` (it's too late)