Max-Retries Modal - Root Cause Analysis
Test Results
Status: ❌ Modal does NOT appear
Error Seen: "ERROR: Max retries reached for level 0" (in terminal and chat)
Modal Shown: NO
Root Cause
The user_action_required event is never emitted from the Durable Object.
Why?
Looking at BanditAgentDO.ts:
```typescript
private updateStateFromEvent(event: AgentEvent) {
  if (!this.state) return

  switch (event.type) {
    case 'error': {
      const errorContent = event.data.content || ''
      if (errorContent.includes('Max retries')) {
        // Emit user_action_required event
        this.broadcast({
          type: 'user_action_required',
          data: { /* ... */ },
        })
      }
      break
    }
  }
}
```
The Problem: updateStateFromEvent() is only called when processing events FROM the SSH proxy. But by the time we see the error event here, the proxy has already ended its stream with run_complete.
The error event from the proxy goes:
- SSH Proxy emits error: Max retries...
- DO receives it via the runAgentViaProxy() stream
- DO calls updateStateFromEvent(event)
- DO tries to broadcast() the user_action_required event
- BUT we're inside the proxy stream handler, and immediately after this the proxy sends run_complete and ends the stream
- The frontend never gets the user_action_required event because it's racing with run_complete
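The race can be reproduced with a small simulation. The sketch assumes that the frontend treats run_complete as terminal and ignores any event delivered after it. The source does not confirm this, because the frontend code is not shown. UiEvent and modalShown are hypothetical names used only here.

```typescript
// Hypothetical event shape; only the type field matters for this sketch.
interface UiEvent {
  type: string
}

// Assumption: the frontend stops handling events once run_complete arrives.
function modalShown(delivered: UiEvent[]): boolean {
  let streamClosed = false
  let shown = false
  for (const event of delivered) {
    if (streamClosed) continue // events after run_complete are dropped
    if (event.type === 'user_action_required') shown = true
    if (event.type === 'run_complete') streamClosed = true
  }
  return shown
}

// The same three events in two delivery orders.
const wins: UiEvent[] = [
  { type: 'error' },
  { type: 'user_action_required' },
  { type: 'run_complete' },
]
const loses: UiEvent[] = [
  { type: 'error' },
  { type: 'run_complete' },
  { type: 'user_action_required' },
]

console.log(modalShown(wins)) // true
console.log(modalShown(loses)) // false
```

Only the first ordering shows the modal. For this reason, both fixes aim to get user_action_required out before run_complete.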
The Real Fix
We need to pause BEFORE emitting the final error, not after.
Option 1: Fix in SSH Proxy (Recommended)
In ssh-proxy/agent.ts, when validateResult hits max retries, instead of returning status 'failed', return status 'paused_for_user_action':
```typescript
// In validateResult()
if (state.retryCount >= state.maxRetries) {
  return {
    status: 'paused_for_user_action' as const, // New status
    error: `Max retries reached for level ${state.currentLevel}`,
  }
}
```
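Below is a self-contained sketch of the retry check with the new status added to the result type. Several pieces are assumptions for illustration:
- the RetryState and ValidationResult shapes
- the passed flag
- the 'running' status

The real validateResult in ssh-proxy/agent.ts works on the full BanditAgentState.

```typescript
// Hypothetical status union; 'paused_for_user_action' is the new member.
type ResultStatus = 'running' | 'failed' | 'paused_for_user_action'

interface RetryState {
  retryCount: number
  maxRetries: number
  currentLevel: number
}

interface ValidationResult {
  status: ResultStatus
  retryCount?: number
  error?: string
}

// passed: whether this level's result validated (an assumed input for the sketch)
function validateResult(state: RetryState, passed: boolean): ValidationResult {
  if (passed) {
    return { status: 'running', retryCount: 0 }
  }
  if (state.retryCount >= state.maxRetries) {
    // Pause for the user instead of failing the run
    return {
      status: 'paused_for_user_action',
      error: `Max retries reached for level ${state.currentLevel}`,
    }
  }
  // Still under the limit: count this attempt and keep going
  return { status: 'running', retryCount: state.retryCount + 1 }
}

const exhausted = validateResult({ retryCount: 3, maxRetries: 3, currentLevel: 0 }, false)
console.log(exhausted.status) // paused_for_user_action
console.log(exhausted.error) // Max retries reached for level 0
```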
Then in the graph conditional routing:
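```typescript
// Assuming the graph is built with LangGraph.js, which exports the END sentinel
import { END } from '@langchain/langgraph'

function shouldContinue(state: BanditAgentState): string {
  if (state.status === 'paused_for_user_action') {
    return END // Stop graph execution
  }
  // ... rest of routing
}
```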
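Here is a runnable version of the routing rule. It makes three assumptions:
- A local END constant stands in for the graph library's end sentinel.
- The complete and failed statuses are placeholders.
- 'plan_level' is a hypothetical name for the next node in the "rest of routing".

The point is that a paused state routes to END just like a finished one. The graph therefore stops instead of looping back into another attempt.

```typescript
// Local stand-in for the graph library's END sentinel (assumption).
const END = '__end__'

type AgentStatus = 'running' | 'complete' | 'failed' | 'paused_for_user_action'

interface RoutingState {
  status: AgentStatus
}

function shouldContinue(state: RoutingState): string {
  if (state.status === 'paused_for_user_action') {
    return END // Stop graph execution and wait for the user
  }
  if (state.status === 'complete' || state.status === 'failed') {
    return END
  }
  return 'plan_level' // hypothetical name of the next node
}

console.log(shouldContinue({ status: 'paused_for_user_action' }) === END) // true
console.log(shouldContinue({ status: 'running' })) // plan_level
```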
And in the DO, when we see this status, emit the user action event:
```typescript
case 'node_update':
  if (nodeOutput.status === 'paused_for_user_action') {
    this.broadcast({
      type: 'user_action_required',
      data: {
        reason: 'max_retries',
        level: this.state.currentLevel,
        // ...
      },
    })
    this.state.status = 'paused'
  }
  break
```
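The DO-side detection can be exercised without a real Durable Object. In this sketch, the names map onto the real code as follows (all are hypothetical stand-ins):
- PauseDetector for BanditAgentDO
- the sent array for the broadcast() fan-out
- handleNodeUpdate for the node_update case

The extra check on state.status is a design choice that is not in the original. It stops repeated node_update events from opening the modal twice.

```typescript
interface OutgoingEvent {
  type: string
  data?: Record<string, unknown>
}

interface DOState {
  status: string
  currentLevel: number
}

class PauseDetector {
  readonly sent: OutgoingEvent[] = []
  state: DOState

  constructor(state: DOState) {
    this.state = state
  }

  // Stand-in for broadcast(): records events instead of sending them to clients
  broadcast(event: OutgoingEvent): void {
    this.sent.push(event)
  }

  handleNodeUpdate(nodeOutput: { status?: string }): void {
    // Emit only once per pause (guard is an assumption, not from the original)
    if (nodeOutput.status === 'paused_for_user_action' && this.state.status !== 'paused') {
      this.broadcast({
        type: 'user_action_required',
        data: { reason: 'max_retries', level: this.state.currentLevel },
      })
      this.state.status = 'paused'
    }
  }
}

const detector = new PauseDetector({ status: 'running', currentLevel: 1 })
detector.handleNodeUpdate({ status: 'paused_for_user_action' })
detector.handleNodeUpdate({ status: 'paused_for_user_action' })
console.log(detector.sent.length) // 1
```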
Option 2: Fix in DO (Simpler but less clean)
Before broadcasting the error event, check if it's a max-retries error and emit user_action_required FIRST:
```typescript
// In runAgentViaProxy(), when processing events:
if (agentEvent.type === 'error' && agentEvent.data.content?.includes('Max retries')) {
  // Emit user_action_required FIRST
  this.broadcast({
    type: 'user_action_required',
    data: { /* ... */ },
  })
  this.state.status = 'paused'
  await this.storage.saveState(this.state)
}
// Then broadcast the error normally
this.broadcast(agentEvent)
```
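A synchronous sketch of the relay loop can check the ordering guarantee that Option 2 relies on. relayEvents, isMaxRetriesError, and the event shapes are hypothetical. The real runAgentViaProxy() reads events from the proxy stream and also awaits storage.saveState(). That step is left out here.

```typescript
interface AgentEvent {
  type: string
  data: { content?: string }
}

interface Outgoing {
  type: string
  data?: unknown
}

function isMaxRetriesError(event: AgentEvent): boolean {
  return event.type === 'error' && (event.data.content?.includes('Max retries') ?? false)
}

function relayEvents(
  events: AgentEvent[],
  state: { status: string },
  broadcast: (event: Outgoing) => void,
): void {
  for (const agentEvent of events) {
    if (isMaxRetriesError(agentEvent)) {
      // Emit user_action_required FIRST, before the error itself
      broadcast({ type: 'user_action_required', data: { reason: 'max_retries' } })
      state.status = 'paused'
      // The real DO would persist state here (await this.storage.saveState(...))
    }
    // Then relay the original event unchanged
    broadcast(agentEvent)
  }
}

const sentTypes: string[] = []
const doState = { status: 'running' }
relayEvents(
  [
    { type: 'error', data: { content: 'ERROR: Max retries reached for level 0' } },
    { type: 'run_complete', data: {} },
  ],
  doState,
  (event) => sentTypes.push(event.type),
)
console.log(sentTypes.join(' -> ')) // user_action_required -> error -> run_complete
```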
Why Current Code Doesn't Work
The current code tries to detect the error in updateStateFromEvent() which is called too late in the event processing pipeline. By the time we try to emit user_action_required, the proxy stream has already ended and the frontend has moved on to run_complete.
Recommended Fix
Option 1 is cleaner because it makes the agent's state machine explicit about needing user action. This also prevents the run_complete event from firing prematurely.
Testing Plan
- Implement Option 1 in ssh-proxy/agent.ts
- Add new status to type definitions
- Update DO to recognize this status and emit event
- Test with GPT-4o Mini, wait for Level 1 max retries
- Verify logs show:
  - Agent graph ends with paused_for_user_action
  - DO emits user_action_required
  - Frontend receives event and shows modal
- Test Continue button → retry count resets, agent resumes
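For the "Verify logs show" step, a small checker can turn the manual check into an assertion. It runs over the ordered list of event types the frontend received. checkModalSequence is a hypothetical helper. How the event types are captured (browser console, WebSocket log, or a test harness) is left open.

```typescript
// Returns a list of problems; an empty list means the sequence looks right.
function checkModalSequence(eventTypes: string[]): string[] {
  const problems: string[] = []
  const actionIdx = eventTypes.indexOf('user_action_required')
  const completeIdx = eventTypes.indexOf('run_complete')

  if (actionIdx === -1) {
    problems.push('user_action_required was never received')
  } else if (completeIdx !== -1 && completeIdx < actionIdx) {
    problems.push('run_complete arrived before user_action_required')
  }
  return problems
}

console.log(checkModalSequence(['node_update', 'user_action_required'])) // []
console.log(checkModalSequence(['error', 'run_complete']))
```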
Files to Modify
- ssh-proxy/agent.ts:
  - Update BanditState annotation to include paused_for_user_action status
  - Modify validateResult to return this status instead of 'failed'
  - Update shouldContinue routing
- bandit-runner-app/src/lib/agents/bandit-state.ts:
  - Add 'paused_for_user_action' to status union type
- bandit-runner-app/src/lib/durable-objects/BanditAgentDO.ts:
  - In runAgentViaProxy(), detect paused_for_user_action status
  - Emit user_action_required when detected
  - Remove detection from updateStateFromEvent() (it's too late)
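Here is one way to add the new member to the status union in bandit-state.ts while keeping a runtime check available. The other status names are placeholders, because the existing union is not shown here. AGENT_STATUSES, isAgentStatus, and needsUserAction are hypothetical helpers.

```typescript
// Placeholder members; only 'paused_for_user_action' comes from this plan.
const AGENT_STATUSES = [
  'idle',
  'running',
  'paused',
  'complete',
  'failed',
  'paused_for_user_action',
] as const

type AgentStatus = (typeof AGENT_STATUSES)[number]

// Runtime guard for status strings received from the proxy
function isAgentStatus(value: string): value is AgentStatus {
  return (AGENT_STATUSES as readonly string[]).includes(value)
}

function needsUserAction(status: AgentStatus): boolean {
  return status === 'paused_for_user_action'
}
```

Deriving the type from the array keeps the compile-time union and the runtime guard from drifting apart when a status is added.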