
# Critical Fixes Needed

## Issues Identified from Testing

### 1. Max Retries Modal Not Appearing

**Problem**: The modal doesn't show when max retries are hit, even though the error appears in logs.

**Root Causes**:
1. The `onUserActionRequired` callback is registered in a `useEffect` with a dependency issue: the registration runs on mount, but the registered callback does not reliably persist afterwards
2. The Durable Object emits the event, but the frontend WebSocket handler may not be invoking the callback
3. Setting the modal state (`showMaxRetriesDialog`) may not be producing a visible dialog because of React rendering issues

**Fixes Required**:
- Register the callback in a `useEffect` that does not list `onUserActionRequired` as a dependency
- Add console logging in the callback to verify that it is being called
- Ensure the modal is mounted and not hidden behind other UI elements
- As a diagnostic, test with a direct state setter instead of the callback pattern
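
To check root causes 1 and 2 outside React, the registration logic can be isolated in a small framework-free sketch. The helper name `createUserActionChannel` and the `UserActionData` shape are assumptions made for illustration and are not taken from the codebase. The sketch stores the most recently registered callback in a ref-like object, so a dispatch that arrives after registration always reaches the current callback.

```typescript
// Hypothetical payload shape; the real event may carry more fields.
interface UserActionData {
  reason: string
  level: number
  retryCount: number
  maxRetries: number
  message: string
}

type UserActionCallback = (data: UserActionData) => void

// Hypothetical helper: keeps the latest callback in a ref-like holder.
function createUserActionChannel() {
  const callbackRef: { current: UserActionCallback | null } = { current: null }

  return {
    // Registering again replaces the previous callback.
    onUserActionRequired(cb: UserActionCallback): void {
      callbackRef.current = cb
    },
    // Returns true only if the event matched and a callback was invoked.
    dispatch(event: { type: string; data: UserActionData }): boolean {
      if (event.type !== 'user_action_required' || !callbackRef.current) {
        return false
      }
      callbackRef.current(event.data)
      return true
    },
  }
}
```

If `dispatch` returns `false` for a `user_action_required` event, no callback was registered at that moment, which points at the registration rather than at the modal.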

### 2. Terminal Panel Visual Hierarchy

**Current Issues**:
- Commands (`$ cat readme`) blend with output
- `[TOOL]` system messages are cyan but don't stand out enough
- No clear separation between command execution blocks
- Timestamps are small and hard to read
- ANSI codes are preserved but overall readability is poor

**Improvements Needed**:
- **Commands**: Make input lines more prominent with a brighter color, and consider adding a `>` prefix
- **Output**: Slightly dimmed compared to commands
- **System messages**: Different background or border to separate from regular output
- **Spacing**: Add subtle separators between command blocks
- **Typography**: Slightly larger monospace font, better line height

### 3. Agent Panel Visual Hierarchy

**Current Issues**:
- Status badges blend together
- THINKING / AGENT / USER labels all look similar
- No clear distinction between message types
- Dense text makes it hard to scan

**Improvements Needed**:
- **THINKING messages**: Use collapsible UI (shadcn Collapsible) for long reasoning
- **Message types**: Stronger color differentiation (blue for thinking, green for agent, yellow for user)
- **Spacing**: More padding between messages
- **Status indicators**: Level complete events should be more prominent
- **Timestamps**: Slightly larger and better positioned

## Implementation Plan

### Phase 1: Fix Max Retries Modal (Critical)

1. **Update `terminal-chat-interface.tsx`**:
   ```typescript
     // Register once on mount; onUserActionRequired is intentionally left out of the dependency array
   useEffect(() => {
     onUserActionRequired((data) => {
       console.log('🚨 USER ACTION REQUIRED:', data) // Debug log
       if (data.reason === 'max_retries') {
         setMaxRetriesData({
           level: data.level,
           retryCount: data.retryCount,
           maxRetries: data.maxRetries,
           message: data.message,
         })
         setShowMaxRetriesDialog(true)
       }
     })
   }, []) // Empty dependency array
   ```

2. **Add debug logging** in `useAgentWebSocket.ts`:
   ```typescript
   if (agentEvent.type === 'user_action_required' && userActionCallbackRef.current) {
     console.log('📣 Calling user action callback with:', agentEvent.data)
     userActionCallbackRef.current(agentEvent.data)
   }
   ```

3. **Verify DO emission** - add logging in `BanditAgentDO.ts`:
   ```typescript
   console.log('🚨 Emitting user_action_required event:', {
     reason: 'max_retries',
     level,
     retryCount: this.state.retryCount,
     maxRetries: this.state.maxRetries,
   })
   this.broadcast({...})
   ```
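
Alongside the logging, it may help to validate the payload before it reaches the modal state, so that a malformed event is caught rather than silently ignored. The sketch below assumes the event arrives as a JSON string with a `{ type, data }` envelope, and the helper name `parseMaxRetriesEvent` is hypothetical. The field names come from the snippets above.

```typescript
// Fields taken from the snippets above; the exact types are assumptions.
interface MaxRetriesData {
  reason: 'max_retries'
  level: number
  retryCount: number
  maxRetries: number
  message: string
}

// Hypothetical helper: returns the payload only if the raw message is a
// well-formed max-retries event, otherwise null.
function parseMaxRetriesEvent(raw: string): MaxRetriesData | null {
  let parsed: unknown
  try {
    parsed = JSON.parse(raw)
  } catch {
    return null // not valid JSON
  }
  if (typeof parsed !== 'object' || parsed === null) return null

  const event = parsed as { type?: unknown; data?: unknown }
  if (event.type !== 'user_action_required') return null
  if (typeof event.data !== 'object' || event.data === null) return null

  const { reason, level, retryCount, maxRetries, message } =
    event.data as Record<string, unknown>
  if (
    reason !== 'max_retries' ||
    typeof level !== 'number' ||
    typeof retryCount !== 'number' ||
    typeof maxRetries !== 'number' ||
    typeof message !== 'string'
  ) {
    return null
  }
  return { reason: 'max_retries', level, retryCount, maxRetries, message }
}
```

A `null` result for a message that the Durable Object logged as emitted would suggest a mismatch between the emitted shape and what the frontend expects.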

### Phase 2: Improve Terminal Visual Hierarchy

1. **Update terminal line rendering** in `terminal-chat-interface.tsx`:
   ```tsx
   // Add stronger visual distinction
   <div className={cn(
     "font-mono text-sm py-1 px-2",
     line.type === "input" && "text-cyan-400 font-bold bg-cyan-950/20 border-l-2 border-cyan-500",
     line.type === "output" && "text-zinc-300 pl-4",
     line.type === "system" && "text-purple-400 bg-purple-950/20 border-l-2 border-purple-500",
     line.type === "error" && "text-red-400 bg-red-950/20 border-l-2 border-red-500"
   )}>
   ```

2. **Add command block separators**:
   ```tsx
   {line.command && idx > 0 && (
     <div className="h-px bg-border/30 my-1" />
   )}
   ```

3. **Improve typography**:
   ```css
   .terminal-output {
     font-family: 'JetBrains Mono', 'Fira Code', monospace;
     font-size: 13px;
     line-height: 1.6;
   }
   ```
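
The separator logic can also be expressed by grouping lines into command blocks before rendering. This is a minimal sketch under the assumption that every `input` line starts a new block. The `TerminalLine` shape and the `groupIntoCommandBlocks` name are hypothetical, while the four line types match the ones used in the rendering snippet above.

```typescript
type LineType = 'input' | 'output' | 'system' | 'error'

// Hypothetical line shape; the real type may carry timestamps and more.
interface TerminalLine {
  type: LineType
  content: string
}

// Splits a flat list into blocks. Each 'input' line starts a new block;
// lines before the first command form their own leading block.
function groupIntoCommandBlocks(lines: TerminalLine[]): TerminalLine[][] {
  const blocks: TerminalLine[][] = []
  let current: TerminalLine[] | null = null
  for (const line of lines) {
    if (line.type === 'input' || current === null) {
      current = [line]
      blocks.push(current)
    } else {
      current.push(line)
    }
  }
  return blocks
}
```

With blocks as the unit of rendering, a separator can be drawn between blocks rather than decided per line.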

### Phase 3: Improve Agent Panel Visual Hierarchy

1. **Use Collapsible for thinking messages**:
   ```tsx
   {msg.type === 'thinking' && (
     <Collapsible>
       <CollapsibleTrigger className="flex items-center gap-2 text-blue-400">
         <ChevronRight className="h-3 w-3" />
         THINKING
       </CollapsibleTrigger>
       <CollapsibleContent className="pl-4 text-blue-300/80">
         {msg.content}
       </CollapsibleContent>
     </Collapsible>
   )}
   ```

2. **Stronger message type colors**:
   ```tsx
   <div className={cn(
     msg.type === "thinking" && "border-blue-500 bg-blue-950/20",
     msg.type === "agent" && "border-green-500 bg-green-950/20",
     msg.type === "user" && "border-yellow-500 bg-yellow-950/20"
   )}>
   ```

3. **Add spacing and padding**:
   ```tsx
   <div className="space-y-3"> {/* was space-y-1 */}
     <div className="p-3 rounded border"> {/* add padding and border */}
     </div>
   </div>
   ```
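
As an alternative to chained conditionals, the per-type colors could live in a typed lookup table. This is only a sketch: the names `MessageType`, `MESSAGE_TYPE_CLASSES`, and `messageClasses` are hypothetical, and the class strings are copied from the snippets above.

```typescript
type MessageType = 'thinking' | 'agent' | 'user'

// One entry per message type. The Record type makes the compiler
// report an error if a type is added without a matching entry.
const MESSAGE_TYPE_CLASSES: Record<MessageType, string> = {
  thinking: 'border-blue-500 bg-blue-950/20',
  agent: 'border-green-500 bg-green-950/20',
  user: 'border-yellow-500 bg-yellow-950/20',
}

// Combines the shared padding and border classes with the per-type colors.
function messageClasses(type: MessageType): string {
  return `p-3 rounded border ${MESSAGE_TYPE_CLASSES[type]}`
}
```

The result can be passed directly as a `className`, which keeps the color decisions in one place.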

## Testing Checklist

- [ ] Start a run with GPT-4o Mini
- [ ] Wait for Level 1 max retries (should hit after 3 attempts)
- [ ] Verify console shows "🚨 USER ACTION REQUIRED" log
- [ ] Verify modal appears with Stop/Intervene/Continue buttons
- [ ] Test Continue button → verify retry count resets and agent resumes
- [ ] Check terminal readability - commands should be clearly distinct from output
- [ ] Check agent panel - thinking messages should be collapsible and color-coded
- [ ] Verify token/cost tracking still works
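
The retry behavior that the checklist exercises can be modeled as a small state transition, which may be useful as a unit test alongside the manual run. The sketch assumes the run pauses once `retryCount` reaches `maxRetries` and that Continue resets the counter to zero. The `RetryState` shape and the function names are hypothetical.

```typescript
type RunStatus = 'running' | 'awaiting_user'

// Hypothetical state shape; retryCount and maxRetries mirror the event fields.
interface RetryState {
  retryCount: number
  maxRetries: number
  status: RunStatus
}

// Records one failed attempt and pauses the run once the limit is reached.
function recordFailure(state: RetryState): RetryState {
  const retryCount = state.retryCount + 1
  const status: RunStatus =
    retryCount >= state.maxRetries ? 'awaiting_user' : 'running'
  return { ...state, retryCount, status }
}

// Models the Continue button: reset the counter and resume the run.
function continueRun(state: RetryState): RetryState {
  return { ...state, retryCount: 0, status: 'running' }
}
```

With `maxRetries` set to 3, three calls to `recordFailure` leave the state in `awaiting_user`, and `continueRun` returns it to `running` with a count of zero.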

## Priority

1. **Critical**: Fix max retries modal (blocks core functionality)
2. **High**: Improve terminal hierarchy (UX severely impacted)
3. **Medium**: Improve agent panel hierarchy (nice to have, less critical)