bandit-runner/FIXES-NEEDED.md
2025-10-13 10:21:50 -06:00

# Critical Fixes Needed
## Issues Identified from Testing
### 1. Max Retries Modal Not Appearing
**Problem**: The modal doesn't show when max retries are hit, even though the error appears in logs.
**Root Causes**:
1. The `onUserActionRequired` callback registration has a dependency issue: it runs once on mount, but the registration doesn't persist properly
2. The Durable Object emits the event but the frontend WebSocket handler might not be invoking the callback
3. The modal state (`showMaxRetriesDialog`) might not be triggering due to React rendering issues
**Fixes Required**:
- Fix the callback registration in `useEffect` to not depend on `onUserActionRequired`
- Add console logging in the callback to verify it's being called
- Ensure the modal is properly mounted and not blocked by other UI elements
- Test with a simpler direct state setter instead of the callback pattern
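
One way to try the "direct state setter" idea is to reduce the incoming event to modal state with a pure function, so the transition can be checked outside React. The sketch below is only an illustration: `UserActionEvent`, `MaxRetriesModalState`, and `nextModalState` are hypothetical names, and the field types (for example `level` as a number) are assumptions rather than the project's actual types.

```typescript
// Hypothetical event shape; field names follow the payload described in this
// document, but the types are assumptions.
interface UserActionEvent {
  reason: string;
  level: number;
  retryCount: number;
  maxRetries: number;
  message: string;
}

// Hypothetical modal state: a single object instead of two separate setters.
interface MaxRetriesModalState {
  open: boolean;
  data: Omit<UserActionEvent, "reason"> | null;
}

const closedModal: MaxRetriesModalState = { open: false, data: null };

// Pure transition: returns the next modal state for an incoming event.
function nextModalState(
  current: MaxRetriesModalState,
  event: UserActionEvent,
): MaxRetriesModalState {
  if (event.reason !== "max_retries") {
    return current; // Unrelated reasons leave the modal untouched.
  }
  return {
    open: true,
    data: {
      level: event.level,
      retryCount: event.retryCount,
      maxRetries: event.maxRetries,
      message: event.message,
    },
  };
}
```

In a component, the result could be passed to a single `useState` setter, which removes one moving part (the registered callback) while debugging why the modal does not open.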
### 2. Terminal Panel Visual Hierarchy
**Current Issues**:
- Commands (`$ cat readme`) blend with output
- `[TOOL]` system messages are cyan but don't stand out enough
- No clear separation between command execution blocks
- Timestamps are small and hard to read
- ANSI codes are preserved but overall readability is poor
**Improvements Needed**:
- **Commands**: Make input lines more prominent with a brighter color, and consider adding a `>` prefix
- **Output**: Slightly dimmed compared to commands
- **System messages**: Different background or border to separate from regular output
- **Spacing**: Add subtle separators between command blocks
- **Typography**: Slightly larger monospace font, better line height
### 3. Agent Panel Visual Hierarchy
**Current Issues**:
- Status badges blend together
- THINKING / AGENT / USER labels all look similar
- No clear distinction between message types
- Dense text makes it hard to scan
**Improvements Needed**:
- **THINKING messages**: Use collapsible UI (shadcn Collapsible) for long reasoning
- **Message types**: Stronger color differentiation (blue for thinking, green for agent, yellow for user)
- **Spacing**: More padding between messages
- **Status indicators**: Level-complete events should be more prominent
- **Timestamps**: Slightly larger and better positioned
## Implementation Plan
### Phase 1: Fix Max Retries Modal (Critical)
1. **Update `terminal-chat-interface.tsx`**:
```typescript
// Remove dependency on onUserActionRequired in useEffect
useEffect(() => {
  onUserActionRequired((data) => {
    console.log('🚨 USER ACTION REQUIRED:', data) // Debug log
    if (data.reason === 'max_retries') {
      setMaxRetriesData({
        level: data.level,
        retryCount: data.retryCount,
        maxRetries: data.maxRetries,
        message: data.message,
      })
      setShowMaxRetriesDialog(true)
    }
  })
}, []) // Empty dependency array
```
2. **Add debug logging** in `useAgentWebSocket.ts`:
```typescript
if (agentEvent.type === 'user_action_required' && userActionCallbackRef.current) {
  console.log('📣 Calling user action callback with:', agentEvent.data)
  userActionCallbackRef.current(agentEvent.data)
}
```
3. **Verify DO emission** - add logging in `BanditAgentDO.ts`:
```typescript
console.log('🚨 Emitting user_action_required event:', {
  reason: 'max_retries',
  level,
  retryCount: this.state.retryCount,
  maxRetries: this.state.maxRetries,
})
this.broadcast({...})
```
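
The ref-based registration that steps 1 and 2 rely on can be exercised outside React. The following is a minimal sketch, assuming the hook keeps the latest callback in a mutable slot like `userActionCallbackRef`; `createUserActionChannel` and the event types are hypothetical names made up for this illustration, not the project's real API.

```typescript
// Hypothetical shapes; the real event types live in the project.
type UserActionData = { reason: string; [key: string]: unknown };
type AgentEvent = { type: string; data: UserActionData };
type UserActionCallback = (data: UserActionData) => void;

function createUserActionChannel() {
  // Plays the role of userActionCallbackRef: a mutable slot that always
  // holds the most recently registered callback.
  const callbackRef: { current: UserActionCallback | null } = { current: null };

  return {
    // Registering replaces the previous callback instead of stacking handlers.
    onUserActionRequired(callback: UserActionCallback): void {
      callbackRef.current = callback;
    },
    // Returns true only when a callback was actually invoked, which makes a
    // silently dropped event easy to spot in logs or tests.
    dispatch(event: AgentEvent): boolean {
      if (event.type !== "user_action_required") {
        return false;
      }
      if (!callbackRef.current) {
        console.warn("user_action_required received before a callback was registered");
        return false;
      }
      callbackRef.current(event.data);
      return true;
    },
  };
}
```

If `dispatch` returns `false` for a `user_action_required` event, the event reached the frontend but no callback was registered at that moment, which points at the registration rather than at the Durable Object.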
### Phase 2: Improve Terminal Visual Hierarchy
1. **Update terminal line rendering** in `terminal-chat-interface.tsx`:
```tsx
// Add stronger visual distinction
<div className={cn(
  "font-mono text-sm py-1 px-2",
  line.type === "input" && "text-cyan-400 font-bold bg-cyan-950/20 border-l-2 border-cyan-500",
  line.type === "output" && "text-zinc-300 pl-4",
  line.type === "system" && "text-purple-400 bg-purple-950/20 border-l-2 border-purple-500",
  line.type === "error" && "text-red-400 bg-red-950/20 border-l-2 border-red-500"
)}>
```
2. **Add command block separators**:
```tsx
{line.command && idx > 0 && (
  <div className="h-px bg-border/30 my-1" />
)}
```
3. **Improve typography**:
```css
.terminal-output {
  font-family: 'JetBrains Mono', 'Fira Code', monospace;
  font-size: 13px;
  line-height: 1.6;
}
```
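
The per-type styling and the separator rule from this phase can also be pulled into small helpers, which keeps the JSX short and makes the mapping checkable without rendering. This is a sketch under assumptions: `TerminalLine`, `terminalLineClasses`, and `needsSeparator` are hypothetical names, and `command` is assumed to be set only on the line that starts a command block.

```typescript
type LineType = "input" | "output" | "system" | "error";

// Hypothetical line shape for illustration.
interface TerminalLine {
  type: LineType;
  content: string;
  command?: string; // Assumed to mark the first line of a command block.
}

// Classes shared by every terminal line.
const BASE_CLASSES = "font-mono text-sm py-1 px-2";

// One entry per line type; the Record type forces every type to be covered.
const LINE_TYPE_CLASSES: Record<LineType, string> = {
  input: "text-cyan-400 font-bold bg-cyan-950/20 border-l-2 border-cyan-500",
  output: "text-zinc-300 pl-4",
  system: "text-purple-400 bg-purple-950/20 border-l-2 border-purple-500",
  error: "text-red-400 bg-red-950/20 border-l-2 border-red-500",
};

// Builds the full className string for a line of the given type.
function terminalLineClasses(type: LineType): string {
  return `${BASE_CLASSES} ${LINE_TYPE_CLASSES[type]}`;
}

// A separator is drawn before every command block except the first line.
function needsSeparator(line: TerminalLine, idx: number): boolean {
  return Boolean(line.command) && idx > 0;
}
```

A lookup table trades the flexibility of chained `cn(...)` conditions for exhaustiveness: adding a new line type without a style becomes a compile error.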
### Phase 3: Improve Agent Panel Visual Hierarchy
1. **Use Collapsible for thinking messages**:
```tsx
{msg.type === 'thinking' && (
  <Collapsible>
    <CollapsibleTrigger className="flex items-center gap-2 text-blue-400">
      <ChevronRight className="h-3 w-3" />
      THINKING
    </CollapsibleTrigger>
    <CollapsibleContent className="pl-4 text-blue-300/80">
      {msg.content}
    </CollapsibleContent>
  </Collapsible>
)}
```
2. **Stronger message type colors**:
```tsx
// Arguments for the message container's cn(...) call
msg.type === "thinking" && "border-blue-500 bg-blue-950/20",
msg.type === "agent" && "border-green-500 bg-green-950/20",
msg.type === "user" && "border-yellow-500 bg-yellow-950/20",
```
3. **Add spacing and padding**:
```tsx
<div className="space-y-3"> {/* was space-y-1 */}
  <div className="p-3 rounded border"> {/* add padding and border */}
```
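
Since the collapsible is meant for long reasoning, it may help to decide per message whether it starts collapsed. The sketch below shows one possible rule; `AgentMessage`, `startsCollapsed`, `messageClasses`, and the 280-character threshold are all assumptions chosen for illustration, not values taken from the project.

```typescript
type MessageType = "thinking" | "agent" | "user";

// Hypothetical message shape for illustration.
interface AgentMessage {
  type: MessageType;
  content: string;
}

// Border and background per message type, matching the colors proposed above.
const MESSAGE_TYPE_CLASSES: Record<MessageType, string> = {
  thinking: "border-blue-500 bg-blue-950/20",
  agent: "border-green-500 bg-green-950/20",
  user: "border-yellow-500 bg-yellow-950/20",
};

// Assumed cutoff; tune it against real transcripts.
const COLLAPSE_THRESHOLD_CHARS = 280;

// Only thinking messages longer than the threshold start collapsed.
function startsCollapsed(
  msg: AgentMessage,
  threshold: number = COLLAPSE_THRESHOLD_CHARS,
): boolean {
  return msg.type === "thinking" && msg.content.length > threshold;
}

// Full className for the message container, including padding and border.
function messageClasses(msg: AgentMessage): string {
  return `p-3 rounded border ${MESSAGE_TYPE_CLASSES[msg.type]}`;
}
```

The result of `startsCollapsed` could feed the collapsible's initial open state (for example `defaultOpen={!startsCollapsed(msg)}`), so short thoughts stay visible and only long ones are folded away.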
## Testing Checklist
- [ ] Start a run with GPT-4o Mini
- [ ] Wait for Level 1 max retries (should hit after 3 attempts)
- [ ] Verify console shows "🚨 USER ACTION REQUIRED" log
- [ ] Verify modal appears with Stop/Intervene/Continue buttons
- [ ] Test Continue button → verify retry count resets and agent resumes
- [ ] Check terminal readability - commands should be clearly distinct from output
- [ ] Check agent panel - thinking messages should be collapsible and color-coded
- [ ] Verify token/cost tracking still works
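
The retry behavior that the checklist exercises can be modeled in isolation to make the expected outcome explicit. This is a simplified sketch, not the Durable Object's actual logic: `RetryState`, `recordFailure`, and `handleContinue` are hypothetical names, and the rule "pause once `retryCount` reaches `maxRetries`" is an assumption based on the "after 3 attempts" item above.

```typescript
// Hypothetical in-memory model of the retry bookkeeping.
interface RetryState {
  retryCount: number;
  maxRetries: number;
  paused: boolean;
}

interface MaxRetriesPayload {
  reason: "max_retries";
  level: number;
  retryCount: number;
  maxRetries: number;
}

function createRetryState(maxRetries: number = 3): RetryState {
  return { retryCount: 0, maxRetries, paused: false };
}

// Counts a failed attempt. Returns the payload to emit when the limit is
// reached, or null while retries remain.
function recordFailure(state: RetryState, level: number): MaxRetriesPayload | null {
  state.retryCount += 1;
  if (state.retryCount >= state.maxRetries) {
    state.paused = true;
    return {
      reason: "max_retries",
      level,
      retryCount: state.retryCount,
      maxRetries: state.maxRetries,
    };
  }
  return null;
}

// Continue: reset the counter and let the agent resume.
function handleContinue(state: RetryState): void {
  state.retryCount = 0;
  state.paused = false;
}
```

Under these assumptions, the third failure on a level is the one that should produce the `user_action_required` event, and pressing Continue should leave `retryCount` at 0.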
## Priority
1. **Critical**: Fix max retries modal (blocks core functionality)
2. **High**: Improve terminal hierarchy (UX severely impacted)
3. **Medium**: Improve agent panel hierarchy (nice to have, less critical)