bandit-runner/FIXES-NEEDED.md
2025-10-13 10:21:50 -06:00

Critical Fixes Needed

Issues Identified from Testing

1. Max Retries Modal Not Appearing

Problem: The modal doesn't show when max retries are hit, even though the error appears in logs.

Root Causes:

  1. The onUserActionRequired callback registration has a dependency issue: the effect runs once on mount, but the registered callback does not appear to persist afterwards
  2. The Durable Object emits the event, but the frontend WebSocket handler might not be invoking the callback
  3. The modal state (showMaxRetriesDialog) might be set without the dialog actually rendering

Fixes Required:

  • Register the callback in a useEffect that does not list onUserActionRequired as a dependency
  • Add console logging in the callback to verify that it is being called
  • Ensure the modal is mounted and not blocked by other UI elements
  • Test with a simpler direct state setter instead of the callback pattern
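
To make the second root cause easier to check, here is a minimal, framework-free sketch of the "latest callback in a mutable slot" pattern that the hook's userActionCallbackRef appears to follow. The names UserActionChannel, register, and dispatch are hypothetical and are not taken from the codebase; the payload fields mirror the snippets in the implementation plan. The point is that dispatch reports whether a callback was actually present, which is the silent failure to look for.

```typescript
// Hypothetical payload shape; field names follow the snippets in this document.
interface UserActionData {
  reason: string;
  level: number;
  retryCount: number;
  maxRetries: number;
  message: string;
}

type UserActionCallback = (data: UserActionData) => void;

// Stand-in for the hook's callback ref: a mutable slot that always holds
// the most recently registered callback (or null if none is registered).
class UserActionChannel {
  private current: UserActionCallback | null = null;

  // Overwrites the slot and returns an unsubscribe function.
  register(cb: UserActionCallback): () => void {
    this.current = cb;
    return () => {
      if (this.current === cb) this.current = null;
    };
  }

  // Stand-in for the WebSocket message handler. Returns false when no
  // callback is registered, so a dropped event is visible to the caller.
  dispatch(data: UserActionData): boolean {
    if (!this.current) return false;
    this.current(data);
    return true;
  }
}

const channel = new UserActionChannel();
const sample: UserActionData = {
  reason: "max_retries",
  level: 1,
  retryCount: 3,
  maxRetries: 3,
  message: "Max retries reached",
};

const received: UserActionData[] = [];
const deliveredBeforeRegister = channel.dispatch(sample); // false: nothing registered yet
const unsubscribe = channel.register((data) => received.push(data));
const deliveredAfterRegister = channel.dispatch(sample); // true
unsubscribe();
const deliveredAfterUnsubscribe = channel.dispatch(sample); // false again
```

If the real handler behaves like the first and last dispatch calls here, the event arrives while no callback is registered, and the fix belongs in the registration timing rather than in the modal.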

2. Terminal Panel Visual Hierarchy

Current Issues:

  • Commands ($ cat readme) blend with output
  • [TOOL] system messages are cyan but don't stand out enough
  • No clear separation between command execution blocks
  • Timestamps are small and hard to read
  • ANSI codes are preserved but overall readability is poor

Improvements Needed:

  • Commands: Make input lines more prominent with a brighter color, and possibly a > prefix
  • Output: Slightly dimmed compared to commands
  • System messages: Different background or border to separate from regular output
  • Spacing: Add subtle separators between command blocks
  • Typography: Slightly larger monospace font, better line height

3. Agent Panel Visual Hierarchy

Current Issues:

  • Status badges blend together
  • THINKING / AGENT / USER labels all look similar
  • No clear distinction between message types
  • Dense text makes it hard to scan

Improvements Needed:

  • THINKING messages: Use collapsible UI (shadcn Collapsible) for long reasoning
  • Message types: Stronger color differentiation (blue for thinking, green for agent, yellow for user)
  • Spacing: More padding between messages
  • Status indicators: Level complete events should be more prominent
  • Timestamps: Slightly larger and better positioned

Implementation Plan

Phase 1: Fix Max Retries Modal (Critical)

  1. Update terminal-chat-interface.tsx:

    // Remove dependency on onUserActionRequired in useEffect
    useEffect(() => {
      onUserActionRequired((data) => {
        console.log('🚨 USER ACTION REQUIRED:', data) // Debug log
        if (data.reason === 'max_retries') {
          setMaxRetriesData({
            level: data.level,
            retryCount: data.retryCount,
            maxRetries: data.maxRetries,
            message: data.message,
          })
          setShowMaxRetriesDialog(true)
        }
      })
    }, []) // Empty dependency array: register once on mount
    
  2. Add debug logging in useAgentWebSocket.ts:

    if (agentEvent.type === 'user_action_required' && userActionCallbackRef.current) {
      console.log('📣 Calling user action callback with:', agentEvent.data)
      userActionCallbackRef.current(agentEvent.data)
    }
    
  3. Verify DO emission - add logging in BanditAgentDO.ts:

    console.log('🚨 Emitting user_action_required event:', {
      reason: 'max_retries',
      level,
      retryCount: this.state.retryCount,
      maxRetries: this.state.maxRetries,
    })
    this.broadcast({...})
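
For reference while adding these logs, the following is a hedged sketch of the retry bookkeeping that would produce such an event. RetryGate, recordFailure, and continueRun are hypothetical names, and the actual logic in BanditAgentDO.ts may differ; only the payload field names come from the snippets above.

```typescript
// Hypothetical event shape, using the field names logged above.
interface MaxRetriesEvent {
  type: "user_action_required";
  data: {
    reason: "max_retries";
    level: number;
    retryCount: number;
    maxRetries: number;
    message: string;
  };
}

class RetryGate {
  private retryCount = 0;
  private readonly maxRetries: number;

  constructor(maxRetries: number) {
    this.maxRetries = maxRetries;
  }

  // Records one failed attempt. Returns the event to broadcast once the
  // limit is reached, and null while retries remain.
  recordFailure(level: number): MaxRetriesEvent | null {
    this.retryCount += 1;
    if (this.retryCount < this.maxRetries) return null;
    return {
      type: "user_action_required",
      data: {
        reason: "max_retries",
        level,
        retryCount: this.retryCount,
        maxRetries: this.maxRetries,
        message: `Level ${level} failed ${this.retryCount} of ${this.maxRetries} attempts`,
      },
    };
  }

  // The modal's Continue action: reset the counter so the agent can resume.
  continueRun(): void {
    this.retryCount = 0;
  }
}

const gate = new RetryGate(3);
const emitted = [gate.recordFailure(1), gate.recordFailure(1), gate.recordFailure(1)];
gate.continueRun();
const afterContinue = gate.recordFailure(1); // null: the counter was reset
```

Keeping the counter and the event construction in one place makes it straightforward to log exactly what is broadcast and to confirm that Continue resets the count.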
    

Phase 2: Improve Terminal Visual Hierarchy

  1. Update terminal line rendering in terminal-chat-interface.tsx:

    // Add stronger visual distinction
    <div className={cn(
      "font-mono text-sm py-1 px-2",
      line.type === "input" && "text-cyan-400 font-bold bg-cyan-950/20 border-l-2 border-cyan-500",
      line.type === "output" && "text-zinc-300 pl-4",
      line.type === "system" && "text-purple-400 bg-purple-950/20 border-l-2 border-purple-500",
      line.type === "error" && "text-red-400 bg-red-950/20 border-l-2 border-red-500"
    )}>
    
  2. Add command block separators:

    {line.command && idx > 0 && (
      <div className="h-px bg-border/30 my-1" />
    )}
    
  3. Improve typography:

    .terminal-output {
      font-family: 'JetBrains Mono', 'Fira Code', monospace;
      font-size: 13px;
      line-height: 1.6;
    }
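
One way to decide where separators belong is to group lines into command blocks before rendering. The sketch below assumes that each line of type "input" starts a new block; the TerminalLine shape and the groupIntoBlocks helper are hypothetical and are not taken from terminal-chat-interface.tsx.

```typescript
type LineType = "input" | "output" | "system" | "error";

// Hypothetical line shape; the real component may carry more fields.
interface TerminalLine {
  type: LineType;
  content: string;
}

// Groups lines so that every "input" line opens a new block and all
// following non-input lines are attached to it. Lines that appear before
// the first command form their own leading block.
function groupIntoBlocks(lines: TerminalLine[]): TerminalLine[][] {
  const blocks: TerminalLine[][] = [];
  for (const line of lines) {
    const last = blocks[blocks.length - 1];
    if (line.type === "input" || last === undefined) {
      blocks.push([line]);
    } else {
      last.push(line);
    }
  }
  return blocks;
}

const sampleLines: TerminalLine[] = [
  { type: "system", content: "[TOOL] connected" },
  { type: "input", content: "$ cat readme" },
  { type: "output", content: "Welcome" },
  { type: "input", content: "$ ls" },
  { type: "output", content: "readme" },
];

const blocks = groupIntoBlocks(sampleLines);
```

With this grouping, a separator can be rendered between blocks rather than computed per line, which avoids a stray divider above the first command.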
    

Phase 3: Improve Agent Panel Visual Hierarchy

  1. Use Collapsible for thinking messages:

    {msg.type === 'thinking' && (
      <Collapsible>
        <CollapsibleTrigger className="flex items-center gap-2 text-blue-400">
          <ChevronRight className="h-3 w-3" />
          THINKING
        </CollapsibleTrigger>
        <CollapsibleContent className="pl-4 text-blue-300/80">
          {msg.content}
        </CollapsibleContent>
      </Collapsible>
    )}
    
  2. Stronger message type colors:

    // Entries for the message container's cn(...) call
    msg.type === "thinking" && "border-blue-500 bg-blue-950/20",
    msg.type === "agent" && "border-green-500 bg-green-950/20",
    msg.type === "user" && "border-yellow-500 bg-yellow-950/20",
    
  3. Add spacing and padding:

    <div className="space-y-3"> {/* was space-y-1 */}
      <div className="p-3 rounded border">{/* add padding and border */}</div>
    </div>
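
The three steps above could be backed by a small pair of helpers, sketched here under stated assumptions. MESSAGE_STYLES, messageClasses, shouldCollapse, and the 280-character threshold are all hypothetical; only the Tailwind class strings come from this plan.

```typescript
type MessageType = "thinking" | "agent" | "user";

// Border and background classes per message type, as proposed in this plan.
const MESSAGE_STYLES: Record<MessageType, string> = {
  thinking: "border-blue-500 bg-blue-950/20",
  agent: "border-green-500 bg-green-950/20",
  user: "border-yellow-500 bg-yellow-950/20",
};

// Hypothetical cutoff for what counts as "long reasoning".
const COLLAPSE_THRESHOLD = 280;

// Full class string for a message container: shared padding and border
// plus the type-specific colors.
function messageClasses(type: MessageType): string {
  return `p-3 rounded border ${MESSAGE_STYLES[type]}`;
}

// Only thinking messages longer than the threshold are collapsed.
function shouldCollapse(type: MessageType, content: string): boolean {
  return type === "thinking" && content.length > COLLAPSE_THRESHOLD;
}

const agentClasses = messageClasses("agent");
const collapseLongThinking = shouldCollapse("thinking", "x".repeat(281));
const collapseShortThinking = shouldCollapse("thinking", "short note");
const collapseLongAgent = shouldCollapse("agent", "x".repeat(500));
```

A lookup table keeps the color choices in one place, and gating the Collapsible on length means short thinking messages stay visible without an extra click.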
    

Testing Checklist

  • Start a run with GPT-4o Mini
  • Wait for Level 1 max retries (should hit after 3 attempts)
  • Verify console shows "🚨 USER ACTION REQUIRED" log
  • Verify modal appears with Stop/Intervene/Continue buttons
  • Test Continue button → verify retry count resets and agent resumes
  • Check terminal readability - commands should be clearly distinct from output
  • Check agent panel - thinking messages should be collapsible and color-coded
  • Verify token/cost tracking still works

Priority

  1. Critical: Fix max retries modal (blocks core functionality)
  2. High: Improve terminal hierarchy (UX severely impacted)
  3. Medium: Improve agent panel hierarchy (nice to have)