# Agent Validator Plugin - Management Guide ## Overview The Agent Validator Plugin is a real-time monitoring and validation system for OpenCode agents. It tracks agent behavior, validates compliance with defined rules, and provides detailed reports on how agents execute tasks. ### What It Does - **Tracks agent activity** - Monitors which agents are active and what tools they use - **Validates approval gates** - Ensures agents request approval before executing operations - **Analyzes context loading** - Checks if agents load required context files before tasks - **Monitors delegation** - Validates delegation decisions follow the 4+ file rule - **Detects violations** - Identifies critical rule violations (auto-fix attempts, missing approvals) - **Generates reports** - Creates comprehensive validation reports with compliance scores ### Why Use It - **Verify agent behavior** - Confirm agents follow their defined prompts - **Debug issues** - Understand what agents are doing and why - **Track compliance** - Ensure critical safety rules are enforced - **Improve prompts** - Identify patterns that need refinement - **Multi-agent tracking** - Monitor agent switches and delegation flows --- ## Quick Start ### Installation The plugin auto-loads from `.opencode/plugin/` when OpenCode starts. **Install dependencies:** ```bash cd ~/.opencode/plugin npm install # or bun install ``` **Verify installation:** ```bash opencode --agent openagent > "analyze_agent_usage" ``` If you see agent tracking data, the plugin is working! ✅ ### Your First Validation 1. **Start a session and do some work:** ```bash opencode --agent openagent > "Run pwd command" Agent: [requests approval] > "proceed" ``` 2. **Check what was tracked:** ```bash > "analyze_agent_usage" ``` 3. **Validate compliance:** ```bash > "validate_session" ``` --- ## Available Tools The plugin provides 7 validation tools: ### 1. `analyze_agent_usage` **Purpose:** Show which agents were active and what tools they used **Usage:** ```bash analyze_agent_usage ``` **Example Output:** ``` ## Agent Usage Report **Agents detected:** 2 **Total events:** 7 ### openagent **Active duration:** 133s **Events:** 5 **Tools used:** - bash: 2x - read: 1x - analyze_agent_usage: 2x ### build **Active duration:** 0s **Events:** 2 **Tools used:** - bash: 2x ``` **When to use:** - After agent switches to verify tracking - To see tool usage patterns - To debug which agent did what --- ### 2. `validate_session` **Purpose:** Comprehensive validation of agent behavior against defined rules **Usage:** ```bash validate_session # or with details validate_session --include_details true ``` **Example Output:** ``` ## Validation Report **Score:** 95% - ✅ Passed: 18 - ⚠️ Warnings: 1 - ❌ Failed: 0 ### ⚠️ Warnings - **delegation_appropriateness**: Delegated but only 2 files (< 4 threshold) ``` **What it checks:** - Approval gate enforcement - Tool usage patterns - Context loading behavior - Delegation appropriateness - Critical rule compliance **When to use:** - After completing a complex task - To verify agent followed its prompt - Before finalizing work - When debugging unexpected behavior --- ### 3. `check_approval_gates` **Purpose:** Verify approval gates were enforced before execution operations **Usage:** ```bash check_approval_gates ``` **Example Output:** ``` ✅ Approval gate compliance: PASSED All 3 execution operation(s) were properly approved. ``` **Or if violations found:** ``` ⚠️ Approval gate compliance: FAILED Executed 2 operation(s) without approval: - bash - write Critical rule violated: approval_gate ``` **When to use:** - After bash/write/edit/task operations - To verify safety compliance - When auditing agent behavior --- ### 4. `analyze_context_reads` **Purpose:** Show all context files that were read during the session **Usage:** ```bash analyze_context_reads ``` **Example Output:** ``` ## Context Files Read **Total reads:** 3 ### Files Read: - **code.md** (2 reads) `.opencode/context/core/standards/code.md` - **delegation.md** (1 read) `.opencode/context/core/workflows/delegation.md` ### Timeline: 1. [10:23:45] code.md 2. [10:24:12] delegation.md 3. [10:25:01] code.md ``` **When to use:** - To verify agent loaded required context - To understand which standards were applied - To debug context loading issues --- ### 5. `check_context_compliance` **Purpose:** Verify required context files were read BEFORE executing tasks **Usage:** ```bash check_context_compliance ``` **Example Output:** ``` ## Context Loading Compliance **Score:** 100% - ✅ Compliant: 2 - ⚠️ Non-compliant: 0 ### ✅ Compliant Actions: - ✅ Loaded standards/code.md before code writing - ✅ Loaded workflows/delegation.md before delegation ### Context Loading Rules: According to OpenAgent prompt, the agent should: 1. Detect task type from user request 2. Read required context file FIRST 3. Then execute task following those standards **Pattern:** "Fetch context BEFORE starting work, not during or after" ``` **Context loading rules:** - Writing code → should read `standards/code.md` - Writing docs → should read `standards/docs.md` - Writing tests → should read `standards/tests.md` - Code review → should read `workflows/review.md` - Delegating → should read `workflows/delegation.md` **When to use:** - To verify lazy loading is working - To ensure standards are being followed - To debug why agent isn't following patterns --- ### 6. `analyze_delegation` **Purpose:** Analyze delegation decisions against the 4+ file rule **Usage:** ```bash analyze_delegation ``` **Example Output:** ``` ## Delegation Analysis **Total delegations:** 3 - ✅ Appropriate: 2 - ⚠️ Questionable: 1 **File count per delegation:** - Average: 4.3 files - Range: 2 - 6 files - Threshold: 4+ files ``` **When to use:** - After complex multi-file tasks - To verify delegation logic - To tune delegation thresholds --- ### 7. `debug_validator` **Purpose:** Inspect what the validator is tracking (for debugging) **Usage:** ```bash debug_validator ``` **Example Output:** ``` ## Debug Information ```json { "sessionID": "abc123...", "behaviorLogEntries": 7, "behaviorLogSampleFirst": [ { "timestamp": 1700000000000, "agent": "openagent", "event": "tool_executed", "data": { "tool": "bash" } } ], "behaviorLogSampleLast": [...], "messagesCount": 5, "toolTracker": { "approvalRequested": true, "toolsExecuted": ["bash", "read"] }, "allBehaviorLogs": 7 } ``` **Analysis:** - Behavior log entries for this session: 7 - Total behavior log entries: 7 - Messages in session: 5 - Tool execution tracker: Active ``` **When to use:** - When validation tools aren't working as expected - To see raw tracking data - To debug plugin issues - To understand internal state --- ### 8. `export_validation_report` **Purpose:** Export comprehensive validation report to a markdown file **Usage:** ```bash export_validation_report # or specify path export_validation_report --output_path ./reports/validation.md ``` **Example Output:** ``` ✅ Validation report exported to: .tmp/validation-abc12345.md ## Validation Report [... summary ...] ``` **Generated report includes:** - Full validation summary - Detailed checks with evidence - Tool usage timeline - Context loading analysis - Delegation decisions - Compliance scores **When to use:** - To save validation results for review - To share compliance reports - To track agent behavior over time - For auditing purposes --- ## Understanding Results ### Compliance Scores - **100%** - Perfect compliance ✅ - **90-99%** - Excellent (minor warnings) 🟢 - **80-89%** - Good (some warnings) 🟡 - **70-79%** - Fair (multiple warnings) 🟠 - **<70%** - Needs improvement (errors) 🔴 ### Severity Levels - **✅ Info** - Informational, no issues - **⚠️ Warning** - Non-critical issue, should review - **❌ Error** - Critical rule violation, must fix ### Common Validation Checks | Check | What It Validates | Pass Criteria | |-------|------------------|---------------| | `approval_gate_enforcement` | Approval requested before execution | Approval language found before bash/write/edit/task | | `stop_on_failure` | No auto-fix after errors | Agent stops and reports errors instead of fixing | | `lazy_context_loading` | Context loaded only when needed | Context files read match task requirements | | `delegation_appropriateness` | Delegation follows 4+ file rule | Delegated when 4+ files, or didn't delegate when <4 | | `context_loading_compliance` | Context loaded BEFORE execution | Required context file read before task execution | | `tool_usage` | Tool calls tracked | All tool invocations logged | --- ## Common Workflows ### Workflow 1: Verify Agent Behavior After Task **Scenario:** You asked the agent to implement a feature and want to verify it followed its rules. ```bash # 1. Complete your task > "Create a user authentication system" [Agent works...] # 2. Check what agents were involved > "analyze_agent_usage" # 3. Validate compliance > "validate_session" # 4. Check specific concerns > "check_approval_gates" > "check_context_compliance" # 5. Export report if needed > "export_validation_report" ``` --- ### Workflow 2: Debug Agent Switching **Scenario:** You want to verify the plugin tracks agent switches correctly. ```bash # 1. Start with one agent opencode --agent openagent > "Run pwd" > "proceed" # 2. Switch to another agent (manually or via delegation) # [Switch happens] # 3. Check tracking > "analyze_agent_usage" # Expected: Shows both agents with their respective tools ``` --- ### Workflow 3: Audit Context Loading **Scenario:** You want to ensure the agent is loading the right context files. ```bash # 1. Ask agent to do a task that requires context > "Write a new API endpoint following our standards" [Agent works...] # 2. Check what context was loaded > "analyze_context_reads" # 3. Verify compliance > "check_context_compliance" # Expected: Should show standards/code.md was read BEFORE writing ``` --- ### Workflow 4: Test Approval Gates **Scenario:** Verify the agent always requests approval before execution. ```bash # 1. Ask for an execution operation > "Delete all .log files" # 2. Agent should request approval # Agent: "Approval needed before proceeding." # 3. Approve > "proceed" # 4. Verify compliance > "check_approval_gates" # Expected: ✅ Approval gate compliance: PASSED ``` --- ### Workflow 5: Monitor Delegation Decisions **Scenario:** Check if agent delegates appropriately for complex tasks. ```bash # 1. Give a complex multi-file task > "Refactor the authentication module across 5 files" [Agent works...] # 2. Check delegation > "analyze_delegation" # Expected: Should show delegation was appropriate (5 files >= 4 threshold) ``` --- ## Troubleshooting ### Issue: "No agent activity tracked yet in this session" **Cause:** Plugin just loaded, no tracking data yet **Solution:** 1. Perform some actions (bash, read, write, etc.) 2. Then run validation tools 3. Plugin tracks from session start, so early checks may show no data --- ### Issue: "No execution operations tracked in this session" **Cause:** No bash/write/edit/task operations performed yet **Solution:** 1. Run a command that requires execution (e.g., "run pwd") 2. Then check approval gates 3. Read-only operations (read, list) don't trigger approval gates --- ### Issue: False positive on approval gate violations **Cause:** Agent used different approval phrasing than expected **Solution:** 1. Check the approval keywords in `agent-validator.ts` (lines 12-22) 2. Add custom patterns if your agent uses different phrasing 3. Current keywords: "approval", "approve", "proceed", "confirm", "permission", etc. **Example customization:** ```typescript const approvalKeywords = [ "approval", "approve", "proceed", "confirm", "permission", "before proceeding", "should i", "may i", "can i proceed", // Add your custom patterns: "ready to execute", "waiting for go-ahead", ] ``` --- ### Issue: Context compliance shows warnings but files were read **Cause:** Timing issue - context read after task started **Solution:** 1. Verify agent reads context BEFORE execution (not during/after) 2. Check timeline in `analyze_context_reads` 3. Agent should follow: Detect task → Read context → Execute --- ### Issue: Agent switches not tracked **Cause:** Agent name not properly captured **Solution:** 1. Run `debug_validator` to see raw tracking data 2. Check `sessionAgentTracker` in debug output 3. Verify agent name is being passed in `chat.message` hook --- ### Issue: Validation report shows 0% score **Cause:** No validation checks were performed **Solution:** 1. Ensure you've performed actions that trigger checks 2. Run `debug_validator` to see what's tracked 3. Try a simple task first (e.g., "run pwd") --- ## Advanced Usage ### Customizing Validation Rules Edit `.opencode/plugin/agent-validator.ts` to customize: **1. Add custom approval keywords:** ```typescript // Line 12-22 const approvalKeywords = [ "approval", "approve", // Add yours: "your custom phrase", ] ``` **2. Adjust delegation threshold:** ```typescript // Line 768 const shouldDelegate = writeEditCount >= 4 // Change 4 to your threshold ``` **3. Add custom context loading rules:** ```typescript // Line 824-851 const contextRules = [ { taskKeywords: ["your task type"], requiredFile: "your/context/file.md", taskType: "your task name" }, // ... existing rules ] ``` **4. Change severity levels:** ```typescript // Line 719-726 checks.push({ rule: "your_rule", passed: condition, severity: "error", // Change to "warning" or "info" details: "Your message", }) ``` --- ### Integration with CI/CD Export validation reports in automated workflows: ```bash #!/bin/bash # validate-agent-session.sh # Run OpenCode task opencode --agent openagent --input "Build the feature" # Export validation report opencode --agent openagent --input "export_validation_report --output_path ./reports/validation.md" # Check exit code (if validation fails) if grep -q "❌ Failed: [1-9]" ./reports/validation.md; then echo "Validation failed!" exit 1 fi echo "Validation passed!" ``` --- ### Creating Custom Validation Tools Add new tools to the plugin: ```typescript // In agent-validator.ts, add to tool object: your_custom_tool: tool({ description: "Your tool description", args: { your_arg: tool.schema.string().optional(), }, async execute(args, context) { const { sessionID } = context // Your validation logic here const result = analyzeYourMetric(sessionID) return formatYourReport(result) }, }), ``` --- ### Tracking Custom Events Add custom event tracking: ```typescript // In the event() hook: async event(input) { const { event } = input // Track your custom event if (event.type === "your.custom.event") { behaviorLog.push({ timestamp: Date.now(), sessionID: event.properties.sessionID, agent: event.properties.agent || "unknown", event: "your_custom_event", data: { // Your custom data }, }) } } ``` --- ## Real-World Examples ### Example 1: Testing Agent Tracking **Session:** ```bash $ opencode --agent openagent > "Help me test this plugin, I am trying to verify if an agent keeps to its promises" Agent: Let me run some tests to generate tracking data. > "proceed" [Agent runs: pwd, reads README.md] > "analyze_agent_usage" ``` **Result:** ``` ## Agent Usage Report **Agents detected:** 1 **Total events:** 4 ### openagent **Active duration:** 133s **Events:** 4 **Tools used:** - bash: 2x - read: 1x - analyze_agent_usage: 1x ``` **Verification:** ✅ Plugin successfully tracked agent name, tools, and events --- ### Example 2: Detecting Agent Switch **Session:** ```bash $ opencode --agent build [Do some work with build agent] $ opencode --agent openagent [Switch to openagent] > "analyze_agent_usage" ``` **Result:** ``` ## Agent Usage Report **Agents detected:** 2 **Total events:** 7 ### build **Active duration:** 0s **Events:** 2 **Tools used:** - bash: 2x ### openagent **Active duration:** 133s **Events:** 5 **Tools used:** - bash: 2x - read: 1x - analyze_agent_usage: 2x ``` **Verification:** ✅ Plugin tracked both agents and their respective activities --- ### Example 3: Approval Gate Validation **Session:** ```bash > "Run npm install" Agent: ## Proposed Plan 1. Run npm install **Approval needed before proceeding.** > "proceed" [Agent executes] > "check_approval_gates" ``` **Result:** ``` ✅ Approval gate compliance: PASSED All 1 execution operation(s) were properly approved. ``` **Verification:** ✅ Agent requested approval before bash execution --- ## Best Practices ### 1. Validate After Complex Tasks Always run validation after multi-step or complex tasks to ensure compliance. ### 2. Export Reports for Auditing Use `export_validation_report` to keep records of agent behavior over time. ### 3. Check Context Loading Verify agents are loading the right context files with `check_context_compliance`. ### 4. Monitor Agent Switches Use `analyze_agent_usage` to track delegation and agent switching patterns. ### 5. Debug Early If something seems off, run `debug_validator` immediately to see raw data. ### 6. Customize for Your Needs Adjust validation rules, thresholds, and keywords to match your workflow. ### 7. Integrate with Workflows Add validation checks to your development workflow or CI/CD pipeline. --- ## FAQ ### Q: Does the plugin slow down OpenCode? **A:** No, tracking is lightweight and runs asynchronously. Minimal performance impact. ### Q: Can I disable specific validation checks? **A:** Yes, edit `agent-validator.ts` and comment out checks you don't need. ### Q: Does validation data persist across sessions? **A:** No, tracking is per-session. Each new OpenCode session starts fresh. ### Q: Can I track custom metrics? **A:** Yes, add custom event tracking and validation tools (see Advanced Usage). ### Q: What if I get false positives? **A:** Customize approval keywords and validation patterns in `agent-validator.ts`. ### Q: Can I use this with other agents? **A:** Yes, the plugin tracks any agent running in OpenCode. ### Q: How do I reset tracking data? **A:** Restart OpenCode - tracking resets on each session start. ### Q: Can I export data in JSON format? **A:** Currently exports as Markdown. You can modify `generateDetailedReport()` for JSON. --- ## Next Steps 1. **Test the plugin** - Run through the Quick Start workflow 2. **Validate a real task** - Use it on an actual project task 3. **Customize rules** - Adjust validation patterns for your needs 4. **Integrate into workflow** - Add validation checks to your process 5. **Share feedback** - Report issues or suggest improvements --- ## Support - **Issues:** Report bugs or request features in the repository - **Customization:** Edit `agent-validator.ts` for your needs - **Documentation:** This guide + inline code comments --- **Happy validating! 🎯**