# Agent Validator Plugin - Management Guide
## Overview
The Agent Validator Plugin is a real-time monitoring and validation system for OpenCode agents. It tracks agent behavior, validates compliance with defined rules, and provides detailed reports on how agents execute tasks.
### What It Does
- **Tracks agent activity** - Monitors which agents are active and what tools they use
- **Validates approval gates** - Ensures agents request approval before executing operations
- **Analyzes context loading** - Checks if agents load required context files before tasks
- **Monitors delegation** - Validates delegation decisions follow the 4+ file rule
- **Detects violations** - Identifies critical rule violations (auto-fix attempts, missing approvals)
- **Generates reports** - Creates comprehensive validation reports with compliance scores
### Why Use It
- **Verify agent behavior** - Confirm agents follow their defined prompts
- **Debug issues** - Understand what agents are doing and why
- **Track compliance** - Ensure critical safety rules are enforced
- **Improve prompts** - Identify patterns that need refinement
- **Multi-agent tracking** - Monitor agent switches and delegation flows
---
## Quick Start
### Installation
The plugin auto-loads from `.opencode/plugin/` when OpenCode starts.
**Install dependencies:**
```bash
cd ~/.opencode/plugin
npm install
# or
bun install
```
**Verify installation:**
```bash
opencode --agent openagent
> "analyze_agent_usage"
```
If you see agent tracking data, the plugin is working! ✅
### Your First Validation
1. **Start a session and do some work:**
```bash
opencode --agent openagent
> "Run pwd command"
Agent: [requests approval]
> "proceed"
```
2. **Check what was tracked:**
```bash
> "analyze_agent_usage"
```
3. **Validate compliance:**
```bash
> "validate_session"
```
---
## Available Tools
The plugin provides 8 validation tools:
### 1. `analyze_agent_usage`
**Purpose:** Show which agents were active and what tools they used
**Usage:**
```bash
analyze_agent_usage
```
**Example Output:**
```
## Agent Usage Report
**Agents detected:** 2
**Total events:** 7
### openagent
**Active duration:** 133s
**Events:** 5
**Tools used:**
- bash: 2x
- read: 1x
- analyze_agent_usage: 2x
### build
**Active duration:** 0s
**Events:** 2
**Tools used:**
- bash: 2x
```
**When to use:**
- After agent switches to verify tracking
- To see tool usage patterns
- To debug which agent did what
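
Internally, this report is an aggregation over the plugin's behavior log. A minimal sketch of that roll-up in TypeScript (the entry shape is inferred from `debug_validator` output below; the helper name is illustrative, not the plugin's actual API):
```typescript
// Shape inferred from the debug_validator sample output.
interface BehaviorLogEntry {
  timestamp: number
  agent: string
  event: string
  data?: { tool?: string }
}

// Roll the flat event log up into per-agent tool counts.
function summarizeAgentUsage(log: BehaviorLogEntry[]): Map<string, Map<string, number>> {
  const byAgent = new Map<string, Map<string, number>>()
  for (const entry of log) {
    if (entry.event !== "tool_executed" || !entry.data?.tool) continue
    const tools = byAgent.get(entry.agent) ?? new Map<string, number>()
    tools.set(entry.data.tool, (tools.get(entry.data.tool) ?? 0) + 1)
    byAgent.set(entry.agent, tools)
  }
  return byAgent
}
```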
---
### 2. `validate_session`
**Purpose:** Comprehensive validation of agent behavior against defined rules
**Usage:**
```bash
validate_session
# or with details
validate_session --include_details true
```
**Example Output:**
```
## Validation Report
**Score:** 95%
- ✅ Passed: 18
- ⚠️ Warnings: 1
- ❌ Failed: 0
### ⚠️ Warnings
- **delegation_appropriateness**: Delegated but only 2 files (< 4 threshold)
```
**What it checks:**
- Approval gate enforcement
- Tool usage patterns
- Context loading behavior
- Delegation appropriateness
- Critical rule compliance
**When to use:**
- After completing a complex task
- To verify agent followed its prompt
- Before finalizing work
- When debugging unexpected behavior
---
### 3. `check_approval_gates`
**Purpose:** Verify approval gates were enforced before execution operations
**Usage:**
```bash
check_approval_gates
```
**Example Output:**
```
✅ Approval gate compliance: PASSED
All 3 execution operation(s) were properly approved.
```
**Or if violations found:**
```
⚠️ Approval gate compliance: FAILED
Executed 2 operation(s) without approval:
- bash
- write
Critical rule violated: approval_gate
```
**When to use:**
- After bash/write/edit/task operations
- To verify safety compliance
- When auditing agent behavior
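
Conceptually, the check scans conversation text that precedes each execution operation for approval language. A rough sketch of that logic (the keyword list is a subset of the one documented under Troubleshooting; everything else here is illustrative):
```typescript
const EXECUTION_TOOLS = new Set(["bash", "write", "edit", "task"])
const approvalKeywords = ["approval", "approve", "proceed", "confirm", "permission"]

interface TimedMessage { timestamp: number; text: string }
interface ToolEvent { timestamp: number; tool: string }

// Return execution tools that ran with no approval language before them.
function findUnapprovedExecutions(messages: TimedMessage[], tools: ToolEvent[]): string[] {
  return tools
    .filter((t) => EXECUTION_TOOLS.has(t.tool))
    .filter((t) =>
      !messages.some(
        (m) =>
          m.timestamp < t.timestamp &&
          approvalKeywords.some((k) => m.text.toLowerCase().includes(k)),
      ),
    )
    .map((t) => t.tool)
}
```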
---
### 4. `analyze_context_reads`
**Purpose:** Show all context files that were read during the session
**Usage:**
```bash
analyze_context_reads
```
**Example Output:**
```
## Context Files Read
**Total reads:** 3
### Files Read:
- **code.md** (2 reads)
`.opencode/context/core/standards/code.md`
- **delegation.md** (1 read)
`.opencode/context/core/workflows/delegation.md`
### Timeline:
1. [10:23:45] code.md
2. [10:24:12] delegation.md
3. [10:25:01] code.md
```
**When to use:**
- To verify agent loaded required context
- To understand which standards were applied
- To debug context loading issues
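
A reasonable guess at how reads get classified as context reads is a path-prefix filter (the real logic lives in `agent-validator.ts`; this is an assumption):
```typescript
// Assumption: a read counts as a context read when its path sits
// under the project's context directory.
function isContextRead(filePath: string): boolean {
  return filePath.includes(".opencode/context/")
}
```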
---
### 5. `check_context_compliance`
**Purpose:** Verify required context files were read BEFORE executing tasks
**Usage:**
```bash
check_context_compliance
```
**Example Output:**
```
## Context Loading Compliance
**Score:** 100%
- ✅ Compliant: 2
- ⚠️ Non-compliant: 0
### ✅ Compliant Actions:
- ✅ Loaded standards/code.md before code writing
- ✅ Loaded workflows/delegation.md before delegation
### Context Loading Rules:
According to OpenAgent prompt, the agent should:
1. Detect task type from user request
2. Read required context file FIRST
3. Then execute task following those standards
**Pattern:** "Fetch context BEFORE starting work, not during or after"
```
**Context loading rules:**
- Writing code → should read `standards/code.md`
- Writing docs → should read `standards/docs.md`
- Writing tests → should read `standards/tests.md`
- Code review → should read `workflows/review.md`
- Delegating → should read `workflows/delegation.md`
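In code, these rules amount to a lookup table. The shape below matches the `contextRules` structure shown under Advanced Usage; the keyword lists are illustrative guesses, not the plugin's exact values:
```typescript
// Task-type → required-context mapping; keyword lists are illustrative.
const contextRules = [
  { taskKeywords: ["code", "implement"], requiredFile: "standards/code.md", taskType: "code writing" },
  { taskKeywords: ["docs", "document"], requiredFile: "standards/docs.md", taskType: "docs writing" },
  { taskKeywords: ["test"], requiredFile: "standards/tests.md", taskType: "test writing" },
  { taskKeywords: ["review"], requiredFile: "workflows/review.md", taskType: "code review" },
  { taskKeywords: ["delegate"], requiredFile: "workflows/delegation.md", taskType: "delegation" },
]
```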
**When to use:**
- To verify lazy loading is working
- To ensure standards are being followed
- To debug why agent isn't following patterns
---
### 6. `analyze_delegation`
**Purpose:** Analyze delegation decisions against the 4+ file rule
**Usage:**
```bash
analyze_delegation
```
**Example Output:**
```
## Delegation Analysis
**Total delegations:** 3
- ✅ Appropriate: 2
- ⚠️ Questionable: 1
**File count per delegation:**
- Average: 4.3 files
- Range: 2 - 6 files
- Threshold: 4+ files
```
**When to use:**
- After complex multi-file tasks
- To verify delegation logic
- To tune delegation thresholds
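
The core of the rule is a single threshold comparison, mirroring the line shown under Advanced Usage; classifying a past delegation is the same check in reverse. A sketch with illustrative names:
```typescript
const DELEGATION_THRESHOLD = 4 // the "4+ file rule"

// Mirrors `const shouldDelegate = writeEditCount >= 4` in agent-validator.ts.
function classifyDelegation(
  filesTouched: number,
  didDelegate: boolean,
): "appropriate" | "questionable" {
  const shouldDelegate = filesTouched >= DELEGATION_THRESHOLD
  return shouldDelegate === didDelegate ? "appropriate" : "questionable"
}
```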
---
### 7. `debug_validator`
**Purpose:** Inspect what the validator is tracking (for debugging)
**Usage:**
```bash
debug_validator
```
**Example Output:**
````
## Debug Information
```json
{
  "sessionID": "abc123...",
  "behaviorLogEntries": 7,
  "behaviorLogSampleFirst": [
    {
      "timestamp": 1700000000000,
      "agent": "openagent",
      "event": "tool_executed",
      "data": { "tool": "bash" }
    }
  ],
  "behaviorLogSampleLast": [...],
  "messagesCount": 5,
  "toolTracker": {
    "approvalRequested": true,
    "toolsExecuted": ["bash", "read"]
  },
  "allBehaviorLogs": 7
}
```
**Analysis:**
- Behavior log entries for this session: 7
- Total behavior log entries: 7
- Messages in session: 5
- Tool execution tracker: Active
````
**When to use:**
- When validation tools aren't working as expected
- To see raw tracking data
- To debug plugin issues
- To understand internal state
---
### 8. `export_validation_report`
**Purpose:** Export comprehensive validation report to a markdown file
**Usage:**
```bash
export_validation_report
# or specify path
export_validation_report --output_path ./reports/validation.md
```
**Example Output:**
```
✅ Validation report exported to: .tmp/validation-abc12345.md
## Validation Report
[... summary ...]
```
**Generated report includes:**
- Full validation summary
- Detailed checks with evidence
- Tool usage timeline
- Context loading analysis
- Delegation decisions
- Compliance scores
**When to use:**
- To save validation results for review
- To share compliance reports
- To track agent behavior over time
- For auditing purposes
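
Judging by the `.tmp/validation-abc12345.md` example, the default output path embeds a session-ID prefix. A sketch of how such a default could be derived (hypothetical, not the plugin's exact code):
```typescript
// Hypothetical default-path logic, modeled on the example output above.
function resolveOutputPath(sessionID: string, outputPath?: string): string {
  return outputPath ?? `.tmp/validation-${sessionID.slice(0, 8)}.md`
}
```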
---
## Understanding Results
### Compliance Scores
- **100%** - Perfect compliance ✅
- **90-99%** - Excellent (minor warnings) 🟢
- **80-89%** - Good (some warnings) 🟡
- **70-79%** - Fair (multiple warnings) 🟠
- **<70%** - Needs improvement (errors) 🔴
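The guide doesn't spell out the exact formula, but a score consistent with these bands is the share of passing checks over all checks; a plausible sketch (the real weighting lives in `agent-validator.ts`):
```typescript
// Plausible scoring: passed checks over all checks, as a percentage.
// Consistent with the earlier example: 18 / (18 + 1 + 0) ≈ 95%.
function complianceScore(passed: number, warnings: number, failed: number): number {
  const total = passed + warnings + failed
  return total === 0 ? 0 : Math.round((passed / total) * 100)
}
```
This would also explain the 0% score for a session with no checks at all (see Troubleshooting).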
### Severity Levels
- **✅ Info** - Informational, no issues
- **⚠️ Warning** - Non-critical issue, should review
- **❌ Error** - Critical rule violation, must fix
### Common Validation Checks
| Check | What It Validates | Pass Criteria |
|-------|------------------|---------------|
| `approval_gate_enforcement` | Approval requested before execution | Approval language found before bash/write/edit/task |
| `stop_on_failure` | No auto-fix after errors | Agent stops and reports errors instead of fixing |
| `lazy_context_loading` | Context loaded only when needed | Context files read match task requirements |
| `delegation_appropriateness` | Delegation follows 4+ file rule | Delegated when 4+ files, or didn't delegate when <4 |
| `context_loading_compliance` | Context loaded BEFORE execution | Required context file read before task execution |
| `tool_usage` | Tool calls tracked | All tool invocations logged |
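Each row in a report is backed by a check object; the shape below is taken directly from the `checks.push(...)` snippet under Advanced Usage:
```typescript
interface ValidationCheck {
  rule: string                           // e.g. "approval_gate_enforcement"
  passed: boolean
  severity: "info" | "warning" | "error"
  details: string                        // human-readable evidence
}
```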
---
## Common Workflows
### Workflow 1: Verify Agent Behavior After Task
**Scenario:** You asked the agent to implement a feature and want to verify it followed its rules.
```bash
# 1. Complete your task
> "Create a user authentication system"
[Agent works...]
# 2. Check what agents were involved
> "analyze_agent_usage"
# 3. Validate compliance
> "validate_session"
# 4. Check specific concerns
> "check_approval_gates"
> "check_context_compliance"
# 5. Export report if needed
> "export_validation_report"
```
---
### Workflow 2: Debug Agent Switching
**Scenario:** You want to verify the plugin tracks agent switches correctly.
```bash
# 1. Start with one agent
opencode --agent openagent
> "Run pwd"
> "proceed"
# 2. Switch to another agent (manually or via delegation)
# [Switch happens]
# 3. Check tracking
> "analyze_agent_usage"
# Expected: Shows both agents with their respective tools
```
---
### Workflow 3: Audit Context Loading
**Scenario:** You want to ensure the agent is loading the right context files.
```bash
# 1. Ask agent to do a task that requires context
> "Write a new API endpoint following our standards"
[Agent works...]
# 2. Check what context was loaded
> "analyze_context_reads"
# 3. Verify compliance
> "check_context_compliance"
# Expected: Should show standards/code.md was read BEFORE writing
```
---
### Workflow 4: Test Approval Gates
**Scenario:** Verify the agent always requests approval before execution.
```bash
# 1. Ask for an execution operation
> "Delete all .log files"
# 2. Agent should request approval
# Agent: "Approval needed before proceeding."
# 3. Approve
> "proceed"
# 4. Verify compliance
> "check_approval_gates"
# Expected: ✅ Approval gate compliance: PASSED
```
---
### Workflow 5: Monitor Delegation Decisions
**Scenario:** Check if agent delegates appropriately for complex tasks.
```bash
# 1. Give a complex multi-file task
> "Refactor the authentication module across 5 files"
[Agent works...]
# 2. Check delegation
> "analyze_delegation"
# Expected: Should show delegation was appropriate (5 files >= 4 threshold)
```
---
## Troubleshooting
### Issue: "No agent activity tracked yet in this session"
**Cause:** Plugin just loaded, no tracking data yet
**Solution:**
1. Perform some actions (bash, read, write, etc.)
2. Then run validation tools
3. Plugin tracks from session start, so early checks may show no data
---
### Issue: "No execution operations tracked in this session"
**Cause:** No bash/write/edit/task operations performed yet
**Solution:**
1. Run a command that requires execution (e.g., "run pwd")
2. Then check approval gates
3. Read-only operations (read, list) don't trigger approval gates
---
### Issue: False positive on approval gate violations
**Cause:** Agent used different approval phrasing than expected
**Solution:**
1. Check the approval keywords in `agent-validator.ts` (lines 12-22)
2. Add custom patterns if your agent uses different phrasing
3. Current keywords: "approval", "approve", "proceed", "confirm", "permission", etc.
**Example customization:**
```typescript
const approvalKeywords = [
"approval",
"approve",
"proceed",
"confirm",
"permission",
"before proceeding",
"should i",
"may i",
"can i proceed",
// Add your custom patterns:
"ready to execute",
"waiting for go-ahead",
]
```
---
### Issue: Context compliance shows warnings but files were read
**Cause:** Timing issue - context read after task started
**Solution:**
1. Verify agent reads context BEFORE execution (not during/after)
2. Check timeline in `analyze_context_reads`
3. Agent should follow: Detect task → Read context → Execute
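
The timing check reduces to a timestamp comparison: the first read of the required context file must precede the first execution event. A minimal sketch (names illustrative):
```typescript
interface TimedEvent { timestamp: number }

// Context is compliant only if it was read strictly before execution began.
function contextLoadedInTime(
  contextRead: TimedEvent | undefined,
  firstExecution: TimedEvent | undefined,
): boolean {
  if (!firstExecution) return true  // nothing executed yet
  if (!contextRead) return false    // executed without loading context
  return contextRead.timestamp < firstExecution.timestamp
}
```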
---
### Issue: Agent switches not tracked
**Cause:** Agent name not properly captured
**Solution:**
1. Run `debug_validator` to see raw tracking data
2. Check `sessionAgentTracker` in debug output
3. Verify agent name is being passed in `chat.message` hook
---
### Issue: Validation report shows 0% score
**Cause:** No validation checks were performed
**Solution:**
1. Ensure you've performed actions that trigger checks
2. Run `debug_validator` to see what's tracked
3. Try a simple task first (e.g., "run pwd")
---
## Advanced Usage
### Customizing Validation Rules
Edit `.opencode/plugin/agent-validator.ts` to customize:
**1. Add custom approval keywords:**
```typescript
// Line 12-22
const approvalKeywords = [
"approval",
"approve",
// Add yours:
"your custom phrase",
]
```
**2. Adjust delegation threshold:**
```typescript
// Line 768
const shouldDelegate = writeEditCount >= 4 // Change 4 to your threshold
```
**3. Add custom context loading rules:**
```typescript
// Line 824-851
const contextRules = [
{
taskKeywords: ["your task type"],
requiredFile: "your/context/file.md",
taskType: "your task name"
},
// ... existing rules
]
```
**4. Change severity levels:**
```typescript
// Line 719-726
checks.push({
rule: "your_rule",
passed: condition,
severity: "error", // Change to "warning" or "info"
details: "Your message",
})
```
---
### Integration with CI/CD
Export validation reports in automated workflows:
```bash
#!/bin/bash
# validate-agent-session.sh

# Run OpenCode task
opencode --agent openagent --input "Build the feature"

# Export validation report
opencode --agent openagent --input "export_validation_report --output_path ./reports/validation.md"

# Fail the build if the report records any failed checks
if grep -q "❌ Failed: [1-9]" ./reports/validation.md; then
  echo "Validation failed!"
  exit 1
fi
echo "Validation passed!"
```
---
### Creating Custom Validation Tools
Add new tools to the plugin:
```typescript
// In agent-validator.ts, add to tool object:
your_custom_tool: tool({
description: "Your tool description",
args: {
your_arg: tool.schema.string().optional(),
},
async execute(args, context) {
const { sessionID } = context
// Your validation logic here
const result = analyzeYourMetric(sessionID)
return formatYourReport(result)
},
}),
```
---
### Tracking Custom Events
Add custom event tracking:
```typescript
// In the event() hook:
async event(input) {
const { event } = input
// Track your custom event
if (event.type === "your.custom.event") {
behaviorLog.push({
timestamp: Date.now(),
sessionID: event.properties.sessionID,
agent: event.properties.agent || "unknown",
event: "your_custom_event",
data: {
// Your custom data
},
})
}
}
```
---
## Real-World Examples
### Example 1: Testing Agent Tracking
**Session:**
```bash
$ opencode --agent openagent
> "Help me test this plugin, I am trying to verify if an agent keeps to its promises"
Agent: Let me run some tests to generate tracking data.
> "proceed"
[Agent runs: pwd, reads README.md]
> "analyze_agent_usage"
```
**Result:**
```
## Agent Usage Report
**Agents detected:** 1
**Total events:** 4
### openagent
**Active duration:** 133s
**Events:** 4
**Tools used:**
- bash: 2x
- read: 1x
- analyze_agent_usage: 1x
```
**Verification:** ✅ Plugin successfully tracked agent name, tools, and events
---
### Example 2: Detecting Agent Switch
**Session:**
```bash
$ opencode --agent build
[Do some work with build agent]
$ opencode --agent openagent
[Switch to openagent]
> "analyze_agent_usage"
```
**Result:**
```
## Agent Usage Report
**Agents detected:** 2
**Total events:** 7
### build
**Active duration:** 0s
**Events:** 2
**Tools used:**
- bash: 2x
### openagent
**Active duration:** 133s
**Events:** 5
**Tools used:**
- bash: 2x
- read: 1x
- analyze_agent_usage: 2x
```
**Verification:** ✅ Plugin tracked both agents and their respective activities
---
### Example 3: Approval Gate Validation
**Session:**
```bash
> "Run npm install"
Agent: ## Proposed Plan
1. Run npm install
**Approval needed before proceeding.**
> "proceed"
[Agent executes]
> "check_approval_gates"
```
**Result:**
```
✅ Approval gate compliance: PASSED
All 1 execution operation(s) were properly approved.
```
**Verification:** ✅ Agent requested approval before bash execution
---
## Best Practices
### 1. Validate After Complex Tasks
Always run validation after multi-step or complex tasks to ensure compliance.
### 2. Export Reports for Auditing
Use `export_validation_report` to keep records of agent behavior over time.
### 3. Check Context Loading
Verify agents are loading the right context files with `check_context_compliance`.
### 4. Monitor Agent Switches
Use `analyze_agent_usage` to track delegation and agent switching patterns.
### 5. Debug Early
If something seems off, run `debug_validator` immediately to see raw data.
### 6. Customize for Your Needs
Adjust validation rules, thresholds, and keywords to match your workflow.
### 7. Integrate with Workflows
Add validation checks to your development workflow or CI/CD pipeline.
---
## FAQ
### Q: Does the plugin slow down OpenCode?
**A:** No, tracking is lightweight and runs asynchronously. Minimal performance impact.
### Q: Can I disable specific validation checks?
**A:** Yes, edit `agent-validator.ts` and comment out checks you don't need.
### Q: Does validation data persist across sessions?
**A:** No, tracking is per-session. Each new OpenCode session starts fresh.
### Q: Can I track custom metrics?
**A:** Yes, add custom event tracking and validation tools (see Advanced Usage).
### Q: What if I get false positives?
**A:** Customize approval keywords and validation patterns in `agent-validator.ts`.
### Q: Can I use this with other agents?
**A:** Yes, the plugin tracks any agent running in OpenCode.
### Q: How do I reset tracking data?
**A:** Restart OpenCode - tracking resets on each session start.
### Q: Can I export data in JSON format?
**A:** Currently exports as Markdown. You can modify `generateDetailedReport()` for JSON.
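If you need JSON, one low-effort approach is to serialize the same data that feeds `generateDetailedReport()`; a sketch with hypothetical names:
```typescript
import { writeFileSync } from "node:fs"

// Hypothetical: collectReportData stands in for whatever structure
// generateDetailedReport() renders to Markdown.
declare function collectReportData(sessionID: string): unknown

function exportJsonReport(sessionID: string, outputPath: string): void {
  const data = collectReportData(sessionID)
  writeFileSync(outputPath, JSON.stringify(data, null, 2))
}
```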
---
## Next Steps
1. **Test the plugin** - Run through the Quick Start workflow
2. **Validate a real task** - Use it on an actual project task
3. **Customize rules** - Adjust validation patterns for your needs
4. **Integrate into workflow** - Add validation checks to your process
5. **Share feedback** - Report issues or suggest improvements
---
## Support
- **Issues:** Report bugs or request features in the repository
- **Customization:** Edit `agent-validator.ts` for your needs
- **Documentation:** This guide + inline code comments
---
**Happy validating! 🎯**