
Agent Validator Plugin - Management Guide

Overview

The Agent Validator Plugin is a real-time monitoring and validation system for OpenCode agents. It tracks agent behavior, validates compliance with defined rules, and provides detailed reports on how agents execute tasks.

What It Does

  • Tracks agent activity - Monitors which agents are active and what tools they use
  • Validates approval gates - Ensures agents request approval before executing operations
  • Analyzes context loading - Checks if agents load required context files before tasks
  • Monitors delegation - Validates delegation decisions follow the 4+ file rule
  • Detects violations - Identifies critical rule violations (auto-fix attempts, missing approvals)
  • Generates reports - Creates comprehensive validation reports with compliance scores
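
All of these features read from a single per-session behavior log. A minimal sketch of one log entry's shape, inferred from the debug_validator sample output later in this guide (field names follow that output; the plugin's actual types may differ):

interface BehaviorLogEntry {
  timestamp: number               // epoch milliseconds, e.g. 1700000000000
  sessionID: string               // OpenCode session identifier
  agent: string                   // agent name, e.g. "openagent"
  event: string                   // event type, e.g. "tool_executed"
  data: Record<string, unknown>   // event payload, e.g. { tool: "bash" }
}

Every validation tool below is, in effect, a query over an array of these entries.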

Why Use It

  • Verify agent behavior - Confirm agents follow their defined prompts
  • Debug issues - Understand what agents are doing and why
  • Track compliance - Ensure critical safety rules are enforced
  • Improve prompts - Identify patterns that need refinement
  • Multi-agent tracking - Monitor agent switches and delegation flows

Quick Start

Installation

The plugin auto-loads from .opencode/plugin/ when OpenCode starts.

Install dependencies:

cd .opencode/plugin
npm install
# or
bun install

Verify installation:

opencode --agent openagent
> "analyze_agent_usage"

If you see agent tracking data, the plugin is working!

Your First Validation

  1. Start a session and do some work:

    opencode --agent openagent
    > "Run pwd command"
    Agent: [requests approval]
    > "proceed"
    
  2. Check what was tracked:

    > "analyze_agent_usage"
    
  3. Validate compliance:

    > "validate_session"
    

Available Tools

The plugin provides 8 validation tools:

1. analyze_agent_usage

Purpose: Show which agents were active and what tools they used

Usage:

analyze_agent_usage

Example Output:

## Agent Usage Report

**Agents detected:** 2
**Total events:** 7

### openagent
**Active duration:** 133s
**Events:** 5
**Tools used:**
- bash: 2x
- read: 1x
- analyze_agent_usage: 2x

### build
**Active duration:** 0s
**Events:** 2
**Tools used:**
- bash: 2x

When to use:

  • After agent switches to verify tracking
  • To see tool usage patterns
  • To debug which agent did what

2. validate_session

Purpose: Comprehensive validation of agent behavior against defined rules

Usage:

validate_session
# or with details
validate_session --include_details true

Example Output:

## Validation Report

**Score:** 95%
- ✅ Passed: 18
- ⚠️  Warnings: 1
- ❌ Failed: 0

### ⚠️  Warnings
- **delegation_appropriateness**: Delegated but only 2 files (< 4 threshold)

What it checks:

  • Approval gate enforcement
  • Tool usage patterns
  • Context loading behavior
  • Delegation appropriateness
  • Critical rule compliance

When to use:

  • After completing a complex task
  • To verify agent followed its prompt
  • Before finalizing work
  • When debugging unexpected behavior

3. check_approval_gates

Purpose: Verify approval gates were enforced before execution operations

Usage:

check_approval_gates

Example Output:

✅ Approval gate compliance: PASSED

All 3 execution operation(s) were properly approved.

Or if violations found:

⚠️ Approval gate compliance: FAILED

Executed 2 operation(s) without approval:
  - bash
  - write

Critical rule violated: approval_gate

When to use:

  • After bash/write/edit/task operations
  • To verify safety compliance
  • When auditing agent behavior
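
Conceptually, the check walks the behavior log and requires approval language to appear before any execution tool fires. A simplified sketch (the keyword list is abridged from the one shown in Troubleshooting; the event names and one-shot gating are illustrative simplifications, not the plugin's exact logic):

// Simplified approval-gate check over the behavior log. A real
// implementation would scope approvals to individual user turns.
const executionTools = new Set(["bash", "write", "edit", "task"])
const approvalKeywords = ["approval", "approve", "proceed", "confirm", "permission"]

function checkApprovalGates(
  log: { event: string; data: { tool?: string; text?: string } }[],
): { passed: boolean; unapproved: string[] } {
  const unapproved: string[] = []
  let approvalSeen = false
  for (const entry of log) {
    // Any message containing approval language arms the gate.
    if (entry.event === "message" && entry.data.text) {
      const text = entry.data.text.toLowerCase()
      if (approvalKeywords.some((k) => text.includes(k))) approvalSeen = true
    }
    // Execution tools must fire only after approval language appeared.
    if (entry.event === "tool_executed" && executionTools.has(entry.data.tool ?? "")) {
      if (!approvalSeen) unapproved.push(entry.data.tool ?? "unknown")
    }
  }
  return { passed: unapproved.length === 0, unapproved }
}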

4. analyze_context_reads

Purpose: Show all context files that were read during the session

Usage:

analyze_context_reads

Example Output:

## Context Files Read

**Total reads:** 3

### Files Read:
- **code.md** (2 reads)
  `.opencode/context/core/standards/code.md`
- **delegation.md** (1 read)
  `.opencode/context/core/workflows/delegation.md`

### Timeline:
1. [10:23:45] code.md
2. [10:24:12] delegation.md
3. [10:25:01] code.md

When to use:

  • To verify agent loaded required context
  • To understand which standards were applied
  • To debug context loading issues

5. check_context_compliance

Purpose: Verify required context files were read BEFORE executing tasks

Usage:

check_context_compliance

Example Output:

## Context Loading Compliance

**Score:** 100%
- ✅ Compliant: 2
- ⚠️  Non-compliant: 0

### ✅ Compliant Actions:
- ✅ Loaded standards/code.md before code writing
- ✅ Loaded workflows/delegation.md before delegation

### Context Loading Rules:
According to OpenAgent prompt, the agent should:
1. Detect task type from user request
2. Read required context file FIRST
3. Then execute task following those standards

**Pattern:** "Fetch context BEFORE starting work, not during or after"

Context loading rules:

  • Writing code → should read standards/code.md
  • Writing docs → should read standards/docs.md
  • Writing tests → should read standards/tests.md
  • Code review → should read workflows/review.md
  • Delegating → should read workflows/delegation.md
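
These rules map cleanly to data. A sketch of how they might be checked against the behavior log (the rule shape mirrors the contextRules example under Advanced Usage; the two sample entries and the ordering check are illustrative):

interface ContextRule {
  taskKeywords: string[]   // words in the user request that signal the task type
  requiredFile: string     // context file that must be read first
  taskType: string         // label used in compliance reports
}

const contextRules: ContextRule[] = [
  { taskKeywords: ["code", "implement"], requiredFile: "standards/code.md", taskType: "code writing" },
  { taskKeywords: ["delegate"], requiredFile: "workflows/delegation.md", taskType: "delegation" },
]

// Compliant only if the required file was read BEFORE the task began.
function loadedBeforeTask(
  reads: { path: string; timestamp: number }[],
  rule: ContextRule,
  taskStart: number,
): boolean {
  return reads.some((r) => r.path.endsWith(rule.requiredFile) && r.timestamp < taskStart)
}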

When to use:

  • To verify lazy loading is working
  • To ensure standards are being followed
  • To debug why agent isn't following patterns

6. analyze_delegation

Purpose: Analyze delegation decisions against the 4+ file rule

Usage:

analyze_delegation

Example Output:

## Delegation Analysis

**Total delegations:** 3
- ✅ Appropriate: 2
- ⚠️  Questionable: 1

**File count per delegation:**
- Average: 4.3 files
- Range: 2 - 6 files
- Threshold: 4+ files

When to use:

  • After complex multi-file tasks
  • To verify delegation logic
  • To tune delegation thresholds
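
The 4+ file rule itself is a one-line comparison (the threshold constant appears verbatim under Advanced Usage); a sketch of how one delegation decision might be classified:

// Classify a delegation decision against the 4+ file rule. The
// threshold matches the constant shown under Advanced Usage; the
// helper itself is illustrative.
function classifyDelegation(
  filesTouched: number,
  delegated: boolean,
  threshold = 4,
): "appropriate" | "questionable" {
  const shouldDelegate = filesTouched >= threshold
  return delegated === shouldDelegate ? "appropriate" : "questionable"
}

classifyDelegation(6, true)  // "appropriate": 6 files, delegated
classifyDelegation(2, true)  // "questionable": delegated below the threshold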

7. debug_validator

Purpose: Inspect what the validator is tracking (for debugging)

Usage:

debug_validator

Example Output:

## Debug Information

{
  "sessionID": "abc123...",
  "behaviorLogEntries": 7,
  "behaviorLogSampleFirst": [
    {
      "timestamp": 1700000000000,
      "agent": "openagent",
      "event": "tool_executed",
      "data": { "tool": "bash" }
    }
  ],
  "behaviorLogSampleLast": [...],
  "messagesCount": 5,
  "toolTracker": {
    "approvalRequested": true,
    "toolsExecuted": ["bash", "read"]
  },
  "allBehaviorLogs": 7
}

Analysis:

  • Behavior log entries for this session: 7
  • Total behavior log entries: 7
  • Messages in session: 5
  • Tool execution tracker: Active

When to use:

  • When validation tools aren't working as expected
  • To see raw tracking data
  • To debug plugin issues
  • To understand internal state

8. export_validation_report

Purpose: Export comprehensive validation report to a markdown file

Usage:

export_validation_report
# or specify path
export_validation_report --output_path ./reports/validation.md

Example Output:

✅ Validation report exported to: .tmp/validation-abc12345.md

## Validation Report
[... summary ...]

Generated report includes:

  • Full validation summary
  • Detailed checks with evidence
  • Tool usage timeline
  • Context loading analysis
  • Delegation decisions
  • Compliance scores

When to use:

  • To save validation results for review
  • To share compliance reports
  • To track agent behavior over time
  • For auditing purposes

Understanding Results

Compliance Scores

  • 100% - Perfect compliance
  • 90-99% - Excellent (minor warnings) 🟢
  • 80-89% - Good (some warnings) 🟡
  • 70-79% - Fair (multiple warnings) 🟠
  • <70% - Needs improvement (errors) 🔴
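
The guide doesn't state the exact formula, but the sample validate_session report earlier (18 passed, 1 warning, 0 failed → 95%) is consistent with a simple passed-over-total ratio. A sketch under that assumption:

// Inferred scoring formula: 18 passed / (18 + 1 + 0) checks = 94.7%,
// displayed as 95%. This is an inference from the sample output, not
// the plugin's documented formula.
function complianceScore(passed: number, warnings: number, failed: number): number {
  const total = passed + warnings + failed
  return total === 0 ? 0 : Math.round((passed / total) * 100)
}

complianceScore(18, 1, 0)  // 95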

Severity Levels

  • ℹ️ Info - Informational, no issues
  • ⚠️ Warning - Non-critical issue, should review
  • ❌ Error - Critical rule violation, must fix

Common Validation Checks

| Check | What It Validates | Pass Criteria |
|-------|-------------------|---------------|
| approval_gate_enforcement | Approval requested before execution | Approval language found before bash/write/edit/task |
| stop_on_failure | No auto-fix after errors | Agent stops and reports errors instead of fixing |
| lazy_context_loading | Context loaded only when needed | Context files read match task requirements |
| delegation_appropriateness | Delegation follows 4+ file rule | Delegated when 4+ files, or didn't delegate when <4 |
| context_loading_compliance | Context loaded BEFORE execution | Required context file read before task execution |
| tool_usage | Tool calls tracked | All tool invocations logged |
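
Each row corresponds to a check object pushed onto a results array; the shape below is taken from the severity-levels snippet under Advanced Usage:

// Shape of one validation check, matching the checks.push example
// shown under Advanced Usage.
interface ValidationCheck {
  rule: string                            // e.g. "approval_gate_enforcement"
  passed: boolean
  severity: "info" | "warning" | "error"  // maps to the severity levels above
  details: string                         // human-readable evidence
}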

Common Workflows

Workflow 1: Verify Agent Behavior After Task

Scenario: You asked the agent to implement a feature and want to verify it followed its rules.

# 1. Complete your task
> "Create a user authentication system"
[Agent works...]

# 2. Check what agents were involved
> "analyze_agent_usage"

# 3. Validate compliance
> "validate_session"

# 4. Check specific concerns
> "check_approval_gates"
> "check_context_compliance"

# 5. Export report if needed
> "export_validation_report"

Workflow 2: Debug Agent Switching

Scenario: You want to verify the plugin tracks agent switches correctly.

# 1. Start with one agent
opencode --agent openagent
> "Run pwd"
> "proceed"

# 2. Switch to another agent (manually or via delegation)
# [Switch happens]

# 3. Check tracking
> "analyze_agent_usage"

# Expected: Shows both agents with their respective tools

Workflow 3: Audit Context Loading

Scenario: You want to ensure the agent is loading the right context files.

# 1. Ask agent to do a task that requires context
> "Write a new API endpoint following our standards"
[Agent works...]

# 2. Check what context was loaded
> "analyze_context_reads"

# 3. Verify compliance
> "check_context_compliance"

# Expected: Should show standards/code.md was read BEFORE writing

Workflow 4: Test Approval Gates

Scenario: Verify the agent always requests approval before execution.

# 1. Ask for an execution operation
> "Delete all .log files"

# 2. Agent should request approval
# Agent: "Approval needed before proceeding."

# 3. Approve
> "proceed"

# 4. Verify compliance
> "check_approval_gates"

# Expected: ✅ Approval gate compliance: PASSED

Workflow 5: Monitor Delegation Decisions

Scenario: Check if agent delegates appropriately for complex tasks.

# 1. Give a complex multi-file task
> "Refactor the authentication module across 5 files"
[Agent works...]

# 2. Check delegation
> "analyze_delegation"

# Expected: Should show delegation was appropriate (5 files >= 4 threshold)

Troubleshooting

Issue: "No agent activity tracked yet in this session"

Cause: Plugin just loaded, no tracking data yet

Solution:

  1. Perform some actions (bash, read, write, etc.)
  2. Then run validation tools
  3. Plugin tracks from session start, so early checks may show no data

Issue: "No execution operations tracked in this session"

Cause: No bash/write/edit/task operations performed yet

Solution:

  1. Run a command that requires execution (e.g., "run pwd")
  2. Then check approval gates
  3. Read-only operations (read, list) don't trigger approval gates

Issue: False positive on approval gate violations

Cause: Agent used different approval phrasing than expected

Solution:

  1. Check the approval keywords in agent-validator.ts (lines 12-22)
  2. Add custom patterns if your agent uses different phrasing
  3. Current keywords: "approval", "approve", "proceed", "confirm", "permission", etc.

Example customization:

const approvalKeywords = [
  "approval",
  "approve",
  "proceed",
  "confirm",
  "permission",
  "before proceeding",
  "should i",
  "may i",
  "can i proceed",
  // Add your custom patterns:
  "ready to execute",
  "waiting for go-ahead",
]

Issue: Context compliance shows warnings but files were read

Cause: Timing issue - context read after task started

Solution:

  1. Verify agent reads context BEFORE execution (not during/after)
  2. Check timeline in analyze_context_reads
  3. Agent should follow: Detect task → Read context → Execute

Issue: Agent switches not tracked

Cause: Agent name not properly captured

Solution:

  1. Run debug_validator to see raw tracking data
  2. Check sessionAgentTracker in debug output
  3. Verify agent name is being passed in chat.message hook

Issue: Validation report shows 0% score

Cause: No validation checks were performed

Solution:

  1. Ensure you've performed actions that trigger checks
  2. Run debug_validator to see what's tracked
  3. Try a simple task first (e.g., "run pwd")

Advanced Usage

Customizing Validation Rules

Edit .opencode/plugin/agent-validator.ts to customize:

1. Add custom approval keywords:

// Line 12-22
const approvalKeywords = [
  "approval",
  "approve",
  // Add yours:
  "your custom phrase",
]

2. Adjust delegation threshold:

// Line 768
const shouldDelegate = writeEditCount >= 4  // Change 4 to your threshold

3. Add custom context loading rules:

// Line 824-851
const contextRules = [
  {
    taskKeywords: ["your task type"],
    requiredFile: "your/context/file.md",
    taskType: "your task name"
  },
  // ... existing rules
]

4. Change severity levels:

// Line 719-726
checks.push({
  rule: "your_rule",
  passed: condition,
  severity: "error",  // Change to "warning" or "info"
  details: "Your message",
})

Integration with CI/CD

Export validation reports in automated workflows:

#!/bin/bash
# validate-agent-session.sh

# Run OpenCode task
opencode --agent openagent --input "Build the feature"

# Export validation report
opencode --agent openagent --input "export_validation_report --output_path ./reports/validation.md"

# Fail the build if the report contains any failed checks
if grep -q "❌ Failed: [1-9]" ./reports/validation.md; then
  echo "Validation failed!"
  exit 1
fi

echo "Validation passed!"

Creating Custom Validation Tools

Add new tools to the plugin:

// In agent-validator.ts, add to tool object:
your_custom_tool: tool({
  description: "Your tool description",
  args: {
    your_arg: tool.schema.string().optional(),
  },
  async execute(args, context) {
    const { sessionID } = context
    
    // Your validation logic here
    const result = analyzeYourMetric(sessionID)
    
    return formatYourReport(result)
  },
}),

Tracking Custom Events

Add custom event tracking:

// In the event() hook:
async event(input) {
  const { event } = input
  
  // Track your custom event
  if (event.type === "your.custom.event") {
    behaviorLog.push({
      timestamp: Date.now(),
      sessionID: event.properties.sessionID,
      agent: event.properties.agent || "unknown",
      event: "your_custom_event",
      data: {
        // Your custom data
      },
    })
  }
}

Real-World Examples

Example 1: Testing Agent Tracking

Session:

$ opencode --agent openagent

> "Help me test this plugin, I am trying to verify if an agent keeps to its promises"

Agent: Let me run some tests to generate tracking data.

> "proceed"

[Agent runs: pwd, reads README.md]

> "analyze_agent_usage"

Result:

## Agent Usage Report

**Agents detected:** 1
**Total events:** 4

### openagent
**Active duration:** 133s
**Events:** 4
**Tools used:**
- bash: 2x
- read: 1x
- analyze_agent_usage: 1x

Verification: Plugin successfully tracked agent name, tools, and events


Example 2: Detecting Agent Switch

Session:

$ opencode --agent build
[Do some work with build agent]

$ opencode --agent openagent
[Switch to openagent]

> "analyze_agent_usage"

Result:

## Agent Usage Report

**Agents detected:** 2
**Total events:** 7

### build
**Active duration:** 0s
**Events:** 2
**Tools used:**
- bash: 2x

### openagent
**Active duration:** 133s
**Events:** 5
**Tools used:**
- bash: 2x
- read: 1x
- analyze_agent_usage: 2x

Verification: Plugin tracked both agents and their respective activities


Example 3: Approval Gate Validation

Session:

> "Run npm install"

Agent: ## Proposed Plan
1. Run npm install

**Approval needed before proceeding.**

> "proceed"

[Agent executes]

> "check_approval_gates"

Result:

✅ Approval gate compliance: PASSED

All 1 execution operation(s) were properly approved.

Verification: Agent requested approval before bash execution


Best Practices

1. Validate After Complex Tasks

Always run validation after multi-step or complex tasks to ensure compliance.

2. Export Reports for Auditing

Use export_validation_report to keep records of agent behavior over time.

3. Check Context Loading

Verify agents are loading the right context files with check_context_compliance.

4. Monitor Agent Switches

Use analyze_agent_usage to track delegation and agent switching patterns.

5. Debug Early

If something seems off, run debug_validator immediately to see raw data.

6. Customize for Your Needs

Adjust validation rules, thresholds, and keywords to match your workflow.

7. Integrate with Workflows

Add validation checks to your development workflow or CI/CD pipeline.


FAQ

Q: Does the plugin slow down OpenCode?

A: No, tracking is lightweight and runs asynchronously. Minimal performance impact.

Q: Can I disable specific validation checks?

A: Yes, edit agent-validator.ts and comment out checks you don't need.

Q: Does validation data persist across sessions?

A: No, tracking is per-session. Each new OpenCode session starts fresh.

Q: Can I track custom metrics?

A: Yes, add custom event tracking and validation tools (see Advanced Usage).

Q: What if I get false positives?

A: Customize approval keywords and validation patterns in agent-validator.ts.

Q: Can I use this with other agents?

A: Yes, the plugin tracks any agent running in OpenCode.

Q: How do I reset tracking data?

A: Restart OpenCode - tracking resets on each session start.

Q: Can I export data in JSON format?

A: Currently exports as Markdown. You can modify generateDetailedReport() for JSON.
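
If you do want JSON, a minimal sketch of what such a variant could look like (assuming Node's fs module is available in the plugin runtime; the function and field names here are hypothetical, and generateDetailedReport()'s real inputs may differ):

import { writeFileSync } from "node:fs"

// Hypothetical JSON export: serialize the same per-session data the
// Markdown report consumes.
function exportJsonReport(sessionID: string, events: unknown[], outputPath: string): void {
  const report = { sessionID, generatedAt: new Date().toISOString(), events }
  writeFileSync(outputPath, JSON.stringify(report, null, 2))
}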


Next Steps

  1. Test the plugin - Run through the Quick Start workflow
  2. Validate a real task - Use it on an actual project task
  3. Customize rules - Adjust validation patterns for your needs
  4. Integrate into workflow - Add validation checks to your process
  5. Share feedback - Report issues or suggest improvements

Support

  • Issues: Report bugs or request features in the repository
  • Customization: Edit agent-validator.ts for your needs
  • Documentation: This guide + inline code comments

Happy validating! 🎯