Minh141120 0225bb6b1e feat: add prompt evaluation script for develop and deploy

2025-08-20 17:30:29 +07:00

7.9 KiB

Raw Blame History

AutoQA Reliability Testing Guide

🚀 Comprehensive guide for running reliability tests with AutoQA to verify test case stability and reliability.

Overview

Reliability testing is designed to verify that your test cases are stable and reliable by running them multiple times. This helps identify flaky tests and ensures consistent behavior before deploying to production.

Two Testing Phases

1. Development Phase

Purpose: Verify basic stability during development
Runs: 5 times
Success Rate Requirement: ≥80%
Use Case: During development to catch obvious stability issues

2. Deployment Phase

Purpose: Verify production readiness
Runs: 20 times
Success Rate Requirement: ≥90%
Use Case: Before deploying to production to ensure reliability

Command Line Usage

Basic Reliability Testing

# Development phase (5 runs)
python main.py --enable-reliability-test --reliability-phase development

# Deployment phase (20 runs)
python main.py --enable-reliability-test --reliability-phase deployment

Custom Configuration

# Custom number of runs
python main.py --enable-reliability-test --reliability-runs 10

# Specific test file
python main.py --enable-reliability-test --reliability-test-path "tests/base/default-jan-assistant.txt"

# Custom max turns
python main.py --enable-reliability-test --reliability-phase development --max-turns 50

With ReportPortal Integration

# Development phase with ReportPortal
python main.py \
  --enable-reliability-test \
  --reliability-phase development \
  --enable-reportportal \
  --rp-token "YOUR_TOKEN" \
  --rp-project "jan_reliability_tests"

# Deployment phase with ReportPortal
python main.py \
  --enable-reliability-test \
  --reliability-phase deployment \
  --enable-reportportal \
  --rp-token "YOUR_TOKEN" \
  --rp-project "jan_reliability_tests"

Environment Variables

# Enable reliability testing
export ENABLE_RELIABILITY_TEST=true

# Set phase
export RELIABILITY_PHASE=deployment

# Custom runs (overrides phase)
export RELIABILITY_RUNS=15

# Specific test path
export RELIABILITY_TEST_PATH="tests/base/my-test.txt"

# Run with environment variables
python main.py --enable-reliability-test

Command Line Arguments

Argument	Environment Variable	Default	Description
`--enable-reliability-test`	`ENABLE_RELIABILITY_TEST`	`false`	Enable reliability testing mode
`--reliability-phase`	`RELIABILITY_PHASE`	`development`	Testing phase: development or deployment
`--reliability-runs`	`RELIABILITY_RUNS`	`0`	Custom number of runs (overrides phase)
`--reliability-test-path`	`RELIABILITY_TEST_PATH`	-	Specific test file path

Test Execution Flow

Single Test Reliability Testing

Load Test File: Read the specified test file
Run Multiple Times: Execute the test the specified number of times
Track Results: Monitor success/failure for each run
Calculate Success Rate: Determine overall reliability
Generate Report: Provide detailed results and statistics

Multiple Tests Reliability Testing

Scan Test Files: Find all test files in the specified directory
Run Reliability Tests: Execute reliability testing on each test file
Aggregate Results: Combine results from all tests
Overall Assessment: Determine if the entire test suite is reliable

Output and Results

Success Rate Calculation

Success Rate = (Successful Runs / Total Runs) × 100

Development Phase Requirements

Target: 5 runs
Minimum Success Rate: 80%
Result: PASS if ≥80%, FAIL if <80%

Deployment Phase Requirements

Target: 20 runs
Minimum Success Rate: 90%
Result: PASS if ≥90%, FAIL if <90%

Sample Output

==========================================
RELIABILITY TEST SUMMARY
==========================================
Test: tests/base/default-jan-assistant.txt
Phase: DEVELOPMENT
Completed runs: 5/5
Successful runs: 4
Failed runs: 1
Success rate: 80.0%
Total duration: 125.3 seconds
Average duration per run: 25.1 seconds
Overall result: ✅ PASSED
Development phase requirement: ≥80% success rate

Use Cases

1. New Test Development

# Test a new test case for basic stability
python main.py \
  --enable-reliability-test \
  --reliability-phase development \
  --reliability-test-path "tests/base/my-new-test.txt"

2. Pre-Production Validation

# Verify test suite is production-ready
python main.py \
  --enable-reliability-test \
  --reliability-phase deployment \
  --tests-dir "tests/base"

3. Flaky Test Investigation

# Run a potentially flaky test multiple times
python main.py \
  --enable-reliability-test \
  --reliability-runs 25 \
  --reliability-test-path "tests/base/flaky-test.txt"

4. CI/CD Integration

# Automated reliability testing in CI/CD
ENABLE_RELIABILITY_TEST=true \
RELIABILITY_PHASE=deployment \
python main.py --max-turns 40

Best Practices

1. Start with Development Phase

Begin with 5 runs to catch obvious issues
Use during active development
Quick feedback on test stability

2. Use Deployment Phase for Production

Run 20 times before production deployment
Ensures high reliability standards
Catches intermittent failures

3. Custom Runs for Specific Needs

Use custom run counts for special testing scenarios
Investigate flaky tests with higher run counts
Balance between thoroughness and execution time

4. Monitor Execution Time

Reliability testing takes longer than single runs
Plan accordingly for CI/CD pipelines
Consider parallel execution for multiple test files

Troubleshooting

Common Issues

1. Test File Not Found

# Ensure test path is correct
python main.py \
  --enable-reliability-test \
  --reliability-test-path "tests/base/existing-test.txt"

2. Low Success Rate

Check test environment stability
Verify test dependencies
Review test logic for race conditions

3. Long Execution Time

Reduce max turns if appropriate
Use development phase for quick feedback
Consider running fewer test files

Debug Mode

# Enable debug logging
export LOG_LEVEL=DEBUG
export PYTHONPATH=.

# Run with verbose output
python main.py --enable-reliability-test --reliability-phase development

Integration with Existing Workflows

Migration Testing

# Run reliability tests on migration test cases
python main.py \
  --enable-reliability-test \
  --reliability-phase deployment \
  --tests-dir "tests/migration"

Base Testing

# Run reliability tests on base test cases
python main.py \
  --enable-reliability-test \
  --reliability-phase development \
  --tests-dir "tests/base"

Custom Test Directories

# Run reliability tests on custom test directory
python main.py \
  --enable-reliability-test \
  --reliability-phase deployment \
  --tests-dir "my_custom_tests"

Performance Considerations

Execution Time

Development Phase: ~5x single test execution time
Deployment Phase: ~20x single test execution time
Multiple Tests: Multiply by number of test files

Resource Usage

Screen recordings for each run
Trajectory data for each run
ReportPortal uploads (if enabled)

Optimization Tips

Use development phase for quick feedback
Run deployment phase during off-peak hours
Consider parallel execution for multiple test files
Clean up old recordings and trajectories regularly

Next Steps

Start Simple: Begin with development phase on single test files
Scale Up: Move to deployment phase for critical tests
Automate: Integrate into CI/CD pipelines
Monitor: Track reliability trends over time
Improve: Use results to identify and fix flaky tests

For more information, see the main README.md, QUICK_START.md, and explore the test files in the tests/ directory.

7.9 KiB Raw Blame History Unescape Escape