
Quality & Evaluation

Overview

The evaluation framework provides comprehensive quality assessment and performance evaluation for agents, enabling continuous improvement and quality assurance.

Key Features

  • Reflections: Agent self-reflection and quality assessment with LLM-powered analysis
  • Quality Scoring: Automated quality scoring for agent responses using multiple metrics
  • Evaluation Framework: Built-in evaluation system with configurable metrics
  • Performance Metrics: Metrics covering accuracy, relevance, completeness, and latency
  • Verification Agent: Dedicated verification agent for validating and cross-checking outputs
  • Execution Snapshots: Capture and replay agent execution states for debugging
  • Quality Reports: Automated quality reports with trends and recommendations

Quality Scoring

Quality scoring automatically evaluates agent responses against multiple metrics and combines them into an overall quality assessment; a scoring sketch follows the metric list below.

Scoring Metrics

  • Accuracy: Correctness of agent responses
  • Relevance: Relevance to the query or task
  • Completeness: Completeness of the response
  • Coherence: Logical coherence and consistency
  • Latency: Response time and efficiency
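
To make this concrete, the sketch below shows one plausible way per-metric scores could be combined into an overall score. The metric names follow the list above; the weights and the `QualityScore` structure are illustrative assumptions, not the framework's actual types.

    from dataclasses import dataclass

    # Illustrative weights (assumed, not framework defaults); latency is
    # expected to be pre-normalized so that faster responses score higher.
    WEIGHTS = {
        "accuracy": 0.30,
        "relevance": 0.25,
        "completeness": 0.20,
        "coherence": 0.15,
        "latency": 0.10,
    }

    @dataclass
    class QualityScore:
        metrics: dict[str, float]  # each metric normalized to [0.0, 1.0]
        overall: float

    def score(metrics: dict[str, float]) -> QualityScore:
        """Weighted average of per-metric scores; assumes all five metrics."""
        overall = sum(WEIGHTS[name] * value for name, value in metrics.items())
        return QualityScore(metrics=metrics, overall=overall)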

Scoring Methods

Quality scoring combines LLM-based evaluation, rule-based metrics, and human feedback to produce its assessment.
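
As an illustration of the rule-based side, the check below scores latency against a fixed budget. The budget value and the function name are assumptions made for this sketch; the LLM-based and human-feedback paths are not shown.

    # Hypothetical rule-based latency metric: scores 1.0 under budget and
    # decays linearly to 0.0 at twice the budget. The budget is assumed.
    LATENCY_BUDGET_MS = 2000

    def latency_score(elapsed_ms: float) -> float:
        if elapsed_ms <= LATENCY_BUDGET_MS:
            return 1.0
        over = (elapsed_ms - LATENCY_BUDGET_MS) / LATENCY_BUDGET_MS
        return max(0.0, 1.0 - over)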

Evaluation Framework

The evaluation framework provides a structured approach to evaluating agent performance with configurable metrics and evaluation criteria.

Evaluation Types

  • Automated Evaluation: LLM and rule-based automated evaluation
  • Human Evaluation: Human-in-the-loop evaluation with feedback
  • Comparative Evaluation: Compare multiple agents or versions (a minimal sketch follows this list)
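
A comparative evaluation can be as simple as running the same cases through two agents and comparing their mean scores. The sketch below assumes an `evaluate(agent_id, case)` helper returning a score in [0.0, 1.0]; it is not a real framework API.

    from statistics import mean
    from typing import Callable

    def compare_agents(agent_a: str, agent_b: str, cases: list[str],
                       evaluate: Callable[[str, str], float]) -> str:
        """Return the agent with the higher mean score over the same cases."""
        mean_a = mean(evaluate(agent_a, c) for c in cases)
        mean_b = mean(evaluate(agent_b, c) for c in cases)
        return agent_a if mean_a >= mean_b else agent_b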

Configurable Metrics

Evaluation metrics can be configured per use case, enabling domain-specific evaluation criteria.
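
A per-use-case configuration might look like the snippet below. The structure and field names are assumptions for illustration; the framework's actual configuration schema may differ.

    # Hypothetical per-use-case evaluation configuration.
    EVALUATION_CONFIG = {
        "customer_support": {
            "metrics": ["accuracy", "relevance", "completeness"],
            "weights": {"accuracy": 0.5, "relevance": 0.3, "completeness": 0.2},
            "pass_threshold": 0.8,
        },
        "code_generation": {
            "metrics": ["accuracy", "coherence", "latency"],
            "weights": {"accuracy": 0.6, "coherence": 0.2, "latency": 0.2},
            "pass_threshold": 0.9,
        },
    }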

Performance Metrics

Performance metrics provide detailed insight into agent behavior across multiple dimensions.

Metric Categories

  • Quality Metrics: Accuracy, relevance, completeness
  • Efficiency Metrics: Latency, throughput, resource usage
  • Reliability Metrics: Error rates, success rates, consistency
  • Cost Metrics: Token usage, API costs, resource costs

Metric Aggregation

Metrics are aggregated across sessions, agents, and time periods, giving a longitudinal view of performance; one possible aggregation is sketched below.
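
The sketch below averages raw metric samples by agent and day. The sample shape (`agent_id`, `timestamp`, `value`) is assumed for illustration and is not the framework's storage format.

    from collections import defaultdict
    from datetime import datetime
    from statistics import mean

    def aggregate_by_agent_and_day(samples: list[dict]) -> dict:
        """Average samples shaped like
        {"agent_id": "a1", "timestamp": "2025-01-01T12:00:00", "value": 0.91}
        grouped by (agent_id, date)."""
        groups = defaultdict(list)
        for s in samples:
            day = datetime.fromisoformat(s["timestamp"]).date()
            groups[(s["agent_id"], day)].append(s["value"])
        return {key: mean(values) for key, values in groups.items()}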

Verification Agent

The verification agent provides dedicated validation and cross-checking of agent outputs, ensuring quality and correctness.

Verification Process

The verification agent analyzes agent outputs, checks them for correctness, completeness, and consistency, and returns feedback for improvement; a minimal rule-based stand-in is sketched below.
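
The stand-in below runs a few independent checks over a single output and collects findings for the feedback report. These checks are simple heuristics invented for the sketch; the real verification agent may use LLM-powered analysis, which is not shown.

    # Hypothetical verification pass: each failed check appends a finding;
    # an empty list means the output passed.
    def verify(query: str, output: str) -> list[str]:
        findings = []
        if not output.strip():
            findings.append("completeness: output is empty")
        elif len(output) < 20:
            findings.append("completeness: output looks truncated")
        if query.lower() not in output.lower() and len(query.split()) <= 3:
            # Very rough relevance heuristic for short queries.
            findings.append("relevance: output never mentions the query")
        return findings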

Cross-Checking

Multiple verification agents can cross-check the same output, producing consensus-based validation, as in the majority-vote sketch below.
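
Consensus can be computed as a vote over the verifiers' verdicts. The verifier callables and the two-thirds quorum are assumptions for the sketch:

    from typing import Callable

    def consensus_verify(output: str,
                         verifiers: list[Callable[[str], bool]],
                         quorum: float = 2 / 3) -> bool:
        """Accept the output if at least `quorum` of the verifiers approve."""
        approvals = sum(1 for v in verifiers if v(output))
        return approvals >= quorum * len(verifiers)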

Quality Reports

Automated quality reports provide trends, insights, and recommendations for improving agent performance.

Report Types

  • Performance Reports: Overall performance trends and metrics
  • Quality Trends: Quality score trends over time
  • Recommendation Reports: Actionable recommendations for improvement

Report Generation

Reports are generated automatically on a schedule or on demand, providing continuous visibility into agent performance.

API Reference

The evaluation framework is accessible through the NeuronAgent REST API for running evaluations and retrieving results; example calls follow the endpoint list.

Evaluation Endpoints

  • POST /api/v1/evaluations - Run an evaluation
  • GET /api/v1/evaluations - List evaluations
  • GET /api/v1/evaluations/:id - Get evaluation results
  • GET /api/v1/agents/:id/quality - Get agent quality metrics
  • POST /api/v1/agents/:id/reflect - Trigger agent reflection
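
The snippet below exercises these endpoints with Python's `requests` library. Only the paths and HTTP methods come from the list above; the base URL, the bearer-token header, the request body fields, and the response shape (including the `id` field) are assumptions for the example.

    import requests

    BASE = "http://localhost:8080/api/v1"          # assumed base URL
    HEADERS = {"Authorization": "Bearer <token>"}  # assumed auth scheme

    # Run an evaluation; the body fields are illustrative, not a documented schema.
    evaluation = requests.post(f"{BASE}/evaluations", headers=HEADERS,
                               json={"agent_id": "agent-123",
                                     "metrics": ["accuracy", "relevance"]}).json()

    # List evaluations, then fetch one result (assumes the response has an "id").
    listing = requests.get(f"{BASE}/evaluations", headers=HEADERS).json()
    result = requests.get(f"{BASE}/evaluations/{evaluation['id']}",
                          headers=HEADERS).json()

    # Quality metrics for an agent, and triggering a reflection.
    quality = requests.get(f"{BASE}/agents/agent-123/quality",
                           headers=HEADERS).json()
    requests.post(f"{BASE}/agents/agent-123/reflect", headers=HEADERS)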