Review agentic evaluation outputs

Release version: Australia

Updated March 18, 2026

Assess your agent's overall performance after a run completes, including per-metric scores and issue counts. Use the results as your starting point for diagnosing quality issues and opportunities for improvement before deployment.

Before you begin

You must have a completed agentic evaluation.

Role required: sn_aia.admin or admin

About this task

Automated evaluations include scores and recommendations for each metric you selected. Each output provides information you can use to make decisions about the development and deployment of the agentic AI asset. The results help you identify performance patterns, quality issues, and optimization opportunities before you deploy your agent to production.

Procedure

Navigate to All > Now Assist Skill Kit > Agentic Evaluations.
Select the automated evaluation whose results you want to review.
The evaluation details page opens, displaying the overall results and performance metrics.
Review the evaluation summary section to understand the overall performance.
The summary provides a high-level overview of your agent's performance across all evaluated metrics. Key information includes:
- Agentic AI asset information such as name and version
- Total number of test cases evaluated
- Average scores across all metrics
- Number of issues identified by severity level
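To make the summary fields concrete, the following is a minimal sketch of how the same figures could be derived from per-test-case results. It assumes you have the results available as plain records outside the product. The record layout, the metric names (accuracy, tool_use), and the 0 to 1 score scale are illustrative assumptions, not the product's data model or export format.

```python
from collections import Counter
from statistics import mean

# Hypothetical per-test-case records. Field names, metric names, and the
# 0-1 score scale are assumptions made for illustration only.
test_cases = [
    {"id": "tc-1", "scores": {"accuracy": 0.9, "tool_use": 0.8}, "issues": ["High"]},
    {"id": "tc-2", "scores": {"accuracy": 0.7, "tool_use": 0.6}, "issues": ["Critical", "Low"]},
    {"id": "tc-3", "scores": {"accuracy": 0.8, "tool_use": 1.0}, "issues": []},
]


def summarize(cases):
    """Build the summary figures: test case count, per-metric averages, issue counts."""
    metrics = sorted({metric for case in cases for metric in case["scores"]})
    averages = {
        metric: round(
            mean(case["scores"][metric] for case in cases if metric in case["scores"]), 2
        )
        for metric in metrics
    }
    issue_counts = Counter(severity for case in cases for severity in case["issues"])
    return {
        "total_test_cases": len(cases),
        "average_scores": averages,
        "issues_by_severity": dict(issue_counts),
    }


summary = summarize(test_cases)
print(summary)
```

An average can hide a single failing test case, so read the average scores together with the issue counts rather than on their own.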
Review the overall LLM-judged scores for each metric.
The overall LLM-judged score for each metric shows patterns and trends across the test cases you evaluated. These scores inform deployment recommendations for the current version of the agentic AI asset. Detailed results include:
- Numerical score
- Performance rating (Excellent, Good, Moderate, or Poor)
- Individual record evaluations
Investigate any issues and their associated traces.
Any problems found with the agentic AI asset's performance are categorized by severity level, metric, and use case. You can track each issue to its source in a specific interaction, called a "trace." Review the traces to diagnose the underlying cause of each issue. The severity levels are:
- Critical: Issues that can prevent the agent from functioning correctly, resulting in a poor user experience
- High: Significant problems that impact user experience or accuracy
- Medium: Moderate issues that may affect performance in some scenarios
- Low: Minor issues that have minimal impact on overall functionality
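One way to work through a long issue list is to order it by the severity levels above, so that Critical and High issues are investigated first. The sketch below assumes the issues are available as plain records. The field names, metric names, and trace identifiers are hypothetical and are used only to illustrate the ordering.

```python
# Severity levels from the list above, ranked from most to least urgent.
SEVERITY_ORDER = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}

# Hypothetical issue records; field names and values are illustrative assumptions.
issues = [
    {"severity": "Low", "metric": "tone", "trace_id": "trace-104"},
    {"severity": "Critical", "metric": "tool_use", "trace_id": "trace-017"},
    {"severity": "Medium", "metric": "accuracy", "trace_id": "trace-052"},
    {"severity": "High", "metric": "accuracy", "trace_id": "trace-033"},
]


def triage(issue_list):
    """Return issues sorted by severity, then by metric, so related issues sit together."""
    return sorted(
        issue_list,
        key=lambda issue: (SEVERITY_ORDER[issue["severity"]], issue["metric"]),
    )


ordered = triage(issues)
for issue in ordered:
    print(issue["severity"], issue["metric"], issue["trace_id"])
```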
Apply optimizations based on the findings.

The automated evaluation can include recommended optimizations that address the issues it found. After you apply an optimization, rerun the evaluation to see how the agent's behavior and performance change. Track improvements by comparing results across evaluation runs.
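Comparing runs comes down to looking at the per-metric change between two sets of average scores. The following minimal sketch assumes you have recorded the average scores from two runs yourself. The metric names and score values are hypothetical, and the comparison is done outside the product.

```python
# Hypothetical average scores recorded from two evaluation runs of the same agent.
# Metric names and values are illustrative assumptions.
run_before = {"accuracy": 0.72, "tool_use": 0.80}
run_after = {"accuracy": 0.85, "tool_use": 0.78}


def compare_runs(before, after):
    """Return the score change per metric for metrics present in both runs."""
    return {
        metric: round(after[metric] - before[metric], 2)
        for metric in sorted(before.keys() & after.keys())
    }


deltas = compare_runs(run_before, run_after)

# Metrics whose score dropped after the optimization was applied.
regressions = [metric for metric, delta in deltas.items() if delta < 0]

print(deltas)
print(regressions)
```

Checking for regressions as well as gains matters because an optimization that improves one metric can lower another.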

Result

You have a comprehensive review of your agent's performance across all evaluated metrics. Use these insights to decide whether the agent is ready for deployment or to identify areas that need additional development work.