Troubleshoot agentic evaluations

  • Release version: Australia
  • Updated March 18, 2026
  • 2 minutes to read

    Find solutions to common evaluation errors, including run failures, data ingestion issues, and unexpected results.

    When using agentic evaluations, you may see unexpected execution results or errors. The following sections describe situations you may encounter and the likely causes of each.

    Evaluation run failed

    An evaluation run can fail to execute for several reasons:

    Agent version unavailable
    Verify that the selected agent version still exists in AI Agent Studio. The version does not have to be the one currently active, but deleted or archived versions can't be evaluated.
    User permissions
    Confirm that your User record has the permissions required to execute evaluation runs in general and to use the specific AI asset. To check whether a certain user has access, you can perform an access test. See Test user access to an AI agent and Test user access to an agentic workflow.
    Data format errors
    Verify that the dataset conforms to the required format. Malformed records can cause the evaluation to fail. See Data requirements for agentic evaluations for the supported data types.
    Metric and data mismatch
    Confirm that all selected metrics have the required data inputs. Metrics that require ground truth will fail if the ground truth field is missing from the dataset.
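
    The metric and data check can be run on a dataset before starting an evaluation. The following is a minimal sketch, assuming a CSV dataset. The metric names, the field names, and the REQUIRED_FIELDS mapping are hypothetical placeholders, not identifiers from AI Agent Studio; replace them with the inputs your selected metrics actually require.

```python
import csv
import io

# Hypothetical mapping of metric names to the dataset fields they need.
# These names are illustrative only, not product identifiers.
REQUIRED_FIELDS = {
    "task_completion": {"input", "ground_truth"},
    "response_length": {"input"},
}


def find_missing_fields(csv_text, selected_metrics):
    """Return {metric: [fields that are absent or blank]} for a CSV dataset."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = set(reader.fieldnames or [])
    rows = list(reader)
    problems = {}
    for metric in selected_metrics:
        missing = []
        for field in sorted(REQUIRED_FIELDS[metric]):
            absent = field not in columns
            # A present column still counts as missing if any record leaves it blank.
            blank = not absent and any(
                not (row[field] or "").strip() for row in rows
            )
            if absent or blank:
                missing.append(field)
        if missing:
            problems[metric] = missing
    return problems


if __name__ == "__main__":
    sample = "input,expected\nReset my password,\n"
    print(find_missing_fields(sample, ["task_completion", "response_length"]))
    # prints {'task_completion': ['ground_truth']}
```

    In this sample, the ground truth column is named "expected" instead of "ground_truth", so the metric that needs ground truth is reported while the other metric passes.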

    Agentic AI asset underperformance despite no issues found

    If the evaluation found no issues, but the specific agentic AI asset is still not performing to acceptable standards, consider the following:

    Dataset coverage
    The evaluation dataset may not include the types of inputs or scenarios that expose the agent's weaknesses. Review the dataset for coverage gaps and add representative edge cases so that the evaluation more closely reflects real-world usage.
    Metric selection
    The selected metrics may not measure the areas where the agentic AI asset is failing. Review whether additional or different metrics would better capture the performance gap. You can create custom metrics to evaluate other dimensions of the asset's responses or actions, such as response length or whether a response meets certain formatting requirements.
    Scoring thresholds
    The pass threshold for a metric may be set at a level that does not reflect your requirements. Review the threshold settings in the metrics configuration and adjust them so that pass and fail results match your requirements.
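
    To illustrate how a pass threshold changes the outcome without any change in agent behavior, the following sketch computes the pass rate for the same set of scores at two thresholds. It assumes numeric per-record scores between 0 and 1, which this guide does not confirm; the score values are made up for illustration.

```python
def pass_rate(scores, threshold):
    """Return the fraction of records whose score meets or exceeds the threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    passed = sum(1 for score in scores if score >= threshold)
    return passed / len(scores)


if __name__ == "__main__":
    scores = [0.92, 0.81, 0.77, 0.64, 0.58]  # made-up illustrative values
    for threshold in (0.6, 0.8):
        print(threshold, pass_rate(scores, threshold))
    # prints:
    # 0.6 0.8
    # 0.8 0.4
```

    The same five records pass 80% of the time at a threshold of 0.6 and 40% of the time at 0.8, so a threshold that is too lenient can hide underperformance.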

    Optimization applied, but re-evaluation didn't improve

    If the re-evaluation scores did not improve after applying optimizations, try the following:

    • Review trace details for the issues that were targeted. The optimization may have only alleviated surface-level symptoms without resolving the underlying root cause.
    • Check whether the optimization introduced a regression in a different metric. Score improvements in one area can sometimes degrade another, lowering the final scores.
    • If the optimization was applied to the list of steps of an agentic AI asset, verify that the updated list of steps was applied to the version you are evaluating.
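
    The regression check in the second item can be sketched as a comparison of per-metric scores from the baseline run and the re-evaluation. This is a minimal sketch, assuming you have copied the scores for each run into a dictionary; the metric names, score values, and tolerance are hypothetical.

```python
def find_regressions(baseline, rerun, tolerance=0.0):
    """Return {metric: (before, after)} for metrics that dropped by more than tolerance."""
    regressions = {}
    for metric, before in baseline.items():
        after = rerun.get(metric)
        # Skip metrics that were not scored in the re-evaluation.
        if after is not None and before - after > tolerance:
            regressions[metric] = (before, after)
    return regressions


if __name__ == "__main__":
    # Hypothetical metric names and scores, for illustration only.
    baseline = {"task_completion": 0.70, "tool_selection": 0.85}
    rerun = {"task_completion": 0.78, "tool_selection": 0.74}
    print(find_regressions(baseline, rerun, tolerance=0.02))
    # prints {'tool_selection': (0.85, 0.74)}
```

    Here the targeted metric improved, but a second metric dropped, which can offset the gain in the final scores.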

    Data processing errors

    If the data can't be processed because it doesn't meet data requirements, the evaluation can't execute properly. The following describes common causes of data processing errors:

    Incorrect file format
    The accepted file formats are CSV and structured JSON. Other file formats can't be processed.
    Missing required fields
    Datasets must include the fields required by the selected metrics. Check for missing or misnamed columns. If a selected metric requires ground truth, the dataset must include the ground truth field.
    Encoding issues
    Files must be UTF-8 encoded. Files in other encodings may fail to process.
    File size
    Very large files or datasets may time out during processing. If this occurs, reduce the dataset size or contact your Platform Administrator.
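
    The file-level causes above (format, encoding, and size) can be checked locally before you upload a dataset. The following is a minimal sketch, not the validation that the platform performs. The size limit is an assumed value for illustration, because this guide does not state the actual limit.

```python
import csv
import io
import json
from pathlib import Path

# Assumed limit for illustration; the real limit is not documented here.
MAX_BYTES = 5 * 1024 * 1024


def precheck_dataset(path, max_bytes=MAX_BYTES):
    """Return a list of human-readable problems found in a dataset file."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix not in {".csv", ".json"}:
        return [f"unsupported file format: {suffix or 'no extension'}"]

    problems = []
    raw = path.read_bytes()
    if len(raw) > max_bytes:
        problems.append(f"file is {len(raw)} bytes, above the assumed {max_bytes}-byte limit")

    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        problems.append(f"not valid UTF-8 (byte offset {exc.start})")
        return problems

    if suffix == ".json":
        try:
            json.loads(text)
        except json.JSONDecodeError as exc:
            problems.append(f"malformed JSON at line {exc.lineno}")
    else:
        reader = csv.reader(io.StringIO(text))
        header = next(reader, None)
        if header is None:
            problems.append("CSV file is empty")
        else:
            for row in reader:
                # Blank lines are skipped; other rows must match the header width.
                if row and len(row) != len(header):
                    problems.append(
                        f"line {reader.line_num}: expected {len(header)} columns, found {len(row)}"
                    )
    return problems
```

    An empty list means that none of these checks found a problem. It does not guarantee that the dataset meets every data requirement.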