Multi-model Batch Testing

Release version: Australia

Updated March 12, 2026

Summary of Multi-model Batch Testing

Multi-model Batch Testing enables ServiceNow customers to evaluate multiple Natural Language Understanding (NLU) models by testing them against large sets of utterances. This feature helps assess model performance by comparing predicted intents with expected intents in a structured, scalable way. It supports all languages available in ServiceNow’s NLU offerings.

Installation and Setup

Multi-model Batch Testing is included in the NLU Workbench - Advanced Features app, available on the ServiceNow Store. To use it, ensure the plugin com.snc.nlu.workbench.advanced is activated in your instance.

Creating and Using Test Sets

Test sets consist of utterances paired with their expected intents, uploaded as CSV or XLSX files.
Each test set can contain up to 10,000 utterances.
Utterances should be representative of user input the model will encounter and match the language of the tested model.
Include utterances with no expected intent to evaluate the model’s ability to detect irrelevant input without assigning incorrect intents.
A recommended practice is for test sets to cover at least 60% of the model’s intents to yield meaningful results.
Utterances with expected intents not found in the model are skipped during testing.

Running Tests and Viewing Results

Once test sets are uploaded, you can run batch tests against multiple trained NLU models simultaneously.
The Test Results page provides an overview of all tests, showing the models tested, number of utterances, and prediction accuracy percentages.
Detailed results for each test include summary graphics and breakdowns of correct, missed, and incorrect intent predictions.
The “Intents that need attention” section highlights the top 5 problematic intents, allowing you to drill down into specific utterances to identify areas for model improvement.
A detailed tab lists every utterance tested, with predicted intents and confidence scores per model. Filters and search tools help analyze results efficiently.
Test results can be exported as CSV files, preserving detailed information for offline analysis or reporting.

Practical Benefits

By leveraging Multi-model Batch Testing, ServiceNow customers can systematically validate and compare NLU models against real-world utterances. This capability helps ensure models accurately interpret user intents and identify irrelevant inputs, facilitating continuous model refinement and better conversational experiences.

Test multiple Natural Language Understanding (NLU) models against a large set of utterances to evaluate the performance of the models. Add test sets, test multiple models, and see test results.

Summary usage

Use Multi-model Batch Testing to create and upload test sets composed of utterances and their expected intents. You can then run tests against your NLU models.

Multi-model Batch Testing works with models for all supported NLU languages. See NLU language support.

Installation

Multi-model Batch Testing is part of the NLU Workbench - Advanced Features app available on the ServiceNow® Store.

To use Multi-model Batch Testing, ensure that the NLU Workbench - Advanced Features (com.snc.nlu.workbench.advanced) plugin is active on your instance. For more information, see Install NLU Workbench - Advanced Features and Activate the NLU Workbench.

Test sets

Test sets are lists of utterances and matched intents. Create a test set by using a table in a CSV or XLSX (Excel workbook) file. The table should contain two columns: one for utterances, and one for the expected intent. Your test set can include up to 10,000 rows.
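
As an illustration of the two-column layout described above, the following Python sketch writes a small test set as CSV and enforces the 10,000-row limit. The header names (utterance, expected_intent), the sample intents, and the use of an empty cell to mean "no expected intent" are assumptions made for this example only; follow the format given in Create a test set for the actual file.

```python
import csv
import io

MAX_ROWS = 10_000  # row limit stated in this topic


def write_test_set(rows, out):
    """Write (utterance, expected_intent) pairs as a two-column CSV.

    Assumption: an empty expected_intent marks an utterance that
    should not match any intent.
    """
    rows = list(rows)
    if len(rows) > MAX_ROWS:
        raise ValueError(f"test set has {len(rows)} rows; the limit is {MAX_ROWS}")
    writer = csv.writer(out)
    # Hypothetical header names, used only for this sketch.
    writer.writerow(["utterance", "expected_intent"])
    for utterance, intent in rows:
        writer.writerow([utterance.strip(), intent.strip()])
    return len(rows)


def build_example_csv():
    """Return a tiny example test set as CSV text."""
    buffer = io.StringIO()
    write_test_set(
        [
            ("I forgot my password", "ResetPassword"),
            ("my laptop won't turn on", "HardwareIssue"),
            ("what's the weather like today", ""),  # no expected intent
        ],
        buffer,
    )
    return buffer.getvalue()
```

To produce a file instead of a string, pass a handle opened with open(path, "w", newline="", encoding="utf-8") as the out argument.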

To get the most out of testing your NLU models, include utterances in your test sets that the model is likely to encounter from your users. Test utterances should be in the same language as the model being tested. The test set should also include utterances with no expected intent, which helps assess your model's ability to detect irrelevant utterances that shouldn't have any intent predicted.

By including these types of utterances, the test better assesses the model's ability to recognize intents and respond to your users. If your test set covers less than 60% of the intents in the models, you can still run the test, but the recommended threshold may not be optimal.
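
The 60% guideline can be checked before uploading a test set. The sketch below is one way to do that for a single model; the function name, the intent names, and the in-memory data are hypothetical, and the product may compute coverage differently.

```python
RECOMMENDED_COVERAGE = 0.60  # guideline from this topic


def intent_coverage(model_intents, test_rows):
    """Return the fraction of the model's intents that appear as an
    expected intent in the test set.

    test_rows is a list of (utterance, expected_intent) pairs; an empty
    expected_intent means "no intent expected" and is ignored here.
    """
    model_intents = set(model_intents)
    if not model_intents:
        return 0.0
    expected = {intent for _, intent in test_rows if intent}
    return len(model_intents & expected) / len(model_intents)


# Hypothetical model with five intents and a test set that covers two of them.
example_intents = ["ResetPassword", "HardwareIssue", "OrderLaptop", "RequestVPN", "CheckTicket"]
example_rows = [
    ("I forgot my password", "ResetPassword"),
    ("my laptop won't turn on", "HardwareIssue"),
    ("what's the weather like today", ""),
]

coverage = intent_coverage(example_intents, example_rows)
below_guideline = coverage < RECOMMENDED_COVERAGE
```

In this example two of five intents are covered, so the test set falls below the guideline and more utterances for the remaining intents would be worth adding.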

Note: A test utterance is skipped during the test if its expected intent does not match any intent in the models.
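
To anticipate which rows would be skipped, you can compare the expected intents in a test set with the intents defined in the models before you upload it. This is a minimal sketch, assuming you have the intent names available as a plain list and that rows with no expected intent are always tested; the names are hypothetical and the product's own matching rules may differ (for example, in case sensitivity).

```python
def partition_rows(model_intents, test_rows):
    """Split (utterance, expected_intent) pairs into rows that can be
    tested and rows whose expected intent is unknown to the models."""
    known = set(model_intents)
    tested, skipped = [], []
    for utterance, intent in test_rows:
        if intent and intent not in known:
            skipped.append((utterance, intent))  # expected intent not in the models
        else:
            tested.append((utterance, intent))  # known intent, or no intent expected
    return tested, skipped


# Hypothetical data: "BookRoom" is not defined in the models.
known_intents = ["ResetPassword", "HardwareIssue"]
rows = [
    ("I forgot my password", "ResetPassword"),
    ("reserve a meeting room for Friday", "BookRoom"),
    ("what's the weather like today", ""),
]
tested_rows, skipped_rows = partition_rows(known_intents, rows)
```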

To create a test set, see Create a test set.

After you have a test set, you can test trained NLU models. To begin testing, see Run a multi-model batch test.

After running a test, your results appear on the Test results page.

Test results

The Test results page lists your completed and in-progress tests. At a glance, the results page shows the models tested against, the number of utterances, and prediction percentages.

Figure: Multi-model Batch Testing page with completed tests.

To see the details of a test result, click the name of the test set.

The Overview page shows summary information about the results and includes a graphic with a breakdown of predictions.

The Intents that need attention (Current model) section shows the top 5 missed and incorrect intents. Click an intent name to drill down into the test utterances that were predicted incorrectly. Use this information to improve the model.
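
The ranking idea behind this section can be approximated offline. The sketch below counts, per expected intent, the utterances where the prediction differed from the expectation and returns the five intents with the most errors. It is an assumption about how such a ranking could be computed, not a description of the product's logic; the data and names are hypothetical, and an empty string stands for "no intent".

```python
from collections import Counter


def intents_needing_attention(results, top_n=5):
    """Rank expected intents by the number of wrong predictions.

    results is an iterable of (expected_intent, predicted_intent) pairs.
    A wrong prediction is either missed (nothing predicted) or
    incorrect (a different intent predicted).
    """
    errors = Counter()
    for expected, predicted in results:
        if not expected:
            continue  # assumption: rows with no expected intent are not ranked
        if predicted != expected:
            errors[expected] += 1
    return errors.most_common(top_n)


# Hypothetical per-utterance outcomes for one model.
sample_results = [
    ("ResetPassword", ""),               # missed
    ("ResetPassword", "HardwareIssue"),  # incorrect
    ("ResetPassword", "OrderLaptop"),    # incorrect
    ("HardwareIssue", ""),               # missed
    ("HardwareIssue", "OrderLaptop"),    # incorrect
    ("OrderLaptop", ""),                 # missed
    ("OrderLaptop", "OrderLaptop"),      # correct
    ("", ""),                            # correct: no intent expected or predicted
]
ranking = intents_needing_attention(sample_results)
```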

The Detailed results tab lists information about each utterance that was tested. From here, you can see the prediction outcome and confidence per model for each utterance. Filter the results by using the search bar or interacting with the filter tools and column headers.

You can also export the test results to a CSV file by clicking Export. The file includes the same columns as the detailed results page.
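
An exported file can be analyzed with a short script. The following sketch computes the share of correct predictions per model from CSV text. The column names (utterance, expected_intent, and a <model>_predicted and <model>_confidence pair per model), the model names, and the sample rows are all hypothetical; check the header row of your own export and adjust the names before using anything like this.

```python
import csv
import io

# Hypothetical export layout and data, for illustration only.
SAMPLE_EXPORT = """utterance,expected_intent,model_a_predicted,model_a_confidence,model_b_predicted,model_b_confidence
I forgot my password,ResetPassword,ResetPassword,0.91,ResetPassword,0.84
my laptop won't turn on,HardwareIssue,HardwareIssue,0.77,OrderLaptop,0.62
what's the weather like today,,,0.0,CheckTicket,0.55
please unlock my account,ResetPassword,,0.0,ResetPassword,0.80
"""


def accuracy_by_model(csv_text, models):
    """Return {model: fraction of rows where predicted == expected}.

    An empty expected intent with an empty prediction counts as correct.
    """
    correct = {model: 0 for model in models}
    total = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        total += 1
        for model in models:
            if row[f"{model}_predicted"] == row["expected_intent"]:
                correct[model] += 1
    if total == 0:
        return {}
    return {model: correct[model] / total for model in models}


accuracy = accuracy_by_model(SAMPLE_EXPORT, ["model_a", "model_b"])
```

With the sample data, model_a is correct on three of four utterances and model_b on two of four.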

For more information on understanding your test results, see Test and publish your model.