Enabling evaluations
Summary of Enabling evaluations
This guide explains how ServiceNow administrators can enable and configure continuous evaluation of virtual agent conversations using the Conversation Evaluator feature in the Yokohama release. Enabling evaluations helps assess virtual agent performance across critical dimensions to improve task success and user trust. It complements the Conversation Insights application, which focuses on customer satisfaction and effort metrics.
Enabling Evaluations
- Activate Skills: Navigate to Admin > Now Assist Admin > Now Assist Skills > Platform and enable key evaluation skills including Intent Accuracy, Slot Filling, Smoothness, Context Retention, Truthfulness, Conciseness, and Chat Topic Classifier.
- Set Evaluation Limits: Configure the system property sn_na_conv_eval.maxEvaluateCount to define the maximum number of conversations evaluated daily. Set it in All > System Properties > All Properties.
- Activate Scheduled Jobs: Enable all scheduled jobs related to Conversation Evaluator except the one that runs only once after installation. These are found under All > System Definition > Scheduled Jobs filtered by Application = Conversation Evaluator.
- Activate Evaluation Flows: By default, the Execute Evaluation flow is deactivated. You can activate it in Flow Designer for real-time evaluation upon chat completion, or use the scheduled nightly job for batch evaluation to avoid conflicts with other LLM calls.
- Batch Evaluations: To evaluate historical data, activate the Execute Batch Evaluation flow. This supports importing and processing past conversations.
- Domain Separation: Note that domain separation is not supported for evaluation features.
Evaluation Dashboard vs. Conversation Insights
The Evaluation dashboard provides detailed diagnostic metrics focused on virtual agent system performance and task success, such as:
- Intent Recognition accuracy
- Slot Filling precision
- Conciseness and Coherence of responses
- Truthfulness and Context Retention
- Deadlock avoidance in dialog flows
- User satisfaction inferred from critical failures
The Conversation Insights application measures end-user perceptions and overall conversation quality, including:
- Inferred customer satisfaction (CSAT)
- User effort level
- Issue resolution status
- Signs of frustration or confusion
- Transfers and escalations
- Empathy and clarity of next steps
Together, these tools provide complementary perspectives: the Evaluation dashboard offers granular, task-focused diagnostics for improving virtual agent design, while Conversation Insights delivers a broad, cost-free view of user experience. Both sets of metrics are consolidated in AI Agent Analytics and AI Control Tower dashboards for comprehensive monitoring of virtual agent health and user satisfaction.
Evaluate random conversations by enabling continuous monitoring.
Role required: admin
Enable evaluations and set the number of evaluations to be performed daily.
Enable evaluations
- Activate Skills:
  - Navigate to Admin > Now Assist Admin > Now Assist Skills > Platform.
  - Turn on the following skills:
    - Intent Accuracy Chat Eval
    - Inadequate Slot Filling Chat Eval
    - Smoothness (Deadlock avoidance)
    - Context Retention
    - Coherence Chat Evaluation
    - Truthfulness Hallucination Chat Eval
    - Conciseness Chat Eval
    - Chat topic classifier

  Note: You can get the filtered list using the filter condition Conversation Evaluator under Features.
- Set the value for the system property sn_na_conv_eval.maxEvaluateCount.
  - Navigate to All > System Properties > All Properties.
  - Search for and select the property sn_na_conv_eval.maxEvaluateCount.
  - Update the Value field to set the maximum number of conversations to be evaluated daily.
  - Select Save.
- Activate the associated scheduled jobs:
  - Navigate to All > System Definition > Scheduled Jobs.
  - Apply the filter condition Application is Conversation Evaluator, and exclude the job Evaluation Value Calcuation - Runs Only once after install.
  - Activate all the scheduled jobs in the filtered list.
- Activate the Execute Evaluation flow in Flow Designer.

  Note: By default, the Execute Evaluation flow is deactivated. You can use the nightly scheduled job, Execute Evaluations, to evaluate the chats instead. The nightly job doesn't compete with LLM calls from other services, whereas real-time evaluation through the Execute Evaluation flow might conflict with LLM calls from other applications. If you want to evaluate chats in real time when a chat completes, activate the Execute Evaluation flow. Domain separation is not supported.
  - Open Flow Designer and select Flows.
  - Select the Execute Evaluation flow.
  - Select Edit flow.
  - Select Activate.
- If you want to configure some of the evaluation parameters based on your requirements, see Configuring evaluations.
- If you want to import historical data to be evaluated, you must run batch evaluations by activating the Execute Batch Evaluation flow. For more information on the batch evaluation workflow, see Evaluation flow for batch evaluations.
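If you script instance configuration rather than using the UI, the daily limit from the steps above could in principle be set through the ServiceNow REST Table API against the sys_properties table. The following Python sketch only builds the two requests (look up the property record, then update its value) and does not send them. The instance URL, the sys_id, and the helper names are hypothetical placeholders, authentication is omitted, and whether your instance allows REST writes to sys_properties depends on its access controls, so treat this as an illustration and not a supported procedure.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Property name taken from the procedure above.
PROPERTY_NAME = "sn_na_conv_eval.maxEvaluateCount"


def build_lookup_request(instance_url: str) -> Request:
    """Build a GET request that finds the property record by name.

    instance_url is a placeholder such as https://example.service-now.com.
    """
    query = urlencode({
        "sysparm_query": f"name={PROPERTY_NAME}",
        "sysparm_fields": "sys_id,name,value",
        "sysparm_limit": "1",
    })
    url = f"{instance_url.rstrip('/')}/api/now/table/sys_properties?{query}"
    return Request(url, headers={"Accept": "application/json"}, method="GET")


def build_update_request(instance_url: str, sys_id: str, max_count: int) -> Request:
    """Build a PATCH request that sets the daily evaluation limit.

    sys_id is the record identifier returned by the lookup request.
    """
    if max_count < 0:
        raise ValueError("max_count must be non-negative")
    # System property values are stored as strings.
    body = json.dumps({"value": str(max_count)}).encode("utf-8")
    url = f"{instance_url.rstrip('/')}/api/now/table/sys_properties/{sys_id}"
    headers = {"Accept": "application/json", "Content-Type": "application/json"}
    return Request(url, data=body, headers=headers, method="PATCH")
```

To send either request, you would add credentials for an account with the admin role and pass the request to `urllib.request.urlopen`. Keeping request construction separate from sending makes it possible to review the exact URL and payload first.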
Evaluation dashboard vs. Conversation Insights
You can use the Evaluation dashboard and the Conversation Insights (CI) application together to gain a complete picture of virtual agent effectiveness, from system performance to end-user satisfaction.
For more information about Conversation Insights, see Conversation Insights.
| Metrics captured by the Evaluation dashboard | Metrics captured by Conversation Insights |
|---|---|
| The Evaluation dashboard provides granular diagnostic explanations that help improve virtual agent design, dialog flows, and model accuracy. It evaluates performance along dimensions critical to task success and trustworthiness. For example, "Is the system working properly and performing the expected task?" | Conversation Insights focuses on measuring customer satisfaction and effort. It uses inferred customer satisfaction (CSAT) and supporting signals to show how end users perceive their interaction with the virtual agent. For example, "Is the end user happy with the virtual agent's performance?" |
- Conversation Insights offers a lightweight, cost-free view of the customer experience across all conversations.
- The Evaluation dashboard delivers granular, task-focused diagnostics that enable targeted improvements to virtual agent design and performance.
- Consolidated in AI Agent Analytics and AI Control Tower dashboards, these metrics give users complementary views into virtual agent system health and end-user satisfaction.