Evaluating assistant conversations

Australia Conversational Interfaces

Release version: Australia

Updated Mar 12, 2026

The Evaluations tab in Assistant Designer helps you assess the quality and performance of your conversational assistant interactions. You can run evaluations on existing conversations or queries to measure response quality, efficiency, and correctness.

Overview of the Evaluations tab

The Evaluations tab supports evaluation across multiple conversation types, including virtual agent topics, conversational catalog flows, AI agents, knowledge responses, and small talk interactions. The feature analyzes conversations to help you identify areas for improvement in your assistant configuration.

When you run an evaluation, the conversation transcripts are examined and selected metrics are applied to generate scores and insights. You can evaluate conversations that have already occurred or use AutoChat to generate conversation logs from queries.

Figure: Evaluations tab.

Test one conversation at a time (manual test): Test conversations individually by chatting with your assistant. See how the assistant responds and review what it does. You can use this option to validate whether an assistant is configured correctly and is working as intended.

Evaluate many conversations at once (automated evaluation): Evaluate your assistant using auto-generated chats, explore multiple scenarios, and view detailed performance insights. This option provides a diagnostic view of performance across an entire ecosystem.

You can select an assistant and test it manually, or select Set up automated evaluation or Create evaluation to create an automated evaluation. For the setup steps, see Set up an automated evaluation.

Note:

Before running your evaluation, you need to upload a table that includes user requests, scenario context, and the correct ground truth responses. Each run supports approximately 500 conversations.
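
Before uploading, it can help to check the table locally. The following is a minimal sketch, not the product's import format: the column names user_request, scenario_context, and ground_truth_response and the helper functions are hypothetical, chosen only to mirror the three fields named in the note. The batch size of 500 follows the approximate per-run limit stated there.

```python
import csv
import io

# Hypothetical column names; the actual upload schema may differ.
REQUIRED_COLUMNS = ("user_request", "scenario_context", "ground_truth_response")
MAX_ROWS_PER_RUN = 500  # approximate per-run limit from the note


def validate_rows(rows):
    """Return (row_number, problem) pairs for rows with empty required fields."""
    problems = []
    for number, row in enumerate(rows, start=1):
        for column in REQUIRED_COLUMNS:
            if not (row.get(column) or "").strip():
                problems.append((number, f"missing {column}"))
    return problems


def split_into_runs(rows, size=MAX_ROWS_PER_RUN):
    """Split rows into consecutive batches of at most `size` rows."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def to_csv(rows):
    """Serialize rows to CSV text with the required columns as the header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=REQUIRED_COLUMNS,
        extrasaction="ignore",  # silently drop any columns not listed above
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample = [
    {
        "user_request": "Reset my password",
        "scenario_context": "Employee locked out of laptop",
        "ground_truth_response": "Guide the user through the password reset flow.",
    },
    {
        "user_request": "Order a monitor",
        "scenario_context": "",  # empty on purpose to trigger a validation problem
        "ground_truth_response": "Open the hardware catalog request.",
    },
]

if __name__ == "__main__":
    print(validate_rows(sample))
    print(to_csv(sample))
```

Splitting a larger table with split_into_runs yields one batch per evaluation run, so a table of 1,200 rows would become three uploads under the assumed limit.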