Exploring Service Observability

Release version: Xanadu

Updated January 30, 2025

Service Observability helps operations teams triage and manage incidents in a complex and distributed production system. It combines telemetry from external application performance monitoring (APM) systems with related data in the Configuration Management Database (CMDB). It displays both in a single workflow in the Service Operations Workspace (SOW).

Service Observability overview

Service Observability displays health metrics for a given service in the SOW. Metrics can be ingested from an external APM system and displayed alongside information for related configuration items (CIs) in the CMDB.

Note:

Service Observability supports the following APM vendors:

Datadog
Dynatrace
New Relic

Service Observability supports the following databases:

MySQL
PostgreSQL

After you have connected an APM instance to Service Observability, you map services in the CMDB to the APM metrics using existing tags on the APM data.

With this data mapping, Service Observability can display metrics in one place from APM entities associated with your service, like a host or database, along with information from related CIs. Operators use the APM metrics, related CI information, and helpful contextual information, like current incidents and alerts related to the service, to understand the health of the service.

For example, say you use Dynatrace to monitor your checkout service, and metrics from your database and host use the tag checkout-service to denote requests coming from that service. When you map the checkout service CI to the APM data tagged with checkout-service, Service Observability retrieves metrics for those databases and hosts, along with the CIs related to the service, and displays them together. Operators can pinpoint issues on entities related to the service and narrow down the mitigation process without leaving the SOW.
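The tag-based mapping in this example can be pictured with a small Python sketch. It is illustrative only: the `MetricSeries` record, the `metrics_for_service` function, and the sample entities and values are assumptions made for this sketch, not part of Service Observability or any APM vendor's API.

```python
from dataclasses import dataclass, field


@dataclass
class MetricSeries:
    """Hypothetical APM metric record: the emitting entity, a metric, and its tags."""
    entity: str
    entity_type: str
    metric: str
    value: float
    tags: set = field(default_factory=set)


def metrics_for_service(mapping_tag, series):
    """Group metrics whose tags include mapping_tag by the entity that emitted them."""
    grouped = {}
    for s in series:
        if mapping_tag in s.tags:
            grouped.setdefault(s.entity, []).append(s)
    return grouped


# Made-up sample data standing in for what an APM tool might report.
apm_data = [
    MetricSeries("db-01", "database", "query_latency_ms", 42.0, {"checkout-service"}),
    MetricSeries("host-07", "host", "cpu_percent", 91.5, {"checkout-service", "prod"}),
    MetricSeries("host-12", "host", "cpu_percent", 12.0, {"search-service"}),
]

# The checkout service CI is assumed to be mapped to the tag "checkout-service".
checkout_metrics = metrics_for_service("checkout-service", apm_data)
for entity, items in checkout_metrics.items():
    for s in items:
        print(f"{entity} ({s.entity_type}): {s.metric} = {s.value}")
```

In this sketch, only the database and host tagged with checkout-service are returned, which mirrors how a single tag ties several APM entities to one service CI.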

Service Observability users

Table 1. Users
User	Description
System admin	Version 1.5 only. System admins configure users and teams, register services to be monitored, connect Service Observability to APMs, and then map those services to that data. They can also view the data in the SOW.
Service Observability admin	Version 1.6.x and later. Service Observability admins can configure users and teams, connect Service Observability to APMs, and then map services to that data. They can also view the data in the SOW. Admins can also customize dashboard templates used to display metrics and related information.
Operator/operations manager	Operators use Service Observability when triaging incidents in the SOW. They can view basic health metrics for a service, along with related incidents, alerts, and changes. They can get more detailed information by navigating to the Observability tab to view additional service metrics, along with metrics from related entities, such as a host or database.

Service Observability workflow

Admins configure Service Observability by registering services, connecting APM metrics, and then mapping the services to that data. Operators use Service Observability to determine if another related entity is causing issues surfaced by the service's performance.

As an admin, you:

Determine the services to be monitored by Service Observability based on business criticality.
Connect existing APM instances to Service Observability.
Map services to APM metric data based on the tags used on that data.
Customize the templates used to display metric charts.

As an operator or manager, you:

Spot an issue with a service while working in the SOW, for example, from an alert, the Service dashboard, or Express List, then navigate to the Service Details page.
View overall health metrics for the service, along with related incidents, alerts, and changes. If one of the metrics seems unhealthy, navigate to the Observability tab.
View more detailed service metrics, as well as information from related entities, to start root cause investigation. If you find that the issue is further down the system's stack, identify the owner of that entity to start remediation.
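The drill-down in these operator steps can be sketched in Python as a simple filter over the entities related to a service. This is a minimal illustration under stated assumptions: the `EntityHealth` record, the per-metric thresholds, the owner names, and the "worst first" ordering are all hypothetical and do not describe how Service Observability evaluates health.

```python
from dataclasses import dataclass


@dataclass
class EntityHealth:
    """Hypothetical snapshot of one entity related to a service."""
    name: str
    owner: str        # owning team; made up for this sketch
    metric: str
    value: float
    threshold: float  # assumed positive; values above it count as unhealthy


def unhealthy_entities(entities):
    """Return entities whose metric exceeds its threshold, largest overshoot first."""
    breaching = [e for e in entities if e.value > e.threshold]
    return sorted(breaching, key=lambda e: e.value / e.threshold, reverse=True)


# Made-up readings for a service and two related entities.
related = [
    EntityHealth("checkout-service", "payments-team", "error_rate_percent", 6.0, 2.0),
    EntityHealth("db-01", "database-team", "query_latency_ms", 900.0, 200.0),
    EntityHealth("host-07", "infra-team", "cpu_percent", 55.0, 85.0),
]

for e in unhealthy_entities(related):
    print(f"{e.name}: {e.metric}={e.value} (limit {e.threshold}), owner: {e.owner}")
```

Here the database overshoots its limit by more than the service itself does, so it is listed first along with its owning team, which is the kind of signal an operator would follow when the issue sits further down the stack.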

Service Observability benefits


Benefit	Feature	Users
Centralize critical signals and bridge workflows to increase agility and reliability: connect data from external APMs, map that data to CMDB services, and view combined data in the SOW.	Connect an observability data source; Create and manage observability data mappings	Admins
Increase efficiency and reduce mean time to resolution (MTTR) by viewing combined metrics from entities associated with a service. You can begin to determine blast radius and ownership of an incident.	View service health metrics	Operators
See related changes to the system and alerts associated with a service in one place.	View overall service health	Operators
Customize dashboard templates.	Customize Service Observability dashboard templates	Admins

What to explore next

To learn more about configuring and using Service Observability, see: