Working with reliability metrics

  • Release version: Yokohama
  • Updated January 30, 2025

    Learn about the reliability metrics and features that can help you track service health, respond to issues, and support business goals.

    Service reliability dashboard

    The Service reliability dashboard displays a customizable, high-level view of service performance. It helps you monitor and manage reliability using visualizations that track service states, error budgets, and service level objectives (SLOs) over time.

    The dashboard displays information about all services in Service Reliability Management (SRM). You can access the dashboard in Service Operations Workspace in the following ways:

    • Navigate to Services (Services icon) > Service reliability.
    • Navigate to Home (Home icon) > Service reliability.
    For more details, see Visualizations in the Service reliability dashboard.
    Note:
    You can also view SLO information for all services on the Services Overview tab. See Working with SRM services for more information.

    Notification destinations

    Notification destinations help keep teams informed about service reliability. Attach them to error budget policies to send notifications when a policy is breached.

    To view and manage notification destinations in Service Operations Workspace, navigate to Teams > [Your team] > SLO Notification destinations.

    Reliability metrics tab

    The Reliability metrics tab shows how well a specific service is meeting its reliability goals. Use it to track SLOs, service level indicators (SLIs), and error budgets for a service.

    To view the Reliability metrics tab in Service Operations Workspace, navigate to Services (Services icon) > [Your service] > Reliability Metrics.

    Figure 1. SRM Reliability metrics tab
    The Reliability metrics tab shows a list of SLOs for the User Authentication service.

    Service level objectives table

    On the Reliability metrics tab, the Service level objectives table includes the following details about the selected service:

    • Service level objective: Name of the SLO. An SLO is the target value that your team must reach to meet your service level agreement (SLA).
    • SLI type: Performance category being measured:
      • Availability: Percentage of time your service or configuration item is available, also known as uptime.
      • Errors: Frequency of errors in your service.
      • Latency: Time that it takes to service a request.
      • Saturation: Fullness of your system, focusing on resource usage.
    • Compliance period: Time window used to calculate performance:
      • Month: Current calendar month. For example, if the current date is January 26, the compliance period is January 1 through January 31.
      • Rolling 7, 30, or 90 days: Number of days counted back from the current date. For example, a rolling 7-day period covers the 7 days before the current date.
    • State: Status of the SLO, such as draft, running, or retired.
    • Objective (percentage): Target percentage of SLI performance.
    • Limit occurrences: Number of limit breaches that have occurred. Used by count-based SLOs only.
    • Service level indicator: SLI associated with the SLO.
    • Error budget: Allowable failure time for the compliance period, calculated using the compliance period and objective (percentage).
    • Remaining error budget: Error budget still available.
    • Remaining breach occurrences: Number of breaches still available before the limit is reached.
    Note:
    For performance purposes, SLO and SLI records ([sn_sow_srm_slo_history] and [sn_sow_srm_sli_metric]) are archived after one year and deleted five years later. Archived data is omitted from tables and visualizations.
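
    The compliance period and error budget fields described in the table can be illustrated with a short calculation. The following Python sketch is not ServiceNow code and does not call any SRM API. It assumes the conventional formula, error budget = compliance period length × (100 − objective) / 100, because this topic states only that the budget is calculated from the compliance period and the objective (percentage). The function names and the period labels ("month", "rolling_7", and so on) are hypothetical, and the rolling window is assumed to end on the current date.

```python
from calendar import monthrange
from datetime import date, timedelta

# Hypothetical labels for the rolling compliance periods listed in the table.
ROLLING_DAYS = {"rolling_7": 7, "rolling_30": 30, "rolling_90": 90}


def compliance_window(today: date, period: str) -> tuple[date, date]:
    """Return the (start, end) dates of the compliance period for a given day."""
    if period == "month":
        # Calendar month containing `today`, first day through last day.
        last_day = monthrange(today.year, today.month)[1]
        return today.replace(day=1), today.replace(day=last_day)
    if period in ROLLING_DAYS:
        # Assumption: the window reaches N days back and ends on `today`.
        return today - timedelta(days=ROLLING_DAYS[period]), today
    raise ValueError(f"unknown compliance period: {period}")


def period_length(today: date, period: str) -> timedelta:
    """Return the duration of the compliance period."""
    if period == "month":
        return timedelta(days=monthrange(today.year, today.month)[1])
    if period in ROLLING_DAYS:
        return timedelta(days=ROLLING_DAYS[period])
    raise ValueError(f"unknown compliance period: {period}")


def error_budget(length: timedelta, objective_pct: float) -> timedelta:
    """Allowable failure time: the share of the period not covered by the objective."""
    if not 0 <= objective_pct <= 100:
        raise ValueError("objective must be a percentage between 0 and 100")
    return length * ((100 - objective_pct) / 100)


def remaining_error_budget(budget: timedelta, failure_time: timedelta) -> timedelta:
    """Error budget still available; a negative result means the budget is overspent."""
    return budget - failure_time


if __name__ == "__main__":
    today = date(2025, 1, 26)
    print(compliance_window(today, "month"))
    print(compliance_window(today, "rolling_7"))

    budget = error_budget(period_length(today, "rolling_30"), 99.9)
    print(budget)
    print(remaining_error_budget(budget, timedelta(minutes=10)))
```

    Under these assumptions, a 99.9% objective over a rolling 30-day period leaves an error budget of about 43 minutes, and 10 minutes of failure time leaves about 33 minutes remaining. The values that SRM displays may differ if the product rounds or windows the data differently.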