Site Reliability Metrics

  • Release version: Australia
  • Updated March 12, 2026
  • 1 minute read
    Site Reliability Metrics (SRM) is an application that extends Site Reliability Operations (SRO). It serves as a signal aggregation point for Application Performance Management (APM) alerts.

    Starting with the Washington DC release, Site Reliability Metrics is being prepared for future deprecation. It will be hidden and no longer installed on new instances but will continue to be supported. For details, see the Deprecation Process [KB0867184] article in the Now Support Knowledge Base.

    SRM enables Site Reliability Engineers (SREs) to capture signals from multiple sources, set Service Level Objective (SLO) targets, view Error Budgets (EB), and invoke policy-based actions, such as creating an incident or sending a notification, based on Error Budget thresholds. SREs can measure the service experience and manage release velocity by evaluating key Service Level Indicators (SLIs) sourced from one or more performance management tools. Evaluating and aggregating these signals enables SREs to trigger policy-based actions and respond quickly to changing conditions.

    Site reliability engineers and service owners can use SRM to verify that the service they provide meets consumer expectations. They can measure quality by setting service level objectives based on SLI types (for example, latency, throughput, or availability) and then use error budget policies to trigger one or more policy-based actions.

    The key features of the SRM application are:
    • SLI signal aggregation
    • Duration-based and count-based service level objectives
    • Error budget (EB) calculation
    • Error budget policies
    • Error budget visualization
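
    To make the error budget idea concrete, the following is a minimal sketch of the arithmetic commonly used in SRE practice for duration-based and count-based objectives. It is not SRM's documented formula; the function names `error_budget` and `budget_remaining` and all sample values are hypothetical and used only for illustration.

```python
def error_budget(slo_target: float, total_units: float) -> float:
    """Allowed "bad" units for an SLO target over a window.

    total_units is minutes for a duration-based SLO, or events
    (for example, requests) for a count-based SLO.
    Assumption: the generic SRE definition, budget = (1 - target) * total.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_units <= 0:
        raise ValueError("total_units must be positive")
    return (1.0 - slo_target) * total_units


def budget_remaining(slo_target: float, total_units: float,
                     breached_units: float) -> float:
    """Fraction of the error budget still unspent (negative once overspent)."""
    budget = error_budget(slo_target, total_units)
    return (budget - breached_units) / budget


if __name__ == "__main__":
    # Duration-based: 99.9% availability over a 30-day window
    # (43,200 minutes), with 10 minutes of breach so far.
    print(budget_remaining(0.999, 30 * 24 * 60, 10))
    # Count-based: 99% of 1,000,000 requests must succeed; 2,500 have failed.
    print(budget_remaining(0.99, 1_000_000, 2_500))
```

    Under these assumptions, the duration-based example has a budget of about 43.2 minutes, and the count-based example has a budget of 10,000 failed requests, of which roughly 75% remains.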

    High-level workflow

    1. SRM leverages SRO integrations for signal aggregation.
    2. Reliability indicators containing SLIs and SLOs are created for the service in SRM.
    3. When a qualified alert is generated for a service in the APM tool, the cumulative breach and the error budget values are updated for the reliability indicators in SRM.
    4. An error budget policy is created for the service to trigger actions, such as creating an incident or sending notifications, that help remediate service issues.
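
    The workflow above can be sketched as a small model: a qualified alert adds to the cumulative breach of a reliability indicator, the remaining error budget is recomputed, and policy rules whose thresholds have been crossed return actions. This is an illustrative sketch only; the class names `ReliabilityIndicator` and `PolicyRule`, the action labels, and the threshold semantics are assumptions, not the SRM data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyRule:
    # Hypothetical rule: fire when the remaining budget fraction
    # drops to or below `threshold`.
    threshold: float
    action: str  # illustrative label, e.g. "create_incident"


@dataclass
class ReliabilityIndicator:
    service: str
    slo_target: float              # e.g. 0.99 for a 99% objective
    window_minutes: float          # length of the SLO window
    cumulative_breach_minutes: float = 0.0
    rules: list[PolicyRule] = field(default_factory=list)
    fired: set[str] = field(default_factory=set)

    def remaining_fraction(self) -> float:
        """Share of the error budget left (negative once overspent)."""
        budget = (1.0 - self.slo_target) * self.window_minutes
        return (budget - self.cumulative_breach_minutes) / budget

    def record_alert(self, breach_minutes: float) -> list[str]:
        """Add an alert's breach time and return newly triggered actions."""
        self.cumulative_breach_minutes += breach_minutes
        remaining = self.remaining_fraction()
        triggered = []
        for rule in self.rules:
            # Each action fires at most once per indicator in this sketch.
            if remaining <= rule.threshold and rule.action not in self.fired:
                self.fired.add(rule.action)
                triggered.append(rule.action)
        return triggered


if __name__ == "__main__":
    indicator = ReliabilityIndicator(
        service="checkout",        # hypothetical service name
        slo_target=0.99,
        window_minutes=10_000,     # budget of roughly 100 minutes
        rules=[
            PolicyRule(threshold=0.5, action="send_notification"),
            PolicyRule(threshold=0.0, action="create_incident"),
        ],
    )
    for minutes in (30, 30, 50):
        print(minutes, indicator.record_alert(minutes))
```

    In this sketch, the first alert leaves about 70% of the budget and triggers nothing, the second drops the remainder below 50% and returns the notification action, and the third overspends the budget and returns the incident action.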
