Site Reliability Metrics basic terminology

  • Release version: Australia
  • Updated Mar 12, 2026
  • A quick guide to the basic terminology that you need to understand Site Reliability Metrics (SRM) and use its features.

    Table 1. Terms used in SRM
    Term | Description
    Application Performance Management (APM) | The monitoring and management of the performance and availability of applications. APM strives to detect and diagnose complex application performance problems and to maintain an expected level of service.
    Service Level Management (SLM) | A framework by which service levels are agreed to between a provider and a consumer to support business processes. Service Level Management includes Service Level Agreements (SLAs), Operating Level Agreements (OLAs), and Underpinning Contracts (UCs).
    Service Level Agreement (SLA) | An SLA defines the level of service agreed to between a provider and a consumer. It typically lays out the metrics by which service is measured, as well as remedies or penalties if the agreed-upon service levels are not achieved.
    Service Level Objective (SLO) | A target value or range of values for a service level that is measured by an SLI.
    Service Level Indicator (SLI) | A quantitative measure of some aspect of the level of service that is provided. These metrics are used to define SLO targets.
    Measured reliability | The ability to deliver the promised services in a consistent and accurate manner. Reliability is calculated automatically by subtracting the outage percentage from 100%.
    Error budget | The amount of unreliability that an SLO allows over a specified time, that is, the gap between 100% and the SLO target. It can be used to manage release velocity. It is typically based on indicators such as availability or latency.
    Error budget policy | A policy or rule that is created for a service to trigger actions, such as creating an incident or sending notifications, when a set threshold is crossed.
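
The relationship between measured reliability, error budget, and error budget policy can be made concrete with a little arithmetic. The following is a minimal sketch, assuming an availability-based SLO with outage time tracked in minutes. Every name in it (SloWindow, measured_reliability_pct, error_budget_minutes, budget_consumed_pct, policy_breached) is hypothetical and chosen for illustration; none of them is part of SRM or its API.

```python
from dataclasses import dataclass


@dataclass
class SloWindow:
    """Hypothetical container for one SLO evaluation window (not an SRM type)."""
    slo_target_pct: float   # for example, 99.9 means a 99.9% availability target
    window_minutes: float   # length of the evaluation window
    outage_minutes: float   # total outage time observed in the window


def measured_reliability_pct(w: SloWindow) -> float:
    """Reliability: 100% minus the share of the window spent in outage."""
    return 100.0 - (w.outage_minutes / w.window_minutes) * 100.0


def error_budget_minutes(w: SloWindow) -> float:
    """Allowed outage time: the part of the window that the SLO does not cover."""
    return w.window_minutes * (100.0 - w.slo_target_pct) / 100.0


def budget_consumed_pct(w: SloWindow) -> float:
    """Share of the error budget already spent, as a percentage."""
    budget = error_budget_minutes(w)
    if budget == 0:
        # A 100% target leaves no budget; any outage overspends it.
        return 0.0 if w.outage_minutes == 0 else float("inf")
    return w.outage_minutes / budget * 100.0


def policy_breached(w: SloWindow, threshold_pct: float) -> bool:
    """Assumed policy rule: act once budget consumption reaches the threshold."""
    return budget_consumed_pct(w) >= threshold_pct


# A 30-day window (43,200 minutes) with a 99.9% target and 21.6 minutes of outage.
window = SloWindow(slo_target_pct=99.9, window_minutes=43_200, outage_minutes=21.6)

print(f"{measured_reliability_pct(window):.2f}")    # 99.95
print(f"{error_budget_minutes(window):.2f}")        # 43.20
print(f"{budget_consumed_pct(window):.2f}")         # 50.00
print(policy_breached(window, threshold_pct=75.0))  # False
```

In this example, half of the 43.2-minute budget is spent, so a policy with a 75% threshold would not yet trigger an incident or notification. A latency-based SLO would follow the same pattern with a different indicator in place of outage minutes.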