Site Reliability Metrics basic terminology
A quick guide to the basic terminology you need to understand Site Reliability Metrics (SRM) and use its features.
| Terms | Descriptions |
|---|---|
| Application Performance Management (APM) | The monitoring and management of performance and availability of applications. APM strives to detect and diagnose complex application performance problems and maintain an expected level of service. |
| Service Level Management (SLM) | A framework through which a provider and a consumer agree on service levels that support business processes. Service Level Management includes Service Level Agreements (SLAs), Operating Level Agreements (OLAs), and Underpinning Contracts (UCs). |
| Service Level Agreement (SLA) | An SLA defines the level of service agreed to between a provider and a consumer. It typically specifies the metrics by which service is measured, as well as the remedies or penalties that apply if the agreed-upon service levels are not achieved. |
| Service Level Objective (SLO) | A target value or range of values for a service level that is measured by an SLI. |
| Service Level Indicator (SLI) | A quantitative measure of some aspect of the level of service that is provided, such as request latency or error rate. SLIs are the metrics against which SLO targets are defined. |
| Measured reliability | The ability to deliver the promised services consistently and accurately. Measured reliability is calculated automatically by subtracting the outage percentage from 100%. |
| Error budget | The amount of unreliability that an SLO allows over a specified time window. For example, a 99.9% SLO leaves an error budget of 0.1%. Error budgets can be used to manage release velocity and are typically based on SLIs such as availability or latency. |
| Error budget policy | A policy or rule created for a service that triggers actions, such as creating an incident or sending notifications, when the error budget crosses a set threshold. |
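As a worked example of the measured reliability and error budget terms, assuming a time-based SLO of 99.9% over a hypothetical 30-day window and the common SRE definition of error budget as 100% minus the SLO target:

$$
\begin{aligned}
\text{Measured reliability} &= 100\% - \text{outage}\% \\
\text{Error budget} &= (100\% - \text{SLO target}) \times \text{window length} \\
&= (100\% - 99.9\%) \times (30 \times 24 \times 60\ \text{min}) \\
&= 0.001 \times 43\,200\ \text{min} = 43.2\ \text{min}
\end{aligned}
$$

If outages in that window total 10 minutes, the outage percentage is about 0.023%, so measured reliability is roughly 99.977% and 33.2 minutes of error budget remain.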
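The following Python sketch shows how the SLI, SLO, error budget, and error budget policy terms relate for a request-based SLI (good events divided by total events). It is not the SRM API: `evaluate_slo`, `apply_error_budget_policy`, the action names, and the 25% threshold are hypothetical, and the sketch assumes the common SRE definition of error budget as (1 - SLO target) multiplied by the total events in the window.

```python
from dataclasses import dataclass


@dataclass
class SloStatus:
    sli: float                   # measured fraction of good events (0.0 to 1.0)
    slo_met: bool                # True if the SLI meets the SLO target
    budget_total: float          # bad events the SLO allows in this window
    budget_remaining: float      # allowed bad events not yet spent (negative if overspent)
    budget_remaining_pct: float  # remaining budget as a percentage of the total budget


def evaluate_slo(good_events: int, total_events: int, slo_target: float) -> SloStatus:
    """Hypothetical helper: compute a request-based SLI and the error budget
    left under an SLO target given as a fraction, e.g. 0.999 for 99.9%."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    sli = good_events / total_events
    bad_events = total_events - good_events
    # Assumed definition: error budget = (1 - SLO target) x total events.
    budget_total = (1.0 - slo_target) * total_events
    budget_remaining = budget_total - bad_events
    budget_remaining_pct = 100.0 * budget_remaining / budget_total
    return SloStatus(sli, sli >= slo_target, budget_total,
                     budget_remaining, budget_remaining_pct)


def apply_error_budget_policy(status: SloStatus, threshold_pct: float) -> list[str]:
    """Hypothetical policy: notify when the remaining budget drops to or below
    threshold_pct percent, and also open an incident once it is exhausted."""
    actions = []
    if status.budget_remaining_pct <= threshold_pct:
        actions.append("send_notification")
    if status.budget_remaining_pct <= 0:
        actions.append("create_incident")
    return actions


status = evaluate_slo(good_events=99_920, total_events=100_000, slo_target=0.999)
print(status.slo_met)                            # True
print(apply_error_budget_policy(status, 25.0))   # ['send_notification']
```

Here 80 of 100,000 requests failed against a budget of about 100, so the SLO is still met, but only about 20% of the budget remains, which crosses the hypothetical 25% threshold and triggers a notification without an incident.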