Working with SRM services

  • Release version: Washingtondc
  • Updated February 1, 2024
  • 2 minutes to read

    A service represents a functional outcome, such as networking, payments, or HR services, that is owned by a team. To deliver that outcome, a service can contain one or more technical components, such as a user authentication service, or a piece of shared infrastructure, such as a database.

    SRM works with integrations to prioritize alerts, route them to the right responders, and escalate until an alert is acknowledged and you know someone is responding. When you create or add a service in SRM, it must reflect a service in your SRM infrastructure.
    Note:

    You might want multiple tool integrations to monitor each technical service and receive events from those tools. Add an integration to SRM using the Services module. See Working with SRM integrations.

    In addition, you can create reliability metrics for the service. See Working with Reliability metrics.

    Tying a team and policies to that service makes it easier to divide responsibilities and track technical outcomes. It also makes it easier to automate response routines and focus on who you notify and when.

    The state of an existing service is inherited. The state of a service created in SRM is None.

    Services

    Figure 1. Services landing page
    The services cards display metrics for:
    • Your Services: Count of all the services you or your team manages and monitors for reliability.
    • Services with active incidents: Services with one or more open incidents, sorted first by business criticality, most critical at the top; then sorted by number of active incidents, highest number at the top; and finally sorted by % of error budget remaining, lowest at the top.
    • Services with critical alerts: Services with open alerts, sorted first by business criticality, most critical at the top; then sorted by number of alerts, highest number at the top; and finally sorted by % of error budget remaining, lowest at the top.
    • Services with open changes: Services with one or more open changes.
    • Services with low error budget: Services with less than 25% of their error budget remaining.

      The error budget metric is represented as the amount of SLO that you can spend over a specified time. It can be used to manage release velocity.
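
The card definitions above describe a filter followed by a multi-key sort. The following Python snippet is a minimal sketch of that ordering, not SRM's actual implementation: the `Service` class, its field names, and the sample data are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Service:
    # Hypothetical record; field names are assumptions, not SRM's schema.
    name: str
    business_criticality: int      # 1 = most critical ... 4 = not critical
    open_incidents: int
    open_alerts: int
    error_budget_remaining: float  # percentage, 0-100


def services_with_active_incidents(services):
    """Keep services with open incidents, ordered as the card describes."""
    active = [s for s in services if s.open_incidents > 0]
    return sorted(
        active,
        key=lambda s: (
            s.business_criticality,    # most critical (lowest number) first
            -s.open_incidents,         # then highest incident count first
            s.error_budget_remaining,  # then lowest remaining budget first
        ),
    )


def services_with_low_error_budget(services, threshold=25.0):
    """Keep services whose remaining error budget is below the threshold."""
    return [s for s in services if s.error_budget_remaining < threshold]


# Illustrative sample data only.
services = [
    Service("payments", 1, 2, 5, 40.0),
    Service("networking", 1, 2, 3, 10.0),
    Service("hr-portal", 3, 4, 0, 80.0),
    Service("auth", 1, 3, 1, 60.0),
    Service("wiki", 4, 0, 0, 95.0),
]

incident_order = [s.name for s in services_with_active_incidents(services)]
low_budget = [s.name for s in services_with_low_error_budget(services)]
```

With this sample data, `auth` comes first because it has the most incidents among the most critical services, and `networking` precedes `payments` because both have two incidents but `networking` has less error budget remaining. The critical alerts card could be sketched the same way by substituting `open_alerts` for `open_incidents`.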

    Note:
    To refresh the card values, as well as the lists they represent, use the browser Refresh button.
    The list view varies depending on the Services card selected.
    Note:
    Your Services is the default selection for the landing page.

    Each column in the list can be grouped or filtered.

    Each list can be edited, sorted, or exported.

    For more detailed information on individual services, see View an SRM service.

    Services list view metric definitions

    Service level objectives section
    • Service: Name of the service.
    • Class: Application or Technical service.
    • Business criticality: How important this service is to the business.
      Choices are:
      • 1 - most critical (default)
      • 2 - somewhat critical
      • 3 - less critical
      • 4 - not critical
    • Open alerts: Number of open alerts assigned to the service.
    • Open incidents: Number of open incidents assigned to the service.
    • Error budget remaining: Percentage of error budget remaining for the service.
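
As a worked illustration of the Error budget remaining percentage, the sketch below assumes the common site reliability convention that the error budget is the share of a time window not covered by the SLO target (for example, a 99.9% target leaves 0.1% of the window). This page does not confirm that SRM computes the value this way, and the function and parameter names are hypothetical.

```python
def error_budget_minutes(slo_target_percent, window_days):
    """Total allowed downtime, in minutes, for the window (assumed definition)."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (100.0 - slo_target_percent) / 100.0


def error_budget_remaining_percent(slo_target_percent, window_days, downtime_minutes):
    """Percentage of the error budget still unspent, floored at 0."""
    budget = error_budget_minutes(slo_target_percent, window_days)
    if budget <= 0:
        # A 100% target leaves no budget to spend.
        return 0.0
    remaining = (budget - downtime_minutes) / budget * 100.0
    return max(0.0, remaining)


# A 99.9% SLO over 30 days allows roughly 43.2 minutes of downtime.
budget = error_budget_minutes(99.9, 30)

# After 32.4 minutes of downtime, roughly a quarter of the budget is left.
remaining = error_budget_remaining_percent(99.9, 30, 32.4)
```

Under these assumptions, a service that has used about three quarters of its budget sits at the 25% boundary used by the Services with low error budget card.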