Automated alert grouping

  • Release version: Yokohama
  • Updated January 30, 2025

    Automated alert grouping is a process that uses historical data to automatically organize similar alerts into groups. These alerts could be system issues, like server errors or network outages. By grouping related alerts together, it helps teams quickly identify patterns, manage recurring problems, and reduce the noise from too many individual alerts.

    Imagine you’re monitoring a city’s traffic system. You get a lot of alerts—like reports of accidents, traffic jams, and road closures. Automated alert grouping works like a smart assistant that organizes these alerts based on patterns, so you can see related issues together and respond more efficiently. These automated alert groups are displayed in the Express List within the Service Operations Workspace.

    How do you enable this grouping

    To enable machine learning-based automation for alert correlation, set the property Enable ML based Automation correlation (sa_analytics.specific_patterns_enabled) to true.
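
    As a rough illustration of changing this property from a script, the sketch below builds (but does not send) a REST request against the sys_properties table. It assumes the instance exposes the standard Table API and that the caller has rights to update system properties. The instance URL and the sys_id are hypothetical placeholders, and authentication is left out.

```python
import json
import urllib.request

# Property name taken from the text above.
PROPERTY_NAME = "sa_analytics.specific_patterns_enabled"


def build_property_update(instance_url: str, property_sys_id: str) -> urllib.request.Request:
    """Build a PATCH request that sets the property record's value to "true".

    instance_url and property_sys_id are supplied by the caller; the
    sys_id is that of the sys_properties record whose name is PROPERTY_NAME.
    Nothing is sent over the network here.
    """
    url = f"{instance_url.rstrip('/')}/api/now/table/sys_properties/{property_sys_id}"
    body = json.dumps({"value": "true"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )


# Hypothetical instance URL and sys_id, for illustration only.
request = build_property_update(
    "https://example.service-now.com",
    "0123456789abcdef0123456789abcdef",
)
```

    Sending the request would still need credentials and error handling, which depend on how the instance is configured.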

    If the Domain Support - Domain Extensions Installer is activated, alert aggregation patterns are created per domain, at the domain level defined in the sa_analytics.agg.learner_domain_level property. The default value is 2, which corresponds to the second level in the domain hierarchy. For example, if Level 1 represents the company itself, Level 2 could represent the departments or teams within the company, so alerts are grouped by department or team. For more details, see Domain separation and Event Management.
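
    To make the idea of a domain level concrete, here is a minimal sketch that maps each alert's domain to its ancestor at level 2 and buckets alerts accordingly. The slash-separated path format, the domain names, and the choice to leave shallower domains unchanged are all assumptions made for illustration; they do not describe how the platform stores domains internally.

```python
def domain_at_level(domain_path: str, level: int = 2, separator: str = "/") -> str:
    """Return the ancestor of domain_path at the given level (1 = top of the hierarchy).

    A domain that sits above the requested level is returned unchanged
    (an assumption of this sketch).
    """
    if level < 1:
        raise ValueError("level must be 1 or greater")
    parts = domain_path.split(separator)
    return separator.join(parts[:level])


# Hypothetical domains: company / department / team.
alert_domains = [
    "ACME/Retail/Web Team",
    "ACME/Retail/Payments Team",
    "ACME/Logistics",
    "ACME",
]

# Bucket alerts by their level-2 domain, mirroring the default level of two.
buckets = {}
for path in alert_domains:
    buckets.setdefault(domain_at_level(path), []).append(path)
```

    With the default level, both Retail teams fall into the same bucket, which is the "grouped by department" behavior described above.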

    How does it work

    Automated alert grouping uses machine learning (ML) and historical data to identify patterns among alerts. To determine whether alerts are related, it looks at specific characteristics called pattern identifiers, such as the type of issue, the affected system, or a CI or metric that appears multiple times within a similar time period. The Alert Aggregation Learner groups alerts using pattern-based algorithms and probabilistic methods that analyze incoming alerts and identify related ones.
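
    The product's actual algorithms are not described here, so the following is only a toy sketch of the general idea: count how often two pattern identifiers show up in the same time window in historical data, and keep the pairs that co-occur often enough. The window size, the thresholds, the scoring formula (co-occurrences divided by windows containing either identifier), and the identifier strings are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def learn_pair_patterns(history, window_seconds=600, min_support=2, min_score=0.5):
    """Learn identifier pairs that tend to occur together.

    history is a list of (timestamp_in_seconds, identifier) tuples.
    Returns {(identifier_a, identifier_b): score} for pairs that pass both thresholds.
    """
    # Collect the set of identifiers seen in each fixed-size time window.
    windows = {}
    for timestamp, identifier in history:
        windows.setdefault(timestamp // window_seconds, set()).add(identifier)

    single_counts = Counter()  # windows containing each identifier
    pair_counts = Counter()    # windows containing both identifiers of a pair
    for identifiers in windows.values():
        single_counts.update(identifiers)
        pair_counts.update(combinations(sorted(identifiers), 2))

    patterns = {}
    for (a, b), together in pair_counts.items():
        either = single_counts[a] + single_counts[b] - together
        score = together / either
        if together >= min_support and score >= min_score:
            patterns[(a, b)] = score
    return patterns


# Hypothetical history: a database CPU alert is repeatedly followed by app latency.
history = [
    (0, "db01:cpu_high"), (120, "app01:latency"),
    (3600, "db01:cpu_high"), (3700, "app01:latency"),
    (7200, "db01:cpu_high"), (7300, "app01:latency"), (7350, "web03:disk_full"),
    (10800, "web03:disk_full"),
]
patterns = learn_pair_patterns(history)
```

    In this sample only the database and application identifiers co-occur in every window where either appears, so that pair is the one retained as a pattern.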

    Think of it like noticing that accidents often happen at a particular intersection at rush hour. The system groups similar alerts (like recurring traffic jams at the same spot) together based on certain identifiers (like location or type of problem). The system follows these key steps to group alerts effectively:
    • Analyze Historical Data: The system studies past alerts to learn patterns and relationships.
    • Apply Machine Learning: ML algorithms identify patterns and relationships in the historical alert data, so the system learns from past incidents and gets better at grouping similar alerts over time.
    • Group Similar Alerts: Alerts with matching patterns are automatically grouped together.
    Returning to the city traffic example, suppose you receive these alerts:
    • 8:00 AM: Accident on Main Street
    • 8:05 AM: Traffic jam near Main Street
    • 8:10 AM: Road closure on Main Street
    Automated alert grouping analyzes these alerts and recognizes a pattern: all three relate to Main Street and likely stem from the same accident, so it groups them together. This helps you see the bigger picture quickly and focus on resolving the root cause (the accident), rather than addressing each alert separately.
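
    The Main Street example can be sketched as a simple rule: alerts that share an identifier (here, the location) and arrive within a short gap of each other go into the same group. This is a hand-written stand-in for illustration, not the learned grouping the product performs. The 15-minute gap, the alert fields, and the extra Oak Avenue alert are assumptions, as is the idea that "near Main Street" has already been normalized to the same location.

```python
from datetime import datetime, timedelta


def group_alerts(alerts, max_gap=timedelta(minutes=15)):
    """Group alerts that share a location and arrive within max_gap of the previous one.

    alerts is a list of dicts with "time" (datetime), "location", and "description".
    Returns a list of groups, each group being a list of alerts in time order.
    """
    groups = []
    latest_group_by_location = {}
    for alert in sorted(alerts, key=lambda a: a["time"]):
        group = latest_group_by_location.get(alert["location"])
        if group is not None and alert["time"] - group[-1]["time"] <= max_gap:
            group.append(alert)  # close enough in time: same group
        else:
            group = [alert]      # first alert here, or too long since the last one
            groups.append(group)
            latest_group_by_location[alert["location"]] = group
    return groups


def at(hour, minute):
    """Build a timestamp on an arbitrary example day."""
    return datetime(2025, 1, 30, hour, minute)


alerts = [
    {"time": at(8, 0), "location": "Main Street", "description": "Accident"},
    {"time": at(8, 5), "location": "Main Street", "description": "Traffic jam"},
    {"time": at(8, 7), "location": "Oak Avenue", "description": "Signal fault"},
    {"time": at(8, 10), "location": "Main Street", "description": "Road closure"},
]
groups = group_alerts(alerts)
```

    The three Main Street alerts end up in one group, while the unrelated Oak Avenue alert stays on its own.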

    Benefits

    • Find Recurring Issues: Quickly spot patterns (like a server consistently overheating).
    • Save Time: Handle groups of related alerts instead of individual ones.
    • Improve Response: Focus on fixing the root cause rather than dealing with scattered issues.