About events and alerts in ITOM AIOps
Learn the difference between events and alerts, and alert types and statuses.
Whenever an issue occurs on your network, such as a computer going down or a database failure, the event monitoring tools send events to your ServiceNow instance. The Event Management application then processes the events and generates alerts to indicate that an action must be taken to resolve the issue.
Events and alerts
An event is a change or occurrence in the normal operations of a service, API, system, network, process, or workflow. Events can be triggered by manual input, such as pressing a button, or they can be generated automatically. They can be pulled (observed) or pushed (logged) to a system for tracking. Use event data to track system changes over time and trigger logic to escalate an event to an alert if a specified performance threshold is breached. Event Management generates alerts based on event rules.
An alert is something that needs your attention but doesn't necessarily require an immediate response. Alerts are triggered by events that meet or exceed a defined condition or threshold, which indicates that they require immediate attention or action.
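The threshold idea described above can be illustrated with a minimal sketch. This is not how Event Management implements event rules; the `Event` and `Alert` classes, the `THRESHOLDS` values, and the `escalate` function are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    source: str   # monitoring tool that reported the event (hypothetical field)
    node: str     # affected host or resource
    metric: str   # for example "cpu_percent"
    value: float  # observed value of the metric


@dataclass
class Alert:
    node: str
    metric: str
    description: str


# Hypothetical thresholds; real values would come from your own rules.
THRESHOLDS = {"cpu_percent": 90.0, "disk_used_percent": 95.0}


def escalate(event: Event) -> Optional[Alert]:
    """Return an Alert if the event meets or exceeds its threshold, else None."""
    limit = THRESHOLDS.get(event.metric)
    if limit is None or event.value < limit:
        # No threshold is defined for this metric, or the value is below it.
        return None
    description = f"{event.metric} at {event.value} (threshold {limit})"
    return Alert(node=event.node, metric=event.metric, description=description)
```

In this sketch, an event with no matching threshold is only tracked and never becomes an alert, which mirrors the distinction between events and alerts.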
As events occur, Event Management generates alerts, applies alert management rules, and prioritizes alerts for remediation and root cause analysis.
Alert states
- Open - The first stage in the processing of an alert.
- Closed - Closing an alert also closes any related incident that is not already resolved or closed.
- Reopen - When events are generated that are related to an existing closed alert, the alert is reopened. Alerts can also be reopened manually.
- Flapping - The state in which multiple open-close events are generated in rapid succession for an associated closed alert.
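The four states listed above can be modeled as a small state machine. The sketch below is an illustration only: the state names come from the list, but the `ALLOWED` transition map is an assumption inferred from the descriptions, not the documented behavior of Event Management.

```python
from enum import Enum


class AlertState(Enum):
    OPEN = "Open"
    CLOSED = "Closed"
    REOPEN = "Reopen"
    FLAPPING = "Flapping"


# Assumed transitions, inferred from the state descriptions above.
ALLOWED = {
    AlertState.OPEN: {AlertState.CLOSED},
    AlertState.CLOSED: {AlertState.REOPEN, AlertState.FLAPPING},
    AlertState.REOPEN: {AlertState.CLOSED, AlertState.FLAPPING},
    AlertState.FLAPPING: {AlertState.CLOSED, AlertState.REOPEN},
}


def transition(current: AlertState, target: AlertState) -> AlertState:
    """Return the target state if the move is allowed, else raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"Cannot move from {current.value} to {target.value}")
    return target
```

For example, `transition(AlertState.CLOSED, AlertState.REOPEN)` succeeds because a closed alert can be reopened, while `transition(AlertState.OPEN, AlertState.REOPEN)` raises an error in this sketch.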
Alert priority and severity
- For better triage and focus, alerts that have a higher priority are brought to the top of the alert list. This placement brings to your attention those alerts that require you to handle them at a higher priority than other alerts. The priority of an alert helps you determine how important the impact is to application services. Your Event Management administrator configures the algorithm used to calculate priority.
- The Severity of an alert indicates how serious the underlying issue is. The following table lists the default severities.

| Severity | Description |
| --- | --- |
| Critical | The resource is either not functional or critical problems are imminent. |
| Major | Major functionality is severely impaired or performance has degraded. |
| Minor | Partial non-critical loss of functionality or performance degradation occurred. |
| Warning | Attention is required even though the resource is still functional. |
| OK | An informational message that an alert is created. The resource is still functional. No action required. |
| Clear | The alert no longer needs action. |
You can view the severities on the Service Operations Workspace dashboard.
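As a rough illustration of how the default severities could be used for triage, the following sketch sorts alerts from most to least serious. The ordering follows the severity list above; the numeric ranks and the dictionary-based alert records are assumptions for this example and are not the values or fields that the instance stores. Priority itself is calculated by the algorithm that your administrator configures, so this sketch covers severity only.

```python
# Severity names in the order listed above, most serious first.
SEVERITY_ORDER = ["Critical", "Major", "Minor", "Warning", "OK", "Clear"]

# Hypothetical ranks derived from that order (0 is the most serious).
RANK = {name: index for index, name in enumerate(SEVERITY_ORDER)}


def sort_by_severity(alerts):
    """Return a new list of alerts, most serious severity first.

    Each alert is a dict with at least a "severity" key. The sort is
    stable, so alerts with equal severity keep their original order.
    """
    return sorted(alerts, key=lambda alert: RANK[alert["severity"]])
```

Given alerts with the severities Warning, Critical, and Minor, `sort_by_severity` returns them in the order Critical, Minor, Warning.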
Alert types and correlated alerts
Alerts are often related, or correlated, to each other. For example, if a router goes down, several separate alerts could be generated, one for each server connected to the router. Event Management can group alerts automatically based on specified correlation rules, with one primary root alert at the top and secondary alerts under the primary alert. You can choose to suppress secondary alerts to reduce the amount of alert noise and focus on the primary alert. You can also group alerts manually. In addition, you can manually attach additional alerts to a primary alert.
For example, if a router goes down on your network, network communication is also affected for the connected servers, assuming they can’t reach any other routers. The router outage becomes the primary alert, and the alerts generated on the servers are secondary alerts correlated under the router alert.
To learn more about the correlation types, see Alert grouping types and creation methods.
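The router example above can be sketched as a simple grouping step. This is a simplified illustration, not the correlation logic that Event Management uses: the `group_by_upstream` function and the `upstream` dependency map are hypothetical, and the sketch handles only one level of dependency.

```python
def group_by_upstream(alerting_nodes, upstream):
    """Group alerting nodes under an alerting upstream node.

    alerting_nodes: list of node names that currently have an alert.
    upstream: dict mapping a node to the node it depends on (assumed input).
    Returns a dict mapping each primary node to its list of secondary nodes.
    """
    alerting = set(alerting_nodes)
    groups = {}
    for node in alerting_nodes:
        parent = upstream.get(node)
        if parent in alerting:
            # The upstream node is also alerting, so this alert is secondary.
            groups.setdefault(parent, []).append(node)
        else:
            # No alerting upstream node, so this alert is a primary alert.
            groups.setdefault(node, [])
    return groups
```

With alerts on `router1`, `srv1`, and `srv2`, where both servers depend on `router1`, the result has `router1` as the primary alert and the two servers as its secondary alerts. An alert on a node whose upstream router is healthy stays a primary alert with no secondary alerts.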
Alert flapping
An alert can flap, meaning that it gets multiple open-close events in rapid succession. Flapping indicates that Event Management can’t detect whether the underlying events are genuine. These events could indicate small issues with the way CIs are configured, or larger issues, like network outages.
For example, a server that hosts a web service that has too many active processes might trigger an event about excessive CPU usage. If several events based on the CPU usage are triggered, Event Management may put the alert in the flapping state. This depends on the thresholds set by the administrator.
As another example, consider a loose network cable that causes momentary, repeated network outages. In this case, if the administrator has not set a threshold for this type of alert, Event Management doesn't consider it a flapping alert.
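The role of the threshold in both examples can be shown with a minimal sliding-window sketch. The parameter names `window_seconds` and `min_changes` are hypothetical and do not correspond to actual Event Management properties; the sketch assumes that flapping means a minimum number of open-close changes within a time window.

```python
from typing import Optional, Sequence


def is_flapping(
    change_times: Sequence[float],
    window_seconds: float,
    min_changes: Optional[int],
) -> bool:
    """Return True if enough state changes fall within one time window.

    change_times: sorted timestamps, in seconds, of open-close changes.
    min_changes: changes required within the window; None means that no
    threshold is configured, so the alert is never treated as flapping.
    """
    if min_changes is None:
        return False
    start = 0
    for end, timestamp in enumerate(change_times):
        # Drop changes that are older than the window ending at this change.
        while timestamp - change_times[start] > window_seconds:
            start += 1
        if end - start + 1 >= min_changes:
            return True
    return False
```

Passing `None` for `min_changes` models the loose-cable case: without a configured threshold, repeated outages are not reported as flapping, however often they occur.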