Anomaly detection algorithm

Xanadu Impact

Release version: Xanadu

Updated August 1, 2024

Summary of Anomaly Detection Algorithm

The anomaly detection algorithm in Instance Observer uses the Z-score statistical model and an upper threshold-based methodology to identify anomalies across five key metrics: Memory Max, Semaphore Mean, SQL Response Time, Server Response Time, and Transaction Count. The detection model has been validated on samples from multiple instances, using daily, weekly, and monthly level data.

Key Features

Z-score Methodology: This method calculates how far a value deviates from the mean in terms of standard deviations. A Z-score of 0 indicates the value is equal to the mean. It takes into account a moving average of the last 15 minutes, the population mean from the previous four weeks, and the standard deviation.
Upper Threshold-Based Methodology: This approach identifies anomalies when metrics approach predefined exhaustion limits. For instance, a semaphore mean or a memory max triggers an alert only when its value is close to its limit, not merely when it deviates from the mean.
Cyclicity Consideration: The algorithm accounts for cyclical patterns in the data—such as daily or seasonal trends—by calculating a cyclical score that assesses the similarity between two data series over a defined period.

Key Outcomes

By implementing this anomaly detection algorithm, customers can expect more accurate identification of anomalies in their system performance metrics. The use of Z-score and cyclical analysis enhances the reliability of insights, ensuring that true outliers are detected while considering natural data patterns. This ultimately aids in better resource management and system performance optimization.

Instance Observer performs anomaly detection using the Z-score statistical model, also referred to as a univariate method.

Anomaly detection analyzes a set of five metrics: Memory Max, Semaphore Mean, SQL Response Time, Server Response Time, and Transaction Count. The detection model has been validated on samples from multiple instances, using daily, weekly, and monthly level data.

Anomalies in Transaction Count, Server Response Time, and SQL Response Time are detected with the Z-score model. Anomalies in Semaphore Mean, Node Max Memory, and Job Execution are detected with the upper threshold-based approach. Refer to Getting started with Performance charts for details on the five metrics.

Upper threshold-based methodology

The upper threshold-based methodology applies to metrics that have an exhaustion limit. For example, metric A is the semaphore mean, where a semaphore value of 14 or 16 is used on the platform to limit the number of transactions that can occur on a node at one time, protecting the resources on the node. Metric B is memory max with a value of 2 GB, where each node's memory has a predefined maximum capacity. In all such cases, the situation is alarming only when the metric is close to its exhaustion limit. If a value deviates well above the mean but is still below the exhaustion limit, it doesn’t result in an alarm.
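The threshold check described above can be sketched as follows. This is a minimal illustration, not the product's implementation: the function name is_near_exhaustion and the 10% margin are assumptions, since the text doesn't state how close to the limit a metric must be before it raises an alarm.

```python
def is_near_exhaustion(value: float, limit: float, margin: float = 0.10) -> bool:
    """Return True when `value` is within `margin` (a fraction of `limit`) of the limit.

    `margin` is a hypothetical parameter; the source text doesn't give a number.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    return value >= limit * (1.0 - margin)


# Semaphore limit of 16, taken from the example in the text.
semaphore_far = is_near_exhaustion(9.0, 16.0)     # high, but not near the limit
semaphore_close = is_near_exhaustion(15.0, 16.0)  # close to the limit

# Memory max of 2 GB, expressed in MB.
memory_close = is_near_exhaustion(1900.0, 2048.0)

print(semaphore_far, semaphore_close, memory_close)
```

The point of the design is that the comparison is made against the limit rather than against the mean, so a value that is unusually high for the instance still passes as long as it leaves headroom.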

Z-score methodology

A Z-score is a numerical measurement that describes the relationship of a value to the mean of a group of values. A Z-score is measured in standard deviations from the mean. If a Z-score is 0, the data point is identical to the mean.

The formula for calculating a Z-score is z = (x - μ) / σ, where:

x: The raw score of the data, taken as the moving average of the previous 15 minutes
μ: The data population mean, taken as the average of the previous four weeks on the same day of the week, same hour, and same minute
σ: The data population standard deviation
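As a rough sketch of the formula above, the following computes a Z-score from two lists of samples. The function name z_score and the sample values are made up for illustration. The sketch also assumes that σ is computed over the same four-week baseline samples as μ, which the text doesn't specify.

```python
from statistics import mean, pstdev


def z_score(recent_values, baseline_values):
    """Compute z = (x - mu) / sigma.

    recent_values:   samples from the previous 15 minutes; x is their average.
    baseline_values: values from the previous four weeks at the same day of
                     the week, hour, and minute; mu is their average.
    Assumption: sigma is the population standard deviation of baseline_values.
    """
    x = mean(recent_values)
    mu = mean(baseline_values)
    sigma = pstdev(baseline_values)
    if sigma == 0:
        raise ValueError("standard deviation is zero; the Z-score is undefined")
    return (x - mu) / sigma


# Illustrative numbers only.
recent = [120, 130, 125]          # x = 125
baseline = [100, 110, 90, 100]    # mu = 100, sigma = sqrt(50)
z = z_score(recent, baseline)
print(round(z, 3))
```

Here the recent average sits about 3.5 standard deviations above the baseline mean. The text doesn't state which Z-score value counts as an anomaly, so no cutoff is applied.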

When calculating Z-scores or making comparisons, it’s essential to account for any inherent cyclical patterns in the analyzed data. Cyclicity in a dataset refers to repeating patterns that occur at regular intervals, such as daily, weekly, or seasonal cycles. For example, sales data may exhibit higher values during holiday seasons or lower values during off-peak periods.

The cyclicity score measures the similarity between two data series, treated as vectors. It helps ensure that the Z-score model provides reliable insights and identifies true anomalies or outliers while accounting for the natural patterns of the data.

The cyclicity score is calculated at the instance level. Four weeks of data, excluding weekends, are divided into two two-week vectors, and the score is the similarity between them. A higher score indicates that the two vectors follow more closely aligned trends.
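The calculation described above might look like the sketch below. The text doesn't name the similarity measure, so cosine similarity is used here purely as an assumption. The function names and the one-value-per-day granularity are likewise hypothetical simplifications.

```python
import math
from datetime import date, timedelta


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cyclicity_score(samples):
    """Score 28 consecutive days of (date, value) pairs, oldest first.

    The four weeks are split into two two-week vectors with weekends
    dropped, then compared. Cosine similarity is an assumed measure.
    """
    if len(samples) != 28:
        raise ValueError("expected 28 consecutive daily samples")
    first = [v for d, v in samples[:14] if d.weekday() < 5]   # Mon-Fri only
    second = [v for d, v in samples[14:] if d.weekday() < 5]
    return cosine_similarity(first, second)


# Illustrative data: the same weekly pattern repeated for four weeks.
start = date(2024, 7, 1)  # a Monday
weekly_pattern = [100, 120, 130, 125, 110, 20, 20]
samples = [(start + timedelta(days=i), weekly_pattern[i % 7]) for i in range(28)]
score = cyclicity_score(samples)
print(round(score, 6))
```

Because both two-week vectors in this example are identical, the score is 1 up to floating-point rounding. Data whose second two weeks follow a different shape would score lower.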