
Advanced functions


Anomaly Detection

The Kloudfuse platform can perform anomaly detection on the underlying data. Anomaly monitoring is effective for spotting deviations in a signal's behavior compared with its own past behavior. For example, is the number of requests unusually different from what it was in the past? Kloudfuse provides the following algorithms for anomaly detection:

  • Basic (Rolling-Quantile) is a statistical learning algorithm that detects anomalies based on earlier behavior, measured by the specified quantile. The following parameter is provided to tune the algorithm:

    • Threshold: the number of standard deviations a value has to be away from the mean to be considered anomalous. For example, a value of 1 corresponds to a band that covers roughly 68% of values, assuming a normal distribution.

  • RRCF (Robust Random Cut Forest) is a machine learning algorithm for detecting anomalies in large datasets. It uses a tree-based ensemble method to identify outliers based on their relative isolation within the dataset. The algorithm builds a set of binary trees from random subsamples of the data and measures how isolated each point is by how many cuts it takes to separate that point from the rest. Points that are separated after fewer cuts than most others are more isolated, and are therefore flagged as potential anomalies. The algorithm is robust to high-dimensional data, skewed distributions, and noisy or irrelevant features, which makes it well suited to a wide range of anomaly detection and outlier analysis applications. The following parameters are provided to tune the algorithm further:

    • global window: the time window used for the rolling dataset (taken from the metric query over this window). At any point in time, the RRCF algorithm captures the signal behavior seen over this time window.

    • local window: the time window used to capture the signal behavior in the recent past.
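The relationship between the Basic algorithm's Threshold (a number of standard deviations) and the share of values it covers can be checked with a short calculation. This is a minimal sketch that assumes normally distributed data; how Kloudfuse maps the threshold to a quantile internally is not stated here, and the function name `central_coverage` is hypothetical.

```python
from statistics import NormalDist


def central_coverage(threshold: float) -> float:
    """Fraction of a normal distribution within +/- `threshold` standard deviations of the mean."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return std_normal.cdf(threshold) - std_normal.cdf(-threshold)


for k in (1, 2, 3):
    print(f"threshold={k}: about {central_coverage(k):.4f} of values fall inside the band")
```

Under this assumption, a larger threshold leaves fewer values outside the band, so fewer points are reported as anomalous.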

Anomaly detection is a powerful monitoring feature that uses algorithmic analysis to automatically identify unexpected behavior in metric data. Traditional threshold-based alerting often fails to account for trends, seasonality, or complex fluctuations in metrics. Anomaly detection algorithms overcome this limitation by analyzing historical patterns to establish dynamic boundaries, making it possible to detect deviations from normal behavior even as the data changes over time. Kloudfuse provides four anomaly detection algorithms.

Key Arguments

Rolling Window Size:

  1. The rolling window size is used to calculate the standard deviation (std) for setting the band limits around expected values, which helps define the "normal" range.

    • A larger window size smooths the standard deviation calculation, reducing sensitivity to short-term fluctuations and providing a stable range for expected behavior. However, this may delay detection of rapid changes.

Bands:

  1. Band 1 (Narrow): Sets a tight range around expected values, making the algorithm highly sensitive to even small deviations. This band is ideal for detecting subtle anomalies, which could be early indicators of a potential issue.

  2. Band 2 (Moderate): Offers a balanced range, capturing moderate deviations without excessive sensitivity to minor fluctuations. This is suitable for general anomaly detection, where both significant and moderate changes are relevant.

  3. Band 3 (Wide): Provides the widest range, capturing only large deviations from the expected values. This is useful for minimizing false positives, focusing only on major anomalies that could indicate significant issues.
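How the rolling window size and the band setting interact can be sketched as follows. This is an illustration only, under two assumptions that are not confirmed above: the expected value is taken to be the mean of the trailing window, and the band number is used directly as a multiplier on the rolling standard deviation. The function names and sample data are hypothetical.

```python
from statistics import mean, stdev


def band_limits(values, window, band):
    """Return (lower, upper) limits for each point that has a full trailing window."""
    limits = []
    for i in range(window, len(values)):
        history = values[i - window:i]      # the `window` points before point i
        center = mean(history)              # assumed expected value
        spread = stdev(history)             # rolling standard deviation
        limits.append((center - band * spread, center + band * spread))
    return limits


def flag_anomalies(values, window, band):
    """Indices of points that fall outside their band."""
    flagged = []
    for offset, (lower, upper) in enumerate(band_limits(values, window, band)):
        i = window + offset
        if not lower <= values[i] <= upper:
            flagged.append(i)
    return flagged


series = [10, 11, 10, 12, 11, 10, 11, 12, 10, 11, 13, 11, 10, 19, 11]
for band in (1, 2, 3):
    print(f"band {band}: flagged indices {flag_anomalies(series, window=6, band=band)}")
```

Because the bands share the same center and only differ in width, every point flagged by a wider band is also flagged by a narrower one, which matches the sensitivity ordering described above.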

Seasonality:

Selects a single primary seasonal pattern based on the expected periodicity of the data. For instance, Daily seasonality is ideal for data that follows a consistent daily pattern, while Weekly is useful for data that repeats weekly.

Basic (Rolling-Quantile)

The Basic Anomaly Detection algorithm provides a straightforward way to identify unusual behavior in metric data by calculating rolling quantiles. It is well-suited for metrics without strong seasonal patterns or trends, where simple threshold-based monitoring may not be sufficient for capturing all anomalies. This algorithm allows you to define a range of expected values based on historical data, with deviations outside this range flagged as anomalies.

For example, Basic Anomaly Detection can help you spot unexpected drops in CPU utilization that might signal an issue with a server, or sudden spikes in network traffic that could indicate potential security incidents.
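The idea of a rolling-quantile range can be sketched as below. This is a minimal illustration, not the Kloudfuse implementation: the window length, the 5% and 95% quantile limits, the function name, and the sample CPU data are all assumptions chosen for the example.

```python
from statistics import quantiles


def rolling_quantile_anomalies(values, window, lower_q=0.05, upper_q=0.95):
    """Flag points that fall outside the quantile range of the trailing window."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        # quantiles(..., n=100) returns the 99 cut points for percentiles 1..99
        cuts = quantiles(history, n=100, method="inclusive")
        low = cuts[round(lower_q * 100) - 1]
        high = cuts[round(upper_q * 100) - 1]
        if values[i] < low or values[i] > high:
            flagged.append(i)
    return flagged


# Hypothetical CPU utilization (%) with a sudden drop at index 10
cpu = [50, 52, 48, 51, 49, 52, 48, 50, 51, 49, 5, 50]
print(rolling_quantile_anomalies(cpu, window=8))
```

Each point is compared only with the range of values seen in the window before it, so the expected range moves with the data instead of being a fixed threshold.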

...

Agile (SARIMA)

The Agile Anomaly Detection algorithm leverages the SARIMA (Seasonal AutoRegressive Integrated Moving Average) model to detect anomalies in metrics with predictable, short-term seasonal patterns and occasional abrupt level shifts. Agile is well-suited for metrics with daily or hourly cycles, allowing for rapid adaptation to sudden changes while accurately capturing short, repeating patterns.

Key Arguments:

Bands: Controls detection sensitivity (1, 2, or 3)

...

Robust (Seasonal Decompose)

  • The Robust Anomaly Detection algorithm uses seasonal decomposition to separate trend, seasonal, and residual components, making it effective for metrics with stable, well-defined seasonality and trend patterns. It’s ideal for metrics with predictable cycles, like weekly peaks in web traffic, where regular seasonal trends need to be accounted for in anomaly detection.

  • Key Arguments

    1. Rolling Window Size

    2. Bands

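The decomposition into trend, seasonal, and residual components described for the Robust algorithm can be illustrated with a classical additive decomposition. This is a simplified sketch, not the Kloudfuse implementation: it assumes an odd seasonal period, uses a centered moving average for the trend and a per-phase median for the seasonal component, and computes the standard deviation over all residuals rather than over a rolling window. All names and the sample data are hypothetical.

```python
from statistics import mean, median, stdev


def decompose(values, period):
    """Additive decomposition: value = trend + seasonal + residual (odd `period` assumed)."""
    n = len(values)
    half = period // 2
    # Trend: centered moving average spanning exactly one period
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = mean(values[i - half:i + half + 1])
    # Seasonal: median detrended value at each position within the period
    by_phase = [[] for _ in range(period)]
    for i in range(n):
        if trend[i] is not None:
            by_phase[i % period].append(values[i] - trend[i])
    seasonal = [median(phase) for phase in by_phase]
    # Residual: what is left after removing trend and seasonal components
    residual = [
        None if trend[i] is None else values[i] - trend[i] - seasonal[i % period]
        for i in range(n)
    ]
    return trend, seasonal, residual


def residual_anomalies(values, period, band):
    """Indices whose residual is more than `band` standard deviations from zero."""
    _, _, residual = decompose(values, period)
    known = [r for r in residual if r is not None]
    limit = band * stdev(known)
    return [i for i, r in enumerate(residual) if r is not None and abs(r) > limit]


# Hypothetical daily web traffic with weekend peaks, five weeks long
week = [100, 105, 110, 105, 100, 160, 170]
traffic = week * 5
traffic[17] += 80  # inject one unusual day

print(residual_anomalies(traffic, period=7, band=3))
```

The regular weekend peaks end up in the seasonal component, so they are not reported; only the injected day leaves a large residual.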

AgileRobust (Prophet)

  • The Agile Robust Anomaly Detection algorithm leverages the Prophet model to detect anomalies in metrics with complex seasonal patterns and trend shifts. Prophet is designed to handle multi-seasonality patterns with underlying trends and is particularly effective for metrics that show strong seasonal cycles alongside irregular level shifts. This makes it ideal for metrics that combine predictable patterns with occasional, unpredictable changes.

Outlier Detection

The Kloudfuse platform can perform outlier detection on the underlying data. Outlier monitoring is effective for spotting deviations in behavior compared with other similar entities in the cluster. For example, CPU usage per pod for a service with 3 replicas should be similar across all 3 pods. If one pod uses more or less CPU than the others, it is an outlier. Kloudfuse provides the following algorithms for outlier detection:

  • DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a popular clustering algorithm. The following parameters are used to tune it further:
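The pod example above can be worked through with a small one-dimensional DBSCAN. This is a minimal sketch: `eps` (neighborhood radius) and `min_samples` (points needed to form a dense region) are the standard DBSCAN parameters, but whether Kloudfuse exposes them under these names is an assumption, as are the pod names and CPU values.

```python
NOISE = -1


def dbscan_1d(values, eps, min_samples):
    """Label each value with a cluster id, or NOISE (-1) for outliers."""
    n = len(values)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # All points within eps of point i (including i itself)
        return [j for j in range(n) if abs(values[i] - values[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned to a cluster
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
        cluster += 1
    return labels


# Hypothetical CPU usage (cores) for a service with 3 replicas
cpu_by_pod = {"pod-1": 0.42, "pod-2": 0.45, "pod-3": 0.95}
labels = dbscan_1d(list(cpu_by_pod.values()), eps=0.05, min_samples=2)
outliers = [pod for pod, label in zip(cpu_by_pod, labels) if label == NOISE]
print(outliers)
```

Pods whose usage sits close together form one cluster, while a pod with no close neighbors is labeled as noise and reported as the outlier.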

Forecasting

Forecasting allows you to predict future values of a metric based on historical data trends, helping you anticipate changes and plan resources proactively. By analyzing past patterns, forecasting algorithms can model trends, seasonal behaviors, and cyclical patterns, creating forward-looking insights that support decision-making and preemptive action.

Forecasting is especially useful for metrics that follow consistent trends or seasonal cycles, such as user traffic, CPU utilization, or application performance metrics. For example, forecasting can help you predict daily traffic peaks on a website or anticipate CPU load during specific times of the week, allowing teams to allocate resources accordingly.

Kloudfuse offers two forecasting algorithms to help users anticipate metric behavior.

Linear regression

Linear forecasting in Kloudfuse is powered by PromQL’s linear regression capabilities, allowing straightforward prediction of future metric values based on a consistent linear trend. This approach is ideal for metrics that demonstrate steady growth or decline over time, without strong seasonal fluctuations. For example, linear forecasting can effectively model gradual increases in memory usage or a steady upward trend in active user counts.
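Linear forecasting amounts to fitting a straight line through recent samples and extending it into the future, which is the kind of extrapolation PromQL offers through its `predict_linear` function. The sketch below shows the underlying least-squares arithmetic in plain Python; the function names and the memory usage samples are hypothetical, and it is not the Kloudfuse implementation.

```python
def fit_line(times, values):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    variance = sum((t - mean_t) ** 2 for t in times)
    slope = covariance / variance
    intercept = mean_v - slope * mean_t
    return slope, intercept


def predict(times, values, seconds_ahead):
    """Extrapolate the fitted line `seconds_ahead` past the last sample."""
    slope, intercept = fit_line(times, values)
    return intercept + slope * (times[-1] + seconds_ahead)


# Hypothetical memory usage (MiB), sampled every 60 seconds
timestamps = [0, 60, 120, 180, 240]
memory_mib = [500, 510, 520, 530, 540]

forecast = predict(timestamps, memory_mib, seconds_ahead=600)
print(f"forecast in 10 minutes: {forecast:.1f} MiB")
```

A straight-line fit like this works well for steady growth or decline, but it will not capture daily or weekly cycles.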

Seasonal (Prophet)

Seasonal forecasting leverages the Prophet model to capture metrics with strong, regular seasonal patterns. Prophet is particularly effective for metrics that show predictable hourly, daily or weekly cycles, such as website traffic or workload variations that repeat over time. By modeling these recurring patterns, Prophet-based forecasting provides more accurate predictions for metrics with complex seasonal behaviors.


Auto Alerting: Hawkeye

Kloudfuse Analytics provides an auto alerting feature on various entities, out of the box. In most cases, this requires only simple configuration, and auto alerting internally uses the advanced functions that the Kloudfuse platform provides to monitor your cluster.

Having the required data and unified streams is central to the Kloudfuse platform's ability to do auto alerting. The Hawkeye service is designed to intelligently monitor user-controllable entities in the infrastructure for abnormal behavior, in a way that depends on the entity.

Anomalies: Services

Knight automatically discovers peer-to-peer communication between services. The communication is tracked for various protocols. The services and their connections to other services (entities) are discovered and shown in the service list UI. The service map is also built from this communication, which forms its edges. Both the service list and the service map record the RED metrics for each service or edge.

Using this data, Hawkeye looks for anomalies in real time, using state-of-the-art statistical learning algorithms or service level objectives, as configured. If anomalous behavior is detected, an alert is raised, which is then evaluated by BullsEye.

Persistent Volumes

This feature is not enabled by default. Contact us for more information. Follow these steps to enable.

Outliers: Resources *

Using Kubernetes data, Kloudfuse automatically monitors resource consumption for outliers.

Forecasting: Persistent Volumes *

Using Kubernetes data, Kloudfuse automatically monitors the forecasted behavior of persistent volumes to watch for capacity exhaustion.

BullsEye: Auto Analysis

The BullsEye service is designed to analyze signals and correlate them with other signals within the same stream or across streams. This analysis reduces the effort required to debug issues and helps narrow down problematic areas in minutes.

...

BullsEye is designed to narrow down to other anomalous areas of your infrastructure, starting from the source captured in an alert. Because the Kloudfuse platform is unified, i.e., all data is present in a single platform, BullsEye can cast a wider net and look into data derived from each of the streams present in the system, which improves its chances of identifying problematic behavior within minutes.

Additionally, if instrumentation-less tracing is enabled, it can sift through this wider net much more efficiently to eliminate noise and present only the most relevant information, which can greatly reduce the time to resolution.

BullsEye can be configured to run analysis when an alert fires or on an ad-hoc basis. Please follow these steps to enable.

BullsEye is also configured to run for all alerts/signals monitored by Hawkeye using auto alerting.

* In upcoming releases.