Overview

Now that the data from all telemetry streams is stored efficiently in the Kloudfuse platform, it is time to make use of this unified data. The Kloudfuse platform does this by monitoring the Kubernetes services running in your clusters for abnormal behavior and, when such behavior is detected, analyzing it by correlating signals from each stream to narrow down problematic areas in minutes. Read more about HawkEye and BullsEye to gain more insight.

Advanced functions

Anomaly Detection

Anomaly detection is a powerful monitoring feature that uses algorithmic analysis to automatically identify unexpected behavior in metric data. Traditional threshold-based alerting often fails to account for trends, seasonality, or complex fluctuations in metrics. Anomaly detection algorithms overcome this limitation by analyzing historical patterns to establish dynamic boundaries, making it possible to detect deviations from normal behavior even as the data changes over time. Kloudfuse provides four anomaly detection algorithms.

Key Arguments

Rolling Window Size:

  1. The rolling window size is used to calculate the standard deviation (std) for setting the band limits around expected values, which helps define the "normal" range.

    • A larger window size smooths the standard deviation calculation, reducing sensitivity to short-term fluctuations and providing a stable range for expected behavior. However, this may delay detection of rapid changes.

Bands:

  1. Band 1 (Narrow): Sets a tight range around expected values, making the algorithm highly sensitive to even small deviations. This band is ideal for detecting subtle anomalies, which could be early indicators of a potential issue.

  2. Band 2 (Moderate): Offers a balanced range, capturing moderate deviations without excessive sensitivity to minor fluctuations. This is suitable for general anomaly detection, where both significant and moderate changes are relevant.

  3. Band 3 (Wide): Provides the widest range, capturing only large deviations from the expected values. This is useful for minimizing false positives, focusing only on major anomalies that could indicate significant issues.
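
The interaction between the rolling window and the bands can be sketched in a few lines of Python. This is a minimal illustration, not the Kloudfuse implementation: it assumes the expected value is the rolling mean, and that band 1, 2, or 3 acts as a multiplier on the rolling standard deviation. The function names `rolling_bands` and `flag_anomalies` are hypothetical.

```python
from statistics import mean, stdev

def rolling_bands(values, window, band):
    """Return (lower, upper) limits per point, built from the preceding `window` points.

    Assumption: `band` multiplies the rolling standard deviation.
    Points without a full window of history get None.
    """
    limits = []
    for i in range(len(values)):
        if i < window:
            limits.append(None)
            continue
        history = values[i - window:i]
        center = mean(history)
        spread = stdev(history)
        limits.append((center - band * spread, center + band * spread))
    return limits

def flag_anomalies(values, window, band):
    """Return the indices of points that fall outside their band limits."""
    limits = rolling_bands(values, window, band)
    return [
        i for i, (value, limit) in enumerate(zip(values, limits))
        if limit is not None and not (limit[0] <= value <= limit[1])
    ]

series = [10, 11, 10, 12, 11, 10, 11, 13, 11, 10, 30, 11]
narrow = flag_anomalies(series, window=5, band=1)  # sensitive to small deviations
wide = flag_anomalies(series, window=5, band=3)    # only large deviations
print("band 1:", narrow)
print("band 3:", wide)
```

On this sample series the narrow band flags the mild bump at index 7 as well as the large spike at index 10, while the wide band flags only the spike.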

Seasonality:

Selects a single primary seasonal pattern based on the expected periodicity of the data. For instance, Daily seasonality is ideal for data that follows a consistent daily pattern, while Weekly is useful for data that repeats weekly.

Basic (Rolling-Quantile)

The Basic Anomaly Detection algorithm provides a straightforward way to identify unusual behavior in metric data by calculating rolling quantiles. It is well-suited for metrics without strong seasonal patterns or trends, where simple threshold-based monitoring may not be sufficient for capturing all anomalies. This algorithm allows you to define a range of expected values based on historical data, with deviations outside this range flagged as anomalies.

For example, Basic Anomaly Detection can help you spot unexpected drops in CPU utilization that might signal an issue with a server, or sudden spikes in network traffic that could indicate potential security incidents.
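
A rolling-quantile check of this kind might look like the following sketch. It is an illustration under stated assumptions rather than the Kloudfuse algorithm: the 5th and 95th percentile bounds, the window length, and the function names are all hypothetical choices.

```python
def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list, with 0 <= q <= 1."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

def rolling_quantile_anomalies(values, window, low_q=0.05, high_q=0.95):
    """Flag points outside the [low_q, high_q] range of the preceding `window` points."""
    anomalies = []
    for i in range(window, len(values)):
        history = sorted(values[i - window:i])
        low = quantile(history, low_q)
        high = quantile(history, high_q)
        if values[i] < low or values[i] > high:
            anomalies.append(i)
    return anomalies

# Hypothetical CPU utilization samples (percent) with one sudden drop.
cpu_percent = [52, 55, 53, 54, 56, 53, 55, 54, 12, 54, 55, 53]
drops = rolling_quantile_anomalies(cpu_percent, window=8)
print(drops)
```

Because the expected range comes from recent history instead of a fixed threshold, the same code works for metrics with very different baselines.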

...

Agile (SARIMA)

The Agile Anomaly Detection algorithm leverages the SARIMA (Seasonal AutoRegressive Integrated Moving Average) model to detect anomalies in metrics with predictable, short-term seasonal patterns and occasional abrupt level shifts. Agile is well-suited for metrics with daily or hourly cycles, allowing for rapid adaptation to sudden changes while accurately capturing short, repeating patterns.

Key Arguments:

Bands: Controls detection sensitivity (1, 2, 3).

...

Robust (Seasonal Decompose)

  • The Robust Anomaly Detection algorithm uses seasonal decomposition to separate trend, seasonal, and residual components, making it effective for metrics with stable, well-defined seasonality and trend patterns. It’s ideal for metrics with predictable cycles, like weekly peaks in web traffic, where regular seasonal trends need to be accounted for in anomaly detection.

  • Key Arguments

    1. Rolling Window Size

    2. Bands
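
The idea of separating trend, seasonal, and residual components can be sketched with a classical additive decomposition. This is a simplified stand-in and not the Kloudfuse implementation: it assumes an odd seasonal period, a centered moving average for the trend, and a band that multiplies the standard deviation of the residuals. All function names are hypothetical.

```python
from statistics import mean, pstdev

def decompose(values, period):
    """Additive decomposition into trend, seasonal, and residual lists.

    Assumes an odd `period` so the moving average is centered on each point.
    Positions without a full centered window get None for trend and residual.
    """
    half = period // 2
    n = len(values)
    trend = [None] * n
    for i in range(half, n - half):
        trend[i] = mean(values[i - half:i + half + 1])
    # Average the detrended values at each position within the cycle.
    by_phase = [[] for _ in range(period)]
    for i in range(n):
        if trend[i] is not None:
            by_phase[i % period].append(values[i] - trend[i])
    phase_means = [mean(phase) for phase in by_phase]
    offset = mean(phase_means)  # center the seasonal component on zero
    seasonal = [phase_means[i % period] - offset for i in range(n)]
    residual = [
        None if trend[i] is None else values[i] - trend[i] - seasonal[i]
        for i in range(n)
    ]
    return trend, seasonal, residual

def residual_anomalies(values, period, band):
    """Flag points whose residual exceeds `band` standard deviations of all residuals."""
    _, _, residual = decompose(values, period)
    known = [r for r in residual if r is not None]
    limit = band * pstdev(known)
    return [i for i, r in enumerate(residual) if r is not None and abs(r) > limit]

# Six weeks of hypothetical daily web traffic with weekend peaks and slow growth.
weekly_shape = [100, 105, 110, 108, 112, 160, 150]
traffic = [v + week * 2 for week in range(6) for v in weekly_shape]
traffic[24] += 80  # inject a spike on a normally quiet day
flagged = residual_anomalies(traffic, period=7, band=3)
print(flagged)
```

The regular weekend peaks are absorbed into the seasonal component, so only the injected spike stands out in the residual.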

Agile Robust (Prophet)

  • The Agile Robust Anomaly Detection algorithm leverages the Prophet model to detect anomalies in metrics with complex seasonal patterns and trend shifts. Prophet is designed to handle multi-seasonality patterns with underlying trends and is particularly effective for metrics that show strong seasonal cycles alongside irregular level shifts. This makes it ideal for metrics that combine predictable patterns with occasional, unpredictable changes.

Outlier Detection

The Kloudfuse platform can perform outlier detection on the underlying data. Outlier monitoring is effective for spotting an entity whose behavior deviates from that of other similar entities in the cluster. For example, CPU usage per pod for a service with 3 replicas should be similar across all 3 pods. If one pod uses more or less CPU than the others, it is an outlier. Kloudfuse provides the following algorithms for outlier detection:

  • DBSCAN (density-based spatial clustering of applications with noise): a popular clustering algorithm that groups densely packed points and marks points that belong to no cluster as noise.
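
To make the pod example concrete, here is a small one-dimensional DBSCAN sketch in plain Python. It illustrates the general algorithm only; the `eps` and `min_samples` values, the pod names, and the CPU figures are hypothetical, and how Kloudfuse tunes or applies DBSCAN internally is not described here.

```python
def dbscan_1d(values, eps, min_samples):
    """Label each value with a cluster id, or -1 for noise (an outlier)."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(values)

    def neighbors(i):
        # A point counts as its own neighbor, as in the usual DBSCAN definition.
        return [j for j, v in enumerate(values) if abs(v - values[i]) <= eps]

    cluster = 0
    for i in range(len(values)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
        cluster += 1
    return labels

# Hypothetical CPU usage (cores) for a service with 3 replicas.
cpu_by_pod = {"checkout-0": 0.42, "checkout-1": 0.40, "checkout-2": 0.91}
labels = dbscan_1d(list(cpu_by_pod.values()), eps=0.05, min_samples=2)
outliers = [pod for pod, label in zip(cpu_by_pod, labels) if label == -1]
print(outliers)
```

The two pods with similar usage form one cluster, and the pod that has no close neighbor is labeled as noise, which is what makes it an outlier.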

Forecasting

Forecasting allows you to predict future values of a metric based on historical data trends, helping you anticipate changes and plan resources proactively. By analyzing past patterns, forecasting algorithms can model trends, seasonal behaviors, and cyclical patterns, creating forward-looking insights that support decision-making and preemptive action.

Forecasting is especially useful for metrics that follow consistent trends or seasonal cycles, such as user traffic, CPU utilization, or application performance metrics. For example, forecasting can help you predict daily traffic peaks on a website or anticipate CPU load during specific times of the week, allowing teams to allocate resources accordingly.

Kloudfuse offers two forecasting algorithms to help users anticipate metric behavior.

Linear regression

Linear forecasting in Kloudfuse is powered by PromQL’s linear regression capabilities, allowing straightforward prediction of future metric values based on a consistent linear trend. This approach is ideal for metrics that demonstrate steady growth or decline over time, without strong seasonal fluctuations. For example, linear forecasting can effectively model gradual increases in memory usage or a steady upward trend in active user counts.
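
The underlying idea is an ordinary least-squares line fitted to recent samples and extended into the future, which is also what PromQL's `predict_linear` function does over a range vector. The sketch below shows that idea in Python; it is not Kloudfuse code, the function names and sample values are hypothetical, and whether Kloudfuse calls `predict_linear` itself is an assumption.

```python
def fit_line(timestamps, values):
    """Return (slope, intercept) of the least-squares line through the samples."""
    n = len(timestamps)
    mean_t = sum(timestamps) / n
    mean_v = sum(values) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in zip(timestamps, values))
    variance = sum((t - mean_t) ** 2 for t in timestamps)
    slope = covariance / variance
    intercept = mean_v - slope * mean_t
    return slope, intercept

def forecast(timestamps, values, seconds_ahead):
    """Predict the value `seconds_ahead` seconds after the last sample."""
    slope, intercept = fit_line(timestamps, values)
    return slope * (timestamps[-1] + seconds_ahead) + intercept

# Hypothetical memory usage (MiB) sampled once a minute, growing steadily.
timestamps = [0, 60, 120, 180, 240]
memory_mib = [500, 510, 520, 530, 540]
predicted = forecast(timestamps, memory_mib, seconds_ahead=600)
print(round(predicted, 1))
```

A straight line is only a good model while the growth rate stays roughly constant, which is why this approach suits metrics without strong seasonal swings.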

Seasonal (Prophet)

Seasonal forecasting leverages the Prophet model to capture metrics with strong, regular seasonal patterns. Prophet is particularly effective for metrics that show predictable hourly, daily or weekly cycles, such as website traffic or workload variations that repeat over time. By modeling these recurring patterns, Prophet-based forecasting provides more accurate predictions for metrics with complex seasonal behaviors.

Auto Alerting: HawkEye

Kloudfuse Analytics provides an auto alerting feature for various entities out of the box. In most cases it requires only simple configuration. Internally, auto alerting uses the advanced functions that the Kloudfuse platform provides to monitor your cluster.

Having the required data and unified streams is central to the Kloudfuse platform's ability to do auto alerting. The HawkEye service is designed to intelligently monitor user-controllable entities in your infrastructure for abnormal behavior, with the approach depending on the entity.

Anomalies: Services

Knight automatically discovers peer-to-peer communication between services. The communication is tracked for various protocols. The discovered services and their connections to other services (entities) are shown in the service list UI. The service map is also built from this communication, with each connection shown as an edge. Both the service list and the service map record the RED metrics (rate, errors, duration) for each service or edge.
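
As a rough illustration of what per-edge RED metrics contain, the following sketch groups observed requests by source and destination service and summarizes rate, errors, and duration for each edge. The `Request` record, its field names, and the choice of the median as the duration summary are hypothetical; this is not how Knight stores or computes its data.

```python
from dataclasses import dataclass

@dataclass
class Request:
    source: str
    destination: str
    duration_ms: float
    is_error: bool

def red_per_edge(requests, window_seconds):
    """Summarize rate, errors, and duration for each (source, destination) edge."""
    grouped = {}
    for request in requests:
        grouped.setdefault((request.source, request.destination), []).append(request)
    summary = {}
    for edge, calls in grouped.items():
        durations = sorted(call.duration_ms for call in calls)
        summary[edge] = {
            "rate_per_second": len(calls) / window_seconds,
            "error_ratio": sum(call.is_error for call in calls) / len(calls),
            # Upper median: the middle element of the sorted durations.
            "median_duration_ms": durations[len(durations) // 2],
        }
    return summary

# Hypothetical requests observed during a 60-second window.
observed = [
    Request("frontend", "cart", 10.0, False),
    Request("frontend", "cart", 30.0, True),
    Request("frontend", "cart", 20.0, False),
    Request("cart", "db", 5.0, False),
]
red = red_per_edge(observed, window_seconds=60)
print(red[("frontend", "cart")])
```

Each key in the result corresponds to one edge of a service map, and each value is a time series candidate that anomaly detection could watch.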

HawkEye

HawkEye is a service designed to monitor any user-configured signal (metric data from any stream) or all peer-to-peer communication signals (derived from the RED metrics). The “Auto monitoring” feature monitors the communication between various endpoints in detail using instrumentation-less tracing. Using this data, HawkEye looks for anomalies in real time using state-of-the-art statistical learning algorithms or service level objectives, as configured. If anomalous behavior is detected, an alert is raised, which is then evaluated by BullsEye.

This feature is not enabled by default. Contact us for more information. Follow these steps to enable it.

Outliers: Resources *

Using Kubernetes data, Kloudfuse automatically monitors resource consumption for outliers.

Forecasting: Persistent Volumes *

Using Kubernetes data, Kloudfuse automatically monitors the forecasted behavior of persistent volumes to watch for capacity exhaustion.
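
One way to reason about capacity exhaustion is to fit a line to recent usage and solve for the moment it reaches the volume's capacity. The sketch below shows that calculation under the assumption of steady linear growth; the function name, the sample values, and the choice of a linear model are hypothetical and do not describe how the upcoming Kloudfuse feature works.

```python
def seconds_until_full(timestamps, used, capacity):
    """Estimate seconds from the last sample until usage reaches capacity.

    Fits a least-squares line through the samples. Returns None when usage
    is flat or shrinking, because the line then never reaches capacity.
    """
    n = len(timestamps)
    mean_t = sum(timestamps) / n
    mean_u = sum(used) / n
    covariance = sum((t - mean_t) * (u - mean_u) for t, u in zip(timestamps, used))
    variance = sum((t - mean_t) ** 2 for t in timestamps)
    slope = covariance / variance
    if slope <= 0:
        return None
    intercept = mean_u - slope * mean_t
    time_full = (capacity - intercept) / slope
    return max(0.0, time_full - timestamps[-1])

# Hypothetical hourly samples of used space (GiB) on a 50 GiB volume.
timestamps = [0, 3600, 7200, 10800]
used_gib = [40, 42, 44, 46]
remaining = seconds_until_full(timestamps, used_gib, capacity=50)
print(round(remaining / 3600, 1), "hours until full")
```

An alert could then fire when the estimated time remaining drops below a chosen lead time, such as the time needed to resize the volume.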

BullsEye: Auto Analysis

BullsEye is designed to narrow down other anomalous areas of your infrastructure, starting from the source captured in an alert. Because the Kloudfuse platform is unified, with all data present in a single platform, BullsEye can cast a wider net across the data derived from each stream in the system, which helps it identify problematic behavior in minutes.

Additionally, if instrumentation-less tracing is enabled, BullsEye can sift through this wider net much more efficiently to eliminate noise and present only the most relevant information, which can greatly reduce the time to resolution.

The BullsEye service is designed to analyze signals and correlate them with other signals within the same stream or across streams. This analysis reduces the effort required to debug issues and helps narrow down problematic areas in minutes.

BullsEye can be configured to run its analysis when an alert fires or on an ad-hoc basis. Follow these steps to enable it.

BullsEye is also configured to run for all alerts/signals monitored by HawkEye using auto alerting.

* In upcoming releases.