...
Advanced Functions
Anomaly Detection
The Kloudfuse platform can perform anomaly detection on the underlying data. Anomaly monitoring is effective for detecting deviations from a metric's own past behavior. For example, is the number of requests unusually different from its past behavior? Kloudfuse provides the following algorithms for anomaly detection:
Basic (Rolling-Quantile) is a statistical learning algorithm that detects anomalies based on earlier behavior, measured by the specified quantile. The following parameters are provided to tune the algorithm:
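Threshold: The number of standard deviations a value must be away from the mean to be considered anomalous. For example, a value of 1 corresponds to a 0.68 quantile value, because about 68% of normally distributed values fall within one standard deviation of the mean.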
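The correspondence between the threshold and the quantile can be checked with Python's standard `statistics.NormalDist`. This minimal sketch assumes the metric values are roughly normally distributed; the helper name `coverage_for_threshold` is hypothetical and not part of Kloudfuse.

```python
from statistics import NormalDist


def coverage_for_threshold(threshold: float) -> float:
    """Fraction of normally distributed values that lie within
    `threshold` standard deviations of the mean (hypothetical helper)."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return std_normal.cdf(threshold) - std_normal.cdf(-threshold)


for t in (1, 2, 3):
    print(f"threshold={t}: {coverage_for_threshold(t):.4f}")
# threshold=1: 0.6827
# threshold=2: 0.9545
# threshold=3: 0.9973
```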
RRCF (Robust Random Cut Forest) is a machine learning algorithm for detecting anomalies in large datasets. It uses a tree-based ensemble method to identify outliers based on their relative isolation within the dataset. The algorithm builds a set of binary trees from random subsamples of the data, where each tree repeatedly splits the data with random cuts. The isolation of a point is measured by how quickly those cuts separate it from the rest of the data, that is, by its depth in the trees. Anomalies are points that are separated after fewer cuts than most other points, which indicates that they are more isolated and potentially more unusual. The algorithm is robust to high-dimensional data, skewed distributions, and noisy or irrelevant features, making it well-suited for a wide range of anomaly detection and outlier analysis applications. The following parameters are provided to tune the algorithm further:
Global window: The time window for the rolling dataset (the metric query runs over this window). At any point in time, the RRCF algorithm captures the signal behavior seen over this window.
Local window: The time window used to capture the signal behavior in the recent past.
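The isolation idea behind RRCF can be sketched in a few lines of Python. This is a simplified, offline illustration, not Kloudfuse's implementation: it assumes `history` holds the points seen over the global window, follows each random tree only along the path to the point being scored, and uses the raw cut depth as the score. The published RRCF algorithm instead maintains its trees incrementally over a stream and scores points by collusive displacement (CoDisp). The names `cut_depth` and `average_depth` are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def cut_depth(points: List[Point], target: Point, rng: random.Random) -> int:
    """Count the random cuts needed to separate `target` from the other points.

    `points` must include `target`. Each cut picks a dimension with probability
    proportional to its value range, then a cut value uniformly inside that
    range, and keeps only the side that still contains `target`.
    """
    current = list(points)
    dims = len(target)
    depth = 0
    while len(current) > 1:
        lows = [min(p[d] for p in current) for d in range(dims)]
        highs = [max(p[d] for p in current) for d in range(dims)]
        spans = [hi - lo for lo, hi in zip(lows, highs)]
        if sum(spans) == 0:
            break  # remaining points are identical and cannot be separated
        d = rng.choices(range(dims), weights=spans)[0]
        cut = rng.uniform(lows[d], highs[d])
        if target[d] <= cut:
            side = [p for p in current if p[d] <= cut]
        else:
            side = [p for p in current if p[d] > cut]
        if len(side) < len(current):
            depth += 1  # count only cuts that actually split the set
        current = side
    return depth


def average_depth(history: Sequence[Point], target: Point, num_trees: int = 50,
                  sample_size: int = 64, seed: int = 0) -> float:
    """Average depth of `target` over trees grown from random subsamples of
    `history`. Lower values mean the point is easier to isolate."""
    rng = random.Random(seed)
    pool = list(history)
    total = 0
    for _ in range(num_trees):
        sample = rng.sample(pool, min(sample_size - 1, len(pool)))
        total += cut_depth(sample + [target], target, rng)
    return total / num_trees


# Usage: a 2-D cloud of "normal" points plus one far-away point.
data_rng = random.Random(42)
history = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(500)]
print(average_depth(history, (8.0, 8.0)))  # isolated point: few cuts
print(average_depth(history, (0.0, 0.0)))  # point in the bulk: many cuts
```

The isolated point at (8, 8) needs far fewer cuts on average than a point in the middle of the data, which is the property the description above relies on.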
Anomaly detection is a powerful monitoring feature that uses algorithmic analysis to automatically identify unexpected behavior in metric data. Traditional threshold-based alerting often fails to account for trends, seasonality, or complex fluctuations in metrics. Anomaly detection algorithms overcome this limitation by analyzing historical patterns to establish dynamic boundaries, making it possible to detect deviations from normal behavior even as the data changes over time. Kloudfuse provides four anomaly detection algorithms: Basic, Agile, Robust, and AgileRobust.
Key Arguments
Rolling Window Size:
The rolling window size is used to calculate the standard deviation (std) for setting the band limits around expected values, which helps define the "normal" range.
A larger window size smooths the standard deviation calculation, reducing sensitivity to short-term fluctuations and providing a stable range for expected behavior. However, this may delay detection of rapid changes.
Bands:
Band 1 (Narrow): Sets a tight range around expected values, making the algorithm highly sensitive to even small deviations. This band is ideal for detecting subtle anomalies, which could be early indicators of a potential issue.
Band 2 (Moderate): Offers a balanced range, capturing moderate deviations without excessive sensitivity to minor fluctuations. This is suitable for general anomaly detection, where both significant and moderate changes are relevant.
Band 3 (Wide): Provides the widest range, capturing only large deviations from the expected values. This is useful for minimizing false positives, focusing only on major anomalies that could indicate significant issues.
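The interaction between the rolling window size and the bands can be sketched in Python. This sketch assumes that band N means N rolling standard deviations around the rolling mean and that the window is measured in samples; both are assumptions, since the exact band widths Kloudfuse uses are not specified here. The function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Optional, Tuple

Limits = Tuple[float, float]


def rolling_limits(values: List[float], window: int, band: int) -> List[Optional[Limits]]:
    """(lower, upper) limits for each point from the previous `window` points:
    rolling mean +/- band * rolling standard deviation. None until a full
    window is available."""
    limits: List[Optional[Limits]] = []
    for i in range(len(values)):
        if i < window:
            limits.append(None)
            continue
        history = values[i - window:i]
        center, spread = mean(history), pstdev(history)
        limits.append((center - band * spread, center + band * spread))
    return limits


def flag_anomalies(values: List[float], window: int, band: int) -> List[int]:
    """Indexes of points that fall outside their rolling limits."""
    flagged = []
    for i, lim in enumerate(rolling_limits(values, window, band)):
        if lim is not None and not lim[0] <= values[i] <= lim[1]:
            flagged.append(i)
    return flagged


# Usage: a steady signal with a small bump (index 20) and a large spike (index 31).
values = [10, 11] * 10 + [11.8] + [10, 11] * 5 + [20] + [10, 11] * 3
print(flag_anomalies(values, window=6, band=1))  # narrow: includes the bump at 20
print(flag_anomalies(values, window=6, band=3))  # wide: [31], only the spike
```

With the narrow band, the small bump to 11.8 is flagged along with some of the points that follow it while the bump is still inside the window; with the wide band, only the spike to 20 is flagged.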
Seasonality:
Selects a single primary seasonal pattern based on the expected periodicity of the data. For instance, Daily seasonality is ideal for data that follows a consistent daily pattern, while Weekly is useful for data that repeats weekly.
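The seasonality setting determines which earlier points a value is compared with. A minimal sketch, assuming evenly spaced samples and a hypothetical mapping from the seasonality setting to a period length; the expectation here is simply the average of the values at the same position in recent seasons, not the model Kloudfuse uses.

```python
from datetime import timedelta
from typing import Dict, List

# Hypothetical mapping from a seasonality setting to the length of one season.
SEASON_LENGTH: Dict[str, timedelta] = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def period_in_samples(seasonality: str, step: timedelta) -> int:
    """Number of evenly spaced samples in one season."""
    return int(SEASON_LENGTH[seasonality] / step)


def seasonal_expectation(values: List[float], period: int, seasons: int) -> float:
    """Expected value for the next sample: the mean of the samples at the
    same position in each of the last `seasons` seasons."""
    n = len(values)  # index of the next, not yet observed, sample
    same_phase = [values[n - k * period] for k in range(1, seasons + 1)
                  if n - k * period >= 0]
    if not same_phase:
        raise ValueError("not enough history for one full season")
    return sum(same_phase) / len(same_phase)


# Usage: hourly samples, three days of a business-hours traffic pattern.
period = period_in_samples("daily", timedelta(hours=1))  # 24
history = [50.0 if 9 <= hour < 17 else 10.0 for _ in range(3) for hour in range(24)]
print(seasonal_expectation(history, period, seasons=3))  # next sample is 00:00 -> 10.0
```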
Basic (Rolling-Quantile)
The Basic Anomaly Detection algorithm provides a straightforward way to identify unusual behavior in metric data by calculating rolling quantiles. It is well-suited for metrics without strong seasonal patterns or trends, where simple threshold-based monitoring may not be sufficient for capturing all anomalies. This algorithm allows you to define a range of expected values based on historical data, with deviations outside this range flagged as anomalies.
For example, Basic Anomaly Detection can help you spot unexpected drops in CPU utilization that might signal an issue with a server, or sudden spikes in network traffic that could indicate potential security incidents.
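One plausible reading of the rolling-quantile approach is to derive the expected range from low and high quantiles of a trailing window and flag values outside it. In this sketch the window length and the 0.01/0.99 quantiles are assumptions, the function names are hypothetical, and the quantile uses linear interpolation between the closest ranks.

```python
from typing import List


def quantile(sorted_values: List[float], q: float) -> float:
    """q-th quantile (0 <= q <= 1) with linear interpolation between ranks."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def rolling_quantile_anomalies(values: List[float], window: int,
                               lower_q: float = 0.01, upper_q: float = 0.99) -> List[int]:
    """Indexes whose value falls outside the [lower_q, upper_q] quantile
    range of the preceding `window` values."""
    flagged = []
    for i in range(window, len(values)):
        history = sorted(values[i - window:i])
        low, high = quantile(history, lower_q), quantile(history, upper_q)
        if values[i] < low or values[i] > high:
            flagged.append(i)
    return flagged


# Usage: CPU utilization (%) that hovers around 40-42 and then drops sharply.
cycle = [40, 42, 41, 42, 40, 41, 42, 41, 40, 42]
cpu = cycle * 3 + [5] + [41, 42]
print(rolling_quantile_anomalies(cpu, window=10))  # [30]: the drop to 5%
```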
Key Arguments
...
Agile (SARIMA)
The Agile Anomaly Detection algorithm leverages the SARIMA (Seasonal AutoRegressive Integrated Moving Average) model to detect anomalies in metrics with predictable, short-term seasonal patterns and occasional abrupt level shifts. Agile is well-suited for metrics with daily or hourly cycles, allowing for rapid adaptation to sudden changes while accurately capturing short, repeating patterns.
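SARIMA handles repeating cycles partly through seasonal differencing, the seasonal "Integrated" term that subtracts the value one season earlier. The sketch below shows only that step, not a full SARIMA fit (which also estimates autoregressive and moving-average terms); the function name is hypothetical.

```python
from typing import List


def seasonal_difference(values: List[float], period: int) -> List[float]:
    """y[t] - y[t - period] for every t with a full season of history.

    A pattern that repeats every `period` samples cancels out, leaving
    changes that the seasonal cycle does not explain.
    """
    return [values[t] - values[t - period] for t in range(period, len(values))]


# Usage: a 4-sample cycle that jumps up by 10 (an abrupt level shift) at index 16.
series = [1, 5, 3, 2] * 4 + [11, 15, 13, 12] * 3
print(seasonal_difference(series, period=4))
# Zeros, then four 10s while the shift enters, then zeros again.
```

After the shift, the differenced series shows one season of large values and then returns to zero, which is one reason a seasonally differenced model can adapt quickly to an abrupt level shift.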
Key Arguments
Bands: Controls detection sensitivity (1, 2, or 3).
...
Robust (Seasonal Decompose)
The Robust Anomaly Detection algorithm uses seasonal decomposition to separate trend, seasonal, and residual components, making it effective for metrics with stable, well-defined seasonality and trend patterns. It’s ideal for metrics with predictable cycles, like weekly peaks in web traffic, where regular seasonal trends need to be accounted for in anomaly detection.
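A classical additive decomposition illustrates the idea: estimate the trend with a centered moving average, estimate the seasonal component by averaging the detrended values at each phase of the cycle, and treat what remains as the residual. This sketch is not Kloudfuse's implementation; the function names and the 3-standard-deviation residual cutoff are assumptions, and the series must span at least two full periods.

```python
from statistics import mean, pstdev
from typing import List, Optional, Tuple


def centered_moving_average(y: List[float], period: int) -> List[Optional[float]]:
    """Trend estimate. Uses a 2 x period moving average when period is even,
    so every phase of the cycle gets equal weight."""
    half = period // 2
    trend: List[Optional[float]] = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half:t + half + 1]
        if period % 2:
            trend[t] = sum(window) / period
        else:
            trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    return trend


def decompose(y: List[float], period: int
              ) -> Tuple[List[Optional[float]], List[float], List[Optional[float]]]:
    """Additive decomposition y = trend + seasonal + residual."""
    trend = centered_moving_average(y, period)
    by_phase: List[List[float]] = [[] for _ in range(period)]
    for t, tr in enumerate(trend):
        if tr is not None:
            by_phase[t % period].append(y[t] - tr)
    seasonal = [mean(v) for v in by_phase]
    offset = mean(seasonal)
    seasonal = [s - offset for s in seasonal]  # center the cycle on zero
    residual = [None if tr is None else y[t] - tr - seasonal[t % period]
                for t, tr in enumerate(trend)]
    return trend, seasonal, residual


def residual_anomalies(y: List[float], period: int, k: float = 3.0) -> List[int]:
    """Indexes whose residual exceeds k standard deviations of all residuals."""
    _, _, residual = decompose(y, period)
    spread = pstdev([r for r in residual if r is not None])
    return [t for t, r in enumerate(residual) if r is not None and abs(r) > k * spread]


# Usage: upward trend + repeating 4-step cycle, with one injected spike.
pattern = [0.0, 5.0, -2.0, -3.0]
y = [0.5 * t + pattern[t % 4] for t in range(48)]
y[21] += 20.0
print(residual_anomalies(y, period=4))  # [21]
```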
Key Arguments
AgileRobust (Prophet)
The Agile Robust Anomaly Detection algorithm leverages the Prophet model to detect anomalies in metrics with complex seasonal patterns and trend shifts. Prophet is designed to handle multiple seasonal patterns with underlying trends and is particularly effective for metrics that show strong seasonal cycles alongside irregular level shifts. This makes it ideal for metrics that combine predictable patterns with occasional, unpredictable changes.
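Prophet models a series as a sum of components: a trend, seasonal terms, holiday effects, and noise. Each seasonality is represented by a Fourier series, s(t) = Σ (a_n cos(2πnt/P) + b_n sin(2πnt/P)) for n = 1 to N, whose coefficients are fitted from the data. The sketch below only builds the Fourier features for a daily and a weekly seasonality side by side, which is how several seasonal patterns can coexist in one model; the Fourier orders and function names are illustrative assumptions, and the fitting step is left to Prophet itself.

```python
import math
from typing import List


def fourier_features(t_days: float, period_days: float, order: int) -> List[float]:
    """cos/sin pairs for n = 1..order of one seasonality with period P:
    cos(2*pi*n*t/P), sin(2*pi*n*t/P)."""
    features: List[float] = []
    for n in range(1, order + 1):
        angle = 2 * math.pi * n * t_days / period_days
        features.extend([math.cos(angle), math.sin(angle)])
    return features


def seasonal_row(t_days: float) -> List[float]:
    """Daily and weekly seasonal features side by side in one regression row.
    The orders 4 (daily) and 3 (weekly) are illustrative choices."""
    return fourier_features(t_days, 1.0, order=4) + fourier_features(t_days, 7.0, order=3)


# Usage: features for 06:00 on the third day of the series (t = 2.25 days).
row = seasonal_row(2.25)
print(len(row))  # 14: (4 + 3) cos/sin pairs
```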
Key Arguments
Outlier Detection
The Kloudfuse platform can perform outlier detection on the underlying data. Outlier monitoring is effective for detecting deviations in behavior compared with other similar entities in the cluster. For example, CPU usage per pod for a service with 3 replicas should be similar across all 3 pods. If one pod uses more or less CPU than the others, it is an outlier. Kloudfuse provides the following algorithms for outlier detection:
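DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a popular clustering algorithm. It groups points that lie close together in dense regions and labels points that do not belong to any dense region as noise; in outlier detection, these noise points are the candidate outliers.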
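A compact DBSCAN sketch shows how points labeled as noise become outliers. It assumes each pod is summarized by a single value (for example, average CPU usage over the evaluation window), which is a simplification; the `eps` and `min_samples` values and the pod names are hypothetical. scikit-learn's `sklearn.cluster.DBSCAN` exposes the same two parameters and also labels noise as -1.

```python
import math
from typing import List, Sequence

UNVISITED = -2
NOISE = -1


def dbscan(points: List[Sequence[float]], eps: float, min_samples: int) -> List[int]:
    """Cluster ids per point; NOISE (-1) marks points in no dense region."""
    labels = [UNVISITED] * len(points)

    def neighbors(i: int) -> List[int]:
        # Includes i itself, as in the usual DBSCAN definition.
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not expanded
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep growing
        cluster += 1
    return labels


# Usage: average CPU cores per pod for a service with 3 replicas.
cpu_by_pod = {"pod-a": 0.50, "pod-b": 0.52, "pod-c": 0.95}
labels = dbscan([(v,) for v in cpu_by_pod.values()], eps=0.1, min_samples=2)
outliers = [pod for pod, label in zip(cpu_by_pod, labels) if label == NOISE]
print(outliers)  # ['pod-c']
```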
Forecasting
Forecasting allows you to predict future values of a metric based on historical data trends, helping you anticipate changes and plan resources proactively. By analyzing past patterns, forecasting algorithms can model trends, seasonal behaviors, and cyclical patterns, creating forward-looking insights that support decision-making and preemptive action.
Forecasting is especially useful for metrics that follow consistent trends or seasonal cycles, such as user traffic, CPU utilization, or application performance metrics. For example, forecasting can help you predict daily traffic peaks on a website or anticipate CPU load during specific times of the week, allowing teams to allocate resources accordingly.
Kloudfuse offers two forecasting algorithms to help users anticipate metric behavior.
Linear regression
...
Linear forecasting in Kloudfuse is powered by PromQL’s linear regression capabilities, allowing straightforward prediction of future metric values based on a consistent linear trend. This approach is ideal for metrics that demonstrate steady growth or decline over time, without strong seasonal fluctuations. For example, linear forecasting can effectively model gradual increases in memory usage or a steady upward trend in active user counts.
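The least-squares extrapolation behind this approach can be sketched in Python. PromQL's `predict_linear` function follows the same idea: it fits a simple linear regression to the samples in a range vector and extrapolates a given number of seconds ahead. The function below, `linear_forecast`, is a hypothetical re-implementation for illustration, not Kloudfuse or Prometheus code; timestamps are in seconds and must not all be equal.

```python
from typing import List, Tuple


def linear_forecast(samples: List[Tuple[float, float]], eval_time: float,
                    horizon_s: float) -> float:
    """Fit a least-squares line to (timestamp_s, value) samples and return its
    value `horizon_s` seconds after `eval_time`."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    # Measure time relative to eval_time so the intercept is the fitted value there.
    xs = [t - eval_time for t, _ in samples]
    ys = [v for _, v in samples]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept + slope * horizon_s


# Usage: memory (MiB) sampled every 60 s for an hour, growing 1 MiB per minute.
memory = [(float(t), 500.0 + t / 60) for t in range(0, 3601, 60)]
print(linear_forecast(memory, eval_time=3600.0, horizon_s=4 * 3600))  # about 800.0
```

Because the samples grow exactly 1 MiB per minute, the fit recovers the line and predicts 800 MiB four hours after the last sample.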
Seasonal (Prophet)
Seasonal forecasting leverages the Prophet model to capture metrics with strong, regular seasonal patterns. Prophet is particularly effective for metrics that show predictable hourly, daily or weekly cycles, such as website traffic or workload variations that repeat over time. By modeling these recurring patterns, Prophet-based forecasting provides more accurate predictions for metrics with complex seasonal behaviors.
Key Arguments
Auto Alerting: Hawkeye
Kloudfuse Analytics provides an out-of-the-box auto alerting feature for various entities. In most cases, it requires only simple configuration; internally, auto alerting uses the advanced functions that the Kloudfuse platform provides to monitor your cluster.
Having the required data and unified data streams is central to the Kloudfuse platform's ability to perform auto alerting. The Hawkeye service is designed to intelligently monitor user-controllable entities in your infrastructure for abnormal behavior, using an approach suited to each type of entity.
Anomalies: Services
Knight automatically discovers peer-to-peer communication between services and tracks it across various protocols. The discovered services and their connections to other services (entities) are shown in the service list UI. The service map is also built from this communication, with each connection between services as an edge. Both the service list and the service map record the RED (Rate, Errors, Duration) metrics for each service or edge.
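RED (Rate, Errors, Duration) metrics for service-to-service edges can be sketched as a per-edge aggregation. This assumes a hypothetical `Request` record carrying the calling service, the called service, the duration, and an error flag; Knight's actual data model is not described here. Duration is summarized as a nearest-rank 99th percentile, which is one common choice among several.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Request:
    source: str        # calling service
    destination: str   # called service
    duration_ms: float
    is_error: bool


@dataclass
class RedMetrics:
    rate_per_s: float        # Rate: requests per second
    error_ratio: float       # Errors: fraction of failed requests
    p99_duration_ms: float   # Duration: 99th percentile latency


def red_per_edge(requests: List[Request],
                 window_s: float) -> Dict[Tuple[str, str], RedMetrics]:
    """Aggregate requests observed during `window_s` seconds into RED metrics
    for each (source, destination) edge of the service map."""
    by_edge: Dict[Tuple[str, str], List[Request]] = defaultdict(list)
    for req in requests:
        by_edge[(req.source, req.destination)].append(req)
    result = {}
    for edge, reqs in by_edge.items():
        durations = sorted(r.duration_ms for r in reqs)
        rank = math.ceil(99 * len(durations) / 100)  # nearest-rank percentile
        result[edge] = RedMetrics(
            rate_per_s=len(reqs) / window_s,
            error_ratio=sum(r.is_error for r in reqs) / len(reqs),
            p99_duration_ms=durations[rank - 1],
        )
    return result


# Usage: 100 calls from "frontend" to "checkout" in one minute, 1 in 10 failing.
calls = [Request("frontend", "checkout", float(i), i % 10 == 0) for i in range(1, 101)]
metrics = red_per_edge(calls, window_s=60.0)[("frontend", "checkout")]
print(metrics)
```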
...