The Kloudfuse platform can perform anomaly detection on the underlying data. Anomaly monitoring is effective for spotting deviations in a signal's behavior compared with its own past behavior. For example, is the number of requests unusually different from its past behavior? Kloudfuse provides the following algorithms for anomaly detection:
Basic (Rolling-Quantile) is a statistical learning algorithm that detects anomalies based on earlier behavior, as measured by the specified quantile. The following parameters are provided to tune the algorithm:
Threshold: the number of standard deviations a value has to be away from the mean to be considered anomalous. For example, a value of 1 corresponds to the 0.68 quantile, since about 68% of normally distributed values lie within one standard deviation of the mean.
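The relationship between the threshold and the quantile can be illustrated with a small sketch. This is not Kloudfuse's implementation: the function names, the `window` parameter, and the choice to flag values outside the central band of a rolling window are all assumptions made for illustration. The only detail taken from the text above is that a threshold of 1 maps to roughly 0.68.

```python
from collections import deque
from statistics import NormalDist


def coverage_for_threshold(threshold):
    """Fraction of a normal distribution within `threshold` standard deviations of the mean."""
    return 2.0 * NormalDist().cdf(threshold) - 1.0


def empirical_quantile(sorted_values, q):
    """Linearly interpolated quantile (0 <= q <= 1) of an already sorted, non-empty list."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1.0 - fraction) + sorted_values[upper] * fraction


def rolling_quantile_anomalies(series, window, threshold):
    """Flag each value that falls outside the central quantile band of the previous `window` values.

    Assumption: the band is the central `coverage` fraction of recent history,
    where `coverage` is derived from the threshold as if the data were normal.
    """
    coverage = coverage_for_threshold(threshold)
    lower_q = (1.0 - coverage) / 2.0
    upper_q = (1.0 + coverage) / 2.0
    history = deque(maxlen=window)
    flags = []
    for value in series:
        if len(history) < window:
            flags.append(False)  # not enough history yet to judge this value
        else:
            ordered = sorted(history)
            low = empirical_quantile(ordered, lower_q)
            high = empirical_quantile(ordered, upper_q)
            flags.append(value < low or value > high)
        history.append(value)
    return flags


if __name__ == "__main__":
    requests_per_minute = [10, 11, 9, 10, 12, 10, 11, 50]
    flags = rolling_quantile_anomalies(requests_per_minute, window=5, threshold=3)
    print([i for i, flagged in enumerate(flags) if flagged])
```

A larger threshold widens the band and produces fewer alerts; a smaller one makes the detector more sensitive.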
RRCF (Robust Random Cut Forest) is a machine learning algorithm used for detecting anomalies in large datasets. It uses a tree-based ensemble method to identify outliers based on their relative isolation within the dataset. The algorithm constructs a set of binary trees from random subsamples of the data, where each tree repeatedly splits the data with random cuts, and estimates how isolated each point is from how few cuts are needed to separate it from the rest of the data. Anomalies are identified as points that are separated after fewer cuts than the majority of points in the dataset, indicating that they are more isolated and potentially more unusual. The algorithm is robust to high-dimensional data, skewed distributions, and the presence of noisy or irrelevant features, making it well suited to a wide range of applications in anomaly detection and outlier analysis. The following parameters are provided to tune the algorithm further:
global window: the time window to use for the rolling dataset (built from the metric query run over this time window). At any point in time, the RRCF algorithm captures the signal behavior seen over this time window.
local window: the time window to use for capturing the signal behavior in the recent past.
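The isolation idea can be sketched in a few lines. The code below is a deliberate simplification and not Kloudfuse's implementation: it only measures how many random cuts are needed to isolate each point in a fixed set of points, whereas a full RRCF scores points by collusive displacement and updates its trees as the window slides. The function names and the `num_trees` and `seed` parameters are hypothetical, and the sketch does not model the global or local window.

```python
import random


def random_cut_depths(points, indices, rng, depth=0, out=None):
    """Split the points with random cuts and record the depth at which each one is isolated."""
    out = {} if out is None else out
    if len(indices) == 1:
        out[indices[0]] = depth
        return out
    dims = range(len(points[0]))
    lows = [min(points[i][d] for i in indices) for d in dims]
    highs = [max(points[i][d] for i in indices) for d in dims]
    spans = [high - low for low, high in zip(lows, highs)]
    if sum(spans) == 0:  # identical points cannot be separated any further
        for i in indices:
            out[i] = depth
        return out
    # Pick the cut dimension with probability proportional to its span.
    d = rng.choices(dims, weights=spans)[0]
    while True:
        cut = rng.uniform(lows[d], highs[d])
        left = [i for i in indices if points[i][d] <= cut]
        right = [i for i in indices if points[i][d] > cut]
        if left and right:  # retry in the rare case the cut lands on the upper bound
            break
    random_cut_depths(points, left, rng, depth + 1, out)
    random_cut_depths(points, right, rng, depth + 1, out)
    return out


def mean_isolation_depths(points, num_trees=100, seed=0):
    """Average isolation depth per point over many random trees; lower means more isolated."""
    rng = random.Random(seed)
    totals = [0.0] * len(points)
    for _ in range(num_trees):
        depths = random_cut_depths(points, list(range(len(points))), rng)
        for i, depth in depths.items():
            totals[i] += depth
    return [total / num_trees for total in totals]


if __name__ == "__main__":
    # Twenty points in a tight cluster plus one point far away from it.
    window = [(i * 0.1, (i % 5) * 0.1) for i in range(20)] + [(10.0, 10.0)]
    scores = mean_isolation_depths(window)
    most_isolated = scores.index(min(scores))
    print(most_isolated, window[most_isolated])
```

The point far from the cluster is usually cut off by the first or second split, so its average depth is much lower than that of the clustered points.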
The Kloudfuse platform can perform outlier detection on the underlying data. Outlier monitoring is effective for spotting deviations in an entity's behavior compared with other similar entities in the cluster. For example, CPU usage per pod for a service with 3 replicas should be similar across all 3 pods. If one pod uses more or less CPU than the others, it is an outlier. Kloudfuse provides the following algorithms for outlier detection:
DBSCAN (density-based spatial clustering of applications with noise) is a popular clustering algorithm. It groups points that are densely packed together and labels points in low-density regions as noise, which makes it a natural fit for outlier detection.
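To make the pod example concrete, here is a minimal one-dimensional DBSCAN sketch in plain Python. It is an illustration under assumptions, not Kloudfuse's implementation: the `eps` and `min_samples` parameters follow the usual DBSCAN formulation (with `min_samples` counting the point itself), and the pod names and CPU values are made up.

```python
def dbscan_1d(values, eps, min_samples):
    """Return a cluster id per value, or -1 for noise (an outlier).

    A point is a core point if at least `min_samples` values (itself included)
    lie within `eps` of it. Clusters grow outward from core points.
    """
    n = len(values)
    neighbors = [
        [j for j in range(n) if abs(values[i] - values[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # core point: keep growing the cluster
    return labels


if __name__ == "__main__":
    # Hypothetical CPU usage (in cores) for a service with 3 replicas.
    cpu_by_pod = {"pod-a": 0.50, "pod-b": 0.52, "pod-c": 0.95}
    labels = dbscan_1d(list(cpu_by_pod.values()), eps=0.05, min_samples=2)
    outliers = [pod for pod, label in zip(cpu_by_pod, labels) if label == -1]
    print(outliers)
```

Here `pod-a` and `pod-b` fall within `eps` of each other and form a cluster, while `pod-c` has no close neighbor and is labeled noise.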
Kloudfuse Analytics provides auto-alerting on various entities out of the box. In most cases this requires only simple configuration; internally, auto-alerting uses the advanced functions that the Kloudfuse platform provides to monitor your cluster.
Having the required data and unifying the streams are central to the Kloudfuse platform's ability to do auto-alerting. The HawkEye service is designed to intelligently monitor user-controllable entities in the infrastructure for abnormal behavior, in a way that depends on the entity.
Knight automatically discovers peer-to-peer communication between services. The communication is tracked for various protocols. The services and their connections to other services (entities) are discovered and shown in the service list UI. The service map is also built from this communication, with each connection as an edge. Both the service list and the service map record the RED (rate, errors, duration) metrics for each service or edge.
Using this data, HawkEye looks for anomalies in real time using state-of-the-art statistical learning algorithms or service level objectives, as configured. If anomalous behavior is detected, an alert is raised, which is then evaluated by BullsEye.
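The per-edge RED bookkeeping described above can be pictured with a small data-structure sketch. It is purely illustrative: the record layout `(source, destination, duration_ms, is_error)`, the `RedStats` class, and `build_service_map` are hypothetical names, not Knight's or HawkEye's actual interfaces.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RedStats:
    """Rate, errors, and duration counters for one service-to-service edge."""
    requests: int = 0
    errors: int = 0
    total_duration_ms: float = 0.0

    def rate_per_second(self, window_seconds):
        return self.requests / window_seconds

    def error_ratio(self):
        return self.errors / self.requests if self.requests else 0.0

    def mean_duration_ms(self):
        return self.total_duration_ms / self.requests if self.requests else 0.0


def build_service_map(calls):
    """Aggregate observed calls into a map from (source, destination) edge to RED counters."""
    edges = defaultdict(RedStats)
    for source, destination, duration_ms, is_error in calls:
        stats = edges[(source, destination)]
        stats.requests += 1
        stats.errors += int(is_error)
        stats.total_duration_ms += duration_ms
    return dict(edges)


if __name__ == "__main__":
    observed_calls = [
        ("web", "api", 10.0, False),
        ("web", "api", 30.0, True),
        ("api", "db", 5.0, False),
    ]
    for edge, stats in build_service_map(observed_calls).items():
        print(edge, stats.rate_per_second(60), stats.error_ratio(), stats.mean_duration_ms())
```

Each of the three series per edge (rate, error ratio, duration) is then a candidate signal for the anomaly detectors or service level objectives mentioned above.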
Using Kubernetes data, Kloudfuse automatically monitors resource consumption for outliers.
Forecasting: Persistent Volumes *
Using Kubernetes data, Kloudfuse automatically monitors the forecasted behavior of persistent volumes to watch for capacity exhaustion.
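The text does not say which forecasting model Kloudfuse uses. As one simple way to picture capacity forecasting, the sketch below fits a least-squares line to recent usage samples and extrapolates to the volume's capacity. The function name, the sample format, and the linear-trend assumption are all hypothetical.

```python
def hours_until_full(samples, capacity_gib):
    """Estimate hours from the last sample until the volume is full.

    `samples` is a list of (hour, used_gib) pairs. Returns None when usage
    is flat or shrinking, since a linear trend then never reaches capacity.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    variance_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if variance_t == 0:
        return None  # all samples share one timestamp; no trend can be fitted
    covariance = sum((t - mean_t) * (u - mean_u) for t, u in samples)
    slope = covariance / variance_t  # GiB per hour
    if slope <= 0:
        return None
    intercept = mean_u - slope * mean_t
    hour_full = (capacity_gib - intercept) / slope
    return max(0.0, hour_full - samples[-1][0])


if __name__ == "__main__":
    usage = [(0, 10.0), (1, 12.0), (2, 14.0), (3, 16.0)]
    print(hours_until_full(usage, capacity_gib=20.0))
```

An alert could then fire when the estimate drops below a chosen lead time, leaving room to resize the volume before it fills up.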
BullsEye: Auto Analysis
The BullsEye service is designed to analyze signals and correlate them with other signals within the same stream or across streams. This analysis reduces the effort required to debug issues and helps narrow down problematic areas in minutes.
BullsEye can be configured to run analysis when an alert fires or on an ad-hoc basis. Please follow these steps to enable it.