Overview
Logging is a crucial data stream for debugging and troubleshooting incidents. With the Kloudfuse platform, users don’t have to worry about which logs to ingest or index. The Kloudfuse stack stores logs efficiently and achieves a high compression ratio for ingested logs through its innovative fingerprinting technology. Fingerprinting allows the kfuse stack to auto-index the facets found in application and system logs and enables faster searches on these facets without requiring manual index creation. Users can add arbitrary label/tag (key/value) pairs to ingested logs, which can be used to further narrow down log searches. Infrastructure labels/tags from Kubernetes and the cloud are added automatically as logs flow through the log collection agent and the stack. This further reduces operational overhead and speeds up searches by giving users the ability to narrow down the scope of log searches.
The kfuse stack allows users to work with log data in various ways. Logs can be searched using the logs “Search” view, which auto-populates the facets automatically derived from the logs and the tags added by the user and the infrastructure. The “Fingerprint” view can be used to summarize the logs by their unique fingerprints rather than looking at numerous raw log events. The “Metrics” view allows metrics to be derived from the ingested logs. Furthermore, the log-derived metrics can be saved and exported to the metric system for longer retention.
Log Fingerprinting
Application and infrastructure logs are voluminous and repetitive. In a traditional logging system, users are forced to put aggressive filters and limits on which logs get indexed and become instantly queryable, even though having all applicable logs indexed is better when troubleshooting incidents. Hence there is a constant tussle between the need to index logs and the cost imposed by traditional logging systems. Kloudfuse side-steps this problem by reducing the storage footprint of logs tremendously (up to 100X for typical deployments). The Kloudfuse logging system extracts a unique fingerprint from the raw logs that exploits the repetitive nature of log events. A typical log line contains static strings written by developers and a set of strings/numbers derived from runtime information (such as customer id, item price, or user name). The kfuse stack creates a fingerprint of the log line by splitting it into its static and dynamic parts. By storing the static fingerprint and the dynamic values separately, the kfuse stack stores log events in a cost-effective manner.
For example, for the following logline:
ts=2023-02-01T22:51:33Z caller=logging.go:29 method=Authorise result=false took=9.775µs
the fingerprint is:
ts=<v_0> caller=<v_1> method=<v_2> result=<v_3> took=<v_4>
the auto-extracted log facets are:
caller: logging.go:29
method: Authorise
result: false
took: 9.775µs
ts: 2023-02-01T22:51:33Z
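For illustration (the values below are made up), another log event from the same code path would map to the same fingerprint, with only the dynamic values differing:
ts=2023-02-01T22:51:34Z caller=logging.go:29 method=Authorise result=true took=11.2µs
Because the shared fingerprint is stored once and only the dynamic values are stored per event, repetitive logs compress extremely well.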
Logs Search View
Kfuse allows users to search logs instantly as log events are ingested. The kfuse stack auto-extracts facets from the log events and makes them available for further search and analytics. The “Logs” view provides a summarized view of the logs ingested in the selected time range, broken down by severity level. The facets on the left-hand side list the various sources of log events. A source is defined by the kind of application/infra component emitting the logs and is set by adding a tag called “source” to the corresponding log events. Within each source, the auto-detected log facets are listed. Kfuse lists the facet name (e.g., method), the facet values, and the counts of the value cardinalities, as shown in the figure below. The facets can be used to narrow down the logs of interest by either including or excluding them. The Labels list on the left-hand side allows the user to narrow down the logs by the tags added to log events by the external environment: “Cloud” lists tags such as cloud region and availability zone, “Kubernetes” lists tags such as pod names and Kubernetes service, and “Additional” lists any other user-defined tags. These tags let the user filter the log search scope and result in faster search times. The search box can be used to filter logs by free-text search, and complex search queries can be entered there as well.
The log events list presents the filtered logs based on the log query, composed of the selected facets, labels, and free-text search strings, for the time range selected by the time picker. Clicking on an individual log event shows a detailed view of the log event, including its fingerprint, auto-extracted tags, and environment labels.
The following sections describe the various kinds of filters/searches available in the Logs search view.
Labels filter
Label filters let you narrow down logs based on infrastructure labels that are sent as part of the log events. To filter based on a label, expand a label category such as source, Cloud, Kubernetes, or Additional and Include or Exclude a label value. These filters are automatically added to the Search Logs box at the top.
Label filters can also be added directly by typing the required label name and value in the Search Logs box, using the auto-complete functionality.
Log Facets filter
Log Facets are auto-extracted facets derived from the log line itself. To explore and filter by Log Facets, expand a source under the Sources section to view the log facets associated with that source. Expand a log facet to view its values and their counts. To view the log lines with a given value, click on the facet value to Include/Exclude it.
Log facet search can also be done directly from the search bar. Include the @ prefix to denote a log facet. For ex: @source.facet="value"
Auto-complete support is available for log facet names, but not for log facet values.
Grep filter
Grep filters (log line contains) can be applied directly by typing the grep string in the search box.
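For example, typing Authorise in the search box returns all log lines that contain that string, such as the log line shown in the fingerprinting section above.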
Filter operators
The following filter operators are supported:
= (equals)
!= (not equals)
=~ (regex)
!~ (not regex)
These operators can be applied to labels, log facets, and grep queries. Select the operator from the search box drop-down before typing the filter query in the logs search box.
Note: equals and not equals perform an exact match for labels and log facets. For grep filters, they are treated as contains and not contains respectively.
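For illustration, using the @source.facet form described above with hypothetical facet names, the resulting filters could take forms such as:
@nginx.responseCode="503" (equals)
@nginx.responseCode!="200" (not equals)
@nginx.uri=~"/api/.*" (regex)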
Log Event detail view
The log event detail view has log details including:
log line
fingerprint
log facets
labels
The detail view can be accessed by clicking the log event detail icon that appears when you hover over a log line, or by click+enter from the log list view. From the log event detail view, the user can filter using the fingerprint, log facets, and labels associated with the fingerprint.
Rename auto-detected facets
As a result of fingerprinting incoming log lines, tokens such as numbers, IP addresses, durations, sizes, and UUIDs are auto-detected. These tokens are automatically assigned log facet names such as _number_0, _ip_address_0, and _duration_0. These auto-assigned log facet names can be renamed by the user to something more readable from the log event detail view. The renaming is scoped to a fingerprint.
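For example, a bare value such as 350ms appearing in a log line without a key might be captured as _duration_0; from the detail view it could be renamed to a more descriptive facet name such as request_latency (a hypothetical name), scoped to that fingerprint.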
Fingerprints View
Fingerprints view provides a concise view of the unique logs ingested by the stack. This bird’s-eye view is very helpful when looking at the huge number of logs typically emitted by production systems. For example, by filtering the logs by a severity level of ERROR, the user can easily see the different kinds of error events observed by the system without hunting for them one by one. Fingerprints can be searched just like log events. Moreover, the logs can be filtered to include or exclude only those with a selected fingerprint. This serves as a smarter grep: you can quickly filter logs without finding unique strings to grep or grep -v by.
Log Analytics
Logs contain a lot of valuable information. Quite often developers instrument the application to log various application metrics within the log line. Though convenient at development time, this makes it harder to analyze the system in production, since the logs must then be indexed properly to extract such metrics. The Kloudfuse platform makes metric generation from log lines fairly easy thanks to its unique fingerprinting technology. The Kloudfuse stack auto-extracts metric facets and highlights them under each source category. The “Log Analytics” view allows users to select the numeric facets to be charted along with the aggregates to apply. The log lines containing the metric of interest can be filtered just as in log search (or the filter can be skipped). Range Aggregate applies time aggregation so that events from multiple log lines can be aggregated across everything or across common dimensions selected using the “Grouping options”. These dimensions can include facets extracted from the log line or environment tags like pod_name, service name, etc. The chart is generated dynamically from the log lines and can be used for ad-hoc analysis during troubleshooting. To save the resulting metric, refer to the next section.
Log facet selector
selector for log facet or count based metric to chart
Facet normalization function
function used to normalize the log facet to a numerical value
number - parse the log facet as a double value
count - normalize to 1 if the selected facet exists
duration - normalize a duration string to seconds. Valid time units are ns, us (or "µs"), ms, s, m, h. example: 1h30m
bytes - normalize a size string to bytes. Valid size units are KB, MB, GB, TB, PB, KiB, MiB, GiB, TiB, PiB. example: 10MB
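For example, applying duration to the took facet value 9.775µs from the earlier log line yields 0.000009775 (seconds); applying bytes to a value of 10MB yields 10000000 (bytes, taking MB as the decimal unit as distinct from MiB).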
Range (time) aggregate: aggregate discrete points in time to produce one value per time-series and time-step. The aggregates are applied to log events that satisfy the log filters
Count based log metrics
rate : rate of log events at every time-step, i.e., count/time-step_seconds
count_over_time : count of log events at every time-step
Log Facet based log metrics
rate_counter : rate of monotonically increasing counter
sum_over_time
avg_over_time
max_over_time
min_over_time
first_over_time
last_over_time
stdvar_over_time
stddev_over_time
quantile_over_time
Range aggregate grouping
labels that define the time-series. Log events are grouped by the labels and each group becomes a time-series
default grouping behavior is to group everything into one time series (except for rate and rate_counter which do not support grouping)
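For example, with a 1m time-step, a group that contains 300 matching log events in a given step yields count_over_time = 300 and rate = 300/60 = 5 (log events per second).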
Vector (space) aggregate: Reduce the number of time series by aggregating across time-series at a given time step
sum
avg
min
max
stddev
stdvar
count
topk
bottomk
Vector aggregate grouping
labels that define the final time-series to collapse into. Must be a subset of the range aggregate grouping
default grouping behavior is to group everything into one time series
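For example, if the range aggregate is grouped by pod_name and service, a vector aggregate of sum grouped by service collapses the per-pod time series into one series per service at each time-step.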
Generate chart button to chart the log-derived metric
Visualization type
Save metric icon
Log Analytics exploration workflow
Add any log filters as described in the Log Search View to filter down logs for charting
Count based log metrics
Choose count_log_events from the log facet selector
Choose number as the normalization function
Choose rate or count_over_time as the Range/time aggregation function
Click on Generate chart to chart the count based metric
Log facet metrics
Choose the log facet to chart from the log facet selector
Choose number/bytes/duration as the normalization function to normalize the facet value, or choose count to count the number of times the log facet appears in the time-step
Choose one of the Log facet based range aggregation functions
Click on Generate chart
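As an illustration (the facet and label names here are examples only): to chart the latency captured in the took facet from the fingerprinting example above, choose took from the log facet selector, choose duration as the normalization function, choose avg_over_time as the range aggregation, group by pod_name under the grouping options, and click on Generate chart.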
Save Metrics
The metrics that are explored can also be saved for longer retention or further analysis. To save the explored metric, use the “Save Metrics” button. The user can enter a unique name for the metric along with the dimensions to be saved for the metric series. By default, the UI selects the dimensions that were used for metric exploration. The saved metric is pushed to the in-built metric storage.
The metrics that are saved are listed in the “Metrics” view at the bottom of the page. Any saved metric that is no longer required can be deleted from this list. The user can explore a saved metric using the standard kfuse metric exploration (by clicking on the icon) or through the Grafana metric explorer. Support for exporting metrics to an external metric system is coming in the future.
Log Source Integration
The Kloudfuse stack can ingest logs from a variety of agents and cloud services. The following lists the various sources and how to configure them.
Adding tokenizer to derive log facets during ingest
In addition to the auto-extracted Log Facets, log facets can be derived at ingest time using a user-provided tokenizer. This is useful in cases where a user wants to capture a string in the log event as a named log facet that is not auto-derived.
For example, the following logline:
10.12.0.35 - - [26/May/2021:18:59:10 +0000] "GET /unavailable HTTP/1.1" 503 21 "-" "hey/0.0.1"
can be parsed with the following tokenizer:
'%{sourceIp} - - [%{timestamp}] "%{requestMethod} %{uri} %{_}" %{responseCode} %{contentLength}'
This will generate the following log facets:
sourceIp: 10.12.0.35
requestMethod: GET
responseCode: 503
contentLength: 21
The tokenizer can be applied to an incoming log line based on source and line filters. This is done by configuring the logs-parser values.yaml followed by a helm upgrade/install.
The following example values.yaml shows how a pattern can be applied conditionally to an incoming log event. Add the pipeline: section from the following example in addition to any values that may already exist in the logs-parser section.
logs-parser:
  pipeline:
    configPath: "/conf"
    config: |-
      - nginx:
        - pipeline:
          - func: dissect
            params:
              - tokenizer: '%{sourceIp} - - [%{timestamp}] "%{requestMethod} %{uri} %{_}" %{responseCode} %{contentLength}'
      - pinot:
        - pipeline:
          - if: 'msg contains "LLRealtimeSegmentDataManager_"'
            then:
              - func: dissect
                params:
                  - tokenizer: '%{timestamp} %{level} [LLRealtimeSegmentDataManager_%{segment_name}]'
logs-parser on line 1 specifies the values for the logs-parser helm chart (a sub-chart of the kfuse stack)
pipeline on line 2 represents the values for the logs-parser pipeline definition. This pipeline definition is across all sources. A pipeline is a sequence of functions that is applied to an incoming log event to extract and process it
configPath on line 3 represents the path where the pipeline file is loaded into the logs-parser pod
config on line 4 represents the yaml that is dumped into the pipeline file at configPath
nginx on line 5 represents the pipeline applied to events with the label source=nginx
pipeline on line 6 represents the pipeline definition for the given source (nginx in this case)
func: dissect on line 7 instructs the logs-parser to apply the dissect function to the incoming log line. This function is used to apply the user-provided pattern to the incoming log line
tokenizer: 'pattern' on line 9 is a required argument to func: dissect that is used to define the pattern
Lines 10-16 represent another source-specific pipeline for source=pinot
if: 'msg contains line-filter' on line 12 is an optional line filter that can be used to apply a tokenizer conditionally
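After updating values.yaml, apply it with a helm upgrade/install. A minimal sketch of the command (the release and chart names below are placeholders; substitute the ones used by your Kloudfuse installation):
helm upgrade --install <release-name> <kfuse-chart> -f values.yaml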