Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • Source-Level Diagnosis: Grouping fingerprints by source allows you to understand which parts of your system are generating specific log patterns. For example, if a certain error fingerprint is seen predominantly from a specific service (such as an authentication service), this could indicate that service is the source of the issue. Without grouping by source, you may miss the root cause.

  • Resource Allocation and Scaling: If one particular source (like an API gateway or database) is generating a disproportionate number of fingerprints, it may indicate a bottleneck or resource contention issue. Understanding this allows for more targeted scaling or resource allocation to that part of the system to ensure overall system health.

  • Faster Troubleshooting: When logs are grouped by source, it becomes much easier to identify which part of the system is responsible for certain issues. If you know that certain fingerprints correspond to recurring problems (e.g., database errors, network issues, etc.), tracking those patterns by source helps you focus on the right area quickly.

Average of a Duration/Number Facet

...

  • Failure Detection: A spike in the error rate could indicate that a system component has failed or is malfunctioning. For example, a sudden rise in errors across the logs could point to a service crash, a network failure, or a hardware issue (e.g., disk failures). Quickly catching these spikes allows teams to react faster and bring the system back to normal operation.

  • Trend Analysis: Over time, monitoring the error rate helps identify trends that might not be immediately apparent. Gradual increases in error rates, even if subtle, can signal an issue that needs to be addressed (e.g., a misconfigured system or slowly degrading performance). Monitoring these trends allows teams to take action before a small issue becomes a major failure.

Advanced Functions

Anomaly on Count of Error Logs

Image Added

In the above Image Around 8:40, there is a sudden, sharp spike in error logs that breaches the gray band. This anomaly is highlighted in red to indicate that the error count has exceeded the expected range, suggesting an unusual event, such as a system malfunction, a deployment issue, or an unexpected traffic surge that is causing increased errors.

Outlier

Image Added

In this scenario, the error logs are being monitored across various sources within a distributed system.Two namespaces—are marked as outliers, meaning their error log rates differ significantly from other namespaces. This suggests potential issues within these specific components, such as increased load, configuration issues, or code changes that may be causing higher-than-normal errors.

This outlier detection allows teams to prioritize investigation into these specific sources, helping to identify and resolve issues before they impact the broader system.

Forecast

Log Math Operator to Scale the Y-Axis Down

...