...

  • Failure Detection: A spike in the error rate could indicate that a system component has failed or is malfunctioning. For example, a sudden rise in errors across the logs could point to a service crash, a network failure, or a hardware issue (e.g., disk failures). Quickly catching these spikes allows teams to react faster and bring the system back to normal operation.

  • Trend Analysis: Over time, monitoring the error rate helps identify trends that might not be immediately apparent. Gradual increases in error rates, even if subtle, can signal an issue that needs to be addressed (e.g., a misconfigured system or slowly degrading performance). Monitoring these trends allows teams to take action before a small issue becomes a major failure.
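The two uses above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 3x spike factor, and the idea of comparing the latest window against the mean of earlier windows are hypothetical choices, not how any particular monitoring product computes them.

```python
from statistics import mean


def error_rate(errors: int, total: int) -> float:
    """Fraction of log lines in one time window that are errors."""
    return errors / total if total else 0.0


def is_spike(rates: list[float], factor: float = 3.0) -> bool:
    """Failure detection: True if the latest window's error rate exceeds
    `factor` times the mean of all earlier windows (assumed rule)."""
    if len(rates) < 2:
        return False  # not enough history to define a baseline
    *history, latest = rates
    return latest > factor * mean(history)


def trend_slope(rates: list[float]) -> float:
    """Trend analysis: least-squares slope of error rate against window
    index. A persistently positive slope means the rate is creeping up."""
    n = len(rates)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = mean(rates)
    num = sum((i - x_mean) * (r - y_mean) for i, r in enumerate(rates))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den
```

A sudden failure shows up in `is_spike`, while a slow degradation may never trip the spike rule but still produces a positive `trend_slope` over a long enough series of windows.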

Advanced Functions

Anomaly on Count of Error Logs

[Image: anomaly detection on the count of error logs]

In the image above, around 8:40, there is a sudden, sharp spike in error logs that breaches the gray band. The gray band represents the expected range of the error count, so the spike is highlighted in red to mark it as an anomaly. This suggests an unusual event, such as a system malfunction, a deployment issue, or an unexpected traffic surge that is causing increased errors.
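The idea of an expected band can be illustrated with a minimal sketch. It assumes the band is the mean plus or minus `k` standard deviations of the preceding `window` points. The algorithm behind the chart is not described here, so the function name and both parameters are hypothetical.

```python
from statistics import mean, stdev


def band_anomalies(counts: list[float], window: int = 5, k: float = 3.0) -> list[int]:
    """Return indices whose value falls outside mean +/- k*std of the
    preceding `window` points (an assumed stand-in for the gray band)."""
    anomalies = []
    for i in range(window, len(counts)):
        recent = counts[i - window:i]
        mu, sigma = mean(recent), stdev(recent)
        lower, upper = mu - k * sigma, mu + k * sigma
        if not lower <= counts[i] <= upper:
            anomalies.append(i)  # the point breaches the band
    return anomalies


# Error-log counts per minute (made-up data); index 6 is the spike.
counts = [10, 12, 11, 9, 10, 11, 80, 12]
spikes = band_anomalies(counts)
```

One consequence of this simple approach is that a spike widens the band for the next few points, because it enters the rolling window. Production detectors usually account for this as well as for seasonality.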

Outlier

[Image: outlier detection on error logs across sources]

In this scenario, the error logs are being monitored across various sources within a distributed system. Two namespaces are marked as outliers, meaning their error log rates differ significantly from those of the other namespaces. This suggests potential issues within these specific components, such as increased load, configuration issues, or code changes that may be causing higher-than-normal errors.

This outlier detection allows teams to prioritize investigation into these specific sources, helping to identify and resolve issues before they impact the broader system.
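As a rough illustration of how sources can be compared against each other, the sketch below flags outliers with a modified z-score based on the median and the median absolute deviation (MAD). This is one common technique, not necessarily the one used by the outlier function shown above. The source names, the counts, and the 3.5 threshold are hypothetical.

```python
from statistics import median


def outlier_sources(error_counts: dict[str, float], threshold: float = 3.5) -> list[str]:
    """Return the sources whose error count deviates strongly from the
    group, using the modified z-score 0.6745 * |x - median| / MAD."""
    values = list(error_counts.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread across sources, so nothing can be ranked
    return sorted(
        name
        for name, value in error_counts.items()
        if 0.6745 * abs(value - med) / mad > threshold
    )


# Error-log counts per source over the same time range (made-up data).
counts_by_source = {
    "source-a": 10, "source-b": 12, "source-c": 11,
    "source-d": 9, "source-e": 95, "source-f": 120,
}
flagged = outlier_sources(counts_by_source)
```

The median and MAD are used instead of the mean and standard deviation because the outliers themselves would otherwise inflate the baseline they are measured against.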

...

Log Math Operator to Scale the Y-Axis Down

...