FuseQL Cheatsheet

A list of useful FuseQL queries and their use cases.

Count of All Logs

Use Cases:

  • Activity Patterns: Examining logs over a time range can help spot patterns in system usage, traffic, or performance. For example, a system might experience higher load during specific hours of the day, and analyzing log volumes over a set period (like a day or week) can reveal predictable trends.

  • Scaling Decisions: If logs show consistent spikes in traffic or resource usage during certain time ranges, teams may be able to predict when the system will need additional capacity (e.g., servers, storage, network bandwidth) and plan ahead to scale appropriately.

  • Impact of Changes or Deployments: After deploying a new feature or making a system update, teams often analyze logs from a time range around the deployment to ensure that the change did not cause any unexpected issues (e.g., errors, performance degradation). For example, reviewing logs from the past 48 hours can reveal any issues arising from a recent deployment.

Count of All Fingerprints

 

Use Cases:

  • Identify Unexpected Usage Patterns: Tracking how the variety of user-related fingerprints changes over time can reveal unexpected usage patterns. For example, if a certain feature starts generating a wide variety of logs (e.g., new queries or interactions), users may be adopting the feature in ways you didn’t anticipate, and the feature may need further optimization or user support.

  • Spot New Problems Early: A sudden increase in the count of different kinds of fingerprints might indicate that new issues are emerging in your system. For example, if new error patterns appear or previously rare issues start becoming more frequent, tracking the diversity of fingerprints over time can help you detect these problems early, allowing you to mitigate them before they escalate.
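The idea of tracking fingerprint diversity over time can be sketched outside FuseQL. The following is a minimal Python illustration, not FuseQL syntax: it assumes each log record has already been reduced to a hypothetical `(epoch_seconds, fingerprint)` pair and counts the distinct fingerprints seen in each one-hour bucket.

```python
from collections import defaultdict


def distinct_fingerprints_per_bucket(records, bucket_seconds=3600):
    """Return {bucket_start: number of distinct fingerprints in that bucket}."""
    seen = defaultdict(set)
    for epoch_seconds, fingerprint in records:
        # Floor the timestamp to the start of its bucket.
        bucket_start = epoch_seconds - epoch_seconds % bucket_seconds
        seen[bucket_start].add(fingerprint)
    return {start: len(fps) for start, fps in sorted(seen.items())}


# Hypothetical sample data: two fingerprints in the first hour, four in the second.
records = [
    (0, "fp-a"), (120, "fp-a"), (1800, "fp-b"),
    (3600, "fp-a"), (3700, "fp-c"), (4000, "fp-d"), (5000, "fp-e"),
]
counts = distinct_fingerprints_per_bucket(records)
```

A jump in the per-bucket value, such as from 2 to 4 in this sample, is the kind of change the use cases above suggest investigating.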

     

Count of All Logs Grouped by Level

 

Use Cases:

  • Spot Spikes in Errors or Warnings: A sudden increase in the count of ERROR or WARN logs signals that something might have gone wrong, whether it is a bug in the system, an overload of requests, or a failing component. Monitoring log counts over time by severity level helps you detect such issues as they arise and respond early, possibly preventing system outages or service degradation.

  • Monitor System Usage Trends: INFO logs often provide general operational details, such as how many users are accessing the system, how many transactions are happening, or how many requests are being made. By grouping logs by level over time, you can track normal system behavior, identifying whether the system is performing as expected or if usage has significantly increased.
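As a rough illustration of what grouping by level computes, here is a small Python sketch (not FuseQL syntax). It assumes records arrive as hypothetical `(epoch_seconds, level)` pairs and tallies them per one-hour bucket and level.

```python
from collections import Counter


def count_by_level(records, bucket_seconds=3600):
    """Return a Counter keyed by (bucket_start, level)."""
    counts = Counter()
    for epoch_seconds, level in records:
        bucket_start = epoch_seconds - epoch_seconds % bucket_seconds
        counts[(bucket_start, level)] += 1
    return counts


# Hypothetical sample data covering two one-hour buckets.
records = [
    (10, "INFO"), (20, "INFO"), (30, "ERROR"),
    (3700, "INFO"), (3800, "ERROR"), (3900, "ERROR"), (4000, "WARN"),
]
level_counts = count_by_level(records)

# Compare ERROR volume between consecutive buckets to spot a rise.
error_increase = level_counts[(3600, "ERROR")] - level_counts[(0, "ERROR")]
```

Keeping one series per level makes a rise in ERROR or WARN visible even when INFO volume dominates the total.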

Count of All Fingerprints Grouped by Source

Use Cases:

  • Source-Level Diagnosis: Grouping fingerprints by source allows you to understand which parts of your system are generating specific log patterns. For example, if a certain error fingerprint is seen predominantly from a specific service (such as an authentication service), this could indicate that service is the source of the issue. Without grouping by source, you may miss the root cause.

  • Resource Allocation and Scaling: If one particular source (like an API gateway or database) is generating a disproportionate number of fingerprints, it may indicate a bottleneck or resource contention issue. Understanding this allows for more targeted scaling or resource allocation to that part of the system to ensure overall system health.

Average of a Duration/Number Facet

 

Use Cases:

  • Identify Bottlenecks and Latency Trends: If your logs contain durations (e.g., response times for API requests, transaction times, query execution times), calculating the average duration over time helps identify performance trends. For example, if the average duration of an API call is gradually increasing, something in the system may be slowing down and need optimization (e.g., database queries taking longer or network latency increasing).

  • Estimate Resource Requirements: Knowing the average duration of specific processes or operations (e.g., API calls, data processing tasks) helps estimate resource requirements. For example, if the average duration of a batch job is increasing over time, it may indicate that more CPU or memory resources are needed to handle the load. By calculating averages, teams can plan for future scaling needs and ensure that the system can handle increasing load without performance degradation.
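The averaging described above can be sketched in Python as follows. This is an illustration only, not FuseQL syntax; it assumes a hypothetical duration facet already extracted as `(epoch_seconds, duration_ms)` pairs.

```python
from collections import defaultdict
from statistics import fmean


def average_duration_per_bucket(records, bucket_seconds=3600):
    """Return {bucket_start: mean duration in milliseconds for that bucket}."""
    durations = defaultdict(list)
    for epoch_seconds, duration_ms in records:
        bucket_start = epoch_seconds - epoch_seconds % bucket_seconds
        durations[bucket_start].append(duration_ms)
    return {start: fmean(values) for start, values in sorted(durations.items())}


# Hypothetical sample data: the second hour is noticeably slower than the first.
records = [(0, 100), (60, 200), (3600, 300), (3660, 500), (3720, 400)]
averages = average_duration_per_bucket(records)
```

An average can hide a few very slow requests, so it is worth reading alongside the raw counts when a trend looks suspicious.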

Error Rate Formula

Use Cases:

  • Failure Detection: A spike in the error rate could indicate that a system component has failed or is malfunctioning. For example, a sudden rise in errors across the logs could point to a service crash, a network failure, or a hardware issue (e.g., disk failures). Quickly catching these spikes allows teams to react faster and bring the system back to normal operation.

  • Trend Analysis: Over time, monitoring the error rate helps identify trends that might not be immediately apparent. Gradual increases in error rates, even if subtle, can signal an issue that needs to be addressed (e.g., a misconfigured system or slowly degrading performance). Monitoring these trends allows teams to take action before a small issue becomes a major failure.
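The formula itself is not reproduced in the text above. A common definition, assumed here, is the number of error logs divided by the total number of logs in the same time range, expressed as a percentage. A minimal Python sketch under that assumption:

```python
def error_rate_percent(error_count, total_count):
    """Error logs as a percentage of all logs in the same time range.

    Assumed definition: 100 * errors / total. Returns 0.0 for an empty range
    so that a quiet period does not raise a division error.
    """
    if total_count == 0:
        return 0.0
    return 100.0 * error_count / total_count


# Hypothetical counts: 5 error logs out of 200 total logs.
rate = error_rate_percent(5, 200)
```

Expressing errors as a rate rather than a raw count keeps the metric comparable between busy and quiet periods.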

Advanced Functions

Anomaly on Count of Error Logs

 

In the image above, around 8:40, there is a sudden, sharp spike in error logs that breaches the gray band. This anomaly is highlighted in red to indicate that the error count has exceeded the expected range, suggesting an unusual event such as a system malfunction, a deployment issue, or an unexpected traffic surge that is causing increased errors.
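To make the idea of an expected band concrete, here is a hedged Python sketch. It is not the algorithm FuseQL uses, which the source does not describe; it assumes a simple band of the trailing mean plus or minus `k` standard deviations and flags points that fall outside it. The `window` and `k` parameters are illustrative choices.

```python
from statistics import mean, pstdev


def find_anomalies(counts, window=5, k=3.0):
    """Return indexes whose value falls outside mean +/- k * stddev
    of the preceding `window` values."""
    anomalies = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        center = mean(history)
        spread = pstdev(history)
        upper = center + k * spread
        lower = center - k * spread
        if counts[i] > upper or counts[i] < lower:
            anomalies.append(i)
    return anomalies


# Hypothetical error counts per interval, with one sharp spike at index 6.
error_counts = [10, 12, 11, 9, 10, 11, 60, 12, 10]
anomalous_indexes = find_anomalies(error_counts)
```

In this sketch the spike widens the band for the intervals that follow it, which is one reason production anomaly detectors usually rely on more robust baselines.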

Outlier

 

In this scenario, error logs are monitored across various sources within a distributed system. Two namespaces are marked as outliers, meaning their error log rates differ significantly from those of the other namespaces. This suggests potential issues within these specific components, such as increased load, configuration issues, or code changes that may be causing higher-than-normal errors.

This outlier detection allows teams to prioritize investigation into these specific sources, helping to identify and resolve issues before they impact the broader system.
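One way to picture outlier detection across sources is the sketch below. It is an assumption for illustration, not the method FuseQL applies: it scores each namespace by its distance from the median error rate, scaled by the median absolute deviation (a "modified z-score"), and flags scores above a threshold. The namespace names and rates are hypothetical.

```python
from statistics import median


def find_outliers(rates_by_source, threshold=3.5):
    """Return sources whose rate is far from the median of all sources."""
    values = list(rates_by_source.values())
    center = median(values)
    mad = median([abs(v - center) for v in values])
    if mad == 0:
        # All sources are (nearly) identical; nothing stands out.
        return []
    return sorted(
        source
        for source, rate in rates_by_source.items()
        if 0.6745 * abs(rate - center) / mad > threshold
    )


# Hypothetical error rates (errors per minute) for seven namespaces.
error_rates = {
    "ns-a": 1.0, "ns-b": 1.2, "ns-c": 0.9, "ns-d": 1.1,
    "ns-e": 9.5, "ns-f": 8.0, "ns-g": 1.0,
}
outliers = find_outliers(error_rates)
```

Using the median rather than the mean keeps the baseline from being pulled toward the very sources that are misbehaving.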

 

Log Math Operator to Scale the Y-Axis Down

Use Cases:

  • Compress wide ranges of values to make large spikes and small changes comparable.

  • Reduce the impact of extreme outliers, revealing subtle trends.
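The compression effect can be seen with a few numbers. This Python sketch assumes a base-10 logarithm applied to strictly positive values; the base and the zero handling of the FuseQL operator are not stated in the source, so treat both as assumptions.

```python
import math


def log10_scale(values):
    """Map each positive value to its base-10 logarithm."""
    return [math.log10(v) for v in values]


# Hypothetical counts spanning four orders of magnitude.
counts = [1, 10, 100, 10_000]
scaled = log10_scale(counts)

raw_ratio = max(counts) / min(counts)    # largest value relative to smallest
scaled_span = max(scaled) - min(scaled)  # distance between them on the log axis
```

On the raw axis the largest value is 10,000 times the smallest, while on the log axis the two sit about four units apart, so small series remain visible next to large spikes.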
