
What is Continuous Profiling?

Continuous Profiling is a powerful addition to our observability platform. While traditional monitoring methods—metrics, logs, and tracing—provide valuable insights, they often leave gaps when it comes to understanding application performance at a granular level. Continuous Profiling fills this void by offering in-depth, line-level insights into your application’s code, allowing developers to see precisely how resources are utilized.

This low-overhead feature gathers profiles from production systems and stores them in a database for later analysis. This helps provide a comprehensive view of the application and its behavior in production, including CPU usage, memory allocation, and disk I/O, ensuring that every line of code operates efficiently.

Key Benefits of Continuous Profiling:

  1. Granular Insights: Continuous Profiling offers a detailed view of application performance that goes beyond traditional observability tools, providing line-level insights into resource utilization.

  2. In-Depth Code Analysis: With a comprehensive understanding of code performance and system interactions, developers can easily identify how specific code segments use resources, facilitating thorough analysis and optimization.

Read more on our blog post.

Configuration setup:

  1. Enable kfuse-profiling in the custom-values.yaml file:

...

  2. By default, profiling data is stored in a PersistentVolumeClaim (PVC) with a size of 50 GB.

Long-Term Retention

To retain profiling data for a longer duration, additional configuration is required. Depending on the storage provider, configure one of the following options in the custom-values.yaml file:

  • For AWS S3 Storage:
    Add the necessary AWS S3 configuration to store profiles.

  • For GCP Bucket:
    Include the required GCP Bucket configuration to store profiles data.

Info

Note: Profiles are stored in Parquet format in AWS S3 or GCP GCS.

Code Block
pyroscope:
  pyroscope:
    # Add support for storage in s3 and gcs for saving profiles data
        
    # Additional configuration is needed depending on where the storage is hosted (AWS S3 or GCP GCS)
    # Choose the appropriate configuration based on your storage provider.
    
    # AWS S3 Configuration Instructions:
    # 1. Set the 'backend' to 's3'
    # 2. Configure the following S3-specific settings:
    #    - bucket_name: Name of your S3 bucket
    #    - region: AWS region where your bucket is located
    #    - endpoint: S3 endpoint for your region
    #    - access_key_id: Your AWS access key ID
    #    - secret_access_key: Your AWS secret access key
    #    - insecure: Set to true if using HTTP instead of HTTPS (not recommended for production)
    
    # Example AWS S3 configuration:
    config: |
      storage:
        backend: s3
        s3:
          bucket_name: your-bucket-name
          region: us-west-2
          endpoint: s3.us-west-2.amazonaws.com
          access_key_id: YOUR_ACCESS_KEY_ID
          secret_access_key: YOUR_SECRET_ACCESS_KEY
          insecure: false
    
    # GCP GCS Configuration Instructions:
    # 1. Set the 'backend' to 'gcs'
    # 2. Configure the following GCS-specific settings:
    #    - bucket_name: Name of your GCS bucket
    #    - service_account: JSON key file for your GCP service account
    
    # Prerequisites for GCP GCS:
    # - Create a GCP service account with access to the GCS bucket
    # - Download the JSON key file for the service account
    
    # Example GCP GCS configuration (use only one 'config' block in this file —
    # replace the S3 example above with this one if your storage is in GCS):
    config: |
      storage:
        backend: gcs
        gcs:
          bucket_name: your-gcs-bucket-name
          service_account: |
            {
              "type": "service_account",
              "project_id": "your-project-id",
              "private_key_id": "your-private-key-id",
              "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
              "client_email": "your-service-account-email@your-project-id.iam.gserviceaccount.com",
              "client_id": "your-client-id",
              "auth_uri": "https://accounts.google.com/o/oauth2/auth",
              "token_uri": "https://oauth2.googleapis.com/token",
              "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
              "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/your-service-account-email%40your-project-id.iam.gserviceaccount.com",
              "universe_domain": "googleapis.com"
            }    

Set up the Kfuse Profiler agent to scrape profiling data

Info

Prerequisites:

1. Ensure your Golang application exposes pprof endpoints.

2. In pull mode, the collector (Alloy) periodically retrieves profiles from Golang applications, targeting the /debug/pprof/* endpoints.

3. If your Go code is not set up to generate profiles, set up Golang profiling as described here (for Go pull mode). For Java, follow the instructions here to set up profiling.

4. Alloy queries the pprof endpoints of your Golang application, collects the profiles, and forwards them to the Kfuse Profiler server.

To set up scraping of profiling data:

  1. Configure the Alloy scraper in a new file, alloy-values.yaml. Download a copy of the default alloy-values.yaml from here and customize the alloy configMap section following the instructions below.

Code Block
alloy:
  configMap:
    # -- Create a new ConfigMap for the config file.
    create: true
    # -- Content to assign to the new ConfigMap.  This is passed into `tpl` allowing for templating from values.
    content: |-
      // Write your Alloy config here:
      logging {
        level = "info"
        format = "logfmt"
      }
        discovery.kubernetes "pyroscope_kubernetes" {
        	role = "pod"
        }

        discovery.relabel "kubernetes_pods" {
        	targets = concat(discovery.kubernetes.pyroscope_kubernetes.targets)

        	rule {
        		action        = "drop"
        		source_labels = ["__meta_kubernetes_pod_phase"]
        		regex         = "Pending|Succeeded|Failed|Completed"
        	}

        	rule {
        		action = "labelmap"
        		regex  = "__meta_kubernetes_pod_label_(.+)"
        	}

        	rule {
        		action        = "replace"
        		source_labels = ["__meta_kubernetes_namespace"]
        		target_label  = "kubernetes_namespace"
        	}

        	rule {
        		action        = "replace"
        		source_labels = ["__meta_kubernetes_pod_name"]
        		target_label  = "kubernetes_pod_name"
        	}

        	rule {
        		action        = "keep"
        		source_labels = ["__meta_kubernetes_pod_annotation_pyroscope_io_scrape"]
        		regex = "true"
        	}

        	rule {
        		action        = "replace"
        		source_labels = ["__meta_kubernetes_pod_annotation_pyroscope_io_application_name"]
        		target_label = "service_name"
        	}

        	rule {
        		action        = "replace"
        		source_labels = ["__meta_kubernetes_pod_annotation_pyroscope_io_spy_name"]
        		target_label = "__spy_name__"
        	}

        	rule {
        		action        = "replace"
        		source_labels = ["__meta_kubernetes_pod_annotation_pyroscope_io_scheme"]
        		regex = "(https?)"
        		target_label = "__scheme__"
        	}

        	rule {
        		action        = "replace"
        		source_labels = ["__address__", "__meta_kubernetes_pod_annotation_pyroscope_io_port"]
        		regex = "(.+?)(?::\\d+)?;(\\d+)"
        		replacement = "$1:$2"
        		target_label = "__address__"
        	}

        	rule {
        		action = "labelmap"
        		regex  = "__meta_kubernetes_pod_annotation_pyroscope_io_profile_(.+)"
        		replacement = "__profile_$1"
        	}
        }
        pyroscope.scrape "pyroscope_scrape" {
        	clustering {
        		enabled = true
        	}

        	targets    = concat(discovery.relabel.kubernetes_pods.output)
        	forward_to = [pyroscope.write.pyroscope_write.receiver]

        	profiling_config {
        		profile.memory {
        			enabled = true
        		}
        
        		profile.process_cpu {
        			enabled = true
        		}
        
        		profile.goroutine {
        			enabled = true
        		}
        
        		profile.block {
        			enabled = false
        		}
        
        		profile.mutex {
        			enabled = false
        		}
        
        		profile.fgprof {
        			enabled = false
        		}
        	}
        }
        pyroscope.write "pyroscope_write" {
        	endpoint {
            url = "https://<KFUSE ENDPOINT/DNS NAME>/profile"
          }
        }
    # -- Name of existing ConfigMap to use. Used when create is false.
    name: null
    # -- Key in ConfigMap to get config from.
    key: null

  clustering:
    # -- Deploy Alloy in a cluster to allow for load distribution.
    enabled: false

    # -- Name for the Alloy cluster. Used for differentiating between clusters.
    name: ""

    # -- Name for the port used for clustering, useful if running inside an Istio Mesh
    portName: http

  # -- Minimum stability level of components and behavior to enable. Must be
  # one of "experimental", "public-preview", or "generally-available".
  stabilityLevel: "generally-available"

  # -- Path to where Grafana Alloy stores data (for example, the Write-Ahead Log).
  # By default, data is lost between reboots.
  storagePath: /tmp/alloy

  # -- Address to listen for traffic on. 0.0.0.0 exposes the UI to other
  # containers.
  listenAddr: 0.0.0.0

  # -- Port to listen for traffic on.
  listenPort: 12345

  # -- Scheme is needed for readiness probes. If enabling tls in your configs, set to "HTTPS"
  listenScheme: HTTP

  # --  Base path where the UI is exposed.
  uiPathPrefix: /

  # -- Enables sending Grafana Labs anonymous usage stats to help improve Grafana
  # Alloy.
  enableReporting: true

  # -- Extra environment variables to pass to the Alloy container.
  extraEnv: []

  # -- Maps all the keys on a ConfigMap or Secret as environment variables. https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.24/#envfromsource-v1-core
  envFrom: []

  # -- Extra args to pass to `alloy run`: https://grafana.com/docs/alloy/latest/reference/cli/run/
  extraArgs: []

  # -- Extra ports to expose on the Alloy container.
  extraPorts: []
  # - name: "faro"
  #   port: 12347
  #   targetPort: 12347
  #   protocol: "TCP"
  #   appProtocol: "h2c"

  mounts:
    # -- Mount /var/log from the host into the container for log collection.
    varlog: false
    # -- Mount /var/lib/docker/containers from the host into the container for log
    # collection.
    dockercontainers: false

    # -- Extra volume mounts to add into the Grafana Alloy container. Does not
    # affect the watch container.
    extra: []

  # -- Security context to apply to the Grafana Alloy container.
  securityContext: {}

  # -- Resource requests and limits to apply to the Grafana Alloy container.
  resources: {}

Configure these two blocks in the above Alloy configuration file:

1. pyroscope.write

2. pyroscope.scrape

1. Configure pyroscope.write block

The pyroscope.write block is used to define the endpoint where profiling data will be sent.

  1. Change url to https://<KFUSE ENDPOINT/DNS NAME>/profile.

  2. Change write_job_name to an appropriate name, such as kfuse_profiler_write.

Code Block
pyroscope.write "write_job_name" {
    endpoint {
        url = "https://<KFUSE ENDPOINT/DNS NAME>/profile"
    }
}

2. Configure pyroscope.scrape block

The pyroscope.scrape block is used to define the scraping configuration for profiling data.

  1. Change scrape_job_name to an appropriate name, such as kfuse_profiler_scrape.

  2. Use discovery.relabel.kubernetes_pods.output as the target for the pyroscope.scrape block to discover Kubernetes targets. Follow the steps here to set up specific regex rules for discovering Kubernetes targets.

Code Block
pyroscope.scrape "scrape_job_name" {
        targets    = concat(discovery.relabel.kubernetes_pods.output)
        forward_to = [pyroscope.write.write_job_name.receiver]

        profiling_config {
                profile.process_cpu {
                        enabled = true
                }

                profile.godeltaprof_memory {
                        enabled = true
                }

                profile.memory { // disable memory, use godeltaprof_memory instead
                        enabled = false
                }

                profile.godeltaprof_mutex {
                        enabled = true
                }

                profile.mutex { // disable mutex, use godeltaprof_mutex instead
                        enabled = false
                }

                profile.godeltaprof_block {
                        enabled = true
                }

                profile.block { // disable block, use godeltaprof_block instead
                        enabled = false
                }

                profile.goroutine {
                        enabled = true
                }
        }
}

Configuration Details

  • pyroscope.scrape:

    • Specifies the targets to scrape profiling data from.

    • The forward_to field connects the scrape job to the write job.

    • The profiling_config block enables or disables specific profiles:

      • profile.process_cpu: Enables CPU profiling.

      • profile.godeltaprof_memory: Enables delta memory profiling.

      • profile.godeltaprof_mutex: Enables delta mutex profiling.

      • profile.godeltaprof_block: Enables delta block profiling.

      • profile.goroutine: Enables goroutine profiling.
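The relabel rules in the Alloy configuration key off pyroscope.io/* pod annotations (via the __meta_kubernetes_pod_annotation_pyroscope_io_* labels). As a sketch, a pod that should be scraped might be annotated as follows — the names, image, and port are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-go-service                               # illustrative name
  annotations:
    pyroscope.io/scrape: "true"                     # required: matched by the "keep" relabel rule
    pyroscope.io/application_name: "my-go-service"  # mapped to the service_name label
    pyroscope.io/port: "6060"                       # port where /debug/pprof/* is served
    pyroscope.io/scheme: "http"                     # http or https, per the __scheme__ rule
spec:
  containers:
    - name: my-go-service
      image: my-registry/my-go-service:latest       # illustrative image
      ports:
        - containerPort: 6060
```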

...

3. Applying the Configuration

  1. After adding the above blocks to the Alloy configuration file, save the changes.

  2. Install Alloy in the namespace you want to scrape data from by following the steps here.

  3. Upgrade Alloy using the alloy-values.yaml file set up above, replacing <namespace> with the namespace where you installed Alloy in step 2.

    Code Block
    helm upgrade --namespace <namespace> alloy grafana/alloy -f <path/to/alloy-values.yaml>