...

Code Block
ingress-nginx:
  controller:
    config:
      proxy-body-size: <REPLACE THE BODY SIZE HERE, e.g., 8m. Setting to 0 will disable any limit.>

Pinot

Pinot Server Realtime Pods in Crash Loop Back Off

Symptoms

  • Container logs show the following JFR initialization errors:

    Code Block
    jdk.jfr.internal.dcmd.DCmdException: Could not use /var/pinot/server/data/jfr as repository. Unable to create JFR repository directory using base location (/var/pinot/server/data/jfr)
    Error occurred during initialization of VM
    Failure when starting JFR on_create_vm_2
  • Pinot server realtime disk usage is at 100%.
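
The disk-usage symptom can be confirmed from inside the pod. The following is a minimal sketch, not an official Kfuse tool: `full_mounts` is a hypothetical helper that filters POSIX `df -P` output for mounts at or above a threshold. The namespace `kfuse` and the pod name `pinot-server-realtime-0` in the usage comment are assumptions; substitute the values from your install.

```shell
# Hypothetical helper: read `df -P` output on stdin and print every mount
# whose usage is at or above the given percentage (default 95).
full_mounts() {
  threshold="${1:-95}"
  awk -v t="$threshold" 'NR > 1 {
    use = $5
    sub(/%/, "", use)
    if (use + 0 >= t) print $6, $5
  }'
}

# Usage (namespace and pod name are assumptions):
#   kubectl exec pinot-server-realtime-0 -n kfuse -- df -P | full_mounts 95
```

If the container is crash looping, `kubectl exec` may not succeed; in that case the usage has to be read from wherever your cluster reports volume metrics.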

Resolution

  • In Kfuse version 2.6.5 or earlier

  • From Kfuse version 2.6.7 onwards, there is no need to resize the pinot server realtime disks. Follow these steps:

    1. Restart pinot-server-offline.

    2. Edit the pinot-server-realtime sts and either remove the BALLOON_DISK env variable or set it to false.

    3. Wait for pinot server realtime to start up and finish moving segments to the offline servers.

    4. Edit the pinot-server-realtime sts again and set the BALLOON_DISK env variable back to true.
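
The steps for 2.6.7 onwards could be scripted roughly as follows. This is a sketch under assumptions, not the documented procedure: it assumes the namespace is `kfuse` and that the StatefulSets are named `pinot-server-offline` and `pinot-server-realtime`, and the function names `balloon_disk_off` and `balloon_disk_on` are made up for illustration. It uses `kubectl set env` in place of editing the sts by hand.

```shell
# Assumptions: the namespace is "kfuse" and the StatefulSets are named
# pinot-server-offline and pinot-server-realtime; adjust to your install.
KUBECTL="${KUBECTL:-kubectl}"
NS="${NS:-kfuse}"

# Steps 1 and 2: restart the offline servers, then disable the balloon disk.
balloon_disk_off() {
  $KUBECTL rollout restart statefulset/pinot-server-offline -n "$NS"
  $KUBECTL set env statefulset/pinot-server-realtime BALLOON_DISK=false -n "$NS"
  # Part of step 3: wait for the realtime pods to come back up.
  $KUBECTL rollout status statefulset/pinot-server-realtime -n "$NS"
}

# Step 4: re-enable the balloon disk.
balloon_disk_on() {
  $KUBECTL set env statefulset/pinot-server-realtime BALLOON_DISK=true -n "$NS"
}
```

`kubectl rollout status` only reports that the pods are ready. It does not tell you whether segments have finished moving to the offline servers, so check that separately before calling `balloon_disk_on`.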

DeepStore access issues

Symptoms

  • Pinot-related jobs are stuck in crash loop back-off (e.g., kfuse-set-tag-hook, pinot-metrics-table-creation).

  • Pinot-controller logs show a deep store access-related exception.

    • On AWS S3, the exception has the following format:

      Code Block
      Caused by: software.amazon.awssdk.services.s3.model.S3Exception: Access Denied (Service: S3, Status Code: 403, Request ID: MAYE68P6SYZMTTMP, Extended Request ID: L7mSpEzHz9gdxZQ8iNM00jKtoXYhkNrUzYntbbGkpFmUF+tQ8zL+fTpjJRlp2MDLNvhaVYCie/Q=)

Resolution

  • Refer to Configure GCP/AWS/Azure Object Store for Pinot to set up access for Pinot.

    • On GCP, ensure that the secret has correct access to the cloud storage bucket.

    • On AWS S3, if the node does not have permission to the S3 bucket, ensure that the access key and secret access key are populated:

      Code Block
      pinot:
        deepStore:
          enabled: true
          type: "s3"
          useSecret: true
          createSecret: true
          dataDir: "s3://[REPLACE BUCKET HERE]/kfuse/controller/data"
          s3:
            region: "YOUR REGION"
            accessKey: "YOUR AWS ACCESS KEY"
            secretKey: "YOUR AWS SECRET KEY"
  • If Pinot has the correct access credentials to the deep store, the configured bucket will contain a directory that matches the dataDir.
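
On AWS S3, the presence of the dataDir directory can be checked from a machine with the AWS CLI configured. The following is a minimal sketch: `check_deepstore_dir` is a hypothetical helper, and it assumes the credentials in your shell are allowed to list the bucket. The result reflects your own credentials, not necessarily the ones Pinot uses.

```shell
# Hypothetical helper: report whether anything exists under the given
# s3:// dataDir. `aws s3 ls` exits non-zero when the prefix is empty or
# cannot be listed.
AWS="${AWS:-aws}"

check_deepstore_dir() {
  data_dir="${1%/}/"   # normalize to exactly one trailing slash
  if $AWS s3 ls "$data_dir" >/dev/null 2>&1; then
    echo "found: $data_dir"
  else
    echo "missing or inaccessible: $data_dir"
    return 1
  fi
}

# Usage (bucket name is a placeholder):
#   check_deepstore_dir "s3://[REPLACE BUCKET HERE]/kfuse/controller/data"
```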

...

If the PV usage has reached 100% and the pods cannot be restarted gracefully, increase the size of the pinot-realtime PVCs by about 10% to accommodate the increased requirement, then restart the pinot-server offline and realtime pods.
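
The resize arithmetic and patch body can be sketched as follows. The helper names `grow_gi` and `pvc_resize_patch` are hypothetical, and the namespace and PVC name in the usage comment are placeholders. Patching a PVC only works when its StorageClass has `allowVolumeExpansion: true`.

```shell
# Hypothetical helper: given a PVC size in Gi, print a size about 10% larger,
# rounded up to a whole Gi.
grow_gi() {
  echo "$(( ($1 * 110 + 99) / 100 ))Gi"
}

# Build the JSON patch body that requests the larger size.
pvc_resize_patch() {
  printf '{"spec":{"resources":{"requests":{"storage":"%s"}}}}\n' "$(grow_gi "$1")"
}

# Usage (PVC name and namespace are placeholders):
#   kubectl patch pvc <PVC NAME> -n kfuse -p "$(pvc_resize_patch 100)"
```

Repeat the patch for each pinot-realtime PVC, passing its current size in Gi.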

...

Duplicate logs show up in Kfuse stack

Symptoms

Duplicate logs with the same timestamp and log event appear in the Kfuse stack, but the application logs (either on the host or in the container) show no evidence of duplication. This issue happens only when the agent is Fluent-Bit.

Resolution

The Fluent-Bit logs show the following error:

...

MELT data ingested from Datadog agent is missing the kube_cluster_name label.

Resolution

There is a known issue in Datadog agent cluster name detection that requires the cluster agent to be up. If the agent starts before the cluster agent, it fails to detect the cluster name. See https://github.com/DataDog/datadog-agent/issues/24406 for details.

...

kubectl rollout restart daemonset datadog-agent

Access denied while creating Alert / Contact point

Symptom

A non-admin (SSO) user may get one of the following permission errors when creating an alert or contact point.

{"accessErrorId":"ACE0947587429","message":"You'll need additional permissions to perform this action. Permissions needed: any of alert.notifications:write","title":"Access denied"}

or

{"accessErrorId":"ACE3104889351","message":"You'll need additional permissions to perform this action. Permissions needed: any of alert.provisioning:read, alert.provisioning.secrets:read","title":"Access denied"}

Resolution

Workaround: Log in as admin to create the contact point or alert. SSO users are not granted permission to create contact points or alerts manually.