Configure GCP/AWS/Azure Object Store for Pinot

Add the following configurations to the custom_values.yaml used with the Helm installation of Kfuse. Make sure the deep store is in the same region as the compute instances running the Kfuse stack.

GCP Configuration

Option 1: Using a service account key

Prerequisites

  • Download the key for the GCP service account from the GCP console.

  • Create a Kubernetes secret with the GCP service account credentials that allow access to the GCS bucket. Ensure that the key file is named secretKey:

kubectl create secret generic pinot-sd-secret --from-file=./secretKey -n kfuse

Helm values

  • Add the following values in the custom_values.yaml. Replace the GCS details accordingly.

pinot:
  # deepStore - Enable/disable storing of Pinot segments in deep store
  deepStore:
    enabled: true
    type: "gcs"
    useSecret: true
    createSecret: false
    secretName: "pinot-sd-secret"
    ## bucket for deep storage
    dataDir: "gs://[REPLACE BUCKET HERE]/kfuse/controller/data"
    gcs:
      projectId: "REPLACE PROJECT ID HERE"

Option 2: Using Google Cloud Workload Identity

Prerequisites

Instead of using a predefined storage role, you can also create a custom role with the following permissions:
storage.buckets.get
storage.objects.create
storage.objects.delete
storage.objects.get
storage.objects.getIamPolicy
storage.objects.list
storage.objects.update
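
As a sketch, such a custom role can be created with gcloud (the role ID pinotDeepStore and PROJECT_ID are placeholders):

gcloud iam roles create pinotDeepStore \
    --project=PROJECT_ID \
    --title="Pinot Deep Store" \
    --permissions=storage.buckets.get,storage.objects.create,storage.objects.delete,storage.objects.get,storage.objects.getIamPolicy,storage.objects.list,storage.objects.update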

  • In steps 6 and 7 of the Workload Identity setup, use NAMESPACE as kfuse (or the namespace where Kloudfuse is deployed) and KSA_NAME as default.

Create the kfuse namespace if it does not already exist, as shown below.
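
For example:

kubectl create namespace kfuse

As a sketch, the Workload Identity binding and annotation for the default KSA then look like the following (GSA_NAME and PROJECT_ID are placeholders for your GCP service account and project):

gcloud iam service-accounts add-iam-policy-binding GSA_NAME@PROJECT_ID.iam.gserviceaccount.com \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:PROJECT_ID.svc.id.goog[kfuse/default]"

kubectl annotate serviceaccount default \
    --namespace kfuse \
    iam.gke.io/gcp-service-account=GSA_NAME@PROJECT_ID.iam.gserviceaccount.com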

Helm values

  • Add the following values in the custom_values.yaml. Replace the GCS details accordingly.

pinot:
  # deepStore - Enable/disable storing of Pinot segments in deep store
  deepStore:
    enabled: true
    type: "gcs"
    useSecret: false
    createSecret: false
    ## bucket for deep storage
    dataDir: "gs://[REPLACE BUCKET HERE]/kfuse/controller/data"
    gcs:
      projectId: "REPLACE PROJECT ID HERE"

AWS Configuration

Pinot needs an IAM policy with read and write permissions on the S3 bucket used for deep storage. Currently, Kfuse supports three options for providing this access.

Option 1: Using an IAM User secret access key

Refer to the AWS document Create an IAM user in your AWS account - AWS Identity and Access Management for creating an IAM user. Ensure that the user has an IAM policy that allows reading and writing the S3 bucket used for deep storage. Once the IAM user is created, generate access key credentials, and note the access key and secret key.
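
As a sketch with the AWS CLI (the user name pinot-deepstore is a placeholder; attach your S3 read/write policy to the user before generating keys):

aws iam create-user --user-name pinot-deepstore
aws iam create-access-key --user-name pinot-deepstore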

  • Add the following values in the custom_values.yaml. Replace the S3 details accordingly. Note that createSecret and useSecret are set to true, and accessKey and secretKey refer to the credentials of the corresponding IAM user.
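
A minimal sketch of the corresponding values, mirroring the structure of the GCS examples above; the s3 section and its key names (region, accessKey, secretKey) are assumptions to verify against your chart version:

pinot:
  # deepStore - Enable/disable storing of Pinot segments in deep store
  deepStore:
    enabled: true
    type: "s3"
    useSecret: true
    createSecret: true
    ## bucket for deep storage
    dataDir: "s3://[REPLACE BUCKET HERE]/kfuse/controller/data"
    ## key names below are assumptions; verify against your chart
    s3:
      region: "[REPLACE REGION HERE]"
      accessKey: "[REPLACE ACCESS KEY HERE]"
      secretKey: "[REPLACE SECRET KEY HERE]"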

Option 2: Attach the IAM policy to the NodeInstanceRole of the EKS cluster node group

Attach the IAM policy to the NodeInstanceRole of the node group that the Kloudfuse stack is installed on. On an existing EKS cluster, the NodeInstanceRole can be accessed from the EKS console on the corresponding cluster’s node group detail page, in the Node IAM role ARN section.

  • Add the following values in the custom_values.yaml. Replace the S3 details accordingly. Note that createSecret and useSecret are set to false.
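
A minimal sketch under the same assumptions as Option 1, with useSecret and createSecret set to false so that credentials come from the node's IAM role:

pinot:
  deepStore:
    enabled: true
    type: "s3"
    useSecret: false
    createSecret: false
    dataDir: "s3://[REPLACE BUCKET HERE]/kfuse/controller/data"
    ## key name below is an assumption; verify against your chart
    s3:
      region: "[REPLACE REGION HERE]"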

Option 3: Use a Kubernetes ServiceAccount resource that assumes an IAM role

Ensure that the Kubernetes cluster has a ServiceAccount associated with an IAM role that has permissions to read and write the S3 bucket. Refer to Assign IAM roles to Kubernetes service accounts - Amazon EKS for creating such a ServiceAccount.

  • Ensure that Pinot is configured to use the ServiceAccount and that deepStore is configured accordingly in the custom_values.yaml. Make sure useSecret and createSecret are set to false.
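
A minimal sketch under the same assumptions; the serviceAccount key used to point Pinot pods at the IAM-linked ServiceAccount is hypothetical and depends on how your chart wires it:

pinot:
  ## hypothetical key; check your chart for the actual ServiceAccount setting
  serviceAccount: "[REPLACE SERVICE ACCOUNT NAME HERE]"
  deepStore:
    enabled: true
    type: "s3"
    useSecret: false
    createSecret: false
    dataDir: "s3://[REPLACE BUCKET HERE]/kfuse/controller/data"
    s3:
      region: "[REPLACE REGION HERE]"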

Azure Configuration

Prerequisites

  • The storage account needs to have Azure Data Lake Storage Gen 2 enabled. This can be done by selecting Enable hierarchical namespace in the Advanced section when creating the storage account (see the CLI sketch after this list).

  • The Access Key can be found by navigating to Access Keys in the left pane of the storage account, under the Security + networking section.

  • fileSystemName refers to the container name. A container can be created by navigating to Containers in the left pane of the storage account, under the Data storage section.
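
As a CLI sketch, both prerequisites can also be handled with az (account, resource group, and container names are placeholders):

az storage account create \
    --name kfuse \
    --resource-group RESOURCE_GROUP \
    --enable-hierarchical-namespace true

az storage container create \
    --name [REPLACE CONTAINER NAME HERE] \
    --account-name kfuse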

Helm values

  • Add the following values in the custom_values.yaml. Replace the Azure Data Lake details accordingly.
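
A minimal sketch mirroring the GCS examples above; the type value, the dataDir URI scheme, and the key names under the adls section are assumptions to verify against your chart, with fileSystemName being the container name described in the prerequisites:

pinot:
  deepStore:
    enabled: true
    ## type value is an assumption; check your chart's supported values
    type: "adls-gen2"
    useSecret: true
    createSecret: true
    ## dataDir URI scheme is an assumption
    dataDir: "abfss://[REPLACE CONTAINER HERE]@[REPLACE ACCOUNT HERE].dfs.core.windows.net/kfuse/controller/data"
    ## key names below are assumptions; verify against your chart
    adls:
      accountName: "[REPLACE STORAGE ACCOUNT NAME HERE]"
      accessKey: "[REPLACE ACCESS KEY HERE]"
      fileSystemName: "[REPLACE CONTAINER NAME HERE]"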