Configure GCP/AWS/Azure Object Store for Pinot
Add the following configurations to the custom_values.yaml used with the Helm installation of Kfuse. Make sure the deep store bucket is in the same region as the compute instances running the Kfuse stack.
GCP Configuration
Option 1: Using a service account key
Prerequisites
Download the key for the GCP service account from the GCP console.
Create a Kubernetes secret with the GCP credentials that allow access to the GCS bucket. Ensure that the key file is named secretKey:
kubectl create secret generic pinot-sd-secret --from-file=./secretKey -n kfuse
Helm values
Add the following values in custom_values.yaml. Replace the GCS details accordingly.
pinot:
  # deepStore - Enable/disable storing of Pinot segments in deep store
  deepStore:
    enabled: true
    type: "gcs"
    useSecret: true
    createSecret: false
    secretName: "pinot-sd-secret"
    ## bucket for deep storage
    dataDir: "gs://[REPLACE BUCKET HERE]/kfuse/controller/data"
    gcs:
      projectId: "REPLACE PROJECT ID HERE"
Option 2: Using Google Cloud Workload Identity
Prerequisites
Follow the steps in https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity to create a service account and associate it with the GKE cluster.
In step 5, use ROLE_NAME as roles/storage.admin. Alternatively, create a custom role with the following permissions:
storage.buckets.get
storage.objects.create
storage.objects.delete
storage.objects.get
storage.objects.getIamPolicy
storage.objects.list
storage.objects.update
In step 6 & 7, use NAMESPACE as kfuse (or namespace where Kloudfuse is deployed) and KSA_NAME as default
Please create the kfuse namespace if you have not created.
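For reference, steps 5 through 7 of the Google guide map to commands along the following lines. This is a sketch: GSA_NAME and PROJECT_ID are placeholders for your Google service account and project, and the step numbering follows the linked guide.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member "serviceAccount:GSA_NAME@PROJECT_ID.iam.gserviceaccount.com" \
    --role "roles/storage.admin"
gcloud iam service-accounts add-iam-policy-binding GSA_NAME@PROJECT_ID.iam.gserviceaccount.com \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:PROJECT_ID.svc.id.goog[kfuse/default]"
kubectl annotate serviceaccount default \
    --namespace kfuse \
    iam.gke.io/gcp-service-account=GSA_NAME@PROJECT_ID.iam.gserviceaccount.com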
Helm values
Add the following values in custom_values.yaml. Replace the GCS details accordingly.
pinot:
  # deepStore - Enable/disable storing of Pinot segments in deep store
  deepStore:
    enabled: true
    type: "gcs"
    useSecret: false
    createSecret: false
    ## bucket for deep storage
    dataDir: "gs://[REPLACE BUCKET HERE]/kfuse/controller/data"
    gcs:
      projectId: "REPLACE PROJECT ID HERE"
AWS Configuration
Pinot needs an IAM policy with read and write permissions on the S3 bucket used for deep storage. Kfuse currently supports three options for consuming this policy.
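As a minimal sketch, a policy along the following lines grants the required access; the policy and bucket names are placeholders, and you may want to scope the actions more tightly for your environment.
cat > pinot-deepstore-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListBucket",
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::REPLACE-BUCKET-HERE",
        "arn:aws:s3:::REPLACE-BUCKET-HERE/*"
      ]
    }
  ]
}
EOF
aws iam create-policy --policy-name pinot-deepstore --policy-document file://pinot-deepstore-policy.json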
Option 1: Using an IAM User secret access key
Refer to the AWS document https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html for creating an IAM user. Ensure that the user has the IAM policy for reading and writing the S3 bucket for deep storage. Once the IAM user is created, generate access key credentials and note the access key and secret key.
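For example, with the AWS CLI (the user name and policy ARN are placeholders carried over from the policy sketch above):
aws iam attach-user-policy \
    --user-name kfuse-pinot \
    --policy-arn arn:aws:iam::ACCOUNT_ID:policy/pinot-deepstore
aws iam create-access-key --user-name kfuse-pinot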
Add the following values in custom_values.yaml. Replace the S3 details accordingly. Note that createSecret and useSecret are set to true, and accessKey and secretKey refer to the credentials of the corresponding IAM user.
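The values block below is a sketch modeled on the GCS example above; the type string, the s3 key names, and the secret name are assumptions, so verify them against the Kfuse chart's values.
pinot:
  # deepStore - Enable/disable storing of Pinot segments in deep store
  deepStore:
    enabled: true
    type: "s3"  # assumed, mirroring the GCS example
    useSecret: true
    createSecret: true
    secretName: "pinot-deepstore-secret"  # assumed secret name
    ## bucket for deep storage
    dataDir: "s3://[REPLACE BUCKET HERE]/kfuse/controller/data"
    s3:
      region: "REPLACE REGION HERE"
      accessKey: "REPLACE ACCESS KEY HERE"  # IAM user access key
      secretKey: "REPLACE SECRET KEY HERE"  # IAM user secret key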
Option 2: Attach the IAM policy to the NodeInstanceRole of the EKS cluster node group
Attach the IAM policy to the NodeInstanceRole for the nodes that the Kloudfuse stack is installed on. On an existing EKS cluster, the NodeInstanceRole can be accessed from the EKS console on the corresponding EKS cluster's node group detail page, under the Node IAM role ARN section.
Add the following values in custom_values.yaml. Replace the S3 details accordingly. Note that createSecret and useSecret are set to false.
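Again a sketch modeled on the GCS example, with the credential fields dropped since the node role provides access; the type string and s3 key names are assumptions.
pinot:
  # deepStore - Enable/disable storing of Pinot segments in deep store
  deepStore:
    enabled: true
    type: "s3"  # assumed, mirroring the GCS example
    useSecret: false
    createSecret: false
    ## bucket for deep storage
    dataDir: "s3://[REPLACE BUCKET HERE]/kfuse/controller/data"
    s3:
      region: "REPLACE REGION HERE"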
Option 3: Use a Kubernetes ServiceAccount resource that assumes an IAM role
Ensure that the Kubernetes cluster has a ServiceAccount that is associated with an IAM role with permissions to read/write from S3. Refer to https://docs.aws.amazon.com/eks/latest/userguide/associate-service-account-role.html for creating a ServiceAccount.
Ensure that pinot is configured to use the ServiceAccount and that deepStore is configured accordingly in custom_values.yaml. Make sure useSecret and createSecret are set to false, as in Option 2.
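For example, the ServiceAccount can be created with eksctl; the cluster name, account ID, and policy ARN below are placeholders, and the deep store values then mirror the Option 2 block above. How the chart binds Pinot pods to the ServiceAccount depends on the chart's values, so check the Kfuse chart for the corresponding setting.
eksctl create iamserviceaccount \
    --name default \
    --namespace kfuse \
    --cluster REPLACE_CLUSTER_NAME \
    --attach-policy-arn arn:aws:iam::ACCOUNT_ID:policy/pinot-deepstore \
    --approve \
    --override-existing-serviceaccounts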
Azure Configuration
Prerequisites
The storage account needs to have Azure Data Lake Storage Gen 2 enabled. This can be done by selecting Enable hierarchical namespace in the Advanced section when creating the storage account.
The access key can be found by navigating to Access Keys in the left pane of the storage account, under the Security + networking section.
fileSystemName refers to the container name. A container can be created by navigating to Containers in the left pane of the storage account, under the Data storage section.
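The same prerequisites can also be satisfied from the Azure CLI; the account, resource group, region, and container names below are placeholders, and --hns enables the hierarchical namespace.
az storage account create --name kfusestore --resource-group REPLACE_RG --location REPLACE_REGION --hns true
az storage fs create --name kfuse --account-name kfusestore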
Helm values
Add the following values in custom_values.yaml. Replace the Azure Data Lake details accordingly.
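The Azure values block is sketched below by analogy with the GCS example; the type string, the adls key names, and the dataDir URI scheme are assumptions, so verify them against the Kfuse chart's values. The accountName, accessKey, and fileSystemName fields correspond to the storage account, access key, and container described in the prerequisites.
pinot:
  # deepStore - Enable/disable storing of Pinot segments in deep store
  deepStore:
    enabled: true
    type: "adls-gen2"  # assumed type string
    useSecret: false
    createSecret: false
    ## container for deep storage
    dataDir: "abfss://[REPLACE CONTAINER HERE]@[REPLACE ACCOUNT HERE].dfs.core.windows.net/kfuse/controller/data"  # assumed URI scheme
    adls:
      accountName: "REPLACE STORAGE ACCOUNT NAME HERE"
      accessKey: "REPLACE ACCESS KEY HERE"
      fileSystemName: "REPLACE CONTAINER NAME HERE"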