Configure GCP/AWS/Azure Object Store for Pinot
Please add the following configurations to the custom_values.yaml used with the helm installation of Kfuse. Make sure the deep store bucket is in the same region as the compute instances running the Kfuse stack.
GCP Configuration
Option 1: Using a service account key
Prerequisites
Download the key for the GCP service account from the GCP console.
Create a Kubernetes secret with the GCP credentials that allow access to the GCS bucket. Ensure that the key file is named secretKey:

kubectl create secret generic pinot-sd-secret --from-file=./secretKey -n kfuse

Helm values
Add the following values in the custom_values.yaml. Replace the GCS details accordingly.
pinot:
  # deepStore - Enable/disable storing of Pinot segments in deep store
  deepStore:
    enabled: true
    type: "gcs"
    useSecret: true
    createSecret: false
    secretName: "pinot-sd-secret"
    ## bucket for deep storage
    dataDir: "gs://[REPLACE BUCKET HERE]/kfuse/controller/data"
    gcs:
      projectId: "REPLACE PROJECT ID HERE"

Option 2: Using Google Cloud Workload Identity
Prerequisites
Follow the steps in Authenticate to Google Cloud APIs from GKE workloads | GKE security to create a service account and associate it with the GKE cluster.
In step 5, use ROLE_NAME as roles/storage.admin. Alternatively, you can create a custom role with the following permissions:
storage.buckets.get
storage.objects.create
storage.objects.delete
storage.objects.get
storage.objects.getIamPolicy
storage.objects.list
storage.objects.update
In steps 6 and 7, use NAMESPACE as kfuse (or the namespace where Kloudfuse is deployed) and KSA_NAME as default.
Please create the kfuse namespace if it does not already exist.
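The binding and annotation steps above can be sketched with the gcloud and kubectl CLIs. The project ID and GCP service account name below are placeholders; adapt them to your environment:

```shell
# Allow the Kubernetes ServiceAccount (kfuse/default) to impersonate the GCP service account
gcloud iam service-accounts add-iam-policy-binding \
    GSA_NAME@PROJECT_ID.iam.gserviceaccount.com \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:PROJECT_ID.svc.id.goog[kfuse/default]"

# Annotate the Kubernetes ServiceAccount so GKE knows which GCP service account to use
kubectl annotate serviceaccount default \
    --namespace kfuse \
    iam.gke.io/gcp-service-account=GSA_NAME@PROJECT_ID.iam.gserviceaccount.com
```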
Helm values
Add the following values in the custom_values.yaml. Replace the GCS details accordingly.
pinot:
  # deepStore - Enable/disable storing of Pinot segments in deep store
  deepStore:
    enabled: true
    type: "gcs"
    useSecret: false
    createSecret: false
    ## bucket for deep storage
    dataDir: "gs://[REPLACE BUCKET HERE]/kfuse/controller/data"
    gcs:
      projectId: "REPLACE PROJECT ID HERE"

AWS Configuration
Pinot needs an IAM policy with read and write permissions to the S3 bucket used for deep storage. Currently, Kfuse supports three options for consuming this policy.
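As a reference, a minimal policy of the following shape should suffice for all three options. The bucket name is a placeholder; replace it with your deep store bucket:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::REPLACE_BUCKET_HERE"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::REPLACE_BUCKET_HERE/*"
    }
  ]
}
```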
Option 1: Using an IAM User secret access key
Refer to the AWS document Create an IAM user in your AWS account - AWS Identity and Access Management for creating an IAM user. Ensure that the user has the IAM policy for reading and writing the S3 bucket used for deep storage. Once the IAM user is created, generate access key credentials, and note the access key and secret key.
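The same can be done with the aws CLI. The user name and policy ARN below are placeholders for illustration:

```shell
# Create the IAM user and attach the S3 deep store policy
aws iam create-user --user-name pinot-deepstore
aws iam attach-user-policy --user-name pinot-deepstore \
    --policy-arn arn:aws:iam::ACCOUNT_ID:policy/pinot-deepstore-policy

# Generate access key credentials; note the AccessKeyId and SecretAccessKey in the output
aws iam create-access-key --user-name pinot-deepstore
```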
Add the following values in the custom_values.yaml. Replace the S3 details accordingly. Note that createSecret and useSecret are set to true, and accessKey and secretKey refer to the credentials of the corresponding IAM user.
pinot:
  # deepStore - Enable/disable storing of Pinot segments in deep store
  deepStore:
    enabled: true
    type: "s3"
    ## Set `useSecret` to false if a secret does not need to be passed. Typically used when
    ## the underlying nodes have access to the deep store using node-level access credentials.
    useSecret: true
    ## If set, a secret will be created with the provided credentials.
    createSecret: true
    ## bucket for deep storage
    dataDir: "s3://[REPLACE BUCKET HERE]/kfuse/controller/data"
    # (Optional) The server-side encryption algorithm used when storing this object in Amazon S3, i.e. aws:kms
    # serverSideEncryption: "aws:kms"
    # (Optional, but required when serverSideEncryption=aws:kms) Specifies the AWS KMS key ID to use for object encryption.
    # ssekmsKeyId: ""
    # (Optional) Specifies the AWS KMS Encryption Context to use for object encryption.
    # The value of this header is a base64-encoded UTF-8 string holding JSON with the encryption context key-value pairs.
    # ssekmsEncryptionContext: ""
    ## Fill in AWS S3 credentials
    s3:
      region: "YOUR REGION"
      accessKey: "YOUR AWS ACCESS KEY"
      secretKey: "YOUR AWS SECRET KEY"

Option 2: Attach the IAM policy to the NodeInstanceRole of the EKS cluster node group
Attach the IAM policy to the NodeInstanceRole of the nodes that the Kloudfuse stack is installed on. On an existing EKS cluster, the NodeInstanceRole can be accessed from the EKS console on the corresponding EKS cluster's node group detail page, under the Node IAM role ARN section.
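This attachment can also be sketched with the aws CLI. The role name and policy ARN below are placeholders; copy the actual values from the EKS console:

```shell
# Attach the deep store S3 policy to the node group's instance role
aws iam attach-role-policy \
    --role-name REPLACE_NODE_INSTANCE_ROLE \
    --policy-arn arn:aws:iam::ACCOUNT_ID:policy/pinot-deepstore-policy
```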
Add the following values in the custom_values.yaml. Replace the S3 details accordingly. Note that createSecret and useSecret are set to false.
pinot:
  # deepStore - Enable/disable storing of Pinot segments in deep store
  deepStore:
    enabled: true
    type: "s3"
    ## Set `useSecret` to false if a secret does not need to be passed. Typically used when
    ## the underlying nodes have access to the deep store using node-level access credentials.
    useSecret: false
    ## If set, a secret will be created with the provided credentials.
    createSecret: false
    ## bucket for deep storage
    dataDir: "s3://[REPLACE BUCKET HERE]/kfuse/controller/data"
    ## Fill in AWS S3 credentials
    s3:
      region: "YOUR REGION"

Option 3: Use a Kubernetes ServiceAccount resource that assumes an IAM role
Ensure that the Kubernetes cluster has a ServiceAccount that is associated with an IAM role with permissions to read/write from S3. Refer to Assign IAM roles to Kubernetes service accounts - Amazon EKS for creating a ServiceAccount.
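On EKS, such a ServiceAccount typically carries an IAM Roles for Service Accounts (IRSA) annotation. A minimal example manifest follows; the account name and role ARN are placeholders:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: pinot-deepstore
  namespace: kfuse
  annotations:
    # IAM role with read/write permissions to the deep store S3 bucket
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_ID:role/pinot-deepstore-role
```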
Ensure that pinot is configured to use the ServiceAccount and that deepStore is configured accordingly in the custom_values.yaml. Make sure useSecret and createSecret are set to false.
pinot:
  serviceAccountName: <REPLACE SERVICE ACCOUNT NAME HERE>
  # deepStore - Enable/disable storing of Pinot segments in deep store
  deepStore:
    enabled: true
    type: "s3"
    ## Set `useSecret` to false if a secret does not need to be passed. Typically used when
    ## the underlying nodes have access to the deep store using node-level access credentials.
    useSecret: false
    ## If set, a secret will be created with the provided credentials.
    createSecret: false
    ## bucket for deep storage
    dataDir: "s3://[REPLACE BUCKET HERE]/kfuse/controller/data"
    ## Fill in AWS S3 credentials
    s3:
      region: "YOUR REGION"

Azure Configuration
Prerequisites
The storage account needs to have Azure Data Lake Storage Gen 2 enabled. This can be done by selecting Enable hierarchical namespace in the Advanced section when creating the storage account.
The Access Key can be found by navigating to Access Keys in the left pane of the storage account, under the Security + networking section.
fileSystemName refers to the container name. A container can be created by navigating to Containers in the left pane of the storage account, under the Data storage section.
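The prerequisites above can also be sketched with the az CLI. The account, resource group, and container names below are placeholders:

```shell
# Create a storage account with hierarchical namespace (ADLS Gen 2) enabled
az storage account create --name kfusestorage --resource-group my-rg \
    --enable-hierarchical-namespace true

# Retrieve the account access keys (use one of them as accessKey in custom_values.yaml)
az storage account keys list --account-name kfusestorage --resource-group my-rg

# Create the container referred to by fileSystemName
az storage container create --name kfuse-deepstore --account-name kfusestorage
```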
Helm values
Add the following values in the custom_values.yaml. Replace the Azure Data Lake details accordingly.
pinot:
  # deepStore - Enable/disable storing of Pinot segments in deep store
  deepStore:
    enabled: true
    type: "adl2"
    ## bucket for deep storage
    dataDir: "adl2://[REPLACE CONTAINER NAME HERE]/kfuse/controller/data"
    ## Fill in Azure Data Lake Storage credentials
    adl2:
      accountName: "YOUR AZURE STORAGE ACCOUNT NAME"
      accessKey: "STORAGE ACCOUNT ACCESS KEY"
      fileSystemName: "STORAGE ACCOUNT CONTAINER NAME"