...

Code Block
helm upgrade --install -n kfuse kfuse oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/kfuse --version <SPECIFY VERSION HERE> -f custom_values.yaml

Upgrading to Kfuse version

...

The upgrade to version 2.7.3 is performed in two passes.

  1. Set pinot.server.realtime.replicaCount to 0. Keep note of the original value of this field, because you will set it back to that value in step 4.

  2. Run helm upgrade as usual.

  3. Make sure all pods and jobs have finished successfully.

  4. Set pinot.server.realtime.replicaCount back to its original value in the values.yaml file.

  5. Run helm upgrade again, or alternatively run kubectl scale sts pinot-server-realtime --replicas=N, where N is the original replica count.

Upgrading Kfuse from version 2.7.1 to 2.7.2

There are no specific pre-upgrade or post-upgrade steps for the 2.7.2 release. Follow the upgrade command section.

Because this release changes the RBAC implementation, you may see numeric IDs in the email field of the users. To populate Kloudfuse with correct emails, delete all users. Kloudfuse recreates individual users as they log in, with correct email values.

We advise that you create new groups after completing this step. You can then assign users to groups, policies to users and groups, and so on.

Use gescript.py to migrate existing users, groups, and policies using your organization’s values.yaml file.
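
For example, invoke the script as shown in its usage message (the values file name is illustrative):

Code Block
python gescript.py custom_values.yaml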

gescript.py:
Code Block
import yaml
import subprocess
import sys, requests
from requests.auth import HTTPBasicAuth
def construct_value(loader, node):
    if not isinstance(node, yaml.ScalarNode):
        raise yaml.constructor.ConstructorError(
            "while constructing a value",
            node.start_mark,
            "expected a scalar, but found %s" % node.id, node.start_mark
        )
    return str(node.value)
yaml.Loader.add_constructor(u'tag:yaml.org,2002:value', construct_value)

def load_values_yaml(file_path):
    with open(file_path, 'r') as file:
        try:
            values = yaml.load(file, Loader=yaml.Loader)  # Use the custom loader
            return values
        except yaml.YAMLError as e:
            print(f"Error parsing YAML file: {e}")
            sys.exit(1)

class RestClient:
    def __init__(self, base_url, username, password):
        self.base_url = base_url
        self.auth = HTTPBasicAuth(username, password)

    def _make_request(self, method, endpoint, data=None, headers=None):
        url = f"{self.base_url}{endpoint}"
        try:
            combined_headers = {
                'Accept': 'application/json',
                'Content-Type': 'application/json',
            }
            if headers:
                combined_headers.update(headers)

            response = requests.request(method, url, json=data, headers=combined_headers, auth = self.auth)
            response.raise_for_status()
            return response.json()
        except requests.exceptions.RequestException as e:
            print(f"Error: {e}")
            return None

    def create_group(self, group_name):
        endpoint = "/groups/"
        data = {"name": group_name}
        return self._make_request("POST", endpoint, data)
    
    def get_groups(self):
        endpoint = "/groups/"
        return self._make_request("GET", endpoint)
    
    def get_users(self):
        endpoint = "/users/"
        return self._make_request("GET", endpoint)

    def add_user_to_group(self, group_id, user_id):
        endpoint = f"/groups/{group_id}/users"
        data = {"userId": user_id, "Permission": "Member"}
        return self._make_request("POST", endpoint, data)
    
    def create_policy(self, name, scope):
        endpoint = "/policies/"
        data = {
            "name": name,
            "scope": scope
        }
        return self._make_request("POST", endpoint, data)
    
    def create_rbac_config(self, group_name, policy_name):
        endpoint = "/rbacconfig/"
        data = {
            "group": group_name,
            "policy": policy_name
        }
        return self._make_request("POST", endpoint, data)

def main(file_path):
    values = load_values_yaml(file_path)
    base_url = "https://pisco.kloudfuse.io/rbac/"  # Adjust base URL as needed
    client = RestClient(base_url, "admin", "password")

    config = values.get('user-mgmt-service', {})
    groups = config['config']['groups']
    policies = config['config']['rbac_policies']
    rbac_configs = config['config']['rbac_configs']

    users_db = client.get_users()   

    for group in groups:
        group_name = group['name']
        print(f"Creating group: {group_name}")
        client.create_group(group_name)

    groups_db = client.get_groups()

    # Resolve each group's id, then add every user listed under that group.
    for group in groups:
        group_id = None
        group_name = group['name']
        for grp in groups_db:
            if grp['name'] == group_name:
                group_id = grp['id']
                print(f"Found group: {group_name} with id: {group_id}")
                break
        for user in group['users']:
            user_id = None
            user_email = user['value']
            for usr in users_db:
                if usr['email'] == user_email:
                    user_id = usr['id']
                    print(f"Found user: {user_email} with id: {user_id}")
                    break
            print(f"Adding user: {user_email} to group: {group_name}")
            if group_id is not None and user_id is not None:
                response = client.add_user_to_group(group_id, user_id)
                if response:
                    print("User added to group:", response)
                else:
                    print("Failed to add user to group.")
            else:
                print("User not found.")

    for policy in policies:
        policy_name = policy['name']
        policy_scope = policy['scope']

        response = client.create_policy(policy_name, policy_scope)
        if response:
            print("Policy created:", response)
        else:
            print("Failed to create policy.")

    for rbac_config in rbac_configs:
        group_name = rbac_config['group']
        policy_name = rbac_config['policy']
        print (f"Creating RBAC config: {group_name} - {policy_name}")
        response = client.create_rbac_config(group_name, policy_name)
        if response:
            print("RBAC configuration created:", response)
        else:
            print("Failed to create RBAC configuration.")

if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: python gescript.py path-to-customer-values.yaml")
        sys.exit(1)

    file_path = sys.argv[1]
    main(file_path)

...

Upgrading to Kfuse version 3.1.0

Pre-Upgrade

This release fixes the labels and label selectors of some Kloudfuse components (Kubernetes resources) so that they match the rest. Because the label selector of an existing Deployment cannot be changed in place, run the following command once before upgrading to 3.1.0 or TOT.

Code Block
kubectl delete deployments.apps catalog-service rulemanager advance-functions-service
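
After the subsequent helm upgrade, a quick check that these deployments were recreated with the new labels (assuming the kfuse namespace):

Code Block
kubectl get deployments -n kfuse catalog-service rulemanager advance-functions-service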

Post-Upgrade

Restart Pinot Services

Code Block
kubectl rollout restart sts pinot-broker pinot-controller pinot-server-realtime pinot-server-offline
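
Optionally, wait for the restarts to complete; a small sketch, assuming the kfuse namespace:

Code Block
for sts in pinot-broker pinot-controller pinot-server-realtime pinot-server-offline; do
  kubectl rollout status sts "$sts" -n kfuse
done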

If the logs archival/hydration service is enabled, follow the instructions below as well.

The hydration-service (HS) moved from a Deployment to a StatefulSet. After the upgrade, manually delete the pod that belonged to the old Deployment.

Because it was a Deployment, the HS pod runs with a generated pod name; kubectl get pods | grep hydration-service fetches it.

Code Block
kubectl delete pod hydration-service-<tag>
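
A one-liner variant that looks up the pod name and deletes it in one step; a sketch, assuming the kfuse namespace:

Code Block
kubectl delete -n kfuse $(kubectl get pods -n kfuse -o name | grep hydration-service)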

Upgrading Kfuse from version 2.7.3 to 2.7.4

Pre-Upgrade

For RBAC, before upgrading to 2.7.4, check for a blank user row in the User Management section of the Admin tab. The login and email fields of that row are empty, and it has a random id. Delete the row either directly through the UI, or by exec-ing into the configdb shell and running the following command. The script to connect to rbacdb is available here.

Code Block
./kfuse-postgres.sh kfuse-configdb-0 kfuse rbacdb 

rbacdb=# DELETE FROM users where email ISNULL and login ISNULL;
DELETE 1
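
To review the row before deleting it, a hedged check using the same psql-through-kubectl pattern that appears elsewhere in this guide (it assumes rbacdb lives on kfuse-configdb-0, as in the connect command above):

Code Block
kubectl -n kfuse exec -it kfuse-configdb-0 -- bash -c "PGDATABASE=rbacdb PGPASSWORD=\$POSTGRES_PASSWORD psql -U postgres -c 'SELECT id, login, email FROM users WHERE email ISNULL AND login ISNULL;'"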

Post-Upgrade

Code Block
kubectl rollout restart sts pinot-server-offline

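# Refresh services in the APM store (lookbackDays: 1) through the trace-query-service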
kubectl port-forward --namespace kfuse deployments.apps/trace-query-service 8080:8080
curl -X POST http://localhost:8080/v1/trace/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "query { refreshServicesInApmStore(lookbackDays: 1) }"
  }'

Upgrading Kfuse from version 2.7.2 to 2.7.3

The upgrade to version 2.7.3 is performed in two passes (a command sketch follows this list).

  1. Set pinot.server.realtime.replicaCount to 0. Keep note of the original value of this field, because you will set it back to that value in step 4.

  2. Run helm upgrade as usual.

  3. Make sure all pods and jobs have finished successfully.

  4. Set pinot.server.realtime.replicaCount back to its original value in the values.yaml file.

  5. Run helm upgrade again.

  6. Alternatively, run kubectl scale sts pinot-server-realtime --replicas=N, where N is the original replica count.
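
A minimal sketch of the two passes, using helm --set overrides instead of editing the values file (it assumes the chart exposes pinot.server.realtime.replicaCount as described above; the original replica count of 3 is illustrative):

Code Block
# Pass 1: scale the realtime Pinot servers down to 0 and upgrade
helm upgrade --install -n kfuse kfuse oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/kfuse \
  --version 2.7.3 -f custom_values.yaml --set pinot.server.realtime.replicaCount=0

# Wait until all pods and jobs have finished successfully, then
# Pass 2: upgrade again with the original replica count restored (3 is illustrative)
helm upgrade --install -n kfuse kfuse oci://us-east1-docker.pkg.dev/mvp-demo-301906/kfuse-helm/kfuse \
  --version 2.7.3 -f custom_values.yaml --set pinot.server.realtime.replicaCount=3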

See RBAC: Using the Admin Tab

Upgrading Kfuse from version 2.7.1 to 2.7.2

  1. Because this release changes the RBAC implementation, you may see numeric IDs in the email field of the users. To populate Kloudfuse with correct emails, delete all users. Kloudfuse recreates individual users as they log in, with correct email values.

  2. We advise that you create new groups after completing this step. You can then assign users to groups, policies to users and groups, and so on.

See RBAC: Using the Admin Tab

Post-Upgrade

  1. Connect to rbacdb

Code Block
> ./kfuse-postgres.sh kfuse-configdb-0 kfuse rbacdb 

  2. Make a note of the user ids with a NULL grafana_id, created during the RBAC migration

Code Block
rbacdb=# select id from users where grafana_id IS NULL;

  3. Clean up the empty users in the rbac db that were created during the import from existing users.

Code Block
rbacdb=# delete from users where grafana_id IS NULL;

  4. For each user id in the output of step 2, delete the user from its groups.

Code Block
rbacdb=# delete from group_members where user_id='<user-id>';

The kfuse-postgres.sh script is available in the customer repository, under the scripts directory.

Upgrading Kfuse from version 2.7 to 2.7.1

...

Upgrading Kfuse from version 2.6.7 to 2.6.8 or 2.7

Pre-upgrade steps:

This release includes package upgrades that remove service vulnerabilities. Before running helm upgrade, you need to run a script related to the Kafka service. There will be some downtime between running the script and the helm upgrade. You can find the script here.

Edit the custom_values.yaml file and move the block under kafka to the kafka-broker section, as follows:

...

Note that Identity for Databases is introduced in Kfuse version 2.6.7. Database Identity only takes effect on newly ingested APM-related data. In addition, the timestamp granularity for APM/span data has been increased from millisecond to nanosecond, to provide better accuracy in the Trace Flamegraph/Waterfall. For older APM data to be rendered accurately, follow the instructions in Converting old APM data to Kfuse 2.6.5 APM Service Identity format to convert old data to the new format.

Pre-Upgrade

  • SLO is re-enabled in 2.6.7 with enhanced features. Drop the existing SLO table as follows:

Code Block
> ./kfuse-postgres.sh kfuse-configdb-0 kfuse slodb 

slodb=# drop table slodbs;

The kfuse-postgres.sh script is available in the customer repository, under the scripts directory.

Post-Upgrade

  • There are a few changes in the Pinot database that require the Pinot servers to be restarted after the upgrade, with the following command:

    Code Block
    kubectl rollout restart sts -n kfuse pinot-server-offline pinot-server-realtime

...

Note that Service Identity for APM is introduced in Kfuse version 2.6.5. Service Identity only takes effect on newly ingested APM-related data; accordingly, old APM data will not be rendered properly in the UI. If older APM data is needed, follow the instructions in Converting old APM data to Kfuse 2.6.5 APM Service Identity format to convert old data to the new format.

Pre-Upgrade

  • On Azure, the kfuse-ssd-offline storage class is changed to the StandardSSD_LRS disk type. The kfuse-ssd-offline storage class needs to be deleted prior to the upgrade, to allow the new version to update the disk type. If the installation is not on Azure, this step can be skipped.

    Code Block
    kubectl delete storageclass kfuse-ssd-offline

Post-Upgrade

  • There are a few changes in the Pinot database that require the Pinot servers to be restarted after the upgrade, with the following command:

    Code Block
    kubectl rollout restart sts -n kfuse pinot-server-offline pinot-server-realtime

...

Upgrading to Kfuse version 2.6

Pre-Upgrade

  • A new kfuse-ssd-offline storage class has been introduced in Kfuse version 2.6. This storage class uses gp3 on AWS, pd-balanced on GCP, and Standard_LRS on Azure. This is now the default storage class for Pinot Offline Servers, which should give better disk IO performance.

  • If the custom values yaml is already set to use the specified disk type (e.g., kfuse-ssd-aws-gp3 or standard-rwo on GCP), then the remaining steps can be skipped.

  • If the custom values yaml does not explicitly set the pinot.server.offline.persistence.storageClass field, or if it is set to a different storage class, ensure that the field is not set in the custom values yaml, and then run the following commands:

    Code Block
    kubectl delete sts -n kfuse pinot-server-offline
    kubectl delete pvc -l app.kubernetes.io/instance=kfuse -l component=server-offline -n kfuse
  • Note that the above commands correspond to the PVCs of the Pinot offline servers. After the upgrade to Kfuse version 2.6, PVCs with the desired storage class will be created for the Pinot offline servers; a quick check follows below.
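
    A minimal post-upgrade check that the offline server PVCs were recreated with the new storage class, reusing the labels from the delete command above:

    Code Block
    kubectl get pvc -n kfuse -l app.kubernetes.io/instance=kfuse,component=server-offline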

...



Upgrading to Kfuse version 2.5.3

Pre-Upgrade

  • Based on observation and feedback, the persistent volumes for the zookeeper pods fill up quite often. To remediate that, the default persistent volume size for all zookeeper pods has been increased to 32Gi. This requires changes in two places, as shown below:

Code Block
kafka:
  # zookeeper - Configuration for Kafka's Zookeeper.
  zookeeper:
    persistence:
      size: 32Gi
      .
      .
      .
pinot:
  # zookeeper - Configuration for Pinot's Zookeeper.
  zookeeper:
    persistence:
      size: 32Gi

  • Update the volume size prior to the upgrade with the resize_pvc.sh script. Please reach out if you need assistance.

Post-Upgrade

  • There are a few changes in the Pinot database that require some of the services to be restarted after the upgrade, with the following command:

Code Block
kubectl rollout restart sts -n kfuse pinot-server-offline pinot-server-realtime pinot-controller pinot-broker logs-parser logs-query-service
kubectl rollout restart deployment -n kfuse logs-transformer trace-transformer trace-query-service

Upgrading to Kfuse version 2.5.0

Post-Upgrade

  • There are a few changes in the Pinot database that require some of the services to be restarted after the upgrade, with the following command:

Code Block
kubectl rollout restart sts -n kfuse pinot-server-offline pinot-server-realtime pinot-controller pinot-broker logs-parser logs-query-service
kubectl rollout restart deployment -n kfuse logs-transformer

Upgrading to Kfuse version 2.2.4

Post-Upgrade

  • There are a few changes in the Pinot database that require the pinot-* servers to be restarted after the upgrade, with the following command:

Code Block
kubectl rollout restart sts -n kfuse pinot-server-offline pinot-server-realtime pinot-controller pinot-broker

Upgrading to Kfuse version 2.2.3

Pre-Upgrade

  • The default value for the pinot zookeeper persistence (PVC) size is now 32Gi.

  • If the existing Kfuse installation is using the old default value (i.e., custom_values.yaml did not explicitly specify the persistence size for pinot zookeeper), set the pinot zookeeper persistence (PVC) value to 16Gi so that the chart continues to match the size of the existing PVC. Add the following snippet under the pinot.zookeeper section:

Code Block
persistence:
  size: 16Gi

Upgrading from Kfuse version 2.1 or earlier

Post-Upgrade

  • The Kloudfuse-provided alerts organization has been updated for easier maintenance. Make sure to remove the old version:

Code Block
# Connect to kfuse cluster and log in to catalog service pod
kubens kfuse
kubectl exec -it catalog-service-<pod-id> -- bash
# Remove older folders.
python3 /catalog_service/catalog.py --remove_installed --list kloudfuse,kloudfuse_alerts,kubernetes_alerts --artifact_type alerts

Upgrading from Kfuse version 2.0.1 or earlier

Post-Upgrade

  • The Kfuse-provisioned dashboard has been cleaned up. Run the following command:

Code Block
kubectl -n kfuse exec -it kfuse-configdb-0 -- bash -c "PGDATABASE=alertsdb PGPASSWORD=\$POSTGRES_PASSWORD psql -U postgres -c 'delete from dashboard_provisioning where name='\''hawkeye-outliers-resources'\'';'; "

Upgrading from Kfuse version 1.3.4 or earlier

Pre-Upgrade

  • Note: Kfuse services will go offline during this process. The Kloudfuse storage class configuration has been simplified with future releases and features in mind. This requires running the migrate_storage_class.sh script provided by the Kloudfuse team.

    Code Block
    ./migrate_storage_class.sh

    After running the script, ensure that the PVCs’ storage class is kfuse-ssd, instead of kfuse-ssd-gcp or kfuse-ssd-aws.

    Code Block
    kubectl get pvc -n kfuse
  • Old alerts have to be removed. The Kloudfuse alerts organization has changed with the introduction of additional alerts. The new version performs the organization automatically; however, the older alerts have to be removed.

    • Manually remove all alerts by navigating to the Grafana tab and removing all alerts from the kloudfuse_alerts and kubernetes_alerts folders.

...

Attached file: migrate_storage_class.sh

Post-Upgrade

  • The older Kubernetes secret-related configuration shown below needs to be removed from the custom values.yaml file. The kfuse-credentials secret can also be removed (see the example command after the snippet).

Code Block
auth:
  config:
    AUTH_TYPE: "google"
    AUTH_COOKIE_MAX_AGE_IN_SECONDS: 259200
  auth:
    existingAdminSecret: "kfuse-credentials"
    existingSecret: "kfuse-credentials"
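
Removing the now-unused secret is a single command; a sketch, assuming the secret lives in the kfuse namespace:

Code Block
kubectl delete secret -n kfuse kfuse-credentials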

  • There is a schema change introduced in the traces table. Make sure to restart the Pinot servers after the upgrade completes.

Code Block
kubectl rollout restart sts -n kfuse pinot-server-realtime
kubectl rollout restart sts -n kfuse pinot-server-offline

Upgrading from Kfuse version 1.3.2 or earlier

Post-Upgrade

  • There is a schema change introduced in version 1.3.3. Make sure to restart the Pinot servers after the upgrade completes.

Code Block
kubectl rollout restart sts -n kfuse pinot-server-realtime
kubectl rollout restart sts -n kfuse pinot-server-offline

Upgrading from Kfuse version 1.2.1 or earlier

Pre-Upgrade

  • Advanced monitoring is available as an optional component in 1.3 and later releases. To enable it:

    • The Knight agent must be installed. Please review the steps and settings here.

    • Additional agent settings are required. Please review the settings here.

  • Starting with Kfuse version 1.3.0, Kfuse has added retention support using the Pinot Minion framework. This feature requires changes to the existing Pinot Minion statefulset. The statefulset needs to be deleted prior to the upgrade.

Code Block
kubectl delete sts -n kfuse pinot-minion

  • Existing alerts have to be updated with the new, more efficient versions. Please follow these steps to refresh the alerts.

    • Go to Kloudfuse UI → Alerts → Alert Rules.

    • Using the filter panel on the left, expand the “Component” item and filter to show only the “Kloudfuse” and “Kubernetes” alerts. Remove each alert in the filtered list. (Post-upgrade, the new alerts are installed automatically.)

...

A breaking change related to the number of postgresql servers installed as part of the Kfuse install was introduced after Kfuse version 1.1.0. Due to this, the stored alerts will be deleted if Kfuse is upgraded directly. In order to retain the stored alerts, run the pre- and post-upgrade steps below. Note that until the post-upgrade steps are executed, alerts and dashboards from before the upgrade will not show up.

Pre-Upgrade

Code Block
kubectl exec -n kfuse alerts-postgresql-0 --  bash -c 'PGPASSWORD=$POSTGRES_PASSWORD pg_dump -U postgres -F c alertsdb' > alertsdb.tar

 

Post-Upgrade

Code Block
kubectl cp -n kfuse alertsdb.tar kfuse-configdb-0:/tmp/alertsdb.tar
kubectl exec -n kfuse kfuse-configdb-0 --  bash -c 'PGPASSWORD=$POSTGRES_PASSWORD pg_restore -U postgres -Fc --clean --if-exists -d alertsdb < /tmp/alertsdb.tar'
kubectl delete pvc -n kfuse data-alerts-postgresql-0
kubectl delete pvc -n kfuse data-beffe-postgresql-0
kubectl delete pvc -n kfuse data-fpdb-postgresql-0
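
Optionally, confirm the restore by listing the tables in alertsdb, following the same psql-through-kubectl pattern used elsewhere in this guide:

Code Block
kubectl -n kfuse exec -it kfuse-configdb-0 -- bash -c "PGDATABASE=alertsdb PGPASSWORD=\$POSTGRES_PASSWORD psql -U postgres -c '\dt'"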

...