Rehydration of Segments from Deep Store

Consider the following scenario: an old Kloudfuse installation is decommissioned and a new Kloudfuse installation is deployed. Segments from the old installation that are stored in a deep store can be loaded into the new installation. Refer to the following steps:

 

Note that the deep store location for the new installation must be on a different path from that of the old installation. The command also assumes that the Pinot servers on the new Kloudfuse installation have permission to read from the old deep store location.

 

kubectl port-forward pinot-controller-0 -n kfuse 9000:9000

# For each table (kf_metrics, kf_logs, kf_traces, kf_traces_errors, kf_events) run the following:
curl -X POST --fail \
  -H "Content-Type: application/json" \
  -H "TABLE_TYPE:REALTIME" \
  -H "UPLOAD_TYPE:BATCH" \
  -H "DOWNLOAD_URI:<OLD DEEPSTORE PATH>/controller/data/<TABLE NAME>" \
  -v "http://localhost:9000/v2/segments?tableName=<TABLE NAME>&tableType=REALTIME&enableParallelPushProtection=false&allowRefresh=false"
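The per-table command above can be wrapped in a loop so that all five tables are handled in one pass. The following is a minimal sketch, not an official Kloudfuse script. The names CONTROLLER, OLD_DEEPSTORE_PATH, DRY_RUN, segment_url, and rehydrate_table are illustrative assumptions. It assumes the port-forward is already running in another terminal, and by default it only prints what it would send.

```shell
# Sketch: issue the rehydration request for every Kloudfuse table.
# All variable and function names here are hypothetical helpers.
CONTROLLER="${CONTROLLER:-http://localhost:9000}"
OLD_DEEPSTORE_PATH="${OLD_DEEPSTORE_PATH:-<OLD DEEPSTORE PATH>}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the requests only; 0 = send them
TABLES="kf_metrics kf_logs kf_traces kf_traces_errors kf_events"

# Build the upload URL for one table (same query string as the command above).
segment_url() {
  printf '%s/v2/segments?tableName=%s&tableType=REALTIME&enableParallelPushProtection=false&allowRefresh=false' \
    "$CONTROLLER" "$1"
}

# Send (or, in dry-run mode, print) the request for one table.
rehydrate_table() {
  table="$1"
  uri="${OLD_DEEPSTORE_PATH}/controller/data/${table}"
  url="$(segment_url "$table")"
  if [ "$DRY_RUN" = "1" ]; then
    echo "would POST DOWNLOAD_URI:${uri} to ${url}"
    return 0
  fi
  curl -X POST --fail \
    -H "Content-Type: application/json" \
    -H "TABLE_TYPE:REALTIME" \
    -H "UPLOAD_TYPE:BATCH" \
    -H "DOWNLOAD_URI:${uri}" \
    -v "$url"
}

for table in $TABLES; do
  rehydrate_table "$table" || echo "rehydration request failed for ${table}" >&2
done
```

Set OLD_DEEPSTORE_PATH to the old deep store location and run with DRY_RUN=0 once the printed requests look correct.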

Do not delete the older deep store folder (AWS/GCP/Azure). The new Kloudfuse installation downloads the segments from the older deep store location but still keeps a reference to it, so deleting the older deep store folder will lead to data loss.

Note that retention on the new cluster does not reset; it is computed from the time the data was initially ingested into the older installation. For instance, suppose a log line was first ingested into a cluster on Apr 7th 2024 and its segment was rehydrated into a new cluster installation on May 6th 2024. If the retention period on the new cluster is set to 1 month, the log line will be deleted on May 7th 2024 (1 month from the initial ingestion time, Apr 7th 2024).
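The retention arithmetic in this example can be checked with a short sketch. It assumes GNU date (the -d option); expiry_date is a hypothetical helper, not a Kloudfuse or Pinot command.

```shell
# Sketch of the retention arithmetic; assumes GNU date.
# expiry_date is a hypothetical helper used only for illustration.
expiry_date() {
  # $1 = first-ingestion date (YYYY-MM-DD), $2 = retention period in months
  date -u -d "$1 +$2 month" +%F
}

ingested="2024-04-07"     # first ingested into the old installation
rehydrated="2024-05-06"   # loaded into the new installation; not part of the calculation
echo "rehydrated on ${rehydrated}, deleted on $(expiry_date "$ingested" 1)"
```

The rehydration date never enters the calculation, which is why the log line in the example is deleted one day after it is rehydrated.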

 

Please use the script to monitor the status of rehydration.