...
Create an app password in Gmail. The account (grafana_alerts@domain.com) must have 2FA enabled before an app password can be created; see https://support.google.com/mail/answer/185833?hl=en. Note down the app password, as you will need it in step 3.
Make sure you are connected to the cluster where the Kloudfuse stack is installed and that you are in the kfuse namespace.
Code Block
    # connect to your cluster
    kubectx <cluster-name>
    kubens kfuse
Create a Kubernetes secret with the username and the app password you created in step 1.
Code Block
    kubectl create secret generic grafana-smtp-user-password \
      --from-literal=user=grafana_alerts@domain.com \
      --from-literal=password=<generated-app-password>
Edit values.yaml and uncomment the SMTP settings in the grafana section so that it looks like the snippet below. Update the following settings:
update host to your SMTP mail server, in hostname:port form
update from_address to the SMTP user you want to use
update from_name if needed
Code Block
    grafana:
      grafana:
        # grafana.ini - Grafana server configuration settings
        grafana.ini:
          ...
          # start -- Uncomment the following to enable smtp
          smtp:
            enabled: true
            host: your_smtp_hostname_colon_port
            skip_verify: true
            from_address: your_smtp_user@domain.com
            from_name: AlertsAdmin
        envValueFrom:
          GF_SMTP_USER:
            secretKeyRef:
              name: grafana-smtp-user-password
              key: user
          GF_SMTP_PASSWORD:
            secretKeyRef:
              name: grafana-smtp-user-password
              key: password
          # Uncomment the following to enable smtp -- end
Re-run the same helm command that you used to install Kloudfuse so that the updated values are applied.
Code Block
    helm upgrade --create-namespace --install kfuse . \
      -f [gcp|aws].yaml -f custom_values.yaml \
      --set global.orgId=<your-company-name>
Please make sure to update the default email address in grafana-default-email otherwise
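As a quick sanity check on the SMTP steps above, the sketch below reads one key of the grafana-smtp-user-password secret back out of the cluster and decodes it. Kubernetes returns secret values base64-encoded, so the value is piped through base64. The helper names decode_b64 and show_secret_key are hypothetical, and the sketch assumes kubectl is already pointed at the kfuse namespace as in step 2.

```shell
# Hypothetical helpers (not part of Kloudfuse) for checking the SMTP secret.

# Kubernetes stores secret values base64-encoded; decode what is read on stdin.
decode_b64() {
  base64 --decode
}

# Print the decoded value of one key ("user" or "password") of the secret.
# Assumes the current kubectl context and namespace are the kfuse ones.
show_secret_key() {
  kubectl get secret grafana-smtp-user-password -o "jsonpath={.data.$1}" | decode_b64
}
```

Running `show_secret_key user` should print the address passed to `--from-literal=user=...` when the secret was created. Avoid printing the password key in shared terminals or logs.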
Setting Notifications to PagerDuty
...
Navigate to Grafana tab in the Kloudfuse UI.
Create an OpsGenie-Grafana integration by following the steps at https://support.atlassian.com/opsgenie/docs/integrate-opsgenie-with-grafana/
After completing the steps, navigate to Notification Policies in Grafana.
...
Create a new nested policy for the OpsGenie contact point.
...
Alerts → Contact Points on Kloudfuse UI.
Choose Create Contact Point and fill in the required details.
You can now assign this contact point to any alert from the Kloudfuse UI.
...
Setting up Google
...
Chat contact point
Create a new Google Workspace space for alerting (https://support.google.com/a/users/answer/9300611?hl=en), or use an existing space.
Create an incoming webhook for the space: https://developers.google.com/chat/how-tos/webhooks#create_a_webhook
Navigate to the Grafana Alerts tab in the Kloudfuse UI. Select Contact Point and click New Contact Point.
...
Create a new Google Hangouts Chat integration and paste the webhook link into the URL field.
...
After completing the above steps, navigate to Notification Policies in Grafana.
...
Create a new nested policy for the Google Hangouts contact point.
...
Choose Create Contact Point and fill in the required details.
You can now assign this contact point to any alert from the Kloudfuse UI.
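Before attaching the contact point to alerts, the incoming webhook can be exercised directly from a shell. The sketch below posts a plain-text message using a JSON body with a single text field, which is the simplest message a Google Chat incoming webhook accepts. WEBHOOK_URL and the chat_payload helper are assumptions for illustration; substitute the URL generated for your space.

```shell
# WEBHOOK_URL is a placeholder: export it with the URL generated for your space.
WEBHOOK_URL="${WEBHOOK_URL:-}"

# Build a minimal JSON body of the form {"text": "..."}.
# This simple helper only escapes backslashes and double quotes,
# so keep test messages free of newlines and control characters.
chat_payload() {
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"text": "%s"}' "$escaped"
}

# Send a test message only when a webhook URL has been provided.
if [ -n "$WEBHOOK_URL" ]; then
  curl -sS -X POST \
    -H 'Content-Type: application/json; charset=UTF-8' \
    -d "$(chat_payload 'Test message from Kloudfuse alerting setup')" \
    "$WEBHOOK_URL"
fi
```

If the message appears in the space, the same URL can be used in the contact point. Treat the webhook URL as a credential, since anyone holding it can post to the space.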
...
Kloudfuse-Provided Out-of-the-Box Control Plane Alerts
Kloudfuse provides a number of out-of-the-box alerts that report stats for the data plane. The thresholds and other parameters of these alerts can be updated for each deployment. These alerts are in the kfuse-cp folder in alerts. The default thresholds for these alerts are listed below.
Type | Check | Alert Condition |
---|---|---|
Kubernetes Pods | In Failed state | For 5 mins |
Kubernetes Pods | Restarting multiple times | For 5 mins |
Kubernetes Pods | CrashLoopBackOff | For 5 mins |
Deployments | Fewer replicas than desired | For 15 mins |
Statefulsets | Fewer replicas than desired | For 15 mins |
Nodes | Unschedulable | For 10 mins |
Nodes | Not Ready | For 5 mins |
Nodes | High CPU Usage | > 90% for 5 mins |
Nodes | Disk Usage | > 90% for 5 mins |
Data Lake (pinot) | Segments in error condition | > 0 for 5 mins |
Data Lake (pinot) | Segment creation threshold breached | > 10 mins |
Persistent Volumes | Current Usage | > 90% |
Persistent Volumes | Forecast Usage | Notify when it will run out of space |
Agent/Collector | Not sending data | For 5 mins |
These alerts do not have a default contact point associated with them. The contact point for these alerts needs to be set according to the requirements of each deployment.