Monday, 23 September 2024


Install and Configure Alertmanager with Slack Integration on Kubernetes

We are going to deploy Alertmanager to handle alerts sent by our Prometheus server. We will also configure Alertmanager to send alert notifications to our Slack channel using Incoming Webhooks.

Prerequisites

We are using our Kubernetes homelab to deploy Alertmanager.

A working NFS server is required to create persistent volumes. Note that NFS server configuration is not covered in this article, but the way we set it up can be found here.

Our NFS server IP address is 10.11.1.20, and we have the following export configured for Alertmanager:

/mnt/storage-k8s/nfs/alertmanager

The owner:group of the NFS folder is set to 65534:65534 because the Alertmanager deployment runs with runAsUser: 65534.
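
For reference, this is roughly what the export and ownership setup might look like on the NFS server. The export options and subnet below are assumptions for illustration, not the exact homelab configuration:

$ sudo mkdir -p /mnt/storage-k8s/nfs/alertmanager
$ sudo chown 65534:65534 /mnt/storage-k8s/nfs/alertmanager

# Example /etc/exports entry (subnet and options are assumptions)
/mnt/storage-k8s/nfs/alertmanager 10.11.1.0/24(rw,sync,no_subtree_check)

$ sudo exportfs -ra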

Alertmanager and Slack Integration with Incoming Webhooks

Alertmanager uses Incoming Webhooks to post messages into Slack. Add an Incoming Webhook to your Slack workspace and copy the webhook URL.

See here for more info: https://lisenet.slack.com/apps/A0F7XDUAZ-incoming-webhooks
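
To verify the webhook works before wiring it into Alertmanager, you can post a test message with curl. The URL below is a placeholder; use your own webhook URL:

$ curl -X POST -H 'Content-type: application/json' \
  --data '{"text": "Test message from the Alertmanager setup"}' \
  https://hooks.slack.com/services/XYZXYZXYZ/ABCABCABC/1234567890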

Download Files from GitHub

Configuration files used in this article are hosted on GitHub. Clone the following repository:

$ git clone https://github.com/lisenet/kubernetes-homelab.git
$ cd ./kubernetes-homelab/

TL;DR: Install and Configure Alertmanager: All in One Go

Create a monitoring namespace:

$ kubectl create ns monitoring

Add your Slack Webhook URL to the config map.

Create everything with a single command:

$ kubectl apply -f ./kubernetes/alertmanager/

Note to self: this can be a Helm chart.
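
To confirm the all-in-one deployment worked, watch the Alertmanager pod until it reports Running and Ready:

$ kubectl -n monitoring get pods -l app=alertmanager -w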

Install and Configure Alertmanager: Step by Step

Below are step-by-step instructions. Note that this homelab project is under development, so please refer to GitHub for any source code changes.

Create a Namespace

Create a monitoring namespace:

$ kubectl create ns monitoring

Create a Cluster Role and a Service Account

$ kubectl apply -f ./kubernetes/alertmanager/alertmanager-cluster-role.yml

This is what the code looks like:

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: alertmanager
  labels:
    app: alertmanager
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: alertmanager
  labels:
    app: alertmanager
subjects:
  - kind: ServiceAccount
    name: alertmanager
    namespace: monitoring
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: alertmanager
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: alertmanager
  namespace: monitoring
  labels:
    app: alertmanager
secrets:
  - name: alertmanager-token
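
A quick way to confirm the RBAC objects exist:

$ kubectl -n monitoring get serviceaccount alertmanager
$ kubectl get clusterrole,clusterrolebinding -l app=alertmanager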

Create a Config Map

Configure Alertmanager to use Slack to send alerts. Add your Slack Webhook URL to the config map.

$ kubectl apply -f ./kubernetes/alertmanager/alertmanager-config-map.yml

This is what the code looks like:

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager
  namespace: monitoring
  labels:
    app: alertmanager
data:
  alertmanager.yml: |
    global: {}
    route:
      group_by: ['alertname', 'job']
      group_wait: 30s      # how long to wait to buffer alerts of the same group before sending a notification initially
      group_interval: 1h   # how long to wait before sending an alert that has been added to a group for which there has already been a notification
      repeat_interval: 30s # how long to wait before re-sending a given alert that has already been sent in a notification
      receiver: 'slack_homelab' # default/fallback request handler
      # Send severity=warning alerts to slack
      routes:
      - receiver: slack_homelab
        match:
          severity: warning      
    # See https://lisenet.slack.com/apps/A0F7XDUAZ-incoming-webhooks
    receivers:
    - name: 'slack_homelab'
      slack_configs:
      - api_url: https://hooks.slack.com/services/XYZXYZXYZ/ABCABCABC/1234567890
        channel: '#homelab'
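
The embedded alertmanager.yml can be validated before it is applied by using amtool, which ships with Alertmanager. A minimal sketch, assuming the configuration has been saved locally as alertmanager.yml:

$ amtool check-config alertmanager.yml

The command reports the routes and receivers it finds and fails if the syntax is invalid.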

Create a Persistent Volume

We want Alertmanager data to persist across pod restarts, so we store it on a persistent volume.

$ kubectl apply -f ./kubernetes/alertmanager/alertmanager-pv.yml

This is what the code looks like:

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv-alertmanager
  namespace: monitoring
  labels:
    app: alertmanager
spec:
  capacity:
    storage: 1Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: nfs
  mountOptions:
    - hard
    - nfsvers=4.1
  nfs:
    path: /mnt/storage-k8s/nfs/alertmanager
    server: 10.11.1.20

Create a Persistent Volume Claim

Allow Alertmanager to request persistent storage.

$ kubectl apply -f ./kubernetes/alertmanager/alertmanager-pvc.yml

This is what the code looks like:

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc-alertmanager
  namespace: monitoring
  labels:
    app: alertmanager
spec:
  storageClassName: nfs
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
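
Check that the claim binds to the NFS-backed volume (both should report a status of Bound):

$ kubectl get pv nfs-pv-alertmanager
$ kubectl -n monitoring get pvc nfs-pvc-alertmanager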

Create a Deployment Configuration

$ kubectl apply -f ./kubernetes/alertmanager/alertmanager-deployment.yml

This is what the code looks like:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: alertmanager
  namespace: monitoring
  labels:
    app: alertmanager
spec:
  replicas: 1
  selector:
    matchLabels:
      app: alertmanager
  template:
    metadata:
      labels:
        app: alertmanager
    spec:
      serviceAccountName: alertmanager
      serviceAccount: alertmanager
      securityContext:
        runAsUser: 65534
        runAsGroup: 65534
        runAsNonRoot: true
        fsGroup: 65534
      containers:
        - name: alertmanager
          image: prom/alertmanager:v0.21.0
          imagePullPolicy: IfNotPresent
          args:
            - '--config.file=/etc/config/alertmanager.yml'
            - '--storage.path=/data'
            - '--cluster.advertise-address=$(POD_IP):6783'
            - '--web.external-url=http://localhost:9093'
          ports:
            - containerPort: 9093
              protocol: TCP
          env:
            - name: POD_IP
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: status.podIP
          resources: {}
          volumeMounts:
            - name: alertmanager-config-volume
              mountPath: /etc/config
            - name: alertmanager-storage-volume
              mountPath: /data
          readinessProbe:
            httpGet:
              path: /-/ready
              port: 9093
              scheme: HTTP
            initialDelaySeconds: 30
            timeoutSeconds: 30
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 3
      # See https://kubernetes.io/docs/concepts/workloads/pods/init-containers/  
      #initContainers:
      #  - name: fix-nfs-permissions
      #    image: busybox
      #    command: ["sh", "-c", "chown -R 65534:65534 /data"]
      #    securityContext:
      #      runAsUser: 0
      #      runAsNonRoot: false
      #    volumeMounts:
      #      - name: alertmanager-storage-volume
      #        mountPath: /data
      restartPolicy: Always
      terminationGracePeriodSeconds: 30
      volumes:
        - name: alertmanager-config-volume
          configMap:
            name: alertmanager
            defaultMode: 420

        - name: alertmanager-storage-volume
          persistentVolumeClaim:
            claimName: nfs-pvc-alertmanager
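
After applying the deployment, wait for the rollout to finish and check the pod logs for configuration errors:

$ kubectl -n monitoring rollout status deployment/alertmanager
$ kubectl -n monitoring logs deployment/alertmanager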

Create a Service

$ kubectl apply -f ./kubernetes/alertmanager/alertmanager-service.yml

This is what the code looks like:

---
apiVersion: v1
kind: Service
metadata:
  name: alertmanager
  namespace: monitoring
  labels:
    app: alertmanager
spec:
  selector: 
    app: alertmanager
  type: NodePort  
  ports:
    - port: 9093
      targetPort: 9093
      nodePort: 32093

Check Monitoring Namespace

$ kubectl -n monitoring get all -l app=alertmanager
NAME                                READY   STATUS    RESTARTS   AGE
pod/alertmanager-7c698c7668-pjxds   1/1     Running   0          6d1h

NAME                   TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
service/alertmanager   NodePort   10.106.163.152   <none>        9093:32093/TCP   6d4h

NAME                           READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/alertmanager   1/1     1            1           6d4h

NAME                                      DESIRED   CURRENT   READY   AGE
replicaset.apps/alertmanager-7c698c7668   1         1         1       6d4h

Access Alertmanager Dashboard

We can access the Alertmanager dashboard by using its service node port 32093.
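
For example, point a browser at any cluster node's IP on port 32093, or hit the readiness endpoint from the command line (replace <node-ip> with one of your node addresses):

$ curl http://<node-ip>:32093/-/ready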

Alert notifications will then appear in the #homelab Slack channel configured in the receiver.
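
To trigger a test notification end to end, a dummy alert can be pushed to Alertmanager's v2 API through the node port. A minimal sketch; the node IP is a placeholder and the alert labels are arbitrary:

$ curl -X POST http://<node-ip>:32093/api/v2/alerts \
  -H 'Content-Type: application/json' \
  --data '[{"labels": {"alertname": "TestAlert", "severity": "warning"}, "annotations": {"summary": "Test alert sent with curl"}}]'

With severity set to warning, the alert matches the Slack route, so a notification should appear in the #homelab channel after the group_wait period (30s here).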
