Hands-On: Installing VictoriaMetrics in K8s
Background
In a previous post I introduced VictoriaMetrics and some caveats around installing it. Today let's actually walk through installing it in k8s. We will install a cluster version of VictoriaMetrics on a cloud-hosted k8s cluster, which requires the cloud provider's load balancer.
Note: VictoriaMetrics is abbreviated as "vm" below.
Preparation
- A k8s cluster; mine is v1.20.6.
- A StorageClass prepared in the cluster; I use one backed by NFS here.
- The operator image tag is v0.17.2; the vmstorage, vmselect and vminsert image tags are v1.63.0. You may want to pull the images into a local registry in advance.
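To confirm the prerequisites are in place, a quick check (the StorageClass name `nfs-csi` is the one used by the manifests later in this post):

```bash
kubectl version --short          # v1.20.6 in my case
kubectl get storageclass nfs-csi # the NFS-backed StorageClass referenced below
```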
Installation notes
vm can be installed in several ways: from binaries, from Docker images, or from source; choose whatever fits your scenario. When installing into k8s, we can use the operator directly. The main points to watch during installation are listed below.
A minimal cluster must contain the following nodes:
- one vmstorage node, with the -retentionPeriod and -storageDataPath flags set
- one vminsert node, with -storageNode=<vmstorage> set
- one vmselect node, with -storageNode=<vmstorage> set

Note: for high availability, at least two nodes per service are recommended.

A load balancer such as vmauth or nginx is needed in front of vmselect and vminsert; here we use the cloud provider's load balancer. It must be configured so that:
- requests whose path starts with /insert are routed to port 8480 on the vminsert nodes
- requests whose path starts with /select are routed to port 8481 on the vmselect nodes

Note: each service's listen port can be changed with -httpListenAddr.

Installing monitoring for the cluster itself is also recommended.
If you install a test cluster on a single host, the -httpListenAddr values of vminsert, vmselect and vmstorage must all be unique, and vmstorage's -storageDataPath, -vminsertAddr and -vmselectAddr must each have unique values as well (see the sketch below).
When the free space on the volume behind vmstorage's -storageDataPath drops below the amount set via -storage.minFreeDiskSpaceBytes, that node switches to read-only mode; vminsert stops sending data to such nodes and routes it to the remaining available vmstorage nodes instead.
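To make the single-host constraints concrete, here is a minimal sketch of running all three components on one machine from binaries; ports and paths are illustrative only, and in k8s the operator sets all of this for you:

```bash
# vmstorage: unique HTTP port, plus its data path and the two internal ports
./vmstorage -retentionPeriod=1 -storageDataPath=/tmp/vmstorage-data \
  -httpListenAddr=:8482 -vminsertAddr=:8400 -vmselectAddr=:8401 &

# vminsert: accepts writes on 8480 and forwards them to vmstorage
./vminsert -storageNode=127.0.0.1:8400 -httpListenAddr=:8480 &

# vmselect: serves queries on 8481 from the same vmstorage
./vmselect -storageNode=127.0.0.1:8401 -httpListenAddr=:8481 &
```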
Installation steps
Install vm
1. Create the CRDs
```bash
# Download the release bundle
export VM_VERSION=`basename $(curl -fs -o/dev/null -w %{redirect_url} https://github.com/VictoriaMetrics/operator/releases/latest)`
wget https://github.com/VictoriaMetrics/operator/releases/download/$VM_VERSION/bundle_crd.zip
unzip bundle_crd.zip
kubectl apply -f release/crds

# Check the CRDs
[root@test opt]# kubectl get crd |grep vm
vmagents.operator.victoriametrics.com                2022-01-05T07:26:01Z
vmalertmanagerconfigs.operator.victoriametrics.com   2022-01-05T07:26:01Z
vmalertmanagers.operator.victoriametrics.com         2022-01-05T07:26:01Z
vmalerts.operator.victoriametrics.com                2022-01-05T07:26:01Z
vmauths.operator.victoriametrics.com                 2022-01-05T07:26:01Z
vmclusters.operator.victoriametrics.com              2022-01-05T07:26:01Z
vmnodescrapes.operator.victoriametrics.com           2022-01-05T07:26:01Z
vmpodscrapes.operator.victoriametrics.com            2022-01-05T07:26:01Z
vmprobes.operator.victoriametrics.com                2022-01-05T07:26:01Z
vmrules.operator.victoriametrics.com                 2022-01-05T07:26:01Z
vmservicescrapes.operator.victoriametrics.com        2022-01-05T07:26:01Z
vmsingles.operator.victoriametrics.com               2022-01-05T07:26:01Z
vmstaticscrapes.operator.victoriametrics.com         2022-01-05T07:26:01Z
vmusers.operator.victoriametrics.com                 2022-01-05T07:26:01Z
```
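Since the operator bundle ships structural schemas with its CRDs, you should be able to browse the VMCluster spec straight from the cluster (a convenience check, not required for the install):

```bash
kubectl explain vmcluster.spec | head -40
```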
2. Install the operator

```bash
# Install the operator. Remember to point the operator image at your own registry first
kubectl apply -f release/operator/

# After installing, check that the operator is healthy
[root@test opt]# kubectl get po -n monitoring-system
vm-operator-76dd8f7b84-gsbfs   1/1     Running   0          25h
```
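If the pod does not reach Running, the operator's logs are the first place to look; assuming the Deployment name vm-operator created by the bundle:

```bash
kubectl logs -n monitoring-system deploy/vm-operator | tail -20
```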
3. Install a VMCluster
Once the operator is installed, build a CR (custom resource) that matches your own requirements. Here I install a VMCluster. First, let's look at the installation manifest:
```yaml
# cat vmcluster-install.yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMCluster
metadata:
  name: vmcluster-main
  namespace: monitoring-system
spec:
  replicationFactor: 1
  retentionPeriod: "4"   # a bare number is interpreted as months
  vminsert:
    image:
      pullPolicy: IfNotPresent
      repository: images.huazai.com/release/vminsert
      tag: v1.63.0
    podMetadata:
      labels:
        victoriaMetrics: vminsert
    replicaCount: 1
    resources:
      limits:
        cpu: "1"
        memory: 1000Mi
      requests:
        cpu: 500m
        memory: 500Mi
  vmselect:
    cacheMountPath: /select-cache
    image:
      pullPolicy: IfNotPresent
      repository: images.huazai.com/release/vmselect
      tag: v1.63.0
    podMetadata:
      labels:
        victoriaMetrics: vmselect
    replicaCount: 1
    resources:
      limits:
        cpu: "1"
        memory: 1000Mi
      requests:
        cpu: 500m
        memory: 500Mi
    storage:
      volumeClaimTemplate:
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 2G
          storageClassName: nfs-csi
          volumeMode: Filesystem
  vmstorage:
    image:
      pullPolicy: IfNotPresent
      repository: images.huazai.com/release/vmstorage
      tag: v1.63.0
    podMetadata:
      labels:
        victoriaMetrics: vmstorage
    replicaCount: 1
    resources:
      limits:
        cpu: "1"
        memory: 1500Mi
      requests:
        cpu: 500m
        memory: 750Mi
    storage:
      volumeClaimTemplate:
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 20G
          storageClassName: nfs-csi
          volumeMode: Filesystem
    storageDataPath: /vm-data
```
```bash
# Install the VMCluster
kubectl apply -f vmcluster-install.yaml

# Check the result of the VMCluster install
[root@test opt]# kubectl get po -n monitoring-system
NAME                                      READY   STATUS    RESTARTS   AGE
vm-operator-76dd8f7b84-gsbfs              1/1     Running   0          26h
vminsert-vmcluster-main-69766c8f4-r795w   1/1     Running   0          25h
vmselect-vmcluster-main-0                 1/1     Running   0          25h
vmstorage-vmcluster-main-0                1/1     Running   0          25h
```
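Before putting a load balancer in front, you can smoke-test the cluster from your workstation; a quick check using the pod name from the output above:

```bash
# Forward vmselect's HTTP port and hit the health endpoint
kubectl -n monitoring-system port-forward pod/vmselect-vmcluster-main-0 8481:8481 &
curl -s http://127.0.0.1:8481/health   # expect "OK"
```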
4. Create vminsert and vmselect Services
```bash
# Look at the Services the operator created
[root@test opt]# kubectl get svc -n monitoring-system
NAME                       TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                      AGE
vminsert-vmcluster-main    ClusterIP   10.0.182.73   <none>        8480/TCP                     25h
vmselect-vmcluster-main    ClusterIP   None          <none>        8481/TCP                     25h
vmstorage-vmcluster-main   ClusterIP   None          <none>        8482/TCP,8400/TCP,8401/TCP   25h

# To let other k8s clusters store their data in this vm installation as well, and to
# make later queries easier, we create two extra Services of type NodePort:
# vminsert-lbsvc and vmselect-lbsvc. On the cloud load balancer, add listeners on
# ports 8480 and 8481 whose backends are the node IPs of the cluster running vm,
# using the NodePorts exposed by these two Services.
# Workloads in the same k8s cluster as vm (such as opentelemetry) can still write via:
# vminsert-vmcluster-main.monitoring-system.svc.cluster.local:8480
# Workloads in other k8s clusters write via lb_ip:8480.
```
```yaml
# cat vminsert-lb-svc.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: monitoring
    app.kubernetes.io/instance: vmcluster-main
    app.kubernetes.io/name: vminsert
  name: vminsert-vmcluster-main-lbsvc
  namespace: monitoring-system
spec:
  externalTrafficPolicy: Cluster
  ports:
  - name: http
    nodePort: 30135
    port: 8480
    protocol: TCP
    targetPort: 8480
  selector:
    app.kubernetes.io/component: monitoring
    app.kubernetes.io/instance: vmcluster-main
    app.kubernetes.io/name: vminsert
  sessionAffinity: None
  type: NodePort
```

```yaml
# cat vmselect-lb-svc.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: monitoring
    app.kubernetes.io/instance: vmcluster-main
    app.kubernetes.io/name: vmselect
  name: vmselect-vmcluster-main-lbsvc
  namespace: monitoring-system
spec:
  externalTrafficPolicy: Cluster
  ports:
  - name: http
    nodePort: 31140
    port: 8481
    protocol: TCP
    targetPort: 8481
  selector:
    app.kubernetes.io/component: monitoring
    app.kubernetes.io/instance: vmcluster-main
    app.kubernetes.io/name: vmselect
  sessionAffinity: None
  type: NodePort
```
```bash
# Create the Services
kubectl apply -f vmselect-lb-svc.yaml
kubectl apply -f vminsert-lb-svc.yaml

# NOTE: configure the cloud load balancer yourself:
#   listener 8480 -> node IPs : NodePort 30135 (vminsert)
#   listener 8481 -> node IPs : NodePort 31140 (vmselect)

# Finally, check all vm-related pods and Services
[root@test opt]# kubectl get po,svc -n monitoring-system
NAME                                          READY   STATUS    RESTARTS   AGE
pod/vm-operator-76dd8f7b84-gsbfs              1/1     Running   0          30h
pod/vminsert-vmcluster-main-69766c8f4-r795w   1/1     Running   0          29h
pod/vmselect-vmcluster-main-0                 1/1     Running   0          29h
pod/vmstorage-vmcluster-main-0                1/1     Running   0          29h

NAME                                    TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE
service/vminsert-vmcluster-main         ClusterIP   10.0.182.73    <none>        8480/TCP                     29h
service/vminsert-vmcluster-main-lbsvc   NodePort    10.0.255.212   <none>        8480:30135/TCP               7h54m
service/vmselect-vmcluster-main         ClusterIP   None           <none>        8481/TCP                     29h
service/vmselect-vmcluster-main-lbsvc   NodePort    10.0.45.239    <none>        8481:31140/TCP               7h54m
service/vmstorage-vmcluster-main        ClusterIP   None           <none>        8482/TCP,8400/TCP,8401/TCP   29h
```
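With the listeners configured, the whole write/read path can be verified through the load balancer. A sketch using vm's cluster URL format (lb_ip stands for your load balancer address; 0 is the tenant ID):

```bash
# Write one sample through vminsert in Prometheus text format
curl -d 'test_metric{check="install"} 123' \
  http://lb_ip:8480/insert/0/prometheus/api/v1/import/prometheus

# Read it back through vmselect (freshly written data may take a few seconds to show up)
curl -s 'http://lb_ip:8481/select/0/prometheus/api/v1/query?query=test_metric'
```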
Install the Prometheus node exporter
Here we install node exporter to expose metrics from the k8s nodes. It will be scraped by opentelemetry (installed next), written to vmstorage through vminsert, and queried through vmselect.
```yaml
# kubectl apply -f prometheus-node-exporter-install.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app: prometheus-node-exporter
    release: prometheus-node-exporter
  name: prometheus-node-exporter
  namespace: kube-system
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: prometheus-node-exporter
      release: prometheus-node-exporter
  template:
    metadata:
      labels:
        app: prometheus-node-exporter
        release: prometheus-node-exporter
    spec:
      containers:
      - args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --path.rootfs=/host/root
        - --web.listen-address=$(HOST_IP):9100
        env:
        - name: HOST_IP
          value: 0.0.0.0
        image: images.huazai.com/release/node-exporter:v1.1.2
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 9100
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          name: metrics
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 9100
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 200m
            memory: 50Mi
          requests:
            cpu: 100m
            memory: 30Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /host/proc
          name: proc
          readOnly: true
        - mountPath: /host/sys
          name: sys
          readOnly: true
        - mountPath: /host/root
          mountPropagation: HostToContainer
          name: root
          readOnly: true
      dnsPolicy: ClusterFirst
      hostNetwork: true
      hostPID: true
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 65534
        runAsGroup: 65534
        runAsNonRoot: true
        runAsUser: 65534
      serviceAccount: prometheus-node-exporter
      serviceAccountName: prometheus-node-exporter
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        operator: Exists
      volumes:
      - hostPath:
          path: /proc
          type: ""
        name: proc
      - hostPath:
          path: /sys
          type: ""
        name: sys
      - hostPath:
          path: /
          type: ""
        name: root
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
```
```bash
# Check node-exporter
[root@test ~]# kubectl get po -n kube-system |grep prometheus
prometheus-node-exporter-89wjk   1/1     Running   0          31h
prometheus-node-exporter-hj4gh   1/1     Running   0          31h
prometheus-node-exporter-hxm8t   1/1     Running   0          31h
prometheus-node-exporter-nhqp6   1/1     Running   0          31h
```
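Because the exporter binds to the host network on port 9100, you can confirm it serves data directly from any node (node_ip stands for one of your node IPs):

```bash
curl -s http://node_ip:9100/metrics | head -5
```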
Install opentelemetry
With the Prometheus node exporter in place, let's install opentelemetry (a proper introduction may follow in a later post).
```yaml
# The opentelemetry config file defines how data is received, processed and exported:
# 1. receivers: where data is pulled from
# 2. processors: how the received data is processed
# 3. exporters: where the processed data is sent; here it is written through vminsert
#    into vmstorage
# kubectl apply -f opentelemetry-install-cm.yaml
apiVersion: v1
data:
  relay: |
    exporters:
      prometheusremotewrite:
        # I use lb_ip:8480 here, i.e. the vminsert address behind the LB
        endpoint: http://lb_ip:8480/insert/0/prometheus
        # add a distinct label per cluster, e.g. cluster: uat / prd
        external_labels:
          cluster: uat
    extensions:
      health_check: {}
    processors:
      batch: {}
      memory_limiter:
        ballast_size_mib: 819
        check_interval: 5s
        limit_mib: 1638
        spike_limit_mib: 512
    receivers:
      prometheus:
        config:
          scrape_configs:
          - job_name: opentelemetry-collector
            scrape_interval: 10s
            static_configs:
            - targets:
              - localhost:8888
          # ...omitted...
          - job_name: kube-state-metrics
            kubernetes_sd_configs:
            - namespaces:
                names:
                - kube-system
              role: service
            metric_relabel_configs:
            - regex: ReplicaSet;([\w|\-]+)\-[0-9|a-z]+
              replacement: $$1
              source_labels:
              - created_by_kind
              - created_by_name
              target_label: created_by_name
            - regex: ReplicaSet
              replacement: Deployment
              source_labels:
              - created_by_kind
              target_label: created_by_kind
            relabel_configs:
            - action: keep
              regex: kube-state-metrics
              source_labels:
              - __meta_kubernetes_service_name
          - job_name: node-exporter
            kubernetes_sd_configs:
            - namespaces:
                names:
                - kube-system
              role: endpoints
            relabel_configs:
            - action: keep
              regex: node-exporter
              source_labels:
              - __meta_kubernetes_service_name
            - source_labels:
              - __meta_kubernetes_pod_node_name
              target_label: node
            - source_labels:
              - __meta_kubernetes_pod_host_ip
              target_label: host_ip
          # ...omitted...
    service:
      # the receivers, processors, exporters and extensions defined above must be
      # referenced here, otherwise they have no effect
      extensions:
      - health_check
      pipelines:
        metrics:
          exporters:
          - prometheusremotewrite
          processors:
          - memory_limiter
          - batch
          receivers:
          - prometheus
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: opentelemetry-collector-hua
    meta.helm.sh/release-namespace: kube-system
  labels:
    app.kubernetes.io/instance: opentelemetry-collector-hua
    app.kubernetes.io/name: opentelemetry-collector-hua
  name: opentelemetry-collector-hua
  namespace: kube-system
```
```yaml
# Install opentelemetry
# kubectl apply -f opentelemetry-install.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/instance: opentelemetry-collector-hua
    app.kubernetes.io/name: opentelemetry-collector-hua
  name: opentelemetry-collector-hua
  namespace: kube-system
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: opentelemetry-collector-hua
      app.kubernetes.io/name: opentelemetry-collector-hua
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: opentelemetry-collector-hua
        app.kubernetes.io/name: opentelemetry-collector-hua
    spec:
      containers:
      - command:
        - /otelcol
        - --config=/conf/relay.yaml
        - --metrics-addr=0.0.0.0:8888
        - --mem-ballast-size-mib=819
        env:
        - name: MY_POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        image: images.huazai.com/release/opentelemetry-collector:0.27.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 13133
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: opentelemetry-collector-hua
        ports:
        - containerPort: 4317
          name: otlp
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 13133
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: "1"
            memory: 2Gi
          requests:
            cpu: 500m
            memory: 1Gi
        volumeMounts:
        - mountPath: /conf
          # the ConfigMap created above for opentelemetry
          name: opentelemetry-collector-configmap-hua
        - mountPath: /etc/otel-collector/secrets/etcd-cert/
          name: etcd-tls
          readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      # create this ServiceAccount yourself
      serviceAccount: opentelemetry-collector-hua
      serviceAccountName: opentelemetry-collector-hua
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: relay
            path: relay.yaml
          # the ConfigMap created above for opentelemetry
          name: opentelemetry-collector-hua
        name: opentelemetry-collector-configmap-hua
      - name: etcd-tls
        secret:
          defaultMode: 420
          secretName: etcd-tls
```

```bash
# Check that opentelemetry is running. If opentelemetry runs in the same k8s cluster
# as vm, write via the in-cluster Service rather than the LB (on this cloud, a backend
# of a layer-4 listener cannot act as both client and server of the same LB).
[root@kube-control-1 ~]# kubectl get po -n kube-system |grep opentelemetry-collector-hua
opentelemetry-collector-hua-647c6c64c7-j6p4b   1/1     Running   0          8h
```
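To confirm the pipeline end to end, query vmselect for a metric scraped from node-exporter and check that it carries the cluster label added by the exporter config (lb_ip again stands for your load balancer):

```bash
curl -s http://lb_ip:8481/select/0/prometheus/api/v1/query \
  --data-urlencode 'query=up{job="node-exporter", cluster="uat"}'
```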
Verifying the installation
Once all components are installed, open http://lb:8481/select/0/vmui in a browser and set the server URL to http://lb:8481/select/0/prometheus. Then enter a metric name to query its data; auto-refresh can be enabled in the top-left corner.
Summary
The whole installation is fairly straightforward, and once it is done, a single vm installation can store monitoring data from multiple k8s clusters. vm supports MetricsQL, which is based on PromQL, and it can also serve as a Grafana data source (see the sketch below). Compare this with the old approach: manually installing Prometheus in every k8s cluster, configuring storage for each one, and opening each cluster's Prometheus UI separately whenever you need to query data. If vm looks good to you too, give it a try!
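Since vmselect speaks the Prometheus query API, adding vm to Grafana is just a regular Prometheus data source pointed at the select URL from the check above; a minimal provisioning sketch:

```yaml
# e.g. /etc/grafana/provisioning/datasources/vm.yaml
apiVersion: 1
datasources:
  - name: VictoriaMetrics
    type: prometheus
    access: proxy
    url: http://lb:8481/select/0/prometheus
```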
References
- https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster
- https://docs.victoriametrics.com/
- https://opentelemetry.io/docs/
- https://prometheus.io/docs/prometheus/latest/configuration/configuration/