Ping monitoring between Kubernetes nodes
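When diagnosing issues in a Kubernetes cluster, we often notice that one of the cluster nodes is flickering*, usually at random and in odd ways. That is why we have long needed a tool that tests the reachability of one node from another and presents the results as Prometheus metrics. With such a tool, we would also want to build graphs in Grafana and quickly locate the failed node (and, if necessary, reschedule all Pods from that node and carry out the required maintenance).
* By "flickering" I mean behavior where a node randomly becomes "NotReady" and later returns to normal. Another example: part of the traffic may fail to reach Pods on neighboring nodes.
Why does this happen? One common cause is connectivity problems at the data center switch. For example, when we were once setting up a vswitch in Hetzner, one of the nodes became unavailable through its vswitch port and turned out to be completely unreachable on the local network.
Our last requirement was to run this service directly in Kubernetes, so that we could deploy everything with a Helm chart. (With Ansible, for example, we would have to define roles for each of the various environments: AWS, GCE, bare metal, and so on.) Since we did not find a ready-made solution for this, we decided to implement our own.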
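Script and configuration
The main component of our solution is a script that watches the .status.addresses value of each node. If this value changes for some node (for example, when a new node is added), the script passes the list of nodes to the Helm chart through Helm values, and the chart renders it as a ConfigMap: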
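    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: ping-exporter-config
      namespace: d8-system
    data:
      # The key becomes the file name under /config, so it has to match
      # CONFIG_PATH ("/config/targets.json") in the Python script.
      targets.json: >
        {{ .Values.pingExporter.targets | toJson }}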
Here is what .Values.pingExporter.targets looks like:

    {
      "cluster_targets": [
        {"ipAddress": "192.168.191.11", "name": "kube-a-3"},
        {"ipAddress": "192.168.191.12", "name": "kube-a-2"},
        {"ipAddress": "192.168.191.22", "name": "kube-a-1"},
        {"ipAddress": "192.168.191.23", "name": "kube-db-1"},
        {"ipAddress": "192.168.191.9", "name": "kube-db-2"},
        {"ipAddress": "51.75.130.47", "name": "kube-a-4"}
      ],
      "external_targets": [
        {"host": "8.8.8.8", "name": "google-dns"},
        {"host": "youtube.com"}
      ]
    }
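To make the mapping from node objects to this structure concrete, here is a minimal sketch of how such a list could be derived from the .status.addresses field. It is not the script we actually run: the function name build_cluster_targets, the choice of the InternalIP address type, and the sample node list (shaped like the output of `kubectl get nodes -o json`) are assumptions made for illustration.

```python
import json


def build_cluster_targets(node_list, address_type="InternalIP"):
    """Turn a Kubernetes NodeList (as a dict) into ping-exporter cluster targets.

    Assumption: each node is pinged on its first address of `address_type`.
    """
    targets = []
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        addresses = node.get("status", {}).get("addresses", [])
        # Pick the first address of the requested type; skip nodes without one.
        ip = next((a["address"] for a in addresses if a["type"] == address_type), None)
        if ip is not None:
            targets.append({"ipAddress": ip, "name": name})
    # Sort so that an unchanged set of nodes always yields an identical value.
    return sorted(targets, key=lambda t: t["name"])


# Hypothetical sample, shaped like `kubectl get nodes -o json`.
sample_node_list = {
    "items": [
        {"metadata": {"name": "kube-a-2"},
         "status": {"addresses": [{"type": "InternalIP", "address": "192.168.191.12"},
                                  {"type": "Hostname", "address": "kube-a-2"}]}},
        {"metadata": {"name": "kube-a-1"},
         "status": {"addresses": [{"type": "InternalIP", "address": "192.168.191.22"}]}},
    ]
}

cluster_targets = build_cluster_targets(sample_node_list)
print(json.dumps({"cluster_targets": cluster_targets}))
```

Sorting the result is a design choice of this sketch: it keeps the rendered value stable, so the ConfigMap only changes when the set of nodes or their addresses actually changes.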
Here is the Python script:
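    #!/usr/bin/env python3

    import glob
    import json
    import os
    import re
    import statistics
    import subprocess

    import better_exchook
    import prometheus_client

    # Print more detailed tracebacks for unhandled exceptions.
    better_exchook.install()

    # -C 30: send 30 probes to every target; -p 1000: wait 1000 ms between
    # probes to the same target; -q: print only the final per-target summary.
    FPING_CMDLINE = "/usr/sbin/fping -p 1000 -C 30 -B 1 -q -r 1".split(" ")
    # One summary line per target: "<host> : <rtt or -> <rtt or -> ..."
    FPING_REGEX = re.compile(r"^(\S*)\s*: (.*)$", re.MULTILINE)
    CONFIG_PATH = "/config/targets.json"

    registry = prometheus_client.CollectorRegistry()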

    prometheus_exceptions_counter = prometheus_client.Counter(
        'kube_node_ping_exceptions', 'Total number of exceptions', [], registry=registry)

    CLUSTER_LABELS = ['destination_node', 'destination_node_ip_address']
    EXTERNAL_LABELS = ['destination_name', 'destination_host']

    prom_metrics_cluster = {
        "sent": prometheus_client.Counter(
            'kube_node_ping_packets_sent_total', 'ICMP packets sent',
            CLUSTER_LABELS, registry=registry),
        "received": prometheus_client.Counter(
            'kube_node_ping_packets_received_total', 'ICMP packets received',
            CLUSTER_LABELS, registry=registry),
        "rtt": prometheus_client.Counter(
            'kube_node_ping_rtt_milliseconds_total', 'round-trip time',
            CLUSTER_LABELS, registry=registry),
        "min": prometheus_client.Gauge(
            'kube_node_ping_rtt_min', 'minimum round-trip time',
            CLUSTER_LABELS, registry=registry),
        "max": prometheus_client.Gauge(
            'kube_node_ping_rtt_max', 'maximum round-trip time',
            CLUSTER_LABELS, registry=registry),
        "mdev": prometheus_client.Gauge(
            'kube_node_ping_rtt_mdev', 'mean deviation of round-trip times',
            CLUSTER_LABELS, registry=registry),
    }

    prom_metrics_external = {
        "sent": prometheus_client.Counter(
            'external_ping_packets_sent_total', 'ICMP packets sent',
            EXTERNAL_LABELS, registry=registry),
        "received": prometheus_client.Counter(
            'external_ping_packets_received_total', 'ICMP packets received',
            EXTERNAL_LABELS, registry=registry),
        "rtt": prometheus_client.Counter(
            'external_ping_rtt_milliseconds_total', 'round-trip time',
            EXTERNAL_LABELS, registry=registry),
        "min": prometheus_client.Gauge(
            'external_ping_rtt_min', 'minimum round-trip time',
            EXTERNAL_LABELS, registry=registry),
        "max": prometheus_client.Gauge(
            'external_ping_rtt_max', 'maximum round-trip time',
            EXTERNAL_LABELS, registry=registry),
        "mdev": prometheus_client.Gauge(
            'external_ping_rtt_mdev', 'mean deviation of round-trip times',
            EXTERNAL_LABELS, registry=registry),
    }


    def validate_envs():
        """Return the required environment variables, failing if any is empty."""
        envs = {"MY_NODE_NAME": os.getenv("MY_NODE_NAME"),
                "PROMETHEUS_TEXTFILE_DIR": os.getenv("PROMETHEUS_TEXTFILE_DIR"),
                "PROMETHEUS_TEXTFILE_PREFIX": os.getenv("PROMETHEUS_TEXTFILE_PREFIX")}

        for k, v in envs.items():
            if not v:
                raise ValueError("{} environment variable is empty".format(k))

        return envs


    @prometheus_exceptions_counter.count_exceptions()
    def compute_results(results):
        """Parse the fping summary into per-host statistics."""
        computed = {}

        for match in FPING_REGEX.finditer(results):
            host = match.group(1)
            ping_results = match.group(2)
            if "duplicate" in ping_results:
                continue
            splitted = ping_results.split(" ")
            if len(splitted) != 30:
                raise ValueError("ping returned wrong number of results: \"{}\"".format(splitted))

            # A "-" in the summary marks a probe that got no reply.
            positive_results = [float(x) for x in splitted if x != "-"]
            if len(positive_results) > 0:
                computed[host] = {"sent": 30, "received": len(positive_results),
                                  "rtt": sum(positive_results),
                                  "max": max(positive_results), "min": min(positive_results),
                                  "mdev": statistics.pstdev(positive_results)}
            else:
                computed[host] = {"sent": 30, "received": len(positive_results), "rtt": 0,
                                  "max": 0, "min": 0, "mdev": 0}
        if not len(computed):
            raise ValueError("regex match \"{}\" found nothing in fping output \"{}\"".format(FPING_REGEX, results))
        return computed


    @prometheus_exceptions_counter.count_exceptions()
    def call_fping(ips):
        """Run fping against the given targets and return its combined output."""
        cmdline = FPING_CMDLINE + ips
        # stderr is merged into stdout, so everything fping prints ends up in process.stdout.
        process = subprocess.run(cmdline, stdout=subprocess.PIPE,
                                 stderr=subprocess.STDOUT, universal_newlines=True)
        if process.returncode == 3:
            raise ValueError("invalid arguments: {}".format(cmdline))
        if process.returncode == 4:
            # process.stderr is None here because stderr is redirected to stdout.
            raise OSError("fping reported syscall error: {}".format(process.stdout))

        return process.stdout


    envs = validate_envs()

    # Remove files left over from a previous run. Note that the pattern is not
    # limited to this exporter's own files.
    files = glob.glob(envs["PROMETHEUS_TEXTFILE_DIR"] + "*")
    for f in files:
        os.remove(f)

    labeled_prom_metrics = {"cluster_targets": [], "external_targets": []}

    while True:
        # Re-read the config on every iteration to pick up target changes.
        with open(CONFIG_PATH, "r") as f:
            config = json.loads(f.read())
            config["external_targets"] = [] if config["external_targets"] is None else config["external_targets"]
            for target in config["external_targets"]:
                target["name"] = target["host"] if "name" not in target.keys() else target["name"]

        # Drop the label sets of targets that are no longer in the config.
        if labeled_prom_metrics["cluster_targets"]:
            for metric in labeled_prom_metrics["cluster_targets"]:
                if (metric["node_name"], metric["ip"]) not in [(node["name"], node["ipAddress"]) for node in config['cluster_targets']]:
                    for k, v in prom_metrics_cluster.items():
                        v.remove(metric["node_name"], metric["ip"])

        if labeled_prom_metrics["external_targets"]:
            for metric in labeled_prom_metrics["external_targets"]:
                if (metric["target_name"], metric["host"]) not in [(target["name"], target["host"]) for target in config['external_targets']]:
                    for k, v in prom_metrics_external.items():
                        v.remove(metric["target_name"], metric["host"])

        # Bind every metric to the label values of the current targets.
        labeled_prom_metrics = {"cluster_targets": [], "external_targets": []}

        for node in config["cluster_targets"]:
            metrics = {"node_name": node["name"], "ip": node["ipAddress"], "prom_metrics": {}}
            for k, v in prom_metrics_cluster.items():
                metrics["prom_metrics"][k] = v.labels(node["name"], node["ipAddress"])
            labeled_prom_metrics["cluster_targets"].append(metrics)

        for target in config["external_targets"]:
            metrics = {"target_name": target["name"], "host": target["host"], "prom_metrics": {}}
            for k, v in prom_metrics_external.items():
                metrics["prom_metrics"][k] = v.labels(target["name"], target["host"])
            labeled_prom_metrics["external_targets"].append(metrics)

        # One fping run covers all cluster and external targets.
        out = call_fping([prom_metric["ip"] for prom_metric in labeled_prom_metrics["cluster_targets"]] +
                         [prom_metric["host"] for prom_metric in labeled_prom_metrics["external_targets"]])
        computed = compute_results(out)

        for dimension in labeled_prom_metrics["cluster_targets"]:
            result = computed[dimension["ip"]]
            dimension["prom_metrics"]["sent"].inc(result["sent"])
            dimension["prom_metrics"]["received"].inc(result["received"])
            dimension["prom_metrics"]["rtt"].inc(result["rtt"])
            dimension["prom_metrics"]["min"].set(result["min"])
            dimension["prom_metrics"]["max"].set(result["max"])
            dimension["prom_metrics"]["mdev"].set(result["mdev"])

        for dimension in labeled_prom_metrics["external_targets"]:
            result = computed[dimension["host"]]
            dimension["prom_metrics"]["sent"].inc(result["sent"])
            dimension["prom_metrics"]["received"].inc(result["received"])
            dimension["prom_metrics"]["rtt"].inc(result["rtt"])
            dimension["prom_metrics"]["min"].set(result["min"])
            dimension["prom_metrics"]["max"].set(result["max"])
            dimension["prom_metrics"]["mdev"].set(result["mdev"])

        # Publish the registry as a textfile for node-exporter to pick up.
        prometheus_client.write_to_textfile(
            envs["PROMETHEUS_TEXTFILE_DIR"] + envs["PROMETHEUS_TEXTFILE_PREFIX"] + envs["MY_NODE_NAME"] + ".prom", registry)
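This script runs on every Kubernetes node and continuously sends ICMP packets to all instances of the Kubernetes cluster (with the fping options shown above, one packet per second to each target, in rounds of 30). The collected results are stored in text files.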
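To illustrate what compute_results does with the fping output, here is a reduced sketch that uses only the standard library. The sample output is made up and uses 4 probes per host instead of 30; the function name summarize is hypothetical. With -C, fping prints one summary line per target, containing either a round-trip time in milliseconds or "-" for each probe.

```python
import re
import statistics

FPING_REGEX = re.compile(r"^(\S*)\s*: (.*)$", re.MULTILINE)

# Made-up sample of `fping -q -C 4` output (the real script uses -C 30).
SAMPLE_OUTPUT = """192.168.191.11 : 0.45 0.51 - 0.48
192.168.191.12 : - - - -
"""


def summarize(fping_output, expected_count):
    """Return per-host ping statistics parsed from `fping -C` output."""
    summary = {}
    for match in FPING_REGEX.finditer(fping_output):
        host, raw = match.group(1), match.group(2)
        probes = raw.split(" ")
        if len(probes) != expected_count:
            raise ValueError("unexpected number of results: {!r}".format(probes))
        rtts = [float(x) for x in probes if x != "-"]  # "-" marks a lost probe
        summary[host] = {
            "sent": expected_count,
            "received": len(rtts),
            "rtt": sum(rtts),
            "min": min(rtts) if rtts else 0,
            "max": max(rtts) if rtts else 0,
            "mdev": statistics.pstdev(rtts) if rtts else 0,
        }
    return summary


stats = summarize(SAMPLE_OUTPUT, expected_count=4)
for host, values in stats.items():
    print(host, values["received"], "of", values["sent"], "probes answered")
```

Like the original script, the sketch reports zeros for a host that answered no probes, so a completely unreachable node shows up as sent > 0 with received == 0 rather than as a missing series.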
The script is packaged into a Docker image:

    FROM python:3.6-alpine3.8
    COPY rootfs /
    WORKDIR /app
    RUN pip3 install --upgrade pip && pip3 install -r requirements.txt && apk add --no-cache fping
    ENTRYPOINT ["python3", "/app/ping-exporter.py"]
In addition, we created a ServiceAccount and a corresponding role with a single permission: listing nodes (so that we can learn their IP addresses):

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: ping-exporter
      namespace: d8-system
    ---
    kind: ClusterRole
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: d8-system:ping-exporter
    rules:
    - apiGroups: [""]
      resources: ["nodes"]
      verbs: ["list"]
    ---
    kind: ClusterRoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: d8-system:kube-ping-exporter
    subjects:
    - kind: ServiceAccount
      name: ping-exporter
      namespace: d8-system
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: d8-system:ping-exporter
Finally, we need a DaemonSet that runs on every instance in the cluster:

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: ping-exporter
      namespace: d8-system
    spec:
      updateStrategy:
        type: RollingUpdate
      selector:
        matchLabels:
          name: ping-exporter
      template:
        metadata:
          labels:
            name: ping-exporter
        spec:
          terminationGracePeriodSeconds: 0
          tolerations:
          - operator: "Exists"
          hostNetwork: true
          serviceAccountName: ping-exporter
          priorityClassName: cluster-low
          containers:
          - image: private-registry.flant.com/ping-exporter/ping-exporter:v1
            name: ping-exporter
            env:
              - name: MY_NODE_NAME
                valueFrom:
                  fieldRef:
                    fieldPath: spec.nodeName
              - name: PROMETHEUS_TEXTFILE_DIR
                value: /node-exporter-textfile/
              - name: PROMETHEUS_TEXTFILE_PREFIX
                value: ping-exporter_
            volumeMounts:
              - name: textfile
                mountPath: /node-exporter-textfile
              - name: config
                mountPath: /config
          volumes:
            - name: textfile
              hostPath:
                path: /var/run/node-exporter-textfile
            - name: config
              configMap:
                name: ping-exporter-config
          imagePullSecrets:
          - name: private-registry
The final operational details of this solution are:
- When the Python script runs, its results (that is, the text files stored on the host in the /var/run/node-exporter-textfile directory) are passed to node-exporter, which also runs as a DaemonSet.
- node-exporter is started with the --collector.textfile.directory /host/textfile argument, where /host/textfile is the hostPath directory /var/run/node-exporter-textfile. (See the node-exporter documentation for more about its textfile collector.)
- Finally, node-exporter reads these files, and Prometheus collects all the data from the node-exporter instances.
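For readers unfamiliar with the textfile collector: it reads *.prom files in the Prometheus text exposition format from the configured directory. The sketch below writes such a file using only the standard library, purely to show what ends up on disk; our script relies on prometheus_client.write_to_textfile instead. The helper name write_prom_file, the temporary directory, and the sample value are illustrative assumptions.

```python
import os
import tempfile


def write_prom_file(path, metric, help_text, metric_type, samples):
    """Write one metric family in the Prometheus text exposition format.

    `samples` is a list of (labels_dict, value) pairs. Label values are
    assumed to need no escaping. The file is written under a temporary
    name and then renamed, so a reader never sees a half-written file.
    """
    lines = ["# HELP {} {}".format(metric, help_text),
             "# TYPE {} {}".format(metric, metric_type)]
    for labels, value in samples:
        label_str = ",".join('{}="{}"'.format(k, v) for k, v in sorted(labels.items()))
        lines.append("{}{{{}}} {}".format(metric, label_str, float(value)))
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write("\n".join(lines) + "\n")
    os.replace(tmp_path, path)  # atomic on POSIX when both paths share a filesystem


# Illustrative only: a temporary directory stands in for the hostPath mount.
textfile_dir = tempfile.mkdtemp()
prom_path = os.path.join(textfile_dir, "ping-exporter_kube-a-3.prom")
write_prom_file(
    prom_path,
    "kube_node_ping_packets_sent_total", "ICMP packets sent", "counter",
    [({"destination_node": "kube-a-1", "destination_node_ip_address": "192.168.191.22"}, 30)],
)
with open(prom_path) as f:
    content = f.read()
print(content)
```

The file name follows the pattern the script uses (prefix, then node name, then ".prom"), so each node publishes exactly one file that node-exporter on the same host exposes.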
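So what are the results?
Now it is time to enjoy the long-awaited results. Once the metrics exist, we can use them and, of course, visualize them. Here is how they look.
First, there is a general selector that lets us choose the nodes whose "source" and "destination" connectivity we want to check. You get a summary table with the ping results for the selected nodes over the period specified in the Grafana dashboard:
Here are the graphs with combined statistics for the selected nodes:
We also have a list of records, each of which links to the graphs for a specific node selected in "Source":
If you expand a record, you will see detailed ping statistics from the current node to all other nodes selected in "Destination":
And here are the corresponding graphs:
What do the graphs look like when there are ping problems between nodes?
If you observe something similar in real life, it is time to troubleshoot!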
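The dashboard queries themselves are not listed in this article. As a rough sketch of the arithmetic such panels could rely on, the loss ratio and the mean round-trip time for an interval can be derived from the increase of the sent, received, and rtt counters. The function name ping_summary and the sample numbers are made up for illustration.

```python
def ping_summary(prev, curr):
    """Derive loss ratio and mean RTT from two samples of the exporter's counters.

    `prev` and `curr` are dicts holding the cumulative "sent", "received"
    and "rtt" (milliseconds) counter values for one source/destination pair.
    """
    sent = curr["sent"] - prev["sent"]
    received = curr["received"] - prev["received"]
    rtt_ms = curr["rtt"] - prev["rtt"]
    if sent <= 0:
        return None  # no new probes in the interval (or the counters were reset)
    return {
        "loss_ratio": 1 - received / sent,
        "avg_rtt_ms": rtt_ms / received if received else None,
    }


# Made-up counter samples taken one 30-probe round apart.
before = {"sent": 300, "received": 300, "rtt": 150.0}
after = {"sent": 330, "received": 327, "rtt": 166.35}
summary = ping_summary(before, after)
print(summary)
```

Working with counter increases rather than the min/max gauges means the result covers the whole selected interval, not just the most recent round of probes.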
Finally, here is our visualization of pings to external hosts:
We can look at the overall view for all nodes, or only at the graph for any particular node:
This can be helpful when you observe connectivity issues that affect only some specific nodes.