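Writing Custom Prometheus Metrics in Golang
This article is reposted from the WeChat public account 「運維開發故事」, written by xiyangxixia. To repost this article, please contact the 運維開發故事 public account.
1. Preface
Why write metrics in Golang? Mainly because of one of our customers: their k8s network uses ovs with bonding, namely bond0 and bond1, each with two NICs. After the cluster went to production, I was asked to check every day that the NICs were healthy, because a NIC had gone DOWN before. I am rather lazy and did not want to check by hand. The idea was to have Prometheus collect the state and display it in Grafana, so that I could simply look in Grafana for NICs in an abnormal state. Besides, I had recently started learning Go and wanted some practice. I also asked a developer colleague, who said it was simple, told me to give it a try, and offered to help if I got stuck. So I decided to try.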
2. Environment
Component | Version | Notes |
---|---|---|
k8s | v1.14 | |
ovs | v2.9.5 | |
go | 1.14.1 | |
3. Goal
The goal is to have Prometheus scrape the NIC status of my ovs bonds. For that I need to write a Go program that reads the host's ovs bond information and exposes it as metrics for Prometheus to scrape and Grafana to display. For example:
- # First, get the current bond information
- [root@test~]$ ovs-appctl bond/show |grep '^slave' |grep -v grep |awk '{print $2""$3}'
- a1-b1:enabled
- a2-b2:enabled
- a3-b3:enabled
- a4-b4:disabled
- # The component finally exposes data like the following: 5 means the command that fetches the bond information failed; 0-4 is the number of NICs in the disabled state
- curl http://$IP:$PORT/metrics
- ovs_bond_status{component="ovs"} 5
- ovs_bond_status{component="ovs",a1b1="enabled",a2b2="disabled",a3b3="enabled",a4b4="disabled"} 2
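To make the shape of an exposed line concrete, here is a minimal hand-rolled sketch that renders one sample in the Prometheus text format. It is only an illustration: `formatSample` and `escapeLabelValue` are hypothetical names, and in the article itself client_golang produces this output.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// escapeLabelValue escapes backslash, double quote and newline,
// which the text format requires inside label values.
var escapeLabelValue = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// formatSample renders a single sample line: name{label="value",...} value.
// Label names are sorted so the output is stable across runs.
func formatSample(name string, labels map[string]string, value float64) string {
	names := make([]string, 0, len(labels))
	for k := range labels {
		names = append(names, k)
	}
	sort.Strings(names)

	pairs := make([]string, 0, len(names))
	for _, k := range names {
		pairs = append(pairs, k+`="`+escapeLabelValue.Replace(labels[k])+`"`)
	}
	return name + "{" + strings.Join(pairs, ",") + "} " +
		strconv.FormatFloat(value, 'g', -1, 64)
}

func main() {
	line := formatSample("ovs_bond_status", map[string]string{
		"component": "ovs",
		"a1b1":      "enabled",
		"a4b4":      "disabled",
	}, 1)
	// prints: ovs_bond_status{a1b1="enabled",a4b4="disabled",component="ovs"} 1
	fmt.Println(line)
}
```

Note that label names are written without quotes and only label values are quoted.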
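4. Design
Since Prometheus will scrape the metric, the bond information has to be exposed in the metrics exposition format; see the official Prometheus documentation for that format.
There are two bonds with two NICs each, and a NIC is either enabled or disabled. The values 0-4 therefore tell the user how many NICs are disabled, and 5 means that the command failed or that no bond exists, so someone needs to step in. The bond information is available from a command, so the program gets it by running that command.
The command output has to be processed and placed into the metric. Note: a metric label name cannot contain "-".
When the shell command returns valid bond information, it is stored in a map whose keys are NIC names and whose values are NIC statuses.
For reference, see client_golang/prometheus.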
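The restriction on "-" is one case of the classic Prometheus rule that label names match `[a-zA-Z_][a-zA-Z0-9_]*`. The sketch below shows one way to normalize an arbitrary NIC name. `sanitizeLabelName` is a hypothetical helper, and it replaces invalid characters with "_", whereas the article's code simply drops the "-".

```go
package main

import (
	"fmt"
	"regexp"
)

// invalidLabelChars matches every character that is not allowed in a label name.
var invalidLabelChars = regexp.MustCompile(`[^a-zA-Z0-9_]`)

// sanitizeLabelName replaces invalid characters with "_" and prefixes "_"
// when the result is empty or starts with a digit.
func sanitizeLabelName(name string) string {
	s := invalidLabelChars.ReplaceAllString(name, "_")
	if s == "" || (s[0] >= '0' && s[0] <= '9') {
		s = "_" + s
	}
	return s
}

func main() {
	for _, n := range []string{"a1-b1", "eth0.100", "0eth"} {
		fmt.Printf("%s -> %s\n", n, sanitizeLabelName(n))
	}
}
```

Replacing rather than deleting keeps names such as "a-1b" and "a1-b" distinct after normalization, which may or may not matter for a given naming scheme.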
5. Implementation
First, run the shell command to get the bond information
- # First, get the current bond information
- [root@test~]$ ovs-appctl bond/show |grep '^slave' |grep -v grep |awk '{print $2""$3}'
- a1-b1:enabled
- a2-b2:enabled
- a3-b3:enabled
- a4-b4:disabled
Then process the shell output
- // Run the shell command, process its output, and log any failure.
- // Returns a map of NIC name -> status. Two failure cases are reported through the
- // special "msg" key: the command failed, or it succeeded but printed nothing.
- func getBondStatus() map[string]string {
-     result, err := exec.Command("bash", "-c", "ovs-appctl bond/show | grep '^slave' | grep -v grep | awk '{print $2\"\"$3}'").Output()
-     if err != nil {
-         log.Error("result: ", string(result))
-         log.Error("command failed: ", err.Error())
-         return map[string]string{"msg": "failure"}
-     } else if len(result) == 0 {
-         log.Error("command exec failed, result is null")
-         return map[string]string{"msg": "return null"}
-     }
-     // Trim surrounding whitespace, then split the output on newlines.
-     ret := strings.TrimSpace(string(result))
-     tt := strings.Split(ret, "\n")
-     // e.g. tt := []string{"a1-b1:enabled", "a2-b2:disabled"}
-     nMap := make(map[string]string)
-     for _, line := range tt {
-         kv := strings.SplitN(line, ":", 2)
-         if len(kv) != 2 {
-             // Skip lines that are not in "name:status" form instead of panicking.
-             log.Error("unexpected line: ", line)
-             continue
-         }
-         // A label name cannot contain "-", so remove it from the NIC name.
-         nMap[strings.ReplaceAll(kv[0], "-", "")] = kv[1]
-     }
-     return nMap
- }
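Because the function above both runs a command and parses text, it is hard to exercise without ovs installed. As a sketch, the parsing step could be pulled out into a pure function that reports problems through an `error` instead of a special map key. `parseBondOutput` is a hypothetical name, not part of the article's code.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// parseBondOutput turns "name:status" lines into a map of NIC name -> status.
// "-" is removed from NIC names so they can be used as label names.
func parseBondOutput(out string) (map[string]string, error) {
	status := make(map[string]string)
	for _, line := range strings.Split(strings.TrimSpace(out), "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		parts := strings.SplitN(line, ":", 2)
		if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
			return nil, fmt.Errorf("malformed line %q", line)
		}
		status[strings.ReplaceAll(parts[0], "-", "")] = parts[1]
	}
	if len(status) == 0 {
		return nil, errors.New("no slave lines found")
	}
	return status, nil
}

func main() {
	status, err := parseBondOutput("a1-b1:enabled\na4-b4:disabled\n")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(status["a1b1"], status["a4b4"])
}
```

With this split, the command runner only has to pass its output string along, and the parser can be covered by ordinary unit tests.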
Define the metric
- // ovsCollector implements the prometheus.Collector interface.
- type ovsCollector struct {
-     // More descriptors can be defined here.
-     ovsMetric *prometheus.Desc
- }
- func (collector *ovsCollector) Describe(ch chan<- *prometheus.Desc) {
-     ch <- collector.ovsMetric
- }
- // NIC names, used as label names
- var vLable = []string{}
- // NIC statuses, used as label values
- var vValue = []string{}
- // Constant label marking the metric as coming from ovs
- var constLabel = prometheus.Labels{"component": "ovs"}
- // newOvsCollector defines the metric. The label names are read once, at startup,
- // so NICs that appear later are not added to the label set.
- func newOvsCollector() *ovsCollector {
-     rm := getBondStatus()
-     if _, ok := rm["msg"]; ok {
-         log.Error("command execute failed:", rm["msg"])
-     } else {
-         // Only the NIC names are needed here.
-         for k := range rm {
-             vLable = append(vLable, k)
-         }
-     }
-     return &ovsCollector{
-         ovsMetric: prometheus.NewDesc("ovs_bond_status",
-             "Show ovs bond status", vLable,
-             constLabel),
-     }
- }
Set the metric value
- // On success, the NIC statuses and the number of disabled NICs are injected into the metric.
- func (collector *ovsCollector) Collect(ch chan<- prometheus.Metric) {
-     var metricValue float64
-     rm := getBondStatus()
-     // One label value per label name, in the same order as vLable. Go does not
-     // guarantee map iteration order, so values are looked up by name.
-     values := make([]string, len(vLable))
-     if _, ok := rm["msg"]; ok {
-         // Label values stay empty so that their count still matches the Desc.
-         log.Error("command exec failed")
-         metricValue = 5
-     } else {
-         // Count the disabled NICs.
-         for _, v := range rm {
-             if v == "disabled" {
-                 metricValue++
-             }
-         }
-         for i, k := range vLable {
-             values[i] = rm[k]
-         }
-     }
-     // CounterValue matches the "# TYPE ... counter" output shown in the test section;
-     // since this value can also decrease, prometheus.GaugeValue would describe it more accurately.
-     ch <- prometheus.MustNewConstMetric(collector.ovsMetric, prometheus.CounterValue, metricValue, values...)
- }
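Go does not guarantee map iteration order, and `MustNewConstMetric` panics when the number of label values differs from the number of label names in the descriptor. The sketch below shows, with the standard library only, one way to keep names and values aligned and to handle the failure case. `sortedNames`, `sample` and `failureValue` are hypothetical names used for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// failureValue is the sentinel the article uses when bond status cannot be read.
const failureValue = 5

// sortedNames returns the NIC names in a stable, sorted order.
func sortedNames(status map[string]string) []string {
	names := make([]string, 0, len(status))
	for k := range status {
		names = append(names, k)
	}
	sort.Strings(names)
	return names
}

// sample returns the metric value and the label values aligned with names.
// A nil status stands for "command failed": value 5 and empty label values.
func sample(names []string, status map[string]string) (float64, []string) {
	values := make([]string, len(names))
	if status == nil {
		return failureValue, values
	}
	var disabled float64
	for _, v := range status {
		if v == "disabled" {
			disabled++
		}
	}
	for i, n := range names {
		values[i] = status[n]
	}
	return disabled, values
}

func main() {
	status := map[string]string{
		"a1b1": "enabled", "a2b2": "disabled",
		"a3b3": "enabled", "a4b4": "disabled",
	}
	names := sortedNames(status)
	value, values := sample(names, status)
	fmt.Println(names, values, value)
}
```

The returned slice always has `len(names)` entries, so it can be passed as the variadic label values of a const metric without a cardinality mismatch.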
Program entry point
- func main() {
- ovs := newOvsCollector()
- prometheus.MustRegister(ovs)
- http.Handle("/metrics", promhttp.Handler())
- log.Info("begin to server on port 8080")
- // listen on port 8080
- log.Fatal(http.ListenAndServe(":8080", nil))
- }
Full code
- package main
- import (
-     "net/http"
-     "os/exec"
-     "strings"
-
-     "github.com/prometheus/client_golang/prometheus"
-     "github.com/prometheus/client_golang/prometheus/promhttp"
-     log "github.com/sirupsen/logrus"
- )
- // ovsCollector implements the prometheus.Collector interface.
- type ovsCollector struct {
-     ovsMetric *prometheus.Desc
- }
- func (collector *ovsCollector) Describe(ch chan<- *prometheus.Desc) {
-     ch <- collector.ovsMetric
- }
- // NIC names, used as label names
- var vLable = []string{}
- // Constant label marking the metric as coming from ovs
- var constLabel = prometheus.Labels{"component": "ovs"}
- // Collect runs the command on every scrape and emits the metric value.
- func (collector *ovsCollector) Collect(ch chan<- prometheus.Metric) {
-     var metricValue float64
-     rm := getBondStatus()
-     // One label value per label name, in the same order as vLable. Go does not
-     // guarantee map iteration order, so values are looked up by name.
-     values := make([]string, len(vLable))
-     if _, ok := rm["msg"]; ok {
-         // Label values stay empty so that their count still matches the Desc.
-         log.Error("command exec failed")
-         metricValue = 5
-     } else {
-         // Count the disabled NICs.
-         for _, v := range rm {
-             if v == "disabled" {
-                 metricValue++
-             }
-         }
-         for i, k := range vLable {
-             values[i] = rm[k]
-         }
-     }
-     // CounterValue matches the "# TYPE ... counter" output shown in the test section;
-     // since this value can also decrease, prometheus.GaugeValue would describe it more accurately.
-     ch <- prometheus.MustNewConstMetric(collector.ovsMetric, prometheus.CounterValue, metricValue, values...)
- }
- // newOvsCollector defines the metric's name, help text and labels.
- // The label names are read once, at startup.
- func newOvsCollector() *ovsCollector {
-     rm := getBondStatus()
-     if _, ok := rm["msg"]; ok {
-         log.Error("command execute failed:", rm["msg"])
-     } else {
-         for k := range rm {
-             vLable = append(vLable, k)
-         }
-     }
-     return &ovsCollector{
-         ovsMetric: prometheus.NewDesc("ovs_bond_status",
-             "Show ovs bond status", vLable,
-             constLabel),
-     }
- }
- // getBondStatus returns a map of NIC name -> status, or a map with the
- // special "msg" key when the command fails or prints nothing.
- func getBondStatus() map[string]string {
-     result, err := exec.Command("bash", "-c", "ovs-appctl bond/show | grep '^slave' | grep -v grep | awk '{print $2\"\"$3}'").Output()
-     if err != nil {
-         log.Error("result: ", string(result))
-         log.Error("command failed: ", err.Error())
-         return map[string]string{"msg": "failure"}
-     } else if len(result) == 0 {
-         log.Error("command exec failed, result is null")
-         return map[string]string{"msg": "return null"}
-     }
-     ret := strings.TrimSpace(string(result))
-     tt := strings.Split(ret, "\n")
-     nMap := make(map[string]string)
-     for _, line := range tt {
-         kv := strings.SplitN(line, ":", 2)
-         if len(kv) != 2 {
-             // Skip lines that are not in "name:status" form instead of panicking.
-             log.Error("unexpected line: ", line)
-             continue
-         }
-         // A label name cannot contain "-", so remove it from the NIC name.
-         nMap[strings.ReplaceAll(kv[0], "-", "")] = kv[1]
-     }
-     return nMap
- }
- func main() {
-     ovs := newOvsCollector()
-     prometheus.MustRegister(ovs)
-     http.Handle("/metrics", promhttp.Handler())
-     log.Info("begin to serve on port 8080")
-     // listen on port 8080
-     log.Fatal(http.ListenAndServe(":8080", nil))
- }
6. Deployment
Since the program will eventually be deployed in a k8s environment, build the image first, using a Dockerfile like the following:
- FROM golang:1.14.1 AS builder
- WORKDIR /go/src
- COPY ./ .
- RUN go build -o ovs_check main.go
- # runtime
- FROM centos:7.7
- COPY --from=builder /go/src/ovs_check /xiyangxixia/ovs_check
- ENTRYPOINT ["/xiyangxixia/ovs_check"]
The YAML I used for the deployment is shown below:
- ---
- apiVersion: apps/v1
- kind: DaemonSet
- metadata:
-   name: ovs-agent
-   namespace: kube-system
- spec:
-   minReadySeconds: 5
-   selector:
-     matchLabels:
-       name: ovs-agent
-   template:
-     metadata:
-       annotations:
-         # All three annotations are required; they tell Prometheus where to scrape
-         prometheus.io/scrape: "true"
-         prometheus.io/port: "8080"
-         prometheus.io/path: "/metrics"
-       labels:
-         name: ovs-agent
-     spec:
-       containers:
-       - name: ovs-agent
-         image: ovs_bond:v1
-         imagePullPolicy: IfNotPresent
-         resources:
-           limits:
-             cpu: 100m
-             memory: 200Mi
-           requests:
-             cpu: 100m
-             memory: 200Mi
-         securityContext:
-           privileged: true
-           procMount: Default
-         volumeMounts:
-         - mountPath: /lib/modules
-           name: lib-modules
-           readOnly: true
-         - mountPath: /var/run/openvswitch
-           name: ovs-run
-         - mountPath: /usr/bin/ovs-appctl
-           name: ovs-bin
-           subPath: ovs-appctl
-       serviceAccountName: xiyangxixia
-       hostPID: true
-       hostIPC: true
-       volumes:
-       - hostPath:
-           path: /lib/modules
-           type: ""
-         name: lib-modules
-       - hostPath:
-           path: /var/run/openvswitch
-           type: ""
-         name: ovs-run
-       - hostPath:
-           path: /usr/bin/
-           type: ""
-         name: ovs-bin
-   updateStrategy:
-     type: RollingUpdate
7. Testing
- [root@test ~]$ kubectl get po -n kube-system -o wide |grep ovs
- ovs-agent-h8zc6 1/1 Running 0 2d14h 10.211.55.41 master-1 <none> <none>
- [root@test ~]$ curl 10.211.55.41:8080/metrics |grep ovs_bond
- # HELP ovs_bond_status Show ovs bond status
- # TYPE ovs_bond_status counter
- ovs_bond_status{component="ovs",a1b1="enabled",a2b2="enabled",a3b3="enabled",a4b4="enabled"} 0
8. Summary
That is all for this article. Please forgive the rough introduction; I am still learning. Thanks to everyone who has been following the public account!