Istio Observability: Metrics

Author: 阳明 2023-12-04 07:29:34

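Istio generates detailed telemetry for all service communication within the mesh. This telemetry provides observability of service behavior, allowing operators to troubleshoot, maintain, and optimize applications without placing any additional burden on developers. With Istio, operators gain a thorough understanding of how monitored services interact with other services and with the Istio components themselves.

Istio generates the following types of telemetry to provide observability across the whole service mesh:

Metrics: Istio generates a set of service metrics based on the four golden signals of monitoring (latency, traffic, errors, and saturation). Istio also provides detailed metrics for the mesh control plane, along with a default set of mesh monitoring dashboards built on these metrics.
Tracing (distributed tracing): Istio generates distributed trace spans for each service, so operators can understand the dependencies and call flows between services in the mesh.
Logs (access logs): As traffic flows into a service in the mesh, Istio can generate a full record of each request, including source and destination metadata. This lets operators audit service behavior down to the level of an individual workload instance.

Next we will look at how Istio's metrics, distributed tracing, and access logs work, one at a time.

Metrics

Metrics provide a way to monitor and understand behavior in aggregate. To monitor service behavior, Istio generates metrics for all service traffic entering, leaving, and within the mesh. These metrics provide information about behavior such as overall traffic volume, error rates, and request response times. Besides monitoring the behavior of services in the mesh, it is also important to monitor the behavior of the mesh itself. Istio components export metrics about their own internal behavior to provide insight into the health and function of the mesh control plane.

Metric categories

Overall, Istio metrics fall into three levels: proxy level, service level, and control plane level.

Proxy-level metrics

Istio metrics collection begins with the Envoy sidecar proxies. Each proxy generates a rich set of metrics about all traffic passing through it (both inbound and outbound). The proxies also provide detailed statistics about their own administrative functions, including configuration and health information.

Envoy-generated metrics monitor the mesh at the granularity of Envoy resources (such as listeners and clusters). As a result, monitoring Envoy metrics requires understanding how mesh services map to Envoy resources.

Istio lets operators select which Envoy metrics are generated and collected at each workload instance. By default, Istio enables only a small subset of the Envoy-generated statistics, to avoid overwhelming metrics backends and to reduce the CPU overhead of metrics collection. Operators can easily expand the set of collected proxy metrics when needed. This enables targeted debugging of networking behavior while keeping the overall cost of monitoring across the mesh low.

Service-level metrics

In addition to proxy-level metrics, Istio provides a set of service-oriented metrics for monitoring service communication. These metrics cover the four basic service monitoring needs: latency, traffic, errors, and saturation. Istio also ships with a default set of dashboards for monitoring service behavior based on these metrics. By default, the standard Istio metrics are exported to Prometheus. Use of the service-level metrics is entirely optional: operators can turn off their generation and collection to suit their own needs.

Control plane metrics

The Istio control plane also provides a collection of self-monitoring metrics, which allow the behavior of Istio itself to be monitored.

Querying metrics with Prometheus

Istio uses Prometheus by default to collect and store metrics. Prometheus is an open-source systems monitoring and alerting toolkit. It collects metrics from multiple sources and lets operators query them with the PromQL query language.

First make sure Istio's Prometheus addon is enabled. If it is not, enable it with the following command: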

kubectl apply -f samples/addons

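The command above installs Kiali together with Prometheus, Grafana, and Jaeger. This setup is only suitable for test environments. In production, install Prometheus separately so that it can be configured and tuned for your needs.

After installation, check the status of the Prometheus service with the following commands: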

$ kubectl get svc prometheus -n istio-system
NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
prometheus   ClusterIP   10.106.228.196   <none>        9090/TCP   25d
$ kubectl get pods -n istio-system -l app=prometheus
NAME                         READY   STATUS    RESTARTS       AGE
prometheus-5d5d6d6fc-2gtxm   2/2     Running   0              25d

We will again use the Bookinfo application as the example. First visit http://$GATEWAY_URL/productpage in a browser, then open the Prometheus UI to view the metrics. In a Kubernetes environment, the following command opens the Prometheus UI:

istioctl dashboard prometheus
# You can also create an Ingress or Gateway to access the Prometheus UI

Once it is open, we can query any metric on the page, for example istio_requests_total:

(Figure: querying a metric in the Prometheus UI)

istio_requests_total is a COUNTER metric that records the total number of requests handled by the Istio proxies.

You can of course write PromQL statements to suit your own needs. For example, the total number of requests to the productpage service can be queried with:

istio_requests_total{destination_service="productpage.default.svc.cluster.local"}

The total number of requests to v3 of the reviews service:

istio_requests_total{destination_service="reviews.default.svc.cluster.local", destination_version="v3"}

This query returns the current total count of all requests to v3 of the reviews service.
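Selectors like the ones above can also be assembled in code. The following is a minimal sketch, not part of the original article: the helper names selector and query_url are hypothetical, and the base URL http://localhost:9090 assumes the Prometheus UI has been made reachable locally (for example through istioctl dashboard prometheus). The /api/v1/query path is the instant-query endpoint of the Prometheus HTTP API. The sketch does not escape special characters in label values.

```python
from urllib.parse import urlencode


def selector(metric: str, **labels: str) -> str:
    """Build a PromQL instant-vector selector with equality matchers."""
    matchers = ", ".join(f'{name}="{value}"' for name, value in sorted(labels.items()))
    return f"{metric}{{{matchers}}}"


def query_url(base_url: str, promql: str) -> str:
    """Build the URL of an instant query against the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


# Assumed local address; adjust to wherever Prometheus is reachable.
reviews_v3 = selector(
    "istio_requests_total",
    destination_service="reviews.default.svc.cluster.local",
    destination_version="v3",
)
print(reviews_v3)
print(query_url("http://localhost:9090", reviews_v3))
```

Sending a GET request to the printed URL should return the same series the Prometheus UI shows, encoded as JSON.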

The per-second rate of requests with response code 200 over the past 5 minutes, for all instances of the productpage service:

rate(istio_requests_total{destination_service=~"productpage.*", response_code="200"}[5m])
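To make the meaning of rate() over a counter more concrete, here is a simplified Python sketch. It is an illustration under stated assumptions, not the Prometheus implementation: the function name simple_rate and the sample values are made up, and the sketch ignores the extrapolation to window boundaries that Prometheus applies. It does model the key idea that a drop in a counter is treated as a reset (for example after a sidecar restart).

```python
from typing import List, Tuple


def simple_rate(samples: List[Tuple[float, float]]) -> float:
    """Per-second increase of a counter over the span covered by the samples.

    samples holds (timestamp_seconds, counter_value) pairs ordered by time.
    A value lower than its predecessor is treated as a counter reset, so the
    counter is assumed to have restarted from zero.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        if value < previous:
            increase += value  # reset: everything counted since zero is new
        else:
            increase += value - previous
        previous = value
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0


# Hypothetical istio_requests_total samples scraped once a minute;
# the proxy restarts between t=120 and t=180.
samples = [(0, 100), (60, 130), (120, 160), (180, 10), (240, 40)]
print(simple_rate(samples))
```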

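The Graph tab shows a graphical representation of the query results.

(Figure: Graph)

For details on PromQL, see the official Prometheus Querying Basics documentation or our course "Prometheus 入门到实战" (Prometheus: From Getting Started to Practice). PromQL is not our focus here, so we will not go into further detail.

Although we have not configured anything here, Istio already collects a set of metrics by default, which is why we can query them directly.

Visualizing metrics with Grafana

Prometheus provides a basic UI for querying metrics, but it is not a complete monitoring system. More often we use Grafana to visualize metrics.

Again, first make sure Istio's Grafana addon is enabled. If it is not, enable it with the following command: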

kubectl apply -f samples/addons

Also make sure the Prometheus service is running. Once the addons are installed, check the status of the Grafana service with the following commands:

$ kubectl -n istio-system get svc grafana
NAME      TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
grafana   ClusterIP   10.96.197.74   <none>        3000/TCP   25d
$ kubectl -n istio-system get pods -l app=grafana
NAME                       READY   STATUS    RESTARTS       AGE
grafana-5f9b8c6c5d-jv65v   1/1     Running   0              25d

Then open the Grafana UI with the following command:

istioctl dashboard grafana
# You can also create an Ingress or Gateway to access Grafana

Now we can open the Grafana UI in a browser. By default Grafana is already configured with a Prometheus data source, so we can use it directly to query metrics.

(Figure: data source)

Grafana also comes with several built-in Istio dashboards that we can use directly. For example, open the Istio Mesh Dashboard to view mesh-level metrics:

(Figure: Dashboard)

The dashboard shows some data, but not much, because we have not generated much traffic yet. Use the following command to send 100 requests to the productpage service:

for i in $(seq 1 100); do curl -s -o /dev/null "http://$GATEWAY_URL/productpage"; done

Then look at the Istio Mesh Dashboard again. It should reflect the generated traffic:

(Figure: Mesh Dashboard)

Beyond this, we can also view Service and Workload metrics, for example the metrics of the productpage workload:

(Figure: workload dashboard)

This view gives detailed metrics for each workload, along with its inbound workloads (workloads that send requests to it) and its outbound services (services it sends requests to).

The Istio dashboards consist of three main parts:

Mesh summary view: provides a global summary of the mesh and shows the (HTTP/gRPC and TCP) workloads in the mesh.
Individual services view: provides request and response metrics for each individual (HTTP/gRPC and TCP) service in the mesh, as well as metrics about the client and service workloads for that service.
Individual workloads view: provides request and response metrics for each individual (HTTP/gRPC and TCP) workload in the mesh, as well as metrics about the inbound workloads and outbound services for that workload.

How metrics are collected

The examples above show that once Istio's Prometheus addon is installed, metrics are collected automatically without any configuration on our part. So how does Istio collect them? And if we want to use our own Prometheus to collect them, how should we configure it?

First look at the configuration of Istio's Prometheus addon by running cat samples/addons/prometheus.yaml:

# Source: prometheus/templates/service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    component: "server"
    app: prometheus
    release: prometheus
    chart: prometheus-19.6.1
    heritage: Helm
  name: prometheus
  namespace: istio-system
spec:
  ports:
    - name: http
      port: 9090
      protocol: TCP
      targetPort: 9090
  selector:
    component: "server"
    app: prometheus
    release: prometheus
  sessionAffinity: None
  type: "ClusterIP"
---
# Source: prometheus/templates/deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    component: "server"
    app: prometheus
    release: prometheus
    chart: prometheus-19.6.1
    heritage: Helm
  name: prometheus
  namespace: istio-system
spec:
  selector:
    matchLabels:
      component: "server"
      app: prometheus
      release: prometheus
  replicas: 1
  strategy:
    type: Recreate
    rollingUpdate: null
  template:
    metadata:
      labels:
        component: "server"
        app: prometheus
        release: prometheus
        chart: prometheus-19.6.1
        heritage: Helm
        sidecar.istio.io/inject: "false"
    spec:
      enableServiceLinks: true
      serviceAccountName: prometheus
      containers:
        - name: prometheus-server-configmap-reload
          image: "jimmidyson/configmap-reload:v0.8.0"
          imagePullPolicy: "IfNotPresent"
          args:
            - --volume-dir=/etc/config
            - --webhook-url=http://127.0.0.1:9090/-/reload
          resources: {}
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
              readOnly: true
        - name: prometheus-server
          image: "prom/prometheus:v2.41.0"
          imagePullPolicy: "IfNotPresent"
          args:
            - --storage.tsdb.retention.time=15d
            - --config.file=/etc/config/prometheus.yml
            - --storage.tsdb.path=/data
            - --web.console.libraries=/etc/prometheus/console_libraries
            - --web.console.templates=/etc/prometheus/consoles
            - --web.enable-lifecycle
          ports:
            - containerPort: 9090
          readinessProbe:
            httpGet:
              path: /-/ready
              port: 9090
              scheme: HTTP
            initialDelaySeconds: 0
            periodSeconds: 5
            timeoutSeconds: 4
            failureThreshold: 3
            successThreshold: 1
          livenessProbe:
            httpGet:
              path: /-/healthy
              port: 9090
              scheme: HTTP
            initialDelaySeconds: 30
            periodSeconds: 15
            timeoutSeconds: 10
            failureThreshold: 3
            successThreshold: 1
          resources: {}
          volumeMounts:
            - name: config-volume
              mountPath: /etc/config
            - name: storage-volume
              mountPath: /data
              subPath: ""
      dnsPolicy: ClusterFirst
      securityContext:
        fsGroup: 65534
        runAsGroup: 65534
        runAsNonRoot: true
        runAsUser: 65534
      terminationGracePeriodSeconds: 300
      volumes:
        - name: config-volume
          configMap:
            name: prometheus
        - name: storage-volume
          emptyDir: {}
# Part of the configuration is omitted

The manifest above shows that the core configuration file of the Prometheus server is specified with --config.file=/etc/config/prometheus.yml, and that this file is mounted into the container as a volume from the ConfigMap named prometheus. So the part to focus on is this ConfigMap:

# Source: prometheus/templates/cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    component: "server"
    app: prometheus
    release: prometheus
    chart: prometheus-19.6.1
    heritage: Helm
  name: prometheus
  namespace: istio-system
data:
  allow-snippet-annotations: "false"
  alerting_rules.yml: |
    {}
  alerts: |
    {}
  prometheus.yml: |
    global:
      evaluation_interval: 1m
      scrape_interval: 15s
      scrape_timeout: 10s
    rule_files:
    - /etc/config/recording_rules.yml
    - /etc/config/alerting_rules.yml
    - /etc/config/rules
    - /etc/config/alerts
    scrape_configs:
    - job_name: prometheus
      static_configs:
      - targets:
        - localhost:9090
    - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      job_name: kubernetes-apiservers
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: default;kubernetes;https
        source_labels:
        - __meta_kubernetes_namespace
        - __meta_kubernetes_service_name
        - __meta_kubernetes_endpoint_port_name
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
    - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      job_name: kubernetes-nodes
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - replacement: kubernetes.default.svc:443
        target_label: __address__
      - regex: (.+)
        replacement: /api/v1/nodes/$1/proxy/metrics
        source_labels:
        - __meta_kubernetes_node_name
        target_label: __metrics_path__
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
    - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      job_name: kubernetes-nodes-cadvisor
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - replacement: kubernetes.default.svc:443
        target_label: __address__
      - regex: (.+)
        replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
        source_labels:
        - __meta_kubernetes_node_name
        target_label: __metrics_path__
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        insecure_skip_verify: true
    - honor_labels: true
      job_name: kubernetes-service-endpoints
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scrape
      - action: drop
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scrape_slow
      - action: replace
        regex: (https?)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: (.+?)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_service_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_service_annotation_prometheus_io_param_(.+)
        replacement: __param_$1
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_service_name
        target_label: service
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_node_name
        target_label: node
    - honor_labels: true
      job_name: kubernetes-service-endpoints-slow
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scrape_slow
      - action: replace
        regex: (https?)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: (.+?)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_service_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_service_annotation_prometheus_io_param_(.+)
        replacement: __param_$1
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_service_name
        target_label: service
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_node_name
        target_label: node
      scrape_interval: 5m
      scrape_timeout: 30s
    - honor_labels: true
      job_name: prometheus-pushgateway
      kubernetes_sd_configs:
      - role: service
      relabel_configs:
      - action: keep
        regex: pushgateway
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_probe
    - honor_labels: true
      job_name: kubernetes-services
      kubernetes_sd_configs:
      - role: service
      metrics_path: /probe
      params:
        module:
        - http_2xx
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_service_annotation_prometheus_io_probe
      - source_labels:
        - __address__
        target_label: __param_target
      - replacement: blackbox
        target_label: __address__
      - source_labels:
        - __param_target
        target_label: instance
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels:
        - __meta_kubernetes_namespace
        target_label: namespace
      - source_labels:
        - __meta_kubernetes_service_name
        target_label: service
    - honor_labels: true
      job_name: kubernetes-pods
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scrape
      - action: drop
        regex: true
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scrape_slow
      - action: replace
        regex: (https?)
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: (\d+);(([A-Fa-f0-9]{1,4}::?){1,7}[A-Fa-f0-9]{1,4})
        replacement: '[$2]:$1'
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_port
        - __meta_kubernetes_pod_ip
        target_label: __address__
      - action: replace
        regex: (\d+);((([0-9]+?)(\.|$)){4})
        replacement: $2:$1
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_port
        - __meta_kubernetes_pod_ip
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_annotation_prometheus_io_param_(.+)
        replacement: __param_$1
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_name
        target_label: pod
      - action: drop
        regex: Pending|Succeeded|Failed|Completed
        source_labels:
        - __meta_kubernetes_pod_phase
    - honor_labels: true
      job_name: kubernetes-pods-slow
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scrape_slow
      - action: replace
        regex: (https?)
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scheme
        target_label: __scheme__
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: (\d+);(([A-Fa-f0-9]{1,4}::?){1,7}[A-Fa-f0-9]{1,4})
        replacement: '[$2]:$1'
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_port
        - __meta_kubernetes_pod_ip
        target_label: __address__
      - action: replace
        regex: (\d+);((([0-9]+?)(\.|$)){4})
        replacement: $2:$1
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_port
        - __meta_kubernetes_pod_ip
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_annotation_prometheus_io_param_(.+)
        replacement: __param_$1
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_name
        target_label: pod
      - action: drop
        regex: Pending|Succeeded|Failed|Completed
        source_labels:
        - __meta_kubernetes_pod_phase
      scrape_interval: 5m
      scrape_timeout: 30s
  recording_rules.yml: |
    {}
  rules: |
    {}
---

This configuration file defines ten scrape jobs. The six most relevant here are:

prometheus: scrapes the Prometheus server's own metrics.
kubernetes-apiservers: scrapes the Kubernetes API server metrics.
kubernetes-nodes: scrapes Kubernetes node metrics.
kubernetes-nodes-cadvisor: scrapes the cAdvisor metrics of Kubernetes nodes, mainly container CPU, memory, network, and disk metrics.
kubernetes-service-endpoints: scrapes metrics from Kubernetes service endpoints.
kubernetes-pods: scrapes metrics from Kubernetes Pods.

(Figure: Prometheus configuration)

The job to focus on is kubernetes-pods, because most of our metrics are served by the Envoy sidecar in each Pod.

The configuration shows that this job uses pod-based service discovery:

First, only targets whose source label __meta_kubernetes_pod_annotation_prometheus_io_scrape is true are kept. In other words, a Pod is scraped only if it carries the annotation prometheus.io/scrape with the value true; otherwise it is dropped.
Then the prometheus.io/scheme annotation sets the scheme to http or https.
The prometheus.io/path annotation sets the scrape path.
The prometheus.io/port annotation sets the scrape port.
Annotations of the form prometheus.io/param_<name> are mapped to __param_<name> labels, which Prometheus sends as URL query parameters on the scrape request.
The Pod's labels are then mapped to Prometheus labels through labelmap, and the Pod's namespace and name are mapped to the namespace and pod labels.
Finally the Pod's phase is checked: targets whose phase is Pending, Succeeded, Failed, or Completed are dropped, so in practice only Running Pods are scraped.
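The relabeling steps above can be sketched in Python. This is a simplified model for illustration, not Prometheus code: the function names meta_labels and scrape_url are hypothetical, only the IPv4 address rule is modeled, and when no port annotation is present the sketch returns None, whereas Prometheus would keep the address it discovered. The regular expressions are copied from the kubernetes-pods job and applied with full-string matching, which is how Prometheus anchors relabel regexes.

```python
import re
from typing import Dict, Optional

PREFIX = "__meta_kubernetes_pod_annotation_"


def meta_labels(annotations: Dict[str, str]) -> Dict[str, str]:
    """Turn pod annotations into meta labels; unsupported characters become '_'."""
    return {PREFIX + re.sub(r"[^a-zA-Z0-9_]", "_", key): value
            for key, value in annotations.items()}


def scrape_url(annotations: Dict[str, str], pod_ip: str, phase: str) -> Optional[str]:
    """Return the URL the kubernetes-pods job would scrape, or None if the pod is dropped."""
    meta = meta_labels(annotations)
    # keep: only pods annotated with prometheus.io/scrape: "true"
    if meta.get(PREFIX + "prometheus_io_scrape") != "true":
        return None
    # drop: pods that are left to the kubernetes-pods-slow job
    if meta.get(PREFIX + "prometheus_io_scrape_slow") == "true":
        return None
    # drop: pods that are not running
    if phase in {"Pending", "Succeeded", "Failed", "Completed"}:
        return None
    # replace __scheme__ only when the annotation is http or https
    scheme = "http"
    annotated_scheme = meta.get(PREFIX + "prometheus_io_scheme", "")
    if re.fullmatch(r"https?", annotated_scheme):
        scheme = annotated_scheme
    # replace __metrics_path__ when the annotation is non-empty
    path = meta.get(PREFIX + "prometheus_io_path") or "/metrics"
    # replace __address__ from "<port>;<pod ip>" (IPv4 rule only)
    port = meta.get(PREFIX + "prometheus_io_port", "")
    match = re.fullmatch(r"(\d+);((([0-9]+?)(\.|$)){4})", f"{port};{pod_ip}")
    if match is None:
        return None  # simplification, see the note above
    return f"{scheme}://{match.group(2)}:{match.group(1)}{path}"


# Annotations that Istio injects into a sidecar-enabled pod.
injected = {
    "prometheus.io/scrape": "true",
    "prometheus.io/path": "/stats/prometheus",
    "prometheus.io/port": "15020",
}
print(scrape_url(injected, "10.244.2.74", "Running"))
```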

For example, query istio_requests_total{source_app="productpage", destination_app="details"}:

(Figure: test query)

One of the series returned, the one reported by the details sidecar, looks like this:

istio_requests_total{
    app="details",
    connection_security_policy="mutual_tls",
    destination_app="details",
    destination_canonical_revision="v1",
    destination_canonical_service="details",
    destination_cluster="Kubernetes",
    destination_principal="spiffe://cluster.local/ns/default/sa/bookinfo-details",
    destination_service="details.default.svc.cluster.local",
    destination_service_name="details",
    destination_service_namespace="default",
    destination_version="v1",
    destination_workload="details-v1",
    destination_workload_namespace="default",
    instance="10.244.2.74:15020",
    job="kubernetes-pods",
    namespace="default",
    pod="details-v1-5f4d584748-9fflw",
    pod_template_hash="5f4d584748",
    reporter="destination",
    request_protocol="http",
    response_code="200",
    response_flags="-",
    security_istio_io_tlsMode="istio",
    service_istio_io_canonical_name="details",
    service_istio_io_canonical_revision="v1",
    source_app="productpage",
    source_canonical_revision="v1",
    source_canonical_service="productpage",
    source_cluster="Kubernetes",
    source_principal="spiffe://cluster.local/ns/default/sa/bookinfo-productpage",
    source_version="v1",
    source_workload="productpage-v1",
    source_workload_namespace="default",
    version="v1"
}  362

This series gives the total number of requests from the productpage service to the details service. The label job="kubernetes-pods" shows that it comes from the kubernetes-pods scrape job, which means it was scraped from a Pod through service discovery. Let's look at a sidecar-injected Pod, here the productpage Pod:

$ kubectl get pods productpage-v1-564d4686f-l8kxr -oyaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    istio.io/rev: default
    kubectl.kubernetes.io/default-container: productpage
    kubectl.kubernetes.io/default-logs-container: productpage
    prometheus.io/path: /stats/prometheus
    prometheus.io/port: "15020"
    prometheus.io/scrape: "true"
    sidecar.istio.io/status: '{"initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["workload-socket","credential-socket","workload-certs","istio-envoy","istio-data","istio-podinfo","istio-token","istiod-ca-cert"],"imagePullSecrets":null,"revision":"default"}'
  labels:
    app: productpage
    pod-template-hash: 564d4686f
    security.istio.io/tlsMode: istio
    service.istio.io/canonical-name: productpage
    service.istio.io/canonical-revision: v1
    version: v1
  name: productpage-v1-564d4686f-l8kxr
  namespace: default
spec:
  containers:
  - image: docker.io/istio/examples-bookinfo-productpage-v1:1.18.0
    imagePullPolicy: IfNotPresent
# ......

The manifest shows that the Pod carries the following annotations:

prometheus.io/path: /stats/prometheus
prometheus.io/port: "15020"
prometheus.io/scrape: "true"

These annotations drive Prometheus service discovery. prometheus.io/scrape: "true" marks the Pod as a scrape target, while prometheus.io/path: /stats/prometheus and prometheus.io/port: "15020" set the scrape path and port. Once Prometheus discovers the Pod, it scrapes the metrics from <pod ip>:15020/stats/prometheus. Port 15020 is served by the Istio agent (pilot-agent) running in the istio-proxy container. The agent merges the Envoy sidecar's own statistics, which Envoy exposes at /stats/prometheus on port 15090, with metrics from the agent itself and, when configured, from the application.

The Pod's labels are also mapped to Prometheus labels. Beyond these, the result shows that the Envoy sidecar adds many labels of its own, mainly describing the destination and source of the traffic, which makes the metrics easy to query. The main labels added by the Envoy sidecar are:

reporter: identifies the reporter of the request. It is set to destination if the report comes from the server-side Istio proxy, and to source if it comes from a client-side Istio proxy or a gateway.
source_workload: the name of the source workload, or unknown if the source information is missing.
source_workload_namespace: the namespace of the source workload, or unknown if the source information is missing.
source_principal: the peer principal of the traffic source. It is set when peer authentication is used.
source_app: the source application, based on the app label of the source workload, or unknown if the source information is missing.
source_version: the version of the source workload, or unknown if the source information is missing.
destination_workload: the name of the destination workload, or unknown if the destination information is missing.
destination_workload_namespace: the namespace of the destination workload, or unknown if the destination information is missing.
destination_principal: the peer principal of the traffic destination. It is set when peer authentication is used.
destination_app: the destination application, based on the app label of the destination workload, or unknown if the destination information is missing.
destination_version: the version of the destination workload, or unknown if the destination information is missing.
destination_service: the destination service host responsible for an incoming request, for example details.default.svc.cluster.local.
destination_service_name: the destination service name, for example details.
destination_service_namespace: the namespace of the destination service.
request_protocol: the protocol of the request. It is set to the request or connection protocol.
response_code: the response code of the request. This label is present only on HTTP metrics.
connection_security_policy: the service authentication policy of the request. It is set to mutual_tls when Istio secures the communication and the report comes from the server-side Istio proxy. It is set to unknown when the report comes from the client-side Istio proxy, because the security policy cannot be properly populated there.
response_flags: additional details about the response or connection, as reported by the proxy.
Canonical Service: a workload belongs to exactly one canonical service, whereas it can belong to multiple services. A canonical service has a name and a revision, which results in the following labels:

source_canonical_service
source_canonical_revision
destination_canonical_service
destination_canonical_revision

destination_cluster: the cluster name of the destination workload, set by global.multiCluster.clusterName at cluster install time.
source_cluster: the cluster name of the source workload, set by global.multiCluster.clusterName at cluster install time.
grpc_response_status: the gRPC response status. This label is present only on gRPC metrics.
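One practical consequence of the reporter label is that a request between two sidecar-injected workloads is counted twice, once by the client proxy and once by the server proxy. The sketch below illustrates filtering on a single reporter before aggregating per source and destination workload. It is only an illustration: the function name requests_by_edge is hypothetical, and the sample series and their values are made up rather than taken from a real cluster.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Series = Tuple[Dict[str, str], float]


def requests_by_edge(series: List[Series], reporter: str = "destination") -> Dict[Tuple[str, str], float]:
    """Sum request counts per (source_workload, destination_workload) for one reporter."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for labels, value in series:
        if labels.get("reporter") != reporter:
            continue  # skip the other side's view of the same requests
        edge = (labels.get("source_workload", "unknown"),
                labels.get("destination_workload", "unknown"))
        totals[edge] += value
    return dict(totals)


# Made-up istio_requests_total series: the first two describe the same requests.
series = [
    ({"reporter": "source", "source_workload": "productpage-v1", "destination_workload": "details-v1"}, 362.0),
    ({"reporter": "destination", "source_workload": "productpage-v1", "destination_workload": "details-v1"}, 362.0),
    ({"reporter": "destination", "source_workload": "productpage-v1", "destination_workload": "reviews-v3"}, 120.0),
]
print(requests_by_edge(series))
```

In PromQL the same idea is expressed by adding reporter="destination" (or reporter="source") to the selector before aggregating.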

Istio uses two metric types, COUNTER and DISTRIBUTION, which correspond to the familiar counter and histogram.

For HTTP, HTTP/2, and gRPC traffic, Istio generates the following metrics:

Request Count (istio_requests_total): a COUNTER that records the total number of requests handled by an Istio proxy.
Request Duration (istio_request_duration_milliseconds): a DISTRIBUTION that measures the duration of requests.
Request Size (istio_request_bytes): a DISTRIBUTION that measures HTTP request body sizes.
Response Size (istio_response_bytes): a DISTRIBUTION that measures HTTP response body sizes.
gRPC Request Message Count (istio_request_messages_total): a COUNTER that records the total number of gRPC messages sent from a client.
gRPC Response Message Count (istio_response_messages_total): a COUNTER that records the total number of gRPC messages sent from a server.
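A DISTRIBUTION metric such as istio_request_duration_milliseconds reaches Prometheus as a histogram, that is, as cumulative _bucket series with an le (upper bound) label. The following Python sketch shows, in simplified form, how a quantile can be estimated from such buckets by linear interpolation, which is the idea behind PromQL's histogram_quantile function. It is not the Prometheus implementation, and the bucket bounds and counts are made up for illustration.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative histogram buckets.

    buckets holds (upper_bound, cumulative_count) pairs sorted by upper bound,
    with the last upper bound being +Inf. Observations are assumed to be
    spread evenly inside each bucket.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                return lower_bound  # best estimate is the highest finite bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Made-up request-duration buckets in milliseconds.
buckets = [(5.0, 10.0), (10.0, 60.0), (25.0, 90.0), (math.inf, 100.0)]
print(histogram_quantile(0.5, buckets))
print(histogram_quantile(0.9, buckets))
```

Because the estimate interpolates inside a bucket, its precision is limited by how finely the bucket boundaries are spaced around the quantile of interest.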

For TCP traffic, Istio generates the following metrics:

TCP Bytes Sent (istio_tcp_sent_bytes_total): a COUNTER that measures the total number of bytes sent during the response on a TCP connection.
TCP Bytes Received (istio_tcp_received_bytes_total): a COUNTER that measures the total number of bytes received during the request on a TCP connection.
TCP Connections Opened (istio_tcp_connections_opened_total): a COUNTER that records the total number of TCP connections opened.
TCP Connections Closed (istio_tcp_connections_closed_total): a COUNTER that records the total number of TCP connections closed.

Once you understand how Istio metrics are collected, you can customize collection to fit your own needs. For example, if your monitoring system uses the Prometheus Operator, you should now know how to configure it to scrape these metrics.

Custom metrics

Beyond the metrics Istio ships with, we can also customize metrics. This requires the Telemetry API provided by Istio, which offers flexible configuration of metrics, access logs, and tracing. The Telemetry API has become the mainstream API for this purpose in Istio.

Note that the Telemetry API cannot be used together with EnvoyFilter. See the related issue for details.

Starting with Istio 1.18, the Prometheus EnvoyFilter is no longer installed by default; Prometheus is enabled through meshConfig.defaultProviders instead. We should use the Telemetry API to further customize the telemetry pipeline. The new Telemetry API has clearer semantics and loses no functionality. For Istio versions earlier than 1.18, install with the following IstioOperator configuration:

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    telemetry:
      enabled: true
      v2:
        enabled: false

The Telemetry resource is defined as follows:

$ kubectl explain Telemetry.spec
GROUP:      telemetry.istio.io
KIND:       Telemetry
VERSION:    v1alpha1

FIELD: spec <Object>

DESCRIPTION:
    Telemetry configuration for workloads. See more details at:
    https://istio.io/docs/reference/config/telemetry.html

FIELDS:
  accessLogging <[]Object>
    Optional.

  metrics       <[]Object>
    Optional.

  selector      <Object>
    Optional.

  tracing       <[]Object>
    Optional.

The Telemetry resource has four fields: accessLogging, metrics, selector, and tracing. accessLogging and tracing configure access logs and tracing, metrics configures metrics, and selector chooses the workloads to which the Telemetry configuration applies.

Let's look at the metrics field first. It is defined as follows:

$ kubectl explain Telemetry.spec.metrics
GROUP:      telemetry.istio.io
KIND:       Telemetry
VERSION:    v1alpha1

FIELD: metrics <[]Object>

DESCRIPTION:
    Optional.

FIELDS:
  overrides     <[]Object>
    Optional.

  providers     <[]Object>
    Optional.

  reportingInterval     <string>
    Optional.

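As shown, the metrics field has three sub-fields: overrides, providers, and reportingInterval.

overrides customizes how the selected metrics are generated, for example by overriding tags or disabling a metric.
providers sets the metrics provider, usually prometheus.
reportingInterval sets the interval at which metrics are reported and is optional. It is currently supported only for TCP metrics, but may be used for long-lived HTTP streams in the future. The default is 5 seconds.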
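
As an illustration of the three fields together, a mesh-wide policy that only changes the TCP reporting interval could be sketched as follows. The resource name and the 10s value are assumptions chosen for the example.

```yaml
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: tcp-reporting-interval   # hypothetical name
  namespace: istio-system        # root namespace, so the policy is mesh-wide
spec:
  metrics:
    - providers:
        - name: prometheus
      reportingInterval: 10s     # assumed value; the default is 5s and it affects TCP metrics only
```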

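Removing tags

Previously, telemetry had to be configured in the meshConfig section of the Istio configuration, which was not very convenient. For example, to remove some tags from the Istio metrics in order to reduce cardinality, your configuration might contain a section like this: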

# istiooperator.yaml
telemetry:
  enabled: true
  v2:
    enabled: true
    prometheus:
      enabled: true
      configOverride:
        outboundSidecar:
          debug: false
          stat_prefix: istio
          metrics:
            - tags_to_remove:
                - destination_canonical_service
                  ...

Now we can configure the same thing through the Telemetry API, as shown below:

apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: remove-tags
  namespace: istio-system
spec:
  metrics:
    - providers:
        - name: prometheus # provider of the metrics data
      overrides:
        - match: # Scope of the override: selects individual metrics and the workload mode (server and/or client) that generates them. If omitted, the override applies to all metrics in both modes (client and server).
            metric: ALL_METRICS # one of the Istio standard metrics
            mode: CLIENT_AND_SERVER # which generation mode to select: client and/or server
          tagOverrides: # tags to override
            destination_canonical_service:
              operation: REMOVE
          # disabled: true  # whether to disable the metric

In the Telemetry resource above we set the metrics field to customize metrics, and used providers.name to select prometheus as the metrics provider. The most important part is the overrides field, which customizes how the selected metrics are generated.

The overrides.match.metric field specifies which Istio standard metric to override. The supported values are:

Name	Description
ALL_METRICS	Use this enum to indicate that the override should apply to all Istio default metrics.
REQUEST_COUNT	Counter of requests to an application, for HTTP, HTTP/2, and gRPC traffic. The Prometheus provider exports this metric as istio_requests_total. The Stackdriver provider exports it as istio.io/service/server/request_count (SERVER mode) and istio.io/service/client/request_count (CLIENT mode).
REQUEST_DURATION	Histogram of request durations, for HTTP, HTTP/2, and gRPC traffic. The Prometheus provider exports this metric as istio_request_duration_milliseconds. The Stackdriver provider exports it as istio.io/service/server/response_latencies (SERVER mode) and istio.io/service/client/roundtrip_latencies (CLIENT mode).
REQUEST_SIZE	Histogram of request body sizes, for HTTP, HTTP/2, and gRPC traffic. The Prometheus provider exports this metric as istio_request_bytes. The Stackdriver provider exports it as istio.io/service/server/request_bytes (SERVER mode) and istio.io/service/client/request_bytes (CLIENT mode).
RESPONSE_SIZE	Histogram of response body sizes, for HTTP, HTTP/2, and gRPC traffic. The Prometheus provider exports this metric as istio_response_bytes. The Stackdriver provider exports it as istio.io/service/server/response_bytes (SERVER mode) and istio.io/service/client/response_bytes (CLIENT mode).
TCP_OPENED_CONNECTIONS	Counter of TCP connections opened over the lifetime of the workload. The Prometheus provider exports this metric as istio_tcp_connections_opened_total. The Stackdriver provider exports it as istio.io/service/server/connection_open_count (SERVER mode) and istio.io/service/client/connection_open_count (CLIENT mode).
TCP_CLOSED_CONNECTIONS	Counter of TCP connections closed over the lifetime of the workload. The Prometheus provider exports this metric as istio_tcp_connections_closed_total. The Stackdriver provider exports it as istio.io/service/server/connection_close_count (SERVER mode) and istio.io/service/client/connection_close_count (CLIENT mode).
TCP_SENT_BYTES	Counter of response bytes sent during TCP connections. The Prometheus provider exports this metric as istio_tcp_sent_bytes_total. The Stackdriver provider exports it as istio.io/service/server/sent_bytes_count (SERVER mode) and istio.io/service/client/sent_bytes_count (CLIENT mode).
TCP_RECEIVED_BYTES	Counter of request bytes received during TCP connections. The Prometheus provider exports this metric as istio_tcp_received_bytes_total. The Stackdriver provider exports it as istio.io/service/server/received_bytes_count (SERVER mode) and istio.io/service/client/received_bytes_count (CLIENT mode).
GRPC_REQUEST_MESSAGES	Client-side counter incremented for every gRPC message sent. The Prometheus provider exports this metric as istio_request_messages_total.
GRPC_RESPONSE_MESSAGES	Server-side counter incremented for every gRPC message sent. The Prometheus provider exports this metric as istio_response_messages_total.

In our example the metric is set to ALL_METRICS, so the override applies to all Istio standard metrics.

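overrides.match.mode selects the role of the underlying workload in the network traffic. If the workload is the destination of the traffic (inbound from the workload's point of view), it is considered to be running as SERVER. If the workload is the source of the traffic, it is considered to be in CLIENT mode (traffic is outbound from the workload).

Name	Description
CLIENT_AND_SERVER	Selects scenarios where the workload is either the source or the destination of the network traffic.
CLIENT	Selects scenarios where the workload is the source of the network traffic.
SERVER	Selects scenarios where the workload is the destination of the network traffic.

The tagOverrides field is a collection of tag names and tag expressions to override in the selected metrics. Each key is the name of a tag, and each value describes the operation to perform on it: add the tag, remove it, or override its default value.

Field	Type	Description	Required
operation	Operation	Controls whether to update/add a tag or to remove it.	No
value	string	Considered only when the operation is UPSERT. The value is a CEL expression over attributes, for example string(destination.port) and request.host. Istio exposes all standard Envoy attributes, and additionally exposes node metadata as attributes. See the custom metrics documentation for more information.	No

The Operation can be set to one of two values, UPSERT and REMOVE:

Name	Description
UPSERT	Insert or update the tag using the provided value expression. The value field must be specified when UPSERT is used.
REMOVE	The tag should not be included in the metric when it is generated.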

Now apply the resource above, access the productpage application again, and check whether the metrics still contain the destination_canonical_service tag that we removed.

[Figure: Removing tags]

As the result above shows, the destination_canonical_service tag has been removed, which reduces the cardinality of the metrics. The same approach can be used to remove other tags you do not need.
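
Removing several tags at once follows the same pattern. The sketch below extends the remove-tags resource with two more tags; which tags are safe to drop is an assumption here and depends on the dashboards and alerts that consume them.

```yaml
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: remove-tags
  namespace: istio-system
spec:
  metrics:
    - providers:
        - name: prometheus
      overrides:
        - match:
            metric: ALL_METRICS
            mode: CLIENT_AND_SERVER
          tagOverrides:
            destination_canonical_service:
              operation: REMOVE
            destination_canonical_revision:   # assumed to be unused by your dashboards
              operation: REMOVE
            source_canonical_revision:        # assumed to be unused by your dashboards
              operation: REMOVE
```

Keeping all removals in one resource, rather than creating several mesh-wide Telemetry resources, avoids having to reason about how overlapping policies are resolved.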

Also note that in a Telemetry object we can use the selector field to choose which workloads the telemetry policy applies to. If it is not set, the policy applies to all workloads in the same namespace as the Telemetry resource; if that namespace is istio-system, it applies to workloads in all namespaces.
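
To limit a policy to a single workload, a selector can be added as in the following sketch. The resource name, the default namespace, and the app: productpage label are assumptions based on the Bookinfo sample used in this article.

```yaml
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: remove-tags-productpage   # hypothetical name
  namespace: default              # must be the namespace of the selected workload
spec:
  selector:
    matchLabels:
      app: productpage            # assumed pod label of the target workload
  metrics:
    - providers:
        - name: prometheus
      overrides:
        - match:
            metric: REQUEST_COUNT
          tagOverrides:
            destination_canonical_service:
              operation: REMOVE
```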

Adding tags

We have seen how to remove tags from metrics. We can also use the Telemetry API to add tags to metrics, as shown below:

apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: add-tags
spec:
  metrics:
    - overrides:
        - match:
            metric: REQUEST_COUNT
            mode: CLIENT
          tagOverrides:
            destination_x:
              operation: UPSERT
              value: "upstream_peer.labels['app'].value" # must be wrapped in double quotes
        - match:
            metric: REQUEST_COUNT
          tagOverrides:
            destination_port:
              value: "string(destination.port)"
            request_host:
              value: "request.host"
      providers:
        - name: prometheus

In the resource above we first added the following configuration under tagOverrides:

destination_x:
  operation: UPSERT
  value: "upstream_peer.labels['app'].value"

This adds a tag named destination_x whose value is given by the value field as upstream_peer.labels['app'].value. The value is a CEL expression (strings must be wrapped in double quotes in JSON). Istio exposes all standard Envoy attributes. For outbound requests, peer metadata is available as attributes of the upstream peer (upstream_peer); for inbound requests, it is available as attributes of the downstream peer (downstream_peer). The peer metadata contains the following fields:

Attribute	Type	Value
name	string	Name of the Pod
namespace	string	Namespace the Pod runs in
labels	map	Workload labels
owner	string	Workload owner
workload_name	string	Workload name
platform_metadata	map	Platform metadata
istio_version	string	Version identifier of the proxy
mesh_id	string	Unique identifier of the mesh
app_containers	list<string>	List of application container names
cluster_id	string	Identifier of the cluster the workload belongs to

For example, the expression for the peer's app label in an outbound configuration is upstream_peer.labels['app'].value, so the destination_x tag we added ends up holding the value of the upstream peer's app label.

The other two tags we added, destination_port and request_host, take the values string(destination.port) and request.host respectively, both of which come from the exposed Envoy attributes.
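
The inbound counterpart can be sketched in the same way, using downstream_peer on the server side. The resource and tag names below are hypothetical, and the expression mirrors the attribute form used in this article; the peer attribute syntax may differ in other Istio versions, so check the documentation for the version you run.

```yaml
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: add-source-tag   # hypothetical name
spec:
  metrics:
    - providers:
        - name: prometheus
      overrides:
        - match:
            metric: REQUEST_COUNT
            mode: SERVER               # inbound traffic, so the peer is the downstream caller
          tagOverrides:
            source_x:                  # hypothetical tag name
              operation: UPSERT
              value: "downstream_peer.labels['app'].value"
```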

Because this resource does not set a namespace, it is created in the default namespace here, so the telemetry policy only applies to workloads in the default namespace.

After applying this resource, access the productpage application again to generate metrics. The metrics now contain the tags we added.

[Figure: Adding tags]

Disabling metrics

Disabling metrics is even simpler. For example, the following configuration disables all metrics:

apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: remove-all-metrics
  namespace: istio-system
spec:
  metrics:
    - providers:
        - name: prometheus
      overrides:
        - disabled: true
          match:
            mode: CLIENT_AND_SERVER
            metric: ALL_METRICS

The following configuration disables the REQUEST_COUNT metric:

apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: remove-request-count
  namespace: istio-system
spec:
  metrics:
    - providers:
        - name: prometheus
      overrides:
        - disabled: true
          match:
            mode: CLIENT_AND_SERVER
            metric: REQUEST_COUNT

The following configuration disables the REQUEST_COUNT metric on the client side:

apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: remove-client
  namespace: istio-system
spec:
  metrics:
    - providers:
        - name: prometheus
      overrides:
        - disabled: true
          match:
            mode: CLIENT
            metric: REQUEST_COUNT

The following configuration disables the REQUEST_COUNT metric on the server side:

apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: remove-server
  namespace: istio-system
spec:
  metrics:
    - providers:
        - name: prometheus
      overrides:
        - disabled: true
          match:
            mode: SERVER
            metric: REQUEST_COUNT
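
Several metrics can also be disabled in one resource by listing multiple overrides. The sketch below turns off the two size histograms; the resource name is hypothetical, and the choice of metrics is only an example.

```yaml
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: remove-size-histograms   # hypothetical name
  namespace: istio-system
spec:
  metrics:
    - providers:
        - name: prometheus
      overrides:
        - disabled: true
          match:
            metric: REQUEST_SIZE    # mode omitted, so both client and server are affected
        - disabled: true
          match:
            metric: RESPONSE_SIZE
```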

At this point we have seen how to customize metrics through the Telemetry API, so we can tailor them to our own needs.

Editor: 姜華  Source: k8s技術圈
