
Redefining Visualization: My Grafana Design Journey

Author: 刘俊夏 (Liu Junxia), 2025-01-07 14:09:58


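Introduction

This article focuses on the Grafana YAML files that come with Prometheus-Operator. I don't plan to install with Helm, so we'll be working with these manifests directly.

We also need to decide how to handle each resource: which ones we need and which we don't, which need optimizing and which don't, and which need monitoring and which don't. Once that's settled, the rest of the work becomes much clearer.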

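Getting Started

What Needs Monitoring

Application-Layer Monitoring

Application performance metrics:

• Response time: monitor API response times to make sure the service responds promptly.

• Throughput: request counts, transaction counts, and so on, to assess how much load the application can handle.

• Error rate: monitor HTTP error codes (e.g., 4xx, 5xx) as well as internal application errors.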
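As a rough illustration of these three metrics, the sketch below derives throughput, 4xx/5xx error rates and average response time from a batch of request records collected over one time window. The `Request` record and the `summarize` function are hypothetical; in a real setup these numbers would come from Prometheus queries over metrics your application exports.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Request:
    """One handled request (hypothetical record shape)."""
    timestamp: float   # seconds since the epoch
    status: int        # HTTP status code
    latency_ms: float  # response time in milliseconds


def summarize(requests: list[Request], window_seconds: float) -> dict:
    """Throughput, 4xx/5xx error rates and mean latency for one time window."""
    total = len(requests)
    if total == 0:
        return {"throughput_rps": 0.0, "rate_4xx": 0.0, "rate_5xx": 0.0, "avg_latency_ms": 0.0}
    n_4xx = sum(1 for r in requests if 400 <= r.status < 500)
    n_5xx = sum(1 for r in requests if 500 <= r.status < 600)
    return {
        "throughput_rps": total / window_seconds,  # requests per second
        "rate_4xx": n_4xx / total,                 # share of client errors
        "rate_5xx": n_5xx / total,                 # share of server errors
        "avg_latency_ms": sum(r.latency_ms for r in requests) / total,
    }


if __name__ == "__main__":
    batch = [Request(0, 200, 120), Request(1, 200, 80), Request(2, 404, 15), Request(3, 503, 900)]
    print(summarize(batch, window_seconds=60))
```

Averages hide tail latency, so percentiles such as p95 or p99 are usually a better basis for response-time alerts.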

Business metrics:

• Monitor the key business metrics your use case depends on (e.g., user sign-ups, order volume).

• Log monitoring: collect and analyze application logs to detect and troubleshoot problems promptly.

Resource Usage

CPU and memory usage:

• Monitor CPU and memory usage of application instances to avoid resource bottlenecks.

Network traffic:

• Monitor inbound and outbound traffic to make sure network capacity is sufficient.

Storage usage:

• If the application uses storage, monitor its usage and performance.

Function Invocation Monitoring (Serverless-specific)

• Invocation count: monitor how often functions are called to understand the load.

• Execution duration: make sure function execution times stay within the expected range.

• Error rate: monitor the share of failed function executions to catch problems early.
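Here is a sketch of how these three function metrics could be combined into one simple health check: count invocations, compare the 95th-percentile duration against a budget, and compare the failure ratio against a limit. `function_health`, the nearest-rank percentile method and the budget values are illustrative assumptions, not an Alibaba Cloud API.

```python
from __future__ import annotations

import math


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile, with p in (0, 100]."""
    if not values:
        raise ValueError("percentile of an empty list")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def function_health(durations_ms: list[float], failures: int,
                    p95_budget_ms: float, max_error_ratio: float) -> dict:
    """Summarize invocations, p95 duration and failure ratio against simple limits."""
    invocations = len(durations_ms)
    if invocations == 0:
        return {"invocations": 0, "p95_ms": None, "error_ratio": 0.0,
                "duration_ok": True, "errors_ok": True}
    p95 = percentile(durations_ms, 95)
    error_ratio = failures / invocations
    return {
        "invocations": invocations,
        "p95_ms": p95,
        "error_ratio": error_ratio,
        "duration_ok": p95 <= p95_budget_ms,
        "errors_ok": error_ratio <= max_error_ratio,
    }


if __name__ == "__main__":
    print(function_health([100, 120, 130, 150, 900], failures=1,
                          p95_budget_ms=500, max_error_ratio=0.1))
```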

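Security Monitoring

• Access control: monitor abnormal access patterns to guard against potential threats.

• Vulnerability scanning: regularly scan the application and its dependencies for security vulnerabilities.

What Does Not Need Monitoring

Because Alibaba Cloud maintains the infrastructure and some of the components, you usually don't need to monitor the following yourself:

Infrastructure health:

• The health of the underlying servers, network devices, and storage devices; Alibaba Cloud monitors and maintains these.

Kubernetes control plane:

• The health of components such as the API server, scheduler, and controller manager; Alibaba Cloud ensures their high availability and stability.

Logs and metrics of core components:

• Logs and performance metrics of components such as etcd and kubelet are usually handled automatically by Alibaba Cloud.

Monitoring Design Best Practices

Define key metrics (KPIs):

• Identify which metrics are critical to the business and to application performance, and monitor those first.

Set alerting policies:

• Set sensible thresholds and alerting policies on the key metrics so that problems are detected and handled promptly.

Visual dashboards:

• Build intuitive dashboards that show key metrics in real time for monitoring and analysis.

Regular review and optimization:

• Periodically review monitoring data and policies, and adjust them as business and application needs change.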
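The "threshold plus duration" idea behind a good alerting policy can be sketched as follows. It mirrors the spirit of the `for` clause in Prometheus alerting rules, where an alert stays pending until its condition has held for the whole duration, which keeps short spikes from paging anyone. `evaluate_alert` and the sample values are illustrative, not part of any Prometheus or Alibaba Cloud API.

```python
from __future__ import annotations


def evaluate_alert(samples: list[tuple[float, float]], threshold: float,
                   for_seconds: float) -> list[str]:
    """Return "inactive", "pending" or "firing" for each (timestamp, value) sample.

    The alert turns pending as soon as the value exceeds the threshold and only
    fires once the condition has held continuously for `for_seconds`.
    """
    states = []
    active_since = None  # timestamp at which the condition first became true
    for ts, value in samples:
        if value > threshold:
            if active_since is None:
                active_since = ts
            states.append("firing" if ts - active_since >= for_seconds else "pending")
        else:
            active_since = None  # condition broken: reset the timer
            states.append("inactive")
    return states


if __name__ == "__main__":
    # Error ratio sampled every 60 s, alerting above 5% sustained for 2 minutes
    error_ratio = [(0, 0.01), (60, 0.08), (120, 0.09), (180, 0.12), (240, 0.02)]
    print(evaluate_alert(error_ratio, threshold=0.05, for_seconds=120))
    # ['inactive', 'pending', 'pending', 'firing', 'inactive']
```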

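When using an Alibaba Cloud ACK Serverless cluster, monitoring should focus on application performance, business metrics, resource usage, and security. The monitoring tools and services Alibaba Cloud provides make comprehensive monitoring achievable while reducing the operational burden, and a well-designed monitoring setup keeps applications stable and performant and lets you respond to potential problems promptly.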

Prometheus-Operator Manifests

We're using the latest version, and there are two main parts to focus on:

CRDs

[Image: the CRD manifest files]

These are the CRDs that Prometheus-Operator uses.

API Resources

[Images: the API resource manifest files]

Those are all the YAML files Prometheus-Operator will use. They fall into two groups:

API Resources:

• RBAC

• NetworkPolicy

• Service

• ConfigMap

• Secret

• ServiceAccount

• PodDisruptionBudget

• The related controller manifests

CRDs:

• ServiceMonitor

• PrometheusRule

• Alertmanager

• Prometheus

The focus is on Grafana and Prometheus; this article covers Grafana first.

Grafana

Having covered what we should and shouldn't monitor, we can go straight to removing the dashboards we don't need. Because the cluster's control plane is managed by Alibaba Cloud and there are no worker nodes of our own to run, dashboards for the control plane and the worker nodes are unnecessary.

Prometheus-Operator ships many dashboards by default:

• Alertmanager-overview

• APIserver

• Cluster-total

• Controller-manager

• Grafana-overview

• k8s-resources-cluster

• k8s-resources-multicluster

• k8s-resources-namespace

• k8s-resources-node

• k8s-resources-pod

• k8s-resources-workload

• k8s-resources-workload-namespace

• Kubelet

• Namespace-by-pod

• Namespace-by-workload

• Node-cluster-rsrc-use

• Node-rsrc-use

• Node-aix

• Nodes-darwin

• Nodes

• Persistentvolumesusage

• Pod-total

• Prometheus-remote-write

• Prometheus

• Proxy

• Scheduler

• Workload-total

For ACK Serverless clusters, the node-less, elastic architecture means that many dashboards built around traditional Kubernetes nodes may have little practical value.

Below, the dashboards are sorted into categories with a recommendation for each:

Dashboards Recommended to Keep

These dashboards relate to the Serverless cluster or to core services, and should be kept:

Alertmanager-overview

• Shows Alertmanager's status and alert-related information.

• Keep it if your monitoring stack uses Alertmanager.

Cluster-total

• Monitors overall resource usage and Pod status across the cluster.

• In a Serverless cluster, keeping an eye on Pods and overall load is still worthwhile.

Grafana-overview

• Monitors Grafana's own performance and data source status.

• Useful for checking Grafana's health.

k8s-resources-namespace

• Monitors resource usage per namespace (e.g., CPU, memory, Pod count).

• In a Serverless cluster, namespaces remain the main mechanism for resource isolation, so keep it.

k8s-resources-pod

• Shows the resource usage of each Pod.

• Pod status and resource consumption still matter in a Serverless cluster.

k8s-resources-workload

• Monitors the health of workloads (e.g., Deployments, StatefulSets).

• Workloads are the focus in a Serverless cluster, so keep it.

k8s-resources-workload-namespace

• Shows workload resource usage by namespace.

• Keep it if your applications are isolated across multiple namespaces.

Namespace-by-pod

• Shows Pod status and resources by namespace.

• Similar to k8s-resources-pod; useful for finer-grained, per-namespace monitoring.

Namespace-by-workload

• Shows workload health by namespace.

• Similar to k8s-resources-workload-namespace; keep it.

Prometheus-remote-write

• If you use Prometheus remote write (e.g., to GreptimeDB, which we'll use later), this dashboard shows the status and performance of remote writes.

Workload-total

• Shows the total resource usage across all workloads.

• In a Serverless cluster, overall workload volume and consumption are worth tracking, so keep it.

Prometheus

• Monitors Prometheus's own status (e.g., query performance, storage usage).

• Keep it if Prometheus is your monitoring backend.

Dashboards Not Recommended to Keep

These dashboards relate to physical nodes or don't apply to a Serverless architecture; delete them:

k8s-resources-node

• Shows the resource usage of each node.

• A Serverless cluster has no physical nodes, so this is meaningless.

Node-cluster-rsrc-use

• Monitors node resource usage across the cluster.

• Likewise, a Serverless cluster has no physical nodes; delete it.

Node-rsrc-use

• Monitors the resource consumption of a single node.

• Likewise, meaningless without physical nodes.

Node-aix

• Monitors nodes running AIX.

• Rarely relevant to Kubernetes and meaningless in a Serverless cluster.

Nodes-darwin

• Monitors nodes running Darwin (macOS).

• Serverless clusters don't use macOS nodes, so it is meaningless.

Nodes

• Shows the status and resource usage of all nodes.

• Serverless clusters have no nodes of your own to monitor; delete it.

Persistentvolumesusage

• Shows persistent volume usage.

• Serverless clusters usually don't use persistent volumes (PVCs) directly and rely on external storage services (e.g., NAS, OSS) instead, so it can be deleted.

Pod-total

• Focuses on the status and resources of all Pods.

• If you already keep Cluster-total and k8s-resources-pod, you can delete this dashboard.

Proxy

• Shows the status of kube-proxy.

• Serverless clusters generally don't involve kube-proxy, so it can be deleted.

Kubelet

• Monitors the kubelet on each node.

• A Serverless cluster has no real kubelet, so it can be deleted.

Scheduler

• Monitors the Kubernetes scheduler's performance and scheduling activity.

• Not needed: the scheduler is part of the control plane that Alibaba Cloud manages, so delete it.

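Dashboards to Keep Depending on Your Needs

Whether to keep these depends on your specific requirements:

Controller-manager

• Monitors the Kubernetes controller manager.

• The controller manager still exists in a Serverless cluster, but it is probably less important; if you have no particular interest in its performance and status, delete it.

k8s-resources-cluster

• Shows resource usage across the whole cluster.

• If you already keep Cluster-total, you can delete this dashboard.

k8s-resources-multicluster

• Monitors resource usage across multiple clusters.

• If you have no cross-cluster needs, or run a single Serverless cluster, delete it.

Pod-total

• If you already keep Workload-total and k8s-resources-pod, you can delete this dashboard.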

Final Summary

Dashboards to keep

• Alertmanager-overview

• Cluster-total

• Grafana-overview

• k8s-resources-namespace

• k8s-resources-pod

• k8s-resources-workload

• k8s-resources-workload-namespace

• Namespace-by-pod

• Namespace-by-workload

• Prometheus-remote-write

• Workload-total

• Prometheus

Dashboards to delete

• k8s-resources-node

• APIserver

• Node-cluster-rsrc-use

• Node-rsrc-use

• Node-aix

• Nodes-darwin

• Nodes

• Persistentvolumesusage

• Proxy

• Kubelet

• Scheduler

Optional, depending on your needs

• Controller-manager

• k8s-resources-cluster

• k8s-resources-multicluster

• Pod-total
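Applying this summary mostly means deleting ConfigMaps. Here is a sketch, assuming (as in the kube-prometheus manifests) that the dashboards ship as a single ConfigMapList whose items are named `grafana-dashboard-<name>`, and that the file has already been parsed into a Python dict, for example with a YAML library. `drop_dashboards` is a hypothetical helper, and the exact ConfigMap names in your copy of the manifests may differ from the display names above, so check them first.

```python
from __future__ import annotations

import copy


def drop_dashboards(configmap_list: dict, unwanted: set[str],
                    prefix: str = "grafana-dashboard-") -> dict:
    """Return a copy of a ConfigMapList without the ConfigMaps of unwanted dashboards.

    Items are matched by metadata.name == prefix + dashboard name, case-insensitively.
    The input object is left unmodified.
    """
    unwanted_names = {prefix + name.lower() for name in unwanted}
    result = copy.deepcopy(configmap_list)
    result["items"] = [
        item for item in result.get("items", [])
        if item["metadata"]["name"].lower() not in unwanted_names
    ]
    return result


if __name__ == "__main__":
    dashboards = {
        "apiVersion": "v1",
        "kind": "ConfigMapList",
        "items": [
            {"apiVersion": "v1", "kind": "ConfigMap", "metadata": {"name": "grafana-dashboard-nodes"}},
            {"apiVersion": "v1", "kind": "ConfigMap", "metadata": {"name": "grafana-dashboard-kubelet"}},
            {"apiVersion": "v1", "kind": "ConfigMap", "metadata": {"name": "grafana-dashboard-pod-total"}},
        ],
    }
    trimmed = drop_dashboards(dashboards, {"Nodes", "Kubelet"})
    print([item["metadata"]["name"] for item in trimmed["items"]])
    # ['grafana-dashboard-pod-total']
```

If your Grafana Deployment mounts each dashboard ConfigMap as a volume (the kube-prometheus Deployment does), remove the matching volumes and volumeMounts as well; otherwise the Grafana Pod may fail to start because a referenced ConfigMap no longer exists.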

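Next, we need to trim or optimize the dashboards, since many of them go unused. Before doing that, though, we should get familiar with the JSON format of a dashboard definition. It matters because later steps involve editing dashboard JSON, so let's pick one as an example.

Dissecting Grafana Dashboard JSON

This is the config file that defines the Grafana dashboards. I've collapsed it here to keep it readable; expanded, it runs to tens of thousands of lines.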

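Its kind is ConfigMapList. In short, a ConfigMapList is a list that holds multiple ConfigMap objects. It is typically used when you view or operate on several ConfigMaps at once; for example, when you list all ConfigMaps through the Kubernetes API (as kubectl does), the API returns a ConfigMapList object.

Note: running kubectl get configmaplist -A fails, because ConfigMapList is not a resource type stored in the cluster. It only serves as a container for query results and for bundling objects in a file; when you apply a file whose kind is ConfigMapList, kubectl creates the individual ConfigMaps it contains.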
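To make the distinction concrete, here is a sketch that expands a list object into the individual objects it contains, roughly what kubectl does when it applies a file whose kind is ConfigMapList. `flatten_list` is a hypothetical helper and a simplification of kubectl's real behavior.

```python
from __future__ import annotations


def flatten_list(obj: dict) -> list[dict]:
    """Expand a list object (kind ending in "List" with an "items" field) into its items.

    Non-list objects come back as a one-element list; nested lists are expanded
    recursively. Only the individual items would ever be stored in the cluster.
    """
    if obj.get("kind", "").endswith("List") and "items" in obj:
        flattened = []
        for item in obj["items"]:
            flattened.extend(flatten_list(item))
        return flattened
    return [obj]


if __name__ == "__main__":
    manifest = {
        "apiVersion": "v1",
        "kind": "ConfigMapList",
        "items": [
            {"apiVersion": "v1", "kind": "ConfigMap", "metadata": {"name": "grafana-dashboard-a"}},
            {"apiVersion": "v1", "kind": "ConfigMap", "metadata": {"name": "grafana-dashboard-b"}},
        ],
    }
    for obj in flatten_list(manifest):
        print(obj["kind"], obj["metadata"]["name"])
    # ConfigMap grafana-dashboard-a
    # ConfigMap grafana-dashboard-b
```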

[Image: the collapsed ConfigMapList manifest]

Since later steps involve modifying and improving dashboards, we need to be comfortable with Grafana dashboard JSON. Here is one example. It's long, so take it slowly; a breakdown follows:

{
          "graphTooltip": 1,
          "panels": [
              {
                  "collapsed": false,
                  "gridPos": {
                      "h": 1,
                      "w": 24,
                      "x": 0,
                      "y": 0
                  },
                  "id": 1,
                  "panels": [

                  ],
                  "title": "CPU",
                  "type": "row"
              },
              {
                  "datasource": {
                      "type": "prometheus",
                      "uid": "${datasource}"
                  },
                  "fieldConfig": {
                      "defaults": {
                          "custom": {
                              "fillOpacity": 100,
                              "showPoints": "never",
                              "stacking": {
                                  "mode": "normal"
                              }
                          },
                          "unit": "percentunit"
                      }
                  },
                  "gridPos": {
                      "h": 7,
                      "w": 12,
                      "x": 0,
                      "y": 1
                  },
                  "id": 2,
                  "options": {
                      "legend": {
                          "showLegend": false
                      },
                      "tooltip": {
                          "mode": "multi",
                          "sort": "desc"
                      }
                  },
                  "pluginVersion": "v11.4.0",
                  "targets": [
                      {
                          "datasource": {
                              "type": "prometheus",
                              "uid": "$datasource"
                          },
                          "expr": "instance:node_cpu_utilisation:rate5m{job=\"node-exporter\", instance=\"$instance\", cluster=\"$cluster\"} != 0",
                          "legendFormat": "Utilisation"
                      }
                  ],
                  "title": "CPU Utilisation",
                  "type": "timeseries"
              },
              {
                  "datasource": {
                      "type": "prometheus",
                      "uid": "${datasource}"
                  },
                  "fieldConfig": {
                      "defaults": {
                          "custom": {
                              "fillOpacity": 100,
                              "showPoints": "never",
                              "stacking": {
                                  "mode": "normal"
                              }
                          },
                          "unit": "percentunit"
                      }
                  },
                  "gridPos": {
                      "h": 7,
                      "w": 12,
                      "x": 12,
                      "y": 1
                  },
                  "id": 3,
                  "options": {
                      "legend": {
                          "showLegend": false
                      },
                      "tooltip": {
                          "mode": "multi",
                          "sort": "desc"
                      }
                  },
                  "pluginVersion": "v11.4.0",
                  "targets": [
                      {
                          "datasource": {
                              "type": "prometheus",
                              "uid": "$datasource"
                          },
                          "expr": "instance:node_load1_per_cpu:ratio{job=\"node-exporter\", instance=\"$instance\", cluster=\"$cluster\"} != 0",
                          "legendFormat": "Saturation"
                      }
                  ],
                  "title": "CPU Saturation (Load1 per CPU)",
                  "type": "timeseries"
              },
              {
                  "collapsed": false,
                  "gridPos": {
                      "h": 1,
                      "w": 24,
                      "x": 0,
                      "y": 8
                  },
                  "id": 4,
                  "panels": [

                  ],
                  "title": "Memory",
                  "type": "row"
              },
              {
                  "datasource": {
                      "type": "prometheus",
                      "uid": "${datasource}"
                  },
                  "fieldConfig": {
                      "defaults": {
                          "custom": {
                              "fillOpacity": 100,
                              "showPoints": "never",
                              "stacking": {
                                  "mode": "normal"
                              }
                          },
                          "unit": "percentunit"
                      }
                  },
                  "gridPos": {
                      "h": 7,
                      "w": 12,
                      "x": 0,
                      "y": 9
                  },
                  "id": 5,
                  "options": {
                      "legend": {
                          "showLegend": false
                      },
                      "tooltip": {
                          "mode": "multi",
                          "sort": "desc"
                      }
                  },
                  "pluginVersion": "v11.4.0",
                  "targets": [
                      {
                          "datasource": {
                              "type": "prometheus",
                              "uid": "$datasource"
                          },
                          "expr": "instance:node_memory_utilisation:ratio{job=\"node-exporter\", instance=\"$instance\", cluster=\"$cluster\"} != 0",
                          "legendFormat": "Utilisation"
                      }
                  ],
                  "title": "Memory Utilisation",
                  "type": "timeseries"
              },
              {
                  "datasource": {
                      "type": "prometheus",
                      "uid": "${datasource}"
                  },
                  "fieldConfig": {
                      "defaults": {
                          "custom": {
                              "fillOpacity": 100,
                              "showPoints": "never",
                              "stacking": {
                                  "mode": "normal"
                              }
                          },
                          "unit": "rds"
                      }
                  },
                  "gridPos": {
                      "h": 7,
                      "w": 12,
                      "x": 12,
                      "y": 9
                  },
                  "id": 6,
                  "options": {
                      "legend": {
                          "showLegend": false
                      },
                      "tooltip": {
                          "mode": "multi",
                          "sort": "desc"
                      }
                  },
                  "pluginVersion": "v11.4.0",
                  "targets": [
                      {
                          "datasource": {
                              "type": "prometheus",
                              "uid": "$datasource"
                          },
                          "expr": "instance:node_vmstat_pgmajfault:rate5m{job=\"node-exporter\", instance=\"$instance\", cluster=\"$cluster\"} != 0",
                          "legendFormat": "Major page Faults"
                      }
                  ],
                  "title": "Memory Saturation (Major Page Faults)",
                  "type": "timeseries"
              },
              {
                  "collapsed": false,
                  "gridPos": {
                      "h": 1,
                      "w": 24,
                      "x": 0,
                      "y": 16
                  },
                  "id": 7,
                  "panels": [

                  ],
                  "title": "Network",
                  "type": "row"
              },
              {
                  "datasource": {
                      "type": "prometheus",
                      "uid": "${datasource}"
                  },
                  "fieldConfig": {
                      "defaults": {
                          "custom": {
                              "fillOpacity": 100,
                              "showPoints": "never",
                              "stacking": {
                                  "mode": "normal"
                              }
                          },
                          "unit": "Bps"
                      },
                      "overrides": [
                          {
                              "matcher": {
                                  "id": "byRegexp",
                                  "options": "/Transmit/"
                              },
                              "properties": [
                                  {
                                      "id": "custom.transform",
                                      "value": "negative-Y"
                                  }
                              ]
                          }
                      ]
                  },
                  "gridPos": {
                      "h": 7,
                      "w": 12,
                      "x": 0,
                      "y": 17
                  },
                  "id": 8,
                  "options": {
                      "legend": {
                          "showLegend": false
                      },
                      "tooltip": {
                          "mode": "multi",
                          "sort": "desc"
                      }
                  },
                  "pluginVersion": "v11.4.0",
                  "targets": [
                      {
                          "datasource": {
                              "type": "prometheus",
                              "uid": "$datasource"
                          },
                          "expr": "instance:node_network_receive_bytes_excluding_lo:rate5m{job=\"node-exporter\", instance=\"$instance\", cluster=\"$cluster\"} != 0",
                          "legendFormat": "Receive"
                      },
                      {
                          "datasource": {
                              "type": "prometheus",
                              "uid": "$datasource"
                          },
                          "expr": "instance:node_network_transmit_bytes_excluding_lo:rate5m{job=\"node-exporter\", instance=\"$instance\", cluster=\"$cluster\"} != 0",
                          "legendFormat": "Transmit"
                      }
                  ],
                  "title": "Network Utilisation (Bytes Receive/Transmit)",
                  "type": "timeseries"
              },
              {
                  "datasource": {
                      "type": "prometheus",
                      "uid": "${datasource}"
                  },
                  "fieldConfig": {
                      "defaults": {
                          "custom": {
                              "fillOpacity": 100,
                              "showPoints": "never",
                              "stacking": {
                                  "mode": "normal"
                              }
                          },
                          "unit": "Bps"
                      },
                      "overrides": [
                          {
                              "matcher": {
                                  "id": "byRegexp",
                                  "options": "/Transmit/"
                              },
                              "properties": [
                                  {
                                      "id": "custom.transform",
                                      "value": "negative-Y"
                                  }
                              ]
                          }
                      ]
                  },
                  "gridPos": {
                      "h": 7,
                      "w": 12,
                      "x": 12,
                      "y": 17
                  },
                  "id": 9,
                  "options": {
                      "legend": {
                          "showLegend": false
                      },
                      "tooltip": {
                          "mode": "multi",
                          "sort": "desc"
                      }
                  },
                  "pluginVersion": "v11.4.0",
                  "targets": [
                      {
                          "datasource": {
                              "type": "prometheus",
                              "uid": "$datasource"
                          },
                          "expr": "instance:node_network_receive_drop_excluding_lo:rate5m{job=\"node-exporter\", instance=\"$instance\", cluster=\"$cluster\"} != 0",
                          "legendFormat": "Receive"
                      },
                      {
                          "datasource": {
                              "type": "prometheus",
                              "uid": "$datasource"
                          },
                          "expr": "instance:node_network_transmit_drop_excluding_lo:rate5m{job=\"node-exporter\", instance=\"$instance\", cluster=\"$cluster\"} != 0",
                          "legendFormat": "Transmit"
                      }
                  ],
                  "title": "Network Saturation (Drops Receive/Transmit)",
                  "type": "timeseries"
              },
              {
                  "collapsed": false,
                  "gridPos": {
                      "h": 1,
                      "w": 24,
                      "x": 0,
                      "y": 24
                  },
                  "id": 10,
                  "panels": [

                  ],
                  "title": "Disk IO",
                  "type": "row"
              },
              {
                  "datasource": {
                      "type": "prometheus",
                      "uid": "${datasource}"
                  },
                  "fieldConfig": {
                      "defaults": {
                          "custom": {
                              "fillOpacity": 100,
                              "showPoints": "never",
                              "stacking": {
                                  "mode": "normal"
                              }
                          },
                          "unit": "percentunit"
                      }
                  },
                  "gridPos": {
                      "h": 7,
                      "w": 12,
                      "x": 0,
                      "y": 25
                  },
                  "id": 11,
                  "options": {
                      "legend": {
                          "showLegend": false
                      },
                      "tooltip": {
                          "mode": "multi",
                          "sort": "desc"
                      }
                  },
                  "pluginVersion": "v11.4.0",
                  "targets": [
                      {
                          "datasource": {
                              "type": "prometheus",
                              "uid": "$datasource"
                          },
                          "expr": "instance_device:node_disk_io_time_seconds:rate5m{job=\"node-exporter\", instance=\"$instance\", cluster=\"$cluster\"} != 0",
                          "legendFormat": "{{device}}"
                      }
                  ],
                  "title": "Disk IO Utilisation",
                  "type": "timeseries"
              },
              {
                  "datasource": {
                      "type": "prometheus",
                      "uid": "${datasource}"
                  },
                  "fieldConfig": {
                      "defaults": {
                          "custom": {
                              "fillOpacity": 100,
                              "showPoints": "never",
                              "stacking": {
                                  "mode": "normal"
                              }
                          },
                          "unit": "percentunit"
                      }
                  },
                  "gridPos": {
                      "h": 7,
                      "w": 12,
                      "x": 12,
                      "y": 25
                  },
                  "id": 12,
                  "options": {
                      "legend": {
                          "showLegend": false
                      },
                      "tooltip": {
                          "mode": "multi",
                          "sort": "desc"
                      }
                  },
                  "pluginVersion": "v11.4.0",
                  "targets": [
                      {
                          "datasource": {
                              "type": "prometheus",
                              "uid": "$datasource"
                          },
                          "expr": "instance_device:node_disk_io_time_weighted_seconds:rate5m{job=\"node-exporter\", instance=\"$instance\", cluster=\"$cluster\"} != 0",
                          "legendFormat": "{{device}}"
                      }
                  ],
                  "title": "Disk IO Saturation",
                  "type": "timeseries"
              },
              {
                  "collapsed": false,
                  "gridPos": {
                      "h": 1,
                      "w": 24,
                      "x": 0,
                      "y": 34
                  },
                  "id": 13,
                  "panels": [

                  ],
                  "title": "Disk Space",
                  "type": "row"
              },
              {
                  "datasource": {
                      "type": "prometheus",
                      "uid": "${datasource}"
                  },
                  "fieldConfig": {
                      "defaults": {
                          "custom": {
                              "fillOpacity": 100,
                              "showPoints": "never",
                              "stacking": {
                                  "mode": "normal"
                              }
                          },
                          "unit": "percentunit"
                      }
                  },
                  "gridPos": {
                      "h": 7,
                      "w": 24,
                      "x": 0,
                      "y": 35
                  },
                  "id": 14,
                  "options": {
                      "legend": {
                          "showLegend": false
                      },
                      "tooltip": {
                          "mode": "multi",
                          "sort": "desc"
                      }
                  },
                  "pluginVersion": "v11.4.0",
                  "targets": [
                      {
                          "datasource": {
                              "type": "prometheus",
                              "uid": "$datasource"
                          },
                          "expr": "sort_desc(1 -\n  (\n    max without (mountpoint, fstype) (node_filesystem_avail_bytes{job=\"node-exporter\", fstype!=\"\", instance=\"$instance\", cluster=\"$cluster\"})\n    /\n    max without (mountpoint, fstype) (node_filesystem_size_bytes{job=\"node-exporter\", fstype!=\"\", instance=\"$instance\", cluster=\"$cluster\"})\n  ) != 0\n)\n",
                          "legendFormat": "{{device}}"
                      }
                  ],
                  "title": "Disk Space Utilisation",
                  "type": "timeseries"
              }
          ],
          "refresh": "30s",
          "schemaVersion": 39,
          "tags": [
              "node-exporter-mixin"
          ],
          "templating": {
              "list": [
                  {
                      "name": "datasource",
                      "query": "prometheus",
                      "type": "datasource"
                  },
                  {
                      "datasource": {
                          "type": "prometheus",
                          "uid": "${datasource}"
                      },
                      "hide": 2,
                      "includeAll": false,
                      "name": "cluster",
                      "query": "label_values(node_time_seconds, cluster)",
                      "refresh": 2,
                      "sort": 1,
                      "type": "query"
                  },
                  {
                      "datasource": {
                          "type": "prometheus",
                          "uid": "${datasource}"
                      },
                      "name": "instance",
                      "query": "label_values(node_exporter_build_info{job=\"node-exporter\", cluster=\"$cluster\"}, instance)",
                      "refresh": 2,
                      "sort": 1,
                      "type": "query"
                  }
              ]
          },
          "time": {
              "from": "now-1h",
              "to": "now"
          },
          "timezone": "utc",
          "title": "Node Exporter / USE Method / Node",
          "uid": "fac67cfbe174d3ef53eb473d73d9212f"
      }

Overview

This JSON defines a Grafana dashboard named "Node Exporter / USE Method / Node". It contains several panels, each showing a different system performance metric: CPU, memory, network, disk I/O, and disk space usage.

Key configuration parameters

• graphTooltip: controls whether the crosshair and tooltip are shared across panels. 0 (the default) shares nothing, 1 shares the crosshair, and 2 shares both the crosshair and the tooltip; this dashboard uses 1.

• refresh: auto-refreshes the dashboard every 30 seconds.

• schemaVersion: the Grafana dashboard schema version, 39 here.

• tags: labels attached to the dashboard, here node-exporter-mixin, for grouping and search.

• templating: defines the variables used to pick the data source, cluster, and instance dynamically.

• time: the default time range, from one hour ago (from: "now-1h") to now (to: "now"); the timezone field sets the display timezone to UTC.

• title: the dashboard title.

• uid: the dashboard's unique identifier.
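Since we'll be editing dashboard JSON later, here is a sketch of overriding a few of these top-level settings programmatically. `patch_dashboard` is a hypothetical helper; the allowed graphTooltip values follow the meanings listed above, and no validation is done on Grafana's relative time strings.

```python
from __future__ import annotations

import json

# graphTooltip values in the Grafana dashboard JSON model
TOOLTIP_MODES = {0: "no shared crosshair or tooltip", 1: "shared crosshair",
                 2: "shared crosshair and tooltip"}


def patch_dashboard(dashboard: dict, *, refresh: str | None = None,
                    time_from: str | None = None, graph_tooltip: int | None = None) -> dict:
    """Return a copy of `dashboard` with selected top-level settings overridden."""
    patched = json.loads(json.dumps(dashboard))  # deep copy via a JSON round trip
    if refresh is not None:
        patched["refresh"] = refresh
    if time_from is not None:
        patched.setdefault("time", {"to": "now"})["from"] = time_from
    if graph_tooltip is not None:
        if graph_tooltip not in TOOLTIP_MODES:
            raise ValueError(f"graphTooltip must be one of {sorted(TOOLTIP_MODES)}")
        patched["graphTooltip"] = graph_tooltip
    return patched


if __name__ == "__main__":
    dash = {"title": "Node Exporter / USE Method / Node", "refresh": "30s",
            "time": {"from": "now-1h", "to": "now"}, "graphTooltip": 1}
    print(patch_dashboard(dash, refresh="1m", time_from="now-6h"))
```

Returning a copy rather than editing in place makes it easy to diff the original and patched JSON before writing anything back into a ConfigMap.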

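Templating Variables

Templating variables let you select the data source, cluster, and instance dynamically, which makes the dashboard more flexible and reusable.

Defined variables

datasource:

• Type: a data source picker.

• Query: fixed to prometheus, so users can choose among Prometheus data sources.

cluster:

• Data source: the one selected through the ${datasource} variable.

• Query: label_values(node_time_seconds, cluster), which returns all cluster names.

• Hide: a value of 2 hides this variable in the UI.

instance:

• Data source: the one selected through the ${datasource} variable.

• Query: label_values(node_exporter_build_info{job="node-exporter", cluster="$cluster"}, instance), which returns the instance names for the selected cluster.

These variables are referenced as ${datasource} in each panel's data source and as $cluster and $instance in the panels' Prometheus queries, where they filter the data dynamically.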
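The chain between the cluster and instance variables is easiest to see with a small model. The sketch below emulates label_values over a hypothetical in-memory set of series: the cluster variable lists every value of the cluster label, and the instance variable re-runs its query filtered by the selected cluster. This only models the semantics; Grafana actually resolves these queries against the Prometheus HTTP API, and `label_values` here is a hypothetical helper supporting only exact-match filters.

```python
from __future__ import annotations


def label_values(series: list[dict[str, str]], metric: str, label: str,
                 matchers: dict[str, str] | None = None) -> list[str]:
    """Emulate label_values(metric{matchers}, label) over an in-memory series list.

    Each series is a dict of labels with the metric name under "__name__".
    Only exact-match (=) matchers are supported; results are sorted alphabetically.
    """
    matchers = matchers or {}
    values = {
        s[label]
        for s in series
        if s.get("__name__") == metric
        and label in s
        and all(s.get(k) == v for k, v in matchers.items())
    }
    return sorted(values)


if __name__ == "__main__":
    series = [  # hypothetical series
        {"__name__": "node_time_seconds", "cluster": "prod", "instance": "10.0.0.1:9100"},
        {"__name__": "node_time_seconds", "cluster": "staging", "instance": "10.1.0.1:9100"},
        {"__name__": "node_exporter_build_info", "job": "node-exporter",
         "cluster": "prod", "instance": "10.0.0.1:9100"},
        {"__name__": "node_exporter_build_info", "job": "node-exporter",
         "cluster": "prod", "instance": "10.0.0.2:9100"},
        {"__name__": "node_exporter_build_info", "job": "node-exporter",
         "cluster": "staging", "instance": "10.1.0.1:9100"},
    ]
    clusters = label_values(series, "node_time_seconds", "cluster")
    print(clusters)  # ['prod', 'staging']
    # The instance variable depends on the selected cluster
    print(label_values(series, "node_exporter_build_info", "instance",
                       {"job": "node-exporter", "cluster": clusters[0]}))
    # ['10.0.0.1:9100', '10.0.0.2:9100']
```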

Panel Structure (Panels)

The dashboard's panels are grouped into several sections, each organized under a collapsible row. Each section and its panels are explained below.
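To inspect that structure programmatically, here is a sketch that walks the top-level panels array and groups panel titles under their rows. `panels_by_row` is a hypothetical helper; its handling of collapsed rows reflects how Grafana stores them in dashboard JSON, with the children nested in the row's own panels list instead of following the row.

```python
from __future__ import annotations


def panels_by_row(dashboard: dict) -> dict[str, list[str]]:
    """Map each row title to the titles of the panels it groups.

    An expanded row ("collapsed": false) keeps an empty "panels" list and its
    children follow it in the dashboard's top-level "panels" array; a collapsed
    row stores its children inside its own "panels" list. Panels that appear
    before the first row are grouped under the empty string.
    """
    groups: dict[str, list[str]] = {"": []}
    current = ""
    for panel in dashboard.get("panels", []):
        if panel.get("type") == "row":
            current = panel.get("title", "")
            groups.setdefault(current, [])
            # Children of a collapsed row are nested inside the row itself
            groups[current].extend(child.get("title", "") for child in panel.get("panels", []))
        else:
            groups[current].append(panel.get("title", ""))
    if not groups[""]:
        del groups[""]
    return groups


if __name__ == "__main__":
    dash = {"panels": [
        {"type": "row", "title": "CPU", "collapsed": False, "panels": []},
        {"type": "timeseries", "title": "CPU Utilisation"},
        {"type": "timeseries", "title": "CPU Saturation (Load1 per CPU)"},
        {"type": "row", "title": "Disk Space", "collapsed": True,
         "panels": [{"type": "timeseries", "title": "Disk Space Utilisation"}]},
    ]}
    print(panels_by_row(dash))
```

Loaded with json.load, the Node Exporter / USE Method / Node JSON above should yield five rows (CPU, Memory, Network, Disk IO and Disk Space), each with two panels except Disk Space, which has one.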

CPU Monitoring

Title row

{
    "collapsed": false,
    "gridPos": { "h": 1, "w": 24, "x": 0, "y": 0 },
    "id": 1,
    "panels": [],
    "title": "CPU",
    "type": "row"
}

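• Purpose: serves as the heading for the CPU panels and groups them visually.

• Properties:

a. collapsed: false means the row is expanded.

b. gridPos: defines the panel's position and size in the grid. Here the row spans the full 24-column width and is 1 unit high.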

{
    "datasource": { "type": "prometheus", "uid": "${datasource}" },
    "fieldConfig": { ... },
    "gridPos": { "h": 7, "w": 12, "x": 0, "y": 1 },
    "id": 2,
    "options": { ... },
    "pluginVersion": "v11.4.0",
    "targets": [
        {
            "expr": "instance:node_cpu_utilisation:rate5m{job=\"node-exporter\", instance=\"$instance\", cluster=\"$cluster\"} != 0",
            "legendFormat": "Utilisation"
        }
    ],
    "title": "CPU Utilisation",
    "type": "timeseries"
}

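• Purpose: shows CPU utilisation as a time series.

• Main configuration:

a. datasource: uses the Prometheus data source selected by the template variable.

b. expr: the Prometheus query, which returns CPU utilisation computed over a 5-minute rate window.

• Query breakdown:

a. instance:node_cpu_utilisation:rate5m: a recording-rule metric (from the node-exporter mixin's recording rules) that holds the CPU utilisation of each instance.

b. {job="node-exporter", instance="$instance", cluster="$cluster"}: label filters that restrict the series to the selected instance and cluster.

c. != 0: drops samples whose value is 0.

• legendFormat: the legend label, here "Utilisation".

• fieldConfig: field display settings, including fill opacity, whether data points are shown, the stacking mode and the unit (percent).

• options: legend display and tooltip mode.

• type: the panel type is timeseries.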
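The recording rule behind this panel lives in the node-exporter mixin's rules file. In recent versions it is roughly 1 - avg without (cpu) (sum without (mode) (rate(node_cpu_seconds_total{mode=~"idle|iowait|steal"}[5m]))). Older versions counted only idle, so check the rules shipped with your kube-prometheus release. Below is a minimal sketch of that arithmetic on two hypothetical scrapes. It uses a plain rate without counter-reset handling or extrapolation.

```python
# Two hypothetical scrapes of node_cpu_seconds_total taken 300 s apart on a 2-CPU node.
# Keys are (cpu, mode); values are cumulative seconds spent in that mode.
BEFORE = {
    ("0", "idle"): 1000.0, ("0", "iowait"): 10.0, ("0", "steal"): 0.0, ("0", "user"): 500.0,
    ("1", "idle"): 1100.0, ("1", "iowait"): 5.0, ("1", "steal"): 0.0, ("1", "user"): 400.0,
}
AFTER = {
    ("0", "idle"): 1120.0, ("0", "iowait"): 40.0, ("0", "steal"): 0.0, ("0", "user"): 650.0,
    ("1", "idle"): 1340.0, ("1", "iowait"): 5.0, ("1", "steal"): 0.0, ("1", "user"): 460.0,
}
NOT_BUSY_MODES = {"idle", "iowait", "steal"}


def cpu_utilisation(before: dict, after: dict, seconds: float) -> float:
    """1 - average over CPUs of the per-second rate of not-busy (idle/iowait/steal) time."""
    not_busy = {}
    for (cpu, mode), value in after.items():
        if mode in NOT_BUSY_MODES:
            rate = (value - before[(cpu, mode)]) / seconds  # plain rate, ignores counter resets
            not_busy[cpu] = not_busy.get(cpu, 0.0) + rate     # like sum without (mode)
    return 1 - sum(not_busy.values()) / len(not_busy)          # like 1 - avg without (cpu)


print(round(cpu_utilisation(BEFORE, AFTER, 300.0), 3))  # 0.35
```

CPU 0 is not busy half the time and CPU 1 is not busy 80% of the time. The average not-busy share is therefore 0.65, which gives a utilisation of 0.35.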

{
    "targets": [
        {
            "expr": "instance:node_load1_per_cpu:ratio{job=\"node-exporter\", instance=\"$instance\", cluster=\"$cluster\"} != 0",
            "legendFormat": "Saturation"
        }
    ],
    ...
    "title": "CPU Saturation (Load1 per CPU)",
    "type": "timeseries"
}

• Purpose: shows the 1-minute load average divided by the number of CPUs, which gauges CPU saturation.

• Main configuration:

a. expr: queries each instance's 1-minute load per CPU.

b. legendFormat: the legend reads "Saturation".
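The ratio is node_load1 divided by the node's CPU count. The mixin records it as node_load1 / instance:node_num_cpu:sum, but treat that as an assumption to check against your rules. As a rule of thumb, a value above 1 means more tasks want to run than there are CPUs. Linux load also counts tasks blocked in uninterruptible I/O, so disk trouble can push the ratio up as well. Below is a small sketch with hypothetical readings.

```python
def load1_per_cpu(node_load1: float, num_cpus: int) -> float:
    """1-minute load average divided by the number of CPUs."""
    if num_cpus <= 0:
        raise ValueError("num_cpus must be positive")
    return node_load1 / num_cpus


# Hypothetical readings: the same load of 6.0 on an 8-CPU and a 4-CPU node.
for load, cpus in [(6.0, 8), (6.0, 4)]:
    ratio = load1_per_cpu(load, cpus)
    state = "above 1: tasks are queuing" if ratio > 1 else "at or below 1: headroom left"
    print(f"load1={load} cpus={cpus} ratio={ratio:.2f} ({state})")
```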

Memory Monitoring

Row header

{
    "collapsed": false,
    "gridPos": { "h": 1, "w": 24, "x": 0, "y": 8 },
    "id": 4,
    "panels": [],
    "title": "Memory",
    "type": "row"
}

• Purpose: serves as the heading for the memory panels.

{
    "targets": [
        {
            "expr": "instance:node_memory_utilisation:ratio{job=\"node-exporter\", instance=\"$instance\", cluster=\"$cluster\"} != 0",
            "legendFormat": "Utilisation"
        }
    ],
    ...
    "title": "Memory Utilisation",
    "type": "timeseries"
}

• Purpose: shows memory utilisation as a time series.

• Main configuration:

a. expr: queries the memory utilisation ratio.

b. legendFormat: the legend reads "Utilisation".
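The mixin's recording rule is approximately 1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes). When MemAvailable is absent, it falls back to Buffers + Cached + MemFree + Slab. Treat the exact rule as an assumption and check your rules file. MemAvailable first appeared in Linux 3.14, so the fallback only matters on older kernels. Below is a sketch of both paths with hypothetical values.

```python
from typing import Optional

GIB = 1024 ** 3


def memory_utilisation(mem_total: float, mem_available: Optional[float] = None,
                       buffers: float = 0.0, cached: float = 0.0,
                       mem_free: float = 0.0, slab: float = 0.0) -> float:
    """1 - available / total, estimating "available" when MemAvailable is missing."""
    if mem_available is None:
        # Older kernels lack MemAvailable; approximate it from the other fields.
        mem_available = buffers + cached + mem_free + slab
    return 1 - mem_available / mem_total


print(memory_utilisation(16 * GIB, mem_available=4 * GIB))  # 0.75
print(memory_utilisation(16 * GIB, buffers=1 * GIB, cached=3 * GIB,
                         mem_free=2 * GIB, slab=2 * GIB))   # 0.5
```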

{
    "targets": [
        {
            "expr": "instance:node_vmstat_pgmajfault:rate5m{job=\"node-exporter\", instance=\"$instance\", cluster=\"$cluster\"} != 0",
            "legendFormat": "Major page Faults"
        }
    ],
    ...
    "title": "Memory Saturation (Major Page Faults)",
    "type": "timeseries"
}

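• Purpose: tracks the rate of major page faults, which reflects memory saturation. A major fault has to read the page back from disk, so a sustained rise suggests memory pressure.

• Main configuration:

a. expr: queries each instance's major page fault rate.

b. legendFormat: the legend reads "Major page Faults".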
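rate() turns an ever-increasing counter such as node_vmstat_pgmajfault into a per-second value. The simplified sketch below shows the core idea on hypothetical samples, including how a counter reset (here, a reboot) is handled. Prometheus's real rate() also extrapolates to the edges of the window, which this version skips.

```python
def simple_rate(samples: list) -> float:
    """Per-second increase of a counter over (timestamp, value) samples.

    A drop in value is treated as a counter reset (the counter restarted from 0),
    as Prometheus does; extrapolation to the window edges is omitted.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur  # on reset, count from 0
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical node_vmstat_pgmajfault samples every 60 s; the node rebooted before t=180.
samples = [(0, 1000), (60, 1060), (120, 1150), (180, 30), (240, 90)]
print(simple_rate(samples))  # 1.0 fault per second (240 faults over 240 s)
```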

Network Monitoring

Row header

{
    "collapsed": false,
    "gridPos": { "h": 1, "w": 24, "x": 0, "y": 16 },
    "id": 7,
    "panels": [],
    "title": "Network",
    "type": "row"
}

• Purpose: serves as the heading for the network panels.

{
    "targets": [
        {
            "expr": "instance:node_network_receive_bytes_excluding_lo:rate5m{job=\"node-exporter\", instance=\"$instance\", cluster=\"$cluster\"} != 0",
            "legendFormat": "Receive"
        },
        {
            "expr": "instance:node_network_transmit_bytes_excluding_lo:rate5m{job=\"node-exporter\", instance=\"$instance\", cluster=\"$cluster\"} != 0",
            "legendFormat": "Transmit"
        }
    ],
    ...
    "title": "Network Utilisation (Bytes Receive/Transmit)",
    "type": "timeseries",
    "fieldConfig": {
        "overrides": [
            {
                "matcher": { "id": "byRegexp", "options": "/Transmit/" },
                "properties": [
                    { "id": "custom.transform", "value": "negative-Y" }
                ]
            }
        ]
    }
}

• Purpose: shows bytes received and transmitted per second as a time series.

• Main configuration:

a. expr: two queries return the receive and transmit byte rates respectively.

b. legendFormat: the series are labelled "Receive" and "Transmit".

c. fieldConfig.overrides: series whose name matches /Transmit/ are drawn as negative values (negative-Y), so transmit mirrors receive below the axis.
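The override only changes how Grafana draws the series; the query results themselves stay positive. The sketch below mimics that behaviour on hypothetical data. It is an illustration, not Grafana's code. It also assumes that the slashes in "/Transmit/" simply delimit the pattern, JavaScript-style.

```python
import re

# Hypothetical series as the panel receives them: legend name -> bytes/s values.
series = {
    "Receive": [1200.0, 1500.0, 900.0],
    "Transmit": [300.0, 450.0, 600.0],
}

# The override matcher from the panel JSON ("byRegexp", "/Transmit/") without its slashes.
matcher = re.compile("Transmit")


def apply_negative_y(data: dict, pattern: re.Pattern) -> dict:
    """Return display values: series matching the pattern are flipped below the X axis."""
    return {
        name: [-v for v in values] if pattern.search(name) else list(values)
        for name, values in data.items()
    }


display = apply_negative_y(series, matcher)
print(display["Transmit"])  # [-300.0, -450.0, -600.0]
print(series["Transmit"])   # unchanged: [300.0, 450.0, 600.0]
```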

{
    "targets": [
        {
            "expr": "instance:node_network_receive_drop_excluding_lo:rate5m{job=\"node-exporter\", instance=\"$instance\", cluster=\"$cluster\"} != 0",
            "legendFormat": "Receive"
        },
        {
            "expr": "instance:node_network_transmit_drop_excluding_lo:rate5m{job=\"node-exporter\", instance=\"$instance\", cluster=\"$cluster\"} != 0",
            "legendFormat": "Transmit"
        }
    ],
    ...
    "title": "Network Saturation (Drops Receive/Transmit)",
    "type": "timeseries",
    "fieldConfig": {
        "overrides": [
            {
                "matcher": { "id": "byRegexp", "options": "/Transmit/" },
                "properties": [
                    { "id": "custom.transform", "value": "negative-Y" }
                ]
            }
        ]
    }
}

• Purpose: shows received and transmitted packet drops as a time series.

• Main configuration:

a. expr: two queries return the receive and transmit drop rates respectively.

b. legendFormat: the series are labelled "Receive" and "Transmit".

c. fieldConfig.overrides: as above, "Transmit" is drawn as negative values so that it mirrors the receive series.
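The _excluding_lo recording rules add up the per-device rates across every interface except the loopback device lo. The rule is roughly sum without (device) (rate(node_network_receive_drop_total{device!="lo"}[5m])); check your rules file for the exact form. Below is the same aggregation on hypothetical per-device rates.

```python
# Hypothetical per-device drop rates in packets/s, i.e. rate(node_network_receive_drop_total[5m]).
receive_drop_rate = {"lo": 2.0, "eth0": 0.4, "eth1": 0.1, "veth1a2b": 0.05}


def sum_excluding_lo(per_device: dict) -> float:
    """Like sum without (device) (...{device!="lo"}): total over every interface except loopback."""
    return sum(rate for device, rate in per_device.items() if device != "lo")


print(round(sum_excluding_lo(receive_drop_rate), 2))  # 0.55
print(round(sum(receive_drop_rate.values()), 2))      # 2.55 if lo were not excluded
```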

Disk IO Monitoring

Row header

{
    "collapsed": false,
    "gridPos": { "h": 1, "w": 24, "x": 0, "y": 24 },
    "id": 10,
    "panels": [],
    "title": "Disk IO",
    "type": "row"
}

• Purpose: serves as the heading for the disk I/O panels.

{
    "targets": [
        {
            "expr": "instance_device:node_disk_io_time_seconds:rate5m{job=\"node-exporter\", instance=\"$instance\", cluster=\"$cluster\"} != 0",
            "legendFormat": "{{device}}"
        }
    ],
    ...
    "title": "Disk IO Utilisation",
    "type": "timeseries"
}

• Purpose: shows, as a time series, the fraction of time each disk spends doing I/O.

• Main configuration:

a. expr: queries the per-device rate of I/O time.

b. legendFormat: uses the device name ({{device}}) as the legend.
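node_disk_io_time_seconds_total counts the seconds during which the device had at least one I/O in flight. Its rate is therefore the fraction of time the device was busy. Below is a quick sketch with hypothetical counter values. Devices that serve many requests in parallel (SSDs, RAID, network block storage) can report close to 100% busy while still having spare capacity.

```python
def disk_busy_fraction(io_time_before: float, io_time_after: float, seconds: float) -> float:
    """rate(node_disk_io_time_seconds_total): seconds spent doing I/O per wall-clock second."""
    return (io_time_after - io_time_before) / seconds


# Hypothetical counters for two devices over a 300 s window.
for device, (t0, t1) in {"vda": (5000.0, 5240.0), "vdb": (800.0, 830.0)}.items():
    print(device, f"{disk_busy_fraction(t0, t1, 300.0):.0%} busy")  # vda 80% busy, vdb 10% busy
```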

{
    "targets": [
        {
            "expr": "instance_device:node_disk_io_time_weighted_seconds:rate5m{job=\"node-exporter\", instance=\"$instance\", cluster=\"$cluster\"} != 0",
            "legendFormat": "{{device}}"
        }
    ],
    ...
    "title": "Disk IO Saturation",
    "type": "timeseries"
}

• Purpose: tracks weighted I/O time, which reflects I/O saturation.

• Main configuration:

a. expr: queries the per-device rate of weighted I/O time.

b. legendFormat: uses the device name as the legend.
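node_disk_io_time_weighted_seconds_total grows by the number of in-flight I/Os multiplied by the elapsed time. Its rate therefore approximates the average queue depth, comparable to the average queue size column in iostat. The toy model below builds such a counter from hypothetical one-second samples of the in-flight count.

```python
def avg_queue_depth(weighted_before: float, weighted_after: float, seconds: float) -> float:
    """rate(node_disk_io_time_weighted_seconds_total): average number of I/Os in flight."""
    return (weighted_after - weighted_before) / seconds


# Toy model: in-flight I/O count sampled once per second for 10 s.
in_flight = [0, 2, 4, 4, 3, 1, 0, 0, 5, 1]
weighted = 0.0
for count in in_flight:
    weighted += count * 1.0  # in-flight I/Os x one second at that depth

print(avg_queue_depth(0.0, weighted, len(in_flight)))  # 2.0
```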

Disk Space Monitoring

Row header

{
    "collapsed": false,
    "gridPos": { "h": 1, "w": 24, "x": 0, "y": 34 },
    "id": 13,
    "panels": [],
    "title": "Disk Space",
    "type": "row"
}

• Purpose: serves as the heading for the disk space panels.

{
    "targets": [
        {
            "expr": "sort_desc(1 -\n  (\n    max without (mountpoint, fstype) (node_filesystem_avail_bytes{job=\"node-exporter\", fstype!=\"\", instance=\"$instance\", cluster=\"$cluster\"})\n    /\n    max without (mountpoint, fstype) (node_filesystem_size_bytes{job=\"node-exporter\", fstype!=\"\", instance=\"$instance\", cluster=\"$cluster\"})\n  ) != 0\n)\n",
            "legendFormat": "{{device}}"
        }
    ],
    ...
    "title": "Disk Space Utilisation",
    "type": "timeseries"
}

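• Purpose: shows disk space utilisation as a time series.

• Main configuration:

expr: a more involved Prometheus query that computes how much of each filesystem is in use.

1) Query breakdown:

• node_filesystem_avail_bytes: bytes available to unprivileged users (space reserved for root is excluded).

• node_filesystem_size_bytes: total filesystem size in bytes.

• Calculation: 1 - (available / total), i.e. the fraction already used. max without (mountpoint, fstype) collapses series that differ only in mount point or filesystem type (for example, the same device mounted twice) into one series per device.

• fstype!="": skips series that have no filesystem type.

• sort_desc: sorts the results in descending order. This only affects instant queries; a time series panel runs range queries, whose output order is not changed.

• != 0: drops series whose value is 0.

2) legendFormat: uses the device name as the legend.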
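To make the aggregation concrete, here is the same computation in plain Python on a few hypothetical filesystem series. Grouping by device stands in for max without (mountpoint, fstype). In the real query, other labels such as instance and job also remain in each group.

```python
from collections import defaultdict

# Hypothetical filesystem series: (labels, avail_bytes, size_bytes).
FILESYSTEMS = [
    ({"device": "/dev/vda1", "mountpoint": "/", "fstype": "ext4"}, 20e9, 40e9),
    # The same device mounted a second time produces a second series.
    ({"device": "/dev/vda1", "mountpoint": "/var/lib/kubelet", "fstype": "ext4"}, 20e9, 40e9),
    ({"device": "/dev/vdb", "mountpoint": "/data", "fstype": "xfs"}, 10e9, 100e9),
    ({"device": "rootfs", "mountpoint": "/", "fstype": ""}, 1e9, 1e9),               # dropped by fstype!=""
    ({"device": "/dev/vdc", "mountpoint": "/empty", "fstype": "ext4"}, 50e9, 50e9),  # 0% used, dropped by != 0
]


def disk_space_utilisation(filesystems: list) -> list:
    """sort_desc(1 - max by device(avail) / max by device(size)), keeping only non-zero values."""
    avail = defaultdict(float)
    size = defaultdict(float)
    for labels, avail_bytes, size_bytes in filesystems:
        if labels["fstype"] == "":
            continue
        device = labels["device"]  # the label left after dropping mountpoint and fstype
        avail[device] = max(avail[device], avail_bytes)
        size[device] = max(size[device], size_bytes)
    used = {dev: 1 - avail[dev] / size[dev] for dev in size}
    return sorted(((dev, ratio) for dev, ratio in used.items() if ratio != 0),
                  key=lambda item: item[1], reverse=True)


for device, ratio in disk_space_utilisation(FILESYSTEMS):
    print(f"{device}: {ratio:.0%}")  # /dev/vdb: 90%, then /dev/vda1: 50%
```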

Panel Configuration in Detail

All panels share roughly the same structure. The main settings are explained below:

datasource

• Description: the data source the panel uses. Every panel here uses the Prometheus data source selected through the ${datasource} template variable.

• Format:

"datasource": {
    "type": "prometheus",
    "uid": "${datasource}"
}

fieldConfig

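• Description: display settings for fields, made up of defaults plus custom overrides.

• Main configuration:

1) defaults: the default field settings.

• custom.fillOpacity: fill opacity on a scale of 0 to 100; 100 means fully opaque.

• custom.showPoints: whether data points are drawn; "never" hides them.

• custom.stacking.mode: the stacking mode; "normal" stacks the series on top of each other.

• unit: the unit for values, for example percentunit (a 0 to 1 ratio displayed as a percentage) or Bps (bytes per second).

2) overrides: settings applied only to fields that match a condition, for example drawing the "Transmit" series as negative values.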

gridPos

• Description: the panel's position and size on the dashboard grid, which is 24 columns wide.

• Properties:

a. h: height, in grid rows.

b. w: width, in grid columns (24 is full width).

c. x: starting column (0 to 23).

d. y: starting row.
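When you customise a dashboard by hand, two easy mistakes are panels wider than the 24-column grid and panels that overlap. The sketch below checks both using plain rectangle intersection. The first two panels come from this dashboard (the CPU row and "CPU Utilisation"). Panel 99 is hypothetical and deliberately misplaced.

```python
from itertools import combinations

GRID_COLUMNS = 24  # Grafana dashboards use a 24-column grid


def check_layout(panels: list) -> list:
    """Return problems: panels wider than the grid, or pairs of panels that overlap."""
    problems = []
    for p in panels:
        g = p["gridPos"]
        if g["x"] + g["w"] > GRID_COLUMNS:
            problems.append(f"panel {p['id']} exceeds {GRID_COLUMNS} columns")
    for a, b in combinations(panels, 2):
        ga, gb = a["gridPos"], b["gridPos"]
        overlap_x = ga["x"] < gb["x"] + gb["w"] and gb["x"] < ga["x"] + ga["w"]
        overlap_y = ga["y"] < gb["y"] + gb["h"] and gb["y"] < ga["y"] + ga["h"]
        if overlap_x and overlap_y:
            problems.append(f"panels {a['id']} and {b['id']} overlap")
    return problems


panels = [
    {"id": 1, "gridPos": {"h": 1, "w": 24, "x": 0, "y": 0}},    # CPU row
    {"id": 2, "gridPos": {"h": 7, "w": 12, "x": 0, "y": 1}},    # CPU Utilisation
    {"id": 99, "gridPos": {"h": 7, "w": 14, "x": 11, "y": 1}},  # hypothetical, misplaced
]
print(check_layout(panels))  # ['panel 99 exceeds 24 columns', 'panels 2 and 99 overlap']
```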

targets

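• Description: the queries that feed the panel; here they are Prometheus queries.

• Properties:

a. expr: the Prometheus query expression.

b. legendFormat: the legend template that labels each series; for example, {{device}} inserts the value of the device label.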

options

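• Description: display options for the panel.

• Main configuration:

1) legend.showLegend: whether the legend is shown; here it is false, so the legend is hidden.

2) tooltip: how the tooltip is displayed.

a. mode: "multi" shows every series in the tooltip.

b. sort: "desc" sorts the tooltip entries by value in descending order.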

type

• Description: the panel type. Nearly all panels here are timeseries (time series graphs).
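Putting the pieces together, here is a minimal sketch that assembles a timeseries panel dict from the fields described above. It keeps only those keys; a panel exported from Grafana carries many more, so compare against an export before relying on it. The panel id, title and position in the usage example are hypothetical.

```python
import json


def timeseries_panel(panel_id: int, title: str, expr: str, legend: str,
                     grid: tuple, unit: str = "percentunit") -> dict:
    """Assemble a minimal timeseries panel; grid is (x, y, w, h)."""
    x, y, w, h = grid
    return {
        "datasource": {"type": "prometheus", "uid": "${datasource}"},
        "fieldConfig": {
            "defaults": {
                "custom": {"fillOpacity": 100, "showPoints": "never",
                           "stacking": {"mode": "normal"}},
                "unit": unit,
            },
            "overrides": [],
        },
        "gridPos": {"h": h, "w": w, "x": x, "y": y},
        "id": panel_id,
        "options": {"legend": {"showLegend": False},
                    "tooltip": {"mode": "multi", "sort": "desc"}},
        "targets": [{"expr": expr, "legendFormat": legend}],
        "title": title,
        "type": "timeseries",
    }


# Hypothetical panel placed to the right of "CPU Utilisation" (x=12, y=1).
panel = timeseries_panel(
    3, "CPU Saturation (Load1 per CPU)",
    'instance:node_load1_per_cpu:ratio{job="node-exporter", instance="$instance", cluster="$cluster"} != 0',
    "Saturation", grid=(12, 1, 12, 7))
print(json.dumps(panel, indent=2))
```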

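To reinforce this, read through a few more dashboards. Most follow the same pattern and differ only in some parameters. When something is unclear, asking an AI assistant is a quick way to get an answer. Make good use of your tools, or you risk falling behind.

With that foundation, customising dashboards later on will come easily.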

Extensions

I also need to add more dashboards. They come as JSON files, so I embed them directly in YAML manifests.
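A ConfigMap that carries a dashboard is just the JSON model stored under a key ending in .json. The minimal sketch below assumes the kube-prometheus naming pattern grafana-dashboard-<name>, which the StatefulSet later in this article also uses. It reuses that StatefulSet's labels, and the placeholder dashboard is hypothetical.

```python
import json


def dashboard_configmap(name: str, dashboard: dict, namespace: str = "monitoring") -> dict:
    """Wrap a dashboard JSON model in a ConfigMap named grafana-dashboard-<name>."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": f"grafana-dashboard-{name}",
            "namespace": namespace,
            "labels": {"app.kubernetes.io/component": "grafana",
                       "app.kubernetes.io/name": "grafana",
                       "app.kubernetes.io/part-of": "kube-prometheus"},
        },
        # The key becomes the file name inside the mounted directory.
        "data": {f"{name}.json": json.dumps(dashboard, indent=2)},
    }


# Placeholder dashboard; in practice, load the JSON downloaded from grafana.com.
dashboard = {"title": "ArgoCD", "uid": "argocd", "panels": []}
manifest = dashboard_configmap("argocd", dashboard)
# kubectl accepts JSON as well as YAML, so this output can be piped to `kubectl apply -f -`.
print(json.dumps(manifest, indent=2))
```

Large dashboards can exceed the 262144-byte annotation limit that client-side kubectl apply runs into. Using kubectl apply --server-side avoids storing that annotation.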

ArgoCD

ServiceMonitor

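apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: argocd-servicemonitor
  namespace: monitoring
  labels:
    app.kubernetes.io/name: argocd
    app.kubernetes.io/part-of: argocd
spec:
  selector:
    matchExpressions:                # must match the labels on Argo CD's metrics Services
      - key: app.kubernetes.io/name
        operator: In
        values:
          - argocd-metrics           # application controller metrics
          - argocd-server-metrics    # API server metrics
          - argocd-repo-server       # repo server (also exposes a port named metrics)
  namespaceSelector:
    matchNames:
      - argocd  # namespace where Argo CD runs
  endpoints:
    - port: metrics        # name of the Service port Prometheus scrapes
      path: /metrics       # metrics endpoint path
      interval: 30s        # scrape interval
      scrapeTimeout: 10s   # scrape timeout
      tlsConfig:           # only used with scheme: https; Argo CD's metrics ports serve plain HTTP
        insecureSkipVerify: true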

In general:

• A ServiceMonitor scrapes the Pods behind a Service, which it selects through the Service's labels. It suits workloads whose Pods are all exposed through a consistent Service.

• A PodMonitor selects Pods directly by their labels. It suits Pods that have no Service, or groups of Pods whose deployment patterns are not uniform.
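Whether a ServiceMonitor scrapes a Service comes down to two checks. The Service's labels must satisfy the selector, and the Service must expose a port with the name given in endpoints[].port. Below is a sketch of that logic. The Service labels and port names are assumptions about a standard Argo CD install, so confirm them with kubectl get svc -n argocd --show-labels. Under these assumptions, selecting app.kubernetes.io/name: argocd-server finds a Service but no port named metrics, so nothing gets scraped.

```python
def selector_matches(selector: dict, labels: dict) -> bool:
    """Kubernetes LabelSelector semantics for matchLabels and matchExpressions."""
    for key, value in selector.get("matchLabels", {}).items():
        if labels.get(key) != value:
            return False
    for expr in selector.get("matchExpressions", []):
        key, op, values = expr["key"], expr["operator"], expr.get("values", [])
        if op == "In" and labels.get(key) not in values:
            return False
        if op == "NotIn" and key in labels and labels[key] in values:
            return False
        if op == "Exists" and key not in labels:
            return False
        if op == "DoesNotExist" and key in labels:
            return False
    return True


# Assumed Services in a standard Argo CD install: name -> (labels, port names).
SERVICES = {
    "argocd-server": ({"app.kubernetes.io/name": "argocd-server"}, ["http", "https"]),
    "argocd-server-metrics": ({"app.kubernetes.io/name": "argocd-server-metrics"}, ["metrics"]),
    "argocd-metrics": ({"app.kubernetes.io/name": "argocd-metrics"}, ["metrics"]),
}


def scrape_targets(selector: dict, port: str) -> list:
    """Services a ServiceMonitor would scrape: labels match and the named port exists."""
    return sorted(name for name, (labels, ports) in SERVICES.items()
                  if selector_matches(selector, labels) and port in ports)


print(scrape_targets({"matchLabels": {"app.kubernetes.io/name": "argocd-server"}}, "metrics"))  # []
print(scrape_targets({"matchExpressions": [{"key": "app.kubernetes.io/name", "operator": "In",
                                            "values": ["argocd-metrics", "argocd-server-metrics"]}]},
                     "metrics"))  # ['argocd-metrics', 'argocd-server-metrics']
```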

Dashboard JSON file

The file is too large to reproduce here. If you need it, you can download it from this address^[1].

CoreDNS

ServiceMonitor

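Prometheus-Operator (through the kube-prometheus manifests) already ships this ServiceMonitor, so there is nothing to add here. We only need to take care of the dashboard.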

Dashboard JSON file

The file is too large to reproduce here. If you need it, you can download it from this address^[2].

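Modifying the configuration files

We need to set up the ConfigMaps and the controller manifest, mainly so that the newly added dashboards are mounted into Grafana. I demonstrate one here; any further dashboards follow the same pattern: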


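I added two more dashboards (ArgoCD and CoreDNS), so they now need to be mounted into Grafana. Grafana is also deployed as a Deployment by default. Since I need persistent storage, I changed it to a StatefulSet.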

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 11.4.0
  name: grafana
  namespace: monitoring
spec:
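  replicas: 1
  serviceName: grafana  # StatefulSets name a governing Service; kube-prometheus creates one called grafana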
  selector:
    matchLabels:
      app.kubernetes.io/component: grafana
      app.kubernetes.io/name: grafana
      app.kubernetes.io/part-of: kube-prometheus
  template:
    metadata:
      annotations:
        checksum/grafana-config: cb0d6303ddbb694464bde843b0fe874c
        checksum/grafana-dashboardproviders: ca302ceedc58d72663436a77e5e0ea29
        checksum/grafana-datasources: b748e773cdfff19dcfe874d29600675b
      labels:
        app.kubernetes.io/component: grafana
        app.kubernetes.io/name: grafana
        app.kubernetes.io/part-of: kube-prometheus
        app.kubernetes.io/version: 11.4.0
    spec:
      automountServiceAccountToken: false
      containers:
      - env: []
        image: grafana/grafana:11.4.0
        name: grafana
        ports:
        - containerPort: 3000
          name: http
        readinessProbe:
          httpGet:
            path: /api/health
            port: http
        resources:
          limits:
            cpu: 200m
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 100Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop:
            - ALL
          readOnlyRootFilesystem: true
          seccompProfile:
            type: RuntimeDefault
        volumeMounts:
        - mountPath: /var/lib/grafana
          name: grafana-storage
          readOnly: false
        - mountPath: /etc/grafana/provisioning/datasources
          name: grafana-datasources
          readOnly: false
        - mountPath: /etc/grafana/provisioning/dashboards
          name: grafana-dashboards
          readOnly: false
        - mountPath: /tmp
          name: tmp-plugins
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/cluster-total
          name: grafana-dashboard-cluster-total
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/grafana-overview
          name: grafana-dashboard-grafana-overview
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/k8s-resources-cluster
          name: grafana-dashboard-k8s-resources-cluster
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/k8s-resources-namespace
          name: grafana-dashboard-k8s-resources-namespace
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/k8s-resources-node
          name: grafana-dashboard-k8s-resources-node
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/k8s-resources-pod
          name: grafana-dashboard-k8s-resources-pod
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/k8s-resources-workload
          name: grafana-dashboard-k8s-resources-workload
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/k8s-resources-workloads-namespace
          name: grafana-dashboard-k8s-resources-workloads-namespace
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/namespace-by-pod
          name: grafana-dashboard-namespace-by-pod
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/namespace-by-workload
          name: grafana-dashboard-namespace-by-workload
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/prometheus-remote-write
          name: grafana-dashboard-prometheus-remote-write
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/workload-total
          name: grafana-dashboard-workload-total
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/prometheus
          name: grafana-dashboard-prometheus
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/argocd     # mount the newly added dashboards here
          name: grafana-dashboard-argocd
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/coredns
          name: grafana-dashboard-coredns
          readOnly: false
        - mountPath: /etc/grafana
          name: grafana-config
          readOnly: false
      nodeSelector:
        kubernetes.io/os: linux
      securityContext:
        fsGroup: 65534
        runAsGroup: 65534
        runAsNonRoot: true
        runAsUser: 65534
      serviceAccountName: grafana
      volumes:
      - name: grafana-datasources
        secret:
          secretName: grafana-datasources
      - configMap:
          name: grafana-dashboards
        name: grafana-dashboards
      - emptyDir:
          medium: Memory
        name: tmp-plugins
      - configMap:
          name: grafana-dashboard-cluster-total
        name: grafana-dashboard-cluster-total
      - configMap:
          name: grafana-dashboard-grafana-overview
        name: grafana-dashboard-grafana-overview
      - configMap:
          name: grafana-dashboard-k8s-resources-cluster
        name: grafana-dashboard-k8s-resources-cluster
      - configMap:
          name: grafana-dashboard-k8s-resources-namespace
        name: grafana-dashboard-k8s-resources-namespace
      - configMap:
          name: grafana-dashboard-k8s-resources-node
        name: grafana-dashboard-k8s-resources-node
      - configMap:
          name: grafana-dashboard-k8s-resources-pod
        name: grafana-dashboard-k8s-resources-pod
      - configMap:
          name: grafana-dashboard-k8s-resources-workload
        name: grafana-dashboard-k8s-resources-workload
      - configMap:
          name: grafana-dashboard-k8s-resources-workloads-namespace
        name: grafana-dashboard-k8s-resources-workloads-namespace
      - configMap:
          name: grafana-dashboard-namespace-by-pod
        name: grafana-dashboard-namespace-by-pod
      - configMap:
          name: grafana-dashboard-prometheus-remote-write
        name: grafana-dashboard-prometheus-remote-write
      - configMap:
          name: grafana-dashboard-namespace-by-workload
        name: grafana-dashboard-namespace-by-workload
      - configMap:
          name: grafana-dashboard-workload-total
        name: grafana-dashboard-workload-total
      - configMap:
          name: grafana-dashboard-alertmanager-overview
        name: grafana-dashboard-alertmanager-overview
      - configMap:
          name: grafana-dashboard-prometheus
        name: grafana-dashboard-prometheus
      - configMap:                    # volumes for the newly added dashboards; the mounts above refer to these names
          name: grafana-dashboard-argocd
        name: grafana-dashboard-argocd
      - configMap:
          name: grafana-dashboard-coredns
        name: grafana-dashboard-coredns
      - name: grafana-config
        secret:
          secretName: grafana-config
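Every volumeMount must name a volume or a volumeClaimTemplate; otherwise the API server rejects the pod spec. The reverse mistake is silent: a volume that is never mounted is accepted, but its dashboard never shows up. In the manifest above, grafana-dashboard-alertmanager-overview is declared under volumes but has no matching volumeMount. Below is a small checker, sketched on a trimmed excerpt written as a Python dict. Loading the real file would need a YAML parser such as PyYAML, which is not shown.

```python
def unmatched_mounts(pod_spec: dict, claim_templates: list) -> tuple:
    """Return (mounts without a volume, volumes never mounted) for a pod template spec."""
    volumes = {v["name"] for v in pod_spec.get("volumes", [])}
    volumes |= {t["metadata"]["name"] for t in claim_templates}  # PVC templates also provide volumes
    mounted = {m["name"] for c in pod_spec["containers"] for m in c.get("volumeMounts", [])}
    return sorted(mounted - volumes), sorted(volumes - mounted)


# Trimmed excerpt of the StatefulSet above, as a dict.
pod_spec = {
    "containers": [{"name": "grafana", "volumeMounts": [
        {"name": "grafana-storage", "mountPath": "/var/lib/grafana"},
        {"name": "grafana-dashboard-argocd", "mountPath": "/grafana-dashboard-definitions/0/argocd"},
        {"name": "grafana-dashboard-coredns", "mountPath": "/grafana-dashboard-definitions/0/coredns"},
    ]}],
    "volumes": [
        {"name": "grafana-dashboard-argocd", "configMap": {"name": "grafana-dashboard-argocd"}},
        {"name": "grafana-dashboard-coredns", "configMap": {"name": "grafana-dashboard-coredns"}},
        {"name": "grafana-dashboard-alertmanager-overview",
         "configMap": {"name": "grafana-dashboard-alertmanager-overview"}},
    ],
}
claim_templates = [{"metadata": {"name": "grafana-storage"}}]
missing, unused = unmatched_mounts(pod_spec, claim_templates)
print("mounted but not defined:", missing)  # []
print("defined but not mounted:", unused)   # ['grafana-dashboard-alertmanager-overview']
```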

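Next, we need to persist Grafana's data. The volumeClaimTemplates section below goes under the StatefulSet's spec, at the same level as template. It provides the grafana-storage volume that the container mounts at /var/lib/grafana: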

volumeClaimTemplates:
    - metadata:
        name: grafana-storage
      spec:
        storageClassName: alicloud-nas-subpath
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 15Gi
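A StatefulSet names each claim <template name>-<pod name>, and its pods are named <statefulset>-<ordinal>. This template therefore yields a PVC called grafana-storage-grafana-0. By default, those PVCs are kept when the StatefulSet is deleted or scaled down, which is what preserves Grafana's data. Below is a one-function sketch of the naming rule.

```python
def statefulset_pvc_names(template_name: str, statefulset: str, replicas: int) -> list:
    """PVC names a StatefulSet creates: <claim template>-<statefulset>-<ordinal>."""
    return [f"{template_name}-{statefulset}-{i}" for i in range(replicas)]


print(statefulset_pvc_names("grafana-storage", "grafana", 1))  # ['grafana-storage-grafana-0']
```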

I also need to adjust Grafana's configuration file, mainly for a few optimisations and to set the initial admin password:

apiVersion: v1
kind: Secret
metadata:
  labels:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 11.4.0
  name: grafana-config
  namespace: monitoring
stringData:
  grafana.ini: |
    [date_formats]
    default_timezone = UTC

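    [security]
    admin_user = admin
    # Only applied when Grafana first creates its database; with persistent storage,
    # later changes here have no effect. Avoid committing real passwords to Git.
    admin_password = j019e99392129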
type: Opaque
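One option is to generate this Secret at deploy time rather than hard-coding a password in a manifest that ends up in version control. Below is a minimal sketch using only the Python standard library. The output is JSON, which kubectl apply also accepts. The labels are omitted for brevity, and where you store the generated password is up to you.

```python
import configparser
import io
import json
import secrets


def grafana_ini(admin_password: str) -> str:
    """Render the grafana.ini content shown above, with the password supplied at runtime."""
    config = configparser.ConfigParser(interpolation=None)  # no %-interpolation of values
    config["date_formats"] = {"default_timezone": "UTC"}
    config["security"] = {"admin_user": "admin", "admin_password": admin_password}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


password = secrets.token_urlsafe(16)  # random, URL-safe; keep a copy somewhere safe
secret = {
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {"name": "grafana-config", "namespace": "monitoring"},
    "type": "Opaque",
    "stringData": {"grafana.ini": grafana_ini(password)},
}
print(json.dumps(secret, indent=2))
```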

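Since we will later build automated PDF reports with Grafana Reporter, I raised the renderer's concurrency limit right away (the default is 30) by adding the following to grafana.ini in the same Secret:

    [rendering]
    concurrent_render_request_limit = 70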

Conclusion

Some details still need polishing. For example, where a dashboard shows incorrect data, we will have to fix and tune it.

That wraps up our Grafana journey for now.

But this is only the beginning, and a great one.

References

[1] ArgoCD dashboard: https://grafana.com/grafana/dashboards/14584-argocd/
[2] CoreDNS dashboard: https://grafana.com/grafana/dashboards/14981-coredns/

Editor in charge: 武曉燕 | Source: 云原生運維圈
