Monitoring Linux Hosts with Node Exporter (Part 1)

Author: 陽明, 2021-10-25 07:57:45

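Node Exporter is an exporter that exposes metrics for *NIX hosts, such as CPU, memory, and disk information. It is written in Go and has no third-party dependencies, so it can be run as soon as it is downloaded and extracted.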

Installation and Configuration

Because Node Exporter is a single standalone binary, it can be downloaded from the Prometheus download page (https://prometheus.io/download/#node_exporter), extracted, and run directly:

☸ ➜ wget https://github.com/prometheus/node_exporter/releases/download/v1.2.2/node_exporter-1.2.2.linux-amd64.tar.gz 
# From mainland China, the mirror below can be used to speed up the download 
# wget https://download.fastgit.org/prometheus/node_exporter/releases/download/v1.2.2/node_exporter-1.2.2.linux-amd64.tar.gz 
☸ ➜ tar -xvf node_exporter-1.2.2.linux-amd64.tar.gz 
node_exporter-1.2.2.linux-amd64/ 
node_exporter-1.2.2.linux-amd64/LICENSE 
node_exporter-1.2.2.linux-amd64/NOTICE 
node_exporter-1.2.2.linux-amd64/node_exporter 
☸ ➜ cd node_exporter-1.2.2.linux-amd64 && ls -la 
total 18084 
drwxr-xr-x  2 3434 3434       56 Aug  6 21:50 . 
dr-xr-x---. 5 root root     4096 Oct 14 11:50 .. 
-rw-r--r--  1 3434 3434    11357 Aug  6 21:49 LICENSE 
-rwxr-xr-x  1 3434 3434 18494215 Aug  6 21:45 node_exporter 
-rw-r--r--  1 3434 3434      463 Aug  6 21:49 NOTICE

Running the node_exporter binary directly starts it:

☸ ➜ ./node_exporter 
level=info ts=2021-10-14T03:52:31.947Z caller=node_exporter.go:182 msg="Starting node_exporter" version="(version=1.2.2, branch=HEAD, revision=26645363b486e12be40af7ce4fc91e731a33104e)" 
level=info ts=2021-10-14T03:52:31.947Z caller=node_exporter.go:183 msg="Build context" build_context="(go=go1.16.7, user=root@b9cb4aa2eb17, date=20210806-13:44:18)" 
...... 
level=info ts=2021-10-14T03:52:31.948Z caller=node_exporter.go:199 msg="Listening on" address=:9100 
level=info ts=2021-10-14T03:52:31.948Z caller=tls_config.go:191 msg="TLS is disabled." http2=false

The log shows that node_exporter listens on port 9100. Metrics are exposed at the /metrics endpoint by default, so we can fetch them from http://localhost:9100/metrics:

☸ ➜ curl http://localhost:9100/metrics 
...... 
# HELP node_load1 1m load average. 
# TYPE node_load1 gauge 
node_load1 0.01 
# HELP node_load15 15m load average. 
# TYPE node_load15 gauge 
node_load15 0.05 
# HELP node_load5 5m load average. 
# TYPE node_load5 gauge 
node_load5 0.04 
# HELP node_memory_Active_anon_bytes Memory information field Active_anon_bytes. 
# TYPE node_memory_Active_anon_bytes gauge 
node_memory_Active_anon_bytes 8.4393984e+07 
# HELP node_memory_Active_bytes Memory information field Active_bytes. 
# TYPE node_memory_Active_bytes gauge 
node_memory_Active_bytes 1.8167808e+08 
# HELP node_memory_Active_file_bytes Memory information field Active_file_bytes. 
# TYPE node_memory_Active_file_bytes gauge 
node_memory_Active_file_bytes 9.7284096e+07 
# HELP node_memory_AnonHugePages_bytes Memory information field AnonHugePages_bytes. 
# TYPE node_memory_AnonHugePages_bytes gauge 
node_memory_AnonHugePages_bytes 3.5651584e+07 
# HELP node_memory_AnonPages_bytes Memory information field AnonPages_bytes. 
# TYPE node_memory_AnonPages_bytes gauge 
node_memory_AnonPages_bytes 8.159232e+07 
# HELP node_memory_Bounce_bytes Memory information field Bounce_bytes. 
# TYPE node_memory_Bounce_bytes gauge 
node_memory_Bounce_bytes 0 
......
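The output above follows the Prometheus text exposition format: `# HELP` and `# TYPE` comment lines, then one sample per line with the value as the last field. As a rough illustration only, the sketch below parses such lines in Python. The function name `parse_metrics` is made up for this example, and it assumes samples carry no trailing timestamp, which matches the output shown here; a real client would normally rely on Prometheus itself or an existing parser library.

```python
def parse_metrics(text):
    """Parse 'name value' and 'name{labels} value' sample lines into a dict."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        # Assumption: no timestamp, so the value is the last space-separated field.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


# A few lines copied from the curl output and the CPU listing in this article.
SAMPLE = """\
# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.01
# HELP node_memory_Active_bytes Memory information field Active_bytes.
# TYPE node_memory_Active_bytes gauge
node_memory_Active_bytes 1.8167808e+08
node_cpu_seconds_total{cpu="0",mode="idle"} 13172.76
"""

metrics = parse_metrics(SAMPLE)
print(metrics["node_load1"])
print(metrics["node_memory_Active_bytes"] / 1024 / 1024, "MiB active")
```

The series name, including its label set, is used as the dictionary key here purely for simplicity.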

The data returned by this endpoint is in the standard Prometheus metrics format, so we only need to add the endpoint to the Prometheus configuration to scrape it. To see which parameters node_exporter accepts, run ./node_exporter -h:

☸ ➜ ./node_exporter -h 
    --web.listen-address=":9100"  # address to listen on, default port 9100 
    --web.telemetry-path="/metrics"  # path under which metrics are exposed, default /metrics 
    --web.disable-exporter-metrics  # exclude metrics about the exporter itself (go_*, process_*, promhttp_*) 
    --web.max-requests=40     # maximum number of parallel scrape requests, default 40; 0 disables the limit 
    --log.level="info"        # log level: [debug, info, warn, error, fatal] 
    --log.format=logfmt     # log output format: [logfmt, json] 
    --version                 # print the version 
    --collector.{metric-name} # flags for the individual collectors 
    ......

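The most important flags are the --collector.{name} family, which control which collectors are enabled. node_exporter enables a set of collectors by default; a default collector can be turned off with the matching --no-collector.{name} flag. To run only specific collectors, first disable all defaults with --collector.disable-defaults and then enable the ones you want with --collector.{name}.

[Figure: collectors enabled by default]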
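To make the two styles of collector selection concrete, here is a small Python sketch that assembles a node_exporter command line. The helper `node_exporter_args` is hypothetical and written only for this illustration; the flags it emits are the ones described above, and `cpu`, `meminfo`, `loadavg`, and `zfs` are used as example collector names.

```python
import shlex


def node_exporter_args(binary, only=None, disable=None):
    """Build a node_exporter argument list (illustrative helper).

    only:    if given, disable all default collectors and enable just these.
    disable: collectors to switch off while keeping the remaining defaults.
    """
    args = [binary]
    if only is not None:
        args.append("--collector.disable-defaults")
        args.extend(f"--collector.{name}" for name in only)
    for name in disable or []:
        args.append(f"--no-collector.{name}")
    return args


# Run only three collectors.
allowlist = node_exporter_args("./node_exporter", only=["cpu", "meminfo", "loadavg"])
print(shlex.join(allowlist))

# Keep the defaults but turn one collector off.
denylist = node_exporter_args("./node_exporter", disable=["zfs"])
print(shlex.join(denylist))
```

The printed strings can be pasted into a shell or used as the ExecStart line of a service unit.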

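For easier management, node_exporter is often run in a Docker container. Note, however, that it collects metrics for the host, so it needs access to the host system; a container deployment needs extra parameters that let node_exporter access the host's namespaces. When running directly on the host, we can manage it with systemd by creating a service unit file like the following: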

☸ ➜ cat /etc/systemd/system/node_exporter.service 
[Unit] 
Description=node exporter service 
Documentation=https://prometheus.io 
After=network.target 
 
[Service] 
Type=simple 
User=root 
Group=root 
# Append extra flags to ExecStart if needed (systemd does not allow trailing comments on a directive line) 
ExecStart=/usr/local/bin/node_exporter 
Restart=on-failure 
 
[Install] 
WantedBy=multi-user.target

We can then manage node_exporter with systemd:

☸ ➜ cp node_exporter /usr/local/bin/node_exporter 
☸ ➜ systemctl daemon-reload 
☸ ➜ systemctl start node_exporter 
☸ ➜ systemctl status node_exporter 
● node_exporter.service - node exporter service 
   Loaded: loaded (/etc/systemd/system/node_exporter.service; disabled; vendor preset: disabled) 
   Active: active (running) since Thu 2021-10-14 15:29:46 CST; 5s ago 
     Docs: https://prometheus.io 
 Main PID: 18679 (node_exporter) 
    Tasks: 5 
   Memory: 6.5M 
   CGroup: /system.slice/node_exporter.service 
           └─18679 /usr/local/bin/node_exporter 
 
Oct 14 15:29:46 node1 node_exporter[18679]: level=info ts=2021-10-14T07:29:46.137Z caller=node_exporter.go:..._zone 
Oct 14 15:29:46 node1 node_exporter[18679]: level=info ts=2021-10-14T07:29:46.137Z caller=node_exporter.go:...=time 
Oct 14 15:29:46 node1 node_exporter[18679]: level=info ts=2021-10-14T07:29:46.137Z caller=node_exporter.go:...timex 
Oct 14 15:29:46 node1 node_exporter[18679]: level=info ts=2021-10-14T07:29:46.137Z caller=node_exporter.go:...ueues 
Oct 14 15:29:46 node1 node_exporter[18679]: level=info ts=2021-10-14T07:29:46.137Z caller=node_exporter.go:...uname 
Oct 14 15:29:46 node1 node_exporter[18679]: level=info ts=2021-10-14T07:29:46.137Z caller=node_exporter.go:...mstat 
Oct 14 15:29:46 node1 node_exporter[18679]: level=info ts=2021-10-14T07:29:46.137Z caller=node_exporter.go:...r=xfs 
Oct 14 15:29:46 node1 node_exporter[18679]: level=info ts=2021-10-14T07:29:46.137Z caller=node_exporter.go:...r=zfs 
Oct 14 15:29:46 node1 node_exporter[18679]: level=info ts=2021-10-14T07:29:46.137Z caller=node_exporter.go:...:9100 
Oct 14 15:29:46 node1 node_exporter[18679]: level=info ts=2021-10-14T07:29:46.137Z caller=tls_config.go:191...false 
Hint: Some lines were ellipsized, use -l to show in full.

Here we start node_exporter with systemd on two nodes (node1 and node2). Once both are running, we add a node_exporter scrape job to the existing Prometheus configuration, using static configuration, to collect metrics from the two nodes. The configuration file looks like this:

global: 
  scrape_interval: 5s 
 
scrape_configs: 
  - job_name: "prometheus" 
    static_configs: 
      - targets: ["localhost:9090"] 
  - job_name: "demo" 
    scrape_interval: 15s # overrides the global setting 
    scrape_timeout: 10s 
    static_configs: 
      - targets: ["localhost:10000", "localhost:10001", "localhost:10002"] 
  - job_name: "node_exporter" # new node_exporter job 
    static_configs: 
      - targets: ["node1:9100", "node2:9100"] # node1 and node2 are mapped in the hosts file

At the end of the configuration file we added a scrape job named node_exporter whose targets are configured statically. After reloading Prometheus, the node_exporter job should appear on the Targets page of the Prometheus web UI.
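Besides the web UI, Prometheus also reports target status through its HTTP API at /api/v1/targets, where each entry under `data.activeTargets` carries its `labels` and a `health` field. The Python sketch below shows one way to summarise that response for a single job. The helper `target_health` is hypothetical, and the JSON payload is a hand-written, heavily trimmed sample for illustration, not output captured from a real server.

```python
import json


def target_health(payload, job):
    """Map instance -> health ('up', 'down', 'unknown') for one scrape job."""
    result = {}
    for target in payload["data"]["activeTargets"]:
        labels = target["labels"]
        if labels.get("job") == job:
            result[labels["instance"]] = target["health"]
    return result


# Illustrative sample only; a real response contains many more fields.
# A live payload could be fetched from http://localhost:9090/api/v1/targets.
SAMPLE_RESPONSE = json.loads("""
{
  "status": "success",
  "data": {
    "activeTargets": [
      {"labels": {"instance": "localhost:9090", "job": "prometheus"}, "health": "up"},
      {"labels": {"instance": "node1:9100", "job": "node_exporter"}, "health": "up"},
      {"labels": {"instance": "node2:9100", "job": "node_exporter"}, "health": "down"}
    ],
    "droppedTargets": []
  }
}
""")

health = target_health(SAMPLE_RESPONSE, "node_exporter")
for instance, state in sorted(health.items()):
    print(instance, state)
```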

Next, let's look at some commonly used node monitoring metrics, such as CPU, memory, and IO.

CPU Monitoring

The first thing to monitor on a node is the CPU: it is the core that processes tasks, and its state tells us a lot about the current health of the system. CPU monitoring is based on the node_cpu_seconds_total metric, which looks like this on the metrics endpoint:

# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode. 
# TYPE node_cpu_seconds_total counter 
node_cpu_seconds_total{cpu="0",mode="idle"} 13172.76 
node_cpu_seconds_total{cpu="0",mode="iowait"} 0.25 
node_cpu_seconds_total{cpu="0",mode="irq"} 0 
node_cpu_seconds_total{cpu="0",mode="nice"} 0.01 
node_cpu_seconds_total{cpu="0",mode="softirq"} 87.99 
node_cpu_seconds_total{cpu="0",mode="steal"} 0 
node_cpu_seconds_total{cpu="0",mode="system"} 309.38 
node_cpu_seconds_total{cpu="0",mode="user"} 79.93 
node_cpu_seconds_total{cpu="1",mode="idle"} 13168.98 
node_cpu_seconds_total{cpu="1",mode="iowait"} 0.27 
node_cpu_seconds_total{cpu="1",mode="irq"} 0 
node_cpu_seconds_total{cpu="1",mode="nice"} 0 
node_cpu_seconds_total{cpu="1",mode="softirq"} 74.1 
node_cpu_seconds_total{cpu="1",mode="steal"} 0 
node_cpu_seconds_total{cpu="1",mode="system"} 314.71 
node_cpu_seconds_total{cpu="1",mode="user"} 78.83 
node_cpu_seconds_total{cpu="2",mode="idle"} 13182.78 
node_cpu_seconds_total{cpu="2",mode="iowait"} 0.69 
node_cpu_seconds_total{cpu="2",mode="irq"} 0 
node_cpu_seconds_total{cpu="2",mode="nice"} 0 
node_cpu_seconds_total{cpu="2",mode="softirq"} 66.01 
node_cpu_seconds_total{cpu="2",mode="steal"} 0 
node_cpu_seconds_total{cpu="2",mode="system"} 309.09 
node_cpu_seconds_total{cpu="2",mode="user"} 79.44 
node_cpu_seconds_total{cpu="3",mode="idle"} 13185.13 
node_cpu_seconds_total{cpu="3",mode="iowait"} 0.18 
node_cpu_seconds_total{cpu="3",mode="irq"} 0 
node_cpu_seconds_total{cpu="3",mode="nice"} 0 
node_cpu_seconds_total{cpu="3",mode="softirq"} 64.49 
node_cpu_seconds_total{cpu="3",mode="steal"} 0 
node_cpu_seconds_total{cpu="3",mode="system"} 305.86 
node_cpu_seconds_total{cpu="3",mode="user"} 78.17

The description on the endpoint tells us that this metric records the time the CPUs spent in each mode. It is a Counter, so it only ever increases: the value is the accumulated CPU time since the operating system booted. The accumulated time is split into several modes, such as user time, idle time, interrupt time, and system (kernel) time. These are the same figures that the top command reports, and the metric records each mode separately.

An ever-increasing CPU time is not very useful on its own. What we usually want to monitor is the node's CPU usage, that is, the percentage shown by the top command.

To compute CPU usage we first need to be clear about what it means: CPU usage is the time spent in every CPU state except idle, divided by the total CPU time. With that definition in mind, we can write the correct PromQL query.
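The definition can be checked by hand on two snapshots of the per-mode counters. In the Python sketch below, the `before` values are the cpu="0" samples listed earlier in this article, while the `after` values are invented for illustration (as if taken 60 seconds later with 57 seconds spent idle). The function name `cpu_usage_percent` is likewise just an illustrative choice.

```python
def cpu_usage_percent(before, after):
    """CPU usage between two snapshots of cumulative per-mode seconds."""
    deltas = {mode: after[mode] - before[mode] for mode in before}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    # Usage = non-idle time / total time = 1 - idle time / total time.
    return (1 - deltas["idle"] / total) * 100


# cpu="0" values from the metrics listing above.
before = {"idle": 13172.76, "iowait": 0.25, "irq": 0.0, "nice": 0.01,
          "softirq": 87.99, "steal": 0.0, "system": 309.38, "user": 79.93}

# Hypothetical snapshot 60 s later: 57 s idle, 1.5 s system, 1.5 s user.
after = dict(before, idle=13229.76, system=310.88, user=81.43)

usage = cpu_usage_percent(before, after)
print(f"{usage:.1f}% CPU usage")
```

With these made-up deltas, 3 of the 60 elapsed seconds are non-idle, so the result is about 5%.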

Rather than summing the CPU time of every non-idle state, it is simpler to compute the fraction of time spent idle and subtract it from 1. So we first filter on idle mode by entering node_cpu_seconds_total{mode="idle"} in the Prometheus web UI.

To compute usage we need to know how much time the CPUs spent in idle mode and compare it with the total. Because this is a Counter, we can use the increase function to get the change over a window: increase(node_cpu_seconds_total{mode="idle"}[1m]). increase takes a range vector as input, so here we use the last 1 minute of data.

The result contains one series per CPU index. We want the time across all CPUs, so we aggregate them; and since we want the CPU usage of each node, we aggregate by the instance label: sum(increase(node_cpu_seconds_total{mode="idle"}[1m])) by (instance).

This gives us the idle CPU time of each node over the last minute. We then compare it with the total CPU time (this time without filtering on mode): sum(increase(node_cpu_seconds_total{mode="idle"}[1m])) by (instance) / sum(increase(node_cpu_seconds_total[1m])) by (instance).
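The `sum(...) by (instance)` step and the division can be mimicked in a few lines of Python, which may help when reasoning about what the query returns. The per-series increases below are invented numbers for two nodes, and `sum_by` is a hypothetical helper that only imitates the aggregation; it is not how Prometheus is implemented.

```python
from collections import defaultdict


def sum_by(samples, label):
    """Sum sample values grouped by one label, like sum(...) by (label)."""
    totals = defaultdict(float)
    for labels, value in samples:
        totals[labels[label]] += value
    return dict(totals)


# Hypothetical 1-minute increases of node_cpu_seconds_total per series.
increases = [
    ({"instance": "node1:9100", "cpu": "0", "mode": "idle"}, 57.0),
    ({"instance": "node1:9100", "cpu": "0", "mode": "user"}, 3.0),
    ({"instance": "node1:9100", "cpu": "1", "mode": "idle"}, 54.0),
    ({"instance": "node1:9100", "cpu": "1", "mode": "system"}, 6.0),
    ({"instance": "node2:9100", "cpu": "0", "mode": "idle"}, 30.0),
    ({"instance": "node2:9100", "cpu": "0", "mode": "user"}, 30.0),
]

# Numerator: idle series only. Denominator: every mode.
idle = sum_by([s for s in increases if s[0]["mode"] == "idle"], "instance")
total = sum_by(increases, "instance")

# Prometheus divides series whose label sets match; here that is the instance.
idle_ratio = {instance: idle[instance] / total[instance] for instance in total}
for instance, ratio in sorted(idle_ratio.items()):
    print(instance, ratio)
```

Both sides of the division are grouped by the same label, which is why the two aggregated vectors line up one series per node.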

CPU usage then follows directly: subtract the ratio from 1 and multiply by 100: (1 - sum(increase(node_cpu_seconds_total{mode="idle"}[1m])) by (instance) / sum(increase(node_cpu_seconds_total[1m])) by (instance) ) * 100. This is the most direct CPU usage query we can write. However, as mentioned earlier when covering PromQL syntax, the rate function is used more often than increase, so the final CPU usage query is: (1 - sum(rate(node_cpu_seconds_total{mode="idle"}[1m])) by (instance) / sum(rate(node_cpu_seconds_total[1m])) by (instance) ) * 100.
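Switching between increase and rate does not change the result of this particular query, because rate is the per-second average over the window and the window length cancels out in the ratio. The Python sketch below illustrates this with simplified stand-ins for the two functions. They ignore counter resets and the extrapolation Prometheus applies at window boundaries, and the counter values are made up.

```python
WINDOW_SECONDS = 60  # corresponds to the [1m] range selector


def simple_increase(first, last):
    """Simplified increase(): growth of a counter across the window."""
    return last - first


def simple_rate(first, last, window=WINDOW_SECONDS):
    """Simplified rate(): per-second average growth across the window."""
    return simple_increase(first, last) / window


# Hypothetical counter readings at the start and end of the window.
idle_first, idle_last = 1000.0, 1057.0
total_first, total_last = 2000.0, 2060.0

usage_from_increase = (
    1 - simple_increase(idle_first, idle_last)
    / simple_increase(total_first, total_last)
) * 100

usage_from_rate = (
    1 - simple_rate(idle_first, idle_last)
    / simple_rate(total_first, total_last)
) * 100

print(usage_from_increase, usage_from_rate)
```

Both expressions come out at roughly 5% for these numbers; rate is generally preferred because its per-second unit stays comparable when the window length changes.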

Comparing the result with the output of the top command (the original screenshot shows node2), the two are basically consistent. This is how to monitor a node's CPU usage.

Editor: 姜華. Source: k8s技術圈.
