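Cloud Monitoring: Nagios Installation Steps
Preface
I have recently been looking into cloud monitoring tools. Having already written up the installation steps for Ganglia, this time I am recording the installation steps for Nagios.
This article does not explain the underlying principles; please consult other material for those.
Goal: even if you have never used Nagios before, you should be able to build your own Nagios monitoring cluster by following the steps here.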
@Author duangr
@Website http://my.oschina.net/duangr/blog/183160
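1. Introduction to Nagios
Nagios is an open-source monitoring system that runs on Linux/Unix platforms and can monitor system status and network information. It watches the local or remote hosts and services you specify and provides alerting: when a system or service enters an abnormal state, it sends an email or SMS alert so that operations staff are notified immediately, and when the state recovers it sends a recovery notification by email or SMS.
2. Environment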
Host Name | IP | OS | Arch |
duangr-1 | 192.168.56.10 | CentOS 6.4 | x86_64 |
duangr-2 | 192.168.56.11 | CentOS 6.4 | x86_64 |
duangr-3 | 192.168.56.12 | CentOS 6.4 | x86_64 |
3. Deployment Plan
The Nagios master node needs the following installed:
- nagios
- nagios-plugin
- nrpe
- php
- apache
The Nagios slave (monitored) nodes need the following installed:
- nagios-plugin
- nrpe
Installation paths
Item | Value |
Nagios install path | /usr/local/nagios |
PHP install path | /usr/local/php |
Apache install path | /usr/local/apache2 |
4. Getting the Source
- nagios-4.0.2.tar.gz
- nagios-plugins-1.5.tar.gz
- nrpe-2.15.tar.gz
- httpd-2.2.23.tar.gz
- php-5.4.10.tar.gz
5. Prerequisites
5.1 Host environment check (all nodes)
- # rpm -q gcc glibc glibc-common gd gd-devel xinetd openssl-devel
- gcc-4.4.7-3.el6.x86_64
- glibc-2.14.1-6.x86_64
- glibc-common-2.14.1-6.x86_64
- gd-2.0.35-11.el6.x86_64
- package gd-devel is not installed
- package xinetd is not installed
- openssl-devel-1.0.0-27.el6.x86_64
If any packages are missing, install them first. The packages can be downloaded from the following mirror sites:
- http://rpm.pbone.net/
- http://mirrors.163.com/centos/6.4/os/x86_64/Packages/
- http://mirrors.sohu.com/centos/6.4/os/x86_64/Packages/
After installing, check again:
- # rpm -q gcc glibc glibc-common gd gd-devel xinetd openssl-devel
- gcc-4.4.7-3.el6.x86_64
- glibc-2.14.1-6.x86_64
- glibc-common-2.14.1-6.x86_64
- gd-2.0.35-11.el6.x86_64
- gd-devel-2.0.35-11.el6.x86_64
- xinetd-2.3.14-38.el6.x86_64
- openssl-devel-1.0.0-27.el6.x86_64
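The dependency check above can be scripted. The following is a minimal sketch, assuming `rpm -q` prints its messages in English as in the output shown; the function name `missing_packages` is made up for this example.

```shell
#!/bin/sh
# Print the names of packages that `rpm -q` reports as not installed.
# Reads `rpm -q` output on stdin; lines for installed packages are ignored.
missing_packages() {
    sed -n 's/^package \(.*\) is not installed$/\1/p'
}

# Hypothetical usage on a node where yum is configured (an assumption,
# the article itself downloads the RPMs from mirror sites):
#   rpm -q gcc glibc glibc-common gd gd-devel xinetd openssl-devel \
#       | missing_packages | xargs -r yum install -y
```

With the first output listing above as input, the function would print gd-devel and xinetd, one per line.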
6. Build and Install
6.1 Create the nagios user (all nodes)
- useradd nagios -d /usr/local/nagios
- passwd nagios (choose your own password)
6.2 Install the Nagios core (master node only)
- tar -zxf nagios-4.0.2.tar.gz
- cd nagios-4.0.2
- ./configure --prefix=/usr/local/nagios
- make all
- make install && make install-init && make install-commandmode && make install-config
Register nagios as a service
- chkconfig --add nagios
- chkconfig nagios off
- chkconfig --level 35 nagios on
- chkconfig --list nagios
- nagios 0:off 1:off 2:off 3:on 4:off 5:on 6:off
6.3 Install the Nagios plugins (all nodes)
- tar -zxf nagios-plugins-1.5.tar.gz
- cd nagios-plugins-1.5
- ./configure --prefix=/usr/local/nagios --with-nagios-user=nagios --with-nagios-group=nagios
- make && make install
If MySQL-related compile errors appear, the cause is that MySQL was installed in a non-default location. Point --with-mysql at the actual location and build again:
- ./configure --prefix=/usr/local/nagios --with-mysql=/usr/local/mysql
- make && make install
6.4 Install NRPE (all nodes)
- tar -zxf nrpe-2.15.tar.gz
- cd nrpe-2.15
- ./configure --enable-command-args
- make all
- make install-plugin
The following step is needed only on the monitored nodes
- make install-daemon && make install-daemon-config && make install-xinetd
6.4.1 Monitored node configuration
On a monitored node, NRPE must be configured to run as a daemon (here it is run through xinetd).
1. Edit /etc/xinetd.d/nrpe so that the Nagios master node is allowed to connect
- vi /etc/xinetd.d/nrpe
- only_from = 127.0.0.1 192.168.56.10
2. Append the following to the end of /etc/services:
- nrpe 5666/tcp # NRPE
3. Enable support for command arguments
- vi /usr/local/nagios/etc/nrpe.cfg
- dont_blame_nrpe=1
4. Start xinetd
- service xinetd restart
5. Verify that nrpe is listening
- netstat -at | grep nrpe
6. Test that nrpe is working
- /usr/local/nagios/libexec/check_nrpe -H localhost
- NRPE v2.15
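Passing arguments through NRPE needs two things: the --enable-command-args build flag used in 6.4 and the dont_blame_nrpe=1 setting from step 3. The second can be checked with a small helper. This is only a sketch; the function name `nrpe_args_enabled` is invented for illustration.

```shell
#!/bin/sh
# Succeed if the nrpe.cfg text on stdin enables command arguments,
# i.e. contains an uncommented `dont_blame_nrpe=1` line.
nrpe_args_enabled() {
    grep -Eq '^[[:space:]]*dont_blame_nrpe[[:space:]]*=[[:space:]]*1[[:space:]]*$'
}

# Usage (path taken from step 3 above):
#   nrpe_args_enabled < /usr/local/nagios/etc/nrpe.cfg && echo "arguments enabled"
```

The pattern is anchored at the start of the line so that a commented-out `#dont_blame_nrpe=1` does not count as enabled.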
6.4.2 Master node configuration
On the monitoring master node, once NRPE has been configured on every monitored node, test each node in turn
- /usr/local/nagios/libexec/check_nrpe -H 192.168.56.11
- NRPE v2.15
- /usr/local/nagios/libexec/check_nrpe -H 192.168.56.12
- NRPE v2.15
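With more than a couple of monitored nodes, the per-host test above is easier to run as a loop. A minimal sketch follows; the function name `check_nrpe_hosts` and the `CHECK_NRPE` variable are assumptions made for this example, while the check_nrpe path is the one used in this article.

```shell
#!/bin/sh
# Path to check_nrpe as installed in this article; override if yours differs.
CHECK_NRPE=${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}

# Run check_nrpe against every host given as an argument and print one
# line per host. Returns non-zero if any host did not answer successfully.
check_nrpe_hosts() {
    failed=0
    for host in "$@"; do
        if reply=$("$CHECK_NRPE" -H "$host" 2>&1); then
            echo "$host: $reply"
        else
            echo "$host: FAILED ($reply)"
            failed=1
        fi
    done
    return "$failed"
}

# Usage on the master node:
#   check_nrpe_hosts 192.168.56.11 192.168.56.12
```

The function relies only on the exit status of check_nrpe, so it keeps working if the version string in the reply changes.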
6.5 Install Apache (master node only)
- tar -zxf httpd-2.2.23.tar.gz
- cd httpd-2.2.23
- ./configure --prefix=/usr/local/apache2
- make && make install
6.6 Install PHP (master node only)
- cd /export/home/tools/soft/php
- tar -zxf php-5.4.10.tar.gz
- cd php-5.4.10
- ./configure --prefix=/usr/local/php --with-apxs2=/usr/local/apache2/bin/apxs
- make && make install
6.7 Serve the PHP web interface with Apache
vi /usr/local/apache2/conf/httpd.conf
- ....
- Listen 80
- ....
- <IfModule dir_module>
- DirectoryIndex index.html index.php
- AddType application/x-httpd-php .php
- </IfModule>
- ....
- #setting for nagios
- ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"
- <Directory "/usr/local/nagios/sbin">
- AuthType Basic
- Options ExecCGI
- AllowOverride None
- Order allow,deny
- Allow from all
- AuthName "Nagios Access"
- AuthUserFile /usr/local/nagios/etc/htpasswd
- Require valid-user
- </Directory>
- Alias /nagios "/usr/local/nagios/share"
- <Directory "/usr/local/nagios/share">
- AuthType Basic
- Options None
- AllowOverride None
- Order allow,deny
- Allow from all
- AuthName "nagios Access"
- AuthUserFile /usr/local/nagios/etc/htpasswd
- Require valid-user
- </Directory>
Add a user name and password for web access (the user name here is admin; choose your own if you prefer)
- /usr/local/apache2/bin/htpasswd -c /usr/local/nagios/etc/htpasswd admin
Start Apache
- /usr/local/apache2/bin/apachectl start
Open the page: http://192.168.56.10/nagios/
7. Configuring Nagios
7.1 Configure the remote monitored nodes
7.1.1 Edit the configuration file
- # su - nagios
- $ vi /usr/local/nagios/etc/nrpe.cfg
Change the command definitions to the following:
- command[check_users]=/usr/local/nagios/libexec/check_users -w $ARG1$ -c $ARG2$
- command[check_load]=/usr/local/nagios/libexec/check_load -w $ARG1$ -c $ARG2$
- command[check_disk]=/usr/local/nagios/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
- command[check_procs]=/usr/local/nagios/libexec/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
- command[check_procs_args]=/usr/local/nagios/libexec/check_procs $ARG1$
- command[check_swap]=/usr/local/nagios/libexec/check_swap -w $ARG1$ -c $ARG2$
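What the monitoring commands above do:
- check_users monitors the number of logged-in users
- check_load monitors the CPU load
- check_disk monitors disk usage
- check_procs monitors the number of processes; the state filter accepts the flags R, S, Z, D and T
- check_swap monitors swap usage
7.1.2 Restart the xinetd service
After configuring the commands above, restart the xinetd service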
- service xinetd restart
7.1.3 Verify the configuration
Check that the monitoring commands are configured correctly
- /usr/local/nagios/libexec/check_nrpe -H localhost -c check_users -a 5 10
- /usr/local/nagios/libexec/check_nrpe -H localhost -c check_load -a 15,10,5 30,25,20
- /usr/local/nagios/libexec/check_nrpe -H localhost -c check_disk -a 20% 10% /
- /usr/local/nagios/libexec/check_nrpe -H localhost -c check_procs -a 200 400 RSZDT
- /usr/local/nagios/libexec/check_nrpe -H localhost -c check_swap -a 20% 10%
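Each of these checks reports its result through the exit code as well as the printed text: Nagios plugins conventionally return 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. The helper below is a small sketch that turns the code into a name; `plugin_status` is a name chosen for this example.

```shell
#!/bin/sh
# Map a Nagios plugin exit code to its conventional status name.
plugin_status() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        3) echo UNKNOWN ;;
        *) echo "INVALID($1)" ;;   # outside the range plugins are expected to return
    esac
}

# Usage on a monitored node:
#   /usr/local/nagios/libexec/check_nrpe -H localhost -c check_users -a 5 10
#   plugin_status $?
```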
7.2 Configure the monitoring master node
7.2.1 cgi.cfg (configuration file that controls CGI access)
(as the nagios user)
vi /usr/local/nagios/etc/cgi.cfg
Change the following settings to grant permissions to the admin user:
- default_user_name=admin
- authorized_for_system_information=nagiosadmin,admin
- authorized_for_configuration_information=nagiosadmin,admin
- authorized_for_system_commands=nagiosadmin,admin
- authorized_for_all_services=nagiosadmin,admin
- authorized_for_all_hosts=nagiosadmin,admin
- authorized_for_all_service_commands=nagiosadmin,admin
- authorized_for_all_host_commands=nagiosadmin,admin
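Instead of editing each authorized_for_* line by hand, the user can be appended with a filter. This is a sketch under the assumption that the lines have the plain `key=value,value` form shown above; `grant_cgi_user` is a made-up name, and the default_user_name line still has to be set separately.

```shell
#!/bin/sh
# Read cgi.cfg on stdin and append user $1 to every uncommented
# authorized_for_* line that does not already list that user.
grant_cgi_user() {
    awk -v user="$1" -F= '
        /^authorized_for_/ {
            n = split($2, names, ",")
            for (i = 1; i <= n; i++)
                if (names[i] == user) { print; next }   # already present
            print $0 "," user
            next
        }
        { print }
    '
}

# Usage (writes a new file rather than editing in place):
#   grant_cgi_user admin < /usr/local/nagios/etc/cgi.cfg > /tmp/cgi.cfg.new
```

Running the filter twice gives the same result as running it once, because users that are already listed are skipped.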
7.2.2 nagios.cfg (the main Nagios configuration file)
(as the nagios user)
vi /usr/local/nagios/etc/nagios.cfg
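- #cfg_file=/usr/local/nagios/etc/objects/localhost.cfg (comment this line out)
- cfg_dir=/usr/local/nagios/etc/servers
The main configuration file now names the servers directory as the location of the monitoring definitions. The directory does not exist by default and must be created by hand.
Nagios reads every file with a .cfg suffix under the servers directory as a configuration file.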
- cd /usr/local/nagios/etc
- mkdir servers
- cd servers
7.2.3 Define the monitored host group
Declare a host group and add all three hosts listed in the environment section to it
vi /usr/local/nagios/etc/servers/group.cfg
This is a new file with the following content:
- define hostgroup{
- hostgroup_name duangr-server
- alias duangr Server
- members duangr-1,duangr-2,duangr-3
- }
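Notes on the configuration above:
- hostgroup_name: name of the host group; any name may be used
- alias: alias of the host group; any text may be used
- members: members of the host group; separate multiple host names with commas. Each host name must match the host_name of a define host block.
The host definitions are covered next.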
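If the member list changes often, the hostgroup block can be generated rather than typed. A minimal sketch, with `render_hostgroup` as a hypothetical helper name:

```shell
#!/bin/sh
# Print a hostgroup definition.
# $1 = hostgroup_name, $2 = alias, remaining arguments = member host names.
render_hostgroup() {
    name=$1
    alias_text=$2
    shift 2
    members=$(IFS=,; echo "$*")   # join the remaining arguments with commas
    printf 'define hostgroup{\n'
    printf '    hostgroup_name %s\n' "$name"
    printf '    alias %s\n' "$alias_text"
    printf '    members %s\n' "$members"
    printf '}\n'
}

# Usage:
#   render_hostgroup duangr-server "duangr Server" duangr-1 duangr-2 duangr-3 \
#       > /usr/local/nagios/etc/servers/group.cfg
```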
7.2.4 Define the monitored hosts
Now define the individual hosts
7.2.4.1 Local host monitoring
First define the local host duangr-1
vi /usr/local/nagios/etc/servers/duangr-1.cfg
This is a new file with the following content:
- define host{
- use linux-server
- host_name duangr-1
- alias duangr-1
- address 192.168.56.10
- }
- define service{
- use local-service
- host_name duangr-1
- service_description Host Alive
- check_command check-host-alive
- }
- define service{
- use local-service
- host_name duangr-1
- service_description Users
- check_command check_local_users!20!50
- }
- define service{
- use local-service
- host_name duangr-1
- service_description CPU
- check_command check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
- }
- define service{
- use local-service
- host_name duangr-1
- service_description Disk Root
- check_command check_local_disk!20%!10%!/
- }
- define service{
- use local-service
- host_name duangr-1
- service_description Disk Home
- check_command check_local_disk!20%!10%!/export/home
- }
- define service{
- use local-service
- host_name duangr-1
- service_description Zombie Procs
- check_command check_local_procs!5!10!Z
- }
- define service{
- use local-service
- host_name duangr-1
- service_description Total Procs
- check_command check_local_procs!250!400!RSZDT
- }
- define service{
- use local-service
- host_name duangr-1
- service_description Swap Usage
- check_command check_local_swap!20!10
- }
Note: because this host is also the one running the monitoring master node, it can be monitored with the check_local_* commands.
The file already includes the commonly used checks.
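The service blocks in this file differ only in their description and check command, so they can be produced from a short list. The sketch below assumes a `description|check_command` input format invented for this example; `render_services` is likewise a hypothetical name.

```shell
#!/bin/sh
# Read "description|check_command" lines on stdin and print one
# service definition per line for the host named in $1.
render_services() {
    host=$1
    while IFS='|' read -r description check_command; do
        [ -n "$description" ] || continue   # skip blank lines
        printf 'define service{\n'
        printf '    use local-service\n'
        printf '    host_name %s\n' "$host"
        printf '    service_description %s\n' "$description"
        printf '    check_command %s\n' "$check_command"
        printf '}\n'
    done
}

# Usage:
#   printf '%s\n' 'Users|check_local_users!20!50' 'Disk Root|check_local_disk!20%!10%!/' \
#       | render_services duangr-1
```

The host definition itself is not generated here and still has to be written as shown above.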
7.2.4.2 Remote host monitoring
Next define the remote hosts duangr-2 and duangr-3
Before defining the checks for remote hosts, the check_nrpe commands must be defined
vi /usr/local/nagios/etc/objects/commands.cfg
Append the following to the very end of the file:
- # 'check_nrpe' command definition
- define command{
- command_name check_nrpe
- command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$
- }
- define command{
- command_name check_nrpe_args
- command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$ -a $ARG2$
- }
Define the monitoring configuration for host duangr-2
$ vi /usr/local/nagios/etc/servers/duangr-2.cfg
This is a new file with the following content:
- define host{
- use linux-server
- host_name duangr-2
- alias duangr-2
- address 192.168.56.11
- }
- define service{
- use local-service
- host_name duangr-2
- service_description Host Alive
- check_command check-host-alive
- }
- define service{
- use local-service
- host_name duangr-2
- service_description Users
- check_command check_nrpe_args!check_users!5 10
- }
- define service{
- use local-service
- host_name duangr-2
- service_description CPU
- check_command check_nrpe_args!check_load!15,10,5 30,25,20
- }
- define service{
- use local-service
- host_name duangr-2
- service_description Disk Root
- check_command check_nrpe_args!check_disk!20% 10% /
- }
- define service{
- use local-service
- host_name duangr-2
- service_description Disk /export/home
- check_command check_nrpe_args!check_disk!20% 10% /export/home
- }
- define service{
- use local-service
- host_name duangr-2
- service_description Procs Zombie
- check_command check_nrpe_args!check_procs!5 10 Z
- }
- define service{
- use local-service
- host_name duangr-2
- service_description Procs Total
- check_command check_nrpe_args!check_procs_args!"-w400 -c600"
- }
- define service{
- use local-service
- host_name duangr-2
- service_description Swap Usage
- check_command check_nrpe_args!check_swap!20% 10%
- }
- ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
- ;; Checks for some common processes follow, mainly cloud platform processes
- ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
- ;; Monitor the crond process
- define service{
- use local-service
- host_name duangr-2
- service_description PS: crond
- check_command check_nrpe_args!check_procs_args!"-c1:1 -Ccrond"
- }
- ;; Monitor the ZooKeeper process
- define service{
- use local-service
- host_name duangr-2
- service_description PS: QuorumPeerMain
- check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.quorum.QuorumPeerMain"
- }
- ;; Monitor the Storm slave node process
- define service{
- use local-service
- host_name duangr-2
- service_description PS: supervisor
- check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -adaemon.supervisor"
- }
- ;; Monitor the Storm master node process
- define service{
- use local-service
- host_name duangr-2
- service_description PS: nimbus
- check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -adaemon.nimbus"
- }
- ;; Monitor the MetaQ process
- define service{
- use local-service
- host_name duangr-2
- service_description PS: MetaQ
- check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -ametamorphosis-server-w"
- }
- ;; Monitor the Redis process
- define service{
- use local-service
- host_name duangr-2
- service_description PS: redis-server
- check_command check_nrpe_args!check_procs_args!"-c1:1 -Credis-server"
- }
- ;; Monitor the Hadoop master node NameNode process
- define service{
- use local-service
- host_name duangr-2
- service_description PS: NameNode
- check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.namenode.NameNode"
- }
- ;; Monitor the Hadoop master node SecondaryNameNode process
- define service{
- use local-service
- host_name duangr-2
- service_description PS: SecondaryNameNode
- check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.namenode.SecondaryNameNode"
- }
- ;; Monitor the Hadoop master node ResourceManager process
- define service{
- use local-service
- host_name duangr-2
- service_description PS: ResourceManager
- check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.resourcemanager.ResourceManager"
- }
- ;; Monitor the Hadoop slave node DataNode process
- define service{
- use local-service
- host_name duangr-2
- service_description PS: DataNode
- check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.datanode.DataNode"
- }
- ;; Monitor the Hadoop slave node NodeManager process
- define service{
- use local-service
- host_name duangr-2
- service_description PS: NodeManager
- check_command check_nrpe_args!check_procs_args!"-c1:1 -Cjava -aserver.nodemanager.NodeManager"
- }
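Note: because duangr-2 is a remote host, it is monitored through the check_nrpe_args command.
The file already includes the commonly used checks, plus process checks for Hadoop, Storm, ZooKeeper, MetaQ and Redis. The approach for those is simply to test whether the process exists.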
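It helps to see what a check_command such as check_nrpe_args!check_users!5 10 turns into. Nagios splits the value on "!": the first part names the command, the following parts become $ARG1$ and $ARG2$, and $HOSTADDRESS$ is the address from the host definition. The sketch below imitates that substitution for the check_nrpe_args definition in commands.cfg. It illustrates the idea and is not Nagios's own code; `expand_check_nrpe_args` and `PLUGIN_DIR` are names made up for the example, and the assumption is that $USER1$ points at /usr/local/nagios/libexec.

```shell
#!/bin/sh
# Directory that $USER1$ is assumed to point to in this installation.
PLUGIN_DIR=${PLUGIN_DIR:-/usr/local/nagios/libexec}

# Print the command line that "check_nrpe_args!<command>!<arguments>"
# would produce. $1 = host address, $2 = the check_command value.
expand_check_nrpe_args() {
    address=$1
    rest=${2#*!}        # drop the leading "check_nrpe_args!"
    arg1=${rest%%!*}    # text before the next "!" plays the role of $ARG1$
    arg2=${rest#*!}     # text after it plays the role of $ARG2$
    echo "$PLUGIN_DIR/check_nrpe -H $address -t 30 -c $arg1 -a $arg2"
}

# Usage:
#   expand_check_nrpe_args 192.168.56.11 'check_nrpe_args!check_users!5 10'
```

For the process checks, the quotes around values such as "-c1:1 -Ccrond" keep the whole string together as one NRPE argument, which the monitored node then substitutes for $ARG1$ in its check_procs_args command.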
Define the monitoring configuration for host duangr-3
vi duangr-3.cfg
The content is similar to duangr-2.cfg; only host_name, alias and address need to be changed.
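Since only the host name and address differ, duangr-3.cfg can be derived from duangr-2.cfg with sed. A minimal sketch, assuming the old host name appears nowhere in the file except where it should be replaced and contains no characters that are special to sed; `clone_host_cfg` is a hypothetical helper name.

```shell
#!/bin/sh
# Copy a host configuration from stdin to stdout, replacing the host name
# and address. $1/$2 = old name/address, $3/$4 = new name/address.
clone_host_cfg() {
    old_ip=$(printf '%s' "$2" | sed 's/\./\\./g')   # escape the dots for sed
    sed -e "s/$1/$3/g" -e "s/$old_ip/$4/g"
}

# Usage, run inside /usr/local/nagios/etc/servers:
#   clone_host_cfg duangr-2 192.168.56.11 duangr-3 192.168.56.12 \
#       < duangr-2.cfg > duangr-3.cfg
```

After cloning, remove any process checks for services that do not actually run on duangr-3.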
7.2.4.3 Email notification
Define the email address of the monitoring contact
vi /usr/local/nagios/etc/objects/contacts.cfg
- define contact{
- contact_name nagiosadmin ; Short name of user
- use generic-contact ; Inherit default values from generic-contact template (defined above)
- alias Nagios Admin ; Full name of user
- email yourname@domain.com ; <<***** CHANGE THIS TO YOUR EMAIL ADDRESS ******
- }
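Besides configuring the recipient of the alert emails, also make sure that:
- this host can reach the mail server
- Sendmail on this host can send mail through an external SMTP service
7.2.4.4 Verify the configuration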
- /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
7.2.4.5 Start
- /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
Because nagios is already registered as a service, the following also works:
- service nagios start/stop/restart/status
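When configuration files are edited later, it is safer to run the verification step before restarting so that a broken file never replaces a running setup. The sketch below chains the two commands from this section; `restart_if_valid` and the two variables are names chosen for the example.

```shell
#!/bin/sh
# Paths as used in this article; override them if your layout differs.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}

# Restart the nagios service only if the configuration check passes.
restart_if_valid() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null; then
        service nagios restart
    else
        echo "configuration check failed; not restarting" >&2
        return 1
    fi
}

# Usage (as root on the master node):
#   restart_if_valid
```

To see the actual error messages after a failed check, run the nagios -v command from 7.2.4.4 directly.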
8. Monitoring Pages