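Nagios: Troubleshooting the Monitoring of Network Servers and Network Services
The following problems can come up when you add hosts and services to nagios:
1: Wrong configuration parameters. If you start nagios without checking the configuration first, it may start successfully but display incorrectly.
Solution: correct the configuration parameters.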
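To catch this kind of mistake before starting, nagios can be run in verify mode with -v against the main configuration file. Here is a minimal sketch; the binary and config paths are assumptions based on the /usr/local/nagios layout used in this article, and the function name is made up for illustration.

```shell
#!/bin/sh
# Assumed install paths; override them through the environment if yours differ.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"

# verify_nagios_config (hypothetical helper): run nagios in verify mode (-v)
# and return its exit status, so the caller can refuse to restart on errors.
verify_nagios_config() {
    "$NAGIOS_BIN" -v "$NAGIOS_CFG"
}
```

A typical use would be `verify_nagios_config && svc -t /service/nagios/`, so that nagios is only restarted when the check passes.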
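2: Connection refused
When I first hit this, I assumed passwordless ssh login had failed, but in fact the service simply was not running on my server; starting the service fixed it. That only covers services that listen on a port, though. How do you check state that has no port, such as load or disk usage? Use nrpe. OK, so the next step is to install nrpe on the server.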
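For the port-based case, a quick way to confirm that "Connection refused" really means "nothing is listening" is to probe the port by hand. This is only a sketch: it assumes a netcat that supports -z and -w (the BSD one does), and both function names are made up for illustration.

```shell
#!/bin/sh
# port_open HOST PORT: succeed if a TCP connection to HOST:PORT is accepted.
# Assumes a netcat with -z (connect without sending data) and -w (timeout, seconds).
port_open() {
    nc -z -w 3 "$1" "$2" >/dev/null 2>&1
}

# explain_port HOST PORT: print a one-line diagnosis for the given port.
explain_port() {
    if port_open "$1" "$2"; then
        echo "$1:$2 is accepting connections"
    else
        echo "$1:$2 refused or timed out; check that the service is started"
    fi
}
```

For example, `explain_port 10.5.1.156 80` tells you whether the apache check can succeed at all before you start looking at the nagios side.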
I. Configuring the remote host
1. Installing and configuring nrpe
- fetch http://ufpr.dl.sourceforge.net/sourceforge/nagios/nrpe-2.5.2.tar.gz
- tar zxvf nrpe-2.5.2.tar.gz
- cd nrpe-2.5.2
- ./configure --enable-ssl --enable-command-args
- make all
- mkdir -p /usr/local/nagios/etc
- mkdir /usr/local/nagios/bin
- mkdir /usr/local/nagios/libexec
- pw addgroup nagios
- pw useradd nagios -g nagios -d /usr/local/nagios/ -s /sbin/nologin
- chown -R nagios:nagios /usr/local/nagios
- cp ./sample-config/nrpe.cfg /usr/local/nagios/etc
- cp src/nrpe /usr/local/nagios/bin
2. Start nrpe; it listens on port 5666
- /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
- netstat -ant | grep 5666
- tcp4 0 0 *.5666 *.* LISTEN
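A listening port is not enough on its own: nrpe only answers hosts named in the allowed_hosts directive of nrpe.cfg, so the monitoring server's address has to be listed there. The sketch below checks for that. The helper name and the example address 10.5.1.1 are hypothetical, and it only matches literal addresses written on the allowed_hosts line.

```shell
#!/bin/sh
# nrpe_allows IP [CFG] (hypothetical helper): succeed if IP is listed in the
# allowed_hosts directive of CFG. Commented-out lines are ignored; if several
# allowed_hosts lines exist, this sketch only looks at the last one.
nrpe_allows() {
    ip="$1"
    cfg="${2:-/usr/local/nagios/etc/nrpe.cfg}"
    # Strip the "allowed_hosts=" key and any spaces, leaving "a,b,c".
    hosts=$(sed -n 's/^allowed_hosts[[:space:]]*=[[:space:]]*//p' "$cfg" | tail -n 1 | tr -d ' ')
    case ",$hosts," in
        *",$ip,"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

If `nrpe_allows 10.5.1.1` fails for your monitoring server's address, add it to allowed_hosts and restart nrpe.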
II. Configuration on the monitoring server
1. Install nrpe (mainly for the check_nrpe plugin)
- fetch http://ufpr.dl.sourceforge.net/sourceforge/nagios/nrpe-2.5.2.tar.gz
- tar zxvf nrpe-2.5.2.tar.gz
- cd nrpe-2.5.2
- ./configure --enable-ssl --enable-command-args
- make all
- cp src/check_nrpe /usr/local/nagios/libexec
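Before touching the nagios configuration it is worth running check_nrpe by hand against the remote host. Called with only -H, it should just report the daemon's version (something like "NRPE v2.5.2"). The sketch below wraps that call and translates the exit status using the standard plugin convention (0 OK, 1 WARNING, 2 CRITICAL, anything else UNKNOWN). The helper names are made up for illustration.

```shell
#!/bin/sh
# Path where check_nrpe was copied in the step above.
CHECK_NRPE="${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}"

# plugin_state CODE: map a plugin exit status to the Nagios state name.
plugin_state() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# probe_nrpe HOST (hypothetical helper): query the nrpe daemon on HOST,
# print "STATE: plugin output", and return the plugin's exit status.
probe_nrpe() {
    out=$("$CHECK_NRPE" -H "$1")
    rc=$?
    echo "$(plugin_state "$rc"): $out"
    return "$rc"
}
```

Running `probe_nrpe 10.5.1.156` from the monitoring server exercises the same path nagios will use later, so a failure here points at the network, allowed_hosts, or SSL settings rather than at the nagios object files.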
2. Configuring the nagios files
vi checkcommands.cfg
Define the check_nrpe command
- # 'check_nrpe' command definition
- define command{
- command_name check_nrpe
- command_line /usr/local/nagios/libexec/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
- }
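To see what this definition turns into at run time: in a service's check_command, the text after the first "!" becomes $ARG1$, and $HOSTADDRESS$ is the host's address. The sketch below imitates that substitution for the check_nrpe command only. It is an illustration of the macro expansion, not how nagios itself is implemented, and the function name is made up.

```shell
#!/bin/sh
# expand_nrpe_command CHECK_COMMAND HOSTADDRESS (hypothetical helper):
# print the command line that the check_nrpe definition would produce.
expand_nrpe_command() {
    check_command="$1"
    hostaddress="$2"
    # Field 2 of the "!"-separated string is what nagios exposes as $ARG1$.
    arg1=$(printf '%s\n' "$check_command" | cut -d'!' -f2)
    echo "/usr/local/nagios/libexec/check_nrpe -H $hostaddress -c $arg1"
}
```

So a service with `check_command check_nrpe!check_load` on the host 10.5.1.156 would end up running check_nrpe with `-H 10.5.1.156 -c check_load`.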
III. We have already configured some of the parameters above
Below is the final configuration:
- define host{
- use generic-host ; Name of host template to use
- host_name test_nrpe
- alias client
- address 10.5.1.156
- check_command check-host-alive
- max_check_attempts 1
- check_period 24x7
- notification_interval 120
- notification_period 24x7
- notification_options d,r
- contact_groups admins
- }
- # 'check_load' command definition
- define command{
- command_name check_load
- command_line $USER1$/check_load -w $ARG1$ -c $ARG2$
- }
- # 'check_disk' command definition
- define command{
- command_name check_disk
- command_line $USER1$/check_disk -w $ARG1$ -c $ARG2$
- }
- define service{
- use generic-service ; Name of service template to use
- host_name test_nrpe
- service_description PING
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_ping!100.0,20%!500.0,60%
- }
- define service{
- use generic-service ; Name of service template to use
- host_name test_nrpe
- service_description apache
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_http!100.0,20%!500.0,60%
- }
- define service{
- use generic-service ; Name of service template to use
- host_name test_nrpe
- service_description mysql
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_mysql!100.0,20%!500.0,60%
- }
- define service{
- use generic-service ; Name of service template to use
- host_name test_nrpe
- service_description ntp
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_ntp!100.0,20%!500.0,60%
- }
- define service{
- use generic-service ; Name of service template to use
- host_name test_nrpe
- service_description qmail_smtp
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_smtp!100.0,20%!500.0,60%
- }
- define service{
- use generic-service ; Name of service template to use
- host_name test_nrpe
- service_description qmail_pop3
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_pop!100.0,20%!500.0,60%
- }
- define service{
- use generic-service ; Name of service template to use
- host_name test_nrpe
- service_description test_load
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_load!100.0,20%!500.0,60%
- }
- define service{
- use generic-service ; Name of service template to use
- host_name test_nrpe
- service_description test_disk
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_disk!100.0,20%!500.0,60%
- }
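Note that the check_load and check_disk commands defined above run the plugins on the monitoring server itself. To measure the remote host instead, the same services can go through check_nrpe. Since the service blocks differ only in their description and check command, a small generator keeps them consistent. This is a sketch under assumptions: the function name is made up, and it presumes the remote nrpe.cfg defines a matching `command[...]` entry (the sample nrpe.cfg ships one for check_load) and that the plugins are installed in the remote libexec directory.

```shell
#!/bin/sh
# nrpe_service HOST DESCRIPTION REMOTE_COMMAND (hypothetical helper):
# print a service definition that runs REMOTE_COMMAND through check_nrpe,
# using the same settings as the service blocks in this article.
nrpe_service() {
    cat <<EOF
define service{
        use                     generic-service
        host_name               $1
        service_description     $2
        is_volatile             0
        check_period            24x7
        max_check_attempts      1
        normal_check_interval   1
        retry_check_interval    1
        contact_groups          admins
        notification_options    w,u,c,r
        notification_interval   960
        notification_period     24x7
        check_command           check_nrpe!$3
        }
EOF
}
```

For example, `nrpe_service test_nrpe test_load check_load` prints a block you can paste into your services file in place of the local test_load definition.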
IV. Check the configuration and restart nagios
How to use external commands in nagios
- vi /usr/local/nagios/etc/nagios.cfg
- check_external_commands=1
- mkdir /usr/local/nagios/var/rw
- chown nagios:nagcmd /usr/local/nagios/var/rw
- chmod u+rw /usr/local/nagios/var/rw
- chmod g+rw /usr/local/nagios/var/rw
- chmod g+s /usr/local/nagios/var/rw
- svc -t /service/nagios/
- /usr/local/apache2/bin/apachectl restart
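Once nagios is back up, one way to confirm that external commands work is to write a command into the command file yourself. External commands are single lines of the form `[timestamp] COMMAND;arguments`. The sketch below forces an immediate service check with SCHEDULE_FORCED_SVC_CHECK. The command file path is an assumption (check command_file in your nagios.cfg), and the function name is made up for illustration.

```shell
#!/bin/sh
# Assumed location of the external command file; see command_file in nagios.cfg.
CMD_FILE="${CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}"

# force_service_check HOST SERVICE (hypothetical helper): ask nagios to run
# the given service check right away by writing to the command file.
force_service_check() {
    now=$(date +%s)
    printf '[%s] SCHEDULE_FORCED_SVC_CHECK;%s;%s;%s\n' "$now" "$1" "$2" "$now" > "$CMD_FILE"
}
```

Running `force_service_check test_nrpe test_load` as a user in the nagcmd group should make the service's last-check time update in the web interface. The command file is a named pipe, so the write will block if nagios is not running.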
That wraps up troubleshooting nagios monitoring of network servers and network services. For the nagios basics, see the Concepts, Installation, and Configuration articles.