Monitoring Network Servers with Nagios: Nagios Configuration
Now let's start configuring:
1: Configure the web interface
This assumes Apache is already running; if it is not, see: Installing Apache
- vi /usr/local/apache2/conf/httpd.conf
Add the following:
- ScriptAlias /nagios/cgi-bin /usr/local/nagios/sbin
- <Directory "/usr/local/nagios/sbin">
- Options ExecCGI
- AllowOverride None
- Order allow,deny
- Allow from all
- AuthName "Nagios Access"
- AuthType Basic
- AuthUserFile /usr/local/nagios/etc/htpasswd.users
- Require valid-user
- </Directory>
- Alias /nagios /usr/local/nagios/share
- <Directory "/usr/local/nagios/share">
- Options None
- AllowOverride None
- Order allow,deny
- Allow from all
- AuthName "Nagios Access"
- AuthType Basic
- AuthUserFile /usr/local/nagios/etc/htpasswd.users
- Require valid-user
- </Directory>
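After making the changes, save the file and restart Apache:
- /usr/local/apache2/bin/apachectl restart
2: Configure Apache Basic authentication:
Generate the authentication password for the user nagios (htpasswd prompts for the password; passing it on the command line would require the -b flag):
- /usr/local/apache2/bin/htpasswd -c /usr/local/nagios/etc/htpasswd.users nagios
The Apache interface is now configured.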
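A quick way to confirm that the password file was written is to look for the user's entry in it. The helper below is a minimal sketch: the function name `htpasswd_has_user` is made up for illustration, and it relies only on the fact that each line of an htpasswd file starts with `username:`.

```shell
# Hypothetical helper: succeed if the htpasswd file has an entry for the user.
htpasswd_has_user() {
    file="$1"
    user="$2"
    # Each entry has the form "user:hash"; anchor the match at line start
    # and include the colon so "nag" does not match "nagios".
    grep -q "^${user}:" "$file"
}
```

For example, `htpasswd_has_user /usr/local/nagios/etc/htpasswd.users nagios && echo "user present"` should print the message once the htpasswd step has run.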
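Start configuring Nagios:
- cd /usr/local/nagios/etc/
The /usr/local/nagios/etc directory holds Nagios's sample configuration files, named *.cfg-sample. Copy every .cfg-sample file to a matching .cfg file,
for example: cp nagios.cfg-sample nagios.cfg
Repeat this until all of them have been copied.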
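Copying the samples one by one is tedious, so the step can be sketched as a small loop. This is only an illustration: the function name `copy_cfg_samples` is an assumption, and the sketch deliberately skips any .cfg file that already exists so that earlier edits are not overwritten.

```shell
# Hypothetical helper: copy every *.cfg-sample in a directory to *.cfg.
copy_cfg_samples() {
    dir="$1"
    for sample in "$dir"/*.cfg-sample; do
        # If the glob matched nothing, the pattern itself is returned; skip it.
        if [ ! -e "$sample" ]; then
            continue
        fi
        # Strip the "-sample" suffix to get the target name.
        target="${sample%-sample}"
        # Do not clobber a configuration file that is already in place.
        if [ -e "$target" ]; then
            continue
        fi
        cp "$sample" "$target"
    done
}
```

With the paths used here, the call would be `copy_cfg_samples /usr/local/nagios/etc`.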
- vi minimal.cfg
Comment out all command definitions:
To comment out a definition, put a "#" at the start of each of its lines.
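Commenting out every command block by hand is error-prone in a long file. The following is a rough sketch of doing it with awk, assuming each definition starts with a `define command{` line and ends at the first line containing `}`; the function name `comment_out_commands` is hypothetical. It prints the result to standard output rather than editing the file, so the output can be reviewed first.

```shell
# Hypothetical helper: print a config file with all "define command{...}"
# blocks commented out. Other definitions are passed through unchanged.
comment_out_commands() {
    awk '
        # A command definition starts here.
        /^[ \t]*define[ \t]+command[ \t]*[{]/ { in_block = 1 }
        # Inside a block: prefix the line with "#"; the closing brace ends it.
        in_block {
            print "#" $0
            if ($0 ~ /[}]/) in_block = 0
            next
        }
        # Everything else is printed as-is.
        { print }
    ' "$1"
}
```

One way to use it is `comment_out_commands minimal.cfg > minimal.cfg.new`, then compare the two files before replacing the original.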
Edit cgi.cfg
Change use_authentication=1 to use_authentication=0, which turns off authentication in the CGIs; otherwise some pages will not be displayed.
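The same change can be scripted instead of made in an editor. This is a minimal sketch with a made-up function name, `disable_cgi_auth`; it writes to a temporary file first so that a failed sed run leaves the original untouched. Note that replacing the file with mv may change its ownership or permissions, so check those afterwards.

```shell
# Hypothetical helper: switch use_authentication from 1 to 0 in a cgi.cfg file.
disable_cgi_auth() {
    cfg="$1"
    # Only a line that is exactly "use_authentication=1" (optionally followed
    # by trailing whitespace) is rewritten; all other lines are kept.
    sed 's/^use_authentication=1[[:space:]]*$/use_authentication=0/' "$cfg" > "$cfg.tmp" &&
        mv "$cfg.tmp" "$cfg"
}
```

With the layout used here, the call would be `disable_cgi_auth /usr/local/nagios/etc/cgi.cfg`.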
Now check the configuration files for syntax errors:
- /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If everything is correct, the output ends with:
- Total Warnings: 0
- Total Errors: 0
Otherwise, fix the configuration files according to the messages.
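When the check is run from a script, the two summary lines can be tested automatically. The sketch below reads the verification output on standard input and succeeds only if both totals are present and zero. The function name `verify_summary_ok` is an assumption, and the parsing relies on the "Total Warnings:" and "Total Errors:" lines shown above.

```shell
# Hypothetical helper: read "nagios -v" output on stdin and succeed only if
# it reports zero warnings and zero errors.
verify_summary_ok() {
    awk '
        /Total Warnings:/ { warnings = $NF; seen_w = 1 }
        /Total Errors:/   { errors = $NF; seen_e = 1 }
        END {
            # Both summary lines must be present and both counts must be 0.
            if (seen_w && seen_e && warnings == 0 && errors == 0) exit 0
            exit 1
        }
    '
}
```

A typical call would be `/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | verify_summary_ok && echo "configuration looks clean"`.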
We will come back to the configuration files later. For now, start Nagios:
- /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
So that Nagios is restarted automatically if it exits abnormally, we run it under daemontools:
Install daemontools:
- mkdir -p /package
- chmod 1755 /package
- cd /package
- fetch http://cr.yp.to/daemontools/daemontools-0.76.tar.gz
- tar -xzpf daemontools-0.76.tar.gz
- cd admin/daemontools-0.76/
- package/install
Check that the svscan process is running:
- ps aux | grep svscan
- root 376 0.0 0.0 1636 0 con- IW - 0:00.00 /bin/sh /command/svscanboot
- root 411 0.0 0.0 1224 208 con- S 8Jul06 0:42.50 svscan /service
OK, it started correctly. Now create the service directory and its run script:
- cd /service
- mkdir nagios
- chmod 1755 nagios
- cd nagios
- touch ./run
- chmod 755 ./run
- vi run
- #!/bin/sh
- PATH=/usr/local/bin:/usr/bin:/bin
- export PATH
- exec env - PATH=$PATH \
- /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg
- mkdir log
- cd log
- touch ./run
- chmod 755 ./run
- vi ./run
- #!/bin/sh
- exec setuidgid logadmin multilog t s1000000 n100 ./main
- mkdir main
- chmod 777 main
- chown nagios.nagios main
- touch status
- chown nagios.nagios status
- svc -u /service/nagios/
- svstat /service/nagios/
- root@## ps auxww | grep nagios
- root 23276 0.0 0.1 1176 488 ?? I 5:00PM 0:01.71 supervise nagios
- nagios 34251 0.0 0.3 2316 1552 ?? S 6:06PM 0:00.10 /usr/local/nagios/bin/nagios /usr/local/nagios/etc/nagios.cfg
- root@##
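Instead of reading the process list by eye, the svstat output can be checked in a script. The sketch below assumes svstat prints a line of the form `/service/nagios/: up (pid 34251) 12 seconds` for a running service; the function name `service_is_up` is made up for illustration.

```shell
# Hypothetical helper: read svstat output on stdin and succeed if it
# reports the service as up with a process ID.
service_is_up() {
    grep -q ': up (pid [0-9][0-9]*)'
}
```

For example, `svstat /service/nagios/ | service_is_up || echo "nagios is not running"`.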
OK, Nagios is now set up as a service that starts automatically.
The svc command can be used to start or stop the service.
---------------------------------------------------------------------------------
- svc opts services
- opts is a series of getopt-style options. services consists of any number of arguments, each argument naming a directory used by supervise.
- -u: Up. If the service is not running, start it. If the service stops, restart it.
- -d: Down. If the service is running, send it a TERM signal and then a CONT signal. After it stops, do not restart it.
- -o: Once. If the service is not running, start it. Do not restart it if it stops.
- -p: Pause. Send the service a STOP signal.
- -c: Continue. Send the service a CONT signal.
- -h: Hangup. Send the service a HUP signal.
- -a: Alarm. Send the service an ALRM signal.
- -i: Interrupt. Send the service an INT signal.
- -t: Terminate. Send the service a TERM signal.
- -k: Kill. Send the service a KILL signal.
- -x: Exit. supervise will exit as soon as the service is down. If you use this option on a stable system, you're doing something wrong; supervise is designed to run forever.
---------------------------------------------------------------------------------
For example:
Stop Nagios: svc -d /service/nagios/
Restart Nagios: svc -t /service/nagios/
Start Nagios: svc -u /service/nagios/
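The three examples above can be wrapped so that the familiar start, stop, and restart verbs map to the corresponding svc options. This is only a sketch, and `svc_flag_for` is a hypothetical name; restart uses -t because supervise brings the service back up after it receives TERM.

```shell
# Hypothetical helper: translate an action name into the svc option
# used in the examples above.
svc_flag_for() {
    case "$1" in
        start)   echo "-u" ;;   # up: start, and restart if it stops
        stop)    echo "-d" ;;   # down: stop and do not restart
        restart) echo "-t" ;;   # TERM: supervise starts it again
        *)       return 1 ;;    # unknown action
    esac
}
```

It could be used as `svc "$(svc_flag_for restart)" /service/nagios/`.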
Of course, you can also manage it with an init script instead:
/usr/local/etc/rc.d/nagios start/stop
daemontools can do a lot more, and you can get familiar with it over time. Back to the main topic.
Now open the page: http://localhost/nagios/
You may be pleasantly surprised: the status of the server and its services is clearly visible.
For now Nagios monitors only one host, itself (localhost). We will add other hosts and their services shortly.
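Configuring Nagios:
1) Add a service to a host
2) Add a host and its services
3) Stop a service
4) Delete a host and its services
5) View the problems on all hosts
6) View the status of a specific host
7) Change the notification interval
8) Change the number of retries when a problem is detected
9) How to use external commands in Nagios
1) Add a service to a host
To monitor the qmail service on localhost, proceed as follows: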
- vi minimal.cfg
- define service{
- use generic-service ; Name of service template to use
- host_name localhost
- service_description qmail_smtp
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_qmail
- }
You can copy an existing definition and modify it; this one was copied from the existing check_local_disk service.
Change host_name, service_description, check_command, and so on.
- define service{
- use generic-service ; Name of service template to use
- host_name localhost
- service_description qmail_pop3
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_pop3
- }
Following the same pattern, adjust the second definition, then define the commands in:
- vi checkcommands.cfg
- #'check_qmail' command definition
- define command{
- command_name check_qmail
- command_line $USER1$/check_smtp -H 127.0.0.1
- }
- define command{
- command_name check_pop3
- command_line $USER1$/check_pop -H 127.0.0.1
- }
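Before wiring a command into Nagios, it helps to run the plugin by hand and look at its exit status, since Nagios plugins report their result as 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). The sketch below turns a status into its name; `plugin_state` is a made-up function name, and the plugin path in the usage note assumes that $USER1$ points to /usr/local/nagios/libexec, which depends on your resource.cfg.

```shell
# Hypothetical helper: map a plugin exit status to the Nagios state name.
plugin_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;   # 3 and anything unexpected
    esac
}
```

For example: `/usr/local/nagios/libexec/check_smtp -H 127.0.0.1; plugin_state $?`.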
Save the file, then check the configuration:
- /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If there are no errors, the output ends with:
- Total Warnings: 0
- Total Errors: 0
If there are errors, correct them according to the messages.
Restart Nagios:
- svc -d /service/nagios/ && svc -u /service/nagios/
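The check and the restart can be combined so that a broken configuration never reaches the running service. This is a sketch under two assumptions: that `nagios -v` exits with a nonzero status when verification fails, and that the paths below (taken from this article) match your installation. The variable names and the function name `restart_if_valid` are illustrative only.

```shell
# Assumed defaults based on the paths used in this article; override as needed.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
SERVICE_DIR="${SERVICE_DIR:-/service/nagios/}"
SVC="${SVC:-svc}"

# Hypothetical helper: restart the supervised service only if the
# configuration check succeeds.
restart_if_valid() {
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null 2>&1; then
        # TERM the process; supervise starts it again with the new config.
        "$SVC" -t "$SERVICE_DIR"
    else
        echo "configuration check failed; not restarting" >&2
        return 1
    fi
}
```

Calling `restart_if_valid` after each edit keeps the old instance running whenever the new configuration does not verify.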
Check the results in the Nagios web interface:
http://10.5.1.153/nagios/
Click "Service Detail".
The newly added services should appear in the list.
2) Add a host and its services
On this host we want to monitor load, disk usage, and other server state that is not exposed through a network port, as well as its services, such as apache, mysql, qmail, and ntp. Can Nagios directly monitor things that have no port? No. So NRPE must be installed on both hosts; NRPE listens on port 5666 and passes the check results back to the monitoring host.
OK, let's add apache, mysql, qmail, and ntp first. This time we put the monitored host and its services in a new file:
- cd /usr/local/nagios/etc/
- touch 10_5_1_156.cfg
- vi nagios.cfg
- cfg_file=/usr/local/nagios/etc/10_5_1_156.cfg
- vi 10_5_1_156.cfg
Define a host:
- define host{
- use generic-host ; Name of host template to use
- host_name test_nrpe
- alias client
- address 10.5.1.156
- check_command check-host-alive
- max_check_attempts 1
- check_period 24x7
- notification_interval 120
- notification_period 24x7
- notification_options d,r
- contact_groups admins
- }
Define the services to check on the host:
- define service{
- use generic-service ; Name of service template to use
- host_name test_nrpe
- service_description PING
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_ping!100.0,20%!500.0,60%
- }
- define service{
- use generic-service ; Name of service template to use
- host_name test_nrpe
- service_description apache
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_http!100.0,20%!500.0,60%
- }
- define service{
- use generic-service ; Name of service template to use
- host_name test_nrpe
- service_description mysql
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_mysql!100.0,20%!500.0,60%
- }
- define service{
- use generic-service ; Name of service template to use
- host_name test_nrpe
- service_description ntp
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_ntp!100.0,20%!500.0,60%
- }
- define service{
- use generic-service ; Name of service template to use
- host_name test_nrpe
- service_description qmail_smtp
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_smtp!100.0,20%!500.0,60%
- }
- define service{
- use generic-service ; Name of service template to use
- host_name test_nrpe
- service_description qmail_pop3
- is_volatile 0
- check_period 24x7
- max_check_attempts 1
- normal_check_interval 1
- retry_check_interval 1
- contact_groups admins
- notification_options w,u,c,r
- notification_interval 960
- notification_period 24x7
- check_command check_pop!100.0,20%!500.0,60%
- }
The services are now defined, just as before.
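The service blocks above differ only in service_description and check_command, so they can also be generated rather than copied by hand. The function below is a sketch that prints one definition with the same fixed settings used in this article; the name `emit_service` and its argument order are assumptions for illustration.

```shell
# Hypothetical helper: print one service definition.
# Arguments: host name, service description, check command.
emit_service() {
    host="$1"
    description="$2"
    check="$3"
    cat <<EOF
define service{
        use                     generic-service
        host_name               $host
        service_description     $description
        is_volatile             0
        check_period            24x7
        max_check_attempts      1
        normal_check_interval   1
        retry_check_interval    1
        contact_groups          admins
        notification_options    w,u,c,r
        notification_interval   960
        notification_period     24x7
        check_command           $check
        }
EOF
}
```

For example, `emit_service test_nrpe apache check_http >> 10_5_1_156.cfg` appends one block; run the configuration check again after generating the file.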