High Availability for MFS with DRBD + Heartbeat + Pacemaker
Preface: A single point of failure (SPOF) is the common term for a system failing because a single component fails. The usual remedy is redundant equipment or a hot standby, because hardware faults and human error can always take down one or more nodes, and node maintenance or upgrades sometimes require stopping nodes as well. A reliable system must therefore tolerate the loss of one or more nodes.
My original architecture combines DRBD and MooseFS: DRBD provides network-based RAID 1, while Heartbeat and Pacemaker provide policy-based failover. MooseFS, however, is designed around a single metadata server (meta server), and so far the project has no plans for system-level redundancy of it. The meta server cannot fail over automatically in real time, so we have to build that ourselves. This article describes how to design and implement automatic failover on failure, and automatic recovery once the fault is fixed.
Definitions
Single Point of Failure (cited definitions)
A local failure that causes product failure and for which no redundancy or alternative working procedure exists as a remedy. (GJB 451-90)
A single-point failure is a single hardware failure or software error that irreversibly degrades a product's mission performance below the contractually specified level (the way in which a product experiences a single-point failure is its single-point failure mode). (MIL-STD-1543B-88)
A failure of an item that causes the system to fail and that cannot be compensated for by redundancy or an alternative operational procedure. (MIL-STD-721C-81)
High availability
High availability (HA) has two senses. In the broad sense it refers to the availability of the whole system (High Availability); in the narrow sense it usually refers to host-level redundant takeover, such as host HA. Unless stated otherwise, HA in this article means the broad sense. High availability covers the following aspects:
(1) System faults and crashes
(2) Application and middleware failures
(3) Network failures
(4) Media failures, i.e. failure of the media on which data is stored
(5) Human error
(6) Disasters and extended outages
(7) Planned downtime, maintenance and management tasks
The concrete description and implementation steps follow.
Purpose:
Eliminate the single point of failure of the mfs master; the same method can serve as a standard recipe for any other service that needs a high-availability environment.
Plan:
Use DRBD for disaster-tolerant replication between the primary and the standby, Heartbeat for heartbeat monitoring, and Pacemaker to switch and control the services (resources).
Notes:
DRBD dual-primary mode places high demands on the network and on configuration, so it is not used in this design;
DRBD needs a dedicated empty partition, which must not be formatted beforehand;
Heartbeat alone could also handle service switching and failover;
other components of the Linux-HA project are required, such as Cluster Glue and the Resource Agents.
Software:
DRBD download: http://oss.linbit.com/drbd/
DRBD 8.3.9: drbd-8.3.9.tar.gz
Linux-HA downloads: http://www.linux-ha.org/wiki/Downloads
Heartbeat 3.0.4: Heartbeat-3-0-STABLE-3.0.4.tar.bz2
Cluster Glue 1.0.7: glue-1.0.7.tar.bz2
Resource Agents 1.0.3: agents-1.0.3.tar.bz2
Pacemaker 1.0.5: Pacemaker-1.0.5.tar.bz2
mfs download: http://www.moosefs.org/index.php/download.html
moosefs 1.6.20: mfs-1.6.20-2.tar.gz
Environment:
mfsmaster:192.168.1.1 mfsbackup:192.168.1.2 VIP:192.168.1.10
Installation:
Set up hosts:
# vi /etc/hosts
192.168.1.1 mfs.master
192.168.1.2 mfs.backup
DRBD:
# wget http://oss.linbit.com/drbd/8.3/drbd-8.3.9.tar.gz
# tar zxvf drbd-8.3.9.tar.gz
# cd drbd-8.3.9
# ./configure --prefix=/usr/local/drbd --with-km
# make && make install
# vi /usr/local/drbd/etc/drbd.d/global_common.conf
syncer {
    # rate after al-extents use-rle cpu-mask verify-alg csums-alg
    rate 100M;
}
# vi /usr/local/drbd/etc/drbd.d/mfs.res
resource mfs {
    device /dev/drbd0;
    disk /dev/lvm/mfsdata;
    meta-disk internal;
    on mfs.master {
        address 192.168.1.1:7789;
    }
    on mfs.backup {
        address 192.168.1.2:7789;
    }
}
# cp /usr/local/drbd/etc/rc.d/init.d/drbd /etc/init.d/
# modprobe drbd
# chkconfig --add drbd
# chkconfig --level 35 drbd on
# drbdadm create-md all        (metadata must exist before the resource is brought up)
# service drbd start
# mkfs.ext3 /dev/drbd0         (run once only, on the node that has been promoted to Primary; DRBD rejects writes on a Secondary)
# vi /etc/fstab
/dev/drbd0 /mfsmeta ext3 noauto,noatime,nodiratime 0 0
(noauto, because Pacemaker's Filesystem resource will mount the device later; mounting it at boot on both nodes would conflict)
Apply the same configuration on both the primary and the standby, then check the synchronization status:
# cat /proc/drbd
GIT-hash: 1c3b2f71137171c1236b497969734da43b5bec90 build by root@mfs.master, 2010-12-20 19:19:37
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:89190240 nr:613604 dw:89803844 dr:620461 al:45275 bm:5 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
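When scripting checks around this, it helps to test the state programmatically rather than eyeball the output. A minimal sketch (the helper name `drbd_in_sync` is ours, not part of DRBD) that reads `/proc/drbd` text on stdin and succeeds only when the resource is connected and both sides are UpToDate:

```shell
# Hypothetical helper: succeed only if /proc/drbd (read from stdin)
# shows a connected, fully synchronized resource.
drbd_in_sync() {
    state=$(cat)    # read the whole /proc/drbd snapshot
    case "$state" in
        *cs:Connected*ds:UpToDate/UpToDate*) return 0 ;;
    esac
    return 1
}

# usage: drbd_in_sync < /proc/drbd && echo "in sync"
```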
On the primary node, run:
# drbdsetup /dev/drbd0 primary -o
# mount /mfsmeta
This completes the DRBD installation; you can run some tests by following its documentation.
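Before handing DRBD over to Pacemaker, it is worth rehearsing a manual switchover once. A sketch of the demote/promote sequence (the function names and the `DRY` dry-run variable are ours; `mfs` is the resource name defined above):

```shell
# Manual failover drill (sketch). Set DRY=echo to print the commands
# instead of executing them.
DRY="${DRY:-}"

demote() {   # run on the current primary
    $DRY umount /mfsmeta          # release the filesystem first
    $DRY drbdadm secondary mfs    # then drop this node to Secondary
}

promote() {  # then run on the peer
    $DRY drbdadm primary mfs
    $DRY mount /dev/drbd0 /mfsmeta
}
```

Run demote on the old primary, promote on the new one, and the peer now owns the replicated metadata.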
MFS:
# wget http://pro.hit.gemius.pl/hitredir/id=0sWa0S8ft4sTAHF1bGAAEZPcP3ziyq7f9SdhoQf7oeT.c7/url=moosefs.org/tl_files/mfscode/mfs-1.6.20-2.tar.gz
# tar zxvf mfs-1.6.20-2.tar.gz
# cd mfs-1.6.20
# ./configure --prefix=/usr/local/mfs
# make
# make install
# vi /usr/local/mfs/etc/mfsmaster.cfg
DATA_PATH = /mfsmeta/metalog
# mkdir /mfsmeta/metalog
# chown nobody.nobody /mfsmeta/metalog
(DATA_PATH must live on the DRBD mount, /mfsmeta, so that the metadata is replicated to the standby)
Set a temporary hosts entry and test that the mfs master starts:
# vi /etc/hosts
192.168.1.1 mfsmaster
# /usr/local/mfs/sbin/mfsmaster start
The mfs master installation is now complete.
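The source install does not create an init script, yet the Pacemaker configuration later in this article uses an lsb:mfsmaster resource, which expects one at /etc/init.d/mfsmaster. A minimal sketch of such a wrapper (our own; it assumes the mfsmaster binary accepts start/stop/restart/test, and a production LSB script should also return proper status exit codes so Pacemaker can monitor it):

```shell
# Minimal init-script sketch for the mfs master (hypothetical; harden it
# before production use). Written to ./mfsmaster here; install it with:
#   cp mfsmaster /etc/init.d/ && chmod +x /etc/init.d/mfsmaster
cat > mfsmaster <<'EOF'
#!/bin/sh
# chkconfig: 35 92 8
# description: MooseFS master server
MFSBIN=/usr/local/mfs/sbin/mfsmaster
case "$1" in
    start)   $MFSBIN start ;;
    stop)    $MFSBIN stop ;;
    restart) $MFSBIN restart ;;
    status)  $MFSBIN test ;;   # "test" reports whether the master is running
    *) echo "Usage: $0 {start|stop|restart|status}"; exit 1 ;;
esac
EOF
chmod +x mfsmaster
```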
HA:
Common setup:
# export PREFIX=/usr
# export LCRSODIR=$PREFIX/libexec/lcrso
# export CLUSTER_USER=hacluster
# export CLUSTER_GROUP=haclient
# getent group ${CLUSTER_GROUP} >/dev/null || groupadd -r ${CLUSTER_GROUP}
# getent passwd ${CLUSTER_USER} >/dev/null || useradd -r -g ${CLUSTER_GROUP} -d /var/lib/heartbeat/cores/hacluster -s /sbin/nologin -c "cluster user" ${CLUSTER_USER}
GLUE:
# wget -O cluster-glue.tar.bz2 http://hg.linux-ha.org/glue/archive/tip.tar.bz2
# tar jxvf cluster-glue.tar.bz2
# cd Reusable-Cluster-Components-*
# ./autogen.sh && ./configure --prefix=$PREFIX --with-daemon-user=${CLUSTER_USER} --with-daemon-group=${CLUSTER_GROUP}
# make
# make install
Resource Agent:
# wget -O resource-agents.tar.bz2 http://hg.linux-ha.org/agents/archive/tip.tar.bz2
# tar jxvf resource-agents.tar.bz2
# cd Cluster-Resource-Agents-*
# ./autogen.sh && ./configure --prefix=$PREFIX
# make
# make install
Heartbeat:
# wget -O heartbeat.tar.bz2 http://hg.linux-ha.org/dev/archive/tip.tar.bz2
# tar jxvf heartbeat.tar.bz2
# cd Heartbeat-*
# ./bootstrap && ./configure --prefix=$PREFIX
# make
# make install
# cp doc/ha.cf $PREFIX/etc/ha.d/
# cp doc/authkeys $PREFIX/etc/ha.d/
# chmod 0600 $PREFIX/etc/ha.d/authkeys
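Heartbeat refuses to start unless authkeys defines at least one authentication key. A minimal example (substitute your own shared secret, identical on both nodes, for the placeholder):

```
# vi $PREFIX/etc/ha.d/authkeys
auth 1
1 sha1 ReplaceWithASharedSecret
```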
Configure heartbeat:
# vi $PREFIX/etc/ha.d/ha.cf
debugfile /opt/logs/heartbeat/ha-debug
logfile /opt/logs/heartbeat/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694
ucast eth0 192.168.1.2        (on mfs.backup, point ucast at 192.168.1.1 instead)
auto_failback on
ping 192.168.1.254
crm on
# service heartbeat start
(the ipfail respawn entry sometimes seen in sample ha.cf files is for haresources-style clusters only and must not be used together with "crm on")
See the Heartbeat documentation for the details of each directive.
Pacemaker:
# wget -O pacemaker.tar.bz2 http://hg.clusterlabs.org/pacemaker/stable-1.0/archive/tip.tar.bz2
# tar jxvf pacemaker.tar.bz2
# cd Pacemaker-1-0-*
# ./autogen.sh && ./configure --prefix=$PREFIX --with-lcrso-dir=$LCRSODIR
# make
# make install
# ldconfig -v
# crm
crm(live)# configure node mfs.master
crm(live)# configure node mfs.backup
crm(live)# configure primitive mfsmaster_drbd ocf:linbit:drbd params drbd_resource="mfs" drbdconf="/usr/local/drbd/etc/drbd.conf" meta migration-threshold="10"
crm(live)# configure monitor mfsmaster_drbd 30s:20s
crm(live)# configure primitive mfsmaster_fs ocf:heartbeat:Filesystem params device="/dev/drbd0" directory="/mfsmeta" fstype="ext3"
crm(live)# configure primitive mfsmaster_vip ocf:heartbeat:IPaddr2 params ip="192.168.1.10" nic="eth0:0"
crm(live)# configure primitive mfsmaster lsb:mfsmaster
crm(live)# configure monitor mfsmaster 30s:30s
crm(live)# configure group mfsmaster_group mfsmaster_vip mfsmaster_fs mfsmaster
crm(live)# configure ms mfsmaster_drbd_ms mfsmaster_drbd meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
crm(live)# configure colocation mfsmaster_colo inf: mfsmaster_group mfsmaster_drbd_ms:Master
crm(live)# configure order mfsmaster_order inf: mfsmaster_drbd_ms:promote mfsmaster_group:start
crm(live)# configure property $id="cib-bootstrap-options" expected-quorum-votes="2" stonith-enabled="false" no-quorum-policy="ignore" start-failure-is-fatal="false"
crm(live)# configure commit
crm(live)# quit
Point the mfsmaster hostname at the VIP:
# vi /etc/hosts
192.168.1.10 mfsmaster
Use crm_mon to watch resource status; a healthy cluster should look similar to:
# crm_mon
============
Last updated: Sun Jan 23 14:01:53 2011
Stack: Heartbeat
Current DC: mfs.backup (985860ea-ae2b-4490-b7e9-19f902321358) - partition with quorum
Version: 1.0.10-b0266dd5ffa9c51377c68b1f29d6bc84367f51dd
2 Nodes configured, 2 expected votes
2 Resources configured.
============
Online: [ mfs.master mfs.backup ]
Resource Group: mfsmaster_group
    mfsmaster_vip (ocf::heartbeat:IPaddr2): Started mfs.master
    mfsmaster_fs (ocf::heartbeat:Filesystem): Started mfs.master
    mfsmaster (lsb:mfsmaster): Started mfs.master
Master/Slave Set: mfsmaster_drbd_ms
    Masters: [ mfs.master ]
    Slaves: [ mfs.backup ]
This completes the MFS high-availability setup. You can exercise the resources with crm node standby and crm node online.
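For repeated standby/online tests it helps to script the health check. A sketch (the helper name is ours) that parses crm_mon-style output from stdin and succeeds only when every "Started" resource line names one and the same node:

```shell
# Hypothetical helper: read "crm_mon -1" output on stdin and verify that
# all "Started <node>" lines refer to a single node.
group_on_one_node() {
    nodes=$(awk '/Started/ { print $NF }' | sort -u)
    [ -n "$nodes" ] && [ "$(printf '%s\n' "$nodes" | wc -l)" -eq 1 ]
}

# usage: crm_mon -1 | group_on_one_node && echo "group healthy"
```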
Original article: http://hi.baidu.com/leolance/blog/item/7ac035205870f020c9955905.html