A Step-by-Step Guide to Deploying a Highly Available Redis Cluster on Kubernetes
Introduction to Redis
Redis (REmote DIctionary Server) is an open-source, in-memory data store that is commonly used as a database, cache, or message broker. It can store and manipulate high-level data types such as lists, hashes, sets, and sorted sets. Because Redis accepts keys in a wide range of formats, operations can be executed directly on the server, reducing the client's workload. It keeps its dataset entirely in memory and uses disk only for persistence. Redis is a popular data-storage solution, used by technology giants such as GitHub, Pinterest, Snapchat, Twitter, StackOverflow, and Flickr.
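To make those data types concrete, here is a minimal sketch using the redis-py client; the host name, key names, and values are illustrative assumptions and are not part of the deployment that follows:

```python
import redis

# Connect to a Redis instance (host/port are placeholder assumptions).
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# List: push two elements and read them back in order.
r.lpush("recent:logins", "alice", "bob")
print(r.lrange("recent:logins", 0, -1))            # ['bob', 'alice']

# Hash: store an object as field/value pairs.
r.hset("user:1", mapping={"name": "alice", "plan": "free"})
print(r.hgetall("user:1"))

# Sorted set: the server keeps members ordered by score.
r.zadd("leaderboard", {"alice": 42, "bob": 17})
print(r.zrevrange("leaderboard", 0, 1, withscores=True))
```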
Why Use Redis
- It is exceptionally fast. It is written in ANSI C and runs on POSIX systems such as Linux, Mac OS X, and Solaris.
- Redis is often ranked as the most popular key/value database and as the most popular NoSQL database used with containers.
- Its caching solution reduces the number of calls to a cloud database backend.
- Applications can access it through its client API libraries.
- Redis is supported by all of the popular programming languages.
- It is open source and stable.
What Is a Redis Cluster
A Redis Cluster is a set of Redis instances designed to scale a database by partitioning it, which also makes it more resilient. Every member of the cluster, whether a primary or a secondary replica, manages a subset of the hash slots; if a master becomes unreachable, its slave is promoted to master. In a minimal Redis Cluster made up of three master nodes, each with a single slave (to allow for minimal failover), each master is assigned a range of the hash slots between 0 and 16,383. Node A holds slots 0 to 5000, node B 5001 to 10000, and node C 10001 to 16383. Communication inside the cluster happens over an internal bus, using a gossip protocol to propagate information about the cluster and to discover new nodes.
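To make the slot assignment concrete: every key is mapped to one of the 16,384 slots by taking CRC16 of the key modulo 16384 (leaving hash tags aside). Below is a minimal sketch of that mapping; the example keys are arbitrary:

```python
def crc16_xmodem(data: bytes) -> int:
    # CRC16, XMODEM variant (polynomial 0x1021, initial value 0), as used by Redis Cluster.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    # Map a key to one of the 16,384 hash slots (ignores {hash tag} handling).
    return crc16_xmodem(key.encode()) % 16384

for key in ("user:1", "session:abc", "leaderboard"):
    print(key, "->", hash_slot(key))
```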
Deploying a Redis Cluster on Kubernetes
Deploying a Redis cluster on Kubernetes is challenging, because each Redis instance relies on a configuration file that keeps track of the other cluster instances and their roles. To make this work, we need a combination of Kubernetes StatefulSets and PersistentVolumes.
Clone the Deployment Files
```
git clone https://github.com/llmgo/redis-sts.git
```
Create the StatefulSet Resource
```
[root@node01 redis-sts]# cat redis-sts.yml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-cluster
data:
  update-node.sh: |
    #!/bin/sh
    REDIS_NODES="/data/nodes.conf"
    sed -i -e "/myself/ s/[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}/${POD_IP}/" ${REDIS_NODES}
    exec "$@"
  redis.conf: |+
    cluster-enabled yes
    cluster-require-full-coverage no
    cluster-node-timeout 15000
    cluster-config-file /data/nodes.conf
    cluster-migration-barrier 1
    appendonly yes
    protected-mode no
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-cluster
spec:
  serviceName: redis-cluster
  replicas: 6
  selector:
    matchLabels:
      app: redis-cluster
  template:
    metadata:
      labels:
        app: redis-cluster
    spec:
      containers:
      - name: redis
        image: redis:5.0.5-alpine
        ports:
        - containerPort: 6379
          name: client
        - containerPort: 16379
          name: gossip
        command: ["/conf/update-node.sh", "redis-server", "/conf/redis.conf"]
        env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        volumeMounts:
        - name: conf
          mountPath: /conf
          readOnly: false
        - name: data
          mountPath: /data
          readOnly: false
      volumes:
      - name: conf
        configMap:
          name: redis-cluster
          defaultMode: 0755
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 5Gi
      storageClassName: standard
```
```
$ kubectl apply -f redis-sts.yml
configmap/redis-cluster created
statefulset.apps/redis-cluster created

$ kubectl get pods -l app=redis-cluster
NAME              READY   STATUS    RESTARTS   AGE
redis-cluster-0   1/1     Running   0          53s
redis-cluster-1   1/1     Running   0          49s
redis-cluster-2   1/1     Running   0          46s
redis-cluster-3   1/1     Running   0          42s
redis-cluster-4   1/1     Running   0          38s
redis-cluster-5   1/1     Running   0          34s
```
Create the Service
```
[root@node01 redis-sts]# cat redis-svc.yml
---
apiVersion: v1
kind: Service
metadata:
  name: redis-cluster
spec:
  type: ClusterIP
  clusterIP: 10.96.0.100
  ports:
  - port: 6379
    targetPort: 6379
    name: client
  - port: 16379
    targetPort: 16379
    name: gossip
  selector:
    app: redis-cluster
```

```
$ kubectl apply -f redis-svc.yml
service/redis-cluster created

$ kubectl get svc redis-cluster
NAME            TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)              AGE
redis-cluster   ClusterIP   10.96.0.100   <none>        6379/TCP,16379/TCP   35s
```
Initialize the Redis Cluster
The next step is to form the Redis cluster. To do that, we run the command below and type yes to accept the configuration. The first three nodes become masters and the last three become slaves.
```
$ kubectl exec -it redis-cluster-0 -- redis-cli --cluster create --cluster-replicas 1 $(kubectl get pods -l app=redis-cluster -o jsonpath='{range.items[*]}{.status.podIP}:6379 ')
>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 10.244.2.11:6379 to 10.244.9.19:6379
Adding replica 10.244.9.20:6379 to 10.244.6.10:6379
Adding replica 10.244.8.15:6379 to 10.244.7.8:6379
M: 00721c43db194c8f2cacbafd01fd2be6a2fede28 10.244.9.19:6379
   slots:[0-5460] (5461 slots) master
M: 9c36053912dec8cb20a599bda202a654f241484f 10.244.6.10:6379
   slots:[5461-10922] (5462 slots) master
M: 2850f24ea6367de58fb50e632fc56fe4ba5ef016 10.244.7.8:6379
   slots:[10923-16383] (5461 slots) master
S: 554a58762e3dce23ca5a75886d0ccebd2d582502 10.244.8.15:6379
   replicates 2850f24ea6367de58fb50e632fc56fe4ba5ef016
S: 20028fd0b79045489824eda71fac9898f17af896 10.244.2.11:6379
   replicates 00721c43db194c8f2cacbafd01fd2be6a2fede28
S: 87e8987e314e4e5d4736e5818651abc1ed6ddcd9 10.244.9.20:6379
   replicates 9c36053912dec8cb20a599bda202a654f241484f
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
...
>>> Performing Cluster Check (using node 10.244.9.19:6379)
M: 00721c43db194c8f2cacbafd01fd2be6a2fede28 10.244.9.19:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
M: 9c36053912dec8cb20a599bda202a654f241484f 10.244.6.10:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: 87e8987e314e4e5d4736e5818651abc1ed6ddcd9 10.244.9.20:6379
   slots: (0 slots) slave
   replicates 9c36053912dec8cb20a599bda202a654f241484f
S: 554a58762e3dce23ca5a75886d0ccebd2d582502 10.244.8.15:6379
   slots: (0 slots) slave
   replicates 2850f24ea6367de58fb50e632fc56fe4ba5ef016
S: 20028fd0b79045489824eda71fac9898f17af896 10.244.2.11:6379
   slots: (0 slots) slave
   replicates 00721c43db194c8f2cacbafd01fd2be6a2fede28
M: 2850f24ea6367de58fb50e632fc56fe4ba5ef016 10.244.7.8:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
```
Verify the Cluster
```
[root@node01 redis-sts]# kubectl exec -it redis-cluster-0 -- redis-cli cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_ping_sent:16
cluster_stats_messages_pong_sent:22
cluster_stats_messages_sent:38
cluster_stats_messages_ping_received:17
cluster_stats_messages_pong_received:16
cluster_stats_messages_meet_received:5
cluster_stats_messages_received:38
```
```
[root@node01 redis-sts]# for x in $(seq 0 5); do echo "redis-cluster-$x"; kubectl exec redis-cluster-$x -- redis-cli role; echo; done
redis-cluster-0
master
14
10.244.2.11
6379
14

redis-cluster-1
master
28
10.244.9.20
6379
28

redis-cluster-2
master
28
10.244.8.15
6379
28

redis-cluster-3
slave
10.244.7.8
6379
connected
28

redis-cluster-4
slave
10.244.9.19
6379
connected
14

redis-cluster-5
slave
10.244.6.10
6379
connected
28
```
Test the Cluster
We want to use the cluster and then simulate the failure of a node. For the former task we will deploy a simple Python application; for the latter we will delete a node and observe how the cluster behaves.
Deploy the Hit Counter Application
We will deploy a simple application into the cluster and put a load balancer in front of it. The purpose of this application is to increment a counter, store it in the Redis cluster, and return the counter value as an HTTP response.
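The application shipped in the repository follows roughly this pattern. The sketch below is only illustrative, assuming Flask and the redis-py-cluster client rather than the repository's actual source; it reuses the redis-cluster Service name created earlier, while the key name hits and the listen port are assumptions:

```python
from flask import Flask
from rediscluster import RedisCluster

app = Flask(__name__)

# One reachable node is enough for the client to discover the whole cluster;
# "redis-cluster" is the Kubernetes Service created earlier.
startup_nodes = [{"host": "redis-cluster", "port": "6379"}]
r = RedisCluster(startup_nodes=startup_nodes, decode_responses=True)

@app.route("/")
def hit():
    # INCR is atomic, so concurrent requests cannot lose updates.
    count = r.incr("hits")
    return "I have been hit {} times since deployment.\n".format(count)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```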
```
$ kubectl apply -f app-deployment-service.yml
service/hit-counter-lb created
deployment.apps/hit-counter-app created
```
Throughout this process, if we keep loading the page the counter keeps increasing, and after deleting the pods we can see that no data has been lost.
```
$ curl `kubectl get svc hit-counter-lb -o json|jq -r .spec.clusterIP`
I have been hit 20 times since deployment.

$ curl `kubectl get svc hit-counter-lb -o json|jq -r .spec.clusterIP`
I have been hit 21 times since deployment.

$ curl `kubectl get svc hit-counter-lb -o json|jq -r .spec.clusterIP`
I have been hit 22 times since deployment.

$ kubectl delete pods redis-cluster-0
pod "redis-cluster-0" deleted

$ kubectl delete pods redis-cluster-1
pod "redis-cluster-1" deleted

$ curl `kubectl get svc hit-counter-lb -o json|jq -r .spec.clusterIP`
I have been hit 23 times since deployment.
```