Thoroughly Understanding Events in Kubernetes
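I previously wrote an article, "A More Elegant Approach to Measuring Kubernetes Cluster Events", which uses Jaeger to collect the events in a Kubernetes cluster as traces and display them.
When I wrote that article, I promised to explain the underlying principles in detail. I put it off for a long time, but now that the year is ending, it is time to publish it.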
Events overview
Let's start with a simple example to see what events in a Kubernetes cluster look like.
Create a new namespace named moelove, then create a deployment named redis in it. Next, list all the events in this namespace.
- (MoeLove) ➜ kubectl create ns moelove
- namespace/moelove created
- (MoeLove) ➜ kubectl -n moelove create deployment redis --image=ghcr.io/moelove/redis:alpine
- deployment.apps/redis created
- (MoeLove) ➜ kubectl -n moelove get deploy
- NAME READY UP-TO-DATE AVAILABLE AGE
- redis 1/1 1 1 11s
- (MoeLove) ➜ kubectl -n moelove get events
- LAST SEEN TYPE REASON OBJECT MESSAGE
- 21s Normal Scheduled pod/redis-687967dbc5-27vmr Successfully assigned moelove/redis-687967dbc5-27vmr to kind-worker3
- 21s Normal Pulling pod/redis-687967dbc5-27vmr Pulling image "ghcr.io/moelove/redis:alpine"
- 15s Normal Pulled pod/redis-687967dbc5-27vmr Successfully pulled image "ghcr.io/moelove/redis:alpine" in 6.814310968s
- 14s Normal Created pod/redis-687967dbc5-27vmr Created container redis
- 14s Normal Started pod/redis-687967dbc5-27vmr Started container redis
- 22s Normal SuccessfulCreate replicaset/redis-687967dbc5 Created pod: redis-687967dbc5-27vmr
- 22s Normal ScalingReplicaSet deployment/redis Scaled up replica set redis-687967dbc5 to 1
However, by default kubectl get events does not list events in the order in which they occurred, so we usually need to add the --sort-by='{.metadata.creationTimestamp}' flag to sort the output by time.
This is also why Kubernetes v1.23 added the kubectl alpha events command.
After sorting by time, we get the following result:
- (MoeLove) ➜ kubectl -n moelove get events --sort-by='{.metadata.creationTimestamp}'
- LAST SEEN TYPE REASON OBJECT MESSAGE
- 2m12s Normal Scheduled pod/redis-687967dbc5-27vmr Successfully assigned moelove/redis-687967dbc5-27vmr to kind-worker3
- 2m13s Normal SuccessfulCreate replicaset/redis-687967dbc5 Created pod: redis-687967dbc5-27vmr
- 2m13s Normal ScalingReplicaSet deployment/redis Scaled up replica set redis-687967dbc5 to 1
- 2m12s Normal Pulling pod/redis-687967dbc5-27vmr Pulling image "ghcr.io/moelove/redis:alpine"
- 2m6s Normal Pulled pod/redis-687967dbc5-27vmr Successfully pulled image "ghcr.io/moelove/redis:alpine" in 6.814310968s
- 2m5s Normal Created pod/redis-687967dbc5-27vmr Created container redis
- 2m5s Normal Started pod/redis-687967dbc5-27vmr Started container redis
From the steps above, we can see that events are actually a kind of resource in a Kubernetes cluster. When the state of a resource in the cluster changes, new events can be generated.
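Diving into Events
A single Event object
Since events are a kind of resource in a Kubernetes cluster, each one should normally carry its name in metadata.name so that it can be operated on individually. We can therefore print the names with the following command: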
- (MoeLove) ➜ kubectl -n moelove get events --sort-by='{.metadata.creationTimestamp}' -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
- redis-687967dbc5-27vmr.16c4fb7bde8c69d2
- redis-687967dbc5.16c4fb7bde6b54c4
- redis.16c4fb7bde1bf769
- redis-687967dbc5-27vmr.16c4fb7bf8a0ab35
- redis-687967dbc5-27vmr.16c4fb7d8ecaeff8
- redis-687967dbc5-27vmr.16c4fb7d99709da9
- redis-687967dbc5-27vmr.16c4fb7d9be30c06
Pick any one of these event records and print it in YAML format:
- (MoeLove) ➜ kubectl -n moelove get events redis-687967dbc5-27vmr.16c4fb7bde8c69d2 -o yaml
- action: Binding
- apiVersion: v1
- eventTime: "2021-12-28T19:31:13.702987Z"
- firstTimestamp: null
- involvedObject:
- apiVersion: v1
- kind: Pod
- name: redis-687967dbc5-27vmr
- namespace: moelove
- resourceVersion: "330230"
- uid: 71b97182-5593-47b2-88cc-b3f59618c7aa
- kind: Event
- lastTimestamp: null
- message: Successfully assigned moelove/redis-687967dbc5-27vmr to kind-worker3
- metadata:
- creationTimestamp: "2021-12-28T19:31:13Z"
- name: redis-687967dbc5-27vmr.16c4fb7bde8c69d2
- namespace: moelove
- resourceVersion: "330235"
- uid: e5c03126-33b9-4559-9585-5e82adcd96b0
- reason: Scheduled
- reportingComponent: default-scheduler
- reportingInstance: default-scheduler-kind-control-plane
- source: {}
- type: Normal
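It contains a lot of information, which we will not expand on for now. Let's look at another example.
Events in kubectl describe
We can run describe on the Deployment object and on the Pod object respectively, and get the following results (intermediate output omitted):
- On the Deployment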
- (MoeLove) ➜ kubectl -n moelove describe deploy/redis
- Name: redis
- Namespace: moelove
- ...
- Events:
- Type Reason Age From Message
- ---- ------ ---- ---- -------
- Normal ScalingReplicaSet 15m deployment-controller Scaled up replica set redis-687967dbc5 to 1
- On the Pod
- (MoeLove) ➜ kubectl -n moelove describe pods redis-687967dbc5-27vmr
- Name: redis-687967dbc5-27vmr
- Namespace: moelove
- Priority: 0
- Events:
- Type Reason Age From Message
- ---- ------ ---- ---- -------
- Normal Scheduled 18m default-scheduler Successfully assigned moelove/redis-687967dbc5-27vmr to kind-worker3
- Normal Pulling 18m kubelet Pulling image "ghcr.io/moelove/redis:alpine"
- Normal Pulled 17m kubelet Successfully pulled image "ghcr.io/moelove/redis:alpine" in 6.814310968s
- Normal Created 17m kubelet Created container redis
- Normal Started 17m kubelet Started container redis
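We can see that when describing different resource objects, the events shown are only those directly associated with that object. When describing the Deployment, the Pod-related events do not appear.
This shows that an Event object contains information about the resource object it describes; the two are directly linked.
Combined with the single Event object we saw earlier, we find that the involvedObject field holds the information about the resource object associated with that Event.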
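To make the role of involvedObject concrete, here is a minimal Go sketch of selecting only the events that belong to one object. It is an in-memory illustration under stated assumptions, not kubectl's actual lookup: ObjectRef, Event, and EventsFor are hypothetical names, and the match is a plain comparison on kind, namespace, and name.

```go
package main

import "fmt"

// ObjectRef is a hypothetical subset of the involvedObject fields.
type ObjectRef struct {
	Kind      string
	Namespace string
	Name      string
}

// Event is a hypothetical, trimmed-down event record.
type Event struct {
	Reason         string
	InvolvedObject ObjectRef
}

// EventsFor keeps only the events whose involvedObject equals ref.
func EventsFor(events []Event, ref ObjectRef) []Event {
	var out []Event
	for _, e := range events {
		if e.InvolvedObject == ref {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	pod := ObjectRef{Kind: "Pod", Namespace: "moelove", Name: "redis-687967dbc5-27vmr"}
	deploy := ObjectRef{Kind: "Deployment", Namespace: "moelove", Name: "redis"}
	events := []Event{
		{Reason: "ScalingReplicaSet", InvolvedObject: deploy},
		{Reason: "Scheduled", InvolvedObject: pod},
		{Reason: "Pulling", InvolvedObject: pod},
	}
	// Only the Deployment's own event is selected, mirroring what describe shows.
	for _, e := range EventsFor(events, deploy) {
		fmt.Println(e.InvolvedObject.Kind, e.InvolvedObject.Name, e.Reason)
	}
}
```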
Understanding Events further
Let's look at the following example: create a Deployment, but with an image that does not exist:
- (MoeLove) ➜ kubectl -n moelove create deployment non-exist --image=ghcr.io/moelove/non-exist
- deployment.apps/non-exist created
- (MoeLove) ➜ kubectl -n moelove get pods
- NAME READY STATUS RESTARTS AGE
- non-exist-d9ddbdd84-tnrhd 0/1 ErrImagePull 0 11s
- redis-687967dbc5-27vmr 1/1 Running 0 26m
We can see that the Pod is currently in the ErrImagePull state. List the events in the current namespace (I have omitted the earlier records for deploy/redis):
- (MoeLove) ➜ kubectl -n moelove get events --sort-by='{.metadata.creationTimestamp}'
- LAST SEEN TYPE REASON OBJECT MESSAGE
- 35s Normal SuccessfulCreate replicaset/non-exist-d9ddbdd84 Created pod: non-exist-d9ddbdd84-tnrhd
- 35s Normal ScalingReplicaSet deployment/non-exist Scaled up replica set non-exist-d9ddbdd84 to 1
- 35s Normal Scheduled pod/non-exist-d9ddbdd84-tnrhd Successfully assigned moelove/non-exist-d9ddbdd84-tnrhd to kind-worker3
- 17s Warning Failed pod/non-exist-d9ddbdd84-tnrhd Error: ErrImagePull
- 17s Warning Failed pod/non-exist-d9ddbdd84-tnrhd Failed to pull image "ghcr.io/moelove/non-exist": rpc error: code = Unknown desc = failed to pull and unpack image "ghcr.io/moelove/non-exist:latest": failed to resolve reference "ghcr.io/moelove/non-exist:latest": failed to authorize: failed to fetch anonymous token: unexpected status: 403 Forbidden
- 18s Normal Pulling pod/non-exist-d9ddbdd84-tnrhd Pulling image "ghcr.io/moelove/non-exist"
- 4s Warning Failed pod/non-exist-d9ddbdd84-tnrhd Error: ImagePullBackOff
- 4s Normal BackOff pod/non-exist-d9ddbdd84-tnrhd Back-off pulling image "ghcr.io/moelove/non-exist"
Run describe on this Pod:
- (MoeLove) ➜ kubectl -n moelove describe pods non-exist-d9ddbdd84-tnrhd
- ...
- Events:
- Type Reason Age From Message
- ---- ------ ---- ---- -------
- Normal Scheduled 4m default-scheduler Successfully assigned moelove/non-exist-d9ddbdd84-tnrhd to kind-worker3
- Normal Pulling 2m22s (x4 over 3m59s) kubelet Pulling image "ghcr.io/moelove/non-exist"
- Warning Failed 2m21s (x4 over 3m59s) kubelet Failed to pull image "ghcr.io/moelove/non-exist": rpc error: code = Unknown desc = failed to pull and unpack image "ghcr.io/moelove/non-exist:latest": failed to resolve reference "ghcr.io/moelove/non-exist:latest": failed to authorize: failed to fetch anonymous token: unexpected status: 403 Forbidden
- Warning Failed 2m21s (x4 over 3m59s) kubelet Error: ErrImagePull
- Warning Failed 2m9s (x6 over 3m58s) kubelet Error: ImagePullBackOff
- Normal BackOff 115s (x7 over 3m58s) kubelet Back-off pulling image "ghcr.io/moelove/non-exist"
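We can see that this output differs from that of the Pod that ran correctly. The main difference is in the Age column, where we now see output such as 115s (x7 over 3m58s).
It means that this kind of event has occurred 7 times over the last 3m58s, and the most recent occurrence was 115s ago.
However, when we run kubectl get events directly, we do not see 7 repeated events. This shows that Kubernetes automatically merges duplicate events.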
Select the last event (using the method described earlier) and print it in YAML format:
- (MoeLove) ➜ kubectl -n moelove get events non-exist-d9ddbdd84-tnrhd.16c4fce570cfba46 -o yaml
- apiVersion: v1
- count: 43
- eventTime: null
- firstTimestamp: "2021-12-28T19:57:06Z"
- involvedObject:
- apiVersion: v1
- fieldPath: spec.containers{non-exist}
- kind: Pod
- name: non-exist-d9ddbdd84-tnrhd
- namespace: moelove
- resourceVersion: "333366"
- uid: 33045163-146e-4282-b559-fec19a189a10
- kind: Event
- lastTimestamp: "2021-12-28T20:07:14Z"
- message: Back-off pulling image "ghcr.io/moelove/non-exist"
- metadata:
- creationTimestamp: "2021-12-28T19:57:06Z"
- name: non-exist-d9ddbdd84-tnrhd.16c4fce570cfba46
- namespace: moelove
- resourceVersion: "334638"
- uid: 60708be0-23b9-481b-a290-dd208fed6d47
- reason: BackOff
- reportingComponent: ""
- reportingInstance: ""
- source:
- component: kubelet
- host: kind-worker3
- type: Normal
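Here we can see a count field, which indicates how many times events of the same kind have occurred, as well as firstTimestamp and lastTimestamp, which record when this event first occurred and when it most recently occurred. This explains the time span shown for events in the earlier output.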
Thoroughly understanding Events
The following is an arbitrarily chosen event; we can see some of the fields it contains:
- apiVersion: v1
- count: 1
- eventTime: null
- firstTimestamp: "2021-12-28T19:31:13Z"
- involvedObject:
- apiVersion: apps/v1
- kind: ReplicaSet
- name: redis-687967dbc5
- namespace: moelove
- resourceVersion: "330227"
- uid: 11e98a9d-9062-4ccb-92cb-f51cc74d4c1d
- kind: Event
- lastTimestamp: "2021-12-28T19:31:13Z"
- message: 'Created pod: redis-687967dbc5-27vmr'
- metadata:
- creationTimestamp: "2021-12-28T19:31:13Z"
- name: redis-687967dbc5.16c4fb7bde6b54c4
- namespace: moelove
- resourceVersion: "330231"
- uid: 8e37ec1e-b3a1-420c-96d4-3b3b2995c300
- reason: SuccessfulCreate
- reportingComponent: ""
- reportingInstance: ""
- source:
- component: replicaset-controller
- type: Normal
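The main fields have the following meanings:
- count: how many times events of the same kind have occurred (described earlier)
- involvedObject: the resource object directly associated with this event (described earlier), structured as follows: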
- type ObjectReference struct {
- Kind string
- Namespace string
- Name string
- UID types.UID
- APIVersion string
- ResourceVersion string
- FieldPath string
- }
- source: the component directly associated with the event (the one that reported it), structured as follows:
- type EventSource struct {
- Component string
- Host string
- }
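- reason: a short summary (or a fixed code), well suited for use as a filter condition. It is mainly intended to be machine-readable, and there are currently more than 50 such codes;
- message: a more detailed, human-readable description
- type: currently only two types exist, Normal and Warning. The source code also documents their meanings: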
- // staging/src/k8s.io/api/core/v1/types.go
- const (
- // Information only and will not cause any problems
- EventTypeNormal string = "Normal"
- // These events are to warn that something might go wrong
- EventTypeWarning string = "Warning"
- )
So once we collect all these events as tracing spans, we can classify them by source, correlate them by involvedObject, and sort them by time.
Summary
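In this article, I used two examples, a Deployment that deployed correctly and a Deployment that used a nonexistent image, to explain in depth what Event objects actually do and what each of their fields means.
For Kubernetes, events contain a lot of useful information, but this information has no effect on Kubernetes itself, and events are not actual Kubernetes logs either. By default, events in Kubernetes are cleaned up after 1 hour in order to release the etcd resources they occupy.
So, to help cluster administrators better understand what has happened, in production we usually collect the events of the Kubernetes cluster as well. The tool I personally recommend is: https://github.com/opsgenie/kubernetes-event-exporter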
This article is reposted from the WeChat official account "MoeLove". To repost it, please contact the MoeLove official account.