
Kafka Producer Partitioning Strategies Across Different Versions

1 Problem Symptoms

A microservice uses the log4j Kafka appender to write its logs to Kafka. The business team reported that Kafka disconnection messages like the one below appear frequently in the microservice log, and asked us to analyze why the Kafka connections are being dropped and to confirm whether any microservice logs could be lost as a result.

2024-08-08 07:10:06.994 -INFO [kafka-producer-network-thread | producer-1] org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Node 2 disconnected.

[Figure: excerpt of the microservice log showing the recurring Kafka disconnection messages]

2 Problem Analysis

  • Because the Kafka cluster has multiple broker nodes and the Kafka topic has multiple partitions, the microservice, acting as a Kafka producer, establishes multiple TCP connections under the hood in order to send logs to the individual topic partitions.
  • Filtering the microservice logs with awk shows that the TCP connection to any given Kafka broker is not dropped frequently, and the disconnections generally happen during off-peak hours, when little logging is taking place.

  • To confirm this further, the TCP connection state can be inspected on the microservice node (e.g. netstat -apn -t | grep -i 9092 or ss -antop | grep -i 9092), and packets can be captured and analyzed with tcpdump (e.g. nohup tcpdump tcp port 9092 -i any -s100 -B 8192 -p -n -w /tmp/9092.pcap >> /tmp/9092.out 2>&1 &).
  • In the production environment we captured packets for roughly two hours with the command above and analyzed the resulting pcap file. The analysis showed that every TCP disconnection was initiated by the microservice (i.e. the Kafka producer): whenever the microservice had not needed to send logs to a particular Kafka topic partition for 540 seconds, it closed that partition's connection because of the idle timeout.

3 Root Cause

3.1 Kafka's idle-connection detection and cleanup: the producer parameter connections.max.idle.ms

To reduce the load on the Kafka brokers, the Kafka producer detects and cleans up idle connections:

  • When a TCP connection has been idle for too long, the Kafka producer proactively closes it and logs org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Node 2 disconnected;
  • The idle threshold is the parameter connections.max.idle.ms, which defaults to 540000 ms, i.e. 9 minutes. This matches the tcpdump capture above, where the producer closed connections after 540 seconds of idleness (a configuration sketch follows this list).
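
As a hedged illustration (not taken from the article; the broker address and the topic name "app-logs" are placeholders), the snippet below shows where connections.max.idle.ms would be set on a Java producer:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class IdleTimeoutDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // connections.max.idle.ms: the producer closes a connection after this much idle time.
        // The default is 540000 (9 minutes); raising it keeps rarely used partition
        // connections open longer, at the cost of more open sockets on the brokers.
        props.put(ProducerConfig.CONNECTIONS_MAX_IDLE_MS_CONFIG, 540_000);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // key == null, so the default partitioner chooses the partition
            producer.send(new ProducerRecord<>("app-logs", null, "a log line"));
        }
    }
}

Raising the value only postpones the "Node N disconnected" messages; as the summary below explains, the disconnects themselves do not lose any log data.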

3.2 Kafka's partitioning strategy when key = null: Sticky Partitioner vs Round Robin Partitioner

  • When a Kafka producer writes a ProducerRecord to a topic, the concrete partition for the record is chosen by the Partitioner. When the ProducerRecord's key is null, the default partitioner is used: in Kafka 2.3 and earlier the default strategy is the Round Robin Partitioner, while in Kafka 2.4 and later it is the Sticky Partitioner.
  • In this case the microservice's log4j2.xml does not configure a Kafka record key, and the service ships with kafka-clients-3.4.0.jar, so the Sticky Partitioner is used rather than the Round Robin Partitioner. Consecutive ProducerRecords belong to the same record batch and "stick" to the same partition, so during log off-peak periods some partitions may receive no data for a long time. If that idle time exceeds the producer parameter connections.max.idle.ms (default 540000 ms, i.e. 9 minutes), the producer closes the TCP connection for that partition. The effect is more pronounced when the topic has many partitions (say 100), the log volume is low, and batch.size and linger.ms are large (a sketch of an explicit partitioner choice follows this list).
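
If spreading null-key records evenly is preferred over sticky batching, the partitioner can be configured explicitly. This is a hedged sketch using the same placeholder names as above, not a change the article recommends:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RoundRobinPartitioner;
import org.apache.kafka.common.serialization.StringSerializer;

public class PartitionerChoiceDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Override the default sticky behaviour for null-key records by naming
        // the round-robin partitioner via partitioner.class.
        props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, RoundRobinPartitioner.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 10; i++) {
                // With RoundRobinPartitioner these null-key records rotate across partitions
                // instead of sticking to one partition per batch.
                producer.send(new ProducerRecord<>("app-logs", null, "log line " + i));
            }
        }
    }
}

Note that round-robin produces one small batch per partition, which is exactly the request overhead and latency the Sticky Partitioner was introduced to avoid (see section 5.3).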

4 Summary

  • To reduce the load on the Kafka brokers, the Kafka producer detects and cleans up idle connections: when a TCP connection has been idle for longer than connections.max.idle.ms (default 540000 ms, i.e. 9 minutes), the producer proactively closes it and logs org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Node 2 disconnected. All log records are still delivered normally; no log data is lost because of these disconnects.
  • With a Kafka 2.4+ producer and no ProducerRecord key, the idle-connection closures become more noticeable during off-peak periods when the microservice produces few logs and batch.size and linger.ms are large. The default Sticky Partitioner is then in effect: consecutive ProducerRecords belong to the same record batch and "stick" to the same partition, so some partitions may receive no data for a long time. Once that idle time exceeds connections.max.idle.ms, the producer closes the TCP connection for the corresponding partition.

5 Technical Background

5.1 Bidirectional version compatibility between Kafka clients and brokers

  • Kafka clients and brokers are bidirectionally compatible, i.e. the client and broker versions may differ: Kafka has a "bidirectional" client compatibility policy. In other words, new clients can talk to old servers, and old clients can talk to new servers. This allows users to upgrade either clients or servers without experiencing any downtime.
  • In this case, the Kafka cluster (broker side) runs kafka_2.11-2.2.0.jar;
  • The microservice, writing logs to Kafka through log4j, uses the producer from kafka-clients-3.4.0.jar;
  • As of September 2024, the latest Apache Kafka release is 3.8.

5.2 Kafka's idle-connection detection and cleanup: the producer parameter connections.max.idle.ms

To reduce the load on the Kafka brokers, the Kafka producer detects and cleans up idle connections:

  • When a TCP connection has been idle for too long, the Kafka producer proactively closes it and logs org.apache.kafka.clients.NetworkClient - [Producer clientId=producer-1] Node 2 disconnected;
  • The idle threshold is the parameter connections.max.idle.ms, which defaults to 540000 ms, i.e. 9 minutes, consistent with the tcpdump capture above showing the producer disconnecting after 540 seconds of idleness.

5.3 Kafka's partitioning strategy when key = null: Sticky Partitioner vs Round Robin Partitioner

  • When a Kafka producer writes a ProducerRecord to a topic, the concrete partition for the record is chosen by the Partitioner;
  • When the ProducerRecord's key is null, the default partitioner is used, and it varies by version: the Round Robin Partitioner for Kafka 2.3 and below, and the Sticky Partitioner for Kafka 2.4 and above:

With Kafka producer <= v2.3: when no partition and no key are specified, the default partitioner sends data in a round-robin fashion. This results in more batches (one batch per partition) and smaller batches (imagine a topic with 100 partitions), which is a problem because smaller batches lead to more requests as well as higher latency.

With Kafka producer >= v2.4: the Sticky Partitioner improves producer performance, especially at high throughput. The goal is to send all records of a batch to a single partition rather than to multiple partitions, in order to improve batching. The sticky partitioner "sticks" to a partition until the batch is full or linger.ms has elapsed; after sending the batch, the producer changes the partition that is "sticky". This leads to larger batches and reduced latency (requests are larger and batch.size is more likely to be reached). Over time the records are still spread evenly across partitions, so the balance of the cluster is not affected. The sketch below shows the batching parameters involved.
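
As a hedged sketch (parameter values, broker address and topic name are illustrative, not from the article), the following program sends null-key records and prints the partition each one landed on, which makes the sticky behaviour visible:

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.Future;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class StickyBatchingDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Larger batches and a longer linger make the sticky behaviour easier to observe:
        // null-key records accumulate into one batch bound for a single partition.
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 32 * 1024); // bytes per batch
        props.put(ProducerConfig.LINGER_MS_CONFIG, 100);        // wait up to 100 ms to fill a batch

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            List<Future<RecordMetadata>> futures = new ArrayList<>();
            for (int i = 0; i < 20; i++) {
                // key == null, so the default (sticky) partitioner picks the partition
                futures.add(producer.send(new ProducerRecord<>("app-logs", null, "log line " + i)));
            }
            producer.flush(); // push out any batch still waiting on linger.ms
            for (int i = 0; i < futures.size(); i++) {
                System.out.printf("record %d -> partition %d%n", i, futures.get(i).get().partition());
            }
        }
    }
}

Against a multi-partition topic, consecutive records typically report the same partition until a batch is sent, after which the sticky partition changes.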

5.4 Related source code and reference links

-- Related source code
org.apache.kafka.clients.producer.Partitioner
org.apache.kafka.clients.producer.internals.DefaultPartitioner
org.apache.kafka.clients.producer.UniformStickyPartitioner
org.apache.kafka.clients.producer.RoundRobinPartitioner
-- Reference links
KIP-480: Sticky Partitioner: https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner
KIP-794: Strictly Uniform Sticky Partitioner: https://cwiki.apache.org/confluence/display/KAFKA/KIP-794%3A+Strictly+Uniform+Sticky+Partitioner


Editor: 武曉燕    Source: 明哥的IT隨筆