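Exploring the Impact of Network Latency on Transactions
1. Background
Recently, while testing data synchronization, I needed to sync data from Kafka into a database through DTS. Syncing 4 GB of data took a little over 4 hours, which seemed unreasonable. CPU and I/O utilization on the database host were both low, so neither was the bottleneck. Further investigation showed that Kafka, DTS, and the database were not in the same data center, and the high network latency between them was slowing the sync.
After Kafka, DTS, and the database were deployed in the same data center, the sync became markedly faster and finished in only 15 minutes.
2. Reproducing the problem
This test uses sysbench to load data and run a benchmark under different network latencies, to compare how network latency affects database transactions.
2.1 Check the current network latency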
$ ping 192.168.137.162
PING 192.168.137.162 (192.168.137.162) 56(84) bytes of data.
64 bytes from 192.168.137.162: icmp_seq=1 ttl=64 time=0.299 ms
64 bytes from 192.168.137.162: icmp_seq=2 ttl=64 time=0.180 ms
64 bytes from 192.168.137.162: icmp_seq=3 ttl=64 time=0.297 ms
64 bytes from 192.168.137.162: icmp_seq=4 ttl=64 time=0.329 ms
64 bytes from 192.168.137.162: icmp_seq=5 ttl=64 time=0.263 ms
64 bytes from 192.168.137.162: icmp_seq=6 ttl=64 time=0.367 ms
64 bytes from 192.168.137.162: icmp_seq=7 ttl=64 time=0.237 ms
64 bytes from 192.168.137.162: icmp_seq=8 ttl=64 time=0.160 ms
64 bytes from 192.168.137.162: icmp_seq=9 ttl=64 time=0.180 ms
64 bytes from 192.168.137.162: icmp_seq=10 ttl=64 time=0.257 ms
The two hosts are currently in the same data center, and the network latency is around 0.3 ms.
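Reading the latency off individual ping lines is approximate. As a minimal sketch, the helper below averages the `time=` values from ping output; the function name `avg_rtt` is a hypothetical name chosen for illustration, not part of any tool.

```shell
# Print the mean round-trip time in ms from ping output read on stdin.
# Only lines containing "time=<value> ms" are counted.
avg_rtt() {
    awk -F'time=' '
        /time=/ { split($2, parts, " "); sum += parts[1]; n++ }
        END     { if (n > 0) printf "%.3f\n", sum / n }
    '
}

# Usage (assumes the same target host as above):
#   ping -c 10 192.168.137.162 | avg_rtt
```

Most ping implementations also print a min/avg/max summary when the command exits, so the helper is mainly useful when post-processing saved output.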
2.2 (Normal latency) Writing data with sysbench
2.2.1 Create one table and insert 5 million rows
$ time sysbench lua/oltp_read_write.lua --mysql-db=sysbench --mysql-host=192.168.137.162 --mysql-port=3307 --mysql-user=root --mysql-password=greatdb --tables=1 --table_size=5000000 --report-interval=2 --threads=10 --time=600 --mysql-ignore-errors=all prepare
sysbench 1.1.0-df89d34 (using bundled LuaJIT 2.1.0-beta3)
Initializing worker threads...
Creating table 'sbtest1'...
Inserting 5000000 records into 'sbtest1'
Creating a secondary index on 'sbtest1'...
real    1m56.459s
user    0m7.187s
sys     0m0.400s
Loading 5 million rows took 1m56s.
2.2.2 Run the sysbench benchmark for 5 minutes (300 s)
SQL statistics:
queries performed:
read: 1711374
write: 488964
other: 244482
total: 2444820
transactions: 122241 (407.37 per sec.)
queries: 2444820 (8147.45 per sec.)
ignored errors: 0 (0.00 per sec.)
reconnects: 0 (0.00 per sec.)
Throughput:
events/s (eps): 407.3725
time elapsed: 300.0718s
total number of events: 122241
Latency (ms):
min: 10.68
avg: 122.72
max: 1267.88
95th percentile: 502.20
sum: 15000894.94
Threads fairness:
events (avg/stddev): 2444.8200/14.99
execution time (avg/stddev): 300.0179/0.02
This gives TPS: 407.37 and QPS: 8147.45.
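The command that produced this report is not shown. A minimal sketch of what it might look like follows, reusing the connection options from the prepare step. The thread count and duration are assumptions inferred from the report (122241 events at about 2444.82 events per thread suggests 50 threads, and the elapsed time is about 300 s), not values confirmed by the original test.

```shell
# ASSUMPTIONS: THREADS and DURATION are inferred from the report, not from the original command.
THREADS=50
DURATION=300

# Same connection options as the prepare step, with "run" instead of "prepare".
SYSBENCH_RUN="sysbench lua/oltp_read_write.lua --mysql-db=sysbench --mysql-host=192.168.137.162 --mysql-port=3307 --mysql-user=root --mysql-password=greatdb --tables=1 --table_size=5000000 --report-interval=2 --threads=$THREADS --time=$DURATION run"

# Print the command for review; uncomment the last line to execute it.
echo "$SYSBENCH_RUN"
# $SYSBENCH_RUN
```

Running the same command unchanged before and after adding the delay keeps the two reports comparable.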
2.3 Simulate network latency with the tc command
tc is a Linux network management tool for configuring and managing traffic control. It can limit bandwidth, add delay, and inject packet loss, and it can be used to implement QoS (Quality of Service).
# Add a 10 ms delay on the ens3 interface
tc qdisc add dev ens3 root netem delay 10ms
If tc reports the following error, the netem module is missing; install the kernel-modules-extra package, which provides it:
# Error
tc qdisc add dev ens3 root netem delay 10ms
Error: Specified qdisc not found.
# Install the extra kernel modules
$ yum install kernel-modules-extra*
# Reboot the host
$ reboot
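The delay stays in place until it is removed, so it helps to have commands ready for checking and clearing it. The helpers below are a sketch using standard `tc` and `modinfo` invocations; the function names are hypothetical, `ens3` is taken from the example above, and the `tc` commands need root privileges.

```shell
DEV=ens3   # interface name from the example above; adjust for your host

# Return success if the netem kernel module is installed for the running kernel.
netem_available() { modinfo sch_netem >/dev/null 2>&1; }

# Show the qdisc currently attached to the interface (look for "netem delay").
show_qdisc() { tc qdisc show dev "$DEV"; }

# Remove the root qdisc, which drops the netem delay and restores the default.
clear_delay() { tc qdisc del dev "$DEV" root; }

# Usage (as root):
#   netem_available || echo "sch_netem not found"
#   show_qdisc
#   clear_delay
```

Clearing the delay after the test matters because every other connection through that interface is slowed as well.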
2.4 Check the current network latency
$ ping 192.168.137.162
PING 192.168.137.162 (192.168.137.162) 56(84) bytes of data.
64 bytes from 192.168.137.162: icmp_seq=1 ttl=64 time=10.5 ms
64 bytes from 192.168.137.162: icmp_seq=2 ttl=64 time=10.4 ms
64 bytes from 192.168.137.162: icmp_seq=3 ttl=64 time=10.5 ms
64 bytes from 192.168.137.162: icmp_seq=4 ttl=64 time=10.4 ms
64 bytes from 192.168.137.162: icmp_seq=5 ttl=64 time=10.4 ms
64 bytes from 192.168.137.162: icmp_seq=6 ttl=64 time=10.4 ms
64 bytes from 192.168.137.162: icmp_seq=7 ttl=64 time=10.4 ms
64 bytes from 192.168.137.162: icmp_seq=8 ttl=64 time=10.5 ms
64 bytes from 192.168.137.162: icmp_seq=9 ttl=64 time=10.5 ms
64 bytes from 192.168.137.162: icmp_seq=10 ttl=64 time=10.2 ms
2.5 (10 ms latency) Writing data with sysbench
2.5.1 Create one table and insert 5 million rows
$ time sysbench lua/oltp_read_write.lua --mysql-db=sysbench --mysql-host=192.168.137.162 --mysql-port=3307 --mysql-user=root --mysql-password=greatdb --tables=1 --table_size=5000000 --report-interval=2 --threads=10 --time=600 --mysql-ignore-errors=all prepare
sysbench 1.1.0-df89d34 (using bundled LuaJIT 2.1.0-beta3)
Initializing worker threads...
Creating table 'sbtest1'...
Inserting 5000000 records into 'sbtest1'
Creating a secondary index on 'sbtest1'...
real    2m11.656s
user    0m7.314s
sys     0m0.470s
Loading 5 million rows took 2m11s.
2.5.2 Run the sysbench benchmark for 5 minutes (300 s)
SQL statistics:
queries performed:
read: 788214
write: 225204
other: 112602
total: 1126020
transactions: 56301 (187.41 per sec.)
queries: 1126020 (3748.16 per sec.)
ignored errors: 0 (0.00 per sec.)
reconnects: 0 (0.00 per sec.)
Throughput:
events/s (eps): 187.4079
time elapsed: 300.4196s
total number of events: 56301
Latency (ms):
min: 210.14
avg: 266.68
max: 493.91
95th percentile: 419.45
sum: 15014235.80
Threads fairness:
events (avg/stddev): 1126.0200/1.16
execution time (avg/stddev): 300.2847/0.16
This gives TPS: 187.41 and QPS: 3748.16.
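The two reports can be cross-checked with a little arithmetic. The sketch below derives the number of statements per transaction and estimates throughput as threads divided by average latency (Little's law for a closed loop). The thread count of 50 is an assumption inferred from the reports (total events divided by events per thread), and the function names are hypothetical.

```shell
# ASSUMPTION: 50 threads, inferred from 122241 events / 2444.82 events per thread.
THREADS=50

# Statements per transaction: total queries / transactions.
queries_per_txn() { awk -v q="$1" -v t="$2" 'BEGIN { printf "%.0f\n", q / t }'; }

# Throughput estimate: threads / average latency, with latency given in ms.
expected_tps() { awk -v n="$1" -v ms="$2" 'BEGIN { printf "%.1f\n", n / (ms / 1000) }'; }

queries_per_txn 1126020 56301     # statements per transaction (10 ms run)
expected_tps "$THREADS" 122.72    # estimate for the baseline run
expected_tps "$THREADS" 266.68    # estimate for the 10 ms run
```

Each transaction turns out to contain 20 statements, and 20 round trips at roughly 10.5 ms each come to about 210 ms, which lines up with the reported minimum latency of 210.14 ms. The throughput estimates land close to the reported 407.37 and 187.41 TPS, which suggests the added delay, rather than the database, accounts for the drop.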
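3. Summary
The tests above show that higher network latency affects both data loading and the number of transactions executed per second. With 10 ms of added delay, loading 5 million rows slowed from 1m56s to 2m11s (about 13%), while TPS fell from 407.37 to 187.41 (about 54%). The transactional workload suffers more because each transaction issues 20 statements in sequence (2444820 queries / 122241 transactions), and every statement waits for one network round trip. When running performance tests or data synchronization, deploy the benchmark or sync tool in the same data center as the database whenever possible, so that network latency does not distort the results.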