The fio benchmarking tool and I/O queue depth: understanding and common misconceptions
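As block devices have evolved, and especially since SSDs appeared, devices offer more and more internal parallelism. One trick to getting the most out of them is to raise the iodepth: feed the device more I/O requests at once, so that the elevator algorithm and the device have a chance to schedule, merge, and process them in parallel internally, which improves overall efficiency.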
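Applications generally perform I/O in one of two ways: synchronously or asynchronously. Synchronous I/O can issue only one request at a time and returns only after the kernel has completed it, so for a single thread the iodepth never exceeds 1. This can be worked around by running multiple threads concurrently; typically we use 16 to 32 threads working at the same time to fill up the iodepth. Asynchronous I/O uses something like libaio, the Linux native AIO interface, to submit a batch of requests at once and then wait for a batch of completions, which reduces the number of round trips and is more efficient.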
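To make the synchronous case concrete, here is a minimal sketch, assuming a small scratch file and illustrative constants (BLOCK_SIZE, NUM_BLOCKS and NUM_THREADS are not from the article). Each worker thread issues one blocking os.pread at a time, so the number of reads in flight is bounded by the number of threads. These reads are buffered and will mostly hit the page cache, so the sketch shows the threading pattern only, not real device queue depth.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 512   # bytes per read (illustrative)
NUM_BLOCKS = 64    # size of the scratch file, in blocks (illustrative)
NUM_THREADS = 16   # each thread keeps at most one synchronous read in flight


def make_scratch_file():
    """Create a temporary file whose i-th block is filled with byte value i."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        for i in range(NUM_BLOCKS):
            f.write(bytes([i % 256]) * BLOCK_SIZE)
        return f.name


def read_block(fd, index):
    # os.pread blocks until the kernel has completed this single request.
    return os.pread(fd, BLOCK_SIZE, index * BLOCK_SIZE)


def read_all_blocks(path):
    """Read every block using NUM_THREADS concurrent synchronous readers."""
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=NUM_THREADS) as pool:
            return list(pool.map(lambda i: read_block(fd, i), range(NUM_BLOCKS)))
    finally:
        os.close(fd)


if __name__ == "__main__":
    path = make_scratch_file()
    try:
        blocks = read_all_blocks(path)
        print(len(blocks), "blocks read with up to", NUM_THREADS, "reads in flight")
    finally:
        os.remove(path)
```

With a single worker the same code degenerates to a depth of 1, which is the situation the paragraph above describes for one synchronous thread.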
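The right I/O queue depth is usually very sensitive to the device in question, so how do we use fio to find a reasonable value?
Let's first look at the parameters related to iodepth: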
iodepth=int
    Number of I/O units to keep in flight against the file. Note that increasing iodepth beyond 1 will not affect synchronous ioengines (except for small degrees when verify_async is in use). Even async engines may impose OS restrictions causing the desired depth not to be achieved. This may happen on Linux when using libaio and not setting direct=1, since buffered IO is not async on that OS. Keep an eye on the IO depth distribution in the fio output to verify that the achieved depth is as expected. Default: 1.
iodepth_batch=int
    Number of I/Os to submit at once. Default: iodepth.
iodepth_batch_complete=int
    This defines how many pieces of IO to retrieve at once. It defaults to 1 which means that we’ll ask for a minimum of 1 IO in the retrieval process from the kernel. The IO retrieval will go on until we hit the limit set by iodepth_low. If this variable is set to 0, then fio will always check for completed events before queuing more IO. This helps reduce IO latency, at the cost of more retrieval system calls.
iodepth_low=int
    Low watermark indicating when to start filling the queue again. Default: iodepth.
direct=bool
    If true, use non-buffered I/O (usually O_DIRECT). Default: false.
fsync=int
    How many I/Os to perform before issuing an fsync(2) of dirty data. If 0, don’t sync. Default: 0.
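The documentation explains fairly clearly what these parameters do under the libaio engine, but allow me to walk through the I/O request flow once more: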
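The libaio engine calls io_setup with the iodepth value to prepare a context that can hold iodepth I/Os at a time, and allocates an I/O request queue to hold them. While the benchmark runs, fio generates the I/O requests the job asks for and puts them on this queue. When the number of queued I/Os reaches iodepth_batch, it calls io_submit to submit them as one batch, and then calls io_getevents to reap completed I/Os. How many are reaped each time? Because the timeout is set to 0 when reaping, it takes however many have completed, up to iodepth_batch_complete. As I/Os are reaped, the number in the queue shrinks and new ones must be added. When? Once the count drops to iodepth_low, the queue is refilled, which ensures that the OS always sees at least iodepth_low I/Os waiting at the elevator.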
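The flow described above can be sketched as a toy simulation. This is not fio's code: it is a simplified model under loud assumptions, namely that the device always has completions ready when we reap, that iodepth_batch_complete is at least 1, and that the names DepthSettings and simulate are invented here for illustration.

```python
from dataclasses import dataclass


@dataclass
class DepthSettings:
    iodepth: int
    iodepth_batch: int
    iodepth_low: int
    iodepth_batch_complete: int


def simulate(settings, total_ios):
    """Return a trace of (event, count, in_flight_after) tuples.

    Toy model: fill the queue in batches of iodepth_batch until iodepth
    I/Os are in flight, then reap up to iodepth_batch_complete at a time
    until the in-flight count is at or below iodepth_low, then refill.
    """
    assert settings.iodepth_batch >= 1 and settings.iodepth_batch_complete >= 1
    in_flight = 0   # submitted but not yet reaped
    issued = 0      # total submitted so far
    completed = 0   # total reaped so far
    trace = []
    while completed < total_ios:
        # Fill phase: submit batches until the queue is full or nothing is left.
        while in_flight < settings.iodepth and issued < total_ios:
            batch = min(settings.iodepth_batch,
                        settings.iodepth - in_flight,
                        total_ios - issued)
            in_flight += batch
            issued += batch
            trace.append(("submit", batch, in_flight))
        # Reap phase: drain to the low watermark, or fully once all I/O is issued.
        target = settings.iodepth_low if issued < total_ios else 0
        while True:
            reaped = min(settings.iodepth_batch_complete, in_flight)
            in_flight -= reaped
            completed += reaped
            trace.append(("reap", reaped, in_flight))
            if in_flight <= target:
                break
    return trace


if __name__ == "__main__":
    for event in simulate(DepthSettings(16, 8, 8, 8), total_ios=32):
        print(event)
```

In this model, the in-flight count never exceeds iodepth, and with the four values used in the job file later in this article the submits and reaps both happen in groups of 8.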
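Note: the documentation's description of these parameters has some small problems; for example, some of the stated default values are not quite right. My advice is therefore to set these parameters explicitly.
How do we confirm that fio is really working according to our configuration? fio provides a diagnostic option, --debug=io. Let's demonstrate: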
    # cat nvdisk-test
    [global]
    bs=512
    ioengine=libaio
    userspace_reap
    rw=randrw
    rwmixwrite=20
    time_based
    runtime=180
    direct=1
    group_reporting
    randrepeat=0
    norandommap
    ramp_time=6
    iodepth=16
    iodepth_batch=8
    iodepth_low=8
    iodepth_batch_complete=8
    exitall
    [test]
    filename=/dev/nvdisk0
    numjobs=1
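A few points in the fio job configuration deserve close attention:
1. For libaio to work asynchronously, the file must be opened in direct mode (direct=1), since buffered I/O is not asynchronous on Linux.
2. The block size must be a multiple of the sector size.
3. userspace_reap speeds up the reaping of asynchronous I/O.
4. ramp_time reduces the impact of logging on high-speed I/O.
5. With direct enabled (and fsync left at its default of 0), no fsync is issued.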
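The checklist above can be turned into a quick sanity check on a job file. The following is a minimal sketch, assuming the options live in the [global] section, that bs is a plain byte count (fio also accepts suffixed sizes such as 4k, which this sketch does not handle), and a 512-byte sector. The function name check_job is hypothetical. It relies on Python's configparser, which happens to accept fio's value-less options when allow_no_value is set.

```python
import configparser

DEPTH_KEYS = ("iodepth", "iodepth_batch", "iodepth_low", "iodepth_batch_complete")
SECTOR_SIZE = 512  # assumed logical sector size in bytes

JOB = """\
[global]
bs=512
ioengine=libaio
userspace_reap
rw=randrw
rwmixwrite=20
time_based
runtime=180
direct=1
group_reporting
randrepeat=0
norandommap
ramp_time=6
iodepth=16
iodepth_batch=8
iodepth_low=8
iodepth_batch_complete=8
exitall
[test]
filename=/dev/nvdisk0
numjobs=1
"""


def check_job(text, sector_size=SECTOR_SIZE):
    """Return a list of human-readable problems found in the [global] section."""
    parser = configparser.ConfigParser(
        allow_no_value=True, interpolation=None, strict=False
    )
    parser.read_string(text)
    opts = dict(parser["global"]) if parser.has_section("global") else {}
    problems = []
    # The article recommends writing every depth-related option explicitly.
    for key in DEPTH_KEYS:
        if opts.get(key) is None:
            problems.append(f"{key} is not set explicitly")
    if opts.get("ioengine") == "libaio" and opts.get("direct") != "1":
        problems.append("ioengine=libaio is used without direct=1")
    bs = opts.get("bs")
    if bs is None or not bs.isdigit():
        problems.append("bs is missing or not a plain byte count")
    elif int(bs) % sector_size != 0:
        problems.append(f"bs={bs} is not a multiple of {sector_size}")
    return problems


if __name__ == "__main__":
    print(check_job(JOB) or "no problems found")
```

The check is deliberately narrow: it only covers the points listed above and says nothing about whether the chosen depth is a good one for the device.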
    # fio nvdisk-test --debug=io
    fio: set debug option io
    io 22441 load ioengine libaio
    io 22441 load ioengine libaio
    test: (g=0): rw=randrw, bs=512-512/512-512, ioengine=libaio, iodepth=16
    fio 2.0.5
    Starting 1 process
    io 22444 invalidate cache /dev/nvdisk0: 0/8589926400
    io 22444 fill_io_u: io_u 0x6d3210: off=3694285312/len=512/ddir=0//dev/nvdisk0
    io 22444 prep: io_u 0x6d3210: off=3694285312/len=512/ddir=0//dev/nvdisk0
    io 22444 ->prep(0x6d3210)=0
    io 22444 queue: io_u 0x6d3210: off=3694285312/len=512/ddir=0//dev/nvdisk0
    io 22444 fill_io_u: io_u 0x6d2f80: off=4595993600/len=512/ddir=0//dev/nvdisk0
    io 22444 prep: io_u 0x6d2f80: off=4595993600/len=512/ddir=0//dev/nvdisk0
    io 22444 ->prep(0x6d2f80)=0
    io 22444 queue: io_u 0x6d2f80: off=4595993600/len=512/ddir=0//dev/nvdisk0
    io 22444 fill_io_u: io_u 0x6d2cb0: off=3825244160/len=512/ddir=0//dev/nvdisk0
    io 22444 prep: io_u 0x6d2cb0: off=3825244160/len=512/ddir=0//dev/nvdisk0
    io 22444 ->prep(0x6d2cb0)=0
    io 22444 queue: io_u 0x6d2cb0: off=3825244160/len=512/ddir=0//dev/nvdisk0
    io 22444 fill_io_u: io_u 0x6d29a0: off=6994864640/len=512/ddir=0//dev/nvdisk0
    io 22444 prep: io_u 0x6d29a0: off=6994864640/len=512/ddir=0//dev/nvdisk0
    io 22444 ->prep(0x6d29a0)=0
    io 22444 queue: io_u 0x6d29a0: off=6994864640/len=512/ddir=0//dev/nvdisk0
    io 22444 fill_io_u: io_u 0x6d2710: off=2572593664/len=512/ddir=0//dev/nvdisk0
    io 22444 prep: io_u 0x6d2710: off=2572593664/len=512/ddir=0//dev/nvdisk0
    io 22444 ->prep(0x6d2710)=0
    io 22444 queue: io_u 0x6d2710: off=2572593664/len=512/ddir=0//dev/nvdisk0
    io 22444 fill_io_u: io_u 0x6d2400: off=3267822080/len=512/ddir=0//dev/nvdisk0
    io 22444 prep: io_u 0x6d2400: off=3267822080/len=512/ddir=0//dev/nvdisk0
    io 22444 ->prep(0x6d2400)=0
    io 22444 queue: io_u 0x6d2400: off=3267822080/len=512/ddir=0//dev/nvdisk0
    io 22444 fill_io_u: io_u 0x6d2130: off=7099489280/len=512/ddir=0//dev/nvdisk0
    io 22444 prep: io_u 0x6d2130: off=7099489280/len=512/ddir=0//dev/nvdisk0
    io 22444 ->prep(0x6d2130)=0
    io 22444 queue: io_u 0x6d2130: off=7099489280/len=512/ddir=0//dev/nvdisk0
    io 22444 fill_io_u: io_u 0x6d1ea0: off=7682447872/len=512/ddir=0//dev/nvdisk0
    io 22444 prep: io_u 0x6d1ea0: off=7682447872/len=512/ddir=0//dev/nvdisk0
    io 22444 ->prep(0x6d1ea0)=0
    io 22444 queue: io_u 0x6d1ea0: off=7682447872/len=512/ddir=0//dev/nvdisk0
    io 22444 calling ->commit(), depth 8
    io 22444 fill_io_u: io_u 0x6d1b90: off=5983331840/len=512/ddir=0//dev/nvdisk0
    io 22444 prep: io_u 0x6d1b90: off=5983331840/len=512/ddir=0//dev/nvdisk0
    io 22444 ->prep(0x6d1b90)=0
    io 22444 queue: io_u 0x6d1b90: off=5983331840/len=512/ddir=0//dev/nvdisk0
    io 22444 fill_io_u: io_u 0x6cdfa0: off=6449852928/len=512/ddir=0//dev/nvdisk0
    ...
Here we can see the I/O workflow in detail. This method does not require deep familiarity with the OS, so it is quite practical.
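Reading a long --debug=io log by eye gets tedious, so a small script can count how many I/Os were queued before each commit. This is a minimal sketch that assumes the log lines look like the fio 2.0.5 output shown in this article ("queue: io_u ..." and "calling ->commit(), depth N"); other fio versions may format these lines differently, and the function name batches_from_debug_log is invented here.

```python
import re

# Matches the commit line as it appears in the debug output quoted above.
COMMIT_RE = re.compile(r"calling ->commit\(\), depth (\d+)")


def batches_from_debug_log(lines):
    """Return one (queued_since_last_commit, reported_depth) pair per commit."""
    queued = 0
    batches = []
    for line in lines:
        if " queue: io_u " in line:
            # One I/O unit was placed on the engine's queue.
            queued += 1
            continue
        match = COMMIT_RE.search(line)
        if match:
            batches.append((queued, int(match.group(1))))
            queued = 0
    return batches


if __name__ == "__main__":
    import sys
    # Usage sketch: fio nvdisk-test --debug=io | python this_script.py
    for queued, depth in batches_from_debug_log(sys.stdin):
        print(f"commit after {queued} queued I/Os, reported depth {depth}")
```

If the configuration is taking effect, the queued count before each commit should match iodepth_batch, which is 8 in the job file used here.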
Another method is to trace the system calls with strace, which is more direct.
    # pstree -p
    init(1)─┬─agent_eagleye(22296)
            ├─screen(13490)─┬─bash(18324)─┬─emacs(19429)
            │               │             ├─emacs(20365)
            │               │             ├─emacs(21268)
            │               │             ├─fio(22452)─┬─fio(22454)
            │               │             │            └─{fio}(22453)
            │               │             └─man(20385)───sh(20386)───sh(20387)───less(20391)
            ├─sshd(1834)───sshd(13115)───bash(13117)───screen(13662)
            └─udevd(705)─┬─udevd(1438)
                         └─udevd(1745
    # strace -p 22454
    ...
    io_submit(140534061244416, 8, {{(nil), 0, 1, 0, 3}, {(nil), 0, 0, 0, 3}, {(nil), 0, 0, 0, 3}, {(nil), 0, 0, 0, 3}, {(nil), 0, 0, 0, 3}, {(nil), 0, 1, 0, 3}, {(nil), 0, 1, 0, 3}, {(nil), 0, 0, 0, 3}}) = 8
    io_getevents(140534061244416, 8, 8, {{(nil), 0x6d3210, 512, 0}, {(nil), 0x6d2f80, 512, 0}, {(nil), 0x6d2cb0, 512, 0}, {(nil), 0x6d29a0, 512, 0}, {(nil), 0x6d2710, 512, 0}, {(nil), 0x6d2400, 512, 0}, {(nil), 0x6d2130, 512, 0}, {(nil), 0x6d1ea0, 512, 0}}, NULL) = 8
    ...
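The strace output can be summarized the same way. The sketch below assumes lines shaped like the two calls shown above, where the second argument of io_submit is the number of requests, the second and third arguments of io_getevents are min_nr and nr, and the value after "=" is the return value. The function name summarize_strace is hypothetical, and strace versions may print arguments differently.

```python
import re

SUBMIT_RE = re.compile(r"io_submit\([^,]+, (\d+),.*\) = (\d+)")
GETEVENTS_RE = re.compile(r"io_getevents\([^,]+, (\d+), (\d+),.*\) = (\d+)")


def summarize_strace(lines):
    """Collect batch sizes from io_submit and io_getevents lines."""
    submits = []   # (requested, accepted) per io_submit call
    reaps = []     # (min_nr, max_nr, returned) per io_getevents call
    for line in lines:
        match = SUBMIT_RE.search(line)
        if match:
            submits.append((int(match.group(1)), int(match.group(2))))
            continue
        match = GETEVENTS_RE.search(line)
        if match:
            reaps.append(tuple(int(g) for g in match.groups()))
    return {"submits": submits, "reaps": reaps}


if __name__ == "__main__":
    import sys
    # Usage sketch: strace -p <pid> 2>&1 | python this_script.py
    summary = summarize_strace(sys.stdin)
    print("io_submit batches:", summary["submits"])
    print("io_getevents batches:", summary["reaps"])
```

For the trace above this yields a submit batch of 8 and a reap of 8 events, matching iodepth_batch=8 and iodepth_batch_complete=8 in the job file.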
A final effective technique is to run iostat -dx 1 and confirm that your iodepth suits the characteristics of the device.
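To compare what iostat reports with the configured iodepth, one option is to pull the average queue size column out of its output. This is a rough sketch under stated assumptions: the column is named avgqu-sz in older sysstat releases and aqu-sz in newer ones, the header line starts with "Device", and numbers use a dot as the decimal separator. The function name average_queue_sizes is invented for illustration.

```python
QUEUE_COLUMNS = ("aqu-sz", "avgqu-sz")  # newer and older sysstat column names


def average_queue_sizes(iostat_text, device):
    """Return every average-queue-size sample that iostat printed for `device`."""
    samples = []
    column = None
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].rstrip(":") == "Device":
            # Locate the queue-size column by name; layouts differ by version.
            column = next(
                (fields.index(name) for name in QUEUE_COLUMNS if name in fields),
                None,
            )
        elif column is not None and fields[0] == device and len(fields) > column:
            samples.append(float(fields[column]))
    return samples


if __name__ == "__main__":
    import sys
    # Usage sketch: iostat -dx 1 10 | python this_script.py nvdisk0
    print(average_queue_sizes(sys.stdin.read(), sys.argv[1]))
```

A reported queue size that stays far below the configured iodepth suggests the requested depth is not actually reaching the device.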
Only once these methods have confirmed that your configuration is correct will the data you analyze afterwards be meaningful.