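Kernel Debugging Tool Kprobe: A Practical Guide
This article is reposted from the WeChat official account "人人都是極客", written by 布道師Peter. To repost it, please contact the 人人都是極客 account.
Introduction to Kprobe
When debugging kernel functions and variables, the most common approach is to add logging and inspect the relevant information with printk, but this usually requires recompiling the kernel and rebooting the device.
Kprobe, by contrast, can dynamically insert probe points into a running kernel and execute operations you define in advance. It can trace almost any code address in the kernel, and a handler function runs when the breakpoint is hit.
The most common use of kprobe is to inspect the arguments and return value of a function call.
Currently, kprobe can be used in two ways:
- The first is for developers to write their own kernel module that registers probe points with the kernel; the probe handlers can be customized as needed, which is flexible and convenient;
- The second is kprobes on trace, which combines kprobe with Ftrace, that is, kprobe is used to extend Ftrace for tracing function calls.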
Writing a kprobe probe module
Kprobe structure and API
The main fields of struct kprobe are:
- struct hlist_node hlist: used for the global kprobe hash table, indexed by the address of the probe point;
- struct list_head list: links the different kprobes attached to the same probe point;
- kprobe_opcode_t *addr: the address of the probe point;
- const char *symbol_name: the name of the probed function;
- unsigned int offset: the offset of the probe point inside the function, used to probe an instruction within the function; 0 means the function entry;
- kprobe_pre_handler_t pre_handler: callback invoked before the probed instruction executes;
- kprobe_post_handler_t post_handler: callback invoked after the probed instruction executes;
- kprobe_fault_handler_t fault_handler: callback invoked when a memory fault occurs while running pre_handler, post_handler, or single-stepping the probed instruction (removed from newer kernels);
- kprobe_break_handler_t break_handler: callback invoked when a breakpoint instruction is hit while a kprobe is being handled; used to implement jprobe (removed from newer kernels together with jprobe);
- kprobe_opcode_t opcode: the saved original instruction at the probe point;
- struct arch_specific_insn ainsn: a copy of the original instruction at the probe point, used for single-stepping; strongly architecture-specific (may include instruction emulation functions);
- u32 flags: status flags.
The main API functions are:
- int register_kprobe(struct kprobe *kp) // register a kprobe probe point with the kernel
- void unregister_kprobe(struct kprobe *kp) // unregister a kprobe probe point
- int register_kprobes(struct kprobe **kps, int num) // register an array of probes containing multiple probe points
- void unregister_kprobes(struct kprobe **kps, int num) // unregister an array of probes containing multiple probe points
- int disable_kprobe(struct kprobe *kp) // temporarily disable the given probe point
- int enable_kprobe(struct kprobe *kp) // re-enable the given probe point
Walkthrough and demo of the kprobe_example.c sample
The Linux kernel source tree provides a kprobe sample at samples/kprobes/kprobe_example.c:
- /* For each probe you need to allocate a kprobe structure */
- static struct kprobe kp = {
- .symbol_name = "do_fork",
- };
- static int __init kprobe_init(void)
- {
- int ret;
- kp.pre_handler = handler_pre;
- kp.post_handler = handler_post;
- kp.fault_handler = handler_fault;
- ret = register_kprobe(&kp);
- if (ret < 0) {
- printk(KERN_INFO "register_kprobe failed, returned %d\n", ret);
- return ret;
- }
- printk(KERN_INFO "Planted kprobe at %p\n", kp.addr);
- return 0;
- }
- static void __exit kprobe_exit(void)
- {
- unregister_kprobe(&kp);
- printk(KERN_INFO "kprobe at %p unregistered\n", kp.addr);
- }
- module_init(kprobe_init)
- module_exit(kprobe_exit)
- MODULE_LICENSE("GPL");
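The program defines a struct kprobe instance kp and initializes its symbol_name field to "do_fork", indicating that it will probe the do_fork function. The module's init function sets the three callbacks pre_handler, post_handler and fault_handler to handler_pre, handler_post and handler_fault respectively, then calls register_kprobe to register the probe. The module's exit function calls unregister_kprobe to remove the kp probe point. Newer kernels renamed do_fork (to _do_fork and later kernel_clone), so check /proc/kallsyms for the symbol available on your kernel.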
- static int handler_pre(struct kprobe *p, struct pt_regs *regs)
- {
- ......
- #ifdef CONFIG_ARM64
- pr_info("<%s> pre_handler: p->addr = 0x%p, pc = 0x%lx,"
- " pstate = 0x%lx\n",
- p->symbol_name, p->addr, (long)regs->pc, (long)regs->pstate);
- #endif
- /* A dump_stack() here will give a stack backtrace */
- return 0;
- }
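The first argument of the handler_pre callback is the registered struct kprobe instance, and the second is the register state saved before the breakpoint was triggered. It is called just before the probed instruction at the entry of do_fork executes; here it simply prints the address of the probe point and a few of the saved registers.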
- static void handler_post(struct kprobe *p, struct pt_regs *regs,
- unsigned long flags)
- {
- ......
- #ifdef CONFIG_ARM64
- pr_info("<%s> post_handler: p->addr = 0x%p, pstate = 0x%lx\n",
- p->symbol_name, p->addr, (long)regs->pstate);
- #endif
- }
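The first two arguments of the handler_post callback are the same as those of handler_pre; the third is currently unused and is always 0. It is called after the probed instruction has been single-stepped (not after do_fork returns), and it prints much the same information as handler_pre.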
- static int handler_fault(struct kprobe *p, struct pt_regs *regs, int trapnr)
- {
- pr_info("fault_handler: p->addr = 0x%p, trap #%d\n", p->addr, trapnr);
- /* Return 0 because we don't handle the fault. */
- return 0;
- }
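The handler_fault callback is called when an error occurs while executing handler_pre, handler_post, or single-stepping the probed instruction of do_fork. Its third argument is the trap number of the error, which is architecture-specific.
After loading the module into the kernel, run any command in a terminal and dmesg shows output like the following (this sample output is from an x86 machine, so it prints ip and flags rather than the ARM64 pc and pstate shown above):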
- <6>pre_handler: p->addr = 0xc0439cc0, ip = c0439cc1, flags = 0x246
- <6>post_handler: p->addr = 0xc0439cc0, flags = 0x246
- <6>pre_handler: p->addr = 0xc0439cc0, ip = c0439cc1, flags = 0x246
- <6>post_handler: p->addr = 0xc0439cc0, flags = 0x246
- <6>pre_handler: p->addr = 0xc0439cc0, ip = c0439cc1, flags = 0x246
- <6>post_handler: p->addr = 0xc0439cc0, flags = 0x246
The probe address is 0xc0439cc0. The following commands confirm that this is the entry address of do_fork:
- echo 0 > /proc/sys/kernel/kptr_restrict
- cat /proc/kallsyms | grep do_fork
- c0439cc0 T do_fork
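A plain grep matches substrings, so it also prints every symbol whose name merely contains do_fork. The sketch below matches the symbol column exactly instead. The function name lookup_sym and its optional file argument are illustrative, not part of any kernel tooling, and it assumes the usual "address type name" column layout of /proc/kallsyms.

```shell
#!/bin/sh
# lookup_sym SYMBOL [FILE]
# Print the address of SYMBOL from a kallsyms-style file (default: /proc/kallsyms).
# Hypothetical helper; assumes columns "address type name [module]".
# Returns non-zero when the symbol is not found.
lookup_sym() {
    awk -v sym="$1" '$3 == sym { print $1; found = 1; exit } END { exit !found }' \
        "${2:-/proc/kallsyms}"
}

# Example: compare the result with the p->addr value printed by the module.
# lookup_sym do_fork
```

If the printed address is all zeros, kptr_restrict is still hiding kernel addresses from the current user.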
kprobes on trace
- /sys/kernel/debug/kprobes/list: lists the kprobes currently registered in the kernel
- /sys/kernel/debug/kprobes/enabled: global on/off switch for kprobes
- /sys/kernel/debug/kprobes/blacklist: kprobe blacklist (functions that cannot be probed)
- /proc/sys/debug/kprobes-optimization: turns kprobe optimization on or off
Documentation/trace/kprobetrace.txt
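Before use, make sure the kernel is built with CONFIG_KPROBE_EVENT=y (named CONFIG_KPROBE_EVENTS on newer kernels).
/sys/kernel/debug/tracing/kprobe_events: interface for adding probes
/sys/kernel/debug/tracing/events/kprobes/<EVENT>/enable: enable switch for each probe event
/sys/kernel/debug/tracing/trace: interface for reading the trace log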
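On newer kernels tracefs is often also mounted at /sys/kernel/tracing, so a script that hard-codes the debugfs path may not find these files. The following is a minimal sketch that picks whichever location exposes kprobe_events; the function name find_tracing_dir is illustrative.

```shell
#!/bin/sh
# find_tracing_dir [DIR...]
# Print the first directory that contains a kprobe_events file.
# With no arguments, try the two common mount points.
find_tracing_dir() {
    [ "$#" -gt 0 ] || set -- /sys/kernel/tracing /sys/kernel/debug/tracing
    for dir in "$@"; do
        if [ -e "$dir/kprobe_events" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Example use:
# T=$(find_tracing_dir) || echo "kprobe events are not available" >&2
```

A non-zero return usually means the kernel was built without kprobe event support or the filesystem is not mounted.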
Syntax:
- Synopsis of kprobe_events
- p[:[GRP/]EVENT] [MOD:]SYM[+offs]|MEMADDR [FETCHARGS] : Set a probe
- r[:[GRP/]EVENT] [MOD:]SYM[+0] [FETCHARGS] : Set a return probe
- -:[GRP/]EVENT : Clear a probe
- GRP : Group name. If omitted, use "kprobes" for it.
- EVENT : Event name. If omitted, the event name is generated
- based on SYM+offs or MEMADDR.
- MOD : Module name which has given SYM.
- SYM[+offs] : Symbol+offset where the probe is inserted.
- MEMADDR : Address where the probe is inserted.
- FETCHARGS : Arguments. Each probe can have up to 128 args.
- %REG : Fetch register REG
- @ADDR : Fetch memory at ADDR (ADDR should be in kernel)
- @SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol)
- $stackN : Fetch Nth entry of stack (N >= 0)
- $stack : Fetch stack address.
- $retval : Fetch return value.(*)
- $comm : Fetch current task comm.
- +|-offs(FETCHARG) : Fetch memory at FETCHARG +|- offs address.(**)
- NAME=FETCHARG : Set NAME as the argument name of FETCHARG.
- FETCHARG:TYPE : Set TYPE as the type of FETCHARG. Currently, basic types
- (u8/u16/u32/u64/s8/s16/s32/s64), hexadecimal types
- (x8/x16/x32/x64), "string" and bitfield are supported.
- (*) only for return probe.
- (**) this is useful for fetching a field of data structures.
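To make the synopsis concrete, the sketch below assembles definition lines of the form "p:EVENT SYM FETCHARGS" and "r:EVENT SYM FETCHARGS". The helper name kprobe_def and the argument names dfd, flags and ret are illustrative. The register names %x0 and %x2 assume ARM64, where the first and third function arguments are passed in x0 and x2; other architectures use different register names.

```shell
#!/bin/sh
# kprobe_def KIND EVENT SYMBOL [FETCHARG...]
# Print one kprobe_events definition line.
# KIND is p (probe) or r (return probe). Hypothetical helper.
kprobe_def() {
    kind=$1 event=$2 sym=$3
    shift 3
    case $kind in
        p|r) ;;
        *) echo "kprobe_def: KIND must be p or r" >&2; return 1 ;;
    esac
    # "$*" joins the remaining fetch arguments with single spaces.
    printf '%s\n' "${kind}:${event} ${sym}${*:+ $*}"
}

# NAME=FETCHARG:TYPE, here the third argument shown as 32-bit hex.
kprobe_def p myprobe do_sys_open 'dfd=%x0' 'flags=%x2:x32'
# $retval is only valid on a return probe.
kprobe_def r myretprobe do_sys_open 'ret=$retval'
```

The two calls print "p:myprobe do_sys_open dfd=%x0 flags=%x2:x32" and "r:myretprobe do_sys_open ret=$retval", which are the strings that would be written to kprobe_events.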
Check which drivers are present:
- 130|mek_8q:/sys/kernel/debug/tracing # cat /proc/devices
- Character devices:
- 1 mem
- 4 /dev/vc/0
- 4 tty
- 4 ttyS
- 5 /dev/tty
- 5 /dev/console
- 5 /dev/ptmx
- 7 vcs
- 10 misc
- 13 input
- 29 fb
- 81 video4linux
- 89 i2c
- 90 mtd
- 108 ppp
- 116 alsa
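You can look in the System.map file to see whether the kernel function you want to observe exists. This file is essentially the kernel's symbol table. If you are unsure of a function name, grep for it there; on a running system /proc/kallsyms serves the same purpose. An all-zero address, as below, means kernel addresses are hidden from the current user by kptr_restrict.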
- mek_8q:/ # cat /proc/kallsyms | grep do_sys_open
- 0000000000000000 T do_sys_open
Taking do_sys_open as an example (writing with > replaces all existing probe definitions; use >> to add to them):
- Add a kprobe:
- echo 'p:myprobe do_sys_open' > /sys/kernel/debug/tracing/kprobe_events
- Add a kretprobe whose return value is a number:
- echo 'r:myretprobe do_sys_open $retval' > /sys/kernel/debug/tracing/kprobe_events
- Add a kretprobe whose return value is a string:
- echo 'r:myprobe getname +0($retval):string' > /sys/kernel/debug/tracing/kprobe_events
- Delete an added kprobe:
- echo '-:myprobe' >> /sys/kernel/debug/tracing/kprobe_events
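Run the following. The second definition is appended with >>, because writing with > truncates kprobe_events and removes the probes already defined:
- cd /sys/kernel/debug/tracing
- echo 'p:myprobe do_sys_open' > kprobe_events
- echo 'r:myretprobe do_sys_open $retval' >> kprobe_events
- echo 1 > tracing_on
- echo 1 > events/kprobes/myprobe/enable
- echo 1 > events/kprobes/myretprobe/enable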
The result can be read from /sys/kernel/debug/tracing/trace.
Remove the registered kprobes:
- echo 0 > /sys/kernel/debug/tracing/events/kprobes/myprobe/enable
- echo 0 > /sys/kernel/debug/tracing/events/kprobes/myretprobe/enable
- echo '-:myprobe' >> /sys/kernel/debug/tracing/kprobe_events
- echo '-:myretprobe' >> /sys/kernel/debug/tracing/kprobe_events
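The teardown order matters: an event has to be disabled before its definition can be deleted, and the delete line should be appended, since opening kprobe_events with > first clears every defined probe. Below is a small sketch that wraps both steps. The name remove_kprobe_event is illustrative, and the tracing directory is passed in as a parameter rather than hard-coded.

```shell
#!/bin/sh
# remove_kprobe_event TRACING_DIR EVENT [GROUP]
# Disable EVENT, then append a "-:GROUP/EVENT" line to kprobe_events.
# GROUP defaults to "kprobes", the default group name in the synopsis.
# Hypothetical helper; stops if the event cannot be disabled.
remove_kprobe_event() {
    dir=$1 event=$2 group=${3:-kprobes}
    echo 0 > "$dir/events/$group/$event/enable" || return 1
    echo "-:$group/$event" >> "$dir/kprobe_events"
}

# Example (needs root and a mounted debugfs or tracefs):
# remove_kprobe_event /sys/kernel/debug/tracing myprobe
# remove_kprobe_event /sys/kernel/debug/tracing myretprobe
```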