處理一次系統(tǒng)假死工作紀(jì)實(shí)
本文轉(zhuǎn)載自微信公眾號(hào)「相遇Linux」,作者JeffXie 。轉(zhuǎn)載本文請聯(lián)系相遇Linux公眾號(hào)。
最近碰到客戶反饋一個(gè)問題,系統(tǒng)hang了,ssh登錄不上,但是可以ping通,通過串口登錄進(jìn)去之后,敲有些命令會(huì)卡住,查看cpu負(fù)載內(nèi)存都很正常,手動(dòng)觸發(fā)kdump之后看不出死鎖/softlockup/hardlockup等異常狀態(tài).而且重要的是已經(jīng)影響到客戶幾十億的業(yè)務(wù)了。
我能拿到的只有客戶提供的vmcore,為了生活而奔波的人兒趕緊分析起來:
系統(tǒng)(SUSE 11SP1)
#crash ./vmcore ./vmlinux-2.6.32.59-0.19-default
./vmlinux-2.6.32.59-0.19-default.gz
crash>foreach IN bt > inbt
任意找到一個(gè)IN(可中斷睡眠)的進(jìn)程 12813 查看堆棧:
- crash> bt 12813
- PID:12813 TASK: ffff880262eda140 CPU: 8 COMMAND: "sshd"
- #0 [ffff880f98e1ba58] schedule atffffffff8139d2a4
- #1 [ffff880f98e1bb10] schedule_timeout atffffffff8139d935
- #2 [ffff880f98e1bba0] unix_wait_for_peer atffffffff81376e27
- #3 [ffff880f98e1bc00] unix_dgram_sendmsg atffffffff813785c2
- #4 [ffff880f98e1bcb0] sock_sendmsg atffffffff812e8ddc
- #5 [ffff880f98e1be50] sys_sendto atffffffff812e9318
- #6 [ffff880f98e1bf80] system_call_fastpath atffffffff81002f7b
匯編unix_dgram_sendmsg()
- crash>dis unix_dgram_sendmsg
- ...
- 2410xffffffff813785b7 <unix_dgram_sendmsg+1143>: mov %r13,%rsi
- 2420xffffffff813785ba <unix_dgram_sendmsg+1146>: mov %rbp,%rdi (key1)
- rdi代表unix_wait_for_peer函數(shù)的第一個(gè)參數(shù)
- 2430xffffffff813785bd <unix_dgram_sendmsg+1149>: callq 0xffffffff81376da0 <unix_wait_for_peer>
進(jìn)程12813 unix_wait_for_peer() 函數(shù)的堆棧數(shù)據(jù):
- crash>rd -SS ffff880f98e1bba0 -e ffff880f98e1bc00
- ffff880f98e1bba0: unix_wait_for_peer+135 0000000000000001
- ffff880f98e1bbb0: [ffff880262eda140:task_struct]autoremove_wake_function
- ffff880f98e1bbc0: ffff88021ea8bbc0 ffff880238315bc0
- ffff880f98e1bbd0: memcpy_fromiovec+87 [ffff880238514980:UNIX]
- ffff880f98e1bbe0: [ffff880238514c30:UNIX] [ffff880238514980:UNIX]
- ffff880f98e1bbf0: [ffff880ff708b630:UNIX] 7fffffffffffffff
匯編unix_wait_for_peer()
- crash>dis unix_wait_for_peer
- 0xffffffff81376da0<unix_wait_for_peer>: sub $0x58,%rsp
- 0xffffffff81376da4<unix_wait_for_peer+4>: mov $0x1,%edx
- 0xffffffff81376da9<unix_wait_for_peer+9>: mov %r13,0x50(%rsp)
- 0xffffffff81376dae<unix_wait_for_peer+14>: lea 0x2b8(%rdi),%r13
- 0xffffffff81376db5<unix_wait_for_peer+21>: mov %rbx,0x38(%rsp)
- 0xffffffff81376dba<unix_wait_for_peer+26>: mov %gs:0xa580,%rax
- 0xffffffff81376dc3<unix_wait_for_peer+35>: mov %rax,0x8(%rsp)
- 0xffffffff81376dc8<unix_wait_for_peer+40>: lea 0x18(%rsp),%rax
- 0xffffffff81376dcd<unix_wait_for_peer+45>: mov %rbp,0x40(%rsp) (key2)
rbp壓入堆棧(0x58-0x40)處,正好是unix_wait_for_peer堆棧中ffff880f98e1bbe8處,即( [ffff880238514980:UNIX])
- 0xffffffff81376dd2<unix_wait_for_peer+50>: mov %rdi,%rbx
- 0xffffffff81376dd5<unix_wait_for_peer+53>: mov %rsi,%rbp
- 0xffffffff81376dd8<unix_wait_for_peer+56>: mov %r13,%rdi
查看unix_wait_for_peer()函數(shù)源碼:
- 1005static long unix_wait_for_peer(struct sock *other, long timeo) (key3)
- 1006{
- 1007 struct unix_sock *u = unix_sk(other);
- 1008 int sched;
- 1009 DEFINE_WAIT(wait);
- 1011 prepare_to_wait_exclusive(&u->peer_wait, &wait,TASK_INTERRUPTIBLE);
- ..
- 1019 if (sched)
- 1020 timeo =schedule_timeout(timeo);
- 1021
- 1022 finish_wait(&u->peer_wait,&wait);
- 1023 return timeo;
- 1024}
根據(jù)s(key1)(key2)(key3)可以知道12813進(jìn)程在等待unix_sock(ffff880238514980)
- crash>struct unix_sock ffff880238514980 | grep peer_wait -A 10
- peer_wait = {
- lock = {
- raw_lock = {
- slock = 3369322707
- }
- },
- task_list = {
- next = 0xffff880173427bc0,
- prev = 0xffff8801877a7bc0
- }
- }
查詢出這個(gè)unix_sock是系統(tǒng)中的/dev/log文件(由syslog-ng創(chuàng)建,系統(tǒng)中大量需要記錄log的進(jìn)程需要通過這個(gè)unix_sock與syslog-ng通信,可以參考mansyslog-ng 和/etc/syslog-ng/syslog-ng.conf 配置文件)
- crash>struct unix_sock ffff880238514980
- ...
- dentry= 0xffff880e7634b9c0,
- ...
- crash>struct dentry 0xffff880e7634b9c0
- ...
- name = 0xffff880e7634ba60 "log"
- ...
遍歷這個(gè)unix_sock的所有等待隊(duì)列,
說明有645個(gè)進(jìn)程正在等待這個(gè)unix_sock(ffff880238514980/dev/log)
- crash>list __wait_queue.task_list -s__wait_queue.private -H 0xffff880173427bc0 | grep -i private | wc -l
- 645
所有等待unix_sock的進(jìn)程重定向到一個(gè)wait_unix_sock_process文件
- crash>list __wait_queue.task_list -s__wait_queue.private -H 0xffff880173427bc0 | grep -i private > wait_unix_sock_process
查詢所有的sshd進(jìn)程
- crash>ps | grep sshd
- 7097 1 6 ffff880244cae5c0 IN 0.0 54088 1392 sshd
- 12813 7097 8 ffff880262eda140 IN 0.0 60148 2900 sshd
- 19055 7097 2 ffff880ff74c4200 IN 0.0 60148 2900 sshd
- 21642 7097 1 ffff880262fb4340 IN 0.0 60148 2904 sshd
- 21958 7097 1 ffff880ffab501c0 IN 0.0 60148 2900 sshd
- 22459 7097 6 ffff880ff1ed4100 IN 0.0 121052 10272 sshd
- 22476 22459 0 ffff88022d11a500 IN 0.0 121352 4552 sshd
- 23334 7097 2 ffff880174b04640 IN 0.0 60148 2900 sshd
sshd進(jìn)程大部分都在wait_unix_sock_process文件中,所以ssh登錄之后會(huì)卡住
查看syslog-ng進(jìn)程堆棧
- crash>bt 15232
- PID:15232 TASK: ffff8801aba90680 CPU: 6 COMMAND: "syslog-ng"
- #0 [ffff88021ea8ba58] schedule atffffffff8139d2a4
- #1 [ffff88021ea8bb10] schedule_timeout atffffffff8139d935
- #2 [ffff88021ea8bba0] unix_wait_for_peer atffffffff81376e27
- #3 [ffff88021ea8bc00] unix_dgram_sendmsg atffffffff813785c2
- #4 [ffff88021ea8bcb0] sock_sendmsg atffffffff812e8ddc
- #5 [ffff88021ea8be50] sys_sendto atffffffff812e9318
- #6 [ffff88021ea8bf80] system_call_fastpath atffffffff81002f7b
查看 unix_wait_for_peer()堆棧數(shù)據(jù):
- crash>rd -SS ffff88021ea8bba0 -e ffff88021ea8bc00
- ffff88021ea8bba0: unix_wait_for_peer+135 0000000000000001
- ffff88021ea8bbb0: [ffff8801aba90680:task_struct]autoremove_wake_function
- ffff88021ea8bbc0: ffff880fce8dfbc0 ffff880f98e1bbc0
- ffff88021ea8bbd0: memcpy_fromiovec+87 [ffff880238514980:UNIX]
- ffff88021ea8bbe0: [ffff880238514c30:UNIX][ffff880238514980:UNIX] (key4)
- ffff88021ea8bbf0: [ffff88104c1c3930:UNIX] 7fffffffffffffff
由(key4)可知syslog-ng也是在等待ffff880238514980 unix_sock
所以系統(tǒng)中大量的進(jìn)程,包括sshd cron等其他的645個(gè)進(jìn)程都在等待unix_sock(/dev/log)(/dev/log由syslog-ng創(chuàng)建),syslog-ng卡住在等這個(gè)unix_sock之后,系統(tǒng)中大量的進(jìn)程都會(huì)卡住,包括sshd cron進(jìn)程等. 但是syslog-ng為什么卡住從vmcore根本看不出來,很可能進(jìn)程在用戶層鎖住了.
通過查看客戶系統(tǒng)中最近安裝的軟件發(fā)現(xiàn),他們最近安裝了兩個(gè)定制軟件,卸載之后系統(tǒng)恢復(fù)正常。
- # /bin/rpm -qa --last
- CARKpam-11.01-1.19 Tue Jun 9 22:49:22 2020
- CARKaim-11.01-1.5 Tue Jun 9 22:38:48 2020