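Analysis of a Blue Screen Caused by a China Mobile Website Control
This Thursday (February 23), a colleague brought me a strange blue screen case. As far as he could recall, he had not installed any software or drivers recently, nor changed the hardware configuration. Apart from the automatic updates that Windows performs in the background, he could not think of any other change to the machine. Yet starting the previous evening, Wednesday the 22nd, his computer suddenly began to blue-screen: after a reboot it would crash before reaching the desktop, or crash after only a short while of use. He therefore suspected either a hardware fault (such as memory) or a problem introduced by Windows Update.
It is true that a sudden hardware fault such as a loose memory module can cause a blue screen, but a blue screen caused by a defective patch pushed through Windows Update is rare. When troubleshooting blue screens, we should generally follow the principle of trusting Microsoft's own components by default.
I learned that his blue screens showed several seemingly random stop codes. Looking them up in Debugging Help gave the following explanations: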
Bug Check | Key Parameter | Description |
PFN_LIST_CORRUPT (0x4E) | | This is typically caused by a driver passing a bad memory descriptor list. For example, the driver might have called MmUnlockPages twice with the same list. Stack trace examination is needed. |
MEMORY_MANAGEMENT (0x1A) | P1: 0x41287 | Internal memory management structures are corrupted. To further investigate the cause, a kernel memory dump file is needed. |
NTFS_FILE_SYSTEM (0x24) | | One possible cause of this bug check is disk corruption. Corruption in the NTFS file system or bad blocks (sectors) on the hard disk can induce this error. Corrupted SCSI and IDE drivers can also adversely affect the system's ability to read and write to disk, thus causing the error. Another possible cause is depletion of nonpaged pool memory. If the nonpaged pool memory is completely depleted, this error can stop the system. However, during the indexing process, if the amount of available nonpaged pool memory is very low, another kernel-mode driver requiring nonpaged pool memory can also trigger this error. |
SYSTEM_SERVICE_EXCEPTION (0x3B) | | This error has been linked to excessive paged pool usage and may occur due to user-mode graphics drivers crossing over and passing bad data to the kernel code. |
SYSTEM_THREAD_EXCEPTION_NOT_HANDLED_M (0x1000007E) | | This indicates that a system thread generated an exception which the error handler did not catch. |
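Of the errors above, the first two occurred most often. If you search Baidu for them, you will find plenty of reasons to pull out the memory modules and clean their gold contacts. Personally, though, I was convinced this was not a hardware error. The errors look random, but they plausibly share one common cause: a badly written driver somewhere in the system. Why? In the descriptions we looked up, the words "bad", "depletion" and "nonpaged pool" appear again and again. It is also worth noting that bug check 0x24 (NTFS file system) often misleads people into suspecting disk corruption, when another possible cause is depletion of the nonpaged pool, as the second half of its description in the table above points out.
With errors this random, analyzing the stack usually does not lead us to the culprit. Here is one example stack: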
- MEMORY_MANAGEMENT (1a)
- # Any other values for parameter 1 must be individually examined.
- Arguments:
- Arg1: 0000000000041287, The subtype of the bugcheck.
- Arg2: 0000000000000030
- Arg3: 0000000000000000
- Arg4: 0000000000000000
- Debugging Details:
- ------------------
- BUGCHECK_STR: 0x1a_41287
- DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT
- PROCESS_NAME: WmiPrvSE.exe
- CURRENT_IRQL: 0
- TRAP_FRAME: fffff88007e6d6e0 -- (.trap 0xfffff88007e6d6e0)
- NOTE: The trap frame does not contain all registers.
- Some register values may be zeroed or incorrect.
- STACK_TEXT:
- fffff880`07e6d578 fffff800`02c62d7e : 00000000`0000001a 00000000`00041287 00000000`00000030 00000000`00000000 : nt!KeBugCheckEx
- fffff880`07e6d580 fffff800`02ccdd6e : 00000000`00000000 00000000`00000030 00000000`00000000 00000000`fffffa80 : nt! ?? ::FNODOBFM::`string'+0x46485
- fffff880`07e6d6e0 fffff800`02dadbc5 : 00000000`000af94a 00000000`00000000 ffffffff`ffffffff 00000000`01464000 : nt!KiPageFault+0x16e
- fffff880`07e6d870 fffff800`02d426b0 : fffffa80`098d5058 fffff6fd`4004c6a8 fffff800`02f055c0 fffff880`07e6db11 : nt!MiResolvePageFileFault+0x1115
- fffff880`07e6d9b0 fffff800`02cdea07 : 00000000`00000000 00000000`01440004 00000000`0240f3c4 fffff800`00000000 : nt! ?? ::FNODOBFM::`string'+0x399d4
- fffff880`07e6dac0 fffff800`02ccdd6e : 00000000`00000001 00000000`01440004 00000000`023ae701 00000000`00000460 : nt!MmAccessFault+0x1e47
- fffff880`07e6dc20 00000000`76b87222 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiPageFault+0x16e
- 00000000`0240f394 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x76b87222
As you can see, apart from the functions in ntkrnlmp.exe, the earliest frame, 0x76b87222, cannot be resolved at all. The dds command cannot resolve it to a specific name either.
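One plausible reason the last frame stays unresolved is that 0x76b87222 lies in the user-mode half of the address space (the fault was raised while WmiPrvSE.exe was running), whereas the resolved frames all sit in kernel space. The following is a minimal sketch for sorting stack addresses into the two halves, assuming the usual canonical x64 Windows layout; the function names are illustrative and not part of any debugger API.

```python
# Canonical x64 address ranges as commonly documented for 64-bit Windows.
# Assumption: the dump comes from an x64 system, as the traces here suggest.
USER_MAX = 0x0000_7FFF_FFFF_FFFF
KERNEL_MIN = 0xFFFF_8000_0000_0000
ADDRESS_MAX = 0xFFFF_FFFF_FFFF_FFFF


def parse_windbg_address(text: str) -> int:
    """Convert a WinDbg-style address such as 'fffff800`02c62d7e' to an int."""
    return int(text.replace("`", ""), 16)


def classify_address(addr: int) -> str:
    """Return 'user', 'kernel' or 'non-canonical' for a 64-bit address."""
    if 0 <= addr <= USER_MAX:
        return "user"
    if KERNEL_MIN <= addr <= ADDRESS_MAX:
        return "kernel"
    return "non-canonical"


if __name__ == "__main__":
    for frame in ("fffff800`02c62d7e", "00000000`76b87222"):
        print(frame, "->", classify_address(parse_windbg_address(frame)))
```

A user-mode return address at the bottom of the stack only tells us where the page fault started, not which driver corrupted memory earlier, which is why this stack is of little help.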
So how can we find the real culprit?
A good option is to turn the random blue screens into an explicit error by enabling special pool. This is not the first time I have written about special pool. For how to debug with this special memory region, see my earlier article, whose English version is titled "Enable "Special Pool" to Interpret 0x000000c5 Blue Screen".
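Boot into Safe Mode, start verifier, and configure it to enable special pool. In Safe Mode the driver that causes the problem may not be loaded, so it is best to choose "Select driver names from a list" and then "Add currently not loaded driver(s) to the list…". In the file dialog that opens, browse to %systemroot%\system32\drivers, add the "Copyright" and "Product name" file property columns, and sort by them. Select every driver that is not from Microsoft, or those that look unprofessional (no digital signature, or incomplete copyright and product information), add them to the list, and tick them so that special pool is applied to them.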
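The same configuration can also be expressed through verifier.exe's command line rather than the GUI. The sketch below only builds the argument lists and prints them; it does not run anything. The driver names are hypothetical placeholders, and the helper functions are illustrative names of my own. It relies on the documented `/flags`, `/driver` and `/reset` switches, with flag bit 0x1 meaning special pool.

```python
from typing import List, Sequence

# Bit 0 of the Driver Verifier flags enables special pool.
SPECIAL_POOL_FLAG = 0x1


def build_verifier_command(drivers: Sequence[str],
                           flags: int = SPECIAL_POOL_FLAG) -> List[str]:
    """Build 'verifier /flags <flags> /driver <name> ...' as an argument list."""
    if not drivers:
        raise ValueError("at least one driver name is required")
    return ["verifier", "/flags", hex(flags), "/driver", *drivers]


def build_reset_command() -> List[str]:
    """Build the command that clears all Driver Verifier settings."""
    return ["verifier", "/reset"]


if __name__ == "__main__":
    # Hypothetical third-party driver names picked from the sorted file list.
    suspects = ["suspect1.sys", "suspect2.sys"]
    print(" ".join(build_verifier_command(suspects)))
    print(" ".join(build_reset_command()))
```

Either command would have to be run from an elevated prompt, and the settings take effect after the next reboot.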

A note here: the special pool settings are actually stored in the registry, under the memory manager's key:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management
They are controlled by the DWORD value VerifyDriverLevel and the string value VerifyDrivers. Take a peek there if you are curious.
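For readers who want to peek at these two values from a script, here is a minimal sketch. It assumes Python on Windows (the `winreg` module exists only there) and that VerifyDrivers holds a whitespace-separated list of driver names. The flag table is deliberately partial and lists only a few well-known bits; anything else is reported as unknown.

```python
from typing import List, Tuple

MM_KEY = r"SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"

# Partial table of Driver Verifier flag bits (not exhaustive).
VERIFIER_FLAGS = {
    0x01: "Special pool",
    0x02: "Force IRQL checking",
    0x04: "Low resources simulation",
    0x08: "Pool tracking",
    0x10: "I/O verification",
}


def decode_flags(level: int) -> List[str]:
    """Translate a VerifyDriverLevel value into option names."""
    names = [name for bit, name in sorted(VERIFIER_FLAGS.items()) if level & bit]
    known = 0
    for bit in VERIFIER_FLAGS:
        known |= bit
    unknown = level & ~known
    if unknown:
        names.append(f"unknown bits {unknown:#x}")
    return names


def read_verifier_settings() -> Tuple[int, List[str]]:
    """Read VerifyDriverLevel and VerifyDrivers (Windows only).

    Raises FileNotFoundError when verifier has not been configured.
    """
    import winreg  # imported lazily so decode_flags works on any platform

    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, MM_KEY) as key:
        level, _ = winreg.QueryValueEx(key, "VerifyDriverLevel")
        drivers, _ = winreg.QueryValueEx(key, "VerifyDrivers")
    return level, drivers.split()


if __name__ == "__main__":
    print(decode_flags(0x1))
```

A level of 1 decodes to special pool only, which matches the "Verify Level 1" line that `!verifier` prints in the dump.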
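With special pool enabled, we rebooted and entered the system normally to try to repro the problem. Sure enough, before long, before we had even logged on, it blue-screened again. This time we went straight into Safe Mode and collected the memory dump file for analysis:
First, we can see that special pool took effect and that pool allocations were successfully served from it: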
- 4: kd> !verifier
- Verify Level 1 ... enabled options are:
- Special pool
- Summary of All Verifier Statistics
- RaiseIrqls 0x0
- AcquireSpinLocks 0x0
- Synch Executions 0x0
- Trims 0x0
- Pool Allocations Attempted 0x2
- Pool Allocations Succeeded 0x2
- Pool Allocations Succeeded SpecialPool 0x2
- Pool Allocations With NO TAG 0x0
- Pool Allocations Failed 0x0
- Resource Allocations Failed Deliberately 0x0
- Current paged pool allocations 0x0 for 00000000 bytes
- Peak paged pool allocations 0x0 for 00000000 bytes
- Current nonpaged pool allocations 0x0 for 00000000 bytes
- Peak nonpaged pool allocations 0x0 for 00000000 bytes
Then we can see directly which driver is at fault:
- DRIVER_VERIFIER_DETECTED_VIOLATION (c4)
- A device driver attempting to corrupt the system has been caught. This is
- because the driver was specified in the registry as being suspect (by the
- administrator) and the kernel has enabled substantial checking of this driver.
- If the driver attempts to corrupt the system, bugchecks 0xC4, 0xC1 and 0xA will
- be among the most commonly seen crashes.
- Arguments:
- Arg1: 00000000000000b2, MmMapLockedPages called on an MDL having incorrect flags.
- For example, calling MmMapLockedPages for an MDL
- that is already mapped to a system address is incorrect.
- Arg2: fffffa800a4e71b0, MDL address.
- Arg3: 0000000000000005, MDL flags.
- Arg4: 0000000000000005, Incorrect MDL flags.
- Debugging Details:
- ------------------
- BUGCHECK_STR: 0xc4_b2
- DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT
- PROCESS_NAME: System
- CURRENT_IRQL: 0
- LAST_CONTROL_TRANSFER: from fffff8000311f3dc to fffff80002c95c40
- STACK_TEXT:
- fffff880`033697e8 fffff800`0311f3dc : 00000000`000000c4 00000000`000000b2 fffffa80`0a4e71b0 00000000`00000005 : nt!KeBugCheckEx
- fffff880`033697f0 fffff800`0311ffb3 : fffff880`05926f60 fffff880`05926f60 fffffa80`069ce700 fffff800`0312e09a : nt!VerifierBugCheckIfAppropriate+0x3c
- fffff880`03369830 fffff800`031327bb : fffffa80`0a4e71b0 fffffa80`09b69000 fffffa80`0a4e71b0 fffff880`05926f60 : nt!ViMmMapLockedPagesSanityChecks+0xa3
- fffff880`03369870 fffff880`06220009 : fffffa80`0a4e72c0 ffffffff`8000069c fffffa80`0a4e72c0 00000000`00000000 : nt!VerifierMmMapLockedPages+0x1b
- fffff880`033698b0 fffff880`0624c93a : fffff880`03369970 fffff880`05926f60 fffffa80`00000032 00000000`0000001c : PassGuard_x64!distorm_version+0x6809
- fffff880`033698f0 fffff880`03369970 : fffff880`05926f60 fffffa80`00000032 00000000`0000001c fffffa80`06768f30 : PassGuard_x64!distorm_version+0x3313a
- fffff880`033698f8 fffff880`05926f5f : fffffa80`00000032 00000000`0000001c fffffa80`06768f30 00000000`00000200 : 0xfffff880`03369970
- fffff880`03369900 fffffa80`00000032 : 00000000`0000001c fffffa80`06768f30 00000000`00000200 00000000`00000000 : usbhub!UsbhSyncSendCommand+0x327
- fffff880`03369908 00000000`0000001c : fffffa80`06768f30 00000000`00000200 00000000`00000000 fffff880`06232040 : 0xfffffa80`00000032
- fffff880`03369910 fffffa80`06768f30 : 00000000`00000200 00000000`00000000 fffff880`06232040 00000000`001e001c : 0x1c
- fffff880`03369918 00000000`00000200 : 00000000`00000000 fffff880`06232040 00000000`001e001c fffff880`062563f8 : 0xfffffa80`06768f30
- fffff880`03369920 00000000`00000000 : fffff880`06232040 00000000`001e001c fffff880`062563f8 00000000`00220020 : 0x200
- STACK_COMMAND: kb
- FOLLOWUP_IP:
- PassGuard_x64!distorm_version+6809
- fffff880`06220009 4889442428 mov qword ptr [rsp+28h],rax
- SYMBOL_STACK_INDEX: 4
- SYMBOL_NAME: PassGuard_x64!distorm_version+6809
- FOLLOWUP_NAME: MachineOwner
- MODULE_NAME: PassGuard_x64
- IMAGE_NAME: PassGuard_x64.sys
- DEBUG_FLR_IMAGE_TIMESTAMP: 4e2fb9f4
- FAILURE_BUCKET_ID: X64_0xc4_b2_VRF_PassGuard_x64!distorm_version+6809
- BUCKET_ID: X64_0xc4_b2_VRF_PassGuard_x64!distorm_version+6809
- Followup: MachineOwner
- ---------
- 4: kd> lmvm PassGuard_x64
- start end module name
- fffff880`06218000 fffff880`06261000 PassGuard_x64 (export symbols) PassGuard_x64.sys
- Loaded symbol image file: PassGuard_x64.sys
- Image path: \??\C:\windows\system32\drivers\PassGuard_x64.sys
- Image name: PassGuard_x64.sys
- Timestamp: Wed Jul 27 15:10:44 2011 (4E2FB9F4)
- CheckSum: 0004A5F0
- ImageSize: 00049000
- File version: 1.0.0.6
- Product version: 1.0.0.6
- File flags: 0 (Mask 17)
- File OS: 4 Unknown Win32
- File type: 1.0 App
- File date: 00000000.00000000
- Translations: 0804.04b0
- ProductName: SysEnter Application
- InternalName: SysEnter
- OriginalFilename: SysEnter.exe
- ProductVersion: 1, 0, 0, 6
- FileVersion: 1, 0, 0, 6
- FileDescription: SysEnter Application
- LegalCopyright: Copyright (C) 2011
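When several dumps have to be compared, it can help to pull the key fields out of the `!analyze -v` text automatically. The following is a minimal sketch, assuming the output has been saved as plain text in the "NAME: value" form shown above (with or without the leading "- "). The function name is illustrative and not a debugger API.

```python
import re
from typing import Dict

# Matches lines such as "- MODULE_NAME: PassGuard_x64"; headers without a
# value (for example "STACK_TEXT:") and mixed-case labels (for example
# "Arg1:") are skipped.
FIELD_RE = re.compile(r"^\s*(?:-\s*)?([A-Z_]+):\s*(\S.*?)\s*$")


def parse_analyze_fields(text: str) -> Dict[str, str]:
    """Collect upper-case 'NAME: value' fields, keeping the first occurrence."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields.setdefault(match.group(1), match.group(2))
    return fields


if __name__ == "__main__":
    sample = "\n".join([
        "- BUGCHECK_STR: 0xc4_b2",
        "- STACK_TEXT:",
        "- MODULE_NAME: PassGuard_x64",
        "- IMAGE_NAME: PassGuard_x64.sys",
    ])
    info = parse_analyze_fields(sample)
    print(info["BUGCHECK_STR"], info["IMAGE_NAME"])
```

For this case the fields of interest are BUGCHECK_STR, MODULE_NAME and IMAGE_NAME, which together name the verifier violation and the driver that triggered it.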
Now that we know the driver named PassGuard_x64.sys is the culprit, we should stop it from loading at startup. In Safe Mode, open Registry Editor and delete the entire PassGuard key under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services. You also need to find and delete the same key under ControlSet001 / 002. For reference, here are the contents of the PassGuard key:
- [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\PassGuard]
- "Type"=dword:00000001
- "Start"=dword:00000002
- "ErrorControl"=dword:00000001
- "ImagePath"=hex(2):5c,00,3f,00,3f,00,5c,00,43,00,3a,00,5c,00,77,00,69,00,6e,00,\
- 64,00,6f,00,77,00,73,00,5c,00,73,00,79,00,73,00,74,00,65,00,6d,00,33,00,32,\
- 00,5c,00,64,00,72,00,69,00,76,00,65,00,72,00,73,00,5c,00,50,00,61,00,73,00,\
- 73,00,47,00,75,00,61,00,72,00,64,00,5f,00,78,00,36,00,34,00,2e,00,73,00,79,\
- 00,73,00,00,00
- "DisplayName"="PassGuard"
- "WOW64"=dword:00000001
- [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\PassGuard\Enum]
- "0"="Root\\LEGACY_PASSGUARD\\0000"
- "Count"=dword:00000001
- "NextInstance"=dword:00000001
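The ImagePath value above is exported as `hex(2)`, that is, a REG_EXPAND_SZ string stored as UTF-16LE bytes, so it is not readable at a glance. The sketch below decodes it, with the byte list copied from the export above; the function name is illustrative.

```python
import re

# The ImagePath data exactly as it appears in the .reg export above.
IMAGE_PATH_HEX = r"""hex(2):5c,00,3f,00,3f,00,5c,00,43,00,3a,00,5c,00,77,00,69,00,6e,00,\
64,00,6f,00,77,00,73,00,5c,00,73,00,79,00,73,00,74,00,65,00,6d,00,33,00,32,\
00,5c,00,64,00,72,00,69,00,76,00,65,00,72,00,73,00,5c,00,50,00,61,00,73,00,\
73,00,47,00,75,00,61,00,72,00,64,00,5f,00,78,00,36,00,34,00,2e,00,73,00,79,\
00,73,00,00,00"""


def decode_reg_hex2(value: str) -> str:
    """Decode a .reg 'hex(2):' value (UTF-16LE bytes) into a Python string."""
    body = value.split(":", 1)[1] if value.lower().startswith("hex(2):") else value
    # Drop line-continuation backslashes and all whitespace.
    cleaned = re.sub(r"[\\\s]", "", body)
    raw = bytes(int(pair, 16) for pair in cleaned.split(",") if pair)
    # Strip the trailing NUL terminator stored with the string.
    return raw.decode("utf-16-le").rstrip("\x00")


if __name__ == "__main__":
    print(decode_reg_hex2(IMAGE_PATH_HEX))
```

The decoded path is the same one `lmvm` reported as the image path. In the same key, a Start value of 2 means the driver is started automatically at boot, which is why the crash came back after every restart.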
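Once this key is deleted, the system should run normally, provided there are no other compounding problems. Before rebooting into normal mode, remember to remove the special pool configuration from verifier.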
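To also remove any program or other files associated with this driver, I asked my colleague to think hard about where it might have come from: a change to his IE add-ons, something bundled with another program, malware, and so on. He then remembered that on Wednesday afternoon he had topped up his phone credit on China Mobile's 10086.cn and installed a security control.
We opened IE's Manage Add-ons dialog, chose to show all add-ons, and sure enough found this one: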

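Fortunately it is not malware, and the associated control files can be deleted with the Remove button at the bottom. Enough said: just look at how badly China Mobile's driver for 64-bit operating systems is written and you will understand.
From this case I think we can take away more than how to use special pool and how to approach troubleshooting. It also shows a problem that many information service companies face today: they outsource the drivers for their own products, the contractor lacks the experience and standards needed to write drivers, or the driver goes into use without being tested, and the damage falls not only on customers but on the service company's own brand. There are far too many such examples, and they extend beyond badly written drivers to badly written software and badly written websites… There is little point in going on about the Ministry of Railways' 12306 website. To be honest, I really do not dare to use the bank card payment kiosks that China Unicom is now rolling out in its service halls. The thought of inserting my bank card into a machine built by a Unicom contractor, and then typing in my card PIN, makes me shudder.
Original article: http://www.cnblogs.com/mvperic/archive/2012/02/26/2369225.html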