Solving a Stubborn Crash on Android 12
The UC kernel hit a fatal crash on Android 12. About 10% of users ran into it during cold start, which seriously blocked the release of the UC kernel. Its call stack looks like this:
10-12 19:03:21.461 1038 2723 I id.AlipayGphon: Rejecting re-init on previously-failed class java.lang.Class<com.uc.webkit.impl.WebViewChromiumFactoryProvider>: java.lang.VerifyError: Verifier rejected class com.uc.webkit.impl.WebViewChromiumFactoryProvider: com.uc.webkit.an com.uc.webkit.impl.WebViewChromiumFactoryProvider.g() failed to verify: com.uc.webkit.an com.uc.webkit.impl.WebViewChromiumFactoryProvider.g(): [0x15] can't resolve returned type 'Unresolved Reference: com.uc.webkit.an' or 'Unresolved Reference: com.uc.webkit.impl.ak' (declaration of 'com.uc.webkit.impl.WebViewChromiumFactoryProvider' appears in /data/user/0/com.eg.android.AlipayGphone/app_h5container/uc/3.22.2.28.21092218119_64/so/core.jar)
10-12 19:03:21.461 1038 2723 I id.AlipayGphon: (Throwable with empty stack trace)
10-12 19:03:21.464 1038 2723 E WebViewEntry: init error and prepare native crash
10-12 19:03:21.464 1038 2723 E WebViewEntry: java.lang.NoClassDefFoundError: com.uc.webkit.impl.WebViewChromiumFactoryProvider
10-12 19:03:21.464 1038 2723 E WebViewEntry: at com.uc.webkit.impl.WebViewChromiumFactoryProvider.i(Unknown Source:0)
10-12 19:03:21.464 1038 2723 E WebViewEntry: at com.uc.webkit.WebViewEntry.p(U4Source:193)
10-12 19:03:21.464 1038 2723 E WebViewEntry: at com.uc.webkit.bg.run(Unknown Source:0)
10-12 19:03:21.464 1038 2723 E WebViewEntry: at android.os.Handler.handleCallback(Handler.java:938)
10-12 19:03:21.464 1038 2723 E WebViewEntry: at android.os.Handler.dispatchMessage(Handler.java:99)
10-12 19:03:21.464 1038 2723 E WebViewEntry: at android.os.Looper.loopOnce(Looper.java:201)
10-12 19:03:21.464 1038 2723 E WebViewEntry: at android.os.Looper.loop(Looper.java:288)
10-12 19:03:21.464 1038 2723 E WebViewEntry: at android.os.HandlerThread.run(HandlerThread.java:67)
10-12 19:03:21.464 1038 2723 E WebViewEntry: Caused by: java.lang.VerifyError: Verifier rejected class com.uc.webkit.impl.WebViewChromiumFactoryProvider: com.uc.webkit.an com.uc.webkit.impl.WebViewChromiumFactoryProvider.g() failed to verify: com.uc.webkit.an com.uc.webkit.impl.WebViewChromiumFactoryProvider.g(): [0x15] can't resolve returned type 'Unresolved Reference: com.uc.webkit.an' or 'Unresolved Reference: com.uc.webkit.impl.ak' (declaration of 'com.uc.webkit.impl.WebViewChromiumFactoryProvider' appears in /data/user/0/com.eg.android.AlipayGphone/app_h5container/uc/3.22.2.28.21092218119_64/so/core.jar)
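Without a fix, our kernel might not be usable on Android 12 at all, so once again this was a life-or-death problem for the kernel. The crash could not be reproduced through normal use; it only showed up occasionally when monkey hammered cold starts.
One more piece of background: the problem only appeared after UC Browser raised its SDK level to 30.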
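Call stack analysis
From the call stack we can see that the topmost error is a NoClassDefFoundError, but it is caused by the VerifyError below it. The stack itself shows an ordinary startup sequence in progress.
The message "Rejecting re-init on previously-failed class" indicates that com.uc.webkit.impl.WebViewChromiumFactoryProvider had already been through verification once and failed. Normally that first failure should have thrown a VerifyError of its own, yet we went through many crash logs and never found where that first VerifyError was thrown.
In addition, the "Caused by: java.lang.VerifyError" entry should be followed by the call stack of that first verification attempt, but the log instead shows "(Throwable with empty stack trace)".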
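Black-magic analysis: technique one
With all these open questions, it was clear that the data we had was not enough. We needed more verification-related information to make progress.
Android's ART virtual machine ships with verbose logging. It is organized by module and is off by default; it has to be switched on through arguments passed when ART starts.
We first tried the wrapper technique: put a wrapper.sh file in the lib directory, and the system launches the virtual machine through wrapper.sh instead of through Zygote. Unfortunately this had no effect. After reading the source in AndroidRuntime.cpp, we found that the VM arguments passed in by the wrapper are filtered out and ignored entirely.
So we had to go beyond the officially supported routes.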
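Looking at how the verbose log is structured, we see that a global variable, gLogVerbosity, controls these switches. Could we enable verbose logging by modifying gLogVerbosity directly?
The UC kernel has a powerful collection of black-magic tools, and the one that fits this need is the symbol_resolver module. It parses /proc/self/maps to find where a named .so is mapped, parses the ELF file to obtain all of its symbols, and then lets us look up the address of the symbol we want from the resulting key-value pairs.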
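The first step of that lookup, finding where a named library is mapped, can be sketched as follows. This is a minimal illustration and not the actual symbol_resolver code: the function names ParseMapsLine and FindLibraryBase are hypothetical, and the ELF symbol-table parsing that would turn the base address into a symbol address is left out.

```cpp
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>

// Parses one line of /proc/<pid>/maps. If the pathname field ends with
// `lib_name`, stores the start address of that mapping in *base and
// returns true. (Hypothetical helper, for illustration only.)
bool ParseMapsLine(const std::string& line, const std::string& lib_name,
                   std::uintptr_t* base) {
  // Line layout: start-end perms offset dev inode pathname
  std::istringstream in(line);
  std::string range, perms, offset, dev, inode, path;
  if (!(in >> range >> perms >> offset >> dev >> inode >> path)) return false;
  if (path.size() < lib_name.size() ||
      path.compare(path.size() - lib_name.size(), lib_name.size(),
                   lib_name) != 0) {
    return false;
  }
  const std::size_t dash = range.find('-');
  if (dash == std::string::npos) return false;
  *base = static_cast<std::uintptr_t>(
      std::stoull(range.substr(0, dash), nullptr, 16));
  return true;
}

// Returns the start address of the first mapping of `lib_name` in the
// current process, or 0 if the library is not mapped.
std::uintptr_t FindLibraryBase(const std::string& lib_name) {
  std::ifstream maps("/proc/self/maps");
  std::string line;
  std::uintptr_t base = 0;
  while (std::getline(maps, line)) {
    if (ParseMapsLine(line, lib_name, &base)) return base;
  }
  return 0;
}
```

Calling FindLibraryBase("libart.so") inside an Android process would give the address to which symbol offsets read from the ELF file are added.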
With this we quickly located gLogVerbosity inside libart.so, treated it as an array of bool, and set the verifier and verifier_debug entries to true. That gave us new logs:
Verification failed on class org.chromium.ui.base.WindowAndroid in /data/user/0/com.eg.android.AlipayGphone/app_h5container/uc/3.22.2.31.10191532_64/so/core.jar because: Verifier rejected class org.chromium.ui.base.WindowAndroid: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken() failed to verify: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken(): [0x10] can't resolve returned type 'Unresolved Reference: android.os.IBinder' or 'Reference: android.os.IBinder'
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x0] : Processing const/4 v1, #+0
0:[Undefined],1:[Undefined],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x1] : Processing iget-object v0, v2, Ljava/lang/ref/WeakReference; org.chromium.ui.base.WindowAndroid.e // field@7982
0:[Undefined],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x3] : Processing invoke-virtual {v0}, java.lang.Object java.lang.ref.WeakReference.get() // method@7347
0:[Reference: java.lang.ref.WeakReference],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x6] : Processing move-result-object v0
0:[Reference: java.lang.ref.WeakReference],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x7] : Processing check-cast v0, android.content.Context // type@TypeIndex[61]
0:[Reference: java.lang.Object],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x9] : Processing invoke-static {v0}, android.app.Activity org.chromium.ui.base.WindowAndroid.a(android.content.Context) // method@17017
0:[Reference: android.content.Context],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0xc] : Processing move-result-object v0
0:[Reference: android.content.Context],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0xd] : Processing if-nez v0, +4
0:[Reference: android.app.Activity],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0xf] : Processing move-object v0, v1
0:[Reference: android.app.Activity],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x10] : Processing return-object v0
0:[Zero/null],1:[Conflict],2:[Conflict],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x11] : Processing invoke-virtual {v0}, android.view.Window android.app.Activity.getWindow() // method@26
0:[Reference: android.app.Activity],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x14] : Processing move-result-object v0
0:[Reference: android.app.Activity],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x15] : Processing if-nez v0, +4
0:[Reference: android.view.Window],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x17] : Processing move-object v0, v1
0:[Reference: android.view.Window],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x18] : Processing goto -8
0:[Zero/null],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x19] : Processing invoke-virtual {v0}, android.view.View android.view.Window.peekDecorView() // method@1459
0:[Reference: android.view.Window],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x1c] : Processing move-result-object v0
0:[Reference: android.view.Window],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x1d] : Processing if-nez v0, +4
0:[Reference: android.view.View],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x1f] : Processing move-object v0, v1
0:[Reference: android.view.View],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x20] : Processing goto -16
0:[Zero/null],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x21] : Processing invoke-virtual {v0}, android.os.IBinder android.view.View.getWindowToken() // method@1318
0:[Reference: android.view.View],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x24] : Processing move-result-object v0
0:[Reference: android.view.View],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x25] : Processing goto -21
0:[Reference: android.os.IBinder],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x25] : Merging at [0x25] to [0x10]:
0:[Zero/null],1:[Conflict],2:[Conflict], MERGE
0:[Reference: android.os.IBinder],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid], ==
0:[Reference: android.os.IBinder],1:[Conflict],2:[Conflict],
VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x10] : Processing return-object v0
0:[Reference: android.os.IBinder],1:[Conflict],2:[Conflict],
Rejecting opcode return-object v0
Register Types:
0: Undefined
1: Conflict
2: null
3: Boolean
4: Byte
5: Short
6: Char
7: Integer
8: Long (Low Half)
9: Long (High Half)
10: Float
11: Double (Low Half)
12: Double (High Half)
13: Precise Constant: -1
14: Zero/null
15: Precise Constant: 1
16: Precise Constant: 2
17: Precise Constant: 3
18: Precise Constant: 4
19: Reference: org.chromium.ui.base.WindowAndroid
20: Reference: java.lang.Object
21: Reference: java.lang.ref.WeakReference
22: Reference: java.lang.ref.Reference
23: Reference: android.content.Context
24: Reference: android.app.Activity
25: Unresolved Reference: android.os.IBinder
26: Reference: android.view.Window
27: Reference: android.view.View
28: Reference: android.os.IBinder
Dumping instructions and register lines:
0:[Undefined],1:[Undefined],2:[Reference: org.chromium.ui.base.WindowAndroid],
0x0000: V-O-B-- const/4 v1, #+0
0x0001: V-O---- iget-object v0, v2, Ljava/lang/ref/WeakReference; org.chromium.ui.base.WindowAndroid.e // field@7982
0x0003: V-O---- invoke-virtual {v0}, java.lang.Object java.lang.ref.WeakReference.get() // method@7347
0x0006: V-O---- move-result-object v0
0x0007: V-O--G- check-cast v0, android.content.Context // type@TypeIndex[61]
0x0009: V-O---- invoke-static {v0}, android.app.Activity org.chromium.ui.base.WindowAndroid.a(android.content.Context) // method@17017
0x000c: V-O---- move-result-object v0
0x000d: V-O---- if-nez v0, +4
0x000f: V-O---- move-object v0, v1
0:[Reference: android.os.IBinder],1:[Conflict],2:[Conflict],
0x0010: VCO-B-R return-object v0
0:[Reference: android.app.Activity],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
0x0011: V-O-B-- invoke-virtual {v0}, android.view.Window android.app.Activity.getWindow() // method@26
0x0014: V-O---- move-result-object v0
0x0015: V-O---- if-nez v0, +4
0x0017: V-O---- move-object v0, v1
0x0018: V-O---- goto -8
0:[Reference: android.view.Window],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
0x0019: V-O-B-- invoke-virtual {v0}, android.view.View android.view.Window.peekDecorView() // method@1459
0x001c: V-O---- move-result-object v0
0x001d: V-O---- if-nez v0, +4
0x001f: V-O---- move-object v0, v1
0x0020: V-O---- goto -16
0:[Reference: android.view.View],1:[Zero/null],2:[Reference: org.chromium.ui.base.WindowAndroid],
0x0021: V-O-B-- invoke-virtual {v0}, android.os.IBinder android.view.View.getWindowToken() // method@1318
0x0024: V-O---- move-result-object v0
0x0025: V-O---- goto -21
Setting org.chromium.ui.base.WindowAndroid to erroneous.
Two things in this log deserve the most attention:
1. [0x10] can't resolve returned type 'Unresolved Reference: android.os.IBinder' or 'Reference: android.os.IBinder' VFY: android.os.IBinder org.chromium.ui.base.WindowAndroid.getWindowToken()[0x0] : Processing const/4 v1, #+0
According to the code that prints this log, return_type corresponds to 'Unresolved Reference: android.os.IBinder'.
return_type in turn comes from:
and GetMethodReturnType:
calls FromDescriptor:
which calls ResolveClass. ResolveClass calls ClassLinker::FindClass, and FindClass has one obvious precondition for failure:
That is, FindClass is refused when the current thread is a runtime thread, because the lookup could push the class into initialization and end up calling the class initializer in its static block. A runtime thread lacks the environment needed to run Java functions, so this cannot be allowed.
Could the current thread really be a runtime thread? If so, which one? The GC thread, perhaps?
2. We analyzed the verification activity before and after this log entry. Threads that verified successfully all had load class log entries, but the failing thread had none at all, and it went on to fail verification of several more classes for the same reason. This made us more confident that the failing thread was a runtime thread. The missing stack trace on the VerifyError mentioned earlier points the same way: a runtime thread has no Java environment and cannot call Java functions, so nothing was recorded. We still needed to find out which thread it was, and for that we brought out a second piece of black magic.
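Black-magic analysis: technique two
Reading the code, we found that every VerifyError is thrown through the same function:
We could also find its global symbol, so all we had to do was plant code at that symbol's address that crashes as soon as it executes, and then have monkey trigger the problem so we could catch it in the act.
There is a catch: for security reasons, Android forbids us from changing the permissions of the code segment to writable.
So how do we patch the code segment safely? We used the /proc/self/mem technique: open the /proc/self/mem file, then use the pwrite API to write the crashing code at the symbol's address.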
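The /proc/self/mem write can be sketched roughly as below. This is a minimal sketch, not the code used in the UC kernel: the name PatchViaProcMem is hypothetical, and the bytes to write (an instruction sequence that traps on the target CPU) are left to the caller because they depend on the architecture.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cstddef>
#include <cstdint>

// Writes `len` bytes from `data` to address `target` in the current
// process by going through /proc/self/mem instead of a direct store.
// Returns true only if every byte was written.
// (Hypothetical helper name; the patch bytes are supplied by the caller.)
bool PatchViaProcMem(void* target, const void* data, std::size_t len) {
  int fd = open("/proc/self/mem", O_RDWR | O_CLOEXEC);
  if (fd < 0) return false;
  // The file offset in /proc/self/mem is the virtual address itself.
  const off_t offset =
      static_cast<off_t>(reinterpret_cast<std::uintptr_t>(target));
  const ssize_t written = pwrite(fd, data, len, offset);
  close(fd);
  return written == static_cast<ssize_t>(len);
}
```

In the scenario described here, `target` would be the resolved address of the function that throws VerifyError. On some architectures the instruction cache may also need to be flushed after patching code; that step is not shown.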
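This is how we found the thread on which verification failed:
Root cause analysis
We got the thread name, "Verification th", along with the call stack that started the thread. It is launched from a ThreadPool, and every thread in a ThreadPool is a runtime thread, which confirmed our earlier guess. The task the thread runs is BackgroundVerificationTask, and the place where it is started is easy to find:
Tracing further, this is the commit that introduced the problem: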
commit 0d5f6402ff925ac1385ccb349f8a2798a4816458
Author: Nicolas Geoffray <ngeoffray@google.com>
Date: Tue Apr 13 13:05:36 2021 +0100
Only run background verification when dexPathList is set.
Otherwise, the runtime will not be able to find the classes.
Test: 692-vdex-secondary-loader
Bug: 185088679
Change-Id: Idd39eabe00faa017aa5254f7188e7adbcaa23c74
diff --git a/dalvik/src/main/java/dalvik/system/BaseDexClassLoader.java b/dalvik/src/main/java/dalvik/system/BaseDexClassLoader.java
index 710a88cc6d0..afbc9ec9de7 100644
--- a/dalvik/src/main/java/dalvik/system/BaseDexClassLoader.java
+++ b/dalvik/src/main/java/dalvik/system/BaseDexClassLoader.java
@@ -128,6 +128,9 @@ public class BaseDexClassLoader extends ClassLoader {
: Arrays.copyOf(sharedLibraryLoaders, sharedLibraryLoaders.length);
this.pathList = new DexPathList(this, dexPath, librarySearchPath, null, isTrusted);
+ // Run background verification after having set 'pathList'.
+ this.pathList.maybeRunBackgroundVerification(this);
+
reportClassLoaderChain();
}
@@ -186,6 +189,8 @@ public class BaseDexClassLoader extends ClassLoader {
this.sharedLibraryLoaders = null;
this.pathList = new DexPathList(this, librarySearchPath);
this.pathList.initByteBufferDexPath(dexFiles);
+ // Run background verification after having set 'pathList'.
+ this.pathList.maybeRunBackgroundVerification(this);
}
@Override
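Using git tag --contains, we confirmed that this change first shipped with the Android 12 beta.
Solution
Apart from reporting the problem to Google and complaining a bit, we still needed a fix of our own. Google said the December update of Android 12 would resolve it, but many older devices never get updated, so we could not count on that.
We had to find something inside OatFileManager::RunBackgroundVerification that would force it not to start the background verification thread. Our attention quickly landed on the following:
We picked it because the file name is something we can still control. The earlier logic also checks the SDK level: with SDK level <= 29 the thread is not started either, but UC Browser had already raised its SDK level to 30 (which matches the background note that the problem only appeared after that change).
Looking at the function DexLocationToOdexFilename, we found a few lines that help a lot: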
// Get the base part of the file without the extension.
std::string file = location.substr(pos+1);
pos = file.rfind('.');
if (pos == std::string::npos) {
  *error_msg = "Dex location " + location + " has no extension.";
  return false;
}
If we make sure it cannot find the suffix separator ".", we force it to bail out.
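The effect of that check on different file names can be seen in a small standalone sketch. It mirrors only the quoted lines: HasDexExtension is a hypothetical name, the example path is shortened, and the rest of DexLocationToOdexFilename is not reproduced.

```cpp
#include <string>

// Mirrors the quoted check: take the part of the location after the
// last '/', then look for a '.' in it. Returns false when the file name
// has no extension, which is the case where the quoted code bails out.
// (Hypothetical helper; a location without '/' is treated as a bare
// file name, which is a simplification made for this sketch.)
bool HasDexExtension(const std::string& location) {
  const std::size_t slash = location.rfind('/');
  const std::string file =
      (slash == std::string::npos) ? location : location.substr(slash + 1);
  return file.rfind('.') != std::string::npos;
}
```

Only the final path component matters: dots in directory names, such as a version-numbered folder, do not count as an extension.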
Result
After we symlinked core.jar as corejar on Android 12, the problem disappeared. The monster threatening the UC kernel was defeated, and peace returned to the world.