Watchdog Mechanism: Source Code Analysis
Introduction
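Linux has a watchdog of its own. Once the watchdog is started in the Linux kernel, a timer is armed. If nothing writes to /dev/watchdog before the timeout expires, the system reboots. A watchdog implemented with a timer in this way is a software watchdog.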
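The timer idea can be illustrated with a small Java model. This is only a sketch under stated assumptions:

- SoftDogModel, kick() and isExpired() are hypothetical names.
- Time is passed in as a parameter rather than read from a clock.
- A real kernel driver would reset the machine instead of returning a boolean.

```java
/**
 * Hypothetical model of a timer-based (software) watchdog.
 * kick() plays the role of a write to /dev/watchdog; isExpired() marks the
 * moment a real driver would reset the machine. Time is injected so the
 * behaviour is deterministic.
 */
class SoftDogModel {
    private final long timeoutMillis;
    private long lastKickMillis;

    SoftDogModel(long timeoutMillis, long startMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastKickMillis = startMillis; // arming the timer counts as the first kick
    }

    /** Userspace "feeds the dog" by writing to the device. */
    void kick(long nowMillis) {
        lastKickMillis = nowMillis;
    }

    /** True once a whole timeout has passed without a kick. */
    boolean isExpired(long nowMillis) {
        return nowMillis - lastKickMillis >= timeoutMillis;
    }

    public static void main(String[] args) {
        SoftDogModel dog = new SoftDogModel(60_000, 0); // 60 s timeout, chosen for illustration
        dog.kick(50_000);
        System.out.println(dog.isExpired(100_000)); // false: 50 s since the last kick
        System.out.println(dog.isExpired(110_000)); // true: 60 s without a kick
    }
}
```

The same "must check in before a deadline" contract reappears in Android's Watchdog. There, the things that must check in are threads and service locks rather than a userspace daemon.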
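Android also implements a software watchdog, which protects a set of critical system services. When one of them fails, Android is usually restarted. Because of this mechanism, a common class of bug is the phone rebooting after the Watchdog kills the system_server process.
Let's analyze how it works.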
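I. How WatchDog starts
The ANR mechanism covers applications. For system processes that stay "unresponsive" for too long, Android provides the WatchDog mechanism instead. Once the unresponsiveness exceeds the allowed delay, WatchDog triggers a self-kill.
Watchdog is a thread: it extends Thread. SystemServer.java obtains the Watchdog object through getInstance().
1. Starting it in SystemServer.java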
```java
private void startOtherServices() {
    // ...
    traceBeginAndSlog("InitWatchdog");
    final Watchdog watchdog = Watchdog.getInstance();
    watchdog.init(context, mActivityManagerService);
    traceEnd();
    // ...
    traceBeginAndSlog("StartWatchdog");
    Watchdog.getInstance().start();
    traceEnd();
}
```
Because Watchdog is a thread, calling start() is all it takes to run it.
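The lifecycle above has four steps: a lazily created singleton that extends Thread is fetched, initialised once, and then started. The sketch below shows that shape. It is not the AOSP class:

- MiniWatchdog, owner() and awaitStarted() are hypothetical.
- The `synchronized` on getInstance() is this sketch's own choice.
- The daemon flag only lets the demo JVM exit.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical stand-in for the Watchdog lifecycle: a lazily created
 * singleton that extends Thread, is configured with init(), then started.
 */
class MiniWatchdog extends Thread {
    private static MiniWatchdog sInstance;
    private final CountDownLatch started = new CountDownLatch(1);
    private volatile String owner;          // stands in for init(context, ams)

    private MiniWatchdog() {
        super("watchdog");                  // same thread name as the real class
        setDaemon(true);                    // demo only: lets the JVM exit
    }

    static synchronized MiniWatchdog getInstance() {
        if (sInstance == null) {
            sInstance = new MiniWatchdog();
        }
        return sInstance;
    }

    void init(String owner) {
        this.owner = owner;
    }

    String owner() {
        return owner;
    }

    /** Waits up to timeoutMillis for run() to begin; false on timeout or interrupt. */
    boolean awaitStarted(long timeoutMillis) {
        try {
            return started.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    @Override
    public void run() {
        started.countDown();
        // The real run() loops forever scheduling and evaluating checks;
        // this sketch just idles until interrupted.
        while (!isInterrupted()) {
            try {
                Thread.sleep(1_000);
            } catch (InterruptedException e) {
                return;
            }
        }
    }
}
```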
2. The Watchdog constructor
```java
private Watchdog() {
    super("watchdog");
    // Initialize handler checkers for each common thread we want to check. Note
    // that we are not currently checking the background thread, since it can
    // potentially hold longer running operations with no guarantees about the timeliness
    // of operations there.
    // The shared foreground thread is the main checker. It is where we
    // will also dispatch monitor checks and do other work.
    mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
            "foreground thread", DEFAULT_TIMEOUT);
    mHandlerCheckers.add(mMonitorChecker);
    // Add checker for main thread. We only do a quick check since there
    // can be UI running on the thread.
    mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
            "main thread", DEFAULT_TIMEOUT));
    // Add checker for shared UI thread.
    mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
            "ui thread", DEFAULT_TIMEOUT));
    // And also check IO thread.
    mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
            "i/o thread", DEFAULT_TIMEOUT));
    // And the display thread.
    mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
            "display thread", DEFAULT_TIMEOUT));
    // Initialize monitor for Binder threads.
    addMonitor(new BinderThreadMonitor());
    mOpenFdMonitor = OpenFdMonitor.create();
    // See the notes on DEFAULT_TIMEOUT.
    assert DB ||
            DEFAULT_TIMEOUT > ZygoteConnectionConstants.WRAPPED_PID_TIMEOUT_MILLIS;
    // mtk enhance: MediaTek vendor addition, not part of AOSP
    exceptionHWT = new ExceptionLog();
}
```
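Two objects deserve attention: mMonitorChecker and mHandlerCheckers.

Where the entries of mHandlerCheckers come from:
- Added by the constructor: the checkers for FgThread, the main thread, UiThread, IoThread and DisplayThread;
- Added from outside: Watchdog.getInstance().addThread(handler);

Where the monitors held by mMonitorChecker come from. mMonitorChecker is the foreground-thread checker, which is also the first entry of mHandlerCheckers, and it keeps its own mMonitors list:
- Added from outside: Watchdog.getInstance().addMonitor(monitor);
- Special case: the constructor itself calls addMonitor(new BinderThreadMonitor());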
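The two registration paths can be modelled on a plain JVM. This is a sketch under stated assumptions:

- RegistrationModel is hypothetical.
- A "handler" is reduced to a thread name.
- Monitors are plain lambdas.
- Only the shape follows the constructor above: one checker per looper, and all monitors sharing the foreground checker.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of how Watchdog checkers are registered.
 * Names mirror the real fields (mHandlerCheckers, mMonitorChecker) but the
 * classes are stand-ins: a "handler" is just a thread name here.
 */
class RegistrationModel {
    interface Monitor { void monitor(); }

    static final class HandlerChecker {
        final String name;
        final List<Monitor> monitors = new ArrayList<>();
        HandlerChecker(String name) { this.name = name; }
    }

    final List<HandlerChecker> handlerCheckers = new ArrayList<>();
    final HandlerChecker monitorChecker;

    RegistrationModel() {
        // The foreground-thread checker doubles as the monitor checker.
        monitorChecker = new HandlerChecker("foreground thread");
        handlerCheckers.add(monitorChecker);
        for (String t : new String[] {"main thread", "ui thread", "i/o thread", "display thread"}) {
            handlerCheckers.add(new HandlerChecker(t));
        }
        // Stand-in for addMonitor(new BinderThreadMonitor()): always present,
        // so monitorChecker never has an empty monitor list.
        addMonitor(() -> { });
    }

    /** Like addThread(handler): one new checker per looper. */
    void addThread(String threadName) {
        handlerCheckers.add(new HandlerChecker(threadName));
    }

    /** Like addMonitor(monitor): every monitor shares the foreground checker. */
    void addMonitor(Monitor m) {
        monitorChecker.monitors.add(m);
    }
}
```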
3. Watchdog.run()
```java
public void run() {
    boolean waitedHalf = false;
    boolean mSFHang = false;
    while (true) {
        // ...
        synchronized (this) {
            // ...
            for (int i = 0; i < mHandlerCheckers.size(); i++) {
                HandlerChecker hc = mHandlerCheckers.get(i);
                hc.scheduleCheckLocked();
            }
            // ...
        }
        // ...
    }
}
```
In every iteration, run() schedules a check on each element of mHandlerCheckers.
4. HandlerChecker.scheduleCheckLocked()
```java
public void scheduleCheckLocked() {
    if (mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling()) {
        // If the target looper has recently been polling, then
        // there is no reason to enqueue our checker on it since that
        // is as good as it not being deadlocked. This avoid having
        // to do a context switch to check the thread. Note that we
        // only do this if mCheckReboot is false and we have no
        // monitors, since those would need to be executed at this point.
        mCompleted = true;
        return;
    }
    if (!mCompleted) {
        // we already have a check in flight, so no need
        return;
    }
    mCompleted = false;
    mCurrentMonitor = null;
    mStartTime = SystemClock.uptimeMillis();
    mHandler.postAtFrontOfQueue(this);
}
```
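When mMonitors.size() == 0, the checker only has to show that its looper is not stuck. It can take a shortcut through mHandler.getLooper().getQueue().isPolling(): if the looper is currently polling (idle and waiting for the next message), it is not blocked. The check is then marked complete without posting anything.

mMonitorChecker always has at least one monitor (BinderThreadMonitor), so the shortcut never applies to it. What matters there is mHandler.postAtFrontOfQueue(this), which runs the checker's own run() method on the foreground thread ahead of any queued messages: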
```java
public void run() {
    final int size = mMonitors.size();
    for (int i = 0; i < size; i++) {
        synchronized (Watchdog.this) {
            mCurrentMonitor = mMonitors.get(i);
        }
        mCurrentMonitor.monitor();
    }
    synchronized (Watchdog.this) {
        mCompleted = true;
        mCurrentMonitor = null;
    }
}
```
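The scheduling decision and the run() it posts can be modelled without android.os. This is a single-threaded sketch under stated assumptions:

- ScheduleModel is hypothetical.
- An ArrayDeque stands in for the looper's MessageQueue, so addFirst() plays the role of postAtFrontOfQueue().
- A boolean field stands in for MessageQueue.isPolling().
- runNext() is a manual step of Looper.loop().

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Single-threaded model of HandlerChecker.scheduleCheckLocked() and run().
 * The Deque stands in for the target looper's message queue and "polling"
 * for MessageQueue.isPolling(); both are assumptions for a plain JVM.
 */
class ScheduleModel {
    final Deque<Runnable> queue = new ArrayDeque<>();
    boolean polling;                       // true while the looper is idle

    final class Checker implements Runnable {
        final List<Runnable> monitors = new ArrayList<>();
        boolean completed = true;
        int posts;                         // how many times the checker enqueued itself

        void scheduleCheck() {
            if (monitors.isEmpty() && polling) {
                completed = true;          // an idle looper cannot be stuck
                return;
            }
            if (!completed) {
                return;                    // previous check still in flight
            }
            completed = false;
            posts++;
            queue.addFirst(this);          // jump ahead of already queued work
        }

        @Override
        public void run() {
            for (Runnable m : monitors) {
                m.run();                   // a blocked monitor would stall here
            }
            completed = true;
        }
    }

    /** Executes the next queued message, as one step of the looper would. */
    void runNext() {
        Runnable r = queue.pollFirst();
        if (r != null) {
            r.run();
        }
    }
}
```

If the looper were stuck inside some earlier message, runNext() would never reach the checker. `completed` would stay false, and that is exactly the condition the Watchdog loop later reports as a timeout.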
run() calls monitor() on each entry of mMonitors. Only mMonitorChecker has a non-empty mMonitors list, because the system services register themselves through addMonitor, for example:
```java
// ActivityManagerService.java
Watchdog.getInstance().addMonitor(this);
// InputManagerService.java
Watchdog.getInstance().addMonitor(this);
// PowerManagerService.java
Watchdog.getInstance().addMonitor(this);
// WindowManagerService.java
Watchdog.getInstance().addMonitor(this);
```
The monitor() methods being invoked are very simple. Here is the one in ActivityManagerService:
```java
public void monitor() {
    synchronized (this) { }
}
```
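All this does is check whether the service's lock can be acquired. If another thread holds the lock indefinitely (a deadlock or a stuck operation), monitor() never returns and the checker never completes.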
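The lock-probe idiom, and the way the checker records which monitor is stuck (the role of mCurrentMonitor), can be shown with plain Java threads. This is a sketch under stated assumptions:

- LockProbeDemo, FakeService and holdLock() are hypothetical.
- holdLock() simulates a thread stuck inside the service's synchronized section.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;

/**
 * Sketch of the monitor() idiom: a service's monitor() just enters and
 * leaves its own lock. FakeService is a hypothetical system service;
 * holdLock() simulates a thread stuck inside a synchronized section.
 */
class LockProbeDemo {
    interface Monitor { void monitor(); }

    static final class FakeService implements Monitor {
        /** Holds this service's lock until release is counted down. */
        void holdLock(CountDownLatch entered, CountDownLatch release) {
            synchronized (this) {
                entered.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }

        @Override
        public void monitor() {
            synchronized (this) { }        // returns only once the lock is free
        }
    }

    /** Runs monitors in order and records which one is executing. */
    static final class Checker implements Runnable {
        private final List<Monitor> monitors;
        volatile Monitor current;          // plays the role of mCurrentMonitor
        volatile boolean completed;

        Checker(List<Monitor> monitors) {
            this.monitors = monitors;
        }

        @Override
        public void run() {
            for (Monitor m : monitors) {
                current = m;
                m.monitor();
            }
            current = null;
            completed = true;
        }
    }
}
```

While the lock is held, `current` points at the blocked service. This is what lets Watchdog name the culprit in its log instead of only reporting that "something" hung.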
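BinderThreadMonitor, which the constructor registers, is an inner class of Watchdog. Its monitor() call goes through JNI down to IPCThreadState: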
```java
// Watchdog.java (inner class)
private static final class BinderThreadMonitor implements Watchdog.Monitor {
    @Override
    public void monitor() {
        Binder.blockUntilThreadAvailable();
    }
}

// android.os.Binder.java
public static final native void blockUntilThreadAvailable();
```

```cpp
// android_util_Binder.cpp
static void android_os_Binder_blockUntilThreadAvailable(JNIEnv* env, jobject clazz)
{
    return IPCThreadState::self()->blockUntilThreadAvailable();
}

// IPCThreadState.cpp
void IPCThreadState::blockUntilThreadAvailable()
{
    pthread_mutex_lock(&mProcess->mThreadCountLock);
    while (mProcess->mExecutingThreadsCount >= mProcess->mMaxThreads) {
        ALOGW("Waiting for thread to be free. mExecutingThreadsCount=%lu mMaxThreads=%lu\n",
                static_cast<unsigned long>(mProcess->mExecutingThreadsCount),
                static_cast<unsigned long>(mProcess->mMaxThreads));
        pthread_cond_wait(&mProcess->mThreadCountDecrement, &mProcess->mThreadCountLock);
    }
    pthread_mutex_unlock(&mProcess->mThreadCountLock);
}
```
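This only checks whether the process has a free binder thread. While the number of binder threads currently executing transactions has reached mMaxThreads, the caller waits until one finishes. For system_server the limit is 31 rather than the native default: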
```cpp
// ProcessState.cpp
#define DEFAULT_MAX_BINDER_THREADS 15
```

SystemServer.java, however, sets its own limit:

```java
// maximum number of binder threads used for system_server
// will be higher than the system default
private static final int sMaxBinderThreads = 31;

private void run() {
    // ...
    BinderInternal.setMaxThreads(sMaxBinderThreads);
    // ...
}
```
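The wait loop in blockUntilThreadAvailable() is a classic "wait on a condition while a counter is at its limit" pattern. Below is a Java analogue that uses ReentrantLock and Condition in place of pthread_mutex and pthread_cond. Assumptions:

- BinderPoolModel, beginTransaction() and endTransaction() are hypothetical.
- The native code's accounting of executing threads is reduced to a single counter.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Java analogue of IPCThreadState::blockUntilThreadAvailable(): a caller
 * waits while every pool thread is busy. Names are hypothetical; the real
 * code uses pthread_mutex/pthread_cond fields of ProcessState.
 */
class BinderPoolModel {
    private final ReentrantLock threadCountLock = new ReentrantLock();
    private final Condition threadCountDecrement = threadCountLock.newCondition();
    private final int maxThreads;
    private int executingThreadsCount;

    BinderPoolModel(int maxThreads) {
        this.maxThreads = maxThreads;
    }

    /** A pool thread starts handling a transaction. */
    void beginTransaction() {
        threadCountLock.lock();
        try {
            executingThreadsCount++;
        } finally {
            threadCountLock.unlock();
        }
    }

    /** A pool thread finishes; wake anyone waiting for a free slot. */
    void endTransaction() {
        threadCountLock.lock();
        try {
            executingThreadsCount--;
            threadCountDecrement.signalAll();
        } finally {
            threadCountLock.unlock();
        }
    }

    /** What BinderThreadMonitor.monitor() boils down to. */
    void blockUntilThreadAvailable() {
        threadCountLock.lock();
        try {
            while (executingThreadsCount >= maxThreads) {   // same test as the C++ loop
                threadCountDecrement.awaitUninterruptibly();
            }
        } finally {
            threadCountLock.unlock();
        }
    }
}
```

If all binder threads stay busy (for example, every one of them is blocked on a service lock), the monitor call never returns. The foreground checker then never completes, and Watchdog treats the whole binder pool as hung.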
5. Exiting after a timeout
```java
public void run() {
    // ...
    Process.killProcess(Process.myPid());
    System.exit(10);
    // ...
}
```
Watchdog kills the process it lives in (system_server) and exits.
II. How it works
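1. Every service that needs monitoring registers with Watchdog in one of two ways. It either calls addMonitor() to add a monitor to the mMonitors list held by the monitor checker, or calls addThread() to add a looper checker to the mHandlerCheckers list.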
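2. When the Watchdog thread starts, its run() method begins executing and loops forever:
- Step one: call HandlerChecker#scheduleCheckLocked() for every element of mHandlerCheckers.
- Step two: check periodically for timeouts. The interval between checks is set by the CHECK_INTERVAL constant, 30 seconds. Each check calls evaluateCheckerCompletionLocked(), which evaluates the completion state of every HandlerChecker and returns the worst one:
  - COMPLETED: the check has finished;
  - WAITING and WAITED_HALF: still waiting, but not yet timed out; on WAITED_HALF, stack traces are dumped once;
  - OVERDUE: timed out. By default the timeout is 1 minute.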
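3. The timeout may elapse while some HandlerChecker is still unfinished (OVERDUE). In that case getBlockedCheckersLocked() collects the blocked checkers. Watchdog then builds a description of them and saves logs, including stack traces of the running threads.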
4. Finally, the system_server process is killed.
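The state evaluation in steps 2 and 3 can be reproduced outside Android. In this model, the four constants and the "take the maximum over all checkers" rule follow the description above. The simplifications are:

- CompletionModel and CheckerState are hypothetical.
- The current time is passed in explicitly instead of read from SystemClock.uptimeMillis().
- blocked() approximates getBlockedCheckersLocked() by selecting the OVERDUE checkers.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Model of Watchdog's completion-state evaluation. Constants are ordered so
 * that a larger value is a worse state; the loop keeps the worst one.
 */
class CompletionModel {
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    static final class CheckerState {
        final String name;
        final boolean completed;
        final long startMillis;            // when the current check was scheduled
        final long waitMaxMillis;          // this checker's timeout

        CheckerState(String name, boolean completed, long startMillis, long waitMaxMillis) {
            this.name = name;
            this.completed = completed;
            this.startMillis = startMillis;
            this.waitMaxMillis = waitMaxMillis;
        }

        int completionState(long nowMillis) {
            if (completed) {
                return COMPLETED;
            }
            long latency = nowMillis - startMillis;
            if (latency < waitMaxMillis / 2) {
                return WAITING;
            }
            if (latency < waitMaxMillis) {
                return WAITED_HALF;        // first time here: dump stack traces
            }
            return OVERDUE;                // describe blocked checkers, then kill
        }
    }

    /** Worst state across all checkers decides what the run() loop does next. */
    static int evaluate(List<CheckerState> checkers, long nowMillis) {
        int state = COMPLETED;
        for (CheckerState c : checkers) {
            state = Math.max(state, c.completionState(nowMillis));
        }
        return state;
    }

    /** Names of overdue checkers, in the spirit of getBlockedCheckersLocked(). */
    static List<String> blocked(List<CheckerState> checkers, long nowMillis) {
        List<String> out = new ArrayList<>();
        for (CheckerState c : checkers) {
            if (c.completionState(nowMillis) == OVERDUE) {
                out.add(c.name);
            }
        }
        return out;
    }
}
```

The check interval is 30 s and the timeout is 60 s. A checker that stays stuck is therefore typically seen as WAITED_HALF on one pass (traces dumped) and as OVERDUE on the next (system_server killed). That is where the "30 seconds: log, 60 seconds: restart" rule of thumb comes from.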
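Summary
Watchdog is a thread that monitors whether the system services are running normally and have not deadlocked;
HandlerChecker checks both Handlers (their loopers) and monitors;
A monitor detects deadlocks by trying to acquire the service's lock;
After 30 seconds without completion, stack traces are logged; after 60 seconds, system_server is killed and the system restarts;
Because Watchdog kills its own process, system_server comes back with a different process ID.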
This article is reprinted from the WeChat official account "Android開發編程" (Android Development Programming).