自拍偷在线精品自拍偷,亚洲欧美中文日韩v在线观看不卡

<ul id="vu68j"></ul>

AI.x社區(qū)

軟考社區(qū)

企業(yè)培訓(xùn)

鴻蒙開發(fā)者社區(qū)

WOT技術(shù)大會

公眾號矩陣

移動端

視頻課免費課排行榜短視頻直播課軟考學堂

全部課程軟考華為認證廠商認證 IT技術(shù)PMP項目管理免費題庫

文章資源問答課堂專欄直播

51CTO

鴻蒙開發(fā)者社區(qū)

51CTO技術(shù)棧

51CTO官微

51CTO學堂

51CTO博客

CTO訓(xùn)練營

鴻蒙開發(fā)者社區(qū)訂閱號

51CTO軟考

51CTO學堂APP

51CTO學堂企業(yè)版APP

鴻蒙開發(fā)者社區(qū)視頻號

51CTO軟考題庫

賬號設(shè)置退出

Troubleshooting and Resolving 100% CPU and Application OOM in Production

Author: aflyun 2021-06-04 15:58:53


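Recently I ran into several more production alerts at work. The investigation almost always came down to 100% CPU usage or a memory OOM, so here I share the approach and process I used when troubleshooting this kind of problem before. I hope it helps, and thank you for reading.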


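Symptoms

[Alert notification: application exception alert]

A quick look at the alert: connection refused. In short, the service was in trouble. Please don't mind the redacted parts of the screenshot.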

Environment

Spring Cloud F release (Finchley).

By default the project gets zipkin-reporter through the spring-cloud-sleuth-zipkin dependency. Inspection showed that the zipkin-reporter version in use was 2.7.3.

<dependency> 
 <groupId>org.springframework.cloud</groupId> 
 <artifactId>spring-cloud-sleuth-zipkin</artifactId> 
</dependency>

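Version: 2.0.0.RELEASE

Version notes

Troubleshooting

The alert tells us which service on which server has the problem. Start by logging in to that server.

1. Check the service status and verify that the health check URL is OK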

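Note: this step can be skipped. It depends on how your company implements health checks and is not universally applicable.

① Check whether the service process exists.

ps -ef | grep <service-name>
ps -aux | grep <service-name>

② Check that the health check address configured for the service is correct, including the IP and port.

Note: the URL configured in the alerting service could be wrong, although this is rarely the problem.

③ Verify the health check address.

Note: the health check address looks like http://192.168.1.110:20606/serviceCheck. Confirm that the IP and port are correct.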

# Service is healthy: normal response
curl http://192.168.1.110:20606/serviceCheck
{"appName":"test-app","status":"UP"}

# Service is unhealthy: the service is down
curl http://192.168.1.110:20606/serviceCheck
curl: (7) couldn't connect to host
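The same check can be scripted. The following is a minimal sketch, assuming the endpoint returns a JSON body containing "status":"UP" as in the curl output above; the class and method names (HealthCheck, isUp, fetch) are hypothetical and not part of the original setup.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class HealthCheck {
    // Returns true when the response body reports "status":"UP" (whitespace is ignored).
    static boolean isUp(String body) {
        return body != null && body.replaceAll("\\s", "").contains("\"status\":\"UP\"");
    }

    // Fetches the health endpoint; returns null when the service cannot be reached.
    static String fetch(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(2000);
            conn.setReadTimeout(2000);
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            return null; // connection refused, timeout, or HTTP error status
        }
    }

    public static void main(String[] args) {
        // The default URL is the example address from the article; pass your own as args[0].
        String url = args.length > 0 ? args[0] : "http://192.168.1.110:20606/serviceCheck";
        System.out.println(isUp(fetch(url)) ? "UP" : "DOWN or unreachable");
    }
}
```

A string match is enough for a quick probe; a real monitor would more likely parse the JSON and distinguish "down" from "unreachable".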

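2. Check the service logs

Check whether the service is still writing logs and whether requests are still coming in. The logs showed that the service had hit an OOM.

OOM error

Tip: java.lang.OutOfMemoryError: GC overhead limit exceeded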

Oracle's documentation gives the cause of this error and the recommended action: Exception in thread thread_name: java.lang.OutOfMemoryError: GC Overhead limit exceeded Cause: The detail message "GC overhead limit exceeded" indicates that the garbage collector is running all the time and Java program is making very slow progress. After a garbage collection, if the Java process is spending more than approximately 98% of its time doing garbage collection and if it is recovering less than 2% of the heap and has been doing so far the last 5 (compile time constant) consecutive garbage collections, then a java.lang.OutOfMemoryError is thrown. This exception is typically thrown because the amount of live data barely fits into the Java heap having little free space for new allocations. Action: Increase the heap size. The java.lang.OutOfMemoryError exception for GC Overhead limit exceeded can be turned off with the command line flag -XX:-UseGCOverheadLimit.

Cause: in short, when the JVM spends about 98% of its time on garbage collection while recovering less than 2% of the heap, and this has held for at least the last 5 consecutive collections, it throws java.lang.OutOfMemoryError: GC overhead limit exceeded.

Source of the tip above: the article "java.lang.OutOfMemoryError GC overhead limit exceeded: cause analysis and solutions".
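To get a rough feel for how much time a running JVM spends in GC, the standard management beans can be queried from inside the process. This is only a sketch: it reports the cumulative share of uptime, not the windowed measurement the JVM uses internally for the overhead limit, and for concurrent collectors the reported time is not pure pause time. The class name GcOverhead is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcOverhead {
    // Cumulative GC time in milliseconds across all collectors.
    // getCollectionTime() returns -1 when the value is undefined, so such values are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Percentage of JVM uptime spent in GC since the JVM started.
    static double gcTimePercent() {
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis <= 0 ? 0.0 : 100.0 * totalGcMillis() / uptimeMillis;
    }

    public static void main(String[] args) {
        System.out.printf("GC time: %d ms (%.2f%% of uptime)%n", totalGcMillis(), gcTimePercent());
    }
}
```

A value that keeps climbing toward 100% while the heap stays full matches the situation described in the tip.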

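3. Check server resource usage

Use the top command to see the resource usage of each process. It showed that process 11441 was using 300% CPU, as in the screenshot below:

CPU maxed out

Then look at the CPU usage of every thread in that process:

top -H -p <pid>
To save the output to a file, add -b (batch mode) so the redirected output is plain text: top -b -H -n 1 -p <pid> > /tmp/pid_top.txt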

# top -H -p 11441 
PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND 
11447 test    20   0 4776m 1.6g  13m R 92.4 20.3  74:54.19 java 
11444 test    20   0 4776m 1.6g  13m R 91.8 20.3  74:52.53 java 
11445 test    20   0 4776m 1.6g  13m R 91.8 20.3  74:50.14 java 
11446 test    20   0 4776m 1.6g  13m R 91.4 20.3  74:53.97 java 
....

Looking at the threads under PID 11441, several of them show high CPU usage.

4. Save diagnostic data

1. Capture a system load snapshot

top -b -n 2 > /tmp/top.txt 
 
top -b -H -n 1 -p <pid> > /tmp/pid_top.txt

2. List the threads of the process sorted by CPU usage, highest first

ps -mp <pid> -o THREAD,tid,time | sort -k2r > /tmp/<pid>_threads.txt

3. Check TCP connections (ideally sample several times)

lsof -p <pid> > /tmp/<pid>_lsof.txt
lsof -p <pid> > /tmp/<pid>_lsof2.txt

4. Capture thread information (ideally sample several times)

jstack -l <pid> > /tmp/<pid>_jstack.txt
jstack -l <pid> > /tmp/<pid>_jstack2.txt
jstack -l <pid> > /tmp/<pid>_jstack3.txt

5. View a summary of heap memory usage

jmap -heap <pid> > /tmp/<pid>_jmap_heap.txt

6. View statistics for the objects in the heap

jmap -histo <pid> | head -n 100 > /tmp/<pid>_jmap_histo.txt

7. View GC statistics

jstat -gcutil <pid> > /tmp/<pid>_jstat_gc.txt

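8. Generate a heap dump

jmap -dump:format=b,file=/tmp/<pid>_jmap_dump.hprof <pid>

Note: this dumps the entire heap, so the resulting file is large.

jmap -dump:live,format=b,file=/tmp/<pid>_live_jmap_dump.hprof <pid>

Note: with dump:live, only objects that are still live are captured, that is, objects the GC cannot reclaim. This option is usually sufficient.

Once the snapshot data for the incident has been captured, restart the service.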

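Analysis

The steps above captured the GC statistics, thread stacks, and heap dump of the failing service. Now analyze them to find where the problem lies.

1. Analyze the threads using 100% CPU

Convert the thread IDs

Start from the thread stacks produced by jstack.

Convert the thread IDs found above to hexadecimal, because the jstack output records thread IDs (nid) in hexadecimal:

11447: 0x2cb7
11444: 0x2cb4
11445: 0x2cb5
11446: 0x2cb6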

Method 1:

$ printf "0x%x\n" 11447
0x2cb7

Method 2: convert the number with any decimal-to-hexadecimal converter and prefix the result with 0x.

Find the thread stacks

$ cat 11441_jstack.txt | grep "GC task thread" 
"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007f971401e000 nid=0x2cb4 runnable 
"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007f9714020000 nid=0x2cb5 runnable 
"GC task thread#2 (ParallelGC)" os_prio=0 tid=0x00007f9714022000 nid=0x2cb6 runnable 
"GC task thread#3 (ParallelGC)" os_prio=0 tid=0x00007f9714023800 nid=0x2cb7 runnable

All of these threads are doing GC work.
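The convert-then-grep steps can also be combined. The sketch below converts a decimal thread ID from top into jstack's nid format and looks up the matching thread name. It assumes jstack header lines of the shape shown above; NidMatcher and its methods are hypothetical helper names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NidMatcher {
    // Matches a jstack thread header line: "name" ... nid=0x... state
    private static final Pattern HEADER =
        Pattern.compile("^\"([^\"]+)\".*\\bnid=(0x[0-9a-f]+)\\b");

    // Decimal thread ID as shown by top -H, converted to jstack's nid format.
    static String toNid(int tid) {
        return "0x" + Integer.toHexString(tid);
    }

    // Names of the threads in the given jstack lines whose nid equals the given thread ID.
    static List<String> threadNames(List<String> jstackLines, int tid) {
        String nid = toNid(tid);
        List<String> names = new ArrayList<>();
        for (String line : jstackLines) {
            Matcher m = HEADER.matcher(line);
            if (m.find() && m.group(2).equals(nid)) {
                names.add(m.group(1));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "\"GC task thread#0 (ParallelGC)\" os_prio=0 tid=0x00007f971401e000 nid=0x2cb4 runnable",
            "\"GC task thread#3 (ParallelGC)\" os_prio=0 tid=0x00007f9714023800 nid=0x2cb7 runnable");
        System.out.println(toNid(11447) + " -> " + threadNames(lines, 11447));
    }
}
```

In practice the lines would be read from the saved jstack file, for example with Files.readAllLines.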

2. Analyze the GC statistics file

S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT    
 0.00   0.00 100.00  99.94  90.56  87.86    875    9.307  3223 5313.139 5322.446

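S0: Survivor space 0 utilization (%)
S1: Survivor space 1 utilization (%)
E: Eden space utilization (%)
O: Old generation utilization (%)
M: Metaspace utilization (%)
CCS: Compressed class space utilization (%)
YGC: Number of young generation GC events
YGCT: Young generation GC time (seconds)
FGC: Number of full GC events
FGCT: Full GC time (seconds)
GCT: Total GC time (seconds)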

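Full GC is extremely frequent: 3223 full GCs took about 5313 seconds in total, compared with 875 young GCs taking about 9 seconds, and both Eden and the old generation are essentially full.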
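The numbers in the jstat line can be turned into two quick indicators: the average duration of a full GC and the share of total GC time spent in full GCs. The following sketch parses the data line shown above; the column order is taken from that output, and GcUtilLine is a hypothetical helper name.

```java
class GcUtilLine {
    // Column order as printed by jstat -gcutil in the output above.
    static final String[] COLUMNS =
        {"S0", "S1", "E", "O", "M", "CCS", "YGC", "YGCT", "FGC", "FGCT", "GCT"};

    // Splits one whitespace-separated data line into numeric values.
    static double[] parse(String dataLine) {
        String[] parts = dataLine.trim().split("\\s+");
        double[] row = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            row[i] = Double.parseDouble(parts[i]);
        }
        return row;
    }

    // Looks up a value by its column name.
    static double value(double[] row, String column) {
        for (int i = 0; i < COLUMNS.length && i < row.length; i++) {
            if (COLUMNS[i].equals(column)) {
                return row[i];
            }
        }
        throw new IllegalArgumentException("unknown column: " + column);
    }

    // Average seconds per full GC (FGCT / FGC).
    static double avgFullGcSeconds(double[] row) {
        return value(row, "FGCT") / value(row, "FGC");
    }

    // Fraction of total GC time spent in full GCs (FGCT / GCT).
    static double fullGcShare(double[] row) {
        return value(row, "FGCT") / value(row, "GCT");
    }

    public static void main(String[] args) {
        double[] row = parse(" 0.00   0.00 100.00  99.94  90.56  87.86    875    9.307  3223 5313.139 5322.446");
        System.out.printf("avg full GC: %.3f s, full GC share of GC time: %.1f%%%n",
            avgFullGcSeconds(row), 100 * fullGcShare(row));
    }
}
```

For the line above this works out to roughly 1.65 seconds per full GC, with almost all GC time spent in full collections, which is consistent with the GC threads saturating the CPU.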

3. Analyze the heap dump

Use the Eclipse Memory Analyzer (MAT). Download: https://www.eclipse.org/mat/downloads.php

Analysis result:

Details of the large object that is retaining memory:

The likely cause: an OOM triggered by InMemoryReporterMetrics.

zipkin2.reporter.InMemoryReporterMetrics @ 0xc1aeaea8 
 
Shallow Size: 24 B Retained Size: 925.9 MB

PerfMa's Java memory dump analyzer (https://www.perfma.com/docs/memory/memory-start) can also be used for the analysis. It is less powerful than MAT, and some features require payment.

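4. Root cause analysis and verification

After this incident I compared the Zipkin configuration of the failing service with that of the other services and found no difference.

I then compared the Zipkin jars the services depend on and found that the failing service was using an older version.

The failing service: zipkin-reporter-2.7.3.jar

Services without the problem: zipkin-reporter-2.8.4.jar

After upgrading the dependency in the failing service and verifying in the test environment, the heap dump no longer showed the problem.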

Digging into the cause

Search the zipkin-reporter GitHub repository for related material:

https://github.com/openzipkin/zipkin-reporter-java/issues?q=InMemoryReporterMetrics

This turned up the following issue:

https://github.com/openzipkin/zipkin-reporter-java/issues/139

The fix and its test code:

https://github.com/openzipkin/zipkin-reporter-java/pull/119/files

Comparing the code of the two versions:

A simple demo for verification:

// Code before the fix:
  private final ConcurrentHashMap<Throwable, AtomicLong> messagesDropped = 
      new ConcurrentHashMap<Throwable, AtomicLong>(); 
// Code after the fix:
  private final ConcurrentHashMap<Class<? extends Throwable>, AtomicLong> messagesDropped = 
      new ConcurrentHashMap<>();

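After the fix, the map key is the exception's Class instead of the Throwable instance. Throwable does not override equals or hashCode, so every new exception instance became a new map entry that was never removed; keyed by class, the map holds at most one entry per exception type.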
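The effect of the key change can be reproduced in isolation. The following sketch mimics only the shape of the two map declarations above; it is not the zipkin-reporter code, and DroppedCounterDemo and its methods are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

class DroppedCounterDemo {
    // Pre-fix shape: keyed by the Throwable instance.
    // Each new exception object is a distinct key, so the map grows with every failure.
    static int entriesWhenKeyedByInstance(int failures) {
        ConcurrentHashMap<Throwable, AtomicLong> dropped = new ConcurrentHashMap<>();
        for (int i = 0; i < failures; i++) {
            Throwable t = new IllegalStateException("send failed");
            dropped.computeIfAbsent(t, k -> new AtomicLong()).incrementAndGet();
        }
        return dropped.size();
    }

    // Post-fix shape: keyed by the exception class.
    // All failures of the same type share one counter, so the map stays small.
    static int entriesWhenKeyedByClass(int failures) {
        ConcurrentHashMap<Class<? extends Throwable>, AtomicLong> dropped = new ConcurrentHashMap<>();
        for (int i = 0; i < failures; i++) {
            Throwable t = new IllegalStateException("send failed");
            dropped.computeIfAbsent(t.getClass(), k -> new AtomicLong()).incrementAndGet();
        }
        return dropped.size();
    }

    public static void main(String[] args) {
        System.out.println("keyed by instance: " + entriesWhenKeyedByInstance(1000));
        System.out.println("keyed by class:    " + entriesWhenKeyedByClass(1000));
    }
}
```

With 1000 simulated failures the instance-keyed map ends up with 1000 entries and the class-keyed map with 1. Each retained Throwable also holds its stack trace, which fits the large retained size seen in the heap dump.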

Simple verification:

Solution

Upgrading zipkin-reporter is enough. With the dependency configuration below, the zipkin-reporter version that gets pulled in is 2.8.4.

<!-- zipkin dependency -->
<dependency> 
  <groupId>io.zipkin.brave</groupId> 
  <artifactId>brave</artifactId> 
  <version>5.6.4</version> 
</dependency>

A small tip: when configuring JVM options, add the flags below so that a heap dump is written when an out-of-memory error occurs:

-XX:+HeapDumpOnOutOfMemoryError  
 -XX:HeapDumpPath=path/filename.hprof
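Whether these flags are actually in effect can be checked from inside a running JVM. This is a minimal sketch that assumes a HotSpot-based JVM, since com.sun.management.HotSpotDiagnosticMXBean is HotSpot-specific; HeapDumpFlagCheck is a hypothetical class name.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

class HeapDumpFlagCheck {
    // Returns the current value of a HotSpot VM option as a string.
    // Throws IllegalArgumentException if the option does not exist.
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return bean.getVMOption(name).getValue();
    }

    public static void main(String[] args) {
        System.out.println("HeapDumpOnOutOfMemoryError = " + vmOption("HeapDumpOnOutOfMemoryError"));
        System.out.println("HeapDumpPath = " + vmOption("HeapDumpPath"));
    }
}
```

Running this with and without the two flags on the command line shows whether the startup script really passes them through.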

References

An OOM caused by Sleuth failing to send to Zipkin

https://www.jianshu.com/p/f8c74943ccd8

This article is republished from the WeChat official account "Java编程技术乐园". To republish it, please contact that account.

Editor: Wu Xiaoyan. Source: Java编程技术乐园
