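5 Situations That Trigger a CMS GC in the JVM (You Probably Don't Know Them All)
Readers often ask: why did my application trigger a CMS GC when Old Gen occupancy had not reached the threshold configured by CMSInitiatingOccupancyFraction? It seems baffling, and it is hard to tell where the problem lies.
In fact, CMS GC has many trigger conditions, not just the CMSInitiatingOccupancyFraction threshold. This article walks through the source code to catalog the conditions that trigger a CMS GC, to help you make sense of the odd CMS GC behavior you run into in practice.
First, a few questions to get your attention.
- Why did a CMS GC run when Old Gen occupancy was only 50%?
- Can Metaspace usage trigger a CMS GC too?
- Why did a CMS GC run when Old Gen occupancy was very small?
Trigger conditions
CMS GC is implemented as a foreground collector and a background collector. The foreground collector is relatively simple; the background collector is more complex and has more cases.
Below we look at the trigger conditions of the foreground collector and the background collector in turn:
Note: this article is based on JDK 8.
Note: this article covers only the trigger conditions of CMS GC. The details of the algorithm, and when MSC (mark sweep compact) is performed, are out of scope.
foreground collector
The foreground collector's trigger condition is simple: generally, when an object allocation finds insufficient space, a GC is triggered directly to reclaim space immediately. The algorithm used is mark sweep, without compaction.
background collector
Before describing the background collector's trigger conditions, here is its flow. The CMS background thread polls continuously, each time checking whether any background collector trigger condition holds. As soon as one does, it runs a background collect.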
void ConcurrentMarkSweepThread::run() {
  ... // omitted
  while (!_should_terminate) {
    sleepBeforeNextCycle();
    if (_should_terminate) break;
    GCCause::Cause cause = _collector->_full_gc_requested ?
      _collector->_full_gc_cause : GCCause::_cms_concurrent_mark;
    _collector->collect_in_background(false, cause);
  }
  ... // omitted
}
On each pass, the thread first waits for CMSWaitDuration, then runs the shouldConcurrentCollect check to see whether a CMS background collector trigger condition is met. CMSWaitDuration defaults to 2s. (Applications often run into frequent CMS GCs. Look at the interval between consecutive CMS GCs: if it is 2s, you can be fairly sure the CMS background collector is responsible.)
void ConcurrentMarkSweepThread::sleepBeforeNextCycle() {
  while (!_should_terminate) {
    if (CMSIncrementalMode) {
      icms_wait();
      if(CMSWaitDuration >= 0) {
        // Wait until the next synchronous GC, a concurrent full gc
        // request or a timeout, whichever is earlier.
        wait_on_cms_lock_for_scavenge(CMSWaitDuration);
      }
      return;
    } else {
      if(CMSWaitDuration >= 0) {
        // Wait until the next synchronous GC, a concurrent full gc
        // request or a timeout, whichever is earlier.
        wait_on_cms_lock_for_scavenge(CMSWaitDuration);
      } else {
        // Wait until any cms_lock event or check interval not to call shouldConcurrentCollect permanently
        wait_on_cms_lock(CMSCheckInterval);
      }
    }
    // Check if we should start a CMS collection cycle
    if (_collector->shouldConcurrentCollect()) {
      return;
    }
    // .. collection criterion not yet met, let's go back
    // and wait some more
  }
}
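The 2s hint can be turned into a rough log check. The sketch below is a hypothetical helper, not part of HotSpot: it takes the idle gaps (in seconds) between consecutive CMS cycles, as read from a GC log, and reports whether all of them sit close to the polling interval. The function name and the 0.5s tolerance are assumptions chosen for illustration.

```cpp
#include <cmath>
#include <vector>

// Hypothetical diagnostic helper (not a HotSpot API).
// gaps_s: idle time in seconds between consecutive CMS cycles, taken from a GC log.
// wait_duration_s: the polling interval; 2.0 matches the CMSWaitDuration default cited in the text.
// tolerance_s: assumed slack for scheduling jitter, chosen arbitrarily.
bool looks_like_background_polling(const std::vector<double>& gaps_s,
                                   double wait_duration_s = 2.0,
                                   double tolerance_s = 0.5) {
    if (gaps_s.empty()) {
        return false;  // nothing to judge
    }
    for (double gap : gaps_s) {
        if (std::fabs(gap - wait_duration_s) > tolerance_s) {
            return false;  // at least one gap is far from the polling interval
        }
    }
    return true;
}
```

A call such as looks_like_background_polling({2.0, 2.1, 1.9}) returns true, which suggests that a trigger condition is being met on every poll.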
So what conditions does shouldConcurrentCollect() check?
bool CMSCollector::shouldConcurrentCollect() {
  // Trigger case 1
  if (_full_gc_requested) {
    if (Verbose && PrintGCDetails) {
      gclog_or_tty->print_cr("CMSCollector: collect because of explicit "
                             " gc request (or gc_locker)");
    }
    return true;
  }
  // For debugging purposes, change the type of collection.
  // If the rotation is not on the concurrent collection
  // type, don't start a concurrent collection.
  NOT_PRODUCT(
    if (RotateCMSCollectionTypes &&
        (_cmsGen->debug_collection_type() !=
          ConcurrentMarkSweepGeneration::Concurrent_collection_type)) {
      assert(_cmsGen->debug_collection_type() !=
        ConcurrentMarkSweepGeneration::Unknown_collection_type,
        "Bad cms collection type");
      return false;
    }
  )
  FreelistLocker x(this);
  // ------------------------------------------------------------------
  // Print out lots of information which affects the initiation of
  // a collection.
  if (PrintCMSInitiationStatistics && stats().valid()) {
    gclog_or_tty->print("CMSCollector shouldConcurrentCollect: ");
    gclog_or_tty->stamp();
    gclog_or_tty->print_cr("");
    stats().print_on(gclog_or_tty);
    gclog_or_tty->print_cr("time_until_cms_gen_full %3.7f",
      stats().time_until_cms_gen_full());
    gclog_or_tty->print_cr("free="SIZE_FORMAT, _cmsGen->free());
    gclog_or_tty->print_cr("contiguous_available="SIZE_FORMAT,
      _cmsGen->contiguous_available());
    gclog_or_tty->print_cr("promotion_rate=%g", stats().promotion_rate());
    gclog_or_tty->print_cr("cms_allocation_rate=%g", stats().cms_allocation_rate());
    gclog_or_tty->print_cr("occupancy=%3.7f", _cmsGen->occupancy());
    gclog_or_tty->print_cr("initiatingOccupancy=%3.7f", _cmsGen->initiating_occupancy());
    gclog_or_tty->print_cr("metadata initialized %d",
      MetaspaceGC::should_concurrent_collect());
  }
  // ------------------------------------------------------------------
  // Trigger case 2
  // If the estimated time to complete a cms collection (cms_duration())
  // is less than the estimated time remaining until the cms generation
  // is full, start a collection.
  if (!UseCMSInitiatingOccupancyOnly) {
    if (stats().valid()) {
      if (stats().time_until_cms_start() == 0.0) {
        return true;
      }
    } else {
      // We want to conservatively collect somewhat early in order
      // to try and "bootstrap" our CMS/promotion statistics;
      // this branch will not fire after the first successful CMS
      // collection because the stats should then be valid.
      if (_cmsGen->occupancy() >= _bootstrap_occupancy) {
        if (Verbose && PrintGCDetails) {
          gclog_or_tty->print_cr(
            " CMSCollector: collect for bootstrapping statistics:"
            " occupancy = %f, boot occupancy = %f", _cmsGen->occupancy(),
            _bootstrap_occupancy);
        }
        return true;
      }
    }
  }
  // Trigger case 3
  // Otherwise, we start a collection cycle if
  // old gen want a collection cycle started. Each may use
  // an appropriate criterion for making this decision.
  // XXX We need to make sure that the gen expansion
  // criterion dovetails well with this. XXX NEED TO FIX THIS
  if (_cmsGen->should_concurrent_collect()) {
    if (Verbose && PrintGCDetails) {
      gclog_or_tty->print_cr("CMS old gen initiated");
    }
    return true;
  }
  // Trigger case 4
  // We start a collection if we believe an incremental collection may fail;
  // this is not likely to be productive in practice because it's probably too
  // late anyway.
  GenCollectedHeap* gch = GenCollectedHeap::heap();
  assert(gch->collector_policy()->is_two_generation_policy(),
         "You may want to check the correctness of the following");
  if (gch->incremental_collection_will_fail(true /* consult_young */)) {
    if (Verbose && PrintGCDetails) {
      gclog_or_tty->print("CMSCollector: collect because incremental collection will fail ");
    }
    return true;
  }
  // Trigger case 5
  if (MetaspaceGC::should_concurrent_collect()) {
    if (Verbose && PrintGCDetails) {
      gclog_or_tty->print("CMSCollector: collect for metadata allocation ");
    }
    return true;
  }
  return false;
}
As the code above shows, the background collector has 5 broad categories of trigger:
1. Whether a concurrent Full GC was requested
This covers two cases: the GC cause is _gc_locker and GCLockerInvokesConcurrent is set, or the GC cause is _java_lang_system_gc (that is, a System.gc() call) and ExplicitGCInvokesConcurrent is set. Either one triggers a background collector cycle.
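The rule for this first trigger can be written as a single boolean expression. The following is a minimal standalone sketch, not the HotSpot implementation: the GCCause enum and the Flags struct are simplified stand-ins introduced here for illustration, with fields named after the two JVM options mentioned in the text.

```cpp
// Simplified stand-ins for the GC causes named in the text (illustrative only).
enum class GCCause { gc_locker, java_lang_system_gc, allocation_failure };

// Hypothetical flag holder; the fields mirror the JVM options
// GCLockerInvokesConcurrent and ExplicitGCInvokesConcurrent.
struct Flags {
    bool gc_locker_invokes_concurrent = false;
    bool explicit_gc_invokes_concurrent = false;
};

// True when a full GC request should be served by a background CMS cycle.
bool requests_concurrent_full_gc(GCCause cause, const Flags& flags) {
    return (cause == GCCause::gc_locker && flags.gc_locker_invokes_concurrent) ||
           (cause == GCCause::java_lang_system_gc && flags.explicit_gc_invokes_concurrent);
}
```

With only explicit_gc_invokes_concurrent enabled, a System.gc() request qualifies while a gc_locker request does not.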
2. Dynamic calculation from statistics (only when UseCMSInitiatingOccupancyOnly is not set)
When UseCMSInitiatingOccupancyOnly is not set, the JVM decides dynamically, from collected statistics, whether a CMS GC is needed.
The logic is: if the predicted time needed to complete a CMS GC is greater than the predicted time until the old generation fills up, run a GC. This decision relies on statistics from earlier CMS GCs. Before the first CMS GC, however, no statistics exist yet and they are invalid, so the decision falls back to the Old Gen occupancy.
if (!UseCMSInitiatingOccupancyOnly) {
  if (stats().valid()) {
    if (stats().time_until_cms_start() == 0.0) {
      return true;
    }
  } else {
    // We want to conservatively collect somewhat early in order
    // to try and "bootstrap" our CMS/promotion statistics;
    // this branch will not fire after the first successful CMS
    // collection because the stats should then be valid.
    if (_cmsGen->occupancy() >= _bootstrap_occupancy) {
      if (Verbose && PrintGCDetails) {
        gclog_or_tty->print_cr(
          " CMSCollector: collect for bootstrapping statistics:"
          " occupancy = %f, boot occupancy = %f", _cmsGen->occupancy(),
          _bootstrap_occupancy);
      }
      return true;
    }
  }
}
So at what occupancy does collection start? (That is, what is the value of _bootstrap_occupancy?) The answer is 50%. You may have seen such a case: without UseCMSInitiatingOccupancyOnly set, a CMS GC ran when old generation occupancy reached 50%, and left you puzzled at the time.
_bootstrap_occupancy = ((double)CMSBootstrapOccupancy)/(double)100;
// default value of the flag
product(uintx, CMSBootstrapOccupancy, 50,
        "Percentage CMS generation occupancy at which to initiate CMS collection for bootstrapping collection stats")
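The two branches of this trigger can be modeled in a few lines. This is a simplified sketch under stated assumptions: CmsStatsModel and its fields are illustrative names, and the time calculation only captures the comparison described in the text (the real HotSpot calculation may include further terms).

```cpp
#include <algorithm>

// Illustrative model of the CMS statistics (not the HotSpot CMSStats class).
struct CmsStatsModel {
    bool valid;                    // false until a first CMS cycle has produced statistics
    double cms_duration_s;         // predicted time to complete a CMS cycle
    double time_until_gen_full_s;  // predicted time until Old Gen fills up
};

// Time left before a cycle has to start; 0.0 means "start now".
double time_until_cms_start(const CmsStatsModel& s) {
    return std::max(0.0, s.time_until_gen_full_s - s.cms_duration_s);
}

// Trigger 2: statistics-driven when valid, otherwise the bootstrap occupancy (50% by default).
bool stats_trigger(const CmsStatsModel& s, double occupancy,
                   double bootstrap_occupancy = 0.50) {
    if (s.valid) {
        return time_until_cms_start(s) == 0.0;
    }
    return occupancy >= bootstrap_occupancy;
}
```

For example, with invalid statistics an occupancy of 0.50 already fires the trigger, which matches the "CMS GC at 50%" observation.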
3. Decision based on the state of Old Gen
bool ConcurrentMarkSweepGeneration::should_concurrent_collect() const {
  assert_lock_strong(freelistLock());
  if (occupancy() > initiating_occupancy()) {
    if (PrintGCDetails && Verbose) {
      gclog_or_tty->print(" %s: collect because of occupancy %f / %f ",
        short_name(), occupancy(), initiating_occupancy());
    }
    return true;
  }
  if (UseCMSInitiatingOccupancyOnly) {
    return false;
  }
  if (expansion_cause() == CMSExpansionCause::_satisfy_allocation) {
    if (PrintGCDetails && Verbose) {
      gclog_or_tty->print(" %s: collect because expanded for allocation ",
        short_name());
    }
    return true;
  }
  if (_cmsSpace->should_concurrent_collect()) {
    if (PrintGCDetails && Verbose) {
      gclog_or_tty->print(" %s: collect because cmsSpace says so ",
        short_name());
    }
    return true;
  }
  return false;
}
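From the source, there are two main categories:
(a) Old Gen occupancy compared against a threshold; if it exceeds the threshold, a CMS GC runs. This is the check "occupancy() > initiating_occupancy()". occupancy is clearly the current Old Gen occupancy, but what is initiating_occupancy?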
_cmsGen ->init_initiating_occupancy(CMSInitiatingOccupancyFraction, CMSTriggerRatio);
...
void ConcurrentMarkSweepGeneration::init_initiating_occupancy(intx io, uintx tr) {
  assert(io <= 100 && tr <= 100, "Check the arguments");
  if (io >= 0) {
    _initiating_occupancy = (double)io / 100.0;
  } else {
    _initiating_occupancy = ((100 - MinHeapFreeRatio) +
                             (double)(tr * MinHeapFreeRatio) / 100.0)
                            / 100.0;
  }
}
As you can see, when CMSInitiatingOccupancyFraction is configured with a value greater than or equal to 0, the threshold is "io / 100.0";
when CMSInitiatingOccupancyFraction is less than 0 (note: the default is -1), it is "((100 - MinHeapFreeRatio) + (double)(tr * MinHeapFreeRatio) / 100.0) / 100.0". What does that come to? With the default flag values, 92%. You may have read in a book or blog that an unconfigured CMSInitiatingOccupancyFraction is 92. In fact, an unconfigured CMSInitiatingOccupancyFraction is -1, so the threshold comes from the second formula and works out to 92%. The value of CMSInitiatingOccupancyFraction itself is not 92.
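(b) The cases that apply when UseCMSInitiatingOccupancyOnly is not set
There are two sub-cases here:
- Old Gen has just been expanded to satisfy an object allocation, and the allocation succeeded. A CMS GC is then considered;
- A decision based on the CMS Gen free lists. This one is somewhat complex and the author has not fully worked it out. Fortunately, with the default configuration it returns false, so by default this trigger can be ignored.
4. Whether an incremental GC may fail (pessimistic policy)
What does this mean? In the two-generation GC scheme, it mainly refers to whether a Young GC will fail. If a Young GC has already failed or may fail, the JVM decides that a CMS GC is needed.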
bool incremental_collection_will_fail(bool consult_young) {
  // Assumes a 2-generation system; the first disjunct remembers if an
  // incremental collection failed, even when we thought (second disjunct)
  // that it would not.
  assert(heap()->collector_policy()->is_two_generation_policy(),
         "the following definition may not be suitable for an n(>2)-generation system");
  return incremental_collection_failed() ||
         (consult_young && !get_gen(0)->collection_attempt_is_safe());
}
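Consider the two conditions, "incremental_collection_failed()" and "!get_gen(0)->collection_attempt_is_safe()".
incremental_collection_failed() means a Young GC has already failed. The usual reason is that Old Gen did not have enough space to hold the promoted objects.
collection_attempt_is_safe() reports whether promotion from the young generation is safe, and the condition fires when it is not. The check asks whether the space remaining in Old Gen is enough to hold the objects a Young GC would promote. How much a Young GC will promote cannot be known in advance, so the available space is compared against both the average amount promoted per Young GC and the maximum amount the current Young GC could promote.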
// av_promo is the average amount promoted per Young GC; max_promotion_in_bytes is the maximum that could be promoted right now (the space currently used in eden + from)
bool res = (available >= av_promo) || (available >= max_promotion_in_bytes);
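To see how the two conditions combine, here is a small standalone sketch. It restates the checks as free functions with plain parameters; the parameter names follow the snippet above, the function signatures are illustrative rather than HotSpot's, and consult_young is assumed to be true.

```cpp
#include <cstddef>

// available: free space left in Old Gen, in bytes.
// av_promo: average number of bytes promoted per Young GC.
// max_promotion_in_bytes: worst case, everything currently used in eden + from.
bool promotion_attempt_is_safe(std::size_t available, std::size_t av_promo,
                               std::size_t max_promotion_in_bytes) {
    // Safe if Old Gen can hold either the average or the worst-case promotion.
    return (available >= av_promo) || (available >= max_promotion_in_bytes);
}

// Trigger 4: a Young GC already failed, or the next one does not look safe.
bool incremental_collection_will_fail(bool already_failed, std::size_t available,
                                      std::size_t av_promo,
                                      std::size_t max_promotion_in_bytes) {
    return already_failed ||
           !promotion_attempt_is_safe(available, av_promo, max_promotion_in_bytes);
}
```

Because the two comparisons are joined with "or", the attempt is only judged unsafe when the available space is below both the average and the worst-case promotion size.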
5. Decision based on the state of metaspace
This mainly looks at the metaspace should_concurrent_collect flag. The flag is set before metaspace is expanded, if CMSClassUnloadingEnabled is configured, and a CMS GC then follows. This is why an application often runs a CMS GC shortly after startup, while Old Gen occupancy is still very small, which can be baffling.
Summary
This article has gone through the trigger conditions of CMS GC's foreground collector and background collector. The foreground collector's trigger condition is relatively simple, while the background collector has many, grouped into 5 broad categories, each with some smaller branches. Many more triggers become possible when UseCMSInitiatingOccupancyOnly is not set. In production it is strongly recommended to set UseCMSInitiatingOccupancyOnly so that CMS GC runs more predictably, which also makes GC causes easier to diagnose.
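As a compact recap, the order in which the five categories are evaluated can be sketched as follows. This is an illustrative condensation, not HotSpot code: TriggerInputs, Trigger and first_trigger are names introduced here, and each boolean stands for the outcome of one of the checks described above.

```cpp
// Each field stands for the result of one check (illustrative names).
struct TriggerInputs {
    bool full_gc_requested;                 // category 1
    bool use_initiating_occupancy_only;     // the UseCMSInitiatingOccupancyOnly flag
    bool stats_say_start;                   // category 2 (statistics or bootstrap occupancy)
    bool old_gen_wants_collection;          // category 3
    bool incremental_collection_will_fail;  // category 4
    bool metaspace_wants_collection;        // category 5
};

enum class Trigger { None, FullGcRequest, Statistics, OldGen, IncrementalFailure, Metaspace };

// Returns the first category that fires, in the order the source evaluates them.
Trigger first_trigger(const TriggerInputs& in) {
    if (in.full_gc_requested) return Trigger::FullGcRequest;
    if (!in.use_initiating_occupancy_only && in.stats_say_start) return Trigger::Statistics;
    if (in.old_gen_wants_collection) return Trigger::OldGen;
    if (in.incremental_collection_will_fail) return Trigger::IncrementalFailure;
    if (in.metaspace_wants_collection) return Trigger::Metaspace;
    return Trigger::None;
}
```

Note that setting use_initiating_occupancy_only removes category 2 entirely; the narrowing of category 3 that the flag also causes would be reflected in how old_gen_wants_collection is computed.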