
A Java Service Keeps Dying at Midnight, and the Truth Behind It Is...

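Recently a user reported that a Java service in the test environment kept dying at around 00:00. According to the user, the service had no scheduled tasks, saw no traffic spikes, and ran with a reasonable JVM configuration, yet it died for no apparent reason.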


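Troubleshooting

Reproducing the Problem

To reproduce the problem, we wrote a Spring Boot demo and deployed it to the test environment. The demo implements nothing but a hello-world endpoint. The application type is web_tomcat (deployed as a WAR), the base image is base_tomcat/java-centos6-jdk18-60-tom8050-ngx197, and the Java version in the image is 1.8.0_60. Given our earlier experience with MySQL being killed, our first guess was that Linux limits were to blame, so we deployed the same image to two different batches of machines. Sure enough, the service on the new machines died that night, while the service on the old machines kept running.

Next, look at the limit settings on the machine where the service died. [Screenshot: limit settings]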

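Investigation

Is the Java process affected by limits?

In principle a Java process should not be affected by the system's open files limit (the maximum number of file handles). To verify this anyway, we changed the limit to the value used on the normal machines. Because the demo is a web_tomcat application and its startup script cannot be modified, we used prlimit to change the limit of the running Java process: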

prlimit -p 32672 --nofile=1048576
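For readers who prefer to script this step, here is a minimal Python sketch of the same idea using the standard `resource` module on Linux. The function name `set_soft_nofile` is hypothetical, and unlike the command above it only changes the soft limit and leaves the hard limit untouched, so it is a cautious variant and not an exact equivalent of the `prlimit` call.

```python
import os
import resource


def set_soft_nofile(pid, target):
    """Set the soft open-files limit of `pid` to `target`, capped at the hard limit.

    Changing another process's limits needs the same privileges as the
    prlimit command (same user or CAP_SYS_RESOURCE).
    """
    soft, hard = resource.prlimit(pid, resource.RLIMIT_NOFILE)
    if hard == resource.RLIM_INFINITY:
        new_soft = target
    else:
        new_soft = min(target, hard)
    # Keep the hard limit as it is; only the soft limit changes.
    resource.prlimit(pid, resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.prlimit(pid, resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    # Inspect the current process; pass another PID to inspect or change that process.
    print(resource.prlimit(os.getpid(), resource.RLIMIT_NOFILE))
```

Calling `resource.prlimit` with only two arguments reads the limits without changing them, which is a safe first step before modifying anything.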


The service still died at around 00:00 that night, so the open files limit of the Java process itself does not seem to be what brings it down. dmesg showed nothing useful either.


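Does an outdated Java version cause unreasonable memory allocation?

We asked the JDOS R&D team for help. They suspected the Java version: older versions may fail to cap the amount of memory they request. The reasoning is described here: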

https://blog.softwaremill.com/docker-support-in-new-java-8-finally-fd595df0ca54?gi=a0cc6736ed14

[Screenshot: Java memory usage on the abnormal machine]

[Screenshot: Java memory usage on the normal machine]

According to this article, limiting memory with Docker cgroups can get the JVM process terminated, because older Java versions size themselves from the host's resources (memory and CPU count) rather than from the limits set through Docker cgroups. Newer Java versions fix this.

[Screenshot: the solution described in the article]
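The mismatch the article describes can be made visible by comparing the cgroup memory limit with the total memory reported in /proc/meminfo. The sketch below is only an illustration: the function names and the `root` parameter (added so the code can be pointed at a test directory) are assumptions, and it checks just the two standard limit files for cgroup v1 and v2.

```python
from pathlib import Path

# Standard locations of the memory limit, relative to the filesystem root.
CGROUP_LIMIT_FILES = (
    "sys/fs/cgroup/memory/memory.limit_in_bytes",  # cgroup v1
    "sys/fs/cgroup/memory.max",                    # cgroup v2
)


def container_memory_limit(root="/"):
    """Return the cgroup memory limit in bytes, or None if none is found.

    On cgroup v1 an unlimited group shows up as a very large number
    rather than as a missing file.
    """
    for rel in CGROUP_LIMIT_FILES:
        path = Path(root) / rel
        if path.is_file():
            raw = path.read_text().strip()
            if raw == "max":  # cgroup v2 spelling of "no limit"
                return None
            return int(raw)
    return None


def host_memory_total(root="/"):
    """Return MemTotal from /proc/meminfo in bytes."""
    for line in (Path(root) / "proc/meminfo").read_text().splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024  # the value is reported in kB
    raise ValueError("MemTotal not found in meminfo")


if __name__ == "__main__":
    print("cgroup limit:", container_memory_limit())
    print("MemTotal    :", host_memory_total())
```

If the second number is far larger than the first, a runtime that looks only at MemTotal will plan for memory the container is not allowed to use.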

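We were skeptical, because our program sets its JVM parameters explicitly.

Still, in the spirit of trying everything, we added an experimental group running Java 11.0.8.

That night the Java process in the experimental group died as well, so the Java version is not the cause either.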

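Is a scheduled task in the container the cause?

Because the base image is the official one provided by JDOS, we had never suspected a scheduled task. With no other leads left, we checked the container's scheduled tasks.

There was a scheduled task, but its execution time did not match the time the Java process died. We decided to delete it and see what happened.

That night the Java process died again, but this time dmesg had logs. They showed that crond was killed at the same moment as Java, and that the kill was an OOM caused by crond's excessive memory usage.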


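Could there be a system-level cron job as well? We checked /etc/crontab and indeed found one (who built this image?!).

Its schedule matches the time the Java process dies. But here is the puzzle: the logrotate.sh script that the job runs does not exist, so it should not be able to cause any trouble.

To find out whether the scheduled task really is responsible, we changed the cron schedule to 11:00 a.m. and checked whether the Java process would die at that time, while using strace to capture a trace log of the process.

Sure enough, the Java process died at 11:00, so the cron job really is the trigger. Let's look at the strace output: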

19:59:01 close(3)                        = 0
19:59:01 stat("/etc/pam.d", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
19:59:01 open("/etc/pam.d/crond", O_RDONLY) = 3
19:59:01 fstat(3, {st_mode=S_IFREG|0644, st_size=293, ...}) = 0
19:59:01 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd770804000
19:59:01 read(3, "#\n# The PAM configuration file f"..., 4096) = 293
19:59:01 open("/lib64/security/pam_access.so", O_RDONLY) = 5
19:59:01 read(5, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0000\17\0\0\0\0\0\0"..., 832) = 832
19:59:01 fstat(5, {st_mode=S_IFREG|0755, st_size=18552, ...}) = 0
19:59:01 mmap(NULL, 2113800, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 5, 0) = 0x7fd769322000
19:59:01 mprotect(0x7fd769325000, 2097152, PROT_NONE) = 0
19:59:01 mmap(0x7fd769525000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 5, 0x3000) = 0x7fd769525000
19:59:01 close(5) = 0
19:59:01 open("/etc/ld.so.cache", O_RDONLY) = 5
19:59:01 fstat(5, {st_mode=S_IFREG|0644, st_size=16203, ...}) = 0
19:59:01 mmap(NULL, 16203, PROT_READ, MAP_PRIVATE, 5, 0) = 0x7fd7707f8000
19:59:01 close(5) = 0
19:59:01 open("/lib64/libnsl.so.1", O_RDONLY) = 5
19:59:01 read(5, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p@\0\0\0\0\0\0"..., 832) = 832
19:59:01 fstat(5, {st_mode=S_IFREG|0755, st_size=113432, ...}) = 0
19:59:01 mmap(NULL, 2198192, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 5, 0) = 0x7fd769109000
19:59:01 mprotect(0x7fd76911f000, 2093056, PROT_NONE) = 0
19:59:01 mmap(0x7fd76931e000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 5, 0x15000) = 0x7fd76931e000
19:59:01 mmap(0x7fd769320000, 6832, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fd769320000
19:59:01 close(5) = 0
19:59:01 mprotect(0x7fd76931e000, 4096, PROT_READ) = 0
19:59:01 mprotect(0x7fd769525000, 4096, PROT_READ) = 0
19:59:01 munmap(0x7fd7707f8000, 16203) = 0
19:59:01 open("/etc/pam.d/password-auth", O_RDONLY) = 5
19:59:01 fstat(5, {st_mode=S_IFREG|0644, st_size=692, ...}) = 0
19:59:01 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)                     = 0x7fd770803000
19:59:01 read(5, "#%PAM-1.0\n# This file is auto-ge"..., 4096) = 692
19:59:01 open("/lib64/security/pam_unix.so", O_RDONLY) = 6
19:59:01 read(6, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240&\0\0\0\0\0\0"..., 832) = 832
19:59:01 fstat(6, {st_mode=S_IFREG|0755, st_size=51960, ...}) = 0
19:59:01 mmap(NULL, 2196352, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 6, 0) = 0x7fd768ef0000
19:59:01 mprotect(0x7fd768efc000, 2093056, PROT_NONE) = 0
19:59:01 mmap(0x7fd7690fb000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 6, 0xb000) = 0x7fd7690fb000
19:59:01 mmap(0x7fd7690fd000, 45952, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fd7690fd000
19:59:01 close(6)                       = 0
19:59:01 mprotect(0x7fd7690fb000, 4096, PROT_READ) = 0
19:59:01 read(5, "", 4096)              = 0
19:59:01 close(5) = 0
19:59:01 munmap(0x7fd770803000, 4096) = 0
19:59:01 open("/lib64/security/pam_loginuid.so", O_RDONLY) = 5
19:59:01 read(5, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220\t\0\0\0\0\0\0"..., 832) = 832
19:59:01 fstat(5, {st_mode=S_IFREG|0755, st_size=10240, ...}) = 0
19:59:01 mmap(NULL, 2105480, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 5, 0) = 0x7fd768ced000
19:59:01 mprotect(0x7fd768cef000, 2093056, PROT_NONE) = 0
19:59:01 mmap(0x7fd768eee000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 5, 0x1000) = 0x7fd768eee000
19:59:01 close(5) = 0
19:59:01 mprotect(0x7fd768eee000, 4096, PROT_READ) = 0
19:59:01 open("/etc/pam.d/password-auth", O_RDONLY) = 5
19:59:01 fstat(5, {st_mode=S_IFREG|0644, st_size=692, ...}) = 0
19:59:01 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd770803000
19:59:01 read(5, "#%PAM-1.0\n# This file is auto-ge"..., 4096) = 692
19:59:01 open("/lib64/security/pam_keyinit.so", O_RDONLY) = 6
19:59:01 read(6, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`\10\0\0\0\0\0\0"..., 832) = 832
19:59:01 fstat(6, {st_mode=S_IFREG|0755, st_size=10224, ...}) = 0
19:59:01 mmap(NULL, 2105488, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 6, 0)                      = 0x7fd768aea000
19:59:01 mprotect(0x7fd768aec000, 2093056, PROT_NONE)                     = 0
19:59:01 mmap(0x7fd768ceb000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 6, 0x1000) = 0x7fd768ceb000
19:59:01 close(6) = 0
19:59:01 mprotect(0x7fd768ceb000, 4096, PROT_READ) = 0
19:59:01 open("/lib64/security/pam_limits.so", O_RDONLY) = 6
19:59:01 read(6, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\20\0\0\0\0\0\0"..., 832) = 832
19:59:01 fstat(6, {st_mode=S_IFREG|0755, st_size=18600, ...}) = 0
19:59:01 mmap(NULL, 2113848, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 6, 0) = 0x7fd7688e5000
19:59:01 mprotect(0x7fd7688e9000, 2093056, PROT_NONE) = 0
19:59:01 mmap(0x7fd768ae8000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 6, 0x3000) = 0x7fd768ae8000
19:59:01 close(6) = 0
19:59:01 mprotect(0x7fd768ae8000, 4096, PROT_READ) = 0
19:59:01 open("/lib64/security/pam_succeed_if.so", O_RDONLY) = 6
19:59:01 read(6, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340\v\0\0\0\0\0\0"..., 832) = 832
19:59:01 fstat(6, {st_mode=S_IFREG|0755, st_size=14384, ...}) = 0
19:59:01 mmap(NULL, 2109624, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 6, 0) = 0x7fd7686e1000
19:59:01 mprotect(0x7fd7686e4000, 2093056, PROT_NONE) = 0
19:59:01 mmap(0x7fd7688e3000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 6, 0x2000) = 0x7fd7688e3000
19:59:01 close(6) = 0
19:59:01 mprotect(0x7fd7688e3000, 4096, PROT_READ)                       = 0
19:59:01 read(5, "", 4096) = 0
19:59:01 close(5)                     = 0
19:59:01 munmap(0x7fd770803000, 4096) = 0
19:59:01 open("/etc/pam.d/password-auth", O_RDONLY)                      = 5
19:59:01 fstat(5, {st_mode=S_IFREG|0644, st_size=692, ...}) = 0
19:59:01 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)                      = 0x7fd770803000
19:59:01 read(5, "#%PAM-1.0\n# This file is auto-ge"..., 4096) = 692
19:59:01 open("/lib64/security/pam_env.so", O_RDONLY) = 6
19:59:01 read(6, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\300\r\0\0\0\0\0\0"..., 832) = 832
19:59:01 fstat(6, {st_mode=S_IFREG|0755, st_size=18592, ...}) = 0
19:59:01 mmap(NULL, 2113776, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 6, 0)                       = 0x7fd7684dc000
19:59:01 mprotect(0x7fd7684e0000, 2093056, PROT_NONE) = 0
19:59:01 mmap(0x7fd7686df000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 6, 0x3000) = 0x7fd7686df000
19:59:01 close(6) = 0
19:59:01 mprotect(0x7fd7686df000, 4096, PROT_READ)                     = 0
19:59:01 open("/lib64/security/pam_deny.so", O_RDONLY) = 6
19:59:01 read(6, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0000\5\0\0\0\0\0\0"..., 832) = 832
19:59:01 fstat(6, {st_mode=S_IFREG|0755, st_size=5952, ...}) = 0
19:59:01 mmap(NULL, 2101272, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 6, 0)                       = 0x7fd7682da000
19:59:01 mprotect(0x7fd7682db000, 2093056, PROT_NONE) = 0
19:59:01 mmap(0x7fd7684da000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 6, 0)                      = 0x7fd7684da000
19:59:01 close(6) = 0
19:59:01 mprotect(0x7fd7684da000, 4096, PROT_READ) = 0
19:59:01 read(5, "", 4096) = 0
19:59:01 close(5) = 0
19:59:01 munmap(0x7fd770803000, 4096) = 0
19:59:01 read(3, "", 4096)             = 0
19:59:01 close(3) = 0
19:59:01 munmap(0x7fd770804000, 4096)                      = 0
19:59:01 open("/etc/pam.d/other", O_RDONLY)                      = 3
19:59:01 fstat(3, {st_mode=S_IFREG|0644, st_size=154, ...}) = 0
19:59:01 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)   = 0x7fd770804000
19:59:01 read(3, "#%PAM-1.0\nauth     required     "..., 4096) = 154
19:59:01 read(3, "", 4096) = 0
19:59:01 close(3) = 0
19:59:01 munmap(0x7fd770804000, 4096) = 0
19:59:01 open("/etc/passwd", O_RDONLY|O_CLOEXEC)   = 3
19:59:01 fstat(3, {st_mode=S_IFREG|0644, st_size=1057, ...}) = 0
19:59:01 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd770804000
19:59:01 read(3, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 1057
19:59:01 close(3) = 0
19:59:01 munmap(0x7fd770804000, 4096) = 0
19:59:01 uname({sys="Linux", node="host-11-159-73-176", ...}) = 0
19:59:01 open("/etc/security/access.conf", O_RDONLY) = 3
19:59:01 fstat(3, {st_mode=S_IFREG|0644, st_size=4620, ...}) = 0
19:59:01 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd770804000
19:59:01 read(3, "# Login access control table.\n#\n"..., 4096) = 4096
19:59:01 read(3, " should get access from ipv4 net"..., 4096) = 524
19:59:01 read(3, "", 4096) = 0
19:59:01 close(3) = 0
19:59:01 munmap(0x7fd770804000, 4096) = 0
19:59:01 getuid() = 0
19:59:01 open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 3
19:59:01 fstat(3, {st_mode=S_IFREG|0644, st_size=1057, ...}) = 0
19:59:01 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)  = 0x7fd770804000
19:59:01 read(3, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 1057
19:59:01 close(3)                       = 0
19:59:01 munmap(0x7fd770804000, 4096) = 0
19:59:01 geteuid() = 0
19:59:01 open("/etc/shadow", O_RDONLY|O_CLOEXEC) = 3
19:59:01 fstat(3, {st_mode=S_IFREG, st_size=901, ...}) = 0
19:59:01 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd770804000
19:59:01 read(3, "root:$6$4.53VPrJ$1wxMpbsWYp4VKea"..., 4096) = 901
19:59:01 close(3) = 0
19:59:01 munmap(0x7fd770804000, 4096)                      = 0
19:59:01 socket(PF_NETLINK, SOCK_RAW, 9)                       = 3
19:59:01 fcntl(3, F_SETFD, FD_CLOEXEC)   = 0
19:59:01 readlink("/proc/self/exe", "/usr/sbin/crond", 4096) = 15
19:59:01 sendto(3, "p\0\0\0M\4\5\0\1\0\0\0\0\0\0\0op=PAM:accountin"..., 112, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12)                      = 112
19:59:01 poll([{fd=3, events=POLLIN}], 1, 500)   = 1 ([{fd=3, revents=POLLIN}])
19:59:01 recvfrom(3, "$\0\0\0\2\0\0\1\1\0\0\0\227\7\0\0\0\0\0\0p\0\0\0M\4\5\0\1\0\0\0"..., 8988, MSG_PEEK|MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36
19:59:01 recvfrom(3, "$\0\0\0\2\0\0\1\1\0\0\0\227\7\0\0\0\0\0\0p\0\0\0M\4\5\0\1\0\0\0"..., 8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36
19:59:01 close(3) = 0
19:59:01 open("/etc/security/pam_env.conf", O_RDONLY) = 3
19:59:01 fstat(3, {st_mode=S_IFREG|0644, st_size=2980, ...}) = 0
19:59:01 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd770804000
19:59:01 read(3, "#\n# This is the configuration fi"..., 4096) = 2980
19:59:01 read(3, "", 4096) = 0
19:59:01 close(3)                      = 0
19:59:01 munmap(0x7fd770804000, 4096)                       = 0
19:59:01 open("/etc/environment", O_RDONLY)   = 3
19:59:01 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
19:59:01 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)               = 0x7fd770804000
19:59:01 read(3, "", 4096) = 0
19:59:01 close(3) = 0
19:59:01 munmap(0x7fd770804000, 4096) = 0
19:59:01 socket(PF_NETLINK, SOCK_RAW, 9) = 3
19:59:01 fcntl(3, F_SETFD, FD_CLOEXEC) = 0
19:59:01 sendto(3, "p\0\0\0O\4\5\0\2\0\0\0\0\0\0\0op=PAM:setcred a"..., 112, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12)                       = 112
19:59:01 poll([{fd=3, events=POLLIN}], 1, 500)   = 1 ([{fd=3, revents=POLLIN}])
19:59:01 recvfrom(3, "$\0\0\0\2\0\0\1\2\0\0\0\227\7\0\0\0\0\0\0p\0\0\0O\4\5\0\2\0\0\0"..., 8988, MSG_PEEK|MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36
19:59:01 recvfrom(3, "$\0\0\0\2\0\0\1\2\0\0\0\227\7\0\0\0\0\0\0p\0\0\0O\4\5\0\2\0\0\0"..., 8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36
19:59:01 close(3) = 0
19:59:01 open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 3
19:59:01 fstat(3, {st_mode=S_IFREG|0644, st_size=1057, ...}) = 0
19:59:01 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd770804000
19:59:01 read(3, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 1057
19:59:01 close(3) = 0
19:59:01 munmap(0x7fd770804000, 4096) = 0
19:59:01 open("/proc/self/loginuid", O_WRONLY|O_TRUNC|O_NOFOLLOW)        = 3
19:59:01 write(3, "0", 1) = 1
19:59:01 close(3) = 0
19:59:01 open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 3
19:59:01 fstat(3, {st_mode=S_IFREG|0644, st_size=1057, ...}) = 0
19:59:01 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd770804000
19:59:01 read(3, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 1057
19:59:01 close(3) = 0
19:59:01 munmap(0x7fd770804000, 4096) = 0
19:59:01 getuid() = 0
19:59:01 getgid() = 0
19:59:01 keyctl(0, 0xfffffffd, 0, 0, 0) = 496466385
19:59:01 keyctl(0, 0xfffffffb, 0, 0, 0x30) = 785702132
19:59:01 open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 3
19:59:01 fstat(3, {st_mode=S_IFREG|0644, st_size=1057, ...}) = 0
19:59:01 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd770804000
19:59:01 read(3, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 1057
19:59:01 close(3) = 0
19:59:01 munmap(0x7fd770804000, 4096) = 0
19:59:01 getrlimit(RLIMIT_CPU, {rlim_cur=RLIM_INFINITY, rlim_max=RLIM_INFINITY}) = 0
19:59:01 getrlimit(RLIMIT_FSIZE, {rlim_cur=RLIM_INFINITY, rlim_max=RLIM_INFINITY}) = 0
19:59:01 getrlimit(RLIMIT_DATA, {rlim_cur=RLIM_INFINITY, rlim_max=RLIM_INFINITY}) = 0
19:59:01 getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM_INFINITY}) = 0
19:59:01 getrlimit(RLIMIT_CORE, {rlim_cur=RLIM_INFINITY, rlim_max=RLIM_INFINITY}) = 0
19:59:01 getrlimit(RLIMIT_RSS, {rlim_cur=RLIM_INFINITY, rlim_max=RLIM_INFINITY}) = 0
19:59:01 getrlimit(RLIMIT_NPROC, {rlim_cur=RLIM_INFINITY, rlim_max=RLIM_INFINITY}) = 0
19:59:01 getrlimit(RLIMIT_NOFILE, {rlim_cur=1073741816, rlim_max=1073741816}) = 0
19:59:01 getrlimit(RLIMIT_MEMLOCK, {rlim_cur=64*1024, rlim_max=64*1024}) = 0
19:59:01 getrlimit(RLIMIT_AS, {rlim_cur=RLIM_INFINITY, rlim_max=RLIM_INFINITY}) = 0
19:59:01 getrlimit(RLIMIT_LOCKS, {rlim_cur=RLIM_INFINITY, rlim_max=RLIM_INFINITY}) = 0
19:59:01 getrlimit(RLIMIT_SIGPENDING, {rlim_cur=883632, rlim_max=883632}) = 0
19:59:01 getrlimit(RLIMIT_MSGQUEUE, {rlim_cur=800*1024, rlim_max=800*1024}) = 0
19:59:01 getrlimit(RLIMIT_NICE, {rlim_cur=0, rlim_max=0}) = 0
19:59:01 getrlimit(RLIMIT_RTPRIO, {rlim_cur=0, rlim_max=0}) = 0
19:59:01 getpriority(PRIO_PROCESS, 0) = 20
19:59:01 open("/etc/security/limits.conf", O_RDONLY) = 3
19:59:01 fstat(3, {st_mode=S_IFREG|0644, st_size=1835, ...}) = 0
19:59:01 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd770804000
19:59:01 read(3, "# /etc/security/limits.conf\n#\n#E"..., 4096) = 1835
19:59:01 read(3, "", 4096) = 0
19:59:01 close(3) = 0
19:59:01 munmap(0x7fd770804000, 4096) = 0
19:59:01 open("/etc/security/limits.d", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 3
19:59:01 getdents(3, /* 3 entries */, 32768) = 88
19:59:01 open("/usr/lib64/gconv/gconv-modules.cache", O_RDONLY)                       = 5
19:59:01 fstat(5, {st_mode=S_IFREG|0644, st_size=26060, ...}) = 0
19:59:01 mmap(NULL, 26060, PROT_READ, MAP_SHARED, 5, 0) = 0x7fd7707f5000
19:59:01 close(5)  = 0
19:59:01 getdents(3, /* 0 entries */, 32768) = 0
19:59:01 close(3) = 0
19:59:01 open("/etc/security/limits.d/90-nproc.conf", O_RDONLY) = 3
19:59:01 fstat(3, {st_mode=S_IFREG|0644, st_size=193, ...}) = 0
19:59:01 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd770804000
19:59:01 read(3, "# Default limit for number of us"..., 4096) = 193
19:59:01 read(3, "", 4096)              = 0
19:59:01 close(3)                       = 0
19:59:01 munmap(0x7fd770804000, 4096)   = 0
19:59:01 setrlimit(RLIMIT_NPROC, {rlim_cur=RLIM_INFINITY, rlim_max=RLIM_INFINITY}) = 0
19:59:01 setpriority(PRIO_PROCESS, 0, 0) = 0
19:59:01 getuid() = 0
19:59:01 open("/etc/passwd", O_RDONLY|O_CLOEXEC) = 3
19:59:01 fstat(3, {st_mode=S_IFREG|0644, st_size=1057, ...}) = 0
19:59:01 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd770804000
19:59:01 read(3, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 1057
19:59:01 close(3) = 0
19:59:01 munmap(0x7fd770804000, 4096)                     = 0
19:59:01 socket(PF_NETLINK, SOCK_RAW, 9)                      = 3
19:59:01 fcntl(3, F_SETFD, FD_CLOEXEC)                      = 0
19:59:01 sendto(3, "t\0\0\0Q\4\5\0\3\0\0\0\0\0\0\0op=PAM:session_o"..., 116, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 116
19:59:01 poll([{fd=3, events=POLLIN}], 1, 500) = 1 ([{fd=3, revents=POLLIN}])
19:59:01 recvfrom(3, "$\0\0\0\2\0\0\1\3\0\0\0\227\7\0\0\0\0\0\0t\0\0\0Q\4\5\0\3\0\0\0"..., 8988, MSG_PEEK|MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36
19:59:01 recvfrom(3, "$\0\0\0\2\0\0\1\3\0\0\0\227\7\0\0\0\0\0\0t\0\0\0Q\4\5\0\3\0\0\0"..., 8988, MSG_DONTWAIT, {sa_family=AF_NETLINK, pid=0, groups=00000000}, [12]) = 36
19:59:01 close(3) = 0
19:59:01 setgid(0) = 0
19:59:01 open("/proc/sys/kernel/ngroups_max", O_RDONLY) = 3
19:59:01 read(3, "65536\n", 31)         = 6
19:59:01 close(3)                       = 0
19:59:01 socket(PF_FILE, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
19:59:01 connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
19:59:01 close(3) = 0
19:59:01 socket(PF_FILE, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
19:59:01 connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110)                       = -1 ENOENT (No such file or directory)
19:59:01 close(3) = 0
19:59:01 open("/etc/group", O_RDONLY|O_CLOEXEC) = 3
19:59:01 fstat(3, {st_mode=S_IFREG|0644, st_size=497, ...}) = 0
19:59:01 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd770804000
19:59:01 lseek(3, 0, SEEK_CUR) = 0
19:59:01 read(3, "root:x:0:\nbin:x:1:bin,daemon\ndae"..., 4096) = 497
19:59:01 read(3, "", 4096)              = 0
19:59:01 close(3)                       = 0
19:59:01 munmap(0x7fd770804000, 4096)                     = 0
19:59:01 setgroups(1, [0]) = 0
19:59:01 setreuid(0, 4294967295) = 0
19:59:01 rt_sigaction(SIGCHLD, {SIG_DFL, [CHLD], SA_RESTORER|SA_RESTART, 0x7fd76fa316a0}, {0x558826e03b80, [], SA_RESTORER|SA_RESTART, 0x7fd76fa316a0}, 8) = 0
19:59:01 pipe([3, 5])                   = 0
19:59:01 pipe([6, 7])                   = 0
19:59:01 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fd7707fca70) = 1946
19:59:01 gettid()                     = 1943
19:59:01 open("/proc/self/task/1943/attr/exec", O_RDWR) = 8
19:59:01 write(8, NULL, 0) = -1 EINVAL (Invalid argument)
19:59:01 close(8) = 0
19:59:01 close(3) = 0
19:59:01 close(7) = 0
19:59:01 close(5) = 0
19:59:01 fcntl(6, F_GETFL)                       = 0 (flags O_RDONLY)
19:59:01 fstat(6, {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
19:59:01 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)                     = 0x7fd770804000
19:59:01 lseek(6, 0, SEEK_CUR)                     = -1 ESPIPE (Illegal seek)
19:59:01 read(6, "/bin/bash: ./logrotate.sh: \346\262\241\346\234"..., 4096) = 55
19:59:01 uname({sys="Linux", node="host-11-159-73-176", ...}) = 0
19:59:01 getrlimit(RLIMIT_NOFILE, {rlim_cur=1073741816, rlim_max=1073741816}) = 0
19:59:01 mmap(NULL, 4294967296, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd6682da000
19:59:01 --- SIGCHLD (Child exited) @ 0 (0) ---
19:59:06 +++ killed by SIGKILL +++

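The last mmap call allocates 4 GB of memory in one go, and right after that the process is killed.

getrlimit is called just before mmap, and it reports RLIMIT_NOFILE as 1073741816. As with the earlier MySQL problem, the amount of memory allocated is derived from a system resource limit.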
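In a trace this long, the one suspicious call is easy to miss. The following Python sketch filters strace lines for large anonymous mmap calls. The regular expression is written against the line format shown in the trace above; the function name and the 1 GiB default threshold are arbitrary choices for illustration.

```python
import re

# Matches lines such as:
#   mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f...
MMAP_RE = re.compile(
    r"mmap\(NULL, (\d+), [^)]*MAP_ANONYMOUS[^)]*\)\s*=\s*(0x[0-9a-f]+)"
)


def large_anonymous_mmaps(lines, threshold=1 << 30):
    """Return (length, address) for anonymous mmaps of at least `threshold` bytes."""
    hits = []
    for line in lines:
        match = MMAP_RE.search(line)
        if match and int(match.group(1)) >= threshold:
            hits.append((int(match.group(1)), match.group(2)))
    return hits


if __name__ == "__main__":
    sample = [
        '19:59:01 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd770804000',
        '19:59:01 mmap(NULL, 4294967296, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd6682da000',
    ]
    for length, address in large_anonymous_mmaps(sample):
        print(f"{length} bytes ({length / 2**30:.1f} GiB) at {address}")
```

Both sample lines are copied from the trace; only the second one passes the filter.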

To confirm that cron is what brings Java down, we manually killed the cron process so that no scheduled tasks would run, and then checked again whether the Java process would die.

As expected, the Java process did not die, so the cron job really is the cause.

Do newer CentOS versions have the same problem?

In principle the OOM killer should kill only the process that uses the most memory, and the Java process was not the top memory consumer. Could the OOM killer policy have changed in newer CentOS releases?

Let's verify whether newer CentOS releases show this problem.

The current image runs CentOS release 6.6 (Final). We added two experimental groups with the base image upgraded to CentOS release 6.10 (Final) and CentOS Linux release 7.9.2009 (Core) respectively, each with the same cron job.

On both CentOS release 6.10 (Final) and CentOS Linux release 7.9.2009 (Core), the Java process was not killed. Only cron's child process was.


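Conclusion

The container's open files limit (maximum number of file handles) was set unreasonably high. As a result, memory usage in the container spiked whenever cron ran a job, creating a risk of running out of memory. Linux's protection mechanism then killed processes with high memory usage, and the cron child process and the Java process were killed together. (Which raises a question: why does this JDOS base image run a shell script that does not exist at all, and run it twice?) Newer CentOS releases do not kill the Java process. Our guess is that the victim selection policy differs slightly between CentOS versions.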

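Root Cause Analysis

How cron executes a job

On Linux, the crontab tool is provided by the cronie package. Let's walk through how cron executes a job.

child_process() runs the cron child process, and when cron runs a child process it also performs a send-mail step.

When cron_popen runs, it clears a block of memory whose size is derived from the open files limit (maximum number of file handles).

In summary, the cause of the cron OOM is found. The open files limit was set far too high, and the cron job's output was not redirected (the trace shows bash's error message about the missing logrotate.sh), so cron entered its send-mail logic. The amount of memory cleared there exceeded the container's own memory, which resulted in the OOM.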
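The numbers in the trace fit this explanation. As a rough check, assume the table cleared in cron_popen holds one 4-byte process ID per possible file descriptor; the entry size is an assumption made for this sketch, not something stated in the article. With that assumption, the RLIMIT_NOFILE value from the trace yields almost exactly the 4294967296-byte mmap seen just before the kill.

```python
PID_ENTRY_SIZE = 4  # assumption: one 4-byte pid_t per file descriptor slot
PAGE_SIZE = 4096


def pid_table_bytes(nofile_limit, entry_size=PID_ENTRY_SIZE):
    """Size of a table with one entry per possible file descriptor."""
    return nofile_limit * entry_size


def page_align(size, page=PAGE_SIZE):
    """Round `size` up to the next multiple of the page size."""
    return (size + page - 1) // page * page


if __name__ == "__main__":
    for limit in (1073741816, 1048576):  # value from the trace, value used with prlimit
        table = pid_table_bytes(limit)
        print(f"nofile={limit}: table={table} bytes, page-aligned={page_align(table)}")
```

With the limit of 1048576 that was applied through prlimit, the same table needs only 4 MiB, which shows why a sane open files limit on crond makes the problem disappear.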

The issue was fixed in cronie releases after 1.5.4. To check the cronie version in the current container, run:

rpm -q cronie
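When many containers need checking, the output of that command can be compared against the version threshold automatically. The sketch below assumes the usual `name-version-release.arch` form of `rpm -q` output. The package strings in the usage example are illustrative and not taken from the affected machines, and the check cannot detect fixes that a distribution backported into an older version.

```python
import re

FIXED_AFTER = (1, 5, 4)  # per the article: releases after 1.5.4 contain the fix


def cronie_version(rpm_name):
    """Extract the upstream version tuple from a string like 'cronie-1.4.4-12.el6.x86_64'."""
    match = re.match(r"cronie-(\d+(?:\.\d+)*)-", rpm_name)
    if not match:
        raise ValueError(f"not a cronie package string: {rpm_name!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def may_be_affected(rpm_name):
    """True if the upstream version is at or below the threshold."""
    return cronie_version(rpm_name) <= FIXED_AFTER


if __name__ == "__main__":
    for name in ("cronie-1.4.4-12.el6.x86_64", "cronie-1.5.7-5.el8.x86_64"):
        print(name, "-> may be affected:", may_be_affected(name))
```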

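The Linux kernel OOM killer

The Linux kernel has a mechanism called the OOM killer (Out Of Memory killer). It watches for processes that use too much memory, especially ones whose usage grows very quickly, and kills them automatically to keep the system from running out of memory. How the kernel detects a memory shortage and then picks and kills a process can be read in the kernel source file linux/mm/oom_kill.c: when the system runs short of memory, out_of_memory() is triggered, and it calls select_bad_process() to choose a "bad" process to kill.

The main selection criteria are:

  1. Memory usage: the OOM killer first leans toward the processes that use the most memory, because terminating them frees the most memory.
  2. OOM score: every process has an OOM score, calculated from its memory usage and other factors. The OOM killer tends to terminate the process with the highest OOM score.
  3. Process priority: when choosing a victim, the OOM killer usually avoids system processes that are critical to the system. These processes generally have a higher priority and are therefore less likely to be chosen.
  4. Process resource requirements: the OOM killer also considers a process's resource requirements. It tends to terminate processes that request fewer resources, to minimize the impact on other running processes.
  5. Process attributes: some processes can be protected, for example by adjusting their OOM score through /proc/[PID]/oom_score_adj (a value of -1000 exempts the process entirely). Such processes are unlikely to be terminated by the OOM killer.

Note: the OOM killer mechanism may differ between Linux versions.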
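The scores mentioned above can be inspected directly. The kernel exposes them as /proc/[PID]/oom_score and /proc/[PID]/oom_score_adj, and the Python sketch below reads both and ranks a set of processes by score. The function names and the `proc_root` parameter (present so the code can be run against a test directory) are hypothetical, and the ranking is only a rough guide to what the kernel would pick.

```python
from pathlib import Path


def oom_info(pid, proc_root="/proc"):
    """Read the current OOM score and its adjustment for one process."""
    base = Path(proc_root) / str(pid)
    return {
        "oom_score": int((base / "oom_score").read_text()),
        "oom_score_adj": int((base / "oom_score_adj").read_text()),
    }


def likely_victims(pids, proc_root="/proc"):
    """Return the given PIDs ordered from highest to lowest OOM score."""
    scored = [(oom_info(pid, proc_root)["oom_score"], pid) for pid in pids]
    return [pid for _score, pid in sorted(scored, reverse=True)]


if __name__ == "__main__":
    import os

    print(oom_info(os.getpid()))
```

Running `likely_victims` on the PIDs of crond and the Java process shortly before the scheduled time would show which of them the kernel currently considers the better candidate.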

Solution

Use a newer, stable CentOS release. If the business cannot upgrade CentOS, set a reasonable open files limit. For application_worker applications, the limit can be changed manually in the startup script. For web_tomcat applications the startup script cannot be modified, so the options are to kill the cron process, to delete the system cron job, or to upgrade cronie manually to 1.5.7-5.

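Afterword

The open files limit is a big pitfall, and we have now fallen into it twice. Be sure to check that the CentOS version and the limit settings of your service's containers are reasonable. This case occurred in the test environment, so it did not cause an incident, but the consequences of the same thing happening in production would be severe.

All the machines in the batch newly added to the test environment have this problem. Our team has reported it to the machine provider, who will correct the maximum number of file handles on these machines in one pass. If the problem is affecting normal business use in the meantime, you can temporarily delete the job from /etc/crontab in the container.

References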

https://cloud.tencent.com/developer/article/1183262

https://github.com/cronie-crond/cronie

Editor: 龐桂玉 | Source: 51CTO Blog