Go 超時(shí)引發(fā)大量 Fin-Wait2
本文轉(zhuǎn)載自微信公眾號(hào)「碼農(nóng)桃花源」,作者峰云就她了。轉(zhuǎn)載本文請(qǐng)聯(lián)系碼農(nóng)桃花源公眾號(hào)。
通過(guò)grafana監(jiān)控面板,發(fā)現(xiàn)了幾個(gè)高頻的業(yè)務(wù)緩存節(jié)點(diǎn)出現(xiàn)了大量的fin-wait2,而且fin-wait2狀態(tài)持續(xù)了不短的時(shí)間。通過(guò)連接的ip地址和抓包數(shù)據(jù)判斷出對(duì)端的業(yè)務(wù)。除此之外,頻繁地去創(chuàng)建新連接,我們對(duì)golang net/http transport的連接池已優(yōu)化過(guò),但established已建連的連接沒(méi)有得到復(fù)用。
另外,隨之帶來(lái)的問(wèn)題是大量time-wait的出現(xiàn),畢竟fin-wait2在拿到對(duì)端fin后會(huì)轉(zhuǎn)變?yōu)閠ime-wait狀態(tài)。但該狀態(tài)是正常的。
分析問(wèn)題
通過(guò)分析業(yè)務(wù)日志發(fā)現(xiàn)了大量的接口超時(shí)問(wèn)題,連接的地址跟netstat中fin-wait2目的地址是一致的。那么問(wèn)題已經(jīng)明確了,當(dāng)http的請(qǐng)求觸發(fā)超時(shí),定時(shí)器對(duì)連接對(duì)象進(jìn)行了關(guān)閉。這邊都close了,那么連接自然無(wú)法復(fù)用,所以就需要?jiǎng)?chuàng)建新連接,但由于對(duì)端的API接口出現(xiàn)邏輯阻塞,自然就又觸發(fā)了超時(shí),continue。
- // xiaorui.cc
- Get "http://xxxx": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
- Get "http://xxxx": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
- Get "http://xxxx": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
通過(guò)strace追蹤socket的系統(tǒng)調(diào)用,發(fā)現(xiàn)golang的socket讀寫(xiě)超時(shí)沒(méi)有使用setsockopt so_sndtimeo so_revtimeo參數(shù)。
- [pid 34262] epoll_ctl(3, EPOLL_CTL_ADD, 6, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=1310076696, u64=140244877192984}}) = 0
- [pid 34265] epoll_pwait(3, <unfinished ...>
- [pid 34262] <... getsockname resumed>{sa_family=AF_INET, sin_port=htons(45242), sin_addr=inet_addr("127.0.0.1")}, [112->16]) = 0
- [pid 34264] epoll_pwait(3, <unfinished ...>
- [pid 34262] setsockopt(6, SOL_TCP, TCP_NODELAY, [1], 4 <unfinished ...>
- [pid 34262] setsockopt(6, SOL_SOCKET, SO_KEEPALIVE, [1], 4 <unfinished ...>
- [pid 34264] read(4, <unfinished ...>
- [pid 34262] setsockopt(6, SOL_TCP, TCP_KEEPINTVL, [30], 4 <unfinished ...>
代碼分析
通過(guò)net/http源碼可以看到socket的超時(shí)控制是通過(guò)定時(shí)器來(lái)實(shí)現(xiàn)的,在連接的roundTrip方法里有超時(shí)引發(fā)關(guān)閉連接的邏輯。由于http的語(yǔ)義不支持多路復(fù)用,所以為了規(guī)避超時(shí)后再回來(lái)的數(shù)據(jù)造成混亂,索性直接關(guān)閉連接。
當(dāng)觸發(fā)超時(shí)會(huì)主動(dòng)關(guān)閉連接,這里涉及到了四次揮手,作為關(guān)閉方會(huì)發(fā)送fin,對(duì)端內(nèi)核會(huì)回應(yīng)ack,這時(shí)候客戶端從fin-wait1到fin-wait2,而服務(wù)端在close-wait狀態(tài),等待觸發(fā)close syscall系統(tǒng)調(diào)用。服務(wù)端什么時(shí)候觸發(fā)close動(dòng)作?需要等待net/http handler業(yè)務(wù)邏輯執(zhí)行完畢。
- // xiaorui.cc
- var errTimeout error = &httpError{err: "net/http: timeout awaiting response headers", timeout: true}
- func (pc *persistConn) roundTrip(req *transportRequest) (resp *Response, err error) {
- for {
- testHookWaitResLoop()
- select {
- case err := <-writeErrCh:
- if debugRoundTrip {
- req.logf("writeErrCh resv: %T/%#v", err, err)
- }
- if err != nil {
- pc.close(fmt.Errorf("write error: %v", err))
- return nil, pc.mapRoundTripError(req, startBytesWritten, err)
- }
- if d := pc.t.ResponseHeaderTimeout; d > 0 {
- if debugRoundTrip {
- req.logf("starting timer for %v", d)
- }
- timer := time.NewTimer(d)
- defer timer.Stop() // prevent leaks
- respHeaderTimer = timer.C
- }
- case <-pc.closech:
- ...
- case <-respHeaderTimer:
- if debugRoundTrip {
- req.logf("timeout waiting for response headers.")
- }
- pc.close(errTimeout)
- return nil, errTimeout
如何解決
要么加大客戶端的超時(shí)時(shí)間,要么優(yōu)化對(duì)端的獲取數(shù)據(jù)的邏輯,總之減少超時(shí)的觸發(fā)。這個(gè)問(wèn)題其實(shí)跟 Go 沒(méi)有關(guān)系,換成openresyt和python同樣有這個(gè)問(wèn)題。