MySQL令人頭疼的Aborted告警案例分析
實(shí)戰(zhàn)
Part1:寫在最前
在MySQL的error log中,我們會(huì)經(jīng)常性看到一些各類的Aborted connection錯(cuò)誤,本文中會(huì)針對(duì)這類錯(cuò)誤進(jìn)行一個(gè)初步分析,并了解一個(gè)問(wèn)題產(chǎn)生后的基本排查思路和方法。掌握這種方法是至關(guān)重要的,而不是出現(xiàn)問(wèn)題了,去猜,去試。數(shù)據(jù)庫(kù)出現(xiàn)問(wèn)題的時(shí)候需要DBA在短時(shí)間內(nèi)快速解決問(wèn)題,因此一個(gè)好與壞的DBA,區(qū)別也在于此。
Part2:種類
- [Warning] Aborted connection 305628 to db: 'db' user: 'dbuser' host: 'hostname' (Got an error reading communication packets)
- [Warning] Aborted connection 81 to db:'unconnected' user: 'root' host: '127.0.0.1' (Got timeout reading communication
- packets)
- [Warning] Aborted connection 109 to db:'helei1' user: 'sys_admin' host: '192.168.1.1' (Got an error writing communication packets)
- [Warning] Access denied for user 'root'@'127.0.0.1' (using password: YES)
- [Warning] Got an error writing communication packets
Part3:重點(diǎn)參數(shù)分析
wait_timeout
Command-Line Format | --wait-timeout=# |
||
System Variable | Name | wait_timeout |
|
Variable Scope | Global, Session | ||
Dynamic Variable | Yes | ||
Permitted Values (Windows) | Type | integer | |
Default | 28800 |
||
Min Value | 1 |
||
Max Value | 2147483 |
||
Permitted Values (Other) | Type | integer | |
Default | 28800 |
||
Min Value | 1 |
||
Max Value | 31536000 |
這個(gè)參數(shù)指的是數(shù)據(jù)庫(kù)系統(tǒng)在關(guān)閉它之前,服務(wù)器等待非交互式連接上的活動(dòng)的秒數(shù)。
interactive_timeout
Command-Line Format | --interactive-timeout=# |
||
System Variable | Name | interactive_timeout |
|
Variable Scope | Global, Session | ||
Dynamic Variable | Yes | ||
Permitted Values | Type | integer | |
Default | 28800 |
||
Min Value | 1 |
這個(gè)參數(shù)指的是在關(guān)閉交互式連接之前,服務(wù)器等待活動(dòng)的秒數(shù)
Warning:警告這兩個(gè)參數(shù)建議一起調(diào)節(jié),能夠避免一些坑。
本文的兩個(gè)參數(shù)值采用的是默認(rèn)值
- mysql> show global variables like '%timeout%';
- +----------------------------+----------+
- | Variable_name | Value |
- +----------------------------+----------+
- | connect_timeout | 10 |
- | delayed_insert_timeout | 300 |
- | innodb_lock_wait_timeout | 50 |
- | innodb_rollback_on_timeout | OFF |
- |interactive_timeout | 28800 |
- | lock_wait_timeout | 31536000 |
- | net_read_timeout | 30 |
- | net_write_timeout | 60 |
- | slave_net_timeout | 3600 |
- |wait_timeout | 28800 |
- +----------------------------+----------+
- 10 rows in set (0.01 sec)
另外在數(shù)據(jù)庫(kù)中,我們重點(diǎn)關(guān)注下這兩個(gè)參數(shù),看看什么情況下Aborted_clients會(huì)提升,什么情況下Aborted_connects 會(huì)提升
- mysql>show global status like 'aborted%';
- +------------------+-------+
- |Variable_name | Value |
- +------------------+-------+
- |Aborted_clients | 19 |
- |Aborted_connects | 0 |
- +------------------+-------+
- 2 rows inset (0.00 sec)
Part4:案例1
這里我故意輸入錯(cuò)誤的密碼5次,來(lái)看下數(shù)據(jù)庫(kù)的error log和Aborted的哪個(gè)參數(shù)記載了這一問(wèn)題
- [root@HE3~]# mysql -uroot -pwrongpass -h127.0.0.1
- ERROR 1045 (28000): Access denied for user 'root'@'127.0.0.1' (using password: YES)
- [root@HE3~]# mysql -uroot -pwrongpass -h127.0.0.1
- ERROR 1045 (28000): Access denied for user 'root'@'127.0.0.1' (using password: YES)
- [root@HE3~]# mysql -uroot -pwrongpass -h127.0.0.1
- ERROR 1045 (28000): Access denied for user 'root'@'127.0.0.1' (using password: YES)
- [root@HE3~]# mysql -uroot -pwrongpass -h127.0.0.1
- ERROR 1045 (28000): Access denied for user 'root'@'127.0.0.1' (using password: YES)
- [root@HE3~]# mysql -uroot -pwrongpass -h127.0.0.1
- ERROR 1045 (28000): Access denied for user 'root'@'127.0.0.1' (using password: YES)
可以看出,這里的Aborted_connects 記錄了密碼錯(cuò)誤的這一問(wèn)題
- mysql>show global status like 'aborted%';
- +------------------+-------+
- |Variable_name | Value |
- +------------------+-------+
- |Aborted_clients | 19 |
- |Aborted_connects | 5 |
- +------------------+-------+
- 2 rows inset (0.00 sec)
error log中,也記載了這類密碼輸錯(cuò)的信息
- [Warning] Access denied for user'root'@'127.0.0.1' (using password: YES)
- [Warning] Access denied for user 'root'@'127.0.0.1' (using password:YES)
- [Warning] Access denied for user 'root'@'127.0.0.1' (using password:YES)
- [Warning] Access denied for user 'root'@'127.0.0.1' (using password:YES)
- [Warning] Access denied for user 'root'@'127.0.0.1' (using password:YES)
Part5:案例2
接下來(lái)我們看下文章第三節(jié)提到的兩個(gè)重點(diǎn)參數(shù)對(duì)數(shù)據(jù)庫(kù)連接的行為影響
這里我們將這兩個(gè)參數(shù)均配置為10秒
- mysql>set global wait_timeout=10;
- Query OK,0 rows affected (0.00 sec)
- mysql>set global interactive_timeout=10;
- Query OK,0 rows affected (0.00 sec)
- mysql>show processlist;
- ERROR 2006 (HY000): MySQL server has gone away
- No connection. Trying to reconnect... Connection id: 79 Current database: *** NONE ***
- +----+------+-----------------+------+---------+------+-------+------------------+
- | Id |User | Host | db | Command | Time | State | Info |
- +----+------+-----------------+------+---------+------+-------+------------------+
- | 79 |root | 127.0.0.1:42016 | NULL | Query | 0 | NULL | show processlist |
- +----+------+-----------------+------+---------+------+-------+------------------+
- 1 row in set (0.00 sec)
這里三次操作,可以看到clients數(shù)上升,這是由于timeout參數(shù)控制的,已經(jīng)連接上數(shù)據(jù)的連接被殺掉。
- mysql>show global status like 'aborted%';
- ERROR 2006 (HY000): MySQL server has gone away
- No connection. Trying to reconnect... Connection id: 81 Current database: *** NONE ***
- +------------------+-------+
- |Variable_name | Value |
- +------------------+-------+
- |Aborted_clients | 22 |
- |Aborted_connects | 5 |
- +------------------+-------+
- 2 rows in set (0.01 sec)
error log中記載的是
- [Warning] Aborted connection 81 to db: 'unconnected' user: 'root' host: '127.0.0.1' (Got timeout reading communication packets)
- [Warning] Aborted connection 78 to db: 'unconnected' user: 'root' host: '127.0.0.1' (Got timeout reading communication packets)
- [Warning] Aborted connection 79 to db: 'unconnected' user: 'root' host: '127.0.0.1' (Got timeout reading communication packets)
Part6:案例3
在這個(gè)案例中我們看下***連接數(shù)對(duì)數(shù)據(jù)庫(kù)連接的行為影響
- mysql>show global variables like 'max_conn%';
- +--------------------+-------+
- |Variable_name | Value |
- +--------------------+-------+
- |max_connect_errors | 1000 |
- |max_connections | 1024 |
- +--------------------+-------+
- 2 rows in set (0.00 sec)
- mysql>set global max_connections=2;
- Query OK,0 rows affected (0.00 sec)
這里看到爆出了連接數(shù)過(guò)多的問(wèn)題
- [root@HE3~]# mysql -uroot -pMANAGER -h127.0.0.1
- ERROR 1040 (HY000): Too many connections
而錯(cuò)誤日志沒(méi)有任何記錄
Part7:案例4
第三方工具navicat select結(jié)果沒(méi)有出來(lái)的時(shí)候選擇停止則出現(xiàn)
clients上漲
- mysql>show global status like 'aborted%';
- +------------------+-------+
- |Variable_name | Value |
- +------------------+-------+
- |Aborted_clients | 28 |
- |Aborted_connects | 10 |
- +------------------+-------+
- 2 rows in set (0.00 sec)
error log日志記錄
- 170626 16:26:56 [Warning] Aborted connection 109 to db: 'helei1' user: 'sys_admin' host: '192.168.1.1' (Got an error writing communication packets)
Part8:原因總結(jié)
- 在MySQL中sleep狀態(tài)數(shù)百秒的而且經(jīng)常重復(fù)連接是應(yīng)用程序在工作后沒(méi)有關(guān)閉連接的癥狀之一,而是依靠數(shù)據(jù)庫(kù)wait_timeout來(lái)關(guān)閉它們。強(qiáng)烈建議在操作結(jié)束時(shí)更改應(yīng)用程序邏輯以正確關(guān)閉連接;
- 檢查以確保max_allowed_packet的值足夠高,并且客戶端沒(méi)有收到“數(shù)據(jù)包太大”消息。 這種情況他會(huì)中止連接,而不正確關(guān)閉它;
- 另一種可能性是TIME_WAIT。建議您確認(rèn)連接被妥善管理并且是在應(yīng)用端正常關(guān)閉;
- 確保事務(wù)正確提交(開(kāi)始和提交),以便一旦應(yīng)用程序“完成”連接,它將處于“clean”的狀態(tài);
- 您應(yīng)該確保客戶端應(yīng)用程序不中止連接。 例如,如果PHP的選項(xiàng)max_execution_time設(shè)置為5秒,增加connect_timeout是沒(méi)用的,因?yàn)镻HP會(huì)殺死腳本。 其他編程語(yǔ)言和環(huán)境也有類似的選項(xiàng);
- 連接延遲的另一個(gè)原因是DNS問(wèn)題。 檢查是否啟用了skip-name-resolve,檢查主機(jī)根據(jù)其IP地址而不是其主機(jī)名進(jìn)行身份驗(yàn)證;
- 嘗試增加MySQL的net_read_timeout和net_write_timeout值,看看是否減少了錯(cuò)誤的數(shù)量。
——總結(jié)——
通過(guò)這4個(gè)案例,我們能夠了解到,Aborted_clients、和Aborted_connects的區(qū)別,以及什么情況下會(huì)爆出什么樣的錯(cuò)誤日志,文章第二節(jié)中的幾個(gè)Aborted錯(cuò)誤是常見(jiàn)的錯(cuò)誤,這類錯(cuò)誤出現(xiàn)的時(shí)候腦海里要有一個(gè)理論知識(shí),知道什么情況下,會(huì)出現(xiàn)什么樣的錯(cuò)誤,以便快速定位問(wèn)題。由于筆者的水平有限,編寫時(shí)間也很倉(cāng)促,文中難免會(huì)出現(xiàn)一些錯(cuò)誤或者不準(zhǔn)確的地方,不妥之處懇請(qǐng)讀者批評(píng)指正。