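MySQL: removing duplicate rows from a “relation table” in order to create a composite unique index
Preface
Yesterday I ran into a problem: I needed to refactor and optimize a relation table. However, the existing code paid no attention to concurrency, so this table had accumulated a lot of dirty data, namely duplicate rows.
The table is thread_recommend, the thread-recommendation table. It records the (recommendation) relationship between two entities, user_id and thread_id, and its structure is very simple: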
```sql
/* Records of users recommending threads */
CREATE TABLE `thread_recommend` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `thread_id` int(11) DEFAULT NULL COMMENT 'ID of the thread recommended by the user',
  `user_id` int(11) DEFAULT NULL COMMENT 'ID of the user who recommended the thread',
  `status` int(11) DEFAULT '1' COMMENT 'Status: 0 = recommendation cancelled, 1 = recommended',
  `created` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Time of recommendation',
  PRIMARY KEY (`id`),
  KEY `userid` (`user_id`) USING BTREE
) ENGINE=InnoDB;
```
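The problem is that the code was written carelessly. Under high concurrency, or when requests back up because the database is under heavy load, the table ends up with several rows for the same combination of thread_id and user_id.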
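You can guess what happened next: all sorts of baffling bugs that contradicted the intended behavior came pouring out, for example:
“I just cancelled my recommendation, so why does it still say I'm recommending this?!”
“Why doesn't the total recommendation count match the number of users who actually recommended it?!”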
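Solution 1: use an INSERT ... WHERE NOT EXISTS statement
Disclaimer: this is not the best solution, and I don't recommend using it.
Code first (this is a real query against a different relation table, but the principle is the same):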
```sql
-- FROM DUAL supplies exactly one row, so the insert also works while user_topic is still empty
INSERT INTO `user_topic` (`user_id`, `topic_id`)
SELECT :userId, :topicid FROM DUAL
WHERE NOT EXISTS (SELECT * FROM `user_topic`
                  WHERE `user_topic`.`user_id` = :userId
                    AND `user_topic`.`topic_id` = :topicid);
```
(The same method is described at http://stackoverflow.com/a/31...)
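This “insert only if the row does not already exist” statement reports 1 affected row when it inserts and 0 affected rows when the row already exists. That lets you:
- run the follow-up logic (for example, incrementing the recommendation count in the cache or adding this userId to the thread's recommenders in the cache) only when 1 row was affected;
- return an error from the API when 0 rows were affected.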
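A minimal sketch of that affected-row check, using Python's built-in `sqlite3` as a stand-in for MySQL. The `fresh_db` and `recommend` helpers are hypothetical. SQLite accepts a SELECT with a WHERE clause but no FROM, which plays the role of MySQL's `FROM DUAL`.

The sketch also shows a pitfall of selecting the constant row `FROM user_topic` itself. While the table is completely empty there is no row to select from, so even the first recommendation inserts nothing:

```python
import sqlite3

# SQLite allows SELECT ... WHERE without FROM; MySQL needs FROM DUAL for the same single row.
INSERT_IF_ABSENT = """
INSERT INTO user_topic (user_id, topic_id)
SELECT :userId, :topicid
WHERE NOT EXISTS (SELECT 1 FROM user_topic
                  WHERE user_topic.user_id = :userId
                    AND user_topic.topic_id = :topicid)
"""

# Variant that selects the constant row FROM user_topic itself.
INSERT_FROM_SAME_TABLE = """
INSERT INTO user_topic (user_id, topic_id)
SELECT :userId, :topicid FROM user_topic
WHERE NOT EXISTS (SELECT 1 FROM user_topic
                  WHERE user_topic.user_id = :userId
                    AND user_topic.topic_id = :topicid)
LIMIT 1
"""


def fresh_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_topic (user_id INTEGER, topic_id INTEGER)")
    return conn


def recommend(conn: sqlite3.Connection, sql: str, user_id: int, topic_id: int) -> int:
    """Run the conditional insert; return rows added (1 = new, 0 = already present)."""
    cur = conn.execute(sql, {"userId": user_id, "topicid": topic_id})
    return cur.rowcount


conn = fresh_db()
first = recommend(conn, INSERT_IF_ABSENT, 1, 10)   # 1: run the follow-up logic
second = recommend(conn, INSERT_IF_ABSENT, 1, 10)  # 0: return an error instead

# On an empty table the FROM user_topic variant has no source row at all.
from_same_table_on_empty = recommend(fresh_db(), INSERT_FROM_SAME_TABLE, 1, 10)
print(first, second, from_same_table_on_empty)  # 1 0 0
```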
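Solution 2: clean up the dirty data and create a composite unique index
This solution is the core of this article, and it is what we currently consider best practice.
Step 1: find duplicated (user_id, thread_id) combinations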
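```sql
-- Select only the grouped columns; SELECT * with GROUP BY fails under ONLY_FULL_GROUP_BY (default since MySQL 5.7.5)
SELECT a.* FROM `thread_recommend` a
INNER JOIN (SELECT `thread_id`, `user_id` FROM `thread_recommend`
            GROUP BY `thread_id`, `user_id` HAVING COUNT(`id`) > 1) b
        ON a.`thread_id` = b.`thread_id` AND a.`user_id` = b.`user_id`
ORDER BY a.`user_id` ASC, a.`thread_id` ASC, a.`id` DESC;
```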
Or the simpler version:
```sql
SELECT * FROM `thread_recommend`
WHERE (`user_id`, `thread_id`) IN (SELECT `user_id`, `thread_id` FROM `thread_recommend`
                                   GROUP BY `user_id`, `thread_id` HAVING COUNT(1) > 1);
```
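To sanity-check the two queries, here is a sketch that runs both against made-up sample rows in Python's built-in `sqlite3`. SQLite is a stand-in for MySQL here, and the data is invented for illustration. SQLite accepts MySQL-style row values such as `(user_id, thread_id) IN (...)` from version 3.15 on. The IN version gets an `ORDER BY id` only to make its output deterministic:

```python
import sqlite3

# Made-up sample rows: (id, thread_id, user_id).
ROWS = [
    (1, 100, 7), (2, 100, 7), (3, 100, 7),  # user 7 recommended thread 100 three times
    (4, 200, 7),                            # a normal, unique row
    (5, 100, 8), (6, 100, 8),               # user 8 recommended thread 100 twice
]

JOIN_QUERY = """
SELECT a.* FROM thread_recommend a
INNER JOIN (SELECT thread_id, user_id FROM thread_recommend
            GROUP BY thread_id, user_id HAVING COUNT(id) > 1) b
        ON a.thread_id = b.thread_id AND a.user_id = b.user_id
ORDER BY a.user_id ASC, a.thread_id ASC, a.id DESC
"""

# Row-value IN needs SQLite 3.15 or newer.
IN_QUERY = """
SELECT * FROM thread_recommend
WHERE (user_id, thread_id) IN (SELECT user_id, thread_id FROM thread_recommend
                               GROUP BY user_id, thread_id HAVING COUNT(1) > 1)
ORDER BY id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thread_recommend (id INTEGER PRIMARY KEY, thread_id INTEGER, user_id INTEGER)")
conn.executemany("INSERT INTO thread_recommend VALUES (?, ?, ?)", ROWS)

join_ids = [row[0] for row in conn.execute(JOIN_QUERY)]
in_ids = [row[0] for row in conn.execute(IN_QUERY)]
print(join_ids)  # [3, 2, 1, 6, 5]
print(in_ids)    # [1, 2, 3, 5, 6]
```

Both queries return the same five rows. The single row for thread 200 is left out because its group has only one member.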
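Running it returns every duplicated row.
Wow! All the duplicates are right here, and I want to get rid of them immediately!
Within each duplicated (user_id, thread_id) group, we now need to delete every row except the one with the smallest id.
Before deleting, first select the rows that are going to be deleted and check them: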
```sql
SELECT * FROM `thread_recommend`
WHERE (`user_id`, `thread_id`) IN (SELECT `user_id`, `thread_id` FROM `thread_recommend`
                                   GROUP BY `user_id`, `thread_id` HAVING COUNT(1) > 1)
  AND `id` NOT IN (SELECT MIN(`id`) FROM `thread_recommend`
                   GROUP BY `user_id`, `thread_id` HAVING COUNT(1) > 1);
```
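Here is a sketch of the preview step on the same made-up rows, again with `sqlite3` standing in for MySQL. The hypothetical `ids_to_delete` helper restates the rule in plain Python: keep the smallest id per (user_id, thread_id) and flag the rest. That gives the query's result something to be compared against:

```python
import sqlite3

# Made-up sample rows: (id, thread_id, user_id).
ROWS = [(1, 100, 7), (2, 100, 7), (3, 100, 7), (4, 200, 7), (5, 100, 8), (6, 100, 8)]

PREVIEW_QUERY = """
SELECT id FROM thread_recommend
WHERE (user_id, thread_id) IN (SELECT user_id, thread_id FROM thread_recommend
                               GROUP BY user_id, thread_id HAVING COUNT(1) > 1)
  AND id NOT IN (SELECT MIN(id) FROM thread_recommend
                 GROUP BY user_id, thread_id HAVING COUNT(1) > 1)
ORDER BY id
"""


def ids_to_delete(rows):
    """Plain-Python version of the rule: flag every id that is not the smallest in its group."""
    smallest = {}
    for row_id, thread_id, user_id in rows:
        key = (user_id, thread_id)
        smallest[key] = min(row_id, smallest.get(key, row_id))
    return sorted(row_id for row_id, thread_id, user_id in rows
                  if row_id != smallest[(user_id, thread_id)])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thread_recommend (id INTEGER PRIMARY KEY, thread_id INTEGER, user_id INTEGER)")
conn.executemany("INSERT INTO thread_recommend VALUES (?, ?, ?)", ROWS)

preview = [row[0] for row in conn.execute(PREVIEW_QUERY)]
print(preview)              # [2, 3, 6]
print(ids_to_delete(ROWS))  # [2, 3, 6]
```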
Next, change SELECT * FROM to DELETE FROM and delete them!
```sql
DELETE FROM `thread_recommend`
WHERE (`user_id`, `thread_id`) IN (SELECT `user_id`, `thread_id` FROM `thread_recommend`
                                   GROUP BY `user_id`, `thread_id` HAVING COUNT(1) > 1)
  AND `id` NOT IN (SELECT MIN(`id`) FROM `thread_recommend`
                   GROUP BY `user_id`, `thread_id` HAVING COUNT(1) > 1);
```
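Oops! An error: You can't specify target table 'thread_recommend' for update in FROM clause (MySQL error 1093).
This is a MySQL restriction: a DELETE or UPDATE statement cannot read, in a subquery, from the table it is modifying. Following the solution at http://stackoverflow.com/a/14..., we wrap the table in a derived table, (SELECT * FROM `thread_recommend`). MySQL then materializes it as a temporary table, and the subqueries no longer read the target table directly: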
```sql
DELETE FROM `thread_recommend`
WHERE (`user_id`, `thread_id`) IN (SELECT `user_id`, `thread_id` FROM (SELECT * FROM `thread_recommend`) a
                                   GROUP BY `user_id`, `thread_id` HAVING COUNT(1) > 1)
  AND `id` NOT IN (SELECT MIN(`id`) FROM (SELECT * FROM `thread_recommend`) b
                   GROUP BY `user_id`, `thread_id` HAVING COUNT(1) > 1);
```
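The final DELETE can be exercised the same way, on the same made-up rows. SQLite accepts backtick-quoted identifiers and, unlike MySQL, does not raise error 1093. So this sketch only checks that the statement keeps the smallest id of each group. It says nothing about whether the derived-table workaround is needed or effective in MySQL:

```python
import sqlite3

# Made-up sample rows: (id, thread_id, user_id).
ROWS = [(1, 100, 7), (2, 100, 7), (3, 100, 7), (4, 200, 7), (5, 100, 8), (6, 100, 8)]

# The derived-table DELETE from the text, unchanged apart from the missing semicolon.
DELETE_DUPLICATES = """
DELETE FROM `thread_recommend`
WHERE (`user_id`, `thread_id`) IN (SELECT `user_id`, `thread_id` FROM (SELECT * FROM `thread_recommend`) a
                                   GROUP BY `user_id`, `thread_id` HAVING COUNT(1) > 1)
  AND `id` NOT IN (SELECT MIN(`id`) FROM (SELECT * FROM `thread_recommend`) b
                   GROUP BY `user_id`, `thread_id` HAVING COUNT(1) > 1)
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thread_recommend (id INTEGER PRIMARY KEY, thread_id INTEGER, user_id INTEGER)")
conn.executemany("INSERT INTO thread_recommend VALUES (?, ?, ?)", ROWS)

deleted = conn.execute(DELETE_DUPLICATES).rowcount
remaining = [row[0] for row in conn.execute("SELECT id FROM thread_recommend ORDER BY id")]
# Count the (user_id, thread_id) groups that still have more than one row.
leftover_dupes = conn.execute(
    "SELECT COUNT(*) FROM (SELECT 1 FROM thread_recommend"
    " GROUP BY user_id, thread_id HAVING COUNT(1) > 1)"
).fetchone()[0]
print(deleted, remaining, leftover_dupes)  # 3 [1, 4, 5] 0
```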
Finally, add the composite unique index!
```sql
ALTER TABLE `thread_recommend`
  ADD UNIQUE KEY `thread_id_user_id_unique` (`thread_id`, `user_id`) USING BTREE;
```
Of course, if the cleanup above has not been completed, this statement will fail with a duplicate-entry error!
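A last sketch, again assuming SQLite as the stand-in. SQLite has no ALTER TABLE ... ADD UNIQUE KEY, so CREATE UNIQUE INDEX takes its place. The sketch walks through three steps:
- the unique index is refused while duplicates exist;
- it is accepted after the cleanup;
- it then rejects a new duplicate, which is the guarantee the composite unique index is meant to provide.

The `try_execute` helper is hypothetical:

```python
import sqlite3

# SQLite's equivalent of ALTER TABLE ... ADD UNIQUE KEY.
ADD_UNIQUE = (
    "CREATE UNIQUE INDEX thread_id_user_id_unique "
    "ON thread_recommend (thread_id, user_id)"
)

# The derived-table cleanup DELETE from the text.
DELETE_DUPLICATES = """
DELETE FROM `thread_recommend`
WHERE (`user_id`, `thread_id`) IN (SELECT `user_id`, `thread_id` FROM (SELECT * FROM `thread_recommend`) a
                                   GROUP BY `user_id`, `thread_id` HAVING COUNT(1) > 1)
  AND `id` NOT IN (SELECT MIN(`id`) FROM (SELECT * FROM `thread_recommend`) b
                   GROUP BY `user_id`, `thread_id` HAVING COUNT(1) > 1)
"""


def try_execute(conn: sqlite3.Connection, sql: str, params=()) -> bool:
    """Return True if the statement succeeds, False if it violates a constraint."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE thread_recommend (id INTEGER PRIMARY KEY, thread_id INTEGER, user_id INTEGER)")
conn.executemany("INSERT INTO thread_recommend VALUES (?, ?, ?)", [(1, 100, 7), (2, 100, 7), (3, 200, 7)])
conn.commit()  # keep the sample rows even if a later statement fails

index_before_cleanup = try_execute(conn, ADD_UNIQUE)  # False: ids 1 and 2 collide
conn.execute(DELETE_DUPLICATES)
index_after_cleanup = try_execute(conn, ADD_UNIQUE)   # True
second_insert_ok = try_execute(
    conn, "INSERT INTO thread_recommend (thread_id, user_id) VALUES (?, ?)", (100, 7)
)                                                     # False: the index now rejects it
print(index_before_cleanup, index_after_cleanup, second_insert_ok)  # False True False
```

In MySQL, the corresponding refusals surface as duplicate-entry errors.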
Done!