提升 MySQL 批量更新效率的底層原理與優(yōu)化策略

作者：碼農(nóng)SharkChili 2025-01-10 11:28:58

網(wǎng)絡(luò) 網(wǎng)絡(luò)優(yōu)化 MySQL

? 如果網(wǎng)絡(luò)情況良好且 MySQL 連接池資源充分的情況下，筆者更推薦使用并行進(jìn)行逐條更新。

近期進(jìn)行項(xiàng)目優(yōu)化梳理工作時，發(fā)現(xiàn)某些功能模塊進(jìn)行MySQL數(shù)據(jù)庫批量更新操作比較耗時，對此筆者查閱相關(guān)資料比進(jìn)行壓測后，得出最優(yōu)解，遂以此文章記錄一下筆者的解決方案。

一、前置準(zhǔn)備

為方便演示，筆者先說明一下本文進(jìn)行實(shí)驗(yàn)的數(shù)據(jù)表，對應(yīng)的DDL語句如下，可以看到該表有一個自增的主鍵ID和9個字段以及一個日期字段：

CREATE TABLE `batch_insert_test` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `fileid_1` varchar(100) DEFAULT NULL,
  `fileid_2` varchar(100) DEFAULT NULL,
  `fileid_3` varchar(100) DEFAULT NULL,
  `fileid_4` varchar(100) DEFAULT NULL,
  `fileid_5` varchar(100) DEFAULT NULL,
  `fileid_6` varchar(100) DEFAULT NULL,
  `fileid_7` varchar(100) DEFAULT NULL,
  `fileid_8` varchar(100) DEFAULT NULL,
  `fileid_9` varchar(100) DEFAULT NULL,
  `create_date` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `batch_insert_test_create_date_IDX` (`create_date`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=19091237 DEFAULT CHARSET=utf8 COMMENT='測試批量插入，一行數(shù)據(jù)1k左右';

特別注意，讀者在根據(jù)本文進(jìn)行操作時需要對數(shù)據(jù)庫連接配置上追加如下兩個參數(shù)，否則優(yōu)化方案不會生效：

&rewriteBatchedStatements=true&allowMultiQueries=true

二、三種方案壓測實(shí)驗(yàn)

1. 逐條更新

首先查看逐條更新的解決方案，筆者通過分頁查詢查詢大約3000條數(shù)據(jù)，然后逐條進(jìn)行遍歷更新：

 /**
     * 使用foreach進(jìn)行逐條插入
     */
    @Test
    public void foreachUpdate() {
        //分頁查詢3k的數(shù)據(jù)
        PageHelper.startPage(PAGE, SIZE);
        List<BatchInsertTest> insertTestList = batchInsertTestMapper.selectByExample(null);

        //逐條更新
        StopWatch stopWatch = new StopWatch("foreachUpdate");
        stopWatch.start();
        for (BatchInsertTest insertTest : insertTestList) {
            batchInsertTestMapper.updateByPrimaryKey(insertTest);
        }
        stopWatch.stop();

        log.info("逐條更新完成,size:{},耗時:{}ms", insertTestList.size(), stopWatch.getLastTaskTimeMillis());


    }

對應(yīng)耗時結(jié)果如下，可以看到耗時花費(fèi)了1592ms，表現(xiàn)比較遜色，原因很簡單，每條數(shù)據(jù)操作時都涉及網(wǎng)絡(luò)IO，3000次串行的網(wǎng)絡(luò)IO+DB更新，執(zhí)行效率自然上不去：

2025-01-10 09:07:02.920  INFO 19328 --- [           main] c.s.mapper.BatchInsertTestMapperTest     : 逐條更新完成,size:3000,耗時:1592ms

2. 并行運(yùn)算

不知道讀者是否留意筆者上文所說的串行DB更新，既然串行的網(wǎng)絡(luò)IO會降低執(zhí)行效率，那么我們并行更新呢？

所以筆者將代碼進(jìn)行進(jìn)一步的優(yōu)化：

/**
     * 使用并行流foreach進(jìn)行逐條插入
     */
    @Test
    public void foreachParallelStreamUpdate() {
        PageHelper.startPage(PAGE, SIZE);

        List<BatchInsertTest> insertTestList = batchInsertTestMapper.selectByExample(null);

        //采用并行流的方式進(jìn)行并行更新
        StopWatch stopWatch = new StopWatch("foreachUpdate");
        stopWatch.start();
        insertTestList.parallelStream()
                .forEach(i -> {
                    batchInsertTestMapper.updateByPrimaryKey(i);
                });


        stopWatch.stop();

        log.info("逐條更新完成,size:{},耗時:{}ms", insertTestList.size(), stopWatch.getLastTaskTimeMillis());


    }

可以看到3000條數(shù)據(jù)花費(fèi)了357ms，執(zhí)行效率還是很客觀的，但筆者認(rèn)為這還不是最優(yōu)解，原因很簡單，每次進(jìn)行批量更新操作都需要進(jìn)行多次網(wǎng)絡(luò)IO，如果在并發(fā)量非常大的場景，比較吃MySQL的連接池資源：

2025-01-10 09:07:00.789  INFO 19328 --- [           main] c.s.mapper.BatchInsertTestMapperTest     : 逐條更新完成,size:3000,耗時:357ms

3. foreach更新

我們再來看看mybatis內(nèi)置的foreach語法的批量更新：

<update id="updateByPrimaryKeyForeach" parameterType="com.sharkChili.domain.BatchInsertTest">
        <foreach collection="list" item="item" separator=";">
            update batch_insert_test
            set fileid_1 = #{item.fileid1,jdbcType=VARCHAR},
            fileid_2 = #{item.fileid2,jdbcType=VARCHAR},
            fileid_3 = #{item.fileid3,jdbcType=VARCHAR},
            fileid_4 = #{item.fileid4,jdbcType=VARCHAR},
            fileid_5 = #{item.fileid5,jdbcType=VARCHAR},
            fileid_6 = #{item.fileid6,jdbcType=VARCHAR},
            fileid_7 = #{item.fileid7,jdbcType=VARCHAR},
            fileid_8 = #{item.fileid8,jdbcType=VARCHAR},
            fileid_9 = #{item.fileid9,jdbcType=VARCHAR},
            create_date = #{item.createDate,jdbcType=TIMESTAMP}
            where id = #{item.id,jdbcType=INTEGER}
        </foreach>
    </update>

對應(yīng)測試代碼如下：

@Test
    public void updateByPrimaryKeyForeach() {
        PageHelper.startPage(PAGE, SIZE);

        List<BatchInsertTest> insertTestList = batchInsertTestMapper.selectByExample(null);
        StopWatch stopWatch = new StopWatch("updateByPrimaryKeyForeach");
        stopWatch.start();

        batchInsertTestMapper.updateByPrimaryKeyForeach(insertTestList);

        stopWatch.stop();

        log.info("使用updateByPrimaryKeyForeach更新完成,size:{},耗時:{}ms", insertTestList.size(), stopWatch.getLastTaskTimeMillis());
    }

耗時563ms左右，性能表現(xiàn)也還行，并且foreach操作會因?yàn)樽址唇硬賹?dǎo)致Packet for query is too large (106,100,142 > 4,194,304). You can change this value on the server by setting the 'max_allowed_packet' variable.即提交的SQL執(zhí)行數(shù)據(jù)包過大被拒絕的風(fēng)險:

2025-01-10 09:10:57.592  INFO 18332 --- [           main] c.s.mapper.BatchInsertTestMapperTest     : 使用updateByPrimaryKeyForeach更新完成,size:3000,耗時:563ms

4. 批處理更新

筆者希望可以一批更新操作可以一個批次的進(jìn)行提交，所以接下來介紹這種方案就是一次性組裝一批量的更新語句，然后一次性提交。

 /**
     * 使用批處理進(jìn)行更新
     */
    @Test
    public void updateBatch() {
        PageHelper.startPage(PAGE, SIZE);

        List<BatchInsertTest> insertTestList = batchInsertTestMapper.selectByExample(null);
        StopWatch stopWatch = new StopWatch("updateBatch");
        stopWatch.start();

        //創(chuàng)建一個進(jìn)行批處理操作的sqlsession組裝一批更新語句
        try (SqlSession sqlSession = sqlSessionFactory.openSession(ExecutorType.BATCH)) {
            BatchInsertTestMapper batchInsertTestMapper = sqlSession.getMapper(BatchInsertTestMapper.class);

            insertTestList.parallelStream()
                    .forEach(i -> {
                        batchInsertTestMapper.updateByPrimaryKey(i);
                    });
            //手動提交
            sqlSession.commit();
            stopWatch.stop();
        } catch (Exception e) {

        }

        log.info("批處理更新完成,size:{},耗時:{}ms", insertTestList.size(), stopWatch.getLastTaskTimeMillis());

    }

最終更新耗時為1s左右，相較于上述方案相對遜色一些，但是網(wǎng)絡(luò)IO的開銷以及MySQL的連接池使用都減小了，綜合起來性價比還是蠻高的：

2024-02-22 23:25:05.265  INFO 18844 --- [           main] c.s.mapper.BatchInsertTestMapperTest     : 批處理更新完成,size:3000,耗時:1566ms

5. case-when更新

最后一種case-when更新，語法如下，猛的一看比較復(fù)雜，實(shí)際理解起來還是蠻簡單的，對每個字段進(jìn)行set操作，例如:當(dāng)id等于1時，fileid_1則取id為1的那條數(shù)據(jù)的值，通過case分支實(shí)現(xiàn)一條SQL批量更新多條數(shù)據(jù)：

update batch_insert_test
  -- 當(dāng)id=1 則設(shè)置fileid_1為aaa
        set fileid_1=
            when 1 then aaa
  ....其余同理
   where id in (本次批處理的id列表)

所以結(jié)合mybatis框架的語法，我們得出下面這樣一個SQL語句：

<update id="updateBatch" parameterType="java.util.List">
        update batch_insert_test
        set fileid_1=
        <foreach collection="list" item="item" index="index"
                 separator=" " open="case ID" close="end">
            when #{item.id} then #{item.fileid1,jdbcType=VARCHAR}
        </foreach>,
        fileid_2 =
        <foreach collection="list" item="item" index="index"
                 separator=" " open="case ID" close="end">
            when #{item.id} then #{item.fileid2,jdbcType=VARCHAR}
        </foreach>,
        fileid_3 =
        <foreach collection="list" item="item" index="index"
                 separator=" " open="case ID" close="end">
            when #{item.id} then #{item.fileid3,jdbcType=VARCHAR}
        </foreach>,
        fileid_4 =
        <foreach collection="list" item="item" index="index"
                 separator=" " open="case ID" close="end">
            when #{item.id} then #{item.fileid4,jdbcType=VARCHAR}
        </foreach>,
        fileid_5 =
        <foreach collection="list" item="item" index="index"
                 separator=" " open="case ID" close="end">
            when #{item.id} then #{item.fileid5,jdbcType=VARCHAR}
        </foreach>,
        fileid_6 =
        <foreach collection="list" item="item" index="index"
                 separator=" " open="case ID" close="end">
            when #{item.id} then #{item.fileid6,jdbcType=VARCHAR}
        </foreach>,
        fileid_7 =
        <foreach collection="list" item="item" index="index"
                 separator=" " open="case ID" close="end">
            when #{item.id} then #{item.fileid7,jdbcType=VARCHAR}
        </foreach>,
        fileid_8 =
        <foreach collection="list" item="item" index="index"
                 separator=" " open="case ID" close="end">
            when #{item.id} then #{item.fileid8,jdbcType=VARCHAR}
        </foreach>,
        fileid_9 =
        <foreach collection="list" item="item" index="index"
                 separator=" " open="case ID" close="end">
            when #{item.id} then #{item.fileid9,jdbcType=VARCHAR}
        </foreach>,
        create_date=
        <foreach collection="list" item="item" index="index"
                 separator=" " open="case ID" close="end">
            when #{item.id} then #{item.createDate,jdbcType=TIMESTAMP}
        </foreach>
        where id in
        <foreach collection="list" index="index" item="item"
                 separator="," open="(" close=")">
            #{item.id,jdbcType=INTEGER}
        </foreach>
    </update>

對應(yīng)的Java代碼如下，比較簡單，筆者這里就不多做贅述了：

@Test
    public void updateDateByWhenCase() {
        PageHelper.startPage(PAGE, SIZE);

        List<BatchInsertTest> insertTestList = batchInsertTestMapper.selectByExample(null);
        StopWatch stopWatch = new StopWatch("updateBatch");
        stopWatch.start();

        batchInsertTestMapper.updateBatch(insertTestList);

        stopWatch.stop();

        log.info("使用when case更新完成,size:{},耗時:{}ms", insertTestList.size(), stopWatch.getLastTaskTimeMillis());
    }

最終可以看到耗時800毫秒左右，相較于批處理更加出色一些，而且網(wǎng)絡(luò)和連接池的開銷都是差不多的，但和foreach意義也可能存在數(shù)據(jù)包過大導(dǎo)致報錯的風(fēng)險：

2025-01-10 09:17:06.878  INFO 16788 --- [           main] c.s.mapper.BatchInsertTestMapperTest     : 使用when case更新完成,size:3000,耗時:738ms

小結(jié)

以上便是筆者本次大量壓測后得出的解決方案，總結(jié)如下：

如果網(wǎng)絡(luò)情況良好且MySQL連接池資源充分的情況下，筆者更推薦使用并行進(jìn)行逐條更新。
如果網(wǎng)絡(luò)情況不好或者M(jìn)ySQL資源緊張，筆者更推薦使用foreach更新，相較于同等一次性更新多條語句的when-case語法,它語法更簡單且執(zhí)行性能更好一些。
當(dāng)然如果一次性要更新比較大基數(shù)的數(shù)據(jù)，考慮到MySQL的傳輸packet size我們還是優(yōu)先考慮批處理這個性能和穩(wěn)定性處于折中的方案。

責(zé)任編輯：趙寧寧來源：寫代碼的SharkChili