How is database compression actually done?
redis
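Redis compresses at the granularity of individual keys and values, not the whole file.
In practice this covers string payloads and list values.
Every compression path ends up calling lzf_compress/lzf_decompress.
RDB compression only takes effect when the config file enables it: the directive is rdbcompression, stored internally as server.rdb_compression.
LZF is only attempted on strings longer than 20 bytes. Even aaaaaaaaaaaaaaaaaaaa (exactly 20 bytes) is not compressed; the length has to exceed 20. The check sits in rdbSaveRawString, and I did not dig into why 20 was chosen.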
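The gate described above can be sketched as a small predicate. This is a minimal sketch, not the Redis source: should_try_lzf and MIN_COMPRESS_LEN are hypothetical names, and the boolean parameter stands in for the config setting.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constant mirroring the "longer than 20 bytes" rule. */
#define MIN_COMPRESS_LEN 20

/* Returns true when a string of `len` bytes should be handed to the
 * compressor: compression must be enabled and len must exceed the
 * threshold (a string of exactly 20 bytes is skipped). */
static bool should_try_lzf(bool compression_enabled, size_t len) {
    return compression_enabled && len > MIN_COMPRESS_LEN;
}
```

With compression enabled, should_try_lzf(true, 20) is false and should_try_lzf(true, 21) is true, which matches the 20-character example.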
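Compression inside RDB
- How do we know whether a record was compressed and needs decompressing?
Every record in an RDB file starts with an identifying field, so a compressed record simply gets its own encoding type.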
- ssize_t rdbSaveLzfStringObject(rio *rdb, unsigned char *s, size_t len) {
- ...
- comprlen = lzf_compress(s, len, out, outlen);
- if (comprlen == 0) {
- zfree(out);
- return 0;
- }
- ssize_t nwritten = rdbSaveLzfBlob(rdb, out, comprlen, len);
- ...
- }
- ssize_t rdbSaveLzfBlob(rio *rdb, void *data, size_t compress_len,
- size_t original_len) {
- ...
- /* Data compressed! Let's save it on disk */
- byte = (RDB_ENCVAL<<6)|RDB_ENC_LZF;
- if ((n = rdbWriteRaw(rdb,&byte,1)) == -1) goto writeerr;
- nwritten += n;
- ...
- }
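The marker byte written by rdbSaveLzfBlob can be examined in isolation. The sketch below assumes the constant values RDB_ENCVAL = 3 and RDB_ENC_LZF = 3 from Redis's rdb.h (check your version); the helper names lzf_header_byte and is_lzf_header are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values from Redis's rdb.h. */
#define RDB_ENCVAL 3   /* top two bits 11: specially encoded string */
#define RDB_ENC_LZF 3  /* low six bits: LZF-compressed string */

/* Builds the one-byte header written before a compressed string. */
static uint8_t lzf_header_byte(void) {
    return (uint8_t)((RDB_ENCVAL << 6) | RDB_ENC_LZF);
}

/* Returns true if a header byte marks an LZF-compressed string:
 * the top two bits select "encoded", the low six bits select LZF. */
static bool is_lzf_header(uint8_t byte) {
    return (byte >> 6) == RDB_ENCVAL && (byte & 0x3F) == RDB_ENC_LZF;
}
```

Under those assumed values the header byte is 0xC3, so a reader only has to inspect one byte to decide whether the following payload needs lzf_decompress.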
Decompression
- void *rdbGenericLoadStringObject(rio *rdb, int flags, size_t *lenptr) {
- ...
- if (isencoded) {
- switch(len) {
- case RDB_ENC_INT8:
- case RDB_ENC_INT16:
- case RDB_ENC_INT32:
- return rdbLoadIntegerObject(rdb,len,flags,lenptr);
- case RDB_ENC_LZF:
- return rdbLoadLzfStringObject(rdb,flags,lenptr);
- default:
- rdbReportCorruptRDB("Unknown RDB string encoding type %llu",len);
- return NULL;
- }
- }
- ...
- void *rdbLoadLzfStringObject(rio *rdb, int flags, size_t *lenptr) {
- ...
- /* Load the compressed representation and uncompress it to target. */
- if (rioRead(rdb,c,clen) == 0) goto err;
- if (lzf_decompress(c,clen,val,len) != len) {
- rdbReportCorruptRDB("Invalid LZF compressed string");
- ...
- }
The interface is simple and easy to locate.
Every type (string, hash, and so on) is ultimately serialized as strings, so it goes through rdbSaveRawString, which calls rdbSaveLzfStringObject internally.
- ssize_t rdbSaveObject(rio *rdb, robj *o, robj *key, int dbid) {
- ssize_t n = 0, nwritten = 0;
- if (o->type == OBJ_STRING) {
- /* Save a string value */
- if ((n = rdbSaveStringObject(rdb,o)) == -1) return -1;
- nwritten += n;
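- } else if (o->type == OBJ_LIST) {
- /* Save a list value */
- if (o->encoding == OBJ_ENCODING_QUICKLIST) {
- quicklist *ql = o->ptr;
- quicklistNode *node = ql->head;
- ...
- while(node) {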
- if (quicklistNodeIsCompressed(node)) {
- void *data;
- size_t compress_len = quicklistGetLzf(node, &data);
- if ((n = rdbSaveLzfBlob(rdb,data,compress_len,node->sz)) == -1) return -1;
- nwritten += n;
- } else {
- if ((n = rdbSaveRawString(rdb,node->zl,node->sz)) == -1) return -1;
- nwritten += n;
- }
- node = node->next;
- }
- } else {
- serverPanic("Unknown list encoding");
- }
- ...
- }
quicklist compression
quicklist is the data structure underlying Redis lists, and its compression depth is configurable (list-compress-depth).
When does compression happen?
- /* Insert 'new_node' after 'old_node' if 'after' is 1.
- * Insert 'new_node' before 'old_node' if 'after' is 0.
- * Note: 'new_node' is *always* uncompressed, so if we assign it to
- * head or tail, we do not need to uncompress it. */
- REDIS_STATIC void __quicklistInsertNode(quicklist *quicklist,
- quicklistNode *old_node,
- quicklistNode *new_node, int after) {
- if (after) {
- new_node->prev = old_node;
- if (old_node) {
- new_node->next = old_node->next;
- if (old_node->next)
- old_node->next->prev = new_node;
- old_node->next = new_node;
- }
- if (quicklist->tail == old_node)
- quicklist->tail = new_node;
- } else {
- new_node->next = old_node;
- if (old_node) {
- new_node->prev = old_node->prev;
- if (old_node->prev)
- old_node->prev->next = new_node;
- old_node->prev = new_node;
- }
- if (quicklist->head == old_node)
- quicklist->head = new_node;
- }
- /* If this insert creates the only element so far, initialize head/tail. */
- if (quicklist->len == 0) {
- quicklist->head = quicklist->tail = new_node;
- }
- /* Update len first, so in __quicklistCompress we know exactly len */
- quicklist->len++;
- if (old_node)
- quicklistCompress(quicklist, old_node);
- }
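In other words, the head and tail nodes (up to the configured depth) stay uncompressed while the other nodes get compressed; when the list is modified, the old node is compressed at the same time.
This raises one issue: because those nodes are already compressed, saving to RDB needs special handling that detects this and goes through rdbSaveLzfBlob.
That in turn requires a record header that carries a compression marker.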
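The depth rule can be written down as a predicate over node positions. This is a simplified sketch, not the quicklist implementation: node_is_compressible is a hypothetical name, nodes are addressed by index instead of pointers, and depth 0 is taken to mean compression is disabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if the node at `index` (0-based, counted from the head) of a
 * list with `len` nodes is eligible for compression when `depth` nodes at
 * each end must stay uncompressed. */
static bool node_is_compressible(size_t index, size_t len, size_t depth) {
    assert(index < len);
    if (depth == 0) return false;        /* compression disabled */
    if (len <= 2 * depth) return false;  /* every node is near an end */
    return index >= depth && index < len - depth;
}
```

For a five-node list with depth 1, only indexes 1 to 3 are eligible; the head (0) and tail (4) stay uncompressed, so pushes and pops at either end never pay for decompression.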
rocksdb
As with Redis, the entry points are easy to find: UncompressData/CompressData.
SST compression
Call chain
UncompressBlockContentsForCompressionType -> UncompressData
WriteBlock/BGWorkCompression -> CompressAndVerifyBlock -> CompressBlock -> CompressData
The block itself carries information marking whether it is compressed.
Compression happens only at write time.
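As an illustration of a block that marks its own compression, the sketch below assumes the block-based table layout in which each block is followed by a 5-byte trailer: a 1-byte compression type, then a 4-byte checksum. The function name block_compression_type is hypothetical and the checksum is not verified here.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed trailer layout: [type:1][checksum:4]. */
#define BLOCK_TRAILER_SIZE 5

/* Reads the compression-type byte of a block stored as
 * [payload][type:1][checksum:4]. Returns -1 if the buffer is missing
 * or too short to contain a trailer. */
static int block_compression_type(const uint8_t *block, size_t total_len) {
    if (block == NULL || total_len < BLOCK_TRAILER_SIZE) return -1;
    return block[total_len - BLOCK_TRAILER_SIZE];
}
```

A reader would dispatch on the returned value (0 meaning no compression in this assumed encoding) before deciding whether to call the decompression path.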
blobdb
CompressBlobIfNeeded -> CompressData
GetCompressedSlice -> CompressData
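Summary
- The file itself has to know that it is compressed, so that fact is recorded in metadata.
- Whether to compress in memory depends on the workload. Redis's quicklist compression works because list accesses mostly hit the head and tail, so the other nodes matter less.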
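The first point can be sketched as a self-describing record: one flag byte, then the payload, with a fallback to raw storage when compression does not help. This is only an illustration. A toy run-length encoder stands in for LZF or the RocksDB codecs, and every name here (REC_RAW, REC_RLE, record_encode, record_decode) is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { REC_RAW = 0, REC_RLE = 1 };  /* hypothetical flag values */

/* Toy run-length encoder emitting (count, byte) pairs. Returns the encoded
 * size, or 0 if the result does not fit in `cap` bytes. */
static size_t rle_compress(const uint8_t *in, size_t len, uint8_t *out, size_t cap) {
    size_t o = 0;
    for (size_t i = 0; i < len;) {
        size_t run = 1;
        while (i + run < len && in[i + run] == in[i] && run < 255) run++;
        if (o + 2 > cap) return 0;
        out[o++] = (uint8_t)run;
        out[o++] = in[i];
        i += run;
    }
    return o;
}

/* Inverse of rle_compress. Returns the decoded size, or 0 on corrupt input
 * or if the output would exceed `cap`. */
static size_t rle_decompress(const uint8_t *in, size_t len, uint8_t *out, size_t cap) {
    size_t o = 0;
    if (len % 2 != 0) return 0;
    for (size_t i = 0; i < len; i += 2) {
        size_t run = in[i];
        if (run == 0 || o + run > cap) return 0;
        memset(out + o, in[i + 1], run);
        o += run;
    }
    return o;
}

/* Writes [flag:1][payload] into `out`, which must hold len + 1 bytes. The
 * compressed form is used only when it is strictly smaller than the input.
 * Returns the total record size. */
static size_t record_encode(const uint8_t *in, size_t len, uint8_t *out) {
    size_t clen = len > 1 ? rle_compress(in, len, out + 1, len - 1) : 0;
    if (clen > 0) {
        out[0] = REC_RLE;
        return clen + 1;
    }
    out[0] = REC_RAW;
    if (len > 0) memcpy(out + 1, in, len);
    return len + 1;
}

/* Reads a record; the flag byte alone decides whether to decompress.
 * Returns the payload size, or 0 on error (or for an empty payload). */
static size_t record_decode(const uint8_t *rec, size_t rec_len, uint8_t *out, size_t cap) {
    if (rec_len < 1) return 0;
    if (rec[0] == REC_RLE) return rle_decompress(rec + 1, rec_len - 1, out, cap);
    if (rec[0] != REC_RAW || rec_len - 1 > cap) return 0;
    if (rec_len > 1) memcpy(out, rec + 1, rec_len - 1);
    return rec_len - 1;
}
```

The design choice mirrors both systems discussed: the writer decides per record whether compression pays off, and the reader never guesses because the flag travels with the data.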