
Deploying Elasticsearch on In-Memory Storage - 100M+ Documents, 100 ms Full-Text Search Responses

Author: Chen Shaowen  2024-06-05 11:23:14

AI compute nodes have large amounts of idle CPU and memory. Using these large-memory hosts to run short-lived, high-performance applications backed by in-memory storage helps improve resource utilization.

1. Mount a memory-backed storage directory on the host

Create the mount point

mkdir /mnt/memory_storage

Mount a tmpfs filesystem

mount -t tmpfs -o size=800G tmpfs /mnt/memory_storage

Storage is consumed on demand: memory is only used as data is written, so storing 100G occupies 100G of memory. The host has 2T of memory, and 800G of it is allotted here for Elasticsearch data.
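Before building anything on the mount, it is worth confirming that the tmpfs mount exists and has the intended size. Below is a minimal sketch. `tmpfs_size` is a hypothetical helper name. It assumes the Linux `/proc/mounts` format, in which tmpfs reports its size option in KiB, so 800G appears as `size=838860800k`.

```shell
# Hypothetical helper: print the size= option of a tmpfs mount point.
# Reads a mounts table (default /proc/mounts: device, mount point, fs type,
# options, dump, pass) and returns 1 if no tmpfs mount with size= is found.
tmpfs_size() {
  local mount_point="$1" mounts_file="${2:-/proc/mounts}"
  awk -v mp="$mount_point" '
    $2 == mp && $3 == "tmpfs" {
      n = split($4, opts, ",")
      for (i = 1; i <= n; i++) {
        if (opts[i] ~ /^size=/) {
          print substr(opts[i], 6)   # strip the "size=" prefix
          found = 1
        }
      }
    }
    END { exit (found ? 0 : 1) }
  ' "$mounts_file"
}

# Example: tmpfs_size /mnt/memory_storage   (prints 838860800k for size=800G)
```

`df -h /mnt/memory_storage` also shows how much of the 800G is actually in use. That number reflects the on-demand allocation described above.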

Create the data directories in advance

mkdir /mnt/memory_storage/elasticsearch-data-es-jfs-prod-es-default-0
mkdir /mnt/memory_storage/elasticsearch-data-es-jfs-prod-es-default-1
mkdir /mnt/memory_storage/elasticsearch-data-es-jfs-prod-es-default-2

If these directories are not created beforehand with read/write permissions, the Elasticsearch components fail to start. The error says that multiple nodes are using the same data directory.

Set directory permissions

chmod -R 777 /mnt/memory_storage
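The article later scales to 10 nodes and creates 20 PVCs. With that many Pods, a loop is easier than running one `mkdir` per Pod. Below is a minimal sketch. `make_es_data_dirs` is a hypothetical helper, and the default prefix assumes the `elasticsearch-data-es-jfs-prod-es-default-N` naming used above.

```shell
# Hypothetical helper: create one data directory per Elasticsearch Pod
# (ordinals 0..count-1) under a base directory such as the tmpfs mount.
make_es_data_dirs() {
  local base="$1" count="$2"
  local prefix="${3:-elasticsearch-data-es-jfs-prod-es-default}"
  local i=0
  while [ "$i" -lt "$count" ]; do
    mkdir -p "${base}/${prefix}-${i}"
    # Same permissive mode as the article, applied only to the per-Pod directory
    chmod 777 "${base}/${prefix}-${i}"
    i=$((i + 1))
  done
}
```

For example, `make_es_data_dirs /mnt/memory_storage 10` prepares directories for ordinals 0 through 9.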

Test I/O bandwidth with dd

dd if=/dev/zero of=/mnt/memory_storage/dd.txt bs=4M count=2500

2500+0 records in
2500+0 records out
10485760000 bytes (10 GB, 9.8 GiB) copied, 3.53769 s, 3.0 GB/s

Clean up the test file

rm -rf /mnt/memory_storage/dd.txt

Test I/O bandwidth with fio

fio --name=test --filename=/mnt/memory_storage/fio_test_file --size=10G --rw=write --bs=4M --numjobs=1 --runtime=60 --time_based

Run status group 0 (all jobs):
  WRITE: bw=2942MiB/s (3085MB/s), 2942MiB/s-2942MiB/s (3085MB/s-3085MB/s), io=172GiB (185GB), run=60001-60001msec

Clean up the test file

rm -rf /mnt/memory_storage/fio_test_file

Test memory I/O bandwidth with mbw

mbw 10000

Long uses 8 bytes. Allocating 2*1310720000 elements = 20971520000 bytes of memory.
Using 262144 bytes as blocks for memcpy block copy test.
Getting down to business... Doing 10 runs per test.
0 Method: MEMCPY Elapsed: 1.62143 MiB: 10000.00000 Copy: 6167.380 MiB/s
1 Method: MEMCPY Elapsed: 1.63542 MiB: 10000.00000 Copy: 6114.656 MiB/s
2 Method: MEMCPY Elapsed: 1.63345 MiB: 10000.00000 Copy: 6121.997 MiB/s
3 Method: MEMCPY Elapsed: 1.63715 MiB: 10000.00000 Copy: 6108.161 MiB/s
4 Method: MEMCPY Elapsed: 1.64429 MiB: 10000.00000 Copy: 6081.667 MiB/s
5 Method: MEMCPY Elapsed: 1.62772 MiB: 10000.00000 Copy: 6143.574 MiB/s
6 Method: MEMCPY Elapsed: 1.60684 MiB: 10000.00000 Copy: 6223.379 MiB/s
7 Method: MEMCPY Elapsed: 1.62499 MiB: 10000.00000 Copy: 6153.876 MiB/s
8 Method: MEMCPY Elapsed: 1.63967 MiB: 10000.00000 Copy: 6098.770 MiB/s
9 Method: MEMCPY Elapsed: 2.97213 MiB: 10000.00000 Copy: 3364.588 MiB/s
AVG Method: MEMCPY Elapsed: 1.76431 MiB: 10000.00000 Copy: 5667.937 MiB/s
0 Method: DUMB Elapsed: 1.01521 MiB: 10000.00000 Copy: 9850.140 MiB/s
1 Method: DUMB Elapsed: 0.85378 MiB: 10000.00000 Copy: 11712.605 MiB/s
2 Method: DUMB Elapsed: 0.82487 MiB: 10000.00000 Copy: 12123.167 MiB/s
3 Method: DUMB Elapsed: 0.84520 MiB: 10000.00000 Copy: 11831.463 MiB/s
4 Method: DUMB Elapsed: 0.83050 MiB: 10000.00000 Copy: 12040.968 MiB/s
5 Method: DUMB Elapsed: 0.84932 MiB: 10000.00000 Copy: 11774.194 MiB/s
6 Method: DUMB Elapsed: 0.82491 MiB: 10000.00000 Copy: 12122.505 MiB/s
7 Method: DUMB Elapsed: 1.44235 MiB: 10000.00000 Copy: 6933.144 MiB/s
8 Method: DUMB Elapsed: 2.68656 MiB: 10000.00000 Copy: 3722.225 MiB/s
9 Method: DUMB Elapsed: 8.44667 MiB: 10000.00000 Copy: 1183.898 MiB/s
AVG Method: DUMB Elapsed: 1.86194 MiB: 10000.00000 Copy: 5370.750 MiB/s
0 Method: MCBLOCK Elapsed: 4.52486 MiB: 10000.00000 Copy: 2210.013 MiB/s
1 Method: MCBLOCK Elapsed: 4.82467 MiB: 10000.00000 Copy: 2072.683 MiB/s
2 Method: MCBLOCK Elapsed: 0.84797 MiB: 10000.00000 Copy: 11792.870 MiB/s
3 Method: MCBLOCK Elapsed: 0.84980 MiB: 10000.00000 Copy: 11767.516 MiB/s
4 Method: MCBLOCK Elapsed: 0.87665 MiB: 10000.00000 Copy: 11407.113 MiB/s
5 Method: MCBLOCK Elapsed: 0.85952 MiB: 10000.00000 Copy: 11634.468 MiB/s
6 Method: MCBLOCK Elapsed: 0.84132 MiB: 10000.00000 Copy: 11886.154 MiB/s
7 Method: MCBLOCK Elapsed: 0.84970 MiB: 10000.00000 Copy: 11768.915 MiB/s
8 Method: MCBLOCK Elapsed: 0.86918 MiB: 10000.00000 Copy: 11505.150 MiB/s
9 Method: MCBLOCK Elapsed: 0.85996 MiB: 10000.00000 Copy: 11628.434 MiB/s
AVG Method: MCBLOCK Elapsed: 1.62036 MiB: 10000.00000 Copy: 6171.467 MiB/s

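These results suggest that the memory-mounted filesystem reaches only about half of the raw memory bandwidth. dd and fio both write at roughly 3 GB/s (2942 MiB/s in fio). By comparison, the mbw averages range from about 5,400 to 6,200 MiB/s.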
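The "about half" estimate can be checked against the reported numbers. This sketch divides fio's sequential write bandwidth (2942 MiB/s) by each mbw average, using a hypothetical `ratio` helper. The comparison is rough. fio here runs a single writer with 4 MiB blocks through the kernel's file write path, while mbw copies between arrays inside one process.

```shell
# Hypothetical helper: a / b to two decimal places.
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'; }

fio_write_mib=2942   # fio WRITE bandwidth reported above (MiB/s)
# mbw AVG copy bandwidth per method, as reported above (MiB/s)
for pair in MEMCPY:5667.937 DUMB:5370.750 MCBLOCK:6171.467; do
  name=${pair%%:*}
  mbw_avg=${pair#*:}
  echo "${name}: $(ratio "$fio_write_mib" "$mbw_avg")"
done
# Prints:
# MEMCPY: 0.52
# DUMB: 0.55
# MCBLOCK: 0.48
```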

2. Create PVCs on the Kubernetes cluster

Set environment variables

export NAMESPACE=data-center
export PVC_NAME=elasticsearch-data-es-jfs-prod-es-default-0

Create the PV and PVC

kubectl create -f - <<EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ${PVC_NAME}
  namespace: ${NAMESPACE}
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 800Gi
  hostPath:
    path: /mnt/memory_storage/${PVC_NAME}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${PVC_NAME}
  namespace: ${NAMESPACE}
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 800Gi
EOF

Repeat this with different PVC_NAME values to create at least 3 PVCs, one per Elasticsearch node. In the end I created 20 PVCs, providing 15+ TB of storage in total.
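Editing `PVC_NAME` by hand for every volume gets tedious. Instead, the PV/PVC pairs can be rendered in a loop and piped to `kubectl`. Below is a minimal sketch. `render_pv_pvc` and `render_all` are hypothetical helpers, and the sketch makes two assumed adjustments compared with the manifest above:

- It omits `namespace` on the PV, because PersistentVolumes are cluster-scoped.
- It sets `storageClassName: ""` and `volumeName`, so that each PVC binds to its matching hostPath PV instead of being dynamically provisioned by a default StorageClass.

```shell
# Hypothetical helper: print one PV/PVC pair. Only the PVC carries a namespace.
# storageClassName "" and volumeName (assumptions added here) pin the PVC
# to this specific hostPath PV.
render_pv_pvc() {
  local name="$1" namespace="$2" size="${3:-800Gi}"
  cat <<EOF
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ${name}
spec:
  storageClassName: ""
  accessModes:
    - ReadWriteMany
  capacity:
    storage: ${size}
  hostPath:
    path: /mnt/memory_storage/${name}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  storageClassName: ""
  volumeName: ${name}
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: ${size}
EOF
}

# Hypothetical helper: render pairs for ordinals 0..count-1.
render_all() {
  local count="$1" namespace="$2" i=0
  while [ "$i" -lt "$count" ]; do
    render_pv_pvc "elasticsearch-data-es-jfs-prod-es-default-${i}" "$namespace"
    i=$((i + 1))
  done
}
```

Rendering and applying are kept separate, so the output can be reviewed before it is applied. For example: `render_all 20 data-center | kubectl create -f -`.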

3. Deploy the Elasticsearch components

Some steps are omitted here. For details, see Storing Elasticsearch Data in JuiceFS[1].

Deploy Elasticsearch

cat <<EOF | kubectl apply -f -
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  namespace: $NAMESPACE
  name: es-jfs-prod
spec:
  version: 8.3.2
  image: hubimage/elasticsearch:8.3.2
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  nodeSets:
  - name: default
    count: 3
    config:
      node.store.allow_mmap: false
      index.store.type: niofs
    podTemplate:
      spec:
        nodeSelector:
          servertype: Ascend910B-24
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
            runAsUser: 0
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        - name: install-plugins
          command:
            - sh
            - -c
            - |
              bin/elasticsearch-plugin install --batch https://get.infini.cloud/elasticsearch/analysis-ik/8.3.2
          securityContext:
            runAsUser: 0
            runAsGroup: 0
        containers:
        - name: elasticsearch
          readinessProbe:
            exec:
              command:
              - bash
              - -c
              - /mnt/elastic-internal/scripts/readiness-probe-script.sh
            failureThreshold: 10
            initialDelaySeconds: 30
            periodSeconds: 30
            successThreshold: 1
            timeoutSeconds: 30
          env:
            - name: "ES_JAVA_OPTS"
              value: "-Xms31g -Xmx31g"
            - name: "NSS_SDB_USE_CACHE"
              value: "no"
          resources:
            requests:
              cpu: 8
              memory: 64Gi
EOF

Get the Elasticsearch password

kubectl -n $NAMESPACE get secret es-jfs-prod-es-elastic-user -o go-template='{{.data.elastic | base64decode}}'

xxx

The default username is elastic.

Deploy Metricbeat

kubectl apply -f - <<EOF
apiVersion: beat.k8s.elastic.co/v1beta1
kind: Beat
metadata:
  name: es-jfs-prod
  namespace: $NAMESPACE
spec:
  type: metricbeat
  version: 8.3.2
  elasticsearchRef:
    name: es-jfs-prod
  config:
    metricbeat:
      autodiscover:
        providers:
          - type: kubernetes
            scope: cluster
            hints.enabled: true
            templates:
              - config:
                  - module: kubernetes
                    metricsets:
                      - event
                    period: 10s
    processors:
    - add_cloud_metadata: {}
    logging.json: true
  deployment:
    podTemplate:
      spec:
        serviceAccountName: metricbeat
        automountServiceAccountToken: true
        # required to read /etc/beat.yml
        securityContext:
          runAsUser: 0
EOF

Deploy Kibana

cat <<EOF | kubectl apply -f -
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  namespace: $NAMESPACE
  name: es-jfs-prod
spec:
  version: 8.3.2
  count: 1
  image: hubimage/kibana:8.3.2
  elasticsearchRef:
    name: es-jfs-prod
  http:
    tls:
      selfSignedCertificate:
        disabled: true
EOF

View the Elasticsearch cluster information

[Figure: Elasticsearch cluster information]

4. Import data

Create the index

Run the following on the Dev Tools page under Management in Kibana:

PUT /bayou_tt_articles
{
  "settings": {
    "index": {
      "number_of_shards": 30,
      "number_of_replicas": 1,
      "refresh_interval": "120s",
      "translog.durability": "async",
      "translog.sync_interval": "120s",
      "translog.flush_threshold_size": "2048M"
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "ik_smart"
      }
    }
  }
}

Two points to note:

Keep each shard between 10 and 50 GB. number_of_shards is set to 30 here because several hundred GB of data need to be imported.
Set at least 1 replica so that no data is lost while Pods are rolling-updated. When a Pod's IP changes, Elasticsearch treats it as a new node and cannot reuse the previous data. If no replica is available at that point to rebuild the shards from, data is lost.
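The shard-sizing rule reduces to simple arithmetic. Below is a minimal sketch with hypothetical helpers. It assumes about 300 GB of source data, the figure given in the summary. It also treats index size as roughly equal to source size, which in practice varies with mappings and analyzers.

```shell
# Hypothetical helper: smallest number of primary shards that keeps each
# primary at or below max_shard_gb (integer ceiling division).
primary_shards_for() {
  local data_gb="$1" max_shard_gb="$2"
  echo $(( (data_gb + max_shard_gb - 1) / max_shard_gb ))
}

# Hypothetical helper: average primary shard size in GB (one decimal place).
shard_size_gb() {
  awk -v d="$1" -v n="$2" 'BEGIN { printf "%.1f\n", d / n }'
}

# Assumed ~300 GB of data:
#   shard_size_gb 300 30      -> 10.0  (30 primaries sit at the 10 GB floor)
#   primary_shards_for 300 50 -> 6     (fewest primaries under the 50 GB cap)
```

A replica is a full copy of its primary. For in-memory storage this matters: `number_of_replicas: 1` roughly doubles the tmpfs space the index occupies across the cluster.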

Install the import tool

elasticdump can also be run as a container; the export step below uses the container image. Here it is installed with npm.

apt-get install npm -y

npm install elasticdump -g

Import the data

export DATAPATH=./bayou_tt_articles_0.jsonl
# basename strips the leading "./" so the log file is created in the current directory
nohup elasticdump --limit 20000 --input=${DATAPATH} --output=http://elastic:xxx@x.x.x.x:31391/ --output-index=bayou_tt_articles --type=data --transform="doc._source=Object.assign({},doc)" > elasticdump-$(basename ${DATAPATH}).log 2>&1 &

--limit sets how many documents are imported per batch. The default of 100 is too small. Use a value as large as possible while the import still succeeds.
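The `_0` suffix in the file name suggests the data is split across several JSONL files. Assuming that, one import can be started per file, each with its own log. Below is a minimal sketch. `log_name_for` and `import_all` are hypothetical helpers, `ES_URL` is an assumed variable holding the output URL, and the file glob is an assumption about how the files are named.

```shell
# Hypothetical helper: log file name for one input file. basename drops the
# directory part (including "./"), so the log is written to the current directory.
log_name_for() {
  printf 'elasticdump-%s.log\n' "$(basename "$1")"
}

# Hypothetical sketch: start one background elasticdump import per JSONL file.
# ES_URL is an assumption, e.g. http://elastic:xxx@x.x.x.x:31391/
import_all() {
  for f in ./bayou_tt_articles_*.jsonl; do
    [ -e "$f" ] || continue   # the glob matched nothing
    nohup elasticdump --limit 20000 --input="$f" --output="$ES_URL" \
      --output-index=bayou_tt_articles --type=data \
      --transform="doc._source=Object.assign({},doc)" \
      > "$(log_name_for "$f")" 2>&1 &
  done
}
```

Every matching file gets its own concurrent elasticdump process, so the number of files determines the load on the cluster. With many files, starting them in smaller groups may be safer.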

Check the indexing rate

[Figure: indexing rate]

The indexing rate exceeded 10,000 docs/s, but the ceiling is far higher. Benchmark results in the community documentation show that a single node can sustain at least 20,000 docs/s.

5. Testing and verification

Full-text search performance improves significantly

[Figure: full-text search on JuiceFS-backed and SSD-backed Elasticsearch]

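The figure above shows a full-text search taking 18 s on Elasticsearch with JuiceFS storage and 5 s on Elasticsearch with SSD nodes. The figure below shows the same kind of search on memory-backed Elasticsearch completing in about 100 ms.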

[Figure: full-text search on memory-backed Elasticsearch]
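Latency figures like these can also be measured from the command line. The `took` field of a search response reports the server-side execution time in milliseconds, excluding the network round trip. Below is a minimal sketch. `took_ms` and `search_took` are hypothetical helpers, and `ES_URL` and `ES_PASSWORD` are assumed variables; the latter would hold the password read from the `es-jfs-prod-es-elastic-user` secret. The query targets the `text` field mapped with `ik_smart` above.

```shell
# Hypothetical helper: extract "took" (milliseconds) from an Elasticsearch
# search response on stdin, without depending on jq.
took_ms() {
  grep -o '"took":[0-9]*' | head -n 1 | cut -d: -f2
}

# Hypothetical sketch: run a match query on the "text" field and print "took".
# ES_URL and ES_PASSWORD are assumed variables. The query string is inserted
# into the JSON body as-is, so it must not contain double quotes.
search_took() {
  curl -s -u "elastic:${ES_PASSWORD}" \
    -H 'Content-Type: application/json' \
    "${ES_URL}/bayou_tt_articles/_search" \
    -d "{\"query\":{\"match\":{\"text\":\"$1\"}}}" | took_ms
}
```

For example, `search_took 'some keywords'` prints the server-side time for that query in milliseconds.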

Updating Elasticsearch does not lose data

The Elasticsearch Pods had been allocated more CPU and memory than they needed, so I lowered them to 32 CPU cores and 64 GB of memory. Elasticsearch stayed available throughout the rolling update, and no data was lost.

Be sure to keep number_of_replicas at 1 or more. Also avoid restarting Pods manually, even though Pods are updated in place on their original nodes.

Scaling out nodes works smoothly

[Figure: index shards redistributed across the added nodes]

The business needs about 10T of Elasticsearch storage in total, so I kept adding nodes, up to 10. Elasticsearch automatically relocated the index shards and distributed them evenly across these nodes.
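How evenly the shards can be spread follows from simple arithmetic. The index has 30 primaries and 1 replica, which makes 60 shard copies. Below is a minimal sketch using a hypothetical `shards_per_node` helper. The result is an ideal average; Elasticsearch's allocator may not land on exactly equal counts.

```shell
# Hypothetical helper: ideal average number of shard copies per node,
# i.e. primaries * (1 + replicas) / nodes, to one decimal place.
shards_per_node() {
  awk -v p="$1" -v r="$2" -v n="$3" 'BEGIN { printf "%.1f\n", p * (1 + r) / n }'
}

# 30 primaries, 1 replica:
#   shards_per_node 30 1 3   -> 20.0
#   shards_per_node 30 1 10  -> 6.0
```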

Exporting an index reaches 10,000 documents per second

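# Mount a host directory at /data; otherwise the exported file is removed together with the --rm container
docker run --rm -ti -v /data:/data elasticdump/elasticsearch-dump --limit 10000 --input=http://elastic:xxx@x.x.x.x:31391/bayou_tt_articles --output=/data/es-bayou_tt_articles-output.json --type=data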

Wed, 29 May 2024 01:41:23 GMT | got 10000 objects from source elasticsearch (offset: 0)
Wed, 29 May 2024 01:41:23 GMT | sent 10000 objects to destination file, wrote 10000
Wed, 29 May 2024 01:41:24 GMT | got 10000 objects from source elasticsearch (offset: 10000)
Wed, 29 May 2024 01:41:24 GMT | sent 10000 objects to destination file, wrote 10000
Wed, 29 May 2024 01:41:25 GMT | got 10000 objects from source elasticsearch (offset: 20000)
Wed, 29 May 2024 01:41:25 GMT | sent 10000 objects to destination file, wrote 10000
Wed, 29 May 2024 01:41:25 GMT | got 10000 objects from source elasticsearch (offset: 30000)

Export runs at about 10,000 documents per second, so 100 million documents take roughly 3 hours. That is fast enough for backing up and migrating the index.
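The time estimate is simple arithmetic: 100,000,000 documents at 10,000 documents per second is 10,000 s, or about 2.8 hours. Below is a minimal sketch with a hypothetical `eta_hours` helper. The same calculation applies to the import rate reported earlier.

```shell
# Hypothetical helper: hours needed to move n documents at r documents/second.
eta_hours() {
  awk -v n="$1" -v r="$2" 'BEGIN { printf "%.1f\n", n / r / 3600 }'
}

#   eta_hours 100000000 10000  -> 2.8  (export at ~10,000 docs/s)
#   eta_hours 100000000 20000  -> 1.4  (at ~20,000 docs/s)
```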

Elasticsearch Pods do not move to other nodes when updated

Nodes the Pods ran on before the update:

NAME                                           READY   STATUS    RESTARTS      AGE   IP               NODE                         NOMINATED NODE   READINESS GATES
es-jfs-prod-beat-metricbeat-7fbdd657c4-djgg6   1/1     Running   6 (32m ago)   18h   10.244.54.5      ascend-01   <none>           <none>
es-jfs-prod-es-default-0                       1/1     Running   0             28m   10.244.46.82     ascend-07   <none>           <none>
es-jfs-prod-es-default-1                       1/1     Running   0             29m   10.244.23.77     ascend-53   <none>           <none>
es-jfs-prod-es-default-2                       1/1     Running   0             31m   10.244.49.65     ascend-20   <none>           <none>
es-jfs-prod-es-default-3                       1/1     Running   0             32m   10.244.54.14     ascend-01   <none>           <none>
es-jfs-prod-es-default-4                       1/1     Running   0             34m   10.244.100.239   ascend-40   <none>           <none>
es-jfs-prod-es-default-5                       1/1     Running   0             35m   10.244.97.201    ascend-39   <none>           <none>
es-jfs-prod-es-default-6                       1/1     Running   0             37m   10.244.101.156   ascend-38   <none>           <none>
es-jfs-prod-es-default-7                       1/1     Running   0             39m   10.244.19.101    ascend-49   <none>           <none>
es-jfs-prod-es-default-8                       1/1     Running   0             40m   10.244.16.109    ascend-46   <none>           <none>
es-jfs-prod-es-default-9                       1/1     Running   0             41m   10.244.39.119    ascend-15   <none>           <none>
es-jfs-prod-kb-75f7bbd96-6tcrn                 1/1     Running   0             18h   10.244.1.164     ascend-22   <none>           <none>

Nodes the Pods ran on after the update:

NAME                                           READY   STATUS    RESTARTS      AGE     IP               NODE                         NOMINATED NODE   READINESS GATES
es-jfs-prod-beat-metricbeat-7fbdd657c4-djgg6   1/1     Running   6 (50m ago)   18h     10.244.54.5      ascend-01   <none>           <none>
es-jfs-prod-es-default-0                       1/1     Running   0             72s     10.244.46.83     ascend-07   <none>           <none>
es-jfs-prod-es-default-1                       1/1     Running   0             2m35s   10.244.23.78     ascend-53   <none>           <none>
es-jfs-prod-es-default-2                       1/1     Running   0             3m59s   10.244.49.66     ascend-20   <none>           <none>
es-jfs-prod-es-default-3                       1/1     Running   0             5m34s   10.244.54.15     ascend-01   <none>           <none>
es-jfs-prod-es-default-4                       1/1     Running   0             7m21s   10.244.100.240   ascend-40   <none>           <none>
es-jfs-prod-es-default-5                       1/1     Running   0             8m44s   10.244.97.202    ascend-39   <none>           <none>
es-jfs-prod-es-default-6                       1/1     Running   0             10m     10.244.101.157   ascend-38   <none>           <none>
es-jfs-prod-es-default-7                       1/1     Running   0             11m     10.244.19.102    ascend-49   <none>           <none>
es-jfs-prod-es-default-8                       1/1     Running   0             13m     10.244.16.110    ascend-46   <none>           <none>
es-jfs-prod-es-default-9                       1/1     Running   0             14m     10.244.39.120    ascend-15   <none>           <none>
es-jfs-prod-kb-75f7bbd96-6tcrn                 1/1     Running   0             18h     10.244.1.164     ascend-22   <none>           <none>

This settled one of my concerns. If an Elasticsearch Pod moved to another node on restart, would the shard data left behind on the old node keep inflating memory usage? The answer is no. The ECK operator appears to restart Pods on their original nodes, so the mounted hostPath data remains valid for the new Pod. Data is lost only when the host node itself reboots.
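The before/after comparison above can be automated. Reduce each `kubectl get pod -o wide` snapshot to Pod/node pairs, then diff the two results. Below is a minimal sketch. `es_pod_nodes` is a hypothetical helper, and it assumes the `-o wide` column layout shown above. It reads the node from the third-from-last field, because the RESTARTS column can contain spaces (e.g. `6 (32m ago)`).

```shell
# Hypothetical helper: turn `kubectl get pod -o wide` output into sorted
# "POD NODE" pairs for the Elasticsearch Pods. NODE is the third-from-last
# field because RESTARTS may contain spaces, e.g. "6 (32m ago)".
es_pod_nodes() {
  awk 'NR > 1 && $1 ~ /^es-jfs-prod-es-default-/ { print $1, $(NF - 2) }' | sort
}

# Usage sketch (assumes kubectl access to $NAMESPACE):
#   kubectl -n "$NAMESPACE" get pod -o wide | es_pod_nodes > before.txt
#   ... apply the update and wait for the rolling restart to finish ...
#   kubectl -n "$NAMESPACE" get pod -o wide | es_pod_nodes > after.txt
#   diff before.txt after.txt && echo "no Elasticsearch Pod changed node"
```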

6. Summary

AI compute nodes have large amounts of idle CPU and memory. Using these large-memory hosts to run short-lived, high-performance applications backed by in-memory storage helps improve resource utilization.

This article described how to provide high-performance queries by deploying Elasticsearch on hostPath-mounted memory storage. The main points:

Mount memory as a directory on the host
Create hostPath-based PVs and PVCs that mount the data into that directory
Deploy Elasticsearch with the ECK operator
Data is not lost when Elasticsearch is updated, but multiple host nodes must not be restarted at the same time
For full-text search over 300+ GB and 100M+ documents, JuiceFS-backed storage took 18 s, SSD nodes took 5 s, and memory nodes took 100 ms

References

[1] Storing Elasticsearch Data in JuiceFS: https://www.chenshaowen.com/blog/store-elasticsearch-data-in-juicefs.html

Editor: Wu Xiaoyan  Source: Chen Shaowen
