
Backend Development Essentials: Elasticsearch from Beginner to Advanced

Author: 程序猿技術充電站, 2025-01-02 10:58:27


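Elasticsearch (ES) is a distributed, open-source search and analytics engine for all kinds of data: text, numeric, geospatial, structured, and unstructured. It is built on Apache Lucene and was first released in 2010 by the company now known as Elastic. ES is known for its simple REST APIs, its distributed nature, its speed, and its scalability, and it is the core component of the Elastic Stack, a set of open-source tools for data ingestion, enrichment, storage, analysis, and visualization. The Elastic Stack is commonly called ELK (Elasticsearch, Logstash, Kibana), and it now also includes a rich collection of lightweight data shipping agents known as Beats.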

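What ES is used for

ES is mainly used for:

• Application search
• Website search
• Enterprise search
• Log processing and analytics
• Infrastructure metrics and container monitoring
• Application performance monitoring
• Geospatial data analysis and visualization
• Security analytics
• Business analytics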

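How it works

Raw data flows into ES from many sources. During ingestion the data is parsed, normalized, and enriched, and then it is indexed. Once the data is indexed, users can run complex queries against it and use aggregations to retrieve summaries of it. In Kibana, users can build visualizations and dashboards on top of the data and manage the Elastic Stack.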

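Index

An ES index is a collection of documents that are related to each other. ES stores data as JSON documents, and each document correlates a set of keys (field names) with their corresponding values.

ES uses a data structure called an inverted index, which is designed to allow very fast full-text searches. An inverted index lists every unique word that appears in any document and identifies all of the documents in which each word occurs.

During indexing, ES stores the documents and builds the inverted index, which makes the document data searchable in near real time. Indexing starts as soon as a document is added.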
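To make the idea of an inverted index concrete, here is a minimal in-memory sketch in Java. It is only an illustration: the class name InvertedIndexSketch and its methods are hypothetical, the tokenization is a plain split on non-alphanumeric characters, and Lucene's real on-disk structures are far more elaborate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy inverted index: maps each lowercased term to the sorted set of ids of
// the documents that contain it. This is not how Lucene stores its index.
final class InvertedIndexSketch {
    private final Map<String, TreeSet<Integer>> postings = new TreeMap<>();

    // Splits the text on non-letter/digit characters and records docId under every term.
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Returns the ids of the documents that contain every term of the query (AND semantics).
    List<Integer> search(String query) {
        TreeSet<Integer> result = null;
        for (String term : query.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (term.isEmpty()) {
                continue;
            }
            TreeSet<Integer> docs = postings.getOrDefault(term, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? new ArrayList<>() : new ArrayList<>(result);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "The quick brown fox");
        index.add(2, "Quick brown dogs");
        index.add(3, "The lazy dog");
        System.out.println(index.search("quick brown")); // [1, 2]
        System.out.println(index.search("the"));         // [1, 3]
    }
}
```

A lookup only touches the posting lists of the query terms, which is why full-text search stays fast even when the number of documents grows.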

Logstash

Logstash is one of the core products of the Elastic Stack. It is an open-source, server-side data processing pipeline that aggregates and processes data and sends it to ES.

Kibana

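Kibana is a data visualization and management tool for ES that provides histograms, line charts, pie charts, and maps. Kibana also includes applications such as Canvas and Elastic Maps. Canvas lets users create dynamic infographics based on their own data, and Elastic Maps visualizes geospatial data.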

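Why use ES

• ES is fast: because it is built on Lucene, it excels at full-text search. It is also a near real-time search platform, meaning the latency between indexing a document and the document becoming searchable is very short.
• ES is distributed by nature: the documents stored in ES are spread across containers called shards, which are replicated to provide redundant copies. ES can scale out to hundreds of servers and handle petabytes of data.
• ES comes with a wide set of features: it has many built-in features that make it easier to manage data.
• ES simplifies data ingestion, visualization, and reporting: through its integration with Beats and Logstash, users can process data and index it into ES.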

Setting up ES

Pull the image:

docker pull docker.elastic.co/elasticsearch/elasticsearch:7.3.2

Start the container:

docker run -d --name es -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.3.2

Modify the configuration:

# Enter the Docker container

docker exec -it es /bin/bash

# Open the configuration file

vim config/elasticsearch.yml

# Add the CORS settings

http.cors.enabled: true

http.cors.allow-origin: "*"

Enter the container and install the IK analysis plugin:

docker exec -it es /bin/bash
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.3.2/elasticsearch-analysis-ik-7.3.2.zip

Restart ES:

docker restart es

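Test that ES is running:

[Figure: screenshot of the test result]

Check that the analysis plugin was installed:

[Figure: screenshot of the analyzer check]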

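Querying ES

Empty search

An empty search matches every document in every index of the cluster (only the first 10 hits are returned by default):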

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{}
'

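You can also search one or more indices (or all of them) and, on versions that still support mapping types, one or more types. Mapping types are deprecated in ES 7.x and removed in 8.0, so on recent versions the type segment (type1,type2 below) is left out: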

GET /index_2014*/type1,type2/_search
{}

Use the from and size parameters for pagination:

GET /_search
{
  "from": 30,
  "size": 10
}
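The from value is the number of hits to skip and size is the number of hits per page, so page n (counting from 1) starts at from = (n - 1) * size. A small sketch of that arithmetic follows; PaginationSketch and its methods are hypothetical helpers, not part of any ES client.

```java
// Hypothetical helper: converts a 1-based page number into the from/size
// values used in the search request body.
final class PaginationSketch {
    // Number of hits to skip before the requested page begins.
    static int fromOffset(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        return (page - 1) * size;
    }

    // Request body for the given page, in the same shape as the example above.
    static String searchBody(int page, int size) {
        return "{\"from\": " + fromOffset(page, size) + ", \"size\": " + size + "}";
    }

    public static void main(String[] args) {
        // Page 4 with 10 hits per page skips the first 30 hits.
        System.out.println(searchBody(4, 10)); // {"from": 30, "size": 10}
    }
}
```

Deep pages are expensive because every shard has to collect from + size hits before the results are merged, so this style of paging suits the first few pages best.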

Query DSL

To use the query DSL, pass a query in the query parameter of the search request:

GET /_search
{
    "query": YOUR_QUERY_HERE
}

Structure of a query clause

A query clause typically has this structure:

{
    QUERY_NAME: {
        ARGUMENT: VALUE,
        ARGUMENT: VALUE,...
    }
}

If the clause targets one particular field, it has this structure:

{
    QUERY_NAME: {
        FIELD_NAME: {
            ARGUMENT: VALUE,
            ARGUMENT: VALUE,...
        }
    }
}

For example, a match query clause can be used to find documents whose tweet field mentions elasticsearch:

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "query": {
        "match": {
            "tweet": "elasticsearch"
        }
    }
}
'
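The same request can be sent from Java without any ES client library. The sketch below uses the JDK's java.net.http.HttpClient (Java 11 or later) and assumes a node listening on http://localhost:9200, as in the Docker setup above. MatchQuerySketch and its methods are hypothetical names, and the body is assembled by plain string concatenation without JSON escaping.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

final class MatchQuerySketch {
    // Builds the JSON body of a match query. No JSON escaping is done, so field
    // and text must not contain quotes or backslashes in this sketch.
    static String matchQueryBody(String field, String text) {
        return "{\"query\": {\"match\": {\"" + field + "\": \"" + text + "\"}}}";
    }

    // Builds the search request. POST is used here; _search accepts a body with GET or POST.
    static HttpRequest buildSearchRequest(String baseUrl, String field, String text) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_search?pretty"))
                .header("Content-Type", "application/json")
                .timeout(Duration.ofSeconds(5))
                .POST(HttpRequest.BodyPublishers.ofString(matchQueryBody(field, text)))
                .build();
    }

    public static void main(String[] args) throws InterruptedException {
        HttpRequest request = buildSearchRequest("http://localhost:9200", "tweet", "elasticsearch");
        HttpClient client = HttpClient.newBuilder().connectTimeout(Duration.ofSeconds(2)).build();
        try {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("HTTP " + response.statusCode());
            System.out.println(response.body());
        } catch (IOException e) {
            // Assumes a node on localhost:9200; report instead of failing when none is reachable.
            System.out.println("Could not reach ES: " + e);
        }
    }
}
```

For anything beyond a quick experiment, a JSON library or an official client is the safer way to build request bodies, because it handles escaping.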

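Combining multiple clauses

Query clauses come in two kinds. Leaf clauses (such as match) compare one or more fields to a query string. Compound clauses combine other query clauses.

For example, the following query looks for emails that contain business opportunity and that are either starred, or in the inbox and not marked as spam: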

{
    "bool": {
        "must": { "match":   { "email": "business opportunity" }},
        "should": [
            { "match":       { "starred": true }},
            { "bool": {
                "must":      { "match": { "folder": "inbox" }},
                "must_not":  { "match": { "spam": true }}
            }}
        ],
        "minimum_should_match": 1
    }
}
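The logic of this bool query can be spelled out as an ordinary predicate. The sketch below is a plain-Java imitation over a hypothetical in-memory Email class. It ignores relevance scoring entirely, and its matchAny helper is only a rough stand-in for how a match clause treats an analyzed full-text field.

```java
import java.util.Arrays;
import java.util.Locale;

final class BoolQuerySketch {
    // Hypothetical in-memory stand-in for an indexed email document.
    static final class Email {
        final String body;
        final boolean starred;
        final String folder;
        final boolean spam;

        Email(String body, boolean starred, String folder, boolean spam) {
            this.body = body;
            this.starred = starred;
            this.folder = folder;
            this.spam = spam;
        }
    }

    // Rough stand-in for a match clause: true if the text contains at least one
    // of the query terms (match ORs the analyzed terms by default).
    static boolean matchAny(String text, String query) {
        String[] words = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        for (String term : query.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (Arrays.asList(words).contains(term)) {
                return true;
            }
        }
        return false;
    }

    static boolean matches(Email e) {
        boolean must = matchAny(e.body, "business opportunity");     // must clause
        boolean starred = e.starred;                                  // should clause 1
        boolean inboxNotSpam = "inbox".equals(e.folder) && !e.spam;   // should clause 2 (nested bool)
        int shouldMatched = (starred ? 1 : 0) + (inboxNotSpam ? 1 : 0);
        return must && shouldMatched >= 1;                            // minimum_should_match: 1
    }

    public static void main(String[] args) {
        System.out.println(matches(new Email("A business opportunity for you", true, "archive", false))); // true
        System.out.println(matches(new Email("A business opportunity for you", false, "inbox", true)));   // false
        System.out.println(matches(new Email("Lunch on Friday?", true, "inbox", false)));                  // false
    }
}
```

In ES the should clauses also raise the score of the documents they match, which this yes/no predicate does not model.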

Common queries

match_all query

The match_all query matches all documents:

{ "match_all": {}}

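match query

The match query analyzes the query string with the field's analyzer before searching, which makes it the usual choice for full-text fields: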

{ "match": { "tweet": "About Search" }}

multi_match query

The multi_match query runs the same match query on multiple fields:

{
    "multi_match": {
        "query":    "full text search",
        "fields":   [ "title", "body" ]
    }
}

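range query

The range query finds numbers or dates that fall within a specified range. The operators are gt (greater than), gte (greater than or equal to), lt (less than), and lte (less than or equal to):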

{
    "range": {
        "age": {
            "gte":  20,
            "lt":   30
        }
    }
}

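term query

The term query matches exact values, whether they are numbers, dates, Booleans, or exact-value (not analyzed) strings: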

{ "term": { "age":    26           }}
{ "term": { "date":   "2014-09-01" }}
{ "term": { "public": true         }}
{ "term": { "tag":    "full_text"  }}

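terms query

The terms query is like the term query but accepts several values; a document matches if the field contains any of them: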

{ "terms": { "tag": [ "search", "full_text", "nosql" ] }}

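exists query and missing query

The exists query finds documents in which the specified field has a value, and the missing query found documents in which it has none. The missing query was removed in ES 5.0; on current versions, wrap an exists query in a bool must_not clause instead: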

{
    "exists":   {
        "field":    "title"
    }
}

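ES indices

Creating an index

PUT /my_index
{
    "settings": { ... any settings ... },
    "mappings": { ... any mappings ... }
}

This request creates the index explicitly with the given settings and mappings. Before ES 6.0, mappings could hold one entry per type (for example type_one and type_two); in ES 7.x an index has a single, typeless mapping, so the field definitions go directly under mappings. ES also creates an index automatically, with default settings, the first time a document is indexed into an index that does not exist yet.

Deleting an index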

DELETE /my_index

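Index settings

• number_of_shards: the number of primary shards of the index. It is fixed when the index is created.
• number_of_replicas: the number of replica copies of each primary shard. It can be changed at any time on a live index.

Create a small index with a single primary shard and no replicas: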

PUT /my_temp_index
{
    "settings": {
        "number_of_shards" :   1,
        "number_of_replicas" : 0
    }
}
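The two settings multiply: an index ends up with number_of_shards * (1 + number_of_replicas) shard copies in total. The number of primary shards also decides where each document lives, because a document is routed to hash(routing) % number_of_shards, with the document id as the default routing value. The sketch below illustrates both points. ShardSketch is a hypothetical class, and String.hashCode is only a dependency-free stand-in for the Murmur3 hash that ES uses, so the shard numbers it prints will not match a real cluster.

```java
final class ShardSketch {
    // Total shard copies in the index: every primary plus its replicas.
    static int totalShardCopies(int numberOfShards, int numberOfReplicas) {
        return numberOfShards * (1 + numberOfReplicas);
    }

    // Stand-in for document routing. ES hashes the routing value with Murmur3;
    // String.hashCode is used here only to keep the sketch self-contained.
    static int shardFor(String routing, int numberOfShards) {
        return Math.floorMod(routing.hashCode(), numberOfShards);
    }

    public static void main(String[] args) {
        System.out.println(totalShardCopies(1, 0)); // 1
        System.out.println(totalShardCopies(3, 2)); // 9
        // The same id always lands on the same shard as long as the shard count is unchanged.
        System.out.println(shardFor("tweet-1", 3));
    }
}
```

Changing the shard count would change the result of the modulo for existing documents, so they could no longer be found on the shard they were written to.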

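Configuring analyzers

The standard analyzer is the default analyzer for full-text fields. It consists of the following parts:

• The standard tokenizer, which splits the input text on word boundaries.
• The standard token filter, which was meant to tidy up the tokens emitted by the tokenizer. It never did anything and was removed in ES 7.0.
• The lowercase token filter, which converts all tokens to lowercase.
• The stop token filter, which removes stopwords: common words that have little impact on search relevance, such as a, the, and, is. It is disabled by default and is enabled by setting the stopwords parameter.

In the following example, a new analyzer called es_std is created that uses the predefined list of Spanish stopwords: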

PUT /spanish_docs
{
    "settings": {
        "analysis": {
            "analyzer": {
                "es_std": {
                    "type":      "standard",
                    "stopwords": "_spanish_"
                }
            }
        }
    }
}

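Test it with the _analyze API. In ES 7.x the analyzer and the text are passed in a JSON body:

curl -X GET "localhost:9200/spanish_docs/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
    "analyzer": "es_std",
    "text": "El veloz zorro marrón"
}
'

The abridged result shows that the stopword El was removed. Positions are zero-based in ES 7.x, so the gap at position 0 marks the removed token:

{
  "tokens" : [
    { "token" :    "veloz",   "position" : 1 },
    { "token" :    "zorro",   "position" : 2 },
    { "token" :    "marrón",  "position" : 3 }
  ]
}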

Custom analyzers

Character filters, tokenizers, and token filters are each configured in their own section under analysis:

PUT /my_index
{
    "settings": {
        "analysis": {
            "char_filter": { ... custom character filters ... },
            "tokenizer":   { ...    custom tokenizers     ... },
            "filter":      { ...   custom token filters   ... },
            "analyzer":    { ...    custom analyzers      ... }
        }
    }
}

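Next, build a custom analyzer. It strips HTML with the built-in html_strip character filter and replaces & with and using a custom mapping character filter: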

"char_filter": {
    "&_to_and": {
        "type":       "mapping",
        "mappings": [ "&=> and "]
    }
}

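It tokenizes words with the standard tokenizer, lowercases terms with the lowercase token filter, and removes a custom list of stopwords with a custom stop token filter: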

"filter": {
    "my_stopwords": {
        "type":        "stop",
        "stopwords": [ "the", "a" ]
    }
}

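Finally, the analyzer definition combines the predefined and custom character filters, the tokenizer, and the token filters: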

"analyzer": {
    "my_analyzer": {
        "type":           "custom",
        "char_filter":  [ "html_strip", "&_to_and" ],
        "tokenizer":      "standard",
        "filter":       [ "lowercase", "my_stopwords" ]
    }
}

Put together, the complete create-index request looks like this:

curl -X PUT "localhost:9200/my_index?pretty" -H 'Content-Type: application/json' -d'
{
    "settings": {
        "analysis": {
            "char_filter": {
                "&_to_and": {
                    "type":       "mapping",
                    "mappings": [ "&=> and "]
            }},
            "filter": {
                "my_stopwords": {
                    "type":       "stop",
                    "stopwords": [ "the", "a" ]
            }},
            "analyzer": {
                "my_analyzer": {
                    "type":         "custom",
                    "char_filter":  [ "html_strip", "&_to_and" ],
                    "tokenizer":    "standard",
                    "filter":       [ "lowercase", "my_stopwords" ]
            }}
}}}
'

Test it:

curl -X GET "localhost:9200/my_index/_analyze?pretty" -H 'Content-Type: application/json' -d'
{
    "analyzer": "my_analyzer",
    "text": "The quick & brown fox"
}
'

The abridged result looks like this. The stopword The at position 0 is gone and & has become and:

{
  "tokens" : [
      { "token" :   "quick",    "position" : 1 },
      { "token" :   "and",      "position" : 2 },
      { "token" :   "brown",    "position" : 3 },
      { "token" :   "fox",      "position" : 4 }
    ]
}
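The order of the stages in my_analyzer (character filters, then the tokenizer, then token filters) can be imitated in a few lines of plain Java. This is only a sketch of the pipeline idea: AnalyzerSketch is a hypothetical class, the regular expressions are crude stand-ins for html_strip and the standard tokenizer, and positions are counted from 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class AnalyzerSketch {
    private static final Set<String> STOPWORDS = new HashSet<>(Arrays.asList("the", "a"));

    // Returns the surviving tokens as "term@position" strings.
    static List<String> analyze(String text) {
        // Character filters: drop HTML tags (crude stand-in for html_strip), then map "&" to " and ".
        String filtered = text.replaceAll("<[^>]*>", " ").replace("&", " and ");
        List<String> tokens = new ArrayList<>();
        int position = 0;
        // Tokenizer: split on anything that is not a letter or digit
        // (rough stand-in for the standard tokenizer).
        for (String word : filtered.split("[^\\p{L}\\p{N}]+")) {
            if (word.isEmpty()) {
                continue;
            }
            String term = word.toLowerCase(Locale.ROOT); // lowercase token filter
            if (!STOPWORDS.contains(term)) {             // stop token filter
                tokens.add(term + "@" + position);
            }
            position++; // a removed stopword still consumes its position
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The quick & brown fox")); // [quick@1, and@2, brown@3, fox@4]
    }
}
```

Keeping the position of a removed stopword is what lets phrase queries still measure the distance between the remaining terms.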

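Finally, apply the analyzer to a full-text field. In ES 7.x the field type is text (text and keyword replaced the string type in ES 5.0), and the mapping endpoint takes no type name:

curl -X PUT "localhost:9200/my_index/_mapping?pretty" -H 'Content-Type: application/json' -d'
{
    "properties": {
        "title": {
            "type":      "text",
            "analyzer":  "my_analyzer"
        }
    }
}
'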

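Types and mappings

This section describes how mapping types worked in older ES versions (before 6.0), where one index could hold several types. Its examples use the string field type of that era.

How Lucene sees documents

In Lucene, a document consists of a simple list of field-value pairs. When a document is indexed, the value of each field is added to the inverted index for that field.

How types are implemented

The type name of each document is stored in a metadata field called _type. When you search for documents of a particular type, ES automatically applies a filter on the _type field.

For example, in the user type, the name field is mapped as a string field, and its values are analyzed with the whitespace analyzer before they are indexed into an inverted index called name: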

"name": {
    "type":     "string",
    "analyzer": "whitespace"
}

Each Lucene index has a single, flat schema

In Lucene, a particular field can be mapped as a string or as a number, but not both. Because types are a mechanism that ES adds on top of Lucene (in the form of the _type metadata field), all types in an index ultimately share the same mapping. Take this mapping of two types in the data index:

{
   "data": {
      "mappings": {
         "people": {
            "properties": {
               "name": {
                  "type": "string"
               },
               "address": {
                  "type": "string"
               }
            }
         },
         "transactions": {
            "properties": {
               "timestamp": {
                  "type": "date",
                  "format": "strict_date_optional_time"
               },
               "message": {
                  "type": "string"
               }
            }
         }
      }
   }
}

Each type defines two fields (name/address and timestamp/message). They may look independent, but under the covers Lucene creates a single mapping that looks roughly like this:

{
   "data": {
      "mappings": {
        "_type": {
          "type": "string",
          "index": "not_analyzed"
        },
        "name": {
          "type": "string"
        },
        "address": {
          "type": "string"
        },
        "timestamp": {
          "type": "long"
        },
        "message": {
          "type": "string"
        }
      }
   }
}

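For the index as a whole, the mappings are essentially flattened into a single, global schema. This is why two types in the same index cannot define conflicting fields, such as a field that is a string in one type and a number in the other.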
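The flattening can be sketched as a merge of per-type field maps into one index-wide map that refuses a field declared with two different data types. FlatMappingSketch is a hypothetical class that models each mapping as a simple field-name-to-type map; it illustrates the constraint and is not ES's actual merge code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FlatMappingSketch {
    // Merges per-type field definitions (field name -> data type) into one
    // index-wide schema and rejects a field declared with two different data types.
    static Map<String, String> flatten(Map<String, Map<String, String>> mappingsByType) {
        Map<String, String> flat = new LinkedHashMap<>();
        flat.put("_type", "string"); // metadata field holding each document's type name
        for (Map.Entry<String, Map<String, String>> type : mappingsByType.entrySet()) {
            for (Map.Entry<String, String> field : type.getValue().entrySet()) {
                String previous = flat.putIfAbsent(field.getKey(), field.getValue());
                if (previous != null && !previous.equals(field.getValue())) {
                    throw new IllegalArgumentException("Field [" + field.getKey()
                            + "] is mapped as both " + previous + " and " + field.getValue());
                }
            }
        }
        return flat;
    }

    public static void main(String[] args) {
        Map<String, String> people = new LinkedHashMap<>();
        people.put("name", "string");
        people.put("address", "string");
        Map<String, String> transactions = new LinkedHashMap<>();
        transactions.put("timestamp", "date");
        transactions.put("message", "string");
        Map<String, Map<String, String>> data = new LinkedHashMap<>();
        data.put("people", people);
        data.put("transactions", transactions);
        System.out.println(flatten(data));
        // {_type=string, name=string, address=string, timestamp=date, message=string}
    }
}
```

Adding a name field of type long to transactions would make flatten throw, which mirrors the conflict that a single flat schema cannot represent.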

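Connecting to ES from Java

Add the dependencies. The TransportClient version must match the major version of the cluster it connects to (6.2.2 here targets a 6.x cluster). TransportClient was deprecated in ES 7.0 and removed in 8.0 in favor of the REST clients:

<!-- TransportClient dependency -->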
        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>transport</artifactId>
            <version>6.2.2</version>
        </dependency>

        <!-- Test library; not needed for connecting to ES from Java -->
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>

Create the ES cluster:

[Figure: screenshot of the ES cluster setup]

Connect to ES:

package cn.zsm.es;

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;
import org.junit.Before;
import org.junit.Test;

import java.net.InetAddress;
import java.net.UnknownHostException;

public class JavaEsTest {

    private String IP;
    private int PORT;

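    @Before
    public void init(){
        // Address of an ES node; fill in the placeholder octets for your environment
        this.IP = "192.168.？.？";
        // 9300 is the transport port used by TransportClient (9200 is the REST port)
        this.PORT = 9300;
    }

    @Test
    public void esClient(){
        try {
            // cluster.name must match the name configured on the ES nodes
            Settings settings = Settings.builder().put("cluster.name", "my-application").build();
            TransportClient client = new PreBuiltTransportClient(settings)
                    .addTransportAddresses(new TransportAddress(InetAddress.getByName(IP), PORT));
            System.out.println(client.toString());
            // Release the client's threads and connections when done
            client.close();
        } catch (UnknownHostException e) {
            e.printStackTrace();
        }
    }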

}

Test result:

[Figure: screenshot of the test output]

Editor: 武曉燕. Source: 程序猿技術充電站
