
Apache Flume's Regex Filter

Author: Anonymous 2017-07-18 14:10:31


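In today's big data world, applications generate huge amounts of electronic data, and these vast repositories hold valuable information. It is hard for human analysts or domain experts to make interesting discoveries or find patterns that can support decision-making. We need automated processes to make effective use of this massive, information-rich data for planning and investment decisions. Before the data can be processed, it must be collected, aggregated, and transformed, and finally moved into repositories where various analytics and data mining tools can work on it.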


Apache Flume is one of the popular tools for performing all of these steps. The data usually takes the form of events or logs. Apache Flume has three main components:

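Source: the data source, which can be an enterprise server, a file system, the cloud, a data repository, and so on.
Sink: the target repository where data is stored. It can be a centralized store such as HDFS, a processing engine such as Apache Spark, or a data store/search engine such as ElasticSearch.
Channel: holds events until a sink consumes them. A channel is a passive store. Depending on its type, a channel can support failure recovery and high reliability; examples are the file channel, backed by the local file system, and the memory channel (the file channel survives an agent restart, while the memory channel trades durability for speed).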

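Flume is highly configurable and supports many sources, channels, serializers, and sinks. It also supports streaming data. A powerful Flume feature is interceptors, which can modify or drop events in flight. One of the supported interceptors is regex_filter.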

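regex_filter interprets the event body as text and matches it against the configured regular expression. Depending on whether the pattern matches and on the excludeEvents setting, the event is either included or excluded: with excludeEvents = false (the default), only matching events pass through; with excludeEvents = true, matching events are dropped. Let's look at regex_filter in detail.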
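To make the include/exclude rule concrete, here is a minimal Python sketch of the filtering decision. It assumes the interceptor looks for the pattern anywhere in the body (like Python's `re.search`) rather than requiring the whole body to match, and Python's `re` stands in for Java's regex engine. The function name `regex_filter` is hypothetical and is not part of Flume's API.

```python
import re
from typing import Iterable, List


def regex_filter(bodies: Iterable[str], regex: str, exclude_events: bool = False) -> List[str]:
    """Simulate the regex_filter decision for a batch of event bodies.

    exclude_events=False keeps only matching bodies; True drops matching bodies.
    """
    pattern = re.compile(regex)
    kept = []
    for body in bodies:
        # search() finds the pattern anywhere in the body, not only at the start
        matched = pattern.search(body) is not None
        # Keep the event when "matched" and "exclude" disagree
        if matched != exclude_events:
            kept.append(body)
    return kept


events = ["1,alok,mumbai,manager", "3,yogesh,kolkata,developer"]
print(regex_filter(events, "developer", exclude_events=True))  # ['1,alok,mumbai,manager']
```

The single comparison `matched != exclude_events` covers both modes, which mirrors how one boolean flag flips the interceptor between a whitelist and a blacklist.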

Requirement

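From the data source, we receive records made up of a serial number, name, city, and role. The data source might be a real-time stream or any other source. In this example, I used the Netcat service as the source: it listens on a given port and turns each line of text into an event. The requirement is to save the data to HDFS in text format. Before the data is saved to HDFS, it must be filtered by role: only managers' records are to be stored in HDFS, and data for all other roles must be ignored. For example, the following data is allowed: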

1,alok,mumbai,manager
2,jatin,chennai,manager

The following data is not allowed:

3,yogesh,kolkata,developer
5,jyotsana,pune,developer

How to Meet This Requirement

This can be achieved with the regex_filter interceptor. The interceptor filters events according to a rule, so only events of interest are sent to the sink and all other events are ignored.

## Describe regex_filter interceptor and configure exclude events attribute
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_filter
a1.sources.r1.interceptors.i1.regex = developer
a1.sources.r1.interceptors.i1.excludeEvents = true
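One point worth checking against the requirement: the configuration above drops events that contain "developer", so a record with any other non-manager role would still reach HDFS. The Python sketch below compares that rule with a stricter alternative that keeps only bodies ending in ",manager" (which would correspond to regex = ,manager$ with excludeEvents = false). The "tester" record is hypothetical test data, not part of the article's input, and the anchored pattern is an illustrative suggestion rather than the author's configuration; Python's `re` stands in for Java's regex engine.

```python
import re

# Sample records; "8,meera,goa,tester" is a hypothetical extra role, not from the article
records = [
    "1,alok,mumbai,manager",
    "3,yogesh,kolkata,developer",
    "8,meera,goa,tester",
]

# Rule used in the article: regex = developer, excludeEvents = true (drop matches)
exclude_developer = [r for r in records if not re.search(r"developer", r)]

# Hypothetical alternative: regex = ,manager$ , excludeEvents = false (keep matches only)
include_manager = [r for r in records if re.search(r",manager$", r)]

print(exclude_developer)  # ['1,alok,mumbai,manager', '8,meera,goa,tester']
print(include_manager)    # ['1,alok,mumbai,manager']
```

A whitelist on the wanted role matches the requirement ("only managers") more directly than a blacklist of one unwanted role, at the cost of depending on the role being the last field.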

The HDFS sink stores data in HDFS as text or sequence files. Compressed formats are also supported.

a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
## assumption is that Hadoop is CDH
a1.sinks.k1.hdfs.path = hdfs://quickstart.cloudera:8020/user/hive/warehouse/managers
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text

How to Run the Example

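First, you need Hadoop to run the example with HDFS as the sink. If you don't have a Hadoop cluster, change the sink to a logger sink and just start Flume. Save the regex_filter_flume_conf.conf file in a directory and run the agent with the following command: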

flume-ng agent --conf conf --conf-file regex_filter_flume_conf.conf --name a1 -Dflume.root.logger=INFO,console

Note that the agent name is a1. I used the Netcat source:

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

Once the Flume agent has started, run the following command to send events to Flume:

telnet localhost 44444
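Typing into telnet works for a quick test; for repeatable runs, the same input can be sent from a script. Below is a minimal Python sketch of such a client. It assumes the netcat source is configured as in this example (localhost, port 44444) and keeps its default ackEveryEvent = true, in which case the source answers each newline-terminated line with "OK". `send_events` and `SAMPLE` are hypothetical names.

```python
import socket
from typing import List

# Input records from this article; each string becomes one Flume event
SAMPLE = [
    "1,alok,mumbai,manager",
    "2,jatin,chennai,manager",
    "3,yogesh,kolkata,developer",
    "4,ragini,delhi,manager",
    "5,jyotsana,pune,developer",
    "6,valmiki,banglore,manager",
]


def send_events(lines: List[str], host: str = "localhost", port: int = 44444) -> List[str]:
    """Send lines to a Flume netcat source and return the reply read after each one."""
    acks = []
    with socket.create_connection((host, port), timeout=5) as sock:
        with sock.makefile("r", encoding="utf-8", newline="\n") as reader:
            for line in lines:
                # The netcat source turns each newline-terminated line into one event
                sock.sendall((line + "\n").encode("utf-8"))
                # Assuming ackEveryEvent=true, the source replies with one line per event
                acks.append(reader.readline().strip())
    return acks
```

With the agent running, calling `send_events(SAMPLE)` should return one acknowledgement per record. Reading the reply after each line keeps the client in step with the source instead of flooding it.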

Now just type the following input:

1,alok,mumbai,manager
2,jatin,chennai,manager
3,yogesh,kolkata,developer
4,ragini,delhi,manager
5,jyotsana,pune,developer
6,valmiki,banglore,manager

Browse HDFS and you will see that a file has been created under hdfs://quickstart.cloudera:8020/user/hive/warehouse/managers containing only the managers' data.
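To verify the result without reading the file by eye, you can copy its contents locally (for example with `hdfs dfs -cat` on the managers directory) and run a small check. This Python sketch assumes one comma-separated record per line with the role as the last field; `non_manager_rows` is a hypothetical helper.

```python
from typing import List


def non_manager_rows(text: str) -> List[str]:
    """Return every non-blank CSV row whose last field (the role) is not 'manager'."""
    unexpected = []
    for raw in text.splitlines():
        row = raw.strip()  # also drops a stray '\r' if the input came from telnet
        if not row:
            continue
        role = row.rsplit(",", 1)[-1]
        if role != "manager":
            unexpected.append(row)
    return unexpected
```

An empty list means the interceptor did its job; any returned row points to a record that slipped through.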

The complete Flume configuration, regex_filter_flume_conf.conf, is as follows:

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source - netcat
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the HDFS sink (assumption is that Hadoop is CDH)
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://quickstart.cloudera:8020/user/hive/warehouse/managers
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text

## Describe regex_filter interceptor and configure exclude events attribute
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = regex_filter
a1.sources.r1.interceptors.i1.regex = developer
a1.sources.r1.interceptors.i1.excludeEvents = true

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
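A Flume agent file is read as key/value properties, and (assuming standard Java properties behavior) a repeated key silently keeps its last value, so a duplicated line or a sink bound to an undeclared channel is easy to miss. The Python sketch below is a simplified, hypothetical checker for such mistakes: it only understands `key = value` lines and `#`/`!` comments, not the full properties syntax (line continuations, `:` separators, escapes).

```python
import re
from collections import Counter
from typing import Dict, List, Tuple


def parse_properties(text: str) -> Tuple[Dict[str, str], List[str]]:
    """Parse 'key = value' lines; return (last value for each key, keys defined more than once)."""
    values: Dict[str, str] = {}
    counts: Counter = Counter()
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments ('#' or '!' as in Java properties files)
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # simplified: lines without '=' are ignored
        key = key.strip()
        counts[key] += 1
        values[key] = value.strip()  # a repeated key keeps its last value
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    return values, duplicates


def unbound_channels(values: Dict[str, str], agent: str = "a1") -> List[str]:
    """Report source/sink channel bindings that name a channel not declared for the agent."""
    declared = set(values.get(f"{agent}.channels", "").split())
    prefix = re.escape(agent)
    # Sources bind with '.channels' (plural), sinks with '.channel' (singular)
    binding = re.compile(rf"{prefix}\.(sources\.[^.]+\.channels|sinks\.[^.]+\.channel)")
    problems = []
    for key, value in values.items():
        if binding.fullmatch(key):
            for channel in value.split():
                if channel not in declared:
                    problems.append(f"{key} -> {channel}")
    return problems
```

A non-empty duplicate list is not necessarily an error (identical repeats are harmless), but it is worth a second look before deploying the agent.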

The complete project code is available here.

Editor: Pang Guiyu. Source: 36大数据
