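HBaseCon is the official technical conference of Apache HBase™, first held in 2012. Apache HBase is a distributed, scalable key-value database built on Apache Hadoop. It provides high-performance random read and write access for big data workloads, and its implementation is modeled on the Bigtable paper that Google published in 2006.
Date and time: 2017.08.04 08:00-18:00
Venue: International Conference Center, 3F, Block D, Building 3, Tian'an Cloud Park Phase 1, Huancheng Road, Bantian Subdistrict, Longgang District, Shenzhen, China
Audience: developers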

Talk abstracts
Keynote:
HBase 2.0.0
Michael Stack
HBase-2.0.0 has been a couple of years in the making. It is chock-a-block full of a long list of new features and fixes. In this session, the 2.0.0 release manager will perform the impossible, describing the release content inside the session time bounds.
|
HBase Practice At XiaoMi
Zheng Hu
We'll share some HBase experience at XiaoMi:
1. How we tuned G1GC for HBase clusters.
2. Development and performance of Async HBase Client.
|
Track 1
Offheap bucket cache success story and Offheaping the write path in HBase
Ramkrishna Vasudevan and Anoop Sam John
The first part of the talk covers the success story of deploying the latest improvements to offheap mode bucket cache in one of the biggest clusters at Alibaba.
It highlights how off-heap reads from the bucket cache helped improve the average QPS and avoided the frequent dips in QPS caused by GC.
The second part covers the efforts that went into making the HBase write path use off-heap memory effectively, the various design changes in size accounting, and the performance gains that we achieved at the end of the task.
|
HBase Multi tenancy use cases and various solution
Bhupendra Jain
In a multi-tenant scenario, the biggest challenge is to achieve QoS for each tenant without impacting the other tenants' workloads. This session will talk about the multi-tenancy use cases and challenges present in HBase. The session will cover in detail:
a) Achieving multi-tenancy with a single HBase cluster - solutions, pros and cons (RS Group, RPC Throttling, Quota etc.)
b) Achieving multi-tenancy with multiple HBase clusters - solutions, pros and cons.
|
Lift the ceiling of HBase throughputs
Yu Li and Lijin Bin
HBase is the core storage of Alibaba's search infrastructure, and it faces a major challenge in improving throughput, which determines the processing speed of machine learning programs and thus the accuracy of the recommendations made. In this session we will talk about work done and in progress to increase both read and write throughput, as well as the real-world performance on the past Singles' Day and the latest laboratory benchmark data.
|
Removable singularity: a story of HBase upgrade in Pinterest
Tianying Chang
HBase serves online-facing traffic at Pinterest, which means no downtime is allowed. However, we were on HBase 0.94. To upgrade to the latest version, we needed a way to perform a live upgrade while keeping the Pinterest site up. Recently, we successfully upgraded our 0.94 HBase cluster to 1.2 with no downtime. We made changes to both Asynchbase and the HBase server side. We will talk about what we did and how we did it, and about our findings from the configuration and performance tuning we did to achieve low latency.
|
HBase Disaster Recovery Solution at Huawei
Ashish Singhi
The HBase disaster recovery solution aims to maintain high availability of the HBase service when one HBase cluster suffers a disaster, with minimal user intervention. This session will introduce the HBase disaster recovery use cases and the various solutions adopted at Huawei, such as:
a) Cluster Read-Write mode
b) DDL operations synchronization with standby cluster
c) Mutation and bulk loaded data replication
d) Further challenges and pending work
|
Backup / Restore feature in HBase
Vladimir Rodionov and Ted Yu
Backup and restore functionality is crucial to achieving fault tolerance for data management systems.
In the talk, we are going to cover the newly merged backup and restore phases 2 and 3.
Previously, users could take snapshots to back up data. However, the associated execution cost may be high due to the flush across region servers, and there was no incremental snapshot either.
Backup and restore functionality provides two types of backup:
Full backup – foundation for incremental backups
Incremental backup – can be periodic to capture changes over time
We'll cover three types of backup strategies:
Intra-cluster backup
backup on a separate HDFS archive cluster
backup involving Cloud or a Storage Vendor
Best practices for Backup-and-Restore will be presented next.
We'll explain concepts such as Backup Image and Backup Set, with example commands showing how they are used.
The mechanism for incremental backups is covered next.
Finally we'll cover bulk load support for backup.
|
HBase on Beam
Jingcheng Du
Apache Beam is an open source, unified programming model for defining batch and streaming jobs that run on many execution engines. HBase on Beam is a connector that allows Beam to use HBase as a bounded data source and as a target data store for both batch and streaming data sets. With this connector, HBase can work directly with many batch and streaming engines, for example Spark, Flink, and Google Cloud Dataflow. In this session, I will introduce Apache Beam, the current implementation of HBase on Beam, and the future plans for it.
|
Track 2
HBase: recent improvement and practice at Alibaba
Wenlong Yang and Han Yang
AliHB, an HBase branch tailored to Alibaba Group's business characteristics and requirements, is widely used as a basic storage service supporting the online and nearline applications of companies across the Alibaba economy, such as taobao.com, tmall.com, alipay.com, and cainiao.com.
In this talk, we will share our experience of maintaining clusters of more than ten thousand nodes with high availability and low cost:
1. Introduction to several typical scenarios at Alibaba
2. SQL (based on Apache Phoenix) improvement
3. Range-level data copy across clusters
4. Prefix-Bloomfilter for scan performance
5. Dual-Service based on the async API, enabling concurrent access to two clusters for predictably low latency
6. Some useful things for production.
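The dual-service idea in item 5 (sending the same request to two clusters concurrently and taking whichever answers first) can be sketched in a few lines. This is a minimal illustration, assuming nothing about AliHB's actual implementation: `fake_cluster_get`, `dual_get`, the cluster names, and the delays are all hypothetical, and the "clusters" are simulated with `asyncio.sleep`.

```python
import asyncio


async def fake_cluster_get(name, key, delay):
    """Hypothetical stand-in for an async read against one cluster."""
    await asyncio.sleep(delay)  # simulated network and server latency
    return f"{key}@{name}"


async def dual_get(key, primary, secondary):
    """Issue the same read to both clusters and return the first reply.

    Sketch only: error handling is omitted, so an exception from the
    faster cluster propagates instead of falling back to the slower one.
    """
    tasks = [
        asyncio.create_task(primary(key)),
        asyncio.create_task(secondary(key)),
    ]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # the slower request is no longer needed
    return done.pop().result()


async def main():
    # Assumed latencies: cluster-a is slow for this request, cluster-b is fast.
    def slow(key):
        return fake_cluster_get("cluster-a", key, 0.2)

    def fast(key):
        return fake_cluster_get("cluster-b", key, 0.01)

    return await dual_get("row1", slow, fast)


winner = asyncio.run(main())
print(winner)
```

The trade-off in this kind of design is extra load on both clusters in exchange for a latency close to that of the faster replica on each request.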
|
Ecosystems with HBase and CloudTable service at Huawei
Jieshan Bi and Yanhui Zhong
CloudTable: Huawei's cloud HBase service, soon to go online.
1. Our view on HBase.
2. CloudTable service based on HBase.
CTBase: A light-weight HBase client for structured data.
1. Schematized table, more friendly for structured data storage.
2. Global secondary index for HBase.
3. HBase Query DSL. JSON based light-weight API.
4. Cluster table. Pre-joining with keys, a better solution for cross-table join queries from HBase.
Tagram: Distributed bitmap index implementation for HBase.
1. Distributed bitmap index for accelerating ad hoc queries on low-cardinality columns.
2. Powerful and flexible query API.
3. Tagram offers millisecond-level query latency.
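To see why a bitmap index suits low-cardinality columns, consider the following minimal, single-process sketch. It is not Tagram's implementation or API: the class `BitmapIndex`, its methods, and the sample rows are hypothetical, and distribution across nodes is left out. Each (column, value) pair gets one bitmap with a bit per row, so a multi-condition filter reduces to bitwise AND.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy bitmap index: one integer bitset per (column, value) pair."""

    def __init__(self):
        self.bitmaps = defaultdict(int)  # (column, value) -> bitset stored as int
        self.num_rows = 0

    def add_row(self, row):
        """Index a row given as a dict of column -> value; returns its row id."""
        row_id = self.num_rows
        for column, value in row.items():
            self.bitmaps[(column, value)] |= 1 << row_id
        self.num_rows += 1
        return row_id

    def query_and(self, **conditions):
        """Return ids of rows matching every column=value condition."""
        result = (1 << self.num_rows) - 1  # start with all rows selected
        for column, value in conditions.items():
            result &= self.bitmaps.get((column, value), 0)
        return [i for i in range(self.num_rows) if (result >> i) & 1]


index = BitmapIndex()
index.add_row({"gender": "f", "city": "Shenzhen"})  # row 0
index.add_row({"gender": "m", "city": "Beijing"})   # row 1
index.add_row({"gender": "f", "city": "Beijing"})   # row 2
index.add_row({"gender": "f", "city": "Shenzhen"})  # row 3

matches = index.query_and(gender="f", city="Shenzhen")
print(matches)
```

With few distinct values per column, the number of bitmaps stays small, and each query touches only the bitmaps named in its conditions.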
|
Large scale data near-line loading method and architecture
Shuaifeng Zhou
When we load data into HBase in real time, we use the put/putlist interface. After receiving a put request, the regionserver writes the WAL, writes the data into the memory store, flushes the memory store to the disk store, and then compacts files again and again. That procedure consumes too many resources and degrades read/write performance. To solve the problem, we provide a near-line loading method and architecture that greatly increases the loading bandwidth and reduces the impact on read operations.
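The cost of the put path described above (WAL write, memory store write, flush, repeated compaction) can be illustrated with a toy model. This is a minimal sketch and not HBase code: the class `ToyRegion`, its thresholds, and its "merge every file" compaction policy are assumptions chosen for illustration. It counts how many cells are written to disk in total for a given number of puts.

```python
class ToyRegion:
    """Toy model of the put path: WAL -> memory store -> flush -> compaction.

    Hypothetical illustration; thresholds and policy are not HBase defaults.
    """

    def __init__(self, flush_size=4, compact_at=3):
        self.flush_size = flush_size    # cells buffered in memory before a flush
        self.compact_at = compact_at    # store-file count that triggers compaction
        self.wal = []                   # append-only log of every put
        self.memstore = {}              # in-memory buffer, key -> value
        self.store_files = []           # each file is a sorted list of (key, value)
        self.cells_written_to_disk = 0  # WAL + flush + compaction writes

    def put(self, key, value):
        self.wal.append((key, value))   # 1. write the WAL
        self.cells_written_to_disk += 1
        self.memstore[key] = value      # 2. write the memory store
        if len(self.memstore) >= self.flush_size:
            self._flush()

    def _flush(self):
        # 3. flush the memory store to a new store file
        self.store_files.append(sorted(self.memstore.items()))
        self.cells_written_to_disk += len(self.memstore)
        self.memstore = {}
        if len(self.store_files) >= self.compact_at:
            self._compact()

    def _compact(self):
        # 4. rewrite all store files into one; newer files override older ones
        merged = {}
        for store_file in self.store_files:
            merged.update(store_file)
        self.store_files = [sorted(merged.items())]
        self.cells_written_to_disk += len(merged)


region = ToyRegion()
num_puts = 24
for i in range(num_puts):
    region.put(f"row{i:02d}", i)

amplification = region.cells_written_to_disk / num_puts
print(region.cells_written_to_disk, round(amplification, 2))
```

In this model, 24 puts cause 80 cell writes, because the same data is rewritten by each compaction. A loading path that produces store files directly would avoid most of those rewrites, which is the general motivation the abstract gives for near-line loading.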
|
HBase at JD
Xingbo Peng, Nan Zhang and Bang Wen
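1. Current scale
HBase has been developing for several years within JD's CTO organization. The clusters have grown to 3000+ machines and support 600+ JD business systems, and they have been through multiple 618 and Double 11 shopping festivals. JD's CTO organization is an important HBase user.
2. Business scenarios
An introduction to the typical business uses of HBase at JD, including monitoring, risk control, recommendation, and advertising.
3. High-availability improvements
An introduction to our work on HBase cluster high availability, including cross-datacenter disaster recovery, multi-tenancy with resource groups, and cluster security.
4. Operations practice
An introduction to our practices in operating HBase clusters, including the HBase cluster monitoring system Mummut, the alerting system, integration of HBase clusters with the big data platform, and business operations and data migration.
5. Future outlook
An introduction to the HBase-based work we are doing now and plan to do, including Kylin, Phoenix, and containerized deployment.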
|
Synchronous replication for HBase
Shen Chunhui and Meng Qingyi
This talk will share the detailed implementation and practical experience of synchronous replication between clusters on Alibaba's internal HBase branch.
It covers how to keep the data consistent, how to switch client access between clusters automatically, and the related performance and monitoring.
|
An enterprise big data platform based on HBase
Xinyu Zhang, Xueliang Chen and Zheng Fan
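The HBase-based big data platform has become a very important foundational data platform in China Life's new-generation integrated business processing system. The platform has so far consolidated hundreds of terabytes of data, integrated the customer, business, and contact data of several hundred million customers into one unified data model, and produced thousands of customer tags on that basis. The platform also provides customers, sales agents, and internal managers with applications for sales support, customer service, operations support, and more. It offers retrieval and querying of many kinds of information through apps, web pages, and other channels, and delivers data applications such as anti-fraud through deep learning models.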
|
HBase usage and practice at Hulu
Qianxi Zhang
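1. Hulu is one of the most popular online video sites in the United States, and Hulu Beijing is Hulu's second-largest R&D center. The Beijing big data infrastructure team is responsible for developing and operating the big data infrastructure of the whole company.
2. Overview of HBase at Hulu
3. How HBase is used at Hulu
4. The user profile system stores all users' basic information, user behavior, third-party DMP data, and machine learning result tags (hundreds of thousands of qualifiers). Spark and Spark Streaming read and write HBase data and run various machine learning models, serving the company's video recommendation, targeted advertising, and marketing teams.
5. HBase optimizations at Hulu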
|
Apache HBase at Netease
Xinxin Fan and Hongxiang Jiang
First, we will give a brief introduction to the HBase service at Netease, including basic cluster information and the key HBase services. Then we will share some tips from our HBase tuning practice. Last, we will introduce some improvements in our internal HBase version.
|
Building online HBase cluster of Zhihu based on Kubernetes
Zhiyong Bai
Zhihu uses HBase, a high-performance and scalable key-value database, to provide an online data store alongside MySQL and Redis. Zhihu's platform team has accumulated some experience with container technology, and this time, based on Kubernetes, we built a flexible platform for online HBase systems that quickly creates multiple logically isolated HBase clusters on a shared physical cluster and provides customized service for different business needs. Combined with Consul and a DNS server, we implement highly available access to HBase using a client written mainly in Python. This presentation mainly shares the architecture of the online HBase platform at Zhihu and some practical experience from the production environment.
|
Contact:
Business and sponsorship inquiries about the conference: Zhang Chong zhangchong1@huawei.com