One Line of Code to Connect Python to Any Database for Data Analysis
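Commonly used databases include MySQL, Presto, Hive, Druid, Kylin, Spark, and Elasticsearch. As a data analyst facing all of these different systems, do you ever find it a headache? Don't worry: to connect to any of them from Python you only need one trick, and about 5 lines of code.
Most SQL boys and SQL girls care about one thing only: submit my SQL to one of these databases and hand me back a pandas DataFrame. So the only required inputs are the SQL itself and the database connection details (host address, port, username, and password).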
- from sqlalchemy import create_engine
- import pandas as pd
- # Database connection URI
- engine = create_engine("mysql://root:123456@127.0.0.1:3306/database")
- # The SQL the user wants to run
- sql = "select * from users limit 10"
- df = pd.read_sql_query(sql, engine)
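To try the pattern without a database server, here is a minimal sketch that swaps in an in-memory SQLite database. The `users` table and its two columns are made up for illustration. The query is wrapped in `text()` and run on an explicit connection, which is one way to keep the call working across different pandas and SQLAlchemy versions; it is not the only way.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Assumption: in-memory SQLite stands in for a real MySQL/Presto/Druid server.
engine = create_engine("sqlite:///:memory:")

with engine.connect() as conn:
    # Hypothetical sample table so the query has something to return.
    conn.execute(text("create table users (id integer, name text)"))
    conn.execute(text("insert into users values (1, 'ann'), (2, 'bob')"))

    # Same idea as the snippet above: SQL in, DataFrame out.
    sql = "select * from users limit 10"
    df = pd.read_sql_query(text(sql), conn)

print(df)
```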
presto
- # presto
- uri = "presto://username:password@127.0.0.1:8080/database?source=pyhive"
- sql = "select * from users limit 10"
- df = pd.read_sql_query(sql, create_engine(uri))
mysql
- # mysql
- uri = "mysql://root:123456@127.0.0.1:3306/database"
- sql = "select * from users limit 10"
- df = pd.read_sql_query(sql, create_engine(uri))
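A password such as `123456` can be pasted straight into the URI, but one containing characters like `@` or `/` has to be percent-encoded first or the URI will be parsed incorrectly. The sketch below assembles a URI from its parts using only the standard library; `build_uri` and the sample password are hypothetical names chosen for illustration.

```python
from urllib.parse import quote


def build_uri(dialect, user, password, host, port, database):
    """Assemble a connection URI, percent-encoding the user and password."""
    # safe="" makes quote() encode every reserved character, including "/".
    user_enc = quote(user, safe="")
    password_enc = quote(password, safe="")
    return f"{dialect}://{user_enc}:{password_enc}@{host}:{port}/{database}"


# Hypothetical password containing characters that would break a raw URI.
uri = build_uri("mysql", "root", "p@ss/word", "127.0.0.1", 3306, "database")
print(uri)
```

The resulting string can be passed to `create_engine` in the same way as the hand-written URIs.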
druid
- # druid
- uri = "druid://<User>:<password>@<Host>:<Port-default-9088>/druid/v2/sql"
- sql = "select count(*) from users where _time> TIME_SHIFT...."
- df = pd.read_sql_query(sql, create_engine(uri))
More database connection formats:
| Database | Example |
| --- | --- |
| Apache Druid | druid://<User>:<password>@<Host>:<Port-default-9088>/druid/v2/sql |
| Apache Hive | hive://hive@{hostname}:{port}/{database} |
| Apache Kylin | kylin://<username>:<password>@<hostname>:<port>/<project>?<param1>=<value1>&<param2>=<value2> |
| Apache Spark SQL | hive://hive@{hostname}:{port}/{database} |
| ClickHouse | clickhouse://{username}:{password}@{hostname}:{port}/{database} |
| ElasticSearch | elasticsearch+http://{user}:{password}@{host}:9200/ |
| Presto | presto://{user}@{host}:{port}/{database}?source={source} |
| MySQL | mysql://<UserName>:<DBPassword>@<Database Host>/<Database Name> |
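Practically any database on the market can be handled with the same routine, as long as it has a SQLAlchemy dialect and a matching Python driver. No extra thought required; it is simple and worry-free.
At its core, it takes just one line of code: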
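Whether a given URI scheme is usable depends on what is installed in the current environment. As a rough check, `create_engine` raises `NoSuchModuleError` when no dialect is registered for the scheme, and an `ImportError` when the dialect exists but its driver package is missing. The helper name `can_create_engine` below is hypothetical, and `nosuchdb` is a deliberately fake scheme.

```python
from sqlalchemy import create_engine
from sqlalchemy.exc import NoSuchModuleError


def can_create_engine(uri):
    """Return True if a dialect and driver for this URI can be loaded."""
    try:
        # create_engine is lazy: it loads the dialect and driver
        # but does not open a network connection.
        create_engine(uri)
    except NoSuchModuleError:
        return False  # no SQLAlchemy dialect registered for this scheme
    except ImportError:
        return False  # dialect is known, but the Python driver is not installed
    return True


print(can_create_engine("sqlite:///:memory:"))
print(can_create_engine("nosuchdb://host/db"))
```

A `False` result usually means a package needs to be installed, for example PyHive for `presto://` and `hive://`, or pydruid for `druid://`.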
- df = pd.read_sql_query(sql, create_engine(uri))
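Calling `create_engine(uri)` inside every query builds a fresh engine, with its own connection pool, each time. For repeated queries, one possible refinement is to cache one engine per URI. The names `get_engine` and `run_sql` are hypothetical, and SQLite is used here only so the sketch runs without a server.

```python
from functools import lru_cache

import pandas as pd
from sqlalchemy import create_engine, text


@lru_cache(maxsize=None)
def get_engine(uri):
    """Create the engine for a URI once and reuse it afterwards."""
    return create_engine(uri)


def run_sql(sql, uri):
    """Submit SQL to the database behind `uri` and return a DataFrame."""
    with get_engine(uri).connect() as conn:
        # Note: text() treats ":name" in the SQL as a bound parameter.
        return pd.read_sql_query(text(sql), conn)


# Assumption: in-memory SQLite stands in for a real database URI.
df = run_sql("select 1 as answer", "sqlite:///:memory:")
print(df)
```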