
Automating Data Analysis: The Magic of LIDA's Intelligent Visualization!

Published 2024-11-1 09:09

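01 Overview

In this data-driven era, we generate and process massive amounts of data every day. Extracting valuable information from that data and presenting it in an intuitive, easy-to-understand way has become an important problem. This article introduces a powerful tool, Language-Integrated Data Analysis (LIDA), which automates the creation of visualizations and puts data insights within easy reach.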


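02 Core Features of LIDA

Grammar-agnostic visualization

Whether you work in Python, R, or C++, LIDA helps you produce visual output without being locked into a particular programming language. This flexibility lets users from different programming backgrounds get started easily.

Multi-stage generation pipeline

LIDA guides users through complex datasets with a seamless workflow that runs from data summarization to visualization creation.

Hybrid user interface

LIDA offers both direct manipulation and a multilingual natural-language interface, making it accessible to a broad audience, from data scientists to business analysts. Users can interact through natural-language commands, which makes data visualization intuitive and simple.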

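03 LIDA's Architecture

LIDA's architecture consists of the following key components:

  • Summarizer: converts a dataset into a concise natural-language description that covers all column names, distributions, and similar information.
  • Goal Explorer: identifies potential visualization or analysis goals from the dataset and generates as many goals as the user asks for.
  • Viz Generator: automatically generates the code that creates a visualization, based on the dataset context and the specified goal.
  • Infographer: creates, evaluates, refines, and executes visualization code to produce a fully stylized specification.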
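
To make the data flow between these components concrete, here is a minimal sketch of the first three stages chained together. Everything in it is a hypothetical stand-in: the function names `summarize`, `explore_goals`, and `generate_code` and the tiny CSV sample are invented for illustration, simple rules take the place of the LLM calls, and the Infographer stage is left out. It is not how LIDA is implemented.

```python
import csv
import io

# Hypothetical stand-ins for the components above; none of this is LIDA's real code.

def summarize(csv_text):
    """Summarizer stand-in: map each column name to a rough type."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for name in rows[0]:
        try:
            [float(row[name]) for row in rows]  # succeeds only for numeric columns
            summary[name] = "number"
        except ValueError:
            summary[name] = "category"
    return summary

def explore_goals(summary, n):
    """Goal Explorer stand-in: propose one goal per column, capped at n goals."""
    goals = []
    for name, kind in summary.items():
        chart = "histogram" if kind == "number" else "bar chart"
        goals.append({
            "question": f"What is the distribution of {name}?",
            "visualization": f"{chart} of {name}",
        })
    return goals[:n]

def generate_code(goal):
    """Viz Generator stand-in: emit a placeholder code string for the goal."""
    return f"# plot: {goal['visualization']}"

sample = "Type,Horsepower\nSUV,200\nSedan,150\nSUV,250\n"  # made-up data
summary = summarize(sample)
goals = explore_goals(summary, n=2)
code = [generate_code(goal) for goal in goals]
print(code)
```

The point of the sketch is only the hand-off: each stage consumes the previous stage's output, so a compact summary is all the later stages ever see of the data.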


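04 Main Capabilities of LIDA

  • Data summarization: LIDA compresses large datasets into dense natural-language summaries that serve as the basis for later operations.
  • Automated data exploration: LIDA provides a fully automated mode that generates meaningful visualization goals from an unfamiliar dataset.
  • Infographic generation: uses image generation models to turn data into stylized, engaging infographics for personalized storytelling.
  • VizOps (visualization operations): detailed operations on generated visualizations that improve accessibility, data literacy, and debugging.
  • Visualization explanation: provides in-depth descriptions of visualization code to support accessibility, education, and understanding.
  • Self-evaluation: uses large language models (LLMs) to produce multi-dimensional evaluation scores for a visualization, based on best practices.
  • Visualization repair: automatically improves or repairs a visualization using self-evaluation or user-provided feedback.
  • Visualization recommendation: recommends additional visualizations based on the context or on existing visualizations, for comparison or added perspective.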


05 LIDA in Practice

Installation

Install with pip:

pip install lida

# Set the corresponding API key
export OPENAI_API_KEY=<API_KEY>

You can also manage the API key with a .env file:

from dotenv import load_dotenv
import os

# Read the .env file
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
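
One thing worth guarding against: `os.getenv` returns `None` when the variable is not set, and wrapping that in `str(...)` silently produces the string "None". The helper below is a small sketch of a fail-fast check; the name `require_api_key` and its `env` parameter are made up for this example and are not part of LIDA or python-dotenv.

```python
import os

def require_api_key(name="OPENAI_API_KEY", env=None):
    """Return the key stored under `name`, or raise a clear error if it is missing."""
    env = os.environ if env is None else env  # `env` is injectable so the sketch is testable
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it or add it to .env")
    return key

# Usage with an explicit mapping, so the example runs without a real key.
print(require_api_key(env={"OPENAI_API_KEY": "sk-example"}))
```

Calling it once right after `load_dotenv()` turns a confusing authentication failure later on into an immediate, readable error.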

LIDA Features in Detail

  • Initialization

from lida import Manager, TextGenerationConfig, llm
from lida.utils import plot_raster 
import warnings
from dotenv import load_dotenv
import os

load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

warnings.filterwarnings("ignore")

# Initialize LIDA
lida = Manager(text_gen = llm("openai", api_key=str(OPENAI_API_KEY))) # !! input your openai or other LLM api key
textgen_config = TextGenerationConfig(n=1, temperature=0.5, model="gpt-3.5-turbo-0301", use_cache=True)

lida.Manager is the controller in the LIDA library and is where the LLM type is set. lida.TextGenerationConfig holds the detailed generation settings: the number of generations n, the temperature (how much the generated output varies), the model, and use_cache.
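
If the temperature setting feels abstract, the following sketch shows the general idea behind it: scores are divided by the temperature before being turned into probabilities, so a lower value concentrates probability on the top candidate. The logits are made-up numbers and `softmax_with_temperature` is a name invented here; this illustrates the common technique, not LIDA's or any provider's internals. Note that the sketch only accepts positive values, whereas APIs generally treat a temperature of 0 as "always pick the most likely token".

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
for t in (0.2, 0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

This is why the article uses lower temperatures (0.2, then 0) for code generation and recommendation, where repeatable output matters more than variety.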

  • Loading data

import pandas as pd

# This walkthrough uses the officially recommended dataset, cars
data = pd.read_csv("https://raw.githubusercontent.com/uwdata/draco/master/data/cars.csv")
data.head()


  • Data summary

Generates a brief summary of the dataset. For each column, the summary contains std, min, max, samples, unique, semantic_type, and description.

# Data summary: generate a short summary from the dataset
summary = lida.summarize("https://raw.githubusercontent.com/uwdata/draco/master/data/cars.csv", summary_method="default", textgen_config=textgen_config)

print(summary)
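
To get a feel for what such a per-column record holds, here is a standard-library sketch that computes the statistical fields listed above for a single column. The function `summarize_column`, its parameters, and the sample values are assumptions made for this example; `semantic_type` and `description` are simply left blank because nothing in this sketch can infer them. LIDA's actual summarizer may compute these fields differently.

```python
import random
from statistics import stdev

def summarize_column(name, values, n_samples=3, seed=0):
    """Build a per-column record with the statistical fields named above."""
    unique_values = sorted(set(values))
    rng = random.Random(seed)  # fixed seed so the sampled values are reproducible
    record = {
        "column": name,
        "unique": len(unique_values),
        "samples": rng.sample(unique_values, min(n_samples, len(unique_values))),
        "semantic_type": "",  # left blank in this sketch
        "description": "",    # left blank in this sketch
    }
    try:
        numbers = [float(v) for v in values]
    except ValueError:
        return record  # non-numeric column: no std/min/max
    if len(numbers) >= 2:  # the sample standard deviation needs two or more values
        record["std"] = stdev(numbers)
    record["min"] = min(numbers)
    record["max"] = max(numbers)
    return record

print(summarize_column("Horsepower", ["200", "150", "250"]))
print(summarize_column("Type", ["SUV", "Sedan", "SUV"]))
```

Records like these are small regardless of how many rows the dataset has, which is what makes them a practical stand-in for the raw data in a prompt.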


  • Goal generation

Goals are generated from the data summary; each one includes an Index, Question, Visualization, and Rationale.

# Goal generation: derive visualization goals from the data summary; n=3 means generate 3 goals
goals = lida.goals(summary, n=3, textgen_config=textgen_config)

# Inspect the goals that will be visualized
for goal in goals:
    print("=" * 20)
    print(f"Question: {goal.index}")
    # print the question, visualization and rationale with each goal
    print(goal.question)
    print(goal.visualization)
    print(goal.rationale)

# Output
====================
Question: 0
What is the distribution of Retail_Price?
histogram of Retail_Price
This tells about the spread of prices of cars in the dataset.
====================
Question: 1
What is the distribution of Engine_Size__l_ among different car types?
box plot of Engine_Size__l_ for each car type
This will help in identifying if there is any difference in engine size among different car types.
====================
Question: 2
What is the relationship between Horsepower_HP_ and City_Miles_Per_Gallon?
scatter plot of Horsepower_HP_ vs City_Miles_Per_Gallon
This will help in identifying if there is any correlation between horsepower and fuel efficiency of cars.

  • Generating visualizations

Charts are generated automatically from each goal's visualization suggestion.

library = "matplotlib"  # options: "altair", "seaborn", "plotly", "matplotlib"

textgen_config = TextGenerationConfig(n=1, temperature=0.2, use_cache=True)
for i in range(len(goals)):
    # print the question, visualization and rationale with each goal
    print("Question: ", goals[i].question)
    print("Visualization: ", goals[i].visualization)
    print("Rationale: ", goals[i].rationale)
    charts = lida.visualize(summary=summary, goal=goals[i], textgen_config=textgen_config, library=library)
    plot_raster(charts[0].raster)
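
The explanation output later in this article describes the generated code as a plot() function that takes the data as input. As a rough sketch of what "executing generated code" can look like, the helper below runs a code string in its own namespace, calls the plot() it defines, and returns any error as text instead of crashing. The name `run_generated_code` and both code strings are invented for illustration; this is not LIDA's executor. Keep in mind that exec runs arbitrary code, so model-written code should only be executed in an environment you are willing to expose to it.

```python
def run_generated_code(code, data):
    """Execute a code string that defines plot(data); return (result, error message)."""
    namespace = {}
    try:
        exec(code, namespace)  # defines plot() inside an isolated namespace
        return namespace["plot"](data), None
    except Exception as error:  # report the failure instead of crashing
        return None, f"{type(error).__name__}: {error}"

# Two made-up "generated" snippets: one that works and one with a bug.
good_code = "def plot(data):\n    return sorted(data)\n"
bad_code = "def plot(data):\n    return data[missing]\n"

print(run_generated_code(good_code, [3, 1, 2]))
print(run_generated_code(bad_code, [3, 1, 2]))
```

Capturing the error message rather than raising is what makes a repair loop possible: the message can be handed back as feedback for another attempt.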


  • Chart editing

Edit charts with natural-language instructions, for example the color, the font size, or even the typeface. (This feels really handy when writing papers or research reports XD)

# Change the chart color and font size
instructions = ["change the color to red", "scale the word size to 50%"]

edited_charts = lida.edit(code=charts[0].code, summary=summary, instructions=instructions)
plot_raster(edited_charts[0].raster)


  • Visualization explanation

code = charts[0].code
explanations = lida.explain(code=code, library=library, textgen_config=textgen_config)

for row in explanations[0]:
    print(row["section"], " ** ", row["explanation"])
    
# Output
accessibility ** The code creates a scatter plot using the matplotlib.pyplot library to visualize the relationship between two variables - Horsepower_HP_ and City_Miles_Per_Gallon. The plot is colored blue with an alpha value of 0.5 to show the density of the data points. The x-axis is labeled 'Horsepower_HP_' and the y-axis is labeled 'City_Miles_Per_Gallon'. The title of the plot is 'What is the relationship between Horsepower_HP_ and City_Miles_Per_Gallon?'.
transformation ** There is no data transformation happening in this code. The plot is made using the original data as it is.
visualization ** The code first imports the required libraries - matplotlib.pyplot and pandas. The function plot() takes a pandas DataFrame as input and creates a scatter plot using the plt.scatter() method. The x-axis of the plot is the 'Horsepower_HP_' column of the input DataFrame and the y-axis is the 'City_Miles_Per_Gallon' column of the input DataFrame. The alpha parameter controls the transparency of the data points and the color parameter sets the color of the data points. The plt.xlabel() and plt.ylabel() methods add labels to the x-axis and y-axis respectively. The plt.title() method adds a title to the plot. The wrap parameter in plt.title() is set to True to wrap the title text if it exceeds the width of the plot. Finally, the function returns the plot object.

  • Visualization evaluation and repair

Evaluates whether a visualization has problems. The scoring dimensions are bugs, transformation (data transformation), compliance, type (chart type), encoding, and aesthetics. The most surprising part is that it can even score aesthetics XDD

evaluations = lida.evaluate(code=code, goal=goals[i], library=library)[0]
for evaluation in evaluations:  # renamed from `eval` to avoid shadowing the built-in
    print(evaluation["dimension"], "Score", evaluation["score"], "/ 10")
    print("\t", evaluation["rationale"][:200])
    print("\t**********************************")

# Output
bugs Score 10 / 10
    No bugs, syntax errors, or typos found.
    **********************************
transformation Score 10 / 10
    No data transformation needed for a scatter plot.
    **********************************
compliance Score 8 / 10
    The code meets the specified visualization goal, but the title could be improved by removing the question mark and rephrasing it as a statement.
    **********************************
type Score 9 / 10
    A scatter plot is an appropriate visualization type for exploring the relationship between two continuous variables.
    **********************************
encoding Score 9 / 10
    The data is encoded appropriately with Horsepower_HP_ on the x-axis and City_Miles_Per_Gallon on the y-axis.
    **********************************
aesthetics Score 9 / 10
    The aesthetics of the visualization are appropriate with a blue color and an alpha of 0.5 to show overlapping points.
    **********************************
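
Once the scores are in hand, a natural next step is deciding which dimension to repair first. The sketch below averages the scores and lists the dimensions that fall under a threshold. The numbers are copied from the sample output above, while the helper `weakest_dimensions` and the threshold of 9 are choices made for this example, not something LIDA provides.

```python
# Scores copied from the sample output above; rationales omitted for brevity.
evaluations = [
    {"dimension": "bugs", "score": 10},
    {"dimension": "transformation", "score": 10},
    {"dimension": "compliance", "score": 8},
    {"dimension": "type", "score": 9},
    {"dimension": "encoding", "score": 9},
    {"dimension": "aesthetics", "score": 9},
]

def weakest_dimensions(evaluations, threshold):
    """Return (average score, dimensions scoring below threshold, lowest first)."""
    average = sum(e["score"] for e in evaluations) / len(evaluations)
    weak = sorted(
        (e for e in evaluations if e["score"] < threshold),
        key=lambda e: e["score"],
    )
    return average, [e["dimension"] for e in weak]

average, weak = weakest_dimensions(evaluations, threshold=9)
print(round(average, 2), weak)
```

For this sample only compliance falls under the threshold, which matches the rationale about rephrasing the chart title.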

  • Visualization recommendation

Generates the requested number of recommended charts, chosen by the LLM, based on the context in the summary.

textgen_config = TextGenerationConfig(n=1, temperature=0, use_cache=True)
recommended_charts = lida.recommend(code=code, summary=summary, n=3, textgen_config=textgen_config)

print(f"Recommended {len(recommended_charts)} charts")
for chart in recommended_charts:
    plot_raster(chart.raster)


  • Custom chart generation

# First import the Goal class from lida.datamodel
from lida.datamodel import Goal

# A Goal has 4 fields: index, question, visualization and rationale
custom_goal = Goal(
    index=0,
    question="What is the distribution of the Type?",
    visualization="Bar Chart",
    rationale="The type of the car is an important feature of the dataset."
)
# Generate the chart
custom_chart = lida.visualize(summary=summary, goal=custom_goal, textgen_config=textgen_config, library=library)
plot_raster(custom_chart[0].raster)
# Edit the custom chart
custom_instructions = ["change the color to blue tone on tone color"]  # change the color of the bar chart
edited_custom_charts = lida.edit(code=custom_chart[0].code, summary=summary, instructions=custom_instructions)
plot_raster(edited_custom_charts[0].raster)

Web UI

LIDA also ships an official Web UI that lets you upload your own data for analysis. Usage:

pip install lida 

export OPENAI_API_KEY=<your key> 

lida ui --port=8080 --docs



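Notes:

  • Dataset size: LIDA is currently best suited to small datasets, because current LLMs cannot handle very long inputs (token length limits).
  • LLM choice: LIDA works best with GPT-3.5 and GPT-4, because summarizing high-dimensional data and reasoning over it still need a fairly large LLM to give good results.
  • Reliability: the paper reports an error rate below 3.5%, but you should still double-check that the generated charts are reasonable.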
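
For the dataset-size note, one possible workaround is to down-sample rows before handing the data over. The sketch below keeps at most `max_rows` rows, chosen at random with a fixed seed and returned in their original order. The helper `sample_rows` and the row budget of 50 are assumptions for illustration, and whether a random sample preserves the patterns you care about depends on the data.

```python
import random

def sample_rows(rows, max_rows, seed=0):
    """Return at most max_rows rows, chosen reproducibly and kept in original order."""
    if len(rows) <= max_rows:
        return list(rows)
    rng = random.Random(seed)  # fixed seed: the same rows are picked on every run
    keep = sorted(rng.sample(range(len(rows)), max_rows))
    return [rows[i] for i in keep]

rows = [{"id": i} for i in range(1000)]  # made-up dataset
small = sample_rows(rows, max_rows=50)
print(len(small))
```

The reduced list can then be written to a CSV file or DataFrame and summarized in place of the full dataset.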

References:

  1. https://github.com/microsoft/lida
  2. https://microsoft.github.io/lida/


This article is reposted from the WeChat official account Halo咯咯. Author: 基咯咯

Original link: https://mp.weixin.qq.com/s/smeYr8cUi3yqXYm4jBz7Wg
