LLM Agents in Practice: Building a Plotly Data Visualization Agent

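If you have tried an LLM such as ChatGPT, you know it can generate code for almost any language or package. Relying on the LLM alone has limits, though. For data visualization tasks, we need to supply the following:

Data description: The model has no inherent knowledge of the dataset's details, such as column names and row-level specifics. Supplying this information by hand is tedious, especially as datasets grow. Without this context, the LLM may hallucinate or invent column names, which leads to errors in the visualization.

Styling and preferences: Data visualization is an art form, and everyone has their own aesthetic preferences, which vary by chart type and by the information shown. Restating style preferences for every chart is tedious. An agent equipped with styling information streamlines this and keeps the visual output consistent and personalized.

Attaching all of this to every interaction with the LLM makes requests overly complex and awkward for the user to write. So here we build a data visualization agent: with it, we only need to provide very little information to have the LLM generate customized charts.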

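Choosing a visualization library

When building an AI agent for data visualization, choosing the right visualization tool is crucial. Many tools exist, but Plotly and Matplotlib are two of the most commonly used. To build a visualization interface that is both feature-rich and user-friendly, we chose Plotly as the main visualization library.

Compared with Matplotlib, Plotly offers richer interactivity. It renders dynamically in the web browser, so users can explore the data by zooming, panning, and hovering. This interactivity is one of Plotly's main strengths, especially when presenting complex datasets or doing in-depth analysis.

Matplotlib is widely used in scientific research and academic publications, and it is extremely flexible and precise for producing high-quality static figures. However, its limits in interactivity and web integration make it less attractive than Plotly for building modern, interactive visualization solutions.

We therefore chose Plotly as the tool for our data visualization agent: it meets users' need for interactivity and provides a good user experience, which makes the analysis more intuitive and easier to understand.

Below is the visualization baseline I built with Llama3 70B.

Running the code above gives the following result.

To build this application, we need to equip the LLM agent with two tools: one that provides information about the dataset, and another that contains styling information.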

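Information supplied to the agent

1. DataFrame information

The purpose of this tool is to analyze the DataFrame and store information about its contents in an index. The indexed data includes each column's name and data type, plus the minimum, maximum, and mean of its values. This helps the agent understand what kinds of variables it is working with.

Here we use data from layoffs.fyi for the analysis.

We also do some pre-processing: converting data to appropriate types (for example, numeric strings to integers or floats) and replacing null values with 0.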

# Optional pre-processing
import pandas as pd
import numpy as np

df = pd.read_csv('WARN Notices California_Omer Arain - Sheet1.csv')

# Convert date-like columns to datetime
df['Received Date'] = [pd.to_datetime(x) for x in df['Received Date']]
df['Effective Date'] = [pd.to_datetime(x) for x in df['Effective Date']]
# Convert numbers stored as strings (e.g. "1,200") to ints; keep missing values as NaN
df['Number of Workers'] = [int(str(x).replace(',', '')) if str(x) != 'nan' else np.nan for x in df['Number of Workers']]
# Replace null values with 0
df = df.replace(np.nan, 0)
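To see what this pre-processing does, here is a minimal sketch on a tiny made-up DataFrame. The rows are invented for illustration; only the column names come from the code above. It uses the column-wise `pd.to_datetime` and `pd.to_numeric` calls, which are one possible alternative to the list comprehensions rather than what the author necessarily ran.

```python
import pandas as pd

# Hypothetical sample rows; only the column names match the real dataset
toy = pd.DataFrame({
    "Received Date": ["2023-01-05", "2023-02-10", None],
    "Number of Workers": ["1,200", "85", None],
})

# Parse date strings; missing entries become NaT
toy["Received Date"] = pd.to_datetime(toy["Received Date"])

# Strip thousands separators, then convert to numbers; missing entries become NaN
toy["Number of Workers"] = pd.to_numeric(
    toy["Number of Workers"].str.replace(",", "", regex=False)
)

# Replace missing worker counts with 0, mirroring the article's null handling
toy["Number of Workers"] = toy["Number of Workers"].fillna(0)

print(toy.dtypes)
print(toy)
```

Filling only the numeric column keeps the date column as a proper datetime type, which is a design choice of this sketch and differs from the blanket `df.replace(np.nan, 0)` above.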

Storing the dataset information in an index

from llama_index.core.readers.json import JSONReader
from llama_index.core import VectorStoreIndex
import datetime
import json

import numpy as np

# Summarize one column: max, min and mean for numeric and datetime columns,
# the most frequent values for everything else
def return_vals(df, c):
    first = df[c].iloc[0]
    if isinstance(first, (int, float, np.integer, np.floating)):
        # Cast to plain floats so json.dump can serialize NumPy scalars
        return [float(max(df[c])), float(min(df[c])), float(np.mean(df[c]))]
    # For datetime we need to store that information as strings
    elif isinstance(first, datetime.datetime):
        return [str(max(df[c])), str(min(df[c])), str(np.mean(df[c]))]
    else:
        # For categorical variables, store the top 10 most frequent items and their frequency
        return {str(k): int(v) for k, v in df[c].value_counts()[:10].items()}

# Collect the column name, data type and content summary for every column
dict_ = {}
for c in df.columns:
    dict_[c] = {'column_name': c, 'type': str(type(df[c].iloc[0])), 'variable_information': return_vals(df, c)}

# Write the information to dataframe.json so that it can be loaded
# into a llama-index Document
with open("dataframe.json", "w") as fp:
    json.dump(dict_, fp)

reader = JSONReader()
# Load data from the JSON file
documents = reader.load_data(input_file='dataframe.json')

# Create an index
dataframe_index = VectorStoreIndex.from_documents(documents)
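To make the shape of the indexed metadata concrete, the following sketch builds the same kind of per-column record for a small invented DataFrame and prints it as JSON, without touching llama-index. The helper name `summarize_column` and the sample rows are hypothetical, and the sketch checks pandas dtypes instead of the first value's Python type, which is an assumption of this example rather than the article's approach.

```python
import json

import pandas as pd


def summarize_column(series: pd.Series):
    """Return [max, min, mean] for numeric/datetime columns, else top-10 value counts."""
    if pd.api.types.is_numeric_dtype(series):
        return [float(series.max()), float(series.min()), float(series.mean())]
    if pd.api.types.is_datetime64_any_dtype(series):
        return [str(series.max()), str(series.min()), str(series.mean())]
    counts = series.value_counts().head(10)
    # Plain str/int so the result is JSON-serializable
    return {str(k): int(v) for k, v in counts.items()}


# Hypothetical sample data, for illustration only
toy = pd.DataFrame({
    "Company": ["Acme", "Acme", "Globex"],
    "Number of Workers": [120, 80, 40],
})

metadata = {
    col: {
        "column_name": col,
        "type": str(toy[col].dtype),
        "variable_information": summarize_column(toy[col]),
    }
    for col in toy.columns
}

print(json.dumps(metadata, indent=2))
```

Each top-level key is a column name, so a retrieved record can tell the agent both which columns exist and roughly what values they hold.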

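That completes the first step.

2. Custom styling information

The chart styles consist mainly of natural-language instructions on how to style different chart types in Plotly. Because the styles are described in natural language, some trial and error may be needed. Below are the instructions I wrote for line charts and bar charts.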

from llama_index.core import Document
from llama_index.core import VectorStoreIndex

styling_instructions = [
    Document(text="""
        Do not ignore any of these instructions.
        For a line chart always use plotly_white template, reduce x axes & y axes line to 0.2 & x & y grid width to 1.
        Always give a title and make bold using html tag axis label and try to use multiple colors if more than one line
        Annotate the min and max of the line
        Display numbers in thousand(K) or Million(M) if larger than 1000/100000
        Show percentages in 2 decimal points with '%' sign
        """),
    Document(text="""
        Do not ignore any of these instructions.
        For a bar chart always use plotly_white template, reduce x axes & y axes line to 0.2 & x & y grid width to 1.
        Always give a title and make bold using html tag axis label and try to use multiple colors if more than one line
        Always display numbers in thousand(K) or Million(M) if larger than 1000/100000. Add annotations x values
        Annotate the values on the y variable
        If variable is a percentage show in 2 decimal points with '%' sign.
        """),
    # You should fill in instructions for other charts and play around with these instructions
    Document(text="""General chart instructions
        Do not ignore any of these instructions
        always use plotly_white template, reduce x & y axes line to 0.2 & x & y grid width to 1.
        Always give a title and make bold using html tag axis label
        Always display numbers in thousand(K) or Million(M) if larger than 1000/100000. Add annotations x values
        If variable is a percentage show in 2 decimal points with '%'"""),
]

# Create an index
style_index = VectorStoreIndex.from_documents(styling_instructions)

Alternatively, you can pass snippets of styling code to the model as examples, which works very well for fixed styles.
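As a rough illustration of what such a styling snippet might contain, the sketch below expresses the general chart instructions as plain Python. The function names are hypothetical, and the thresholds of 1,000 for K and 1,000,000 for M are an assumption, since the instructions above write them as "1000/100000". The layout keys (`template`, `title`, `xaxis.linewidth`, `xaxis.gridwidth`) are standard Plotly layout attributes.

```python
def format_number(value: float) -> str:
    """Abbreviate large numbers with K or M (thresholds are assumptions)."""
    if abs(value) >= 1_000_000:
        return f"{value / 1_000_000:.1f}M"
    if abs(value) >= 1_000:
        return f"{value / 1_000:.1f}K"
    return f"{value:g}"


def format_percent(value: float) -> str:
    """Show a percentage with two decimal places and a '%' sign."""
    return f"{value:.2f}%"


def base_layout(title: str, x_label: str, y_label: str) -> dict:
    """Build a Plotly layout dict: white template, thin axis lines, bold labels."""
    axis = {"linewidth": 0.2, "gridwidth": 1}
    return {
        "template": "plotly_white",
        "title": {"text": f"<b>{title}</b>"},
        "xaxis": {**axis, "title": {"text": f"<b>{x_label}</b>"}},
        "yaxis": {**axis, "title": {"text": f"<b>{y_label}</b>"}},
    }


print(format_number(1500), format_number(2_500_000), format_percent(12.3456))
print(base_layout("Layoffs over time", "Received Date", "Number of Workers"))
```

With Plotly installed, a figure could apply the dict through `fig.update_layout(**base_layout(...))`; the text of these functions could also be stored in a `Document` as a styling example.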

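Building the AI agent

We have now built two indexes: the DataFrame information (metadata) and the custom chart styling information.

Next, we use llama-index to build query engines from the indexes and expose them to the agent as tools.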

# All imports for this section
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool
from llama_index.core.tools import ToolMetadata
from llama_index.llms.groq import Groq


# Build query engines over your indexes
# It makes sense to only retrieve one document per query
# However, you may play around with this if you need multiple charts
# Or have two or more dataframes with similar column names
dataframe_engine = dataframe_index.as_query_engine(similarity_top_k=1)
styling_engine = style_index.as_query_engine(similarity_top_k=1)

# Build the tools
query_engine_tools = [
    QueryEngineTool(
        query_engine=dataframe_engine,
        # The description helps the agent decide which tool to use
        metadata=ToolMetadata(
            name="dataframe_index",
            description="Provides information about the data in the data frame. Only use column names in this tool",
        ),
    ),
    QueryEngineTool(
        # Play around with the description to see if it leads to better results
        query_engine=styling_engine,
        metadata=ToolMetadata(
            name="Styling",
            description="Provides instructions on how to style your Plotly plots. "
            "Use a detailed plain text question as input to the tool.",
        ),
    ),
]

# I used open-source models via Groq but you can use OpenAI/Google/Mistral models as well
llm = Groq(model="llama3-70b-8192", api_key="<your_api_key>")

# Initialize the ReAct agent
agent = ReActAgent.from_tools(query_engine_tools, llm=llm, verbose=True)
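The effect of `similarity_top_k=1` can be pictured with a toy retriever. The sketch below is not how llama-index works internally: it swaps embedding similarity for a simple word-overlap score, and the function names and shortened documents are hypothetical. It only shows the idea that each query returns the single best-matching document.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve_top_k(query: str, documents: list, k: int = 1) -> list:
    """Rank documents by word overlap with the query and keep the best k."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


# Shortened stand-ins for the styling documents
docs = [
    "For a line chart always use plotly_white template and annotate the min and max of the line",
    "For a bar chart always use plotly_white template and annotate the values on the y variable",
    "General chart instructions: always give a title",
]

print(retrieve_top_k("How should I style a bar chart?", docs, k=1))
```

With `k=1` only the bar chart instructions come back, which keeps unrelated styling rules out of the agent's context; raising `k` would trade that focus for broader coverage.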

To reduce hallucinations, I adjusted the prompt slightly. This step is optional.

The default ReAct prompt is modified as follows:

from llama_index.core import PromptTemplate
 
new_prompt_txt = """You are designed to help with building data visualizations in Plotly. You may do all sorts of analyses and actions using Python
 
 ## Tools
 
 You have access to a wide variety of tools. You are responsible for using the tools in any sequence you deem appropriate to complete the task at hand.
 This may require breaking the task into subtasks and using different tools to complete each subtask.
 
 You have access to the following tools, use these tools to find information about the data and styling:
 {tool_desc}
 
 
 ## Output Format
 
 Please answer in the same language as the question and use the following format:
 
 ```
 Thought: The current language of the user is: (user's language). I need to use a tool to help me answer the question.
 Action: tool name (one of {tool_names}) if using a tool.
 Action Input: the input to the tool, in a JSON format representing the kwargs (e.g. {{"input": "hello world", "num_beams": 5}})
 ```
 
 Please ALWAYS start with a Thought.
 
 Please use a valid JSON format for the Action Input. Do NOT do this {{'input': 'hello world', 'num_beams': 5}}.
 
 If this format is used, the user will respond in the following format:
 
 ```
 Observation: tool response
 ```
 
 You should keep repeating the above format till you have enough information to answer the question without using any more tools. At that point, you MUST respond in the one of the following two formats:
 
 ```
 Thought: I can answer without using any more tools. I'll use the user's language to answer
 Answer: [your answer here (In the same language as the user's question)]
 ```
 
 ```
 Thought: I cannot answer the question with the provided tools.
 Answer: [your answer here (In the same language as the user's question)]
 ```
 
 ## Current Conversation
 
 Below is the current conversation consisting of interleaving human and assistant messages."""
 
# Wrap the prompt text in a PromptTemplate object
new_prompt = PromptTemplate(new_prompt_txt)

# Update the agent's system prompt
agent.update_prompts({'agent_worker:system_prompt': new_prompt})
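The prompt mixes single-brace placeholders (`{tool_desc}`, `{tool_names}`) with doubled braces around the JSON examples. The sketch below illustrates why, assuming the template follows Python `str.format` conventions, as the doubled braces suggest. It uses plain `str.format` on a shortened, hypothetical template rather than llama-index itself, and the tool description string is made up.

```python
# Shortened stand-in for the system prompt
template = (
    "You have access to the following tools:\n"
    "{tool_desc}\n\n"
    "Action: tool name (one of {tool_names})\n"
    'Action Input: JSON kwargs (e.g. {{"input": "hello world"}})\n'
)

# Single braces are filled in; doubled braces become literal braces
rendered = template.format(
    tool_desc="> dataframe_index: Provides information about the data in the data frame",
    tool_names="dataframe_index, Styling",
)

print(rendered)
```

If the JSON example were written with single braces, formatting would fail or treat it as a placeholder, so the doubling needs to be preserved when editing the prompt.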

Visualization

Now we can send requests to the agent we built.

response = agent.chat("Give Plotly code for a line chart for Number of Workers get information from the dataframe about the correct column names and make sure to style the plot properly and also give a title")

The output shows how the agent breaks down the request and finally responds with Python code (you can build an output parser for it, or simply copy the code and run it).
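A minimal output parser could look like the sketch below. It assumes the agent wraps its code in Markdown-style fences of three backticks, which is common for LLM output but not guaranteed; the sample response text and the function name `extract_python_code` are hypothetical.

```python
import re

FENCE = "`" * 3  # three backticks, built here to keep the example easy to read


def extract_python_code(response_text: str) -> list:
    """Return the contents of every fenced code block in the response."""
    pattern = re.compile(FENCE + r"(?:python)?\s*\n(.*?)" + FENCE, re.DOTALL)
    return [block.strip() for block in pattern.findall(response_text)]


# Hypothetical agent response, for illustration only
sample = (
    "Here is the chart code:\n"
    + FENCE + "python\n"
    + "import plotly.express as px\n"
    + "fig = px.line(df, x='Received Date', y='Number of Workers')\n"
    + FENCE + "\n"
    + "Let me know if you need changes."
)

blocks = extract_python_code(sample)
print(blocks[0])
```

In practice the input would be `str(response)` from `agent.chat(...)`. The extracted code should be reviewed before running it, since it is generated by the model.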

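Running the generated code produces a chart whose annotations, labels/title, and axis formatting match the styling information. Because the agent already has the data information, we can simply state what we want without supplying any details about the data.

Let's try another chart type and generate a bar chart.

The result is as follows:

Summary

AI agents can automate the process of collecting, cleaning, and integrating data from multiple sources. This can reduce manual processing errors and speed up data handling, leaving analysts more time to interpret data rather than wrangle it. Using an AI agent for data visualization can improve the depth and breadth of analysis while increasing efficiency and improving the user experience, helping companies and organizations make better use of their data assets.

This is only a first step. A complete agent toolkit needs many more pieces, such as automatic execution of the visualization code, prompt optimization, and handling of common failures. If you are interested, leave a comment; if there is enough interest, we will cover these in follow-up articles.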

Editor: 華軒  Source: DeepHub IMBA