How to Build a Data Science App with Streamlit and Python

Author: 黃顯東, 2021-10-29 16:18:14

[51CTO.com Quick Translation] Web applications remain a useful way for data scientists to present their data science projects to users. Many of us do not have web development skills, so we can use open-source Python libraries such as Streamlit to build web apps easily and in a short time.

1. Introduction to Streamlit

Streamlit is an open-source Python library for creating and sharing web apps for data science and machine learning projects. It helps you create and deploy a data science solution in a few minutes with a few lines of code.

Streamlit integrates seamlessly with other popular Python libraries used in data science, such as NumPy, Pandas, Matplotlib, Scikit-learn, and more.

Note: Streamlit uses React as its frontend framework to render data on the screen.

2. Installation and Setup

Streamlit requires Python >= 3.7 on your machine.
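
Before installing, it can help to confirm that your interpreter meets this requirement. The following is a minimal sketch using only the standard library; the helper name `meets_minimum` is made up for illustration, and the 3.7 threshold is simply the one stated above.

```python
import sys

# Minimum version stated in the text above; adjust it if your Streamlit release differs.
MINIMUM_PYTHON = (3, 7)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM_PYTHON):
    """Return True if the (major, minor) part of version_info is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum():
        print(f"Python {current} satisfies the >= 3.7 requirement.")
    else:
        print(f"Python {current} is too old; Streamlit needs >= 3.7.")
```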

To install Streamlit, run the following command in your terminal.

pip install streamlit

You can also check which version is installed on your machine with the following command.

streamlit --version

Streamlit, version 1.1.0

After Streamlit installs successfully, you can test the library by running the following command in your terminal.

streamlit hello

Streamlit's Hello app will appear in a new tab in your web browser.

This shows that everything is working, and we can move on to creating our first web app with Streamlit.

3. Developing the Web App

In this part, we will deploy a trained NLP model that predicts the sentiment of a movie review (positive or negative). You can access the source code and the dataset [here](https://hackernoon.com/how-to-build-and-deploy-an-nlp-model-with-fastapi-part-1-n5w35cj?ref=hackernoon.com).

The data science web app will show a text field for entering a movie review and a simple button for submitting the review and making a prediction.

Import Important Packages

The first step is to create a Python file called app.py and then import the Python packages required by both Streamlit and the trained NLP model.

    # import packages
    import streamlit as st
    import os
    import numpy as np

    from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer

    # text preprocessing modules
    from string import punctuation
    from nltk.tokenize import word_tokenize

    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer
    import re  # regular expression
    import joblib

    import warnings

    warnings.filterwarnings("ignore")
    # seeding
    np.random.seed(123)

    # load stop words (requires the NLTK "stopwords" corpus, e.g. nltk.download("stopwords"))
    stop_words = stopwords.words("english")

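Function to Clean the Review

A review can contain unnecessary words and characters that we do not need when making a prediction.

We will clean the review by removing stop words, numbers, and punctuation. Then we will convert each word to its base form using lemmatization from the NLTK package.

The **text_cleaning()** function handles all the steps needed to clean our review before making a prediction.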

    # function to clean the text
    @st.cache
    def text_cleaning(text, remove_stop_words=True, lemmatize_words=True):
        # Clean the text, with the option to remove stop words and to lemmatize words

        # Clean the text
        # Note: this first substitution already replaces apostrophes and URL
        # punctuation with spaces, so the next two patterns only see what is left.
        text = re.sub(r"[^A-Za-z0-9]", " ", text)
        text = re.sub(r"\'s", " ", text)
        text = re.sub(r"http\S+", " link ", text)
        text = re.sub(r"\b\d+(?:\.\d+)?\s+", "", text)  # remove numbers

        # Remove punctuation from text
        text = "".join([c for c in text if c not in punctuation])

        # Optionally, remove stop words
        if remove_stop_words:
            text = text.split()
            text = [w for w in text if w not in stop_words]
            text = " ".join(text)

        # Optionally, reduce words to their base form (lemmatization)
        if lemmatize_words:
            text = text.split()
            lemmatizer = WordNetLemmatizer()
            lemmatized_words = [lemmatizer.lemmatize(word) for word in text]
            text = " ".join(lemmatized_words)

        # Return the cleaned text as a single string
        return text
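
To see what this kind of cleaning does to a sentence without installing NLTK, here is a simplified sketch that uses only the standard library. It is not the article's function: `simple_clean` and the tiny `DEMO_STOP_WORDS` set are made up for illustration, and the lemmatization step is left out.

```python
import re
from string import punctuation

# A tiny, hypothetical stop-word list for illustration only;
# the article's code uses NLTK's full English list instead.
DEMO_STOP_WORDS = {"the", "a", "was", "of", "and", "it", "this", "is"}


def simple_clean(text, stop_words=DEMO_STOP_WORDS):
    """Strip punctuation, standalone numbers, and stop words from `text`."""
    # Replace every punctuation character with a space.
    text = "".join(" " if ch in punctuation else ch for ch in text)
    # Drop tokens made up only of digits.
    tokens = [tok for tok in text.split() if not tok.isdigit()]
    # Drop stop words, comparing case-insensitively.
    tokens = [tok for tok in tokens if tok.lower() not in stop_words]
    return " ".join(tokens)


print(simple_clean("This movie was great, 10 out of 10!"))  # prints: movie great out
```

The cleaned string is what a text vectorizer would see. For real predictions the same preprocessing should be applied as was used when the model was trained.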

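Prediction Function

The Python function named **make_prediction()** performs the following tasks.

1. Receive the review and clean it.

2. Load the trained NLP model.

3. Make a prediction.

4. Estimate the probability of the prediction.

5. Finally, return the predicted class and its probability.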

    # function to make a prediction
    @st.cache
    def make_prediction(review):

        # clean the data
        clean_review = text_cleaning(review)

        # load the trained model
        model = joblib.load("sentiment_model_pipeline.pkl")

        # make prediction
        result = model.predict([clean_review])

        # check probabilities: take the probability of the predicted class
        probas = model.predict_proba([clean_review])
        probability = "{:.2f}".format(float(probas[0, result[0]]))

        return result, probability

**Note:** If the trained NLP model predicts 1, the review is Positive; if it predicts 0, the review is Negative.
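
The mapping from a predicted class and its probability row to a readable message can be sketched in plain Python. The helper `describe_prediction` and the sample numbers below are hypothetical; the sketch assumes the probability row is ordered by class label, which is how scikit-learn orders `predict_proba` columns when the classes are 0 and 1.

```python
# Label names follow the note above: 1 is positive, 0 is negative.
LABELS = {0: "negative", 1: "positive"}


def describe_prediction(predicted_class, class_probabilities):
    """Build a message from a predicted class and one row of class probabilities.

    `class_probabilities` is assumed to be ordered by class label,
    so index 0 holds P(negative) and index 1 holds P(positive).
    """
    probability = class_probabilities[predicted_class]
    return "This is a {} review with a probability of {:.2f}".format(
        LABELS[predicted_class], probability
    )


# Hypothetical model output for one review: class 1 with P(class 1) = 0.87.
print(describe_prediction(1, [0.13, 0.87]))
# prints: This is a positive review with a probability of 0.87
```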

**Create the App Title and Description**

You can create the title of your web app and its description with Streamlit's title() and write() methods.

    # Set the app title
    st.title("Sentiment Analysis App")
    st.write(
        "A simple machine learning app to predict the sentiment of a movie's review"
    )

To show the web app, run the following command in your terminal.

streamlit run app.py

The web app will then open automatically in your web browser, or you can use the local URL that was created, http://localhost:8501.
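
If the page does not open, one quick check is whether anything is listening on that port. The sketch below uses only the standard library; the helper name `is_port_open` is made up for illustration, and it assumes the port 8501 shown in the URL above.

```python
import socket


def is_port_open(host="localhost", port=8501, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    if is_port_open():
        print("Something is listening on http://localhost:8501")
    else:
        print("Nothing is listening on port 8501; is `streamlit run app.py` still running?")
```

A successful connection only shows that some process accepts connections on the port, not that it is this particular app.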

Create a Form to Receive a Movie Review

The next step is to create a simple form with Streamlit. The form shows a text field for entering your review and, below it, a simple button that submits the review and triggers the prediction.

    # Declare a form to receive a movie's review
    form = st.form(key="my_form")
    review = form.text_input(label="Enter the text of your movie review")
    submit = form.form_submit_button(label="Make Prediction")

You can now see the form in the web app.

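Make Predictions and Show Results

Our last piece of code makes a prediction and shows the result whenever a user enters a movie review and clicks the "Make Prediction" button in the form.

After the button is clicked, the web app runs the **make_prediction()** function and displays the result in the web app in the browser.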

    if submit:
        # make prediction from the input text
        result, probability = make_prediction(review)

        # Display results of the NLP task
        st.header("Results")

        # result is a one-element array: 1 means positive, 0 means negative
        if int(result[0]) == 1:
            st.write("This is a positive review with a probability of ", probability)
        else:
            st.write("This is a negative review with a probability of ", probability)
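
The form can be submitted with an empty text field, in which case the model would be asked to classify an empty string. One possible guard is sketched below; `normalize_review` is a hypothetical helper that is not part of the original app.

```python
def normalize_review(raw_text, min_words=1):
    """Return the review with whitespace collapsed, or None if it has fewer than `min_words` words."""
    if raw_text is None:
        return None
    cleaned = " ".join(raw_text.split())  # collapse runs of whitespace
    if len(cleaned.split()) < min_words:
        return None
    return cleaned


print(normalize_review("   "))              # prints: None
print(normalize_review("  Loved   it!  "))  # prints: Loved it!
```

In the app, the result could be checked right after the button is clicked: show a message such as `st.warning("Please enter a review")` when it is `None`, and call `make_prediction()` otherwise.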

4. Testing the Web App

With a few lines of code, we have created a simple data science web app that receives a movie review and predicts whether it is positive or negative.

To test the web app, fill in the text field with a movie review of your choice. I added the following review of **Zack Snyder's Justice League**, released in 2021.

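> "I enjoyed the movie from beginning to end. Just like Ray Fisher said, I was hoping the movie would not end. The begging scene was thrilling, and I liked that scene very much. Unlike the movie Justice League, it shows every hero at their best in their own thing and makes us love every character. Thanks, Zack and the whole team."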

Then click the Make Prediction button and view the result.
