

A Step-by-Step Guide to Scraping JD.com Sales Data for Simple Analysis and Visualization

Author: Python进阶者, 2021-09-03 08:58:00

Big Data, Data Visualization



Preface

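Hi everyone! I'm 古月星辰, a third-year undergraduate majoring in mathematics and a Python web-scraping enthusiast. Today I'll walk you through collecting some JD data and doing a simple visual analysis of it. I hope you enjoy it.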

I. Target Data

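With the spread of mobile payments, e-commerce sites keep springing up. These sites carry a huge number of products, and the reviews their users write are even more numerous. This time we take JD.com as an example: we collect the reviews of a single product and run a simple analysis on them.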

II. Page Analysis

This is the detail page of a particular phone, listing its specifications and user reviews. The page URL is:

https://item.jd.com/10022971060622.html#none

Inspecting the page's network requests then reveals the endpoint that serves the review data, as shown in the figure below:

Its request URL is:

https://club.jd.com/comment/productPageComments.action?callback=fetchJSON_comment98&productId=10022971060622&score=0&sortType=5&page=0&pageSize=10&isShadowSku=0&fold=1

Note these two key parameters:

1. productId: every product has its own id

2. page: the page number of the reviews, starting from 0
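Since only productId and page change from one request to the next, one option is to build the URL from a parameter dict instead of editing the string by hand. This is a minimal sketch that assumes the endpoint and parameter values captured above still apply; build_comment_url is a hypothetical helper name, and only the standard library's urllib.parse.urlencode is used.

```python
from urllib.parse import urlencode

# Endpoint captured above; the parameter values below mirror that request.
BASE_URL = "https://club.jd.com/comment/productPageComments.action"

def build_comment_url(product_id, page, page_size=10):
    """Return the review URL for one product and one 0-based page (hypothetical helper)."""
    params = {
        "callback": "fetchJSON_comment98",
        "productId": product_id,  # which product
        "score": 0,
        "sortType": 5,
        "page": page,             # which page of reviews
        "pageSize": page_size,    # reviews per page
        "isShadowSku": 0,
        "fold": 1,
    }
    # urlencode keeps the dict's insertion order when building the query string
    return BASE_URL + "?" + urlencode(params)

print(build_comment_url("10022971060622", 0))
```

requests can also do this encoding for you by passing the same dict as requests.get(BASE_URL, params=params).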

III. Parsing the Data

Send a request to the review-data URL:

URL: https://club.jd.com/comment/productPageComments.action?callback=fetchJSON_comment98&productId=10022971060622&score=0&sortType=5&page=0&pageSize=10&isShadowSku=0&fold=1

Open the JSON response in json.cn (the review data is exchanged with the page as JSON), as shown in the figure below:

From this we can see that each review URL returns ten reviews, and for each review we need three fields: content, color and size (keys content, productColor and productSize). Note also the maxPage value of 100 in the figure above, i.e. 100 × 10 = 1,000 reviews.
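To see how these pieces fit together without contacting JD's server, the sketch below parses a hand-made payload with the shape described above: a fetchJSON_comment98(...) wrapper around an object holding maxPage and a comments list. The payload is invented for illustration (it is not real JD data), and unwrap_jsonp and extract_rows are hypothetical helper names.

```python
import json
import re

# Invented payload with the same shape as the real response (not actual JD data):
# a JSONP wrapper around an object holding "maxPage" and a "comments" list.
SAMPLE = (
    'fetchJSON_comment98({"maxPage": 100, "comments": ['
    '{"content": "很好用", "productColor": "白色", "productSize": "8GB+128GB"}'
    ']});'
)

def unwrap_jsonp(text):
    """Strip the 'callback(...);' wrapper and parse the JSON inside."""
    # Greedy (.*) runs to the last ')', so parentheses inside reviews are kept
    match = re.match(r"\s*[\w$.]+\((.*)\)\s*;?\s*$", text, re.S)
    if match is None:
        raise ValueError("response is not JSONP")
    return json.loads(match.group(1))

def extract_rows(data):
    """Keep the three fields used in this article: content, color and size."""
    return [[c["content"], c["productColor"], c["productSize"]]
            for c in data["comments"]]

data = unwrap_jsonp(SAMPLE)
print(data["maxPage"] * 10)  # pages x 10 reviews per page -> 1000
print(extract_rows(data))
```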

IV. The Program

1. Import the libraries

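import requests  # sends HTTP requests (simulating a browser) and returns the response
import json
import time
import openpyxl  # third-party module for working with Excel files
import random

# get_comments() below sends these headers. A browser-like User-Agent is assumed
# to be enough here; JD may also expect other headers such as Referer.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36'
}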

2. Fetch the review data

def get_comments(productId, page):
    # Review endpoint: productId picks the product, page picks the page of 10 reviews
    url = ('https://club.jd.com/comment/productPageComments.action'
           '?callback=fetchJSON_comment98&productId={0}&score=0&sortType=5'
           '&page={1}&pageSize=10&isShadowSku=0&fold=1').format(productId, page)
    resp = requests.get(url, headers=headers, timeout=10)
    resp.raise_for_status()
    # The body is JSONP, i.e. fetchJSON_comment98({...});
    # keep only what lies between the first '(' and the last ')'
    text = resp.text.strip()
    s = text[text.index('(') + 1:text.rindex(')')]
    # Convert the str into a Python dict
    return json.loads(s)

3. Get the maximum number of pages (optional)

def get_max_page(productId):
    dic_data = get_comments(productId, 0)  # request page 0 with the function above and get the response dict
    return dic_data['maxPage']

4. Extract the data

def get_info(productId):
    # Ask the server how many review pages this product has
    max_page = get_max_page(productId)
    # max_page = 10  # uncomment to limit the crawl while testing
    lst = []  # stores the extracted review data
    for page in range(0, max_page):  # one iteration per page
        # Fetch this page of reviews
        comments = get_comments(productId, page)
        comm_lst = comments['comments']  # the review list, looked up by key (10 reviews per page)
        # Walk the list and pull out each review's content, color and size
        for item in comm_lst:  # each review is itself a dict
            content = item['content']  # review text
            color = item['productColor']  # color purchased
            size = item['productSize']  # size/spec variant purchased
            lst.append([content, color, size])  # add this review to the list
        time.sleep(3)  # pause between pages so the IP is not blocked for requesting too fast
    save(lst)  # store the rows with the save() function defined below
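The program imports random but never uses it. One plausible use, and only an assumption here, is to add random jitter to the fixed 3-second pause so that requests arrive at less regular intervals. polite_pause is a hypothetical helper; its sleep and uniform parameters exist only so the delay can be tested without actually waiting.

```python
import random
import time

def polite_pause(base=3.0, jitter=2.0, sleep=time.sleep, uniform=random.uniform):
    """Sleep for `base` seconds plus a random extra in [0, jitter]; return the delay used."""
    delay = base + uniform(0, jitter)  # e.g. somewhere between 3 and 5 seconds by default
    sleep(delay)
    return delay
```

Inside get_info, time.sleep(3) would then become polite_pause().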

5. Save the scraped data to Excel

def save(lst):
    wk = openpyxl.Workbook()  # create a workbook object
    sheet = wk.active  # get the active worksheet
    # Each item in the list becomes one row in Excel
    for item in lst:
        sheet.append(item)
    # Save to disk
    wk.save('銷售數(shù)據(jù).xlsx')
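Because save() writes no header row, the analysis step later has to pass header=None and names=[...] to pd.read_excel. As an alternative sketch using the same openpyxl calls as save(), the header can be written first; save_with_header is a hypothetical name, and the column names are borrowed from the read_excel call in the analysis section.

```python
import openpyxl

def save_with_header(rows, path, header=("comments", "color", "intro")):
    """Like save(), but write a header row before the review rows (hypothetical helper)."""
    wb = openpyxl.Workbook()
    sheet = wb.active
    sheet.append(list(header))  # row 1: column names
    for row in rows:
        sheet.append(row)  # one review per row
    wb.save(path)
```

A file written this way can be read with a plain pd.read_excel(path); the word-cloud extraction step would then need to skip the header row.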

6. Run the program

if __name__ == '__main__':
    productId = '10029693009906'  # product id
    get_info(productId)

V. Simple Data Analysis

1. Basic setup

# Import the libraries
import pandas as pd
import matplotlib.pyplot as plt
# These two lines make plt display Chinese characters (and minus signs) correctly
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
# The scraper wrote no header row, so supply the column names here
data = pd.read_excel('./銷售數(shù)據(jù).xlsx', header=None, names=['comments', 'color', 'intro'])
data.head()

2. Number of phones by color

# Number of reviews per color, typed in by hand
x = ['White', 'Black', 'Green', 'Blue', 'Red', 'Purple']
y = [314, 295, 181, 173, 27, 10]
plt.bar(x, y)
plt.title('Number of phones by color')
plt.xlabel('Color')
plt.ylabel('Count')
# plt.legend()  # show a legend
plt.show()
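The counts above are typed in by hand. Assuming the data frame loaded in the setup step (columns comments, color, intro), pandas can tally them directly with value_counts. The sketch below runs on a few made-up rows; color_shares is a hypothetical helper that also reports each color's percentage.

```python
import pandas as pd

def color_shares(df, column="color"):
    """Count rows per value of `column` and each value's share in %, most common first."""
    counts = df[column].value_counts()  # sorted by count, descending
    shares = (counts / counts.sum() * 100).round(1)
    return pd.DataFrame({"count": counts, "share_%": shares})

# Made-up rows for illustration; on the real data, call color_shares(data)
demo = pd.DataFrame({"color": ["白色", "黑色", "白色", "绿色", "白色", "黑色"]})
print(color_shares(demo))
```

For the chart, counts = data['color'].value_counts() followed by plt.bar(counts.index, counts.values) would replace the hard-coded x and y.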

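From the bar chart, white and black are by far the most purchased colors, together making up (314 + 295) / 1000 ≈ 61% of the reviews.

3. Review word cloud

1) First, extract the review text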

import openpyxl

# xlrd 2.0+ only reads legacy .xls files, so read the .xlsx back with openpyxl
wb = openpyxl.load_workbook("./銷售數(shù)據(jù).xlsx")
table = wb.worksheets[0]  # first worksheet
# 'w' overwrites instead of appending, so re-running does not duplicate lines.
# GBK matches the encoding the word-cloud step reads with; characters GBK
# cannot represent (e.g. emoji) are dropped.
with open("data.txt", "w", encoding="gbk", errors="ignore") as txtfile:
    # There is no header row, so start at row 1; column A holds the review text
    for row in table.iter_rows(min_row=1, max_col=1, values_only=True):
        review = row[0]
        if review is not None:
            txtfile.write(str(review) + "\n")  # one review per line

2) Word cloud

# Import the libraries
import jieba
from PIL import Image
import numpy as np
from wordcloud import WordCloud
import matplotlib.pyplot as plt
# Load the text and do some light cleanup:
# remove newlines and full-width spaces
with open("./data.txt", encoding='gbk') as f:
    text = f.read()
text = text.replace('\n', "").replace("\u3000", "")

# Segment into words; the result is a list of words
text_cut = jieba.lcut(text)
# Join the words into one space-separated string
text_cut = ' '.join(text_cut)

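Note: we can't use encoding='utf-8' here, because it raises this error:

> 'utf-8' codec can't decode byte 0xd3 in position 0: invalid continuation byte

data.txt was written in GBK (the default encoding open() uses on a Chinese-locale Windows system when no encoding is given), so we read it with gbk instead.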
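The error comes down to bytes: the same characters are stored differently in GBK (2 bytes per common Chinese character) and UTF-8 (3 bytes). Below is a small standard-library demonstration on a made-up string. A more portable fix is to write data.txt with encoding='utf-8' and read it back with the same encoding.

```python
# A made-up review, encoded the way open() writes text on a Chinese-locale (GBK) Windows system
text = "手机很好用"
raw = text.encode("gbk")

try:
    print("utf-8 gives:", raw.decode("utf-8"))  # if it decodes at all, the text is wrong
except UnicodeDecodeError as err:
    print("utf-8 fails:", err.reason)

print("gbk gives:", raw.decode("gbk"))  # round-trips to the original text
# Common Chinese characters take 2 bytes each in GBK but 3 bytes each in UTF-8
print(len(raw), len(text.encode("utf-8")))  # 10 15
```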

word_list = jieba.cut(text)
space_word_list = ' '.join(word_list)
print(space_word_list)
# Read the mask image with PIL's Image.open and turn it into an array with numpy
mask_pic = np.array(Image.open("./xin.png"))
word = WordCloud(
    font_path='C:/Windows/Fonts/simfang.ttf',  # a local font with Chinese glyphs (Windows path)
    mask=mask_pic,  # mask image: words fill its non-white area
    background_color='white',  # background color
    max_font_size=150,  # largest font size
    max_words=2000,  # maximum number of words shown
    stopwords={'的'}  # stopwords are left out of the word cloud
).generate(space_word_list)
image = word.to_image()
word.to_file('2.png')  # save the image
image.show()
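WordCloud.generate() splits the space-joined text and counts the words itself. For more control over what gets counted, such as a longer stopword list or dropping one-character tokens, you can count the tokens yourself and hand the result to WordCloud's generate_from_frequencies(). A minimal standard-library sketch; the stopword set and the tokens are made up, and word_frequencies is a hypothetical helper.

```python
from collections import Counter

# Small, made-up stopword set; a real list would be much longer
STOPWORDS = {"的", "了", "是", "也", "很", "，", "。", "！"}

def word_frequencies(tokens, stopwords=STOPWORDS, min_len=2):
    """Count tokens, skipping blanks, stopwords and tokens shorter than min_len characters."""
    counts = Counter(
        t for t in tokens
        if t.strip() and t not in stopwords and len(t) >= min_len
    )
    return dict(counts)

# Tokens in the form jieba.lcut() returns (a list of strings); made-up example
tokens = ["手机", "很", "好用", "，", "屏幕", "很", "清晰", "，", "手机", "流畅"]
freqs = word_frequencies(tokens)
print(freqs)
```

The dict can then replace the text input, e.g. WordCloud(font_path=..., mask=mask_pic, background_color='white').generate_from_frequencies(freqs). Dropping one-character tokens with min_len=2 is a trade-off: it removes most particles, but also meaningful single-character words such as 好.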

The final result is shown in the figure below:

This article is reprinted from the WeChat official account Python爬虫与数据挖掘 (Python Web Scraping and Data Mining). Please contact that account for permission to reprint it.

Editor: 武晓燕  Source: Python爬虫与数据挖掘
