
Using Data Analysis to See How Frightening Online Abuse Can Be

Author: 小F 2019-04-01 13:51:41

Big Data, Data Analysis


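This article has been a long time coming.

The story began when Pan Changjiang failed to recognize Cai Xukun on a variety show, after which the comment section of Pan's Weibo blew up.

In the end, both sides suffered some degree of online abuse.

To this day, the comment section of that Weibo post is still being updated.

It has to be said that Weibo anti-fans who deliberately stir things up are truly frightening.

Another example comes from League of Legends, which I have always followed.

Last week "Principal Wang" was also targeted, over a Weibo post about the player Zitai returning after retirement.

It was originally an ordinary, teasing reply: "Is the return of that legendary ADC still far off? [husky]".

Then some people began whipping up sentiment against Principal Wang, which made him genuinely angry.

As a bystander, I have nothing much to say about these incidents.

I only hope that in future fewer bored people will stir things up and pile pressure on others.

This time I analyze the users who commented on that Weibo post of Pan Changjiang's.

I collected 3 days of comments, 140,000 in total.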

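I. Preparation

I won't go into how the Weibo comments are fetched, since I have covered that before.

Here I will mention fetching user information, again working from the mobile site.

The fields collected are each user's nickname, gender, region, number of posts, number of accounts followed, and number of followers.

The data is stored in a MySQL database.

Create the database.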

import pymysql 
 
db = pymysql.connect(host='127.0.0.1', user='root', password='774110919', port=3306) 
cursor = db.cursor() 
cursor.execute("CREATE DATABASE weibo DEFAULT CHARACTER SET utf8mb4") 
db.close()

Create the table and define its fields.

import pymysql 
 
db = pymysql.connect(host='127.0.0.1', user='root', password='774110919', port=3306, db='weibo') 
cursor = db.cursor() 
sql = 'CREATE TABLE IF NOT EXISTS comments (user_id VARCHAR(255) NOT NULL, user_message VARCHAR(255) NOT NULL, weibo_message VARCHAR(255) NOT NULL, comment VARCHAR(255) NOT NULL, praise VARCHAR(255) NOT NULL, date VARCHAR(255) NOT NULL, PRIMARY KEY (comment, date))' 
cursor.execute(sql) 
db.close()

II. Data Collection

The full code is as follows.

from copyheaders import headers_raw_to_dict 
from bs4 import BeautifulSoup 
import requests 
import pymysql 
import re 
 
headers = b""" 
accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8 
accept-encoding:gzip, deflate, br 
accept-language:zh-CN,zh;q=0.9 
cache-control:max-age=0 
cookie:YOUR_COOKIE_HERE
upgrade-insecure-requests:1
user-agent:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36
"""

# Convert the raw request-header string into a dict
# (the bytes literal above must stay ASCII-only, so paste your cookie as plain text)
headers = headers_raw_to_dict(headers)
 
 
def to_mysql(data):
    """
    Write one record to MySQL.
    """
    table = 'comments'
    keys = ', '.join(data.keys())
    values = ', '.join(['%s'] * len(data))
    db = pymysql.connect(host='localhost', user='root', password='774110919', port=3306, db='weibo')
    cursor = db.cursor()
    sql = 'INSERT INTO {table}({keys}) VALUES ({values})'.format(table=table, keys=keys, values=values)
    try:
        if cursor.execute(sql, tuple(data.values())):
            print("Successful")
            db.commit()
    except pymysql.MySQLError:
        # For example, a duplicate (comment, date) primary key
        print('Failed')
        db.rollback()
    finally:
        db.close()
 
 
def get_user(user_id):
    """
    Fetch a user's profile information.
    """
    try:
        url_user = 'https://weibo.cn' + str(user_id)
        response_user = requests.get(url=url_user, headers=headers)
        soup_user = BeautifulSoup(response_user.text, 'html.parser')
        # User info (nickname, gender, region)
        re_1 = soup_user.find_all(class_='ut')
        user_message = re_1[0].find(class_='ctt').get_text()
        # Weibo info (post, following and follower counts)
        re_2 = soup_user.find_all(class_='tip2')
        weibo_message = re_2[0].get_text()
        return (user_message, weibo_message)
    except Exception:
        # '未知' ("unknown") is the sentinel that the cleaning step filters out later
        return ('未知', '未知')
 
 
def get_message():
    # The first page contains hot comments, which are harder to parse, so it is skipped here
    for i in range(2, 20000):
        data = {}
        print('Page ------------' + str(i) + '------------')
        # Request URL
        url = 'https://weibo.cn/comment/Hl2O21Xw1?uid=1732460543&rl=0&page=' + str(i)
        response = requests.get(url=url, headers=headers)
        html = response.text
        soup = BeautifulSoup(html, 'html.parser')
        # Comment text
        comments = soup.find_all(class_='ctt')
        # Like counts
        praises = soup.find_all(class_='cc')
        # Comment times
        date = soup.find_all(class_='ct')
        # User names
        name = re.findall('id="C_.*?href="/.*?">(.*?)</a>', html)
        # User IDs (relative profile paths)
        user_ids = re.findall('id="C_.*?href="(.*?)">(.*?)</a>', html)

        for j in range(len(name)):
            # User ID
            user_id = user_ids[j][0]
            (user_message, weibo_message) = get_user(user_id)
            # " ".join(x.split()) collapses runs of whitespace into single spaces
            data['user_id'] = " ".join(user_id.split())
            data['user_message'] = " ".join(user_message.split())
            data['weibo_message'] = " ".join(weibo_message.split())
            data['comment'] = " ".join(comments[j].get_text().split())
            data['praise'] = " ".join(praises[j * 2].get_text().split())
            data['date'] = " ".join(date[j].get_text().split())
            print(data)
            # Write to the database
            to_mysql(data)
 
 
if __name__ == '__main__': 
    get_message()
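The insert pattern used in `to_mysql` (column names and placeholders built from the dict keys, values passed separately) can be tried without a MySQL server. The following is a minimal sketch that swaps in Python's built-in `sqlite3` as a stand-in, so the placeholder is `?` instead of pymysql's `%s`. The helper name `build_insert`, the reduced three-column table, and the sample row are all assumptions made for illustration.

```python
import sqlite3


def build_insert(table, row):
    """Build a parameterized INSERT from a dict; values are passed separately."""
    keys = ', '.join(row.keys())
    # sqlite3 uses '?' placeholders; pymysql uses '%s'
    placeholders = ', '.join(['?'] * len(row))
    sql = 'INSERT INTO {}({}) VALUES ({})'.format(table, keys, placeholders)
    return sql, tuple(row.values())


# In-memory stand-in for the comments table (hypothetical, reduced schema)
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE comments (user_id TEXT, comment TEXT, date TEXT, '
    'PRIMARY KEY (comment, date))'
)

row = {'user_id': '/u/123', 'comment': 'hello', 'date': '03月27日 17:13'}
sql, params = build_insert('comments', row)
conn.execute(sql, params)

# Inserting the same (comment, date) pair again violates the primary key
try:
    conn.execute(sql, params)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

conn.commit()
stored = conn.execute('SELECT COUNT(*) FROM comments').fetchone()[0]
print(sql, duplicate_rejected, stored)
```

One consequence of the composite primary key on `(comment, date)` is that two different users posting identical text with the same timestamp are stored only once; the second insert takes the failure branch.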

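In the end, the comments were fetched successfully.

140,000 comments in 3 days is genuinely frightening.

Sometimes I can't help wondering who is bored enough to spam comments all day.

Professional anti-fans? Paid posters? It seems they really do exist.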

III. Data Cleaning

The cleaning code is as follows.

import pandas as pd 
import pymysql 
 
# Align column names with data when printing CJK text
pd.set_option('display.unicode.ambiguous_as_wide', True)
pd.set_option('display.unicode.east_asian_width', True)
# Show at most 10 columns
pd.set_option('display.max_columns', 10)
# Show at most 10 rows
pd.set_option('display.max_rows', 10)
# Set the display width to 2000 so output does not wrap in the IDE
pd.set_option('display.width', 2000)

# Read the data
conn = pymysql.connect(host='localhost', user='root', password='774110919', port=3306, db='weibo', charset='utf8mb4')
cursor = conn.cursor()
sql = "select * from comments"
db = pd.read_sql(sql, conn)

# Clean the data
df = db['user_message'].str.split(' ', expand=True)
# User name
df['name'] = df[0]
# Gender and region
df1 = df[1].str.split('/', expand=True)
df['gender'] = df1[0]
df['province'] = df1[1]
# User ID
df['id'] = db['user_id']
# Comment text
df['comment'] = db['comment']
# Like count
df['praise'] = db['praise'].str.extract(r'(\d+)', expand=False).astype("int")
# Post, following and follower counts
df2 = db['weibo_message'].str.split(' ', expand=True)
# '未知' means "unknown": the profile fetch failed for this user
df2 = df2[df2[0] != '未知']
df['tweeting'] = df2[0].str.extract(r'(\d+)', expand=False).astype("int")
df['follows'] = df2[1].str.extract(r'(\d+)', expand=False).astype("int")
df['followers'] = df2[2].str.extract(r'(\d+)', expand=False).astype("int")
# Comment time, truncated to the hour
df['time'] = db['date'].str.split(':', expand=True)[0]
df['time'] = pd.Series([i+'时' for i in df['time']])
df['day'] = df['time'].str.split(' ', expand=True)[0]
# Drop the first three columns left over from the split
# (positional; assumes the split produced exactly three columns)
df = df.iloc[:, 3:]
df = df[df['name'] != '未知']
# Keep only rows whose timestamp contains a full date
df = df[df['time'].str.contains("日")]
# Print 10 random rows
print(df.sample(10))
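To make the field splitting concrete, here is a plain-Python sketch of the same idea applied to a single record. The two input strings are assumed formats chosen for illustration (`'nickname gender/province'` and three bracketed counts); the real scraped strings may carry extra tokens, and `parse_user` is a hypothetical helper, not part of the script above. Unlike the pandas version, it simply takes the first three numbers it finds.

```python
import re


def parse_user(user_message, weibo_message):
    """Split the two scraped profile strings into typed fields."""
    parts = user_message.split(' ')
    name = parts[0]
    # The second token is assumed to look like 'gender/province'
    gender, _, province = parts[1].partition('/')
    # Take the first three numbers: posts, following, followers
    numbers = re.findall(r'\d+', weibo_message)[:3]
    tweeting, follows, followers = (int(n) for n in numbers)
    return {
        'name': name,
        'gender': gender,
        'province': province,
        'tweeting': tweeting,
        'follows': follows,
        'followers': followers,
    }


# Hypothetical sample strings in the assumed format
sample = parse_user('某用户 女/广东', '微博[120] 关注[85] 粉丝[60]')
print(sample)
```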

Output the data.

Even ten random rows give a rough sense of the tone of the comment section.

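IV. Data Visualization

01 Gender of commenting users

After deduplicating by user ID, a little over 100,000 users remain.

The first chart shows the gender split for all users: 30,000+ male and 70,000+ female.

This does fit Cai Xukun's fan base.

The second chart was prompted by an analysis of Cai Xukun's fan base that I had seen from "Alfred数据室".

It noted that many of his fans like nicknames containing "坤", "蔡", "葵" or "kun".

So I extracted the users whose nicknames contain these characters.

Sure enough, there are 12,000+ female and 900+ male users, an even closer match to Cai Xukun's fan base.

The visualization code is as follows.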

# pyecharts 0.5.x style imports; Bar is used by the bar charts further down
from pyecharts import Pie, Map, Line, Bar


def create_gender(df):
    # All users
    # df = df.drop_duplicates('id')
    # Users whose nickname contains a keyword
    df = df[df['name'].str.contains("坤|蔡|葵|kun")].drop_duplicates('id')
    # Group and count
    gender_message = df.groupby(['gender'])
    gender_com = gender_message['gender'].agg(['count'])
    gender_com.reset_index(inplace=True)

    # Draw the pie chart
    attr = gender_com['gender']
    v1 = gender_com['count']
    # pie = Pie("Gender of Weibo commenters", title_pos='center', title_top=0)
    # pie.add("", attr, v1, radius=[40, 75], label_text_color=None, is_label_show=True, legend_orient="vertical", legend_pos="left", legend_top="10%")
    # pie.render("weibo_commenter_gender.html")
    pie = Pie("Gender of Weibo commenters (nickname contains a keyword)", title_pos='center', title_top=0)
    pie.add("", attr, v1, radius=[40, 75], label_text_color=None, is_label_show=True, legend_orient="vertical", legend_pos="left", legend_top="10%")
    pie.render("weibo_commenter_gender_keyword.html")
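The counting behind both pie charts (deduplicate by user ID, optionally keep only nicknames matching the keywords, then count by gender) can be checked without pandas or pyecharts. This is a minimal sketch on made-up records; the function name `gender_counts` and the sample users are assumptions for illustration.

```python
import re
from collections import Counter

# Same keyword pattern as the nickname filter in create_gender
KEYWORDS = re.compile('坤|蔡|葵|kun')


def gender_counts(users, keyword_only=False):
    """Count users by gender, counting each user ID once."""
    seen = set()
    counts = Counter()
    for user in users:
        if user['id'] in seen:
            continue  # a later comment by an already-counted user
        seen.add(user['id'])
        if keyword_only and not KEYWORDS.search(user['name']):
            continue
        counts[user['gender']] += 1
    return counts


# Hypothetical records: one row per comment, so IDs can repeat
users = [
    {'id': '/u/1', 'name': '坤坤的小葵花', 'gender': '女'},
    {'id': '/u/1', 'name': '坤坤的小葵花', 'gender': '女'},
    {'id': '/u/2', 'name': 'passerby', 'gender': '男'},
    {'id': '/u/3', 'name': 'ikun_fan', 'gender': '男'},
]
print(gender_counts(users))
print(gender_counts(users, keyword_only=True))
```

Note that a plain substring match on "kun" also catches nicknames that merely happen to contain those letters, so the keyword group is an approximation of the fan base, not an exact count.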

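02 Regional distribution of commenting users

Guangdong ranks first with 8,000+ commenting users, followed by Beijing, Shandong, Jiangsu, Zhejiang and Sichuan.

This is somewhat similar to the distribution of NetEase Cloud Music commenters I looked at before.

It further suggests that these regions simply have a lot of internet users.

The visualization code is as follows.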

def create_map(df):
    # All users
    df = df.drop_duplicates('id')
    # Group and count
    loc_message = df.groupby(['province'])
    loc_com = loc_message['province'].agg(['count'])
    loc_com.reset_index(inplace=True)

    # Draw the map (named china_map to avoid shadowing the built-in map)
    value = [i for i in loc_com['count']]
    attr = [i for i in loc_com['province']]
    china_map = Map("Regional distribution of Weibo commenters", title_pos='center', title_top=0)
    china_map.add("", attr, value, maptype="china", is_visualmap=True, visual_text_color="#000", is_map_symbol_show=False, visual_range=[0, 7000])
    china_map.render('weibo_commenter_regions.html')

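03 Distribution of the number of accounts followed

Overall this looks ordinary, though I am curious what kind of users follow more than a thousand accounts.

The visualization code is as follows.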

from pyecharts import Bar


def create_follows(df):
    """
    Plot the distribution of how many accounts each commenter follows.
    """
    df = df.drop_duplicates('id')
    follows = df['follows']
    # Open-ended top bin so that counts above 10000 land in '10000+';
    # include_lowest=True keeps users who follow 0 accounts in '0-10'
    bins = [0, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000, float('inf')]
    level = ['0-10', '10-20', '20-50', '50-100', '100-200', '200-500', '500-1000', '1000-2000', '2000-5000', '5000-10000', '10000+']
    len_stage = pd.cut(follows, bins=bins, labels=level, include_lowest=True).value_counts().sort_index()
    # Draw the bar chart
    attr = len_stage.index
    v1 = len_stage.values
    bar = Bar("Distribution of accounts followed by commenters", title_pos='center', title_top='18', width=800, height=400)
    bar.add("", attr, v1, is_stack=True, is_label_show=True, xaxis_interval=0, xaxis_rotate=30)
    bar.render("commenter_follows_distribution.html")
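The bucketing rule is easy to get subtly wrong at the edges: `pd.cut` bins are right-closed by default, so with explicit edges a value equal to the lowest edge, or one above the highest edge, receives no label unless that is handled. The sketch below reproduces right-closed buckets with an open-ended top bucket in plain Python using `bisect`; the names `EDGES`, `LABELS`, `bucket` and `distribution` are assumptions for illustration.

```python
from bisect import bisect_left
from collections import Counter

# Upper edges of the right-closed buckets (0-10 means 0 <= v <= 10, 10-20 means 10 < v <= 20, ...)
EDGES = [10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000]
LABELS = ['0-10', '10-20', '20-50', '50-100', '100-200', '200-500',
          '500-1000', '1000-2000', '2000-5000', '5000-10000', '10000+']


def bucket(value):
    """Return the label of the first bucket whose upper edge is >= value."""
    # bisect_left returns len(EDGES) for values above every edge -> '10000+'
    return LABELS[bisect_left(EDGES, value)]


def distribution(values):
    """Count values per bucket, keeping empty buckets and label order."""
    counts = Counter(bucket(v) for v in values)
    return [(label, counts[label]) for label in LABELS]


# Hypothetical following counts
sample = [0, 5, 10, 11, 150, 25000]
for label, count in distribution(sample):
    print(label, count)
```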

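04 Distribution of follower counts

Quite a few users here have "0-10" followers, which I suspect is the work of paid posters.

Users with "50-100" followers are the most numerous.

The visualization code is as follows.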

from pyecharts import Bar


def create_followers(df):
    """
    Plot the distribution of commenters' follower counts.
    """
    df = df.drop_duplicates('id')
    followers = df['followers']
    # Same buckets as the following-count chart, with an open-ended top bin
    bins = [0, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000, float('inf')]
    level = ['0-10', '10-20', '20-50', '50-100', '100-200', '200-500', '500-1000', '1000-2000', '2000-5000', '5000-10000', '10000+']
    len_stage = pd.cut(followers, bins=bins, labels=level, include_lowest=True).value_counts().sort_index()
    # Draw the bar chart
    attr = len_stage.index
    v1 = len_stage.values
    bar = Bar("Distribution of commenters' follower counts", title_pos='center', title_top='18', width=800, height=400)
    bar.add("", attr, v1, is_stack=True, is_label_show=True, xaxis_interval=0, xaxis_rotate=30)
    bar.render("commenter_followers_distribution.html")

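05 Distribution of comment times

Pan posted the Weibo at 17:00, but there was no flood of comments at that point: that hour saw 1,237 comments in total.

It was only after Cai Xukun commented at 18:00 that the count shot up, to 24,752.

Moreover, half of all comments so far were posted under Cai Xukun's reply, and most of the highly liked comments are there too.

It has to be said that Cai Xukun's fans are a real force. Frightening, frightening~

The visualization code is as follows.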

def creat_date(df):
    # Group by hour and count (all comments, not deduplicated by user)
    date_message = df.groupby(['time'])
    date_com = date_message['time'].agg(['count'])
    date_com.reset_index(inplace=True)

    # Draw the trend line
    attr = date_com['time']
    v1 = date_com['count']
    line = Line("Weibo comments over time", title_pos='center', title_top='18', width=800, height=400)
    line.add("", attr, v1, is_smooth=True, is_fill=True, area_color="#000", xaxis_interval=24, is_xaxislabel_align=True, xaxis_min="dataMin", area_opacity=0.3, mark_point=["max"], mark_point_symbol="pin", mark_point_symbolsize=55)
    line.render("weibo_comments_over_time.html")
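The hourly grouping relies on truncating each timestamp at the first colon. A minimal plain-Python sketch of that step follows, assuming timestamps shaped like `'03月27日 17:13'` (an assumed format) and dropping any value without a full date, as the cleaning step does. The function name `hourly_counts` and the sample values are hypothetical.

```python
from collections import Counter


def hourly_counts(dates):
    """Count comments per hour; timestamps without '日' (no full date) are skipped."""
    # 'MM月DD日 HH:MM' -> 'MM月DD日 HH时'
    hours = [d.split(':')[0] + '时' for d in dates if '日' in d]
    # Zero-padded month, day and hour make string order match time order
    return sorted(Counter(hours).items())


# Hypothetical timestamps; the last one has no full date and is ignored
dates = ['03月27日 17:05', '03月27日 17:59', '03月27日 18:01', '今天 09:30']
counts = hourly_counts(dates)
peak = max(counts, key=lambda item: item[1])
print(counts)
print('busiest hour:', peak)
```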

06 Comment word cloud

On the whole the comments are fairly civil and not very extreme.

The visualization code is as follows.

from wordcloud import WordCloud, ImageColorGenerator
import matplotlib.pyplot as plt
import pandas as pd
import jieba


def create_wordcloud(df):
    """
    Generate a word cloud from the comments.
    """
    words = pd.read_csv('chineseStopWords.txt', encoding='gbk', sep='\t', names=['stopword'])
    # Segment each comment into words
    text = ''
    for line in df['comment']:
        # Keep only the text after the last ':'
        line = line.split(':')[-1]
        # Trailing space keeps words from adjacent comments apart
        text += ' '.join(jieba.cut(str(line), cut_all=False)) + ' '
    # Stop words
    stopwords = set(words['stopword'])
    background_image = plt.imread('article.jpg')
    wc = WordCloud(
        background_color='white',
        mask=background_image,
        # Path to a font with CJK glyphs; adjust for your system
        font_path=r'C:\Windows\Fonts\華康儷金黑W8.TTF',
        max_words=2000,
        max_font_size=150,
        min_font_size=15,
        prefer_horizontal=1,
        random_state=50,
        stopwords=stopwords
    )
    wc.generate_from_text(text)
    # Recolor the words using the colors of the mask image
    img_colors = ImageColorGenerator(background_image)
    wc.recolor(color_func=img_colors)
    # Most frequent words
    process_word = wc.process_text(text)
    sort = sorted(process_word.items(), key=lambda e: e[1], reverse=True)
    print(sort[:50])
    plt.imshow(wc)
    plt.axis('off')
    wc.to_file("weibo_comment_wordcloud.jpg")
    print('Word cloud generated!')

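V. Summary

Finally, as usual, let's dig into which users commented the most.

This male user posted 90 comments, the most of anyone.

The tone of his comments is a little baffling. Is he here to stir things up?

This female user posted 80 comments.

Most of them are aimed at the anti-fans.

This female user posted 71 comments.

She interacts frantically with the comment section...

This male user posted 68 comments.

He also interacts with the comment section, though most of his comments lean negative.

Looking at the 10 users with the most comments, I found that the male users' comments all leaned negative, while the female users' comments were all positive.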
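The article does not show how the per-user comment counts were obtained. One way to get such a ranking, sketched here in plain Python on made-up rows, is to count comment rows per user ID before any deduplication. The function name `top_commenters` and the sample data are assumptions for illustration.

```python
from collections import Counter


def top_commenters(comment_rows, n=10):
    """Return the n user IDs with the most comments, most active first."""
    counts = Counter(row['id'] for row in comment_rows)
    return counts.most_common(n)


# Hypothetical rows: one per comment, NOT deduplicated by user
comment_rows = [
    {'id': '/u/1', 'gender': '男'},
    {'id': '/u/2', 'gender': '女'},
    {'id': '/u/1', 'gender': '男'},
    {'id': '/u/3', 'gender': '女'},
    {'id': '/u/1', 'gender': '男'},
    {'id': '/u/2', 'gender': '女'},
]
ranking = top_commenters(comment_rows, n=2)
print(ranking)
```

The ranking has to be computed on the full comment table; once rows are deduplicated by user ID, every user has exactly one row and the counts are lost.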

Well, as a bystander I am happy just to watch, so I won't offer any opinions of my own.

Editor: 未麗燕  Source: 法納斯特
