Batch-downloading 100 songs from the NetEase Cloud Music hot songs chart with Selenium

Author: 游世九黎 2021-05-13 08:11:50



This article is reposted from the WeChat public account 「菜J學Python」, written by 游世九黎. To repost it, please contact the 菜J學Python public account.

Today's small demo uses selenium and xpath to collect data, written in a functional programming style. The collected data is shown in the figure.

01 Required data

The 100 songs on the NetEase Cloud Music new songs chart.

02 Page analysis

First, the data on this page cannot be fetched with the requests library, so we load the page with selenium and parse it with xpath.
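A likely reason plain requests comes back without the chart, offered here as an assumption since the article does not spell it out: everything after `#` in the chart URL is a fragment, which the client keeps to itself and never sends to the server. The sketch below uses the standard library's urllib.parse to split the URL used in this article and show what a plain HTTP client would actually ask for.

```python
from urllib.parse import urlsplit

# Chart URL used in this article.
url = "https://music.163.com/#/discover/toplist?id=3779629"

parts = urlsplit(url)

# Only scheme, host, path and query travel in an HTTP request.
# The fragment (everything after '#') stays on the client side.
requested = f"{parts.scheme}://{parts.netloc}{parts.path}"
if parts.query:
    requested += "?" + parts.query

print("sent to server:", requested)
print("kept by client:", parts.fragment)
```

Both the chart path and the id sit in the fragment, so the server only sees a request for the site root. The chart itself is presumably filled in by the page's scripts afterwards, which is why a real browser driven by selenium is used here.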

The table tag holds the data for all 100 songs, but it sits inside an iframe tag, so we need to locate the iframe and switch into it before we can read its content.

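import time

from selenium import webdriver
from selenium.webdriver.common.by import By

url = "https://music.163.com/#/discover/toplist?id=3779629"  # new songs chart

driver = webdriver.Chrome()
driver.get(url)
time.sleep(3)  # give the page time to load

# Locate the iframe tag (find_element_by_id was removed in Selenium 4)
_iframe = driver.find_element(By.ID, 'g_iframe')
driver.switch_to.frame(_iframe)
time.sleep(1)

# Grab the rendered HTML of the document inside the iframe
page_text = driver.execute_script("return document.documentElement.outerHTML")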

03 Parsing the data

Now that page_text holds the HTML from inside the iframe, we parse it with xpath.

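from lxml import etree

html = etree.HTML(page_text)

trs = html.xpath('//tr')
id_list = []
song_name_list = []
singer_list = []  # declared in the original script but never filled

for tr in trs[1:]:  # skip the first row, presumably the table header
    # keep at most the last 10 characters of data-res-id as the song id
    song_id = tr.xpath("./td[2]/div[1]/div[1]/span/@data-res-id")[0][-10:]
    id_list.append(song_id)
    song_name = tr.xpath("./td[2]/div/div/div/span/a/b/@title")[0]
    song_name_list.append(song_name)
    print(song_id, "----", song_name)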
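To make the relative XPath in the loop easier to follow, here is a small offline sketch. The row markup is a hypothetical, simplified stand-in for the real table (the ids and titles are made up, and the actual page carries more nesting and attributes). It also swaps lxml for the standard library's xml.etree.ElementTree so it runs without extra packages. ElementTree supports only a subset of XPath, so attribute values are read with .get() instead of an `@attr` step.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified rows that mirror the paths used in the loop.
SAMPLE = """
<table>
  <tr><th>No.</th><th>Title</th></tr>
  <tr>
    <td>1</td>
    <td><div><div>
      <span data-res-id="1234567890"/>
      <div><span><a><b title="Song A"/></a></span></div>
    </div></div></td>
  </tr>
  <tr>
    <td>2</td>
    <td><div><div>
      <span data-res-id="555"/>
      <div><span><a><b title="Song B"/></a></span></div>
    </div></div></td>
  </tr>
</table>
"""


def parse_rows(xml_text):
    """Return (song_id, title) pairs from the sample table."""
    root = ET.fromstring(xml_text.strip())
    songs = []
    for tr in root.findall("tr")[1:]:  # skip the header row
        span = tr.find("./td[2]/div[1]/div[1]/span")
        b = tr.find("./td[2]/div/div/div/span/a/b")
        if span is None or b is None:
            continue  # row does not match the expected layout
        # [-10:] keeps at most the last 10 characters of the id
        song_id = span.get("data-res-id")[-10:]
        songs.append((song_id, b.get("title")))
    return songs


songs = parse_rows(SAMPLE)
for song_id, title in songs:
    print(song_id, "----", title)
```

Unlike the indexing with `[0]` in the loop, the `is None` check lets a row with an unexpected layout be skipped instead of raising an error.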

04 Saving the data

import os

import requests

base_url = 'http://music.163.com/song/media/outer/url?id={}.mp3'
os.makedirs('HotMusic', exist_ok=True)  # open() fails if the folder does not exist
try:
    for index, song_id in enumerate(id_list):
        if index == 25:  # the 26th song's name contains unusual characters; skip it, otherwise an error is raised
            continue
        file_name = song_name_list[index]
        resp = requests.get(base_url.format(song_id))
        with open('HotMusic/' + file_name + '.mp3', 'wb') as f:
            f.write(resp.content)
        print('Song %s downloaded successfully' % file_name)
except Exception as error:
    print(error)
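The loop above skips the 26th song because its name cannot be used as a file name. One possible alternative, sketched under the assumption that the failure comes from characters the file system rejects, is to clean every title before opening the file. Both safe_filename and download_target are hypothetical helpers, not part of the original script, and the set of replaced characters is a conservative guess covering common Windows and Unix restrictions.

```python
import re

# Characters rejected in Windows file names, '/' for Unix,
# and ASCII control characters.
_ILLEGAL = re.compile(r'[\\/:*?"<>|\x00-\x1f]')

base_url = 'http://music.163.com/song/media/outer/url?id={}.mp3'


def safe_filename(title, fallback="untitled"):
    """Replace characters that cannot appear in a file name (hypothetical helper)."""
    cleaned = _ILLEGAL.sub("_", title).strip(" .")
    return cleaned or fallback


def download_target(song_id, title, folder="HotMusic"):
    """Build the (url, path) pair for one song without touching the network."""
    return base_url.format(song_id), f"{folder}/{safe_filename(title)}.mp3"


print(download_target("1234567890", 'What/If: "Live"?'))
```

With a helper like this, the path passed to open() would come from download_target, and the `index == 25` special case might no longer be needed. That depends on what actually makes the 26th title fail, which the article does not say.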