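Python crawlers are powerful, but how do you drive a browser automatically from inside one?
Overview:
Scraping data with Selenium from Python is an effective way around many anti-scraping blocks, but Selenium comes with pitfalls of its own. This article uses a question-and-answer format to explain, in plain terms, how to run JavaScript through Selenium and then retrieve the page as it stands after the dynamic code has executed. If you like it, feel free to share this article.
Python crawler programming: running JavaScript with Selenium fails. How do you fix it?
Question:
Xiao Ming has started learning to write crawlers in Python, and it feels as if all the data on the internet is within his reach. Today he is attempting something harder: once Selenium has loaded a page containing the HTML below, he wants to run a JavaScript snippet that simulates a mouse click. To his great disappointment, it simply does not work!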
- <div class="vbseo_liked">
- <a href="http://www.jamiiforums.com/member.php?u=8355" rel="nofollow">Nyaralego</a>
- ,
- <a href="http://www.jamiiforums.com/member.php?u=8870" rel="nofollow">Sikonge</a>
- ,
- <a href="http://www.jamiiforums.com/member.php?u=8979" rel="nofollow">Ab-Titchaz</a>
- and
- <a onclick="return vbseoui.others_click(this)" href="http://www.jamiiforums.com/kenyan-news/225589-kenyan-and-tanzanian-surburbs.html#">11 others</a>
- like this.
- </div>
This is the code he ran.
- browser.execute_script("document.getElement(By.xpath(\"//div[@class='vbseo_liked']/a[contains(@onclick, 'return vbseoui.others_click(this)')]\").click()")
It does nothing; there is no reaction. What did he do wrong?
The Python expert's answer:
Short answer:
Find the element with Selenium and pass it to execute_script() to click it:
- from selenium.webdriver.common.by import By
- link = browser.find_element(By.XPATH, '//div[@class="vbseo_liked"]/a[contains(@onclick, "return vbseoui.others_click(this)")]')
- browser.execute_script('arguments[0].click();', link)
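As for why the original call fails: `document.getElement(...)` and `By.xpath(...)` are not part of the browser's JavaScript API (`By` belongs to Selenium's client bindings), so the script raises an error inside the page instead of clicking anything. If the lookup really has to happen in JavaScript, the DOM's own `document.evaluate` can resolve an XPath. The following is a minimal sketch of that approach, not the answer's method; `XPATH_CLICK_JS` and `click_by_xpath` are hypothetical names, and `driver` is assumed to be any object exposing Selenium's `execute_script(script, *args)`.

```python
# JavaScript run in the page: resolve an XPath with the DOM API and click the match.
XPATH_CLICK_JS = """
var result = document.evaluate(
    arguments[0], document, null,
    XPathResult.FIRST_ORDERED_NODE_TYPE, null);
var node = result.singleNodeValue;
if (node) { node.click(); }
return node !== null;
"""


def click_by_xpath(driver, xpath):
    """Click the first node matching `xpath`; return True if a node was found.

    Hypothetical helper: `driver` only needs Selenium's
    execute_script(script, *args) method. The XPath is passed as a script
    argument, so it never has to be escaped into the JavaScript source.
    """
    return bool(driver.execute_script(XPATH_CLICK_JS, xpath))
```

With a real browser session this could be called as `click_by_xpath(browser, "//div[@class='vbseo_liked']/a[contains(@onclick, 'others_click')]")`. Passing an element found by Selenium, as in the short answer, is usually the simpler choice.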
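To solve the problem from scratch, here is the series of things you need to understand:
- How do you simulate a click with JavaScript?
Here is what I came up with. It is simple, but it works: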
- function eventFire(el, etype) {
-     if (el.fireEvent) {
-         el.fireEvent('on' + etype);
-     } else {
-         var evObj = document.createEvent('Events');
-         evObj.initEvent(etype, true, false);
-         el.dispatchEvent(evObj);
-     }
- }
Usage:
- eventFire(document.getElementById('mytest1'), 'click');
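To call this function from a Python crawler, the JavaScript can be kept in a string and handed to Selenium together with the target element. The sketch below is one possible wrapper under stated assumptions: `EVENT_FIRE_JS` and `fire_event` are hypothetical names, `driver` is assumed to expose Selenium's `execute_script(script, *args)`, and `element` is assumed to be a WebElement already located through the driver.

```python
# The eventFire function from above, followed by a call that takes the
# element and the event type from the arguments Selenium passes in.
EVENT_FIRE_JS = """
function eventFire(el, etype) {
    if (el.fireEvent) {
        el.fireEvent('on' + etype);
    } else {
        var evObj = document.createEvent('Events');
        evObj.initEvent(etype, true, false);
        el.dispatchEvent(evObj);
    }
}
eventFire(arguments[0], arguments[1]);
"""


def fire_event(driver, element, etype="click"):
    """Dispatch a synthetic `etype` event on `element` inside the page.

    Hypothetical helper: Selenium converts a WebElement argument into the
    matching DOM node, so it arrives in the script as arguments[0].
    """
    driver.execute_script(EVENT_FIRE_JS, element, etype)
```

Taking the event type as a second argument lets the same helper fire other events, for example `fire_event(browser, link, "mouseover")`.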
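- How do you simulate the click from Python? First define a custom expected condition that waits until the click has taken effect, which here means the element's text no longer ends with a given string: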
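- from selenium.common.exceptions import StaleElementReferenceException
- class wait_for_text_not_to_end_with(object):
-     """Expected condition: the located element's text no longer ends with `text`."""
-     def __init__(self, locator, text):
-         self.locator = locator
-         self.text = text
-     def __call__(self, driver):
-         try:
-             # locator is a (By, value) tuple, e.g. (By.CLASS_NAME, 'vbseo_liked')
-             element_text = driver.find_element(*self.locator).text.strip()
-             return not element_text.endswith(self.text)
-         except StaleElementReferenceException:
-             # the element was re-rendered while being read; retry on the next poll
-             return False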
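A class like this works as a wait condition because a wait only needs a callable that takes the driver and returns something truthy once the page is ready. The sketch below models that polling idea with stand-ins so it runs without a browser. It is a simplified illustration, not Selenium's actual implementation: `poll_until`, `text_not_ending_with` and `StubDriver` are hypothetical names, and the stub's `text` attribute is an assumption of the sketch.

```python
import time


def poll_until(condition, driver, timeout=30.0, interval=0.5):
    """Call condition(driver) until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition(driver)
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(interval)


class text_not_ending_with:
    """Condition with the same shape as the class above, for a stub driver."""

    def __init__(self, suffix):
        self.suffix = suffix

    def __call__(self, driver):
        # A real condition would locate the element through the driver first;
        # the stub exposes the text directly.
        return not driver.text.strip().endswith(self.suffix)


class StubDriver:
    """Stand-in driver whose text changes after a few polls."""

    def __init__(self, texts):
        self._texts = list(texts)

    @property
    def text(self):
        # Hand out the queued texts in order, then keep repeating the last one.
        if len(self._texts) > 1:
            return self._texts.pop(0)
        return self._texts[0]


# The first two polls still see the collapsed list; the third sees it expanded.
driver = StubDriver([
    "A, B and 11 others like this.",
    "A, B and 11 others like this.",
    "A, B, C, D like this.",
])
expanded = poll_until(text_not_ending_with("11 others like this."), driver,
                      timeout=2, interval=0.01)
```

Returning False on a stale element, as the real class does, has the same effect as a failed check here: the loop simply tries again on the next poll.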
With the class defined, how is it used in a program? See the following code:
- from selenium import webdriver
- from selenium.common.exceptions import StaleElementReferenceException
- from selenium.webdriver.common.by import By
- from selenium.webdriver.support.ui import WebDriverWait
- from selenium.webdriver.support import expected_conditions as EC
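- class wait_for_text_not_to_end_with(object):
-     """Expected condition: the located element's text no longer ends with `text`."""
-     def __init__(self, locator, text):
-         self.locator = locator
-         self.text = text
-     def __call__(self, driver):
-         try:
-             # locator is a (By, value) tuple, e.g. (By.CLASS_NAME, 'vbseo_liked')
-             element_text = driver.find_element(*self.locator).text.strip()
-             return not element_text.endswith(self.text)
-         except StaleElementReferenceException:
-             # the element was re-rendered while being read; retry on the next poll
-             return False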
- # PhantomJS support was removed in Selenium 4; on newer versions substitute e.g. webdriver.Chrome()
- browser = webdriver.PhantomJS()
- browser.maximize_window()
- browser.get("http://www.jamiiforums.com/kenyan-news/225589-kenyan-and-tanzanian-surburbs.html")
- # log in
- username = browser.find_element(By.ID, "navbar_username")
- password = browser.find_element(By.NAME, "vb_login_password_hint")
- username.send_keys("MarioP")
- password.send_keys("codeswitching")
- browser.find_element(By.CLASS_NAME, "loginbutton").click()
- # wait for the post-login redirect, then for the thread page to load
- wait = WebDriverWait(browser, 30)
- wait.until(EC.visibility_of_element_located((By.XPATH, '//h2[contains(., "Redirecting")]')))
- wait.until(EC.title_contains('Kenyan & Tanzanian'))
- wait.until(EC.visibility_of_element_located((By.ID, 'postlist')))
- # click the "11 others" link
- link = browser.find_element(By.XPATH, '//div[@class="vbseo_liked"]/a[contains(@onclick, "return vbseoui.others_click(this)")]')
- link.click()
- browser.execute_script("""
- function eventFire(el, etype) {
-     if (el.fireEvent) {
-         el.fireEvent('on' + etype);
-     } else {
-         var evObj = document.createEvent('Events');
-         evObj.initEvent(etype, true, false);
-         el.dispatchEvent(evObj);
-     }
- }
- eventFire(arguments[0], "click");
- """, link)
- # wait until the "div" text no longer ends with "11 others like this."
- wait.until(wait_for_text_not_to_end_with((By.CLASS_NAME, 'vbseo_liked'), "11 others like this."))
- print('success!!')
- browser.close()
See, scraping data with Selenium from Python really is that simple. Master the key points above and start writing your own crawler.