Parsing URLs with Python's urllib.parse library

Author: Darksun, 2018-02-23 11:11:11



Python's urllib.parse module provides many functions for parsing and building URLs.

Parsing URLs

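The urlparse() function parses a URL into a ParseResult object, which contains six components:

scheme: the protocol
netloc: the network location (the host, plus optional credentials and port)
path: the path
params: the path parameters
query: the query string
fragment: the fragment identifier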

from urllib.parse import urlparse

url = 'http://user:pwd@domain:80/path;params?query=queryarg#fragment'

parsed_result = urlparse(url)

print('parsed_result contains', len(parsed_result), 'elements')
print(parsed_result)

Output:

parsed_result contains 6 elements
ParseResult(scheme='http', netloc='user:pwd@domain:80', path='/path', params='params', query='query=queryarg', fragment='fragment')

ParseResult is a namedtuple subclass, so each part of the URL can be read either by index or by attribute name.
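A minimal sketch of what the namedtuple base means in practice, reusing the same example URL: the fields can be read by position, by name, or unpacked in one step. _fields is the standard namedtuple attribute that lists the field names in order.

```python
from urllib.parse import urlparse

url = 'http://user:pwd@domain:80/path;params?query=queryarg#fragment'
parsed_result = urlparse(url)

# Index 0 and the .scheme attribute refer to the same field
print(parsed_result[0], parsed_result.scheme)  # http http

# Tuple unpacking works because ParseResult is a namedtuple subclass
scheme, netloc, path, params, query, fragment = parsed_result
print(path, query)  # /path query=queryarg

# _fields lists the six field names in positional order
print(parsed_result._fields)
```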

For convenience, ParseResult also provides the username, password, hostname, and port attributes, which break netloc down further.

print('scheme  :', parsed_result.scheme)
print('netloc  :', parsed_result.netloc)
print('path    :', parsed_result.path)
print('params  :', parsed_result.params)
print('query   :', parsed_result.query)
print('fragment:', parsed_result.fragment)
print('username:', parsed_result.username)
print('password:', parsed_result.password)
print('hostname:', parsed_result.hostname)
print('port    :', parsed_result.port)

Output:

scheme  : http
netloc  : user:pwd@domain:80
path    : /path
params  : params
query   : query=queryarg
fragment: fragment
username: user
password: pwd
hostname: domain
port    : 80
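These convenience attributes do a little normalisation of their own. A short sketch with illustrative URLs (example.com is a placeholder host): hostname is lowercased while netloc and username keep their original spelling, port is None when the URL does not give one, and an out-of-range port raises ValueError when .port is read (Python 3.6 and later).

```python
from urllib.parse import urlparse

r = urlparse('http://User@Example.COM/index.html')
print(r.netloc)    # User@Example.COM  (unchanged)
print(r.hostname)  # example.com       (lowercased)
print(r.username)  # User              (unchanged)
# No port in the URL, and no scheme default is filled in
print(r.port)      # None

# Reading .port on an out-of-range value raises ValueError
bad = urlparse('http://example.com:99999/')
try:
    bad.port
except ValueError as exc:
    print('invalid port:', exc)
```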

Besides urlparse(), the similar urlsplit() function also splits a URL. The difference is that urlsplit() does not separate the path parameters (params) from the path, so it returns five components instead of six.

urlparse() only recognizes parameters on the last path segment, so when several path segments carry parameters, its result is misleading:

url='http://user:pwd@domain:80/path1;params1/path2;params2?query=queryarg#fragment'
 
parsed_result=urlparse(url)
 
print(parsed_result)
print('parsed.path    :', parsed_result.path)
print('parsed.params  :', parsed_result.params)

Output:

ParseResult(scheme='http', netloc='user:pwd@domain:80', path='/path1;params1/path2', params='params2', query='query=queryarg', fragment='fragment')
parsed.path    : /path1;params1/path2
parsed.params  : params2

In this case, use urlsplit() instead:

from urllib.parse import urlsplit
split_result=urlsplit(url)
 
print(split_result)
print('split.path    :', split_result.path)
# SplitResult has no params attribute

Output:

SplitResult(scheme='http', netloc='user:pwd@domain:80', path='/path1;params1/path2;params2', query='query=queryarg', fragment='fragment')
split.path    : /path1;params1/path2;params2
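Because urlsplit() leaves every ;params inside path, the per-segment parameters can be pulled apart by hand. The helper split_segment_params below is hypothetical (it is not part of urllib.parse), and it assumes everything after the first ';' in a segment belongs to that segment's params.

```python
from urllib.parse import urlsplit


def split_segment_params(path):
    """Hypothetical helper: return a (segment, params) pair for each path segment."""
    result = []
    for segment in path.split('/'):
        # partition() splits on the first ';' only; params is '' if there is none
        name, _, params = segment.partition(';')
        result.append((name, params))
    return result


url = 'http://user:pwd@domain:80/path1;params1/path2;params2?query=queryarg#fragment'
print(split_segment_params(urlsplit(url).path))
# [('', ''), ('path1', 'params1'), ('path2', 'params2')]
```

The leading ('', '') pair comes from the empty string before the first '/'.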

If you only need to split the fragment identifier off the end of a URL, use urldefrag():

from urllib.parse import urldefrag
 
url = 'http://user:pwd@domain:80/path1;params1/path2;params2?query=queryarg#fragment'
 
d = urldefrag(url)
print(d)
print('url     :', d.url)
print('fragment:', d.fragment)

Output:

DefragResult(url='http://user:pwd@domain:80/path1;params1/path2;params2?query=queryarg', fragment='fragment')
url     : http://user:pwd@domain:80/path1;params1/path2;params2?query=queryarg
fragment: fragment
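Two small details of urldefrag(), sketched with an illustrative URL (www.example.com is a placeholder): a URL without '#' yields an empty fragment rather than None, and the result unpacks like a tuple.

```python
from urllib.parse import urldefrag

# Without a '#', fragment is simply an empty string
d = urldefrag('http://www.example.com/path/file.html')
print(repr(d.fragment))  # ''

# DefragResult is a namedtuple, so it can be unpacked into (url, fragment)
url, fragment = urldefrag('http://www.example.com/path/file.html#section-2')
print(url)       # http://www.example.com/path/file.html
print(fragment)  # section-2
```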

Building URLs

Both ParseResult and SplitResult objects have a geturl() method that reassembles the components into a complete URL string.

print(parsed_result.geturl())
print(split_result.geturl())

Output:

http://user:pwd@domain:80/path1;params1/path2;params2?query=queryarg#fragment
http://user:pwd@domain:80/path1;params1/path2;params2?query=queryarg#fragment
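Since the result objects are immutable namedtuples, one way to change a single component is the namedtuple method _replace() followed by geturl(). A sketch reusing the multi-parameter URL; the new query value page=2 is only an illustration.

```python
from urllib.parse import urlsplit

url = 'http://user:pwd@domain:80/path1;params1/path2;params2?query=queryarg#fragment'
split_result = urlsplit(url)

# _replace() returns a modified copy; split_result itself is unchanged
updated = split_result._replace(query='page=2', fragment='')
print(updated.geturl())
# http://user:pwd@domain:80/path1;params1/path2;params2?page=2
```

An empty fragment is simply omitted when the URL is reassembled, so no trailing '#' appears.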

geturl() is only available on the result objects returned by the parsing functions, such as ParseResult and SplitResult. To assemble a URL from an ordinary tuple, use urlunparse():

from urllib.parse import urlunparse
url_compos = ('http', 'user:pwd@domain:80', '/path1;params1/path2', 'params2', 'query=queryarg', 'fragment')
print(urlunparse(url_compos))

Output:

http://user:pwd@domain:80/path1;params1/path2;params2?query=queryarg#fragment
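The five-component counterpart is urlunsplit(), which has no separate params field, so the path parameters stay inside the path string. A short sketch; the round trip holds for this particular URL, but reassembly is not guaranteed to reproduce every input byte for byte (urlsplit() lowercases the scheme, for example).

```python
from urllib.parse import urlsplit, urlunsplit

# Five components: scheme, netloc, path (params included), query, fragment
url_compos = ('http', 'user:pwd@domain:80', '/path1;params1/path2;params2',
              'query=queryarg', 'fragment')
url = urlunsplit(url_compos)
print(url)
# http://user:pwd@domain:80/path1;params1/path2;params2?query=queryarg#fragment

# Splitting and re-joining this URL gives back the same string
print(urlunsplit(urlsplit(url)) == url)  # True
```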

Converting relative paths to absolute URLs

In addition, urllib.parse provides urljoin(), which resolves a relative path against a base URL to produce an absolute URL.

from urllib.parse import urljoin
 
print(urljoin('http://www.example.com/path/file.html', 'anotherfile.html'))
print(urljoin('http://www.example.com/path/', 'anotherfile.html'))
print(urljoin('http://www.example.com/path/file.html', '../anotherfile.html'))
print(urljoin('http://www.example.com/path/file.html', '/anotherfile.html'))

Output:

http://www.example.com/path/anotherfile.html
http://www.example.com/path/anotherfile.html
http://www.example.com/anotherfile.html
http://www.example.com/anotherfile.html
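urljoin() handles other kinds of references as well. A sketch with a few more cases; other.example.org and cdn.example.net are placeholder hosts.

```python
from urllib.parse import urljoin

base = 'http://www.example.com/path/file.html'

# A full URL with a different scheme replaces the base entirely
print(urljoin(base, 'https://other.example.org/x'))  # https://other.example.org/x

# A scheme-relative reference ('//host/...') keeps only the base's scheme
print(urljoin(base, '//cdn.example.net/lib.js'))     # http://cdn.example.net/lib.js

# A query-only reference keeps the base path and sets the query
print(urljoin(base, '?page=2'))                      # http://www.example.com/path/file.html?page=2
```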

Building and parsing query strings

urlencode() converts a dict into a properly escaped query string:

from urllib.parse import urlencode
 
query_args = {
    'name': 'dark sun',
    'country': '中国'
}
 
query_args = urlencode(query_args)
print(query_args)

Output:

name=dark+sun&country=%E4%B8%AD%E5%9B%BD

The space is encoded as +, and the non-ASCII characters are percent-encoded as their UTF-8 bytes.
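List values need one extra flag: urlencode() only expands a list into repeated keys when doseq=True is passed; otherwise the list is converted with str() and escaped as a single value. It also accepts a sequence of (key, value) pairs. A quick sketch with illustrative keys and values:

```python
from urllib.parse import urlencode

# doseq=True: each list element becomes its own key=value pair
print(urlencode({'tag': ['python', 'url']}, doseq=True))  # tag=python&tag=url

# Without doseq, str(['python', 'url']) is escaped as one value
print(urlencode({'tag': ['python', 'url']}))  # tag=%5B%27python%27%2C+%27url%27%5D

# A list of pairs also allows repeated keys
print(urlencode([('a', '1'), ('a', '2')]))    # a=1&a=2
```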

Conversely, parse_qs() parses a query string back into a dict whose values are lists:

from urllib.parse import parse_qs
print(parse_qs(query_args))

Output:

{'name': ['dark sun'], 'country': ['中国']}
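The values are lists because a key may appear more than once. Blank values are dropped unless keep_blank_values=True is passed, and parse_qsl() returns ordered (key, value) pairs instead of a dict. A sketch with an illustrative query string:

```python
from urllib.parse import parse_qs, parse_qsl

qs = 'a=1&a=2&b=&c=dark+sun'

# Repeated keys collect into one list; 'b=' is dropped by default
print(parse_qs(qs))                          # {'a': ['1', '2'], 'c': ['dark sun']}
print(parse_qs(qs, keep_blank_values=True))  # {'a': ['1', '2'], 'b': [''], 'c': ['dark sun']}

# parse_qsl() keeps the original order as (key, value) pairs
print(parse_qsl(qs))                         # [('a', '1'), ('a', '2'), ('c', 'dark sun')]
```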

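If you only need to escape special characters, use quote() or quote_plus(). Both escape characters such as :, but quote() leaves / unescaped by default (its safe parameter defaults to '/'), whereas quote_plus() escapes / as well and encodes spaces as +.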

from urllib.parse import quote, quote_plus, urlencode
 
url = 'http://localhost:1080/~hello!/'
print('urlencode :', urlencode({'url': url}))
print('quote     :', quote(url))
print('quote_plus:', quote_plus(url))

Output (Python 3.7 and later, where ~ is treated as an unreserved character; earlier versions escaped it as %7E):

urlencode : url=http%3A%2F%2Flocalhost%3A1080%2F~hello%21%2F
quote     : http%3A//localhost%3A1080/~hello%21/
quote_plus: http%3A%2F%2Flocalhost%3A1080%2F~hello%21%2F
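The difference in how / and spaces are treated comes from the safe parameter and from quote_plus()'s space handling. A small sketch with an illustrative string:

```python
from urllib.parse import quote, quote_plus

# quote() keeps '/' because safe defaults to '/'; a space becomes %20
print(quote('a b/c'))           # a%20b/c

# safe='' makes quote() escape '/' too
print(quote('a b/c', safe=''))  # a%20b%2Fc

# quote_plus() escapes '/' by default and turns spaces into '+'
print(quote_plus('a b/c'))      # a+b%2Fc
```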

The identical output shows that urlencode() escapes with quote_plus(); its quote_via parameter defaults to quote_plus.
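Because the escaping function is a parameter, urlencode() can be told to use quote() instead, so that spaces become %20 rather than +. A minimal sketch with an illustrative dict:

```python
from urllib.parse import urlencode, quote

params = {'name': 'dark sun'}

# Default: quote_via=quote_plus, so the space becomes '+'
print(urlencode(params))                   # name=dark+sun

# quote_via=quote encodes the space as %20
print(urlencode(params, quote_via=quote))  # name=dark%20sun
```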

For the reverse operation, use unquote() or unquote_plus():

from urllib.parse import unquote, unquote_plus
 
encoded_url = 'http%3A%2F%2Flocalhost%3A1080%2F%7Ehello%21%2F'
print(unquote(encoded_url))
print(unquote_plus(encoded_url))

Output:

http://localhost:1080/~hello!/
http://localhost:1080/~hello!/

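Note that unquote() also restores the quote_plus() output correctly here. That is because the encoded string contains no +: the only difference between the two functions is that unquote_plus() additionally decodes + as a space.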
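The two functions diverge as soon as the input contains a +. A sketch with an illustrative string:

```python
from urllib.parse import unquote, unquote_plus

encoded = 'name=dark+sun%21'

# unquote() decodes %21 but keeps '+' literally
print(unquote(encoded))       # name=dark+sun!

# unquote_plus() also turns '+' into a space
print(unquote_plus(encoded))  # name=dark sun!
```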

Editor: Pang Guiyu. Source: Linux中国 (Linux China)
