requests
1 todo list
1.1 TODO redirection and cookies
1.2 TODO read test_requests.py
1.3 TODO proxy
1.4 TODO adapter
1.5 TODO https
1.6 TODO POST multipart-encoded files
2 miscellaneous
2.1 Transfer-Encoding: chunked
Framing: <size in hex>\r\n<data>\r\n<size in hex>\r\n<data>\r\n … 0\r\n\r\n (end of body)
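Here is a minimal Python 3 sketch of this framing. The helper names encode_chunked and decode_chunked are hypothetical and are not part of requests or urllib3. Each chunk is its length in hex, CRLF, the data, and CRLF, and a zero-size chunk ends the body. The decoder ignores chunk extensions and trailers.

```python
def encode_chunked(chunks):
    """Frame an iterable of byte strings with chunked transfer coding."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            continue  # an empty chunk would be read as the terminator
        out += b"%x\r\n" % len(chunk)  # size line, in hex
        out += chunk + b"\r\n"
    out += b"0\r\n\r\n"  # last chunk, no trailers
    return bytes(out)


def decode_chunked(body):
    """Reassemble a complete chunked body (trailers are ignored)."""
    data = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        # The size may be followed by ";extension"; only the hex part counts.
        size = int(body[pos:eol].split(b";")[0], 16)
        pos = eol + 2
        if size == 0:
            return bytes(data)
        data += body[pos:pos + size]
        pos += size + 2  # skip the data and its trailing CRLF


body = encode_chunked([b"Wiki", b"pedia", b" in chunks."])
print(body)
print(decode_chunked(body))
```

requests sends a body this way when data is a generator or another iterator whose length is unknown.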
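2.2 When the request URL contains an unacceptable character (e.g. a raw space), the request fails with
ConnectionError: ('Connection aborted.', BadStatusLine("''",))
Normally the space never reaches the wire, because PreparedRequest.prepare_url() percent-encodes it. To reproduce the error, overwrite the URL after preparing:
    import requests
    req = requests.Request('GET', 'http://localhost:8000/?u=8')
    pre = req.prepare()
    pre.url = u'http://localhost:8000/?u=8 8'   # bypasses prepare_url's quoting
    requests.Session().send(pre)
The raw space splits the HTTP request line ("GET /?u=8 8 HTTP/1.1"). The server presumably cannot parse it and closes the connection without sending a status line. httplib reports that as BadStatusLine, urllib3 wraps it as 'Connection aborted.', and requests re-raises it as ConnectionError.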
Traceback (most recent call last):
  File "<pyshell#30>", line 1, in <module>
    s.send(pr)
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 573, in send
    r = adapter.send(request, **kwargs)
  File "C:\Python27\lib\site-packages\requests\adapters.py", line 415, in send
    raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', BadStatusLine("''",))
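The following Python 3 sketch shows why a prepared URL normally has no raw space: it percent-encodes unsafe characters with urllib.parse.quote. The safe set below is assumed to be close to what requests' requote_uri keeps. It is an illustration, not that function.

```python
from urllib.parse import quote

# Assumed safe set: URL delimiters plus '%' so existing escapes survive.
SAFE_CHARS = "!#$%&'()*+,/:;=?@[]~"


def quote_unsafe(url):
    """Percent-encode characters that must not appear raw in a request line."""
    return quote(url, safe=SAFE_CHARS)


print(quote_unsafe("http://localhost:8000/?u=8 8"))  # http://localhost:8000/?u=8%208
```

Because '%' is in the safe set, an already-escaped URL such as http://h/a%20b passes through unchanged instead of being double-encoded.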
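2.3 A session has two adapters (one each for http:// and https://)
An adapter has one PoolManager and, created on demand, one or two ProxyManagers. These are cached per proxy URL, e.g. one for the http proxy and one for the https proxy.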
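A session picks between its adapters by longest-prefix match on the URL. MiniSession and InvalidSchema below are hypothetical stand-ins written for this note. Real requests keeps its adapters ordered by prefix length and raises requests.exceptions.InvalidSchema. The adapters here are plain strings to keep the sketch self-contained.

```python
class InvalidSchema(Exception):
    """Stand-in for requests.exceptions.InvalidSchema."""


class MiniSession:
    """Hypothetical model of the adapter registry in requests.Session."""

    def __init__(self):
        self.adapters = {}
        self.mount("https://", "https-adapter")
        self.mount("http://", "http-adapter")

    def mount(self, prefix, adapter):
        self.adapters[prefix] = adapter

    def get_adapter(self, url):
        # Try longer prefixes first, so a specific mount such as
        # "https://api.example.com" beats the generic "https://".
        for prefix in sorted(self.adapters, key=len, reverse=True):
            if url.lower().startswith(prefix.lower()):
                return self.adapters[prefix]
        raise InvalidSchema("No connection adapters were found for %r" % url)


s = MiniSession()
s.mount("https://api.example.com", "special-adapter")
print(s.get_adapter("HTTP://www.baidu.com"))        # http-adapter
print(s.get_adapter("https://api.example.com/v1"))  # special-adapter
```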
2.4 Redirection
2.5 r = requests.get('http://www.baidu.com'), step by step
- create a Session s and mount HTTPAdapter() on the prefixes https:// and http://
- the Session sends the request with s.request, which creates a Request req
- s.send sends the (prepared) request
- the adapter sends the request and returns the Response r
Adapter: the CA certificate bundle (requests/certs.py)
    import os
    try:
        from certifi import where
    except ImportError:
        def where():
            """Return the preferred certificate bundle."""
            # fall back to the cacert.pem shipped inside requests
            return os.path.join(os.path.dirname(__file__), 'cacert.pem')
3 classes
3.1 Session
__attrs__ = ['headers', 'cookies', 'auth', 'proxies', 'hooks', 'params', 'verify', 'cert', 'prefetch', 'adapters', 'stream', 'trust_env', 'max_redirects']
METHOD:
Call graph (method: what it calls):
request: prepare_request, merge_environment_settings, send
send: get_adapter, resolve_redirects
resolve_redirects (a generator): rebuild_proxies, rebuild_auth, send (with allow_redirects=False), get_adapter
HTTP verb shortcuts: get, post, delete, head, put, patch, options
Other methods: merge_environment_settings, mount, prepare_request, rebuild_auth, rebuild_proxies, close
COOKIES:
- self.cookies = cookiejar_from_dict({})  (the session's jar)
- cookies (a per-request parameter)
prepare_request: merges self.cookies and the cookies parameter into a fresh RequestsCookieJar() with merge_cookies
send: extract_cookies_to_jar(self.cookies, request, resp.raw)
resolve_redirects: extract_cookies_to_jar(prepared_request._cookies, req, resp.raw); prepared_request._cookies.update(self.cookies); prepared_request.prepare_cookies(prepared_request._cookies)
PS: unless stream=True, send also accesses Response.content, which reads the whole body.
3.2 Request
3.3 PreparedRequest
3.4 Response
content (bytes), text (unicode), json() (the parsed body)
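This simplified Python 3 sketch shows how the three views relate. text is content decoded with the response encoding (undecodable bytes replaced), and json parses that text. One assumption: when no encoding is known, the sketch just uses UTF-8, whereas requests guesses one (apparent_encoding). The function names are hypothetical.

```python
import json
from typing import Any, Optional


def response_text(content: bytes, encoding: Optional[str]) -> str:
    """Like Response.text: decode the raw bytes, replacing undecodable ones."""
    if encoding is None:
        encoding = "utf-8"  # assumption: requests would guess apparent_encoding here
    return content.decode(encoding, errors="replace")


def response_json(content: bytes, encoding: Optional[str]) -> Any:
    """Like Response.json(): parse the decoded body as JSON."""
    return json.loads(response_text(content, encoding))


print(response_text("中文".encode("gbk"), "gbk"))
print(response_json(b'{"a": 1}', None))
```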
3.5 HTTPAdapter
send():
    conn = self.get_connection(...)
    resp = conn.urlopen(...)           # normal request body
    low_conn = conn._get_conn()        # chunked request body: take a raw
    low_conn.send(...)                 #   connection and write the chunks by hand
    return self.build_response(...)
METHODS:
init_poolmanager
send: get_connection, cert_verify, request_url, add_headers, build_response
get_connection: proxy_manager_for
Other methods: add_headers, build_response, cert_verify, close, proxy_headers, proxy_manager_for, request_url
4 urllib3
    import urllib3
    http = urllib3.PoolManager()
    r = http.request('GET', 'http://google.com/')
    print r.status, r.headers, r.data
PoolManager: a collection of ConnectionPool objects. ProxyManager is its proxy-aware subclass.
ConnectionPool: a pool of connections to a single host, composed of a collection of httplib.HTTPConnection objects.
Other pieces: Timeout, Retry, streaming.
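4.1 PoolManager and ProxyManager
The pools in PoolManager and ProxyManager behave like a dict keyed by (scheme, host, port). Depending on the scheme, the value is an HTTPConnectionPool or an HTTPSConnectionPool.
Main methods:
connection_from_host(self, host, port=None, scheme='http'): PoolManager looks up the ConnectionPool for (scheme, host, port) in pools and creates a new one if there is none. ProxyManager overrides this method. For scheme=https it uses (scheme, host, port) directly; for scheme=http it uses (proxy.scheme, proxy.host, proxy.port).
_new_pool(self, scheme, host, port)
urlopen(self, method, url, redirect=True, **kw): calls connection_from_host to get a ConnectionPool, then calls that pool's method of the same name, urlopen. It adds some redirect handling on top, which requests does not use.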
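Below is a toy model of the pool cache described above. MiniPoolManager and MiniProxyManager are hypothetical, and the "pools" are just tuples. The model reproduces the len(proxy.pools) counts from the ProxyManager example in 9.3: plain-http destinations share the proxy's pool, while each https destination gets its own. Newer urllib3 versions key pools on more than (scheme, host, port), so treat this as a model of the behaviour described here.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


class MiniPoolManager:
    """Toy model of PoolManager's pool cache (not the real urllib3 class)."""

    def __init__(self):
        self.pools = {}  # (scheme, host, port) -> pool

    def _new_pool(self, scheme, host, port):
        # Stand-in for HTTPConnectionPool / HTTPSConnectionPool.
        kind = "HTTPSConnectionPool" if scheme == "https" else "HTTPConnectionPool"
        return (kind, host, port)

    def connection_from_host(self, host, port=None, scheme="http"):
        port = port or DEFAULT_PORTS[scheme]
        key = (scheme, host, port)
        if key not in self.pools:  # create the pool on first use only
            self.pools[key] = self._new_pool(scheme, host, port)
        return self.pools[key]

    def connection_from_url(self, url):
        parts = urlsplit(url)
        return self.connection_from_host(parts.hostname, parts.port, parts.scheme)


class MiniProxyManager(MiniPoolManager):
    def __init__(self, proxy_url):
        super().__init__()
        self.proxy = urlsplit(proxy_url)

    def connection_from_host(self, host, port=None, scheme="http"):
        if scheme == "https":
            # One pool per destination; the connection tunnels through the proxy.
            return super().connection_from_host(host, port, scheme)
        # Plain http: all destinations share the pool for the proxy itself.
        return super().connection_from_host(
            self.proxy.hostname, self.proxy.port, self.proxy.scheme)


proxy = MiniProxyManager("http://localhost:3128/")
for url in ["http://google.com/", "http://httpbin.org/"]:
    proxy.connection_from_url(url)
print(len(proxy.pools))  # 1
for url in ["https://httpbin.org/", "https://twitter.com/"]:
    proxy.connection_from_url(url)
print(len(proxy.pools))  # 3
```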
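4.2 HTTP(S)ConnectionPool
HTTP(S)ConnectionPool maintains a pool of HTTP(S)Connection objects held in a thread-safe queue. Every connection in the pool is interchangeable (same host and port).
HTTPConnectionPool stores the proxy information but does nothing with the proxy itself. When HTTPSConnectionPool creates a new connection, it passes the proxy's host and port as arguments: ConnectionCls(proxy.host, proxy.port).
urlopen calls httplib's HTTPConnection.request and getresponse. If there is a proxy, it first calls self._prepare_proxy().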
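This is a minimal sketch of the per-host queue. MiniConnectionPool is hypothetical, and its "connections" are dicts instead of sockets. urllib3 takes a similar approach, using a LifoQueue pre-filled with None placeholders. Connections are therefore opened only when first needed, and the most recently returned one is reused first.

```python
import queue


class MiniConnectionPool:
    """Hypothetical per-host pool; 'connections' are dicts, not sockets."""

    def __init__(self, host, port, maxsize=2):
        self.host, self.port = host, port
        self.pool = queue.LifoQueue(maxsize)  # thread-safe, last in first out
        for _ in range(maxsize):
            self.pool.put(None)  # placeholder: connect lazily on first use

    def _new_conn(self):
        return {"host": self.host, "port": self.port}

    def _get_conn(self):
        conn = self.pool.get(block=False)  # raises queue.Empty when all are in use
        return conn if conn is not None else self._new_conn()

    def _put_conn(self, conn):
        self.pool.put(conn, block=False)  # hand the connection back for reuse


pool = MiniConnectionPool("example.com", 80, maxsize=1)
conn = pool._get_conn()
pool._put_conn(conn)
print(pool._get_conn() is conn)  # True: the same connection is reused
```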
4.3 Verified HTTPS with SSL/TLS
    import certifi
    import urllib3
    http = urllib3.PoolManager(
        cert_reqs='CERT_REQUIRED',  # Force certificate check.
        ca_certs=certifi.where(),   # Path to the Certifi bundle.
    )
requests does the same with ca_certs = DEFAULT_CA_BUNDLE_PATH, which is certs.where() from requests/certs.py.
5 httpbin
    pip install httpbin
    python -m httpbin.core
6 bug
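6.1 Encoding error
    r = requests.get('http://www.baidu.com')
    print r.text
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'gbk' codec can't encode character u'\xbb' in position 23594: illegal multibyte sequence
The exception comes from print, not from requests. print has to encode the unicode string r.text for the Windows console, whose encoding here is GBK, and GBK has no mapping for u'\xbb' (»).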
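If the goal is just to see the page in such a console, one workaround is to encode with errors='replace' before printing. Below is a Python 3 sketch; console_safe is a hypothetical helper. Another option is to write r.content to a file in binary mode, which avoids the console entirely.

```python
def console_safe(text, encoding="gbk"):
    """Return text with characters the console encoding lacks replaced by '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)


print(console_safe("a\xbbb"))  # a?b  ('»' has no GBK mapping)
```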
6.2 Wrong parameter description in a docstring
requests/adapters.py
    def proxy_headers(self, proxy):
        """Returns a dictionary of the headers to add to any request sent
        through a proxy. This works with urllib3 magic to ensure that they are
        correctly sent to the proxy, rather than in a tunnelled request if
        CONNECT is being used.

        This should not be called from user code, and is only exposed for use
        when subclassing the
        :class:`HTTPAdapter <requests.adapters.HTTPAdapter>`.

        :param proxies: The url of the proxy being used for this request.
        :param kwargs: Optional additional keyword arguments.
        """
The docstring documents proxies and kwargs, but the method's only parameter is proxy. It should read ":param proxy: The url of the proxy being used for this request." and the kwargs line should be dropped.
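For reference, here is a simplified Python 3 sketch of what proxy_headers computes. When the proxy URL carries credentials, it produces a Basic Proxy-Authorization header; otherwise it returns nothing. It is written from the behaviour of requests 2.x, not copied from it, and the function is standalone rather than an adapter method.

```python
import base64
from urllib.parse import unquote, urlsplit


def proxy_headers(proxy):
    """Headers to send to the proxy itself (simplified sketch).

    :param proxy: The url of the proxy being used for this request.
    """
    headers = {}
    parts = urlsplit(proxy)
    if parts.username:
        # Credentials in the URL may be percent-encoded, e.g. "p%40ss".
        user_pass = "%s:%s" % (unquote(parts.username), unquote(parts.password or ""))
        token = base64.b64encode(user_pass.encode("latin1")).decode("ascii")
        headers["Proxy-Authorization"] = "Basic " + token
    return headers


print(proxy_headers("http://user:pass@10.0.0.1:3128"))
```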
7 test
    python test_requests.py        (at commit 9bbab338fdbb562b923ba2d8a80f0bfba697fa41)
====================================================================
ERROR: test_auth_is_stripped_on_redirect_off_host (__main__.RequestsTestCase)
Traceback (most recent call last):
  File "test_requests.py", line 983, in test_auth_is_stripped_on_redirect_off_host
    auth=('user', 'pass'),
  File "C:\emacs\git\requests\requests\api.py", line 69, in get
    return request('get', url, params=params, **kwargs)
  File "C:\emacs\git\requests\requests\api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "C:\emacs\git\requests\requests\sessions.py", line 465, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\emacs\git\requests\requests\sessions.py", line 594, in send
    history = [resp for resp in gen] if allow_redirects else []
  File "C:\emacs\git\requests\requests\sessions.py", line 196, in resolve_redirects
    **adapter_kwargs
  File "C:\emacs\git\requests\requests\sessions.py", line 573, in send
    r = adapter.send(request, **kwargs)
  File "C:\emacs\git\requests\requests\adapters.py", line 412, in send
    raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', error(10060, ''))
====================================================================
ERROR: test_different_encodings_dont_break_post (__main__.RequestsTestCase)
Traceback (most recent call last):
  File "test_requests.py", line 501, in test_different_encodings_dont_break_post
    files={'file': ('test_requests.py', open(__file__, 'rb'))})
  File "C:\emacs\git\requests\requests\api.py", line 109, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "C:\emacs\git\requests\requests\api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "C:\emacs\git\requests\requests\sessions.py", line 465, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\emacs\git\requests\requests\sessions.py", line 605, in send
    r.content
  File "C:\emacs\git\requests\requests\models.py", line 734, in content
    self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
  File "C:\emacs\git\requests\requests\models.py", line 657, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "C:\emacs\git\requests\requests\packages\urllib3\response.py", line 307, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "C:\emacs\git\requests\requests\packages\urllib3\response.py", line 243, in read
    data = self.fp.read(amt)
  File "C:\Python27\lib\httplib.py", line 573, in read
    s = self.fp.read(amt)
  File "C:\Python27\lib\socket.py", line 380, in read
    data = self.sock.recv(left)
error: [Errno 10054]
====================================================================
ERROR: test_mixed_case_scheme_acceptable (__main__.RequestsTestCase)
Traceback (most recent call last):
  File "test_requests.py", line 137, in test_mixed_case_scheme_acceptable
    r = s.send(r.prepare())
  File "C:\emacs\git\requests\requests\sessions.py", line 573, in send
    r = adapter.send(request, **kwargs)
  File "C:\emacs\git\requests\requests\adapters.py", line 428, in send
    raise SSLError(e, request=request)
SSLError: EOF occurred in violation of protocol (_ssl.c:581)
====================================================================
ERROR: test_prepared_request_hook (__main__.RequestsTestCase)
Traceback (most recent call last):
  File "test_requests.py", line 588, in test_prepared_request_hook
    resp = s.send(prep)
  File "C:\emacs\git\requests\requests\sessions.py", line 573, in send
    r = adapter.send(request, **kwargs)
  File "C:\emacs\git\requests\requests\adapters.py", line 412, in send
    raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', error(10060, ''))
====================================================================
ERROR: test_request_ok_set (__main__.RequestsTestCase)
Traceback (most recent call last):
  File "test_requests.py", line 459, in test_request_ok_set
    r = requests.get(httpbin('status', '404'))
  File "C:\emacs\git\requests\requests\api.py", line 69, in get
    return request('get', url, params=params, **kwargs)
  File "C:\emacs\git\requests\requests\api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "C:\emacs\git\requests\requests\sessions.py", line 465, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\emacs\git\requests\requests\sessions.py", line 573, in send
    r = adapter.send(request, **kwargs)
  File "C:\emacs\git\requests\requests\adapters.py", line 412, in send
    raise ConnectionError(err, request=request)
ConnectionError: ('Connection aborted.', error(10060, ''))
====================================================================
ERROR: test_unicode_method_name (__main__.RequestsTestCase)
Traceback (most recent call last):
  File "test_requests.py", line 539, in test_unicode_method_name
    method=u('POST'), url=httpbin('post'), files=files)
  File "C:\emacs\git\requests\requests\api.py", line 50, in request
    response = session.request(method=method, url=url, **kwargs)
  File "C:\emacs\git\requests\requests\sessions.py", line 465, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\emacs\git\requests\requests\sessions.py", line 605, in send
    r.content
  File "C:\emacs\git\requests\requests\models.py", line 734, in content
    self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
  File "C:\emacs\git\requests\requests\models.py", line 657, in generate
    for chunk in self.raw.stream(chunk_size, decode_content=True):
  File "C:\emacs\git\requests\requests\packages\urllib3\response.py", line 307, in stream
    data = self.read(amt=amt, decode_content=decode_content)
  File "C:\emacs\git\requests\requests\packages\urllib3\response.py", line 243, in read
    data = self.fp.read(amt)
  File "C:\Python27\lib\httplib.py", line 573, in read
    s = self.fp.read(amt)
  File "C:\Python27\lib\socket.py", line 380, in read
    data = self.sock.recv(left)
error: [Errno 10054]
====================================================================
ERROR: test_expires_valid_str (__main__.TestMorselToCookieExpires)
Test case where we convert expires from string time.
Traceback (most recent call last):
  File "test_requests.py", line 1398, in test_expires_valid_str
    cookie = morsel_to_cookie(morsel)
  File "C:\emacs\git\requests\requests\cookies.py", line 425, in morsel_to_cookie
    time.strptime(morsel['expires'], time_template)) - time.timezone)
OverflowError: mktime argument out of range
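The OverflowError comes from time.mktime(), which reads the parsed time as local time. The Windows C runtime refuses results before the epoch. In a timezone east of UTC, an expires value close to 1970-01-01 00:00 GMT probably lands there. calendar.timegm() avoids this because it treats the struct_time as UTC, so the conversion does not depend on the timezone. Below is a Python 3 sketch; TIME_TEMPLATE is assumed to be the cookie expires format, since the test's actual input is not shown here.

```python
import calendar
import time

# Assumed Set-Cookie expires format (e.g. "Thu, 01-Jan-1970 00:00:01 GMT").
TIME_TEMPLATE = "%a, %d-%b-%Y %H:%M:%S GMT"


def expires_to_epoch(expires):
    """Convert a GMT expires string to a Unix timestamp.

    calendar.timegm() interprets the struct_time as UTC, so there is no
    mktime() (local time) call and no time.timezone correction to get wrong.
    """
    return calendar.timegm(time.strptime(expires, TIME_TEMPLATE))


print(expires_to_epoch("Thu, 01-Jan-1970 00:00:01 GMT"))  # 1
```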
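Ran 144 tests in 1028.720s
FAILED (errors=7)
Six of the seven errors look like network failures rather than bugs in requests. Three are Windows socket error 10060 (connection timed out), two are 10054 (connection reset by peer), and one is an SSL EOF. The remaining one is the OverflowError from time.mktime in morsel_to_cookie.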
8 TLS/SSL
Transport Layer Security (TLS) is the successor of Secure Sockets Layer (SSL).
TLS/SSL is built on a public/private key infrastructure and uses digital signatures and digital certificates.
8.1 Secure Transport Setup
8.1.1 Establish a TCP connection
8.1.2 SSL Handshake
- exchange protocol version numbers
- select a cipher that both sides know
- authenticate the identity of each side
- generate temporary session keys to encrypt the channel
8.1.3 Certificate Validation
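Authentication is optional and can go both ways. Usually the client verifies the server's certificate, as follows:
- Date check: is the certificate currently valid (not expired)?
- Signer trust check: is the issuer trusted? (browsers ship with trusted certificates preinstalled)
- Signature check: is the signature correct? hash(certificate) == decode(signature, issuer-public-key)
- Site identity check: does the server's domain name match the domain name in the certificate?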
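The date check is easy to sketch with the standard library. ssl.cert_time_to_seconds() parses the notBefore/notAfter strings found in the dict returned by SSLSocket.getpeercert(). The certificate dict below is made up. The other three checks are done for you by an ssl context created with ssl.create_default_context().

```python
import ssl
import time


def cert_is_current(cert, now=None):
    """Date check only: is `now` inside [notBefore, notAfter]?

    `cert` is a dict shaped like the one SSLSocket.getpeercert() returns.
    """
    if now is None:
        now = time.time()
    not_before = ssl.cert_time_to_seconds(cert["notBefore"])
    not_after = ssl.cert_time_to_seconds(cert["notAfter"])
    return not_before <= now <= not_after


# Made-up validity period for illustration.
cert = {"notBefore": "Jan  5 09:34:43 2018 GMT",
        "notAfter": "Jan  5 09:34:43 2019 GMT"}
print(cert_is_current(cert, now=ssl.cert_time_to_seconds("Jun  1 00:00:00 2018 GMT")))  # True
print(cert_is_current(cert, now=0))  # False
```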
8.2 Creating certificates
A private key is created like this:
    openssl genrsa -out ryans-key.pem 2048
Certificates are public keys signed by a Certificate Authority, or self-signed. The first step to getting a certificate is to create a Certificate Signing Request (CSR) file:
    openssl req -new -sha256 -key ryans-key.pem -out ryans-csr.pem
To create a self-signed certificate from the CSR:
    openssl x509 -req -in ryans-csr.pem -signkey ryans-key.pem -out ryans-cert.pem
Alternatively you can send the CSR to a Certificate Authority for signing.
8.3 example
See the Node.js API documentation (TLS, HTTPS) and the Python Library Reference (ssl).
Node.js:
    tls.createServer(options[, secureConnectionListener])
    tls.connect(options[, callback])
    options.key: private key
    options.cert: certificate
    options.ca: trusted (CA) certificates
    options.rejectUnauthorized: whether to reject any connection that is not authorized
Python:
    ssl.wrap_socket(sock, keyfile=None, certfile=None, server_side=False,
                    cert_reqs=CERT_NONE, ca_certs=None, …)
    cert_reqs: CERT_NONE (certificates ignored) or CERT_REQUIRED (required and validated)
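ssl.wrap_socket() was deprecated and then removed in Python 3.12. With an SSLContext, the same settings look like this. This is a Python 3 sketch: make_client_context and make_server_context are hypothetical helper names, and the key and certificate files would be the ones produced in 8.2.

```python
import ssl


def make_client_context(cafile=None, verify=True):
    """Client side: roughly cert_reqs=CERT_REQUIRED, ca_certs=cafile."""
    ctx = ssl.create_default_context(cafile=cafile)  # verifies cert and hostname
    if not verify:
        # Same effect as cert_reqs=CERT_NONE: skip all certificate checks.
        ctx.check_hostname = False  # must be turned off before CERT_NONE
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


def make_server_context(certfile, keyfile):
    """Server side: roughly server_side=True, certfile=..., keyfile=..."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile=certfile, keyfile=keyfile)
    return ctx


# Usage (not run here): make_server_context("ryans-cert.pem", "ryans-key.pem"),
# then ctx.wrap_socket(sock, server_side=True) on the server and
# ctx.wrap_socket(sock, server_hostname="example.com") on the client.
```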
9 remark
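9.1 Session.request
Most of what requests does goes through a single API: Session.request.
Some of its parameters build the request: the method, url, query params, headers and cookies; the body data (which can also be given as json); uploaded files; and simple HTTP authentication, auth. The others control the transfer: timeout, proxies, SSL verification (verify, cert), hook functions (hooks), whether to follow redirects (allow_redirects), and stream, which decides whether the response body is downloaded immediately.
The function is an instance method of Session. As its name suggests, a Session models a conversation. A session object has instance variables much like the parameters of request, and they set the default behavior of every request made during the session.
request first builds a Request object from the arguments the user passed. After some processing (checking parameters, merging in session attributes, …) it turns that into a PreparedRequest. It then picks the appropriate adapter (http or https), sends the request, and returns a Response.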
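The "merging in session attributes" step can be sketched as follows. requests has a helper called merge_setting in sessions.py that behaves roughly like this; the version below is simplified and uses a plain dict. The per-request value wins, dict settings are merged key by key, and a key set to None in the request removes the session default.

```python
from collections.abc import Mapping


def merge_setting(request_setting, session_setting):
    """Combine a per-request setting with the session default (sketch)."""
    if session_setting is None:
        return request_setting
    if request_setting is None:
        return session_setting
    # Non-dict settings (e.g. verify, stream): the request value simply wins.
    if not (isinstance(session_setting, Mapping) and isinstance(request_setting, Mapping)):
        return request_setting
    merged = dict(session_setting)
    merged.update(request_setting)  # request keys override session keys
    # A key explicitly set to None removes the session default.
    return {k: v for k, v in merged.items() if v is not None}


session_headers = {"User-Agent": "notes", "Accept": "*/*"}
print(merge_setting({"Accept": None, "X-Test": "1"}, session_headers))
```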
9.1.1 Definition of Session.request:
def request(self, method, url,
        params=None,
        data=None,
        headers=None,
        cookies=None,
        files=None,
        auth=None,
        timeout=None,
        allow_redirects=True,
        proxies=None,
        hooks=None,
        stream=None,
        verify=None,
        cert=None,
        json=None):
"""Constructs a :class:`Request <Request>`, prepares it and sends it. Returns :class:`Response <Response>` object.
:param method: method for the new :class:`Request` object.
:param url: URL for the new :class:`Request` object.
:param params: (optional) Dictionary or bytes to be sent in the query string for the :class:`Request`.
:param data: (optional) Dictionary or bytes to send in the body of the
:class:`Request`.
:param json: (optional) json to send in the body of the
:class:`Request`.
:param headers: (optional) Dictionary of HTTP Headers to send with the
:class:`Request`.
:param cookies: (optional) Dict or CookieJar object to send with the
:class:`Request`.
:param files: (optional) Dictionary of ``'filename': file-like-objects`` for multipart encoding upload.
:param auth: (optional) Auth tuple or callable to enable Basic/Digest/Custom HTTP Auth.
:param timeout: (optional) How long to wait for the server to send data before giving up, as a float, or a (`connect timeout, read timeout <user/advanced.html#timeouts>`_) tuple.
:type timeout: float or tuple
:param allow_redirects: (optional) Set to True by default.
:type allow_redirects: bool
:param proxies: (optional) Dictionary mapping protocol to the URL of the proxy.
:param stream: (optional) whether to immediately download the response content. Defaults to ``False``.
:param verify: (optional) if ``True``, the SSL cert will be verified. A CA_BUNDLE path can also be provided.
:param cert: (optional) if String, path to ssl client cert file (.pem). If Tuple, ('cert', 'key') pair. """
9.2 How requests handles proxies
requests passes the proxies argument straight to the adapter, and the adapter uses urllib3.ProxyManager for the rest of the work:
    adapter.send(request, proxies=proxies)
        conn = adapter.get_connection(request.url, proxies)
        url = adapter.request_url(request, proxies)
        resp = conn.urlopen(request.method, url)
Two cases, an http:// and an https:// destination through the same proxy:
    s = requests.session()
    s.proxies = {'http': '127.0.0.1:8000', 'https': '127.0.0.1:8000'}
    s.get('http://www.google.com/t')
    s.get('https://www.google.com/ts')
A fake proxy that just prints what it receives (Python 2):
    import socket
    sock = socket.socket()
    sock.bind(('127.0.0.1', 8000))
    sock.listen(5)
    while 1:
        r, _ = sock.accept()
        print r.recv(65536)
        r.sendall('HTTP/1.0 200 OK\r\n\r\nHello World\r\n')
        r.close()
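Below is a Python 3 sketch of roughly what the fake proxy prints first in each case. first_line_sent_to_proxy is hypothetical, and the HTTP version strings are assumptions. For http:// the client sends the whole absolute URL to the proxy. For https:// it asks for a tunnel with CONNECT and then tries TLS through it. This fake proxy cannot handle TLS, so the https request should fail.

```python
from urllib.parse import urlsplit


def first_line_sent_to_proxy(method, url):
    """Hypothetical helper: the request line a proxy sees first."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        # Ask the proxy for a raw TCP tunnel; the path stays inside TLS.
        return "CONNECT %s:%d HTTP/1.0" % (parts.hostname, parts.port or 443)
    # Plain HTTP: the absolute URL goes on the request line.
    return "%s %s HTTP/1.1" % (method, url)


print(first_line_sent_to_proxy("GET", "http://www.google.com/t"))
print(first_line_sent_to_proxy("GET", "https://www.google.com/ts"))
```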
Question: httplib.HTTPConnection supports a CONNECT tunnel; how do urllib3.HTTPConnection and urllib3.HTTPConnectionPool use it?
9.3 How urllib3.ProxyManager handles proxies
    >>> proxy = urllib3.ProxyManager('http://localhost:3128/')
    >>> r1 = proxy.request('GET', 'http://google.com/')
    >>> r2 = proxy.request('GET', 'http://httpbin.org/')
    >>> len(proxy.pools)
    1
    >>> r3 = proxy.request('GET', 'https://httpbin.org/')
    >>> r4 = proxy.request('GET', 'https://twitter.com/')
    >>> len(proxy.pools)
    3
For an http destination it returns HTTPConnectionPool(proxy.host, proxy.port, **connection_pool_kw), and the request is sent with the full URL.
For an https destination it returns HTTPSConnectionPool(dest.host, dest.port, **connection_pool_kw), where connection_pool_kw carries the proxy information. VerifiedHTTPSConnection calls _tunnel() to go through the proxy. Note: in httplib.HTTPSConnection, self._tunnel_host is the destination server, while self.host is the proxy server.
    def connection_from_host(self, host, port=None, scheme='http'):
        if scheme == "https":
            return super(ProxyManager, self).connection_from_host(
                host, port, scheme)

        return super(ProxyManager, self).connection_from_host(
            self.proxy.host, self.proxy.port, self.proxy.scheme)
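9.4 How requests handles HTTPS
The HTTPS-related parameters are verify and cert.
cert is the client certificate; it is rarely used.
verify defaults to True: the server's certificate is verified against the default trusted CA bundle, DEFAULT_CA_BUNDLE_PATH (requests/cacert.pem).
verify=False: the server's certificate is not verified. This amounts to setting the cert_reqs parameter of ssl.wrap_socket to ssl.CERT_NONE.
verify can also be a path: the server's certificate is verified, and the trusted CA certificates are read from the file at that path.
VerifiedHTTPSConnection overrides httplib.HTTPSConnection.connect, using ssl…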
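How verify and cert turn into connection settings can be sketched like this in Python 3. tls_settings and the placeholder DEFAULT_CA_BUNDLE_PATH value are hypothetical. They are modelled on the rules above and on HTTPAdapter.cert_verify, which sets cert_reqs and ca_certs on the urllib3 connection.

```python
# Placeholder; in requests this comes from certs.where().
DEFAULT_CA_BUNDLE_PATH = "/path/to/requests/cacert.pem"


def tls_settings(url, verify=True, cert=None):
    """Hypothetical helper: map verify/cert to urllib3-style settings."""
    settings = {"cert_reqs": "CERT_NONE", "ca_certs": None,
                "cert_file": None, "key_file": None}
    if url.lower().startswith("https") and verify:
        settings["cert_reqs"] = "CERT_REQUIRED"
        # verify=True -> bundled CA file; verify="<path>" -> that CA file
        settings["ca_certs"] = DEFAULT_CA_BUNDLE_PATH if verify is True else verify
    if cert:
        if isinstance(cert, str):  # single .pem holding cert and key
            settings["cert_file"] = cert
        else:  # ('cert', 'key') pair
            settings["cert_file"], settings["key_file"] = cert
    return settings


print(tls_settings("https://example.com", verify="/etc/ssl/my-ca.pem"))
```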
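9.5 Python class variables vs. instance variables
Class variables are defined outside all methods and are shared by all instances. Once an instance assigns to that name directly, however, it creates an instance variable of the same name, which shadows the class variable for that instance. Mutating a mutable class variable in place does not shadow it.
    class Dog:
        kind = 'canine'         # class variable shared by all instances

        def __init__(self, name):
            self.name = name    # instance variable unique to each instance

    >>> d = Dog('Fido')
    >>> e = Dog('Buddy')
    >>> print d.kind, e.kind
    canine canine
    >>> e.kind = 'E'
    >>> Dog.kind = "wowo"
    >>> print d.kind, e.kind
    wowo E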
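The difference between assigning and mutating is easy to miss. Here is a Python 3 sketch with a mutable class variable; the tricks attribute is illustrative.

```python
class Dog:
    tricks = []  # mutable class variable, shared by all instances

    def __init__(self, name):
        self.name = name


d, e = Dog("Fido"), Dog("Buddy")
d.tricks.append("roll over")  # mutates the shared list: no shadowing
e.tricks = ["play dead"]      # assignment creates an instance variable on e
print(Dog.tricks, d.tricks, e.tricks)  # ['roll over'] ['roll over'] ['play dead']
print("tricks" in vars(e), "tricks" in vars(d))  # True False
```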
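9.6 Bound and unbound methods
In Python 2, a function defined in a class is an unbound method when accessed through the class; accessed through an instance, it is a bound method of that object. Methods are looked up on the class when they are called, so replacing T.f also changes what t.f() does:
    >>> class T:
    ...     def f(self):
    ...         print 'Hello World'
    ...
    >>> t = T()
    >>> T.f
    <unbound method T.f>
    >>> t.f
    <bound method T.f of <__main__.T instance at 0x02780E18>>
    >>> t.f()
    Hello World
    >>> def f(self):
    ...     print 'FUCK'
    ...
    >>> T.f = f
    >>> t.f()
    FUCK
To turn a function into a bound method, just call the function object's __get__ method:
    >>> t.g = f.__get__(t)
    >>> t.g()
    FUCK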
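Python 3 has no unbound-method type: T.f is a plain function, and only access through an instance produces a bound method. Here is a small Python 3 sketch of the same experiment; the names T, g and h are illustrative.

```python
import types


class T:
    def f(self):
        return "Hello World"


def g(self):
    return "replaced"


t = T()
print(type(T.f).__name__)  # function  (no unbound-method wrapper in Python 3)
print(type(t.f).__name__)  # method    (bound: t.f.__self__ is t)

T.f = g                    # lookup goes through the class at call time...
print(t.f())               # replaced

t.h = g.__get__(t)         # ...and __get__ binds a function by hand
print(t.h())               # replaced
print(types.MethodType(g, t)())  # replaced (same thing via the types module)
```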