requests


1 todo list

1.1 TODO redirection and cookies

1.2 TODO read test_requests.py

1.3 TODO proxy

1.4 TODO adapter

1.5 TODO https

1.6 TODO POST multipart-encoded files

2 miscellany

Layering: requests → urllib3 → httplib (lib/httplib.py)

2.1 Transfer-Encoding: chunked

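Each chunk is its size in hexadecimal, CRLF, the data, CRLF. A chunk of size 0 followed by an empty line ends the body:

    size\r\n data\r\n size\r\n data\r\n … 0\r\n\r\n (end)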
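A small encoder and decoder make the framing easy to check. This is a minimal sketch of the wire format only. It assumes the whole body is in memory and that there are no trailer fields. encode_chunked and decode_chunked are hypothetical helpers, not functions from requests, urllib3 or httplib.

```python
def encode_chunked(chunks):
    """Encode an iterable of bytes objects using chunked transfer coding."""
    out = bytearray()
    for chunk in chunks:
        if not chunk:
            continue  # an empty chunk would be read as the terminating 0-size chunk
        out += b"%x\r\n" % len(chunk)  # chunk size in hexadecimal
        out += chunk + b"\r\n"         # chunk data followed by CRLF
    out += b"0\r\n\r\n"                # last chunk, then the empty line ending the body
    return bytes(out)


def decode_chunked(body):
    """Decode a complete chunked body; assumes no trailer fields."""
    data = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        size = int(body[pos:eol].split(b";")[0], 16)  # drop any chunk extension
        pos = eol + 2
        if size == 0:
            return bytes(data)
        data += body[pos:pos + size]
        pos += size + 2                # skip the data and its trailing CRLF
```

For example, encode_chunked([b"Hello", b", world"]) produces the sizes 5 and 7 in front of the two pieces, followed by the 0-size terminator.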

2.2 When the request URL contains an unacceptable character, the request fails with

ConnectionError: ('Connection aborted.', BadStatusLine("''",))

    import requests

    req = requests.Request('GET', 'http://localhost:8000/?u=8')
    pre = req.prepare()
    pre.url = u'http://localhost:8000/?u=8 8'   # set after prepare(), so the space is never quoted
    requests.Session().send(pre)

    Traceback (most recent call last):
      File "<pyshell#30>", line 1, in <module>
        s.send(pr)
      File "C:\Python27\lib\site-packages\requests\sessions.py", line 573, in send
        r = adapter.send(request, **kwargs)
      File "C:\Python27\lib\site-packages\requests\adapters.py", line 415, in send
        raise ConnectionError(err, request=request)
    ConnectionError: ('Connection aborted.', BadStatusLine("''",))


2.3 A Session has two adapters (one for http://, one for https://)

An adapter has one PoolManager. When proxies are used, it also has one ProxyManager per proxy URL (for example, one for the http proxy and one for the https proxy).
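Below is a toy model of how a Session picks an adapter by URL prefix. MiniSession is hypothetical, and it stores strings instead of HTTPAdapter objects. The sketch assumes, based on my reading of requests' Session.mount and get_adapter, that longer prefixes are kept first and that the first prefix the URL starts with wins. Where requests raises InvalidSchema, this sketch raises ValueError.

```python
from collections import OrderedDict


class MiniSession(object):
    """Hypothetical stand-in for Session's prefix -> adapter mapping."""

    def __init__(self):
        self.adapters = OrderedDict()
        self.mount("https://", "https-adapter")
        self.mount("http://", "http-adapter")

    def mount(self, prefix, adapter):
        self.adapters[prefix] = adapter
        # Move shorter prefixes to the end so the most specific mount is tried first.
        for key in [k for k in self.adapters if len(k) < len(prefix)]:
            self.adapters[key] = self.adapters.pop(key)

    def get_adapter(self, url):
        for prefix, adapter in self.adapters.items():
            if url.lower().startswith(prefix.lower()):
                return adapter
        raise ValueError("No connection adapters were found for %r" % url)
```

Mounting "http://example.com/api" later makes URLs under that path use the new adapter, while other http:// URLs keep the default one.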

2.4 Redirection

    import requests

    s = requests.session()
    r = s.get('http://httpbin.org/relative-redirect/2')

    resp1 = r.history[0]
    resp2 = r.history[1]
    resp3 = r

    req1 = resp1.request
    req2 = resp2.request
    req3 = resp3.request

    [resp for resp in s.resolve_redirects(resp1, req1)]
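relative-redirect answers with a relative Location header, so each hop has to be resolved against the URL that produced it. The sketch below mimics this with urljoin over a fake routing table. follow_redirects and routes are hypothetical, and the limit of 30 mirrors the default of Session.max_redirects.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2

REDIRECT_CODES = (301, 302, 303, 307, 308)


def follow_redirects(start_url, routes, max_redirects=30):
    """Return every URL visited, starting with start_url.

    routes is a hypothetical stand-in for the network: it maps a URL to a
    (status_code, location) pair, where location is None for non-redirects.
    """
    visited = [start_url]
    url = start_url
    while True:
        status, location = routes[url]
        if status not in REDIRECT_CODES or location is None:
            return visited
        if len(visited) - 1 >= max_redirects:
            raise RuntimeError("Exceeded %d redirects." % max_redirects)
        # A relative Location is resolved against the URL that returned it.
        url = urljoin(url, location)
        visited.append(url)
```

For relative-redirect/2 the visited list has three entries. The first two play the role of r.history and the last one is r.url.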

2.5 What happens in r = requests.get('http://www.baidu.com')

  1. Create a Session s and mount HTTPAdapter() on the prefixes https:// and http://.
  2. The Session sends the request: s.request() creates a Request req, prepares it and sends it with s.send().
  3. The adapter sends the request and returns the Response r.

  • adapter: the CA certificate bundle comes from requests/certs.py:

    import os.path

    try:
        from certifi import where
    except ImportError:
        def where():
            """Return the preferred certificate bundle."""
            # fall back to the bundle shipped next to this file
            return os.path.join(os.path.dirname(__file__), 'cacert.pem')

3 classes

3.1 Session

    __attrs__ = [
        'headers', 'cookies', 'auth', 'proxies', 'hooks', 'params', 'verify',
        'cert', 'prefetch', 'adapters', 'stream', 'trust_env', 'max_redirects',
    ]

METHODS (who calls what):

  • request: prepare_request, merge_environment_settings, send
  • send: get_adapter, resolve_redirects
  • resolve_redirects (a generator): rebuild_proxies, rebuild_auth, send (with allow_redirects=False), which calls get_adapter

HTTP verb shortcuts: get post delete head put patch options

Other methods: merge_environment_settings mount prepare_request rebuild_auth rebuild_proxies close

COOKIES come from two places:

  1. self.cookies = cookiejar_from_dict({}) (the session's own jar)
  2. cookies (a parameter of request())

prepare_request: merge_cookies(merge_cookies(RequestsCookieJar(), self.cookies), cookies)

send: extract_cookies_to_jar(self.cookies, request, resp.raw)

resolve_redirects: extract_cookies_to_jar(prepared_request._cookies, req, resp.raw); prepared_request._cookies.update(self.cookies); prepared_request.prepare_cookies(prepared_request._cookies)

PS: unless stream=True, send() also accesses Response.content, so the whole body is downloaded before send() returns.

3.2 Request

3.3 PreparedRequest

3.4 Response

content (bytes), text (unicode), json() (the body parsed as JSON)
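Here is a sketch of how the three accessors relate: text decodes content, and json() parses text. MiniResponse is hypothetical. Falling back to UTF-8 when no encoding is known is an assumption of this sketch; in that case requests itself guesses the encoding with chardet.

```python
import json


class MiniResponse(object):
    """Hypothetical stand-in showing how content, text and json() relate."""

    def __init__(self, content, encoding=None):
        self.content = content    # raw body bytes
        self.encoding = encoding  # e.g. the charset from Content-Type

    @property
    def text(self):
        # Decode bytes to unicode; undecodable bytes become U+FFFD instead of raising.
        return self.content.decode(self.encoding or "utf-8", "replace")

    def json(self):
        # Parse the decoded text (the module-level json is used here).
        return json.loads(self.text)
```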

3.5 HTTPAdapter

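send():

    conn = self.get_connection(...)
    resp = conn.urlopen(...)        # normal case
    low_conn = conn._get_conn()     # chunked body: the adapter writes the chunks itself
    low_conn.send(...)
    self.build_response(...)

build_response(): turns the urllib3 response into a requests Response

METHODS: init_poolmanager

  • send: get_connection, cert_verify, request_url, add_headers, build_response
  • get_connection: proxy_manager_for
  • also: add_headers, build_response, cert_verify, close, proxy_headers, proxy_manager_for, request_url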

4 urllib3

    import urllib3

    http = urllib3.PoolManager()
    r = http.request('GET', 'http://google.com/')
    print r.status, r.headers, r.data

  • PoolManager: a collection of ConnectionPool objects. ProxyManager is its subclass for talking through a proxy.
  • ConnectionPool: a pool of connections to a single host, made up of httplib.HTTPConnection objects.
  • Timeout, Retry, Stream

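4.1 PoolManager and ProxyManager

In PoolManager and ProxyManager, pools is dict-like and keyed by (scheme, host, port). Depending on the scheme, the value is an HTTPConnectionPool or an HTTPSConnectionPool.

Main methods:

  • connection_from_host(self, host, port=None, scheme='http'): PoolManager looks up the ConnectionPool for (scheme, host, port) in pools and creates a new one if there is none. ProxyManager overrides this method. For scheme=https it uses (scheme, host, port) directly; for scheme=http it uses (proxy.scheme, proxy.host, proxy.port).
  • _new_pool(self, scheme, host, port)
  • urlopen(self, method, url, redirect=True, **kw): calls connection_from_host to get a ConnectionPool, then calls that pool's method of the same name, urlopen. It also adds some redirect handling on top, which requests does not use.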
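Below is a toy model of the pool keys, written to reproduce the len(proxy.pools) numbers from the ProxyManager session in 9.3. MiniPoolManager and MiniProxyManager are hypothetical. They store a label string where urllib3 would store a real ConnectionPool.

```python
class MiniPoolManager(object):
    """Toy model: one pool per (scheme, host, port) key."""

    default_ports = {"http": 80, "https": 443}

    def __init__(self):
        self.pools = {}

    def connection_from_host(self, host, port=None, scheme="http"):
        port = port or self.default_ports[scheme]
        key = (scheme, host, port)
        if key not in self.pools:  # create the pool lazily on first use
            self.pools[key] = "%sConnectionPool(%s:%d)" % (scheme.upper(), host, port)
        return self.pools[key]


class MiniProxyManager(MiniPoolManager):
    def __init__(self, proxy_scheme, proxy_host, proxy_port):
        super(MiniProxyManager, self).__init__()
        self.proxy = (proxy_scheme, proxy_host, proxy_port)

    def connection_from_host(self, host, port=None, scheme="http"):
        if scheme == "https":
            # https: one pool per destination; connections tunnel through the proxy.
            return super(MiniProxyManager, self).connection_from_host(host, port, scheme)
        # http: every destination shares the single pool that talks to the proxy.
        proxy_scheme, proxy_host, proxy_port = self.proxy
        return super(MiniProxyManager, self).connection_from_host(
            proxy_host, proxy_port, proxy_scheme)
```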

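4.2 HTTP(S)ConnectionPool

HTTP(S)ConnectionPool keeps a pool of HTTP(S)Connection objects in a thread-safe queue. Every connection in it is identical.

HTTPConnectionPool stores the proxy information but does no proxy handling itself. When HTTPSConnectionPool creates a new connection, it passes the proxy's host and port: ConnectionCls(proxy.host, proxy.port).

urlopen calls httplib.HTTPConnection.request and getresponse. If there is a proxy, it also calls self._prepare_proxy().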
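Here is a sketch of the "thread-safe queue of identical connections" idea. It assumes the design I understand urllib3 to use: a LifoQueue pre-filled with None placeholders, where a real connection is created only when a placeholder is taken. MiniConnectionPool and its factory argument are hypothetical.

```python
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2


class MiniConnectionPool(object):
    """Toy pool of interchangeable connections to one host (hypothetical)."""

    def __init__(self, host, port, maxsize=1, factory=None):
        self.host, self.port = host, port
        self.factory = factory or (lambda: object())  # stands in for HTTPConnection(host, port)
        self.pool = queue.LifoQueue(maxsize)          # queue.* classes are thread-safe
        for _ in range(maxsize):
            self.pool.put(None)                       # placeholder: "may create one here"
        self.num_connections = 0

    def _get_conn(self):
        try:
            conn = self.pool.get(block=False)
        except queue.Empty:
            conn = None                               # exhausted: make an extra connection
        if conn is None:
            self.num_connections += 1
            conn = self.factory()
        return conn

    def _put_conn(self, conn):
        try:
            self.pool.put(conn, block=False)
        except queue.Full:
            pass                                      # pool already full: drop this one
```

LIFO order means the most recently returned connection, which is the one most likely to still be alive, is reused first.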

4.3 Verified HTTPS with SSL/TLS

    import certifi
    import urllib3

    http = urllib3.PoolManager(
        cert_reqs='CERT_REQUIRED',  # Force certificate check.
        ca_certs=certifi.where(),   # Path to the Certifi bundle.
    )

In requests, ca_certs = DEFAULT_CA_BUNDLE_PATH = certs.where()   # where() is in requests/certs.py

5 httpbin

    pip install httpbin
    python -m httpbin.core

6 bug

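6.1 Encoding error

    r = requests.get('http://www.baidu.com')
    print r.text

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'gbk' codec can't encode character u'\xbb' in position 23594: illegal multibyte sequence

The error comes from print, not from requests. On a Chinese Windows console, print encodes r.text with the console's GBK code page, and u'\xbb' (») has no GBK encoding.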
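The sketch below reproduces the error without network access and shows one workaround: replace the characters the console cannot display before printing. printable() is a hypothetical helper. The 'replace' error handler turns unencodable characters into '?'. Writing r.content to a file avoids the console encoding entirely.

```python
def printable(text, console_encoding="gbk"):
    """Return text with characters the console cannot encode replaced by '?'."""
    return text.encode(console_encoding, "replace").decode(console_encoding)


try:
    u"\xbb".encode("gbk")  # the character from the traceback
except UnicodeEncodeError as exc:
    print("reproduced:", exc.reason)

print(printable(u"a\xbbb"))
```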

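6.2 Wrong parameter description

requests/adapters.py: the docstring of proxy_headers documents :param proxies: and :param kwargs:, but the method takes a single argument named proxy.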


    def proxy_headers(self, proxy):
        """Returns a dictionary of the headers to add to any request sent
        through a proxy. This works with urllib3 magic to ensure that they are
        correctly sent to the proxy, rather than in a tunnelled request if
        CONNECT is being used.

        This should not be called from user code, and is only exposed for use
        when subclassing the
        :class:`HTTPAdapter <requests.adapters.HTTPAdapter>`.

        :param proxies: The url of the proxy being used for this request.
        :param kwargs: Optional additional keyword arguments.
        """
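This kind of mismatch can be caught mechanically by comparing the :param name: entries with the real signature. Below is a rough Python 3 sketch. docstring_param_mismatch is hypothetical, and it only recognizes the plain ":param name:" form, not ":param type name:".

```python
import inspect
import re


def docstring_param_mismatch(func):
    """Compare ':param name:' entries in func's docstring with its signature.

    Returns (documented_but_missing, present_but_undocumented), both sorted.
    """
    documented = set(re.findall(r":param (\w+):", inspect.getdoc(func) or ""))
    actual = set(inspect.signature(func).parameters) - {"self"}
    return sorted(documented - actual), sorted(actual - documented)
```

Applied to a copy of proxy_headers above, it reports proxies and kwargs as documented but missing, and proxy as present but undocumented.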

7 test

    python test_requests.py   (commit 9bbab338fdbb562b923ba2d8a80f0bfba697fa41)

    ====================================================================
    ERROR: test_auth_is_stripped_on_redirect_off_host (__main__.RequestsTestCase)

    Traceback (most recent call last):
      File "test_requests.py", line 983, in test_auth_is_stripped_on_redirect_off_host
        auth=('user', 'pass'),
      File "C:\emacs\git\requests\requests\api.py", line 69, in get
        return request('get', url, params=params, **kwargs)
      File "C:\emacs\git\requests\requests\api.py", line 50, in request
        response = session.request(method=method, url=url, **kwargs)
      File "C:\emacs\git\requests\requests\sessions.py", line 465, in request
        resp = self.send(prep, **send_kwargs)
      File "C:\emacs\git\requests\requests\sessions.py", line 594, in send
        history = [resp for resp in gen] if allow_redirects else []
      File "C:\emacs\git\requests\requests\sessions.py", line 196, in resolve_redirects
        **adapter_kwargs
      File "C:\emacs\git\requests\requests\sessions.py", line 573, in send
        r = adapter.send(request, **kwargs)
      File "C:\emacs\git\requests\requests\adapters.py", line 412, in send
        raise ConnectionError(err, request=request)
    ConnectionError: ('Connection aborted.', error(10060, ''))

    ====================================================================
    ERROR: test_different_encodings_dont_break_post (__main__.RequestsTestCase)

    Traceback (most recent call last):
      File "test_requests.py", line 501, in test_different_encodings_dont_break_post
        files={'file': ('test_requests.py', open(__file__, 'rb'))})
      File "C:\emacs\git\requests\requests\api.py", line 109, in post
        return request('post', url, data=data, json=json, **kwargs)
      File "C:\emacs\git\requests\requests\api.py", line 50, in request
        response = session.request(method=method, url=url, **kwargs)
      File "C:\emacs\git\requests\requests\sessions.py", line 465, in request
        resp = self.send(prep, **send_kwargs)
      File "C:\emacs\git\requests\requests\sessions.py", line 605, in send
        r.content
      File "C:\emacs\git\requests\requests\models.py", line 734, in content
        self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
      File "C:\emacs\git\requests\requests\models.py", line 657, in generate
        for chunk in self.raw.stream(chunk_size, decode_content=True):
      File "C:\emacs\git\requests\requests\packages\urllib3\response.py", line 307, in stream
        data = self.read(amt=amt, decode_content=decode_content)
      File "C:\emacs\git\requests\requests\packages\urllib3\response.py", line 243, in read
        data = self.fp.read(amt)
      File "C:\Python27\lib\httplib.py", line 573, in read
        s = self.fp.read(amt)
      File "C:\Python27\lib\socket.py", line 380, in read
        data = self.sock.recv(left)
    error: [Errno 10054]

    ====================================================================
    ERROR: test_mixed_case_scheme_acceptable (__main__.RequestsTestCase)

    Traceback (most recent call last):
      File "test_requests.py", line 137, in test_mixed_case_scheme_acceptable
        r = s.send(r.prepare())
      File "C:\emacs\git\requests\requests\sessions.py", line 573, in send
        r = adapter.send(request, **kwargs)
      File "C:\emacs\git\requests\requests\adapters.py", line 428, in send
        raise SSLError(e, request=request)
    SSLError: EOF occurred in violation of protocol (ssl.c:581)

    ====================================================================
    ERROR: test_prepared_request_hook (__main__.RequestsTestCase)

    Traceback (most recent call last):
      File "test_requests.py", line 588, in test_prepared_request_hook
        resp = s.send(prep)
      File "C:\emacs\git\requests\requests\sessions.py", line 573, in send
        r = adapter.send(request, **kwargs)
      File "C:\emacs\git\requests\requests\adapters.py", line 412, in send
        raise ConnectionError(err, request=request)
    ConnectionError: ('Connection aborted.', error(10060, ''))

    ====================================================================
    ERROR: test_request_ok_set (__main__.RequestsTestCase)

    Traceback (most recent call last):
      File "test_requests.py", line 459, in test_request_ok_set
        r = requests.get(httpbin('status', '404'))
      File "C:\emacs\git\requests\requests\api.py", line 69, in get
        return request('get', url, params=params, **kwargs)
      File "C:\emacs\git\requests\requests\api.py", line 50, in request
        response = session.request(method=method, url=url, **kwargs)
      File "C:\emacs\git\requests\requests\sessions.py", line 465, in request
        resp = self.send(prep, **send_kwargs)
      File "C:\emacs\git\requests\requests\sessions.py", line 573, in send
        r = adapter.send(request, **kwargs)
      File "C:\emacs\git\requests\requests\adapters.py", line 412, in send
        raise ConnectionError(err, request=request)
    ConnectionError: ('Connection aborted.', error(10060, ''))

    ====================================================================
    ERROR: test_unicode_method_name (__main__.RequestsTestCase)

    Traceback (most recent call last):
      File "test_requests.py", line 539, in test_unicode_method_name
        method=u('POST'), url=httpbin('post'), files=files)
      File "C:\emacs\git\requests\requests\api.py", line 50, in request
        response = session.request(method=method, url=url, **kwargs)
      File "C:\emacs\git\requests\requests\sessions.py", line 465, in request
        resp = self.send(prep, **send_kwargs)
      File "C:\emacs\git\requests\requests\sessions.py", line 605, in send
        r.content
      File "C:\emacs\git\requests\requests\models.py", line 734, in content
        self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
      File "C:\emacs\git\requests\requests\models.py", line 657, in generate
        for chunk in self.raw.stream(chunk_size, decode_content=True):
      File "C:\emacs\git\requests\requests\packages\urllib3\response.py", line 307, in stream
        data = self.read(amt=amt, decode_content=decode_content)
      File "C:\emacs\git\requests\requests\packages\urllib3\response.py", line 243, in read
        data = self.fp.read(amt)
      File "C:\Python27\lib\httplib.py", line 573, in read
        s = self.fp.read(amt)
      File "C:\Python27\lib\socket.py", line 380, in read
        data = self.sock.recv(left)
    error: [Errno 10054]

    ====================================================================
    ERROR: test_expires_valid_str (__main__.TestMorselToCookieExpires)
    Test case where we convert expires from string time.

    Traceback (most recent call last):
      File "test_requests.py", line 1398, in test_expires_valid_str
        cookie = morsel_to_cookie(morsel)
      File "C:\emacs\git\requests\requests\cookies.py", line 425, in morsel_to_cookie
        time.strptime(morsel['expires'], time_template)) - time.timezone)
    OverflowError: mktime argument out of range


Ran 144 tests in 1028.720s

FAILED (errors=7)
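A likely cause of the last error, assuming the test's expiry date lies just after 1970-01-01 00:00 GMT: the traceback shows the timestamp being computed as time.mktime(...) - time.timezone. mktime treats its argument as local time, and on Windows it rejects results before the epoch. On a UTC+8 machine, a GMT time a few seconds after the epoch therefore overflows. calendar.timegm reads the struct_time as UTC directly and avoids the problem. expires_to_timestamp is a hypothetical helper, and TIME_TEMPLATE is my assumption of the cookie date format.

```python
import calendar
import time

# Assumed cookie 'expires' format, e.g. "Thu, 01-Jan-1970 00:00:01 GMT".
TIME_TEMPLATE = "%a, %d-%b-%Y %H:%M:%S GMT"


def expires_to_timestamp(expires):
    """Convert a GMT 'expires' string to a Unix timestamp without time.mktime."""
    # strptime yields the GMT wall-clock fields; timegm interprets them as UTC,
    # so no local-time conversion (and no platform mktime range check) happens.
    return calendar.timegm(time.strptime(expires, TIME_TEMPLATE))
```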

8 TLS/SSL

Transport Layer Security (TLS) is the successor of Secure Sockets Layer (SSL).

TLS/SSL is built on public-key (public/private key) cryptography and relies on digital signatures and digital certificates.

8.1 Secure Transport Setup

8.1.1 Establish a TCP connection

8.1.2 SSL Handshake

  1. Exchange protocol version numbers
  2. Select a cipher that each side knows
  3. Authenticate the identity of each side
  4. Generate temporary session keys to encrypt the channel

8.1.3 Certificate Validation

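Authentication is optional and can go in both directions. Usually only the client verifies the server's certificate, as follows:

  1. Date check: whether the certificate has expired
  2. Signer trust check: whether the issuer of the certificate is trusted (browsers come with trusted certificates preinstalled)
  3. Signature check: whether the signature is correct, i.e. hash(certificate) == decode(signature, issuer-public-key)
  4. Site identity check: whether the server's domain name matches the domain name on the certificate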
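The date check and the site identity check are easy to express on a dict shaped like the one returned by SSLSocket.getpeercert(). The signer trust and signature checks need the issuer's public key, so they are left to the TLS library here. This is a simplified sketch. check_certificate is hypothetical, and its wildcard rule (one left-most label only) simplifies the real hostname-matching rules.

```python
import ssl
import time


def _name_matches(pattern, hostname):
    pattern, hostname = pattern.lower(), hostname.lower()
    if pattern.startswith("*."):
        # A wildcard covers exactly one left-most label.
        head, _, rest = hostname.partition(".")
        return bool(head) and rest == pattern[2:]
    return pattern == hostname


def check_certificate(cert, hostname, now=None):
    """Date check + site identity check for a getpeercert()-style dict."""
    now = time.time() if now is None else now
    not_before = ssl.cert_time_to_seconds(cert["notBefore"])
    not_after = ssl.cert_time_to_seconds(cert["notAfter"])
    if not (not_before <= now <= not_after):
        return False  # date check failed: expired or not yet valid
    names = [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
    return any(_name_matches(name, hostname) for name in names)
```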

8.2 Creating certificates

A private key is created like this:

    openssl genrsa -out ryans-key.pem 2048

Certificates are public keys signed by a Certificate Authority or self-signed. The first step to getting a certificate is to create a "Certificate Signing Request" (CSR) file:

    openssl req -new -sha256 -key ryans-key.pem -out ryans-csr.pem

To create a self-signed certificate with the CSR:

    openssl x509 -req -in ryans-csr.pem -signkey ryans-key.pem -out ryans-cert.pem

Alternatively you can send the CSR to a Certificate Authority for signing.

8.3 Examples

See the Node.js API documentation (TLS, HTTPS) and the Python Library Reference (ssl).

node:

  • tls.createServer(options[, secureConnectionListener])
  • tls.connect(options[, callback])
  • options.key: private key
  • options.cert: certificate
  • options.ca: trusted certificates
  • options.rejectUnauthorized: whether to reject any connection that is not authorized

python:

    ssl.wrap_socket(sock, keyfile=None, certfile=None, server_side=False,
                    cert_reqs=CERT_NONE, ca_certs=None, …)

  • cert_reqs: CERT_NONE (certificates ignored) or CERT_REQUIRED (required and validated)
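In Python 3, ssl.wrap_socket is deprecated and was removed in 3.12; the same settings now live on an SSLContext. Below is a sketch of the mapping, with client_context and server_context as hypothetical helper names. cert_reqs corresponds to verify_mode, ca_certs to cafile, and certfile/keyfile to load_cert_chain.

```python
import ssl


def client_context(cafile=None, verify=True):
    """Client side: roughly wrap_socket(cert_reqs=..., ca_certs=...)."""
    ctx = ssl.create_default_context(cafile=cafile)  # CERT_REQUIRED + hostname check
    if not verify:
        ctx.check_hostname = False       # must be disabled before CERT_NONE is allowed
        ctx.verify_mode = ssl.CERT_NONE  # like cert_reqs=CERT_NONE / rejectUnauthorized: false
    return ctx


def server_context(certfile, keyfile):
    """Server side: roughly wrap_socket(server_side=True, certfile=..., keyfile=...)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile, keyfile)
    return ctx
```

A socket is then wrapped with ctx.wrap_socket(sock, server_hostname=...) on the client and ctx.wrap_socket(sock, server_side=True) on the server.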

9 Remarks

9.1

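Most of what requests does goes through one API: Session.request.

Some of its parameters build the request: the method, the url, query params, headers, cookies, the request body data (or json for a JSON body), uploaded files and simple HTTP authentication auth. The rest cover timeout control (timeout), proxies, SSL verification (verify, cert), hook functions (hooks) and whether to follow redirects (allow_redirects). There is also stream, whose purpose is not obvious from its name: per the docstring, it decides whether the response content is downloaded immediately.

This function is an instance method of Session. As its name says, a Session models a conversation. A session object has instance variables much like request's parameters, and they control the default behavior of every request made during the session.

From the arguments the user passes to request, it first builds a Request object. After some processing (checking arguments, merging in some session attributes, …) it produces a PreparedRequest, picks the appropriate adapter (http or https), sends the request and returns a Response.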

9.1.1 Definition of Session.request:

    def request(self, method, url,
            params=None, data=None, headers=None, cookies=None, files=None,
            auth=None, timeout=None, allow_redirects=True, proxies=None,
            hooks=None, stream=None, verify=None, cert=None, json=None):
        """Constructs a :class:`Request <Request>`, prepares it and sends it.
        Returns :class:`Response <Response>` object.

        :param method: method for the new :class:`Request` object.
        :param url: URL for the new :class:`Request` object.
        :param params: (optional) Dictionary or bytes to be sent in the query
            string for the :class:`Request`.
        :param data: (optional) Dictionary or bytes to send in the body of the
            :class:`Request`.
        :param json: (optional) json to send in the body of the
            :class:`Request`.
        :param headers: (optional) Dictionary of HTTP Headers to send with the
            :class:`Request`.
        :param cookies: (optional) Dict or CookieJar object to send with the
            :class:`Request`.
        :param files: (optional) Dictionary of ``'filename': file-like-objects``
            for multipart encoding upload.
        :param auth: (optional) Auth tuple or callable to enable
            Basic/Digest/Custom HTTP Auth.
        :param timeout: (optional) How long to wait for the server to send
            data before giving up, as a float, or a
            (`connect timeout, read timeout <user/advanced.html#timeouts>`_) tuple.
        :type timeout: float or tuple
        :param allow_redirects: (optional) Set to True by default.
        :type allow_redirects: bool
        :param proxies: (optional) Dictionary mapping protocol to the URL of
            the proxy.
        :param stream: (optional) whether to immediately download the response
            content. Defaults to ``False``.
        :param verify: (optional) if ``True``, the SSL cert will be verified.
            A CA_BUNDLE path can also be provided.
        :param cert: (optional) if String, path to ssl client cert file (.pem).
            If Tuple, ('cert', 'key') pair.
        """

9.2 How requests handles proxies

requests passes the proxies argument straight to the adapter, and the adapter lets urllib3.ProxyManager do the rest:

    adapter.send(request, proxies)
        conn = adapter.get_connection(request.url, proxies)
        url = adapter.request_url(request, proxies)
        resp = conn.urlopen(request.method, url)

Two cases:

    import requests

    s = requests.session()
    s.proxies = {'http': '127.0.0.1:8000', 'https': '127.0.0.1:8000'}
    s.get('http://www.google.com/t')
    s.get('https://www.google.com/ts')

A fake proxy that just prints whatever it receives:

    import socket

    sock = socket.socket()
    sock.bind(('127.0.0.1', 8000))
    sock.listen(5)
    while 1:
        r, _ = sock.accept()
        print r.recv(65536)      # dump the raw request sent to the "proxy"
        r.sendall('HTTP/1.0 200 OK\r\n\r\nHello World\r\n')
        r.close()


Question: httplib.HTTPConnection supports a CONNECT tunnel (set_tunnel). What about urllib3.HTTPConnection and urllib3.HTTPConnectionPool?

9.3 How urllib3.ProxyManager handles proxies

    >>> proxy = urllib3.ProxyManager('http://localhost:3128/')
    >>> r1 = proxy.request('GET', 'http://google.com/')
    >>> r2 = proxy.request('GET', 'http://httpbin.org/')
    >>> len(proxy.pools)
    1
    >>> r3 = proxy.request('GET', 'https://httpbin.org/')
    >>> r4 = proxy.request('GET', 'https://twitter.com/')
    >>> len(proxy.pools)
    3

If the destination is http, it returns HTTPConnectionPool(proxy.host, proxy.port, **connection_pool_kw), and the request is sent with the full URL.

If the destination is https, it returns HTTPSConnectionPool(dest.host, dest.port, **connection_pool_kw), where connection_pool_kw carries the proxy information. VerifiedHTTPSConnection then calls _tunnel() to go through the proxy. Note: in httplib.HTTPSConnection, self._tunnel_host is the destination server, while self.host is the proxy server.

    def connection_from_host(self, host, port=None, scheme='http'):
        if scheme == "https":
            return super(ProxyManager, self).connection_from_host(
                host, port, scheme)

        return super(ProxyManager, self).connection_from_host(
            self.proxy.host, self.proxy.port, self.proxy.scheme)
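The two cases of the fake proxy in 9.2 should look different on the wire. A plain-http request carries the absolute URL, while an https request starts with CONNECT and the real request travels inside the tunnel, where the proxy cannot see it. first_request_line is a hypothetical helper, and the HTTP version shown is only illustrative.

```python
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit      # Python 2


def first_request_line(method, url, via_proxy):
    """Return the first line a client writes on the socket (simplified sketch)."""
    parts = urlsplit(url)
    if via_proxy and parts.scheme == "https":
        # HTTPS via proxy: ask for a tunnel; the real request is sent later inside TLS.
        return "CONNECT %s:%d HTTP/1.1" % (parts.hostname, parts.port or 443)
    if via_proxy:
        # Plain HTTP via proxy: the request line carries the absolute URL.
        return "%s %s HTTP/1.1" % (method, url)
    # Direct connection: only the path and query go in the request line.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return "%s %s HTTP/1.1" % (method, target)
```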

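9.4 How requests handles HTTPS

The HTTPS-related parameters are verify and cert.

cert is the client certificate. It is rarely used.

verify defaults to True: the server's certificate is verified, and the trusted CA certificates come from the default DEFAULT_CA_BUNDLE_PATH (requests/cacert.pem).

verify=False: the server's certificate is not verified. This is equivalent to passing cert_reqs=ssl.CERT_NONE to ssl.wrap_socket.

verify can also be a path: the server's certificate is verified, using the certificate file at that path as the trusted CAs.

VerifiedHTTPSConnection overrides httplib.HTTPSConnection.connect, using ssl……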
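The verify/cert rules above can be written as a small function. This is a sketch based on my reading of HTTPAdapter.cert_verify, not a copy of it. cert_verify_settings is hypothetical, and default_bundle stands in for DEFAULT_CA_BUNDLE_PATH.

```python
def cert_verify_settings(url, verify, cert=None, default_bundle="cacert.pem"):
    """Map requests-style verify/cert arguments to urllib3-style connection settings."""
    settings = {}
    if url.lower().startswith("https") and verify:
        settings["cert_reqs"] = "CERT_REQUIRED"
        # verify=True -> default bundle; verify="/some/path" -> that CA file
        settings["ca_certs"] = default_bundle if verify is True else verify
    else:
        settings["cert_reqs"] = "CERT_NONE"  # verify=False (or a plain http URL)
        settings["ca_certs"] = None
    if cert:
        if isinstance(cert, (tuple, list)):
            settings["cert_file"], settings["key_file"] = cert  # ('cert', 'key') pair
        else:
            settings["cert_file"] = cert  # single .pem holding cert (and key)
    return settings
```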

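9.5 Python class variables vs. instance variables

Class variables are defined outside all methods and are shared by all instances. Once an instance assigns to that name, it creates an instance variable with the same name, which shadows the class variable for that instance only. Mutating a mutable class variable in place (e.g. appending to a list) does not shadow it; the change is seen by every instance.

    class Dog:
        kind = 'canine'         # class variable shared by all instances

        def __init__(self, name):
            self.name = name    # instance variable unique to each instance

    >>> d = Dog('Fido')
    >>> e = Dog('Buddy')
    >>> print d.kind, e.kind
    canine canine
    >>> e.kind = 'E'
    >>> Dog.kind = "wowo"
    >>> print d.kind, e.kind
    wowo E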

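9.6 Bound methods and unbound methods

A function defined in a class is an unbound method when accessed through the class. Accessed through an instance, it becomes that object's bound method. Methods are looked up in the class at call time, which is why reassigning T.f below also changes what t.f() does.

    >>> class T:
    ...     def f(self):
    ...         print 'Hello World'
    ...
    >>> t = T()
    >>> T.f
    <unbound method T.f>
    >>> t.f
    <bound method T.f of <__main__.T instance at 0x02780E18>>
    >>> t.f()
    Hello World
    >>> def f(self):
    ...     print 'FUCK'
    ...
    >>> T.f = f
    >>> t.f()
    FUCK

To turn a function into a bound method, just call the function object's __get__ method:

    >>> t.g = f.__get__(t)
    >>> t.g()
    FUCK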
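The transcript above is Python 2. Python 3 has no unbound methods: T.f is a plain function. __get__ still binds, and types.MethodType does the same thing more readably. A short sketch:

```python
import types


class T(object):
    def f(self):
        return "Hello World"


def g(self):
    return "from g"


t = T()
print(type(T.f))                          # a plain function in Python 3
bound_via_descriptor = g.__get__(t)       # same trick as in Python 2
bound_via_methodtype = types.MethodType(g, t)
print(bound_via_descriptor(), bound_via_methodtype())
```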


Created: 2015-09-08 Tue 22:21

Emacs 24.4.1 (Org mode 8.2.10)
