Dive Into Python: Python from novice to pro

Setting the User-Agent
The first step to improving your HTTP web services client is to identify yourself properly with a User-Agent. To do that, you need to move beyond the basic urllib and dive into urllib2.
```python
>>> import httplib
>>> httplib.HTTPConnection.debuglevel = 1
>>> import urllib2
>>> request = urllib2.Request('http://diveintomark.org/xml/atom.xml')
>>> opener = urllib2.build_opener()
>>> feeddata = opener.open(request).read()
connect: (diveintomark.org, 80)
send: '
GET /xml/atom.xml HTTP/1.0
Host: diveintomark.org
User-agent: Python-urllib/2.1
'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Wed, 14 Apr 2004 23:23:12 GMT
header: Server: Apache/2.0.49 (Debian GNU/Linux)
header: Content-Type: application/atom+xml
header: Last-Modified: Wed, 14 Apr 2004 22:14:38 GMT
header: ETag: "e8284-68e0-4de30f80"
header: Accept-Ranges: bytes
header: Content-Length: 26848
header: Connection: close
```
The first two lines turn on HTTP debugging so you can see what you're actually sending over the wire and what gets sent back. If you still have your Python IDE open from the previous section's example, debugging is already on and you can skip them.
Fetching an HTTP resource with urllib2 is a three-step process, for good reasons that will become clear shortly. The first step is to create a Request object, which takes the URL of the resource you'll eventually get around to retrieving. Note that this step doesn't actually retrieve anything yet.

The second step is to build a URL opener. This can take any number of handlers, which control how responses are handled. But you can also build an opener without any custom handlers, which is what you're doing here. You'll see how to define and use custom handlers later in this chapter when you explore redirects.

The final step is to tell the opener to open the URL, using the Request object you created. As you can see from all the debugging information that gets printed, this step actually retrieves the resource and stores the returned data in feeddata.
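The session above is Python 2 code; in Python 3, urllib2 and httplib became urllib.request and http.client. What follows is a minimal sketch of the same three steps in Python 3, under a few assumptions: it fetches from a throwaway local server instead of diveintomark.org, so it runs without network access, and EchoUserAgentHandler, fetch, and run_demo are hypothetical names invented for illustration. It passes debuglevel to HTTPHandler instead of setting it on the connection class, because in many Python 3 releases urllib.request's handler resets the connection's debug level, which makes the class-level switch ineffective.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoUserAgentHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: replies with the User-Agent it received."""

    def do_GET(self):
        body = self.headers.get("User-Agent", "").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging; only client debug output is shown


def fetch(url):
    # Step 1: create a Request object. Nothing is retrieved yet.
    request = urllib.request.Request(url)
    # Step 2: build an opener. HTTPHandler(debuglevel=1) prints the send/reply/
    # header lines; ProxyHandler({}) ignores any proxy set in the environment.
    opener = urllib.request.build_opener(
        urllib.request.HTTPHandler(debuglevel=1),
        urllib.request.ProxyHandler({}),
    )
    # Step 3: open the Request. This is the step that actually talks to the server.
    with opener.open(request) as response:
        return response.read()


def run_demo():
    """Start a server on a free local port, fetch once, then shut it down."""
    server = HTTPServer(("127.0.0.1", 0), EchoUserAgentHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = "http://127.0.0.1:%d/xml/atom.xml" % server.server_port
        return fetch(url)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    feeddata = run_demo()
    print(feeddata)  # the default Python-urllib/3.x agent, echoed back
```

Running the script prints the send:, reply:, and header: lines for the local exchange, followed by the body, which is whatever User-Agent the server saw: the default Python-urllib string for your Python 3 version, the counterpart of Python-urllib/2.1 in the session above.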
Continuing in the same session, add a User-Agent header to the existing Request object and open it again:

```python
>>> request
<urllib2.Request instance at 0x00250AA8>
>>> request.get_full_url()
'http://diveintomark.org/xml/atom.xml'
>>> request.add_header('User-Agent',
...     'OpenAnything/1.0 +http://diveintopython.org/')
>>> feeddata = opener.open(request).read()
connect: (diveintomark.org, 80)
send: '
GET /xml/atom.xml HTTP/1.0
Host: diveintomark.org
User-agent: OpenAnything/1.0 +http://diveintopython.org/
'
reply: 'HTTP/1.1 200 OK\r\n'
header: Date: Wed, 14 Apr 2004 23:45:17 GMT
header: Server: Apache/2.0.49 (Debian GNU/Linux)
header: Content-Type: application/atom+xml
header: Last-Modified: Wed, 14 Apr 2004 22:14:38 GMT
header: ETag: "e8284-68e0-4de30f80"
header: Accept-Ranges: bytes
header: Content-Length: 26848
header: Connection: close
```
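For comparison, here is the header step as a minimal Python 3 sketch. It assumes urllib.request, where Request.add_header() plays the role of urllib2's method and OpenerDirector.addheaders holds the opener-wide default User-Agent. make_request and make_opener are hypothetical helper names, and the code only builds and inspects objects, so it makes no network request.

```python
import urllib.request

# The identifying string from the session above; substitute your own
# application name, version, and contact URL.
USER_AGENT = "OpenAnything/1.0 +http://diveintopython.org/"


def make_request(url, user_agent=USER_AGENT):
    """Build a Request that identifies itself with a custom User-Agent."""
    request = urllib.request.Request(url)
    # add_header() normalizes the name with str.capitalize(), so the header is
    # stored (and later sent) as "User-agent", matching the debug output above.
    request.add_header("User-Agent", user_agent)
    return request


def make_opener(user_agent=USER_AGENT):
    """Alternative: an opener that sends this User-Agent on every request."""
    opener = urllib.request.build_opener()
    # addheaders defaults to [("User-agent", "Python-urllib/<version>")];
    # replacing it changes the default for every Request this opener handles.
    opener.addheaders = [("User-agent", user_agent)]
    return opener


if __name__ == "__main__":
    request = make_request("http://diveintomark.org/xml/atom.xml")
    print(request.get_full_url())
    print(request.get_header("User-agent"))  # the custom string, not the default
```

Passing headers={"User-Agent": USER_AGENT} to the Request constructor is equivalent, because the constructor calls add_header() for each entry. A per-Request header keeps the identification visible at the call site; replacing opener.addheaders is more convenient when one opener makes many requests.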