The urllib2 module defines functions and classes that help
open URLs (mostly HTTP) in a complex world: basic and digest
authentication, redirections, cookies, and more.
The urllib2 module defines the following functions:
urlopen(url[, data])
Open the URL url, which can be either a string or a Request
object.
data may be a string specifying additional data to send to the
server, or None if no such data is needed.
Currently HTTP requests are the only ones that use data;
the HTTP request will be a POST instead of a GET when the data
parameter is provided. data should be a buffer in the standard
application/x-www-form-urlencoded format. The
urllib.urlencode() function takes a mapping or sequence of
2-tuples and returns a string in this format.
This function returns a file-like object with two additional methods:
geturl() -- return the URL of the resource retrieved
info() -- return the meta-information of the page, as
a dictionary-like object
Raises URLError on errors.
Note that None may be returned if no handler handles the
request (though the default installed global OpenerDirector
uses UnknownHandler to ensure this never happens).
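As an illustration of the file-like return value, the following minimal sketch opens a local file: URL so that it needs no network access. The temporary file, the POSIX-style path, and the fallback import of urllib.request (for interpreters where urllib2 is not available) are assumptions of this example, not part of the module description above.

```python
import os
import tempfile

try:
    import urllib2                      # the module documented here
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 fallback

# Create a small local file so the example needs no network access.
fd, path = tempfile.mkstemp(suffix=".txt")
os.write(fd, b"hello from urlopen\n")
os.close(fd)

# Assumes a POSIX-style absolute path such as /tmp/tmpabc.txt.
url = "file://" + path

response = urllib2.urlopen(url)
try:
    body = response.read()          # ordinary file-like interface
    final_url = response.geturl()   # URL of the resource retrieved
    meta = response.info()          # dictionary-like meta-information
finally:
    response.close()
    os.remove(path)

print(final_url)
print(body)
```

The same three calls apply to HTTP responses; for those, info() carries the response headers.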
install_opener(opener)
Install an OpenerDirector instance as the default global
opener. Installing an opener is only necessary if you want urlopen to
use that opener; otherwise, simply call OpenerDirector.open()
instead of urlopen(). The code does not check for a real
OpenerDirector, and any class with the appropriate interface
will work.
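The difference between calling an opener directly and installing it can be sketched as follows. DemoHandler and the "demo" URL scheme are hypothetical names invented for this example; the handler relies on the OpenerDirector convention of dispatching to a method named after the scheme followed by _open. The fallback import of urllib.request is likewise an assumption for interpreters without urllib2.

```python
import io

try:
    import urllib2                      # the module documented here
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 fallback


class DemoHandler(urllib2.BaseHandler):
    """Hypothetical handler for a made-up "demo" URL scheme."""

    def demo_open(self, req):
        # Return any file-like object; no network access is involved.
        return io.BytesIO(b"handled by DemoHandler")


opener = urllib2.build_opener(DemoHandler)

# Without installing: call the opener directly.
direct = opener.open("demo://example").read()

# After installing: urlopen() goes through the same opener.
urllib2.install_opener(opener)
via_urlopen = urllib2.urlopen("demo://example").read()

print(direct)
print(via_urlopen)
```

Installing changes process-wide state, so calling opener.open() directly is usually the less surprising choice in library code.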
build_opener([handler, ...])
Return an OpenerDirector instance, which chains the
handlers in the order given. handlers can be either instances
of BaseHandler, or subclasses of BaseHandler (in
which case it must be possible to call the constructor without
any parameters). Instances of the following classes are placed in
front of the supplied handlers, unless the supplied handlers already
include those classes, instances of them, or subclasses of them:
ProxyHandler, UnknownHandler, HTTPHandler,
HTTPDefaultErrorHandler, HTTPRedirectHandler,
FTPHandler, FileHandler, HTTPErrorProcessor.
If the Python installation has SSL support (socket.ssl()
exists), HTTPSHandler will also be added.
Beginning in Python 2.3, a BaseHandler subclass may also
change its handler_order member variable to modify its
position in the handlers list.
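The replacement rule for default handlers can be checked with a short sketch. LoggingRedirectHandler is a hypothetical subclass that adds no behavior, and the handlers attribute of the returned opener is inspected only for illustration; it is an implementation detail rather than a documented interface. The fallback import of urllib.request is an assumption for interpreters without urllib2.

```python
try:
    import urllib2                      # the module documented here
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 fallback


class LoggingRedirectHandler(urllib2.HTTPRedirectHandler):
    """Hypothetical subclass; it only provides a distinct type."""


# A class (not an instance) may be passed because its constructor
# can be called without parameters.
opener = urllib2.build_opener(LoggingRedirectHandler)

# The subclass takes the place of the default HTTPRedirectHandler ...
redirect_handlers = [h for h in opener.handlers
                     if isinstance(h, urllib2.HTTPRedirectHandler)]

# ... while unrelated defaults such as HTTPHandler are still added.
http_handlers = [h for h in opener.handlers
                 if isinstance(h, urllib2.HTTPHandler)]

print([type(h).__name__ for h in redirect_handlers])
```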
The following exceptions are raised as appropriate:
exception URLError
The handlers raise this exception (or derived exceptions) when they
run into a problem. It is a subclass of IOError.
exception HTTPError
A subclass of URLError, it can also function as a
non-exceptional file-like return value (the same thing that
urlopen() returns). This is useful when handling exotic
HTTP errors, such as requests for authentication.
exception GopherError
A subclass of URLError, this is the error raised by the
Gopher handler.
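A minimal sketch of the exception hierarchy follows. The unknown URL scheme is made up so that the error is raised without network access, and the HTTPError instance is constructed by hand purely to show that it can be read like a response; in normal use the handlers create it. The fallback import of urllib.request is an assumption for interpreters without urllib2.

```python
import io

try:
    import urllib2                      # the module documented here
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 fallback

# An unsupported scheme is rejected with URLError.
try:
    urllib2.urlopen("nosuchscheme://example")
    caught = None
except urllib2.URLError as exc:
    caught = exc

# Hand-built HTTPError (illustration only): arguments are the URL,
# status code, message, headers, and a file-like body.
err = urllib2.HTTPError("http://www.example.com/missing", 404,
                        "Not Found", {}, io.BytesIO(b"not found"))
status = err.code       # numeric HTTP status
payload = err.read()    # file-like access to the error body
err.close()

print(status)
print(payload)
```

Because HTTPError derives from URLError, an except clause for HTTPError has to come before one for URLError when both are handled separately.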
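The following classes are provided:
class Request(url[, data][, headers][, origin_req_host][, unverifiable])
This class is an abstraction of a URL request.
url should be a string containing a valid URL.
data may be a string specifying additional data to send to the
server, or None if no such data is needed.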
Currently HTTP requests are the only ones that use data;
the HTTP request will be a POST instead of a GET when the data
parameter is provided. data should be a buffer in the standard
application/x-www-form-urlencoded format. The
urllib.urlencode() function takes a mapping or sequence of
2-tuples and returns a string in this format.
headers should be a dictionary, and will be treated as if
add_header() was called with each key and value as arguments.
The final two arguments are only of interest for correct handling of
third-party HTTP cookies:
origin_req_host should be the request-host of the origin
transaction, as defined by RFC 2965. It defaults to
cookielib.request_host(self). This is the host name or IP
address of the original request that was initiated by the user. For
example, if the request is for an image in an HTML document, this
should be the request-host of the request for the page containing the
image.
unverifiable should indicate whether the request is
unverifiable, as defined by RFC 2965. It defaults to False. An
unverifiable request is one whose URL the user did not have the option
to approve. For example, if the request is for an image in an HTML
document, and the user had no option to approve the automatic fetching
of the image, this should be true.
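The request arguments described above can be sketched as follows. The host names, form fields, and User-Agent value are placeholders, and nothing is sent over the network: the example only builds Request objects and inspects them. The fallback imports from urllib.request and urllib.parse are an assumption for interpreters without urllib2, where the data must also be bytes, hence the encode() call.

```python
try:
    import urllib2                      # the module documented here
    from urllib import urlencode
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 fallback
    from urllib.parse import urlencode

# A sequence of 2-tuples keeps the field order: "q=python&page=2".
form = urlencode([("q", "python"), ("page", "2")])

# Supplying data turns the request into a POST.
post_req = urllib2.Request(
    "http://www.example.com/search",
    data=form.encode("ascii"),
    headers={"User-Agent": "example-client/0.1"},
)
method = post_req.get_method()

# Without data the request stays a GET.
get_req = urllib2.Request("http://www.example.com/")

# An image fetched automatically on behalf of a page from
# www.example.com: third-party origin, not approved by the user.
image_req = urllib2.Request(
    "http://ads.example.net/banner.png",
    origin_req_host="www.example.com",
    unverifiable=True,
)

print(method)
print(get_req.get_method())
```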
class OpenerDirector()
The OpenerDirector class opens URLs via BaseHandlers
chained together. It manages the chaining of handlers, and recovery
from errors.
class BaseHandler()
This is the base class for all registered handlers; it handles only
the simple mechanics of registration.
class HTTPDefaultErrorHandler()
A class which defines a default handler for HTTP error responses; all
responses are turned into HTTPError exceptions.
class HTTPRedirectHandler()
A class to handle redirections.
class HTTPCookieProcessor([cookiejar])
A class to handle HTTP Cookies.
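A minimal sketch of wiring a cookie jar into an opener is given below. It assumes the cookiejar argument is a CookieJar from the cookielib module (http.cookiejar in the assumed Python 3 fallback) and performs no request, so the jar stays empty.

```python
try:
    import urllib2                      # the module documented here
    import cookielib
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 fallback
    import http.cookiejar as cookielib

jar = cookielib.CookieJar()
processor = urllib2.HTTPCookieProcessor(jar)
opener = urllib2.build_opener(processor)

# Cookies set by responses fetched through opener.open(...) are
# collected in jar and sent back on later requests to the same site.
stored = len(jar)
print(stored)
```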
class ProxyHandler([proxies])
Cause requests to go through a proxy.
If proxies is given, it must be a dictionary mapping
protocol names to URLs of proxies.
The default is to read the list of proxies from the environment
variables <protocol>_proxy.
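An explicit proxy mapping can be sketched as follows. The proxy URL is a placeholder, no connection is attempted, and the handlers attribute is inspected only to show that the supplied instance replaces the default ProxyHandler. The fallback import of urllib.request is an assumption for interpreters without urllib2.

```python
try:
    import urllib2                      # the module documented here
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 fallback

# Map protocol names to proxy URLs (placeholder address).
proxy_handler = urllib2.ProxyHandler(
    {"http": "http://proxy.example.com:3128/"}
)
opener = urllib2.build_opener(proxy_handler)

# Only the supplied ProxyHandler is present; the environment-based
# default is not added alongside it.
installed = [h for h in opener.handlers
             if isinstance(h, urllib2.ProxyHandler)]
print(len(installed))
```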
class HTTPPasswordMgr()
Keep a database of (realm, uri) -> (user, password) mappings.
class HTTPPasswordMgrWithDefaultRealm()
Keep a database of (realm, uri) -> (user, password) mappings.
A realm of None is considered a catch-all realm, which is searched
if no other realm fits.
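The difference between the two password managers can be sketched with their add_password() and find_user_password() methods. The realm names, URL, and credentials are made-up values, and the fallback import of urllib.request is an assumption for interpreters without urllib2.

```python
try:
    import urllib2                      # the module documented here
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 fallback

page = "http://www.example.com/private/"

# Plain manager: the realm has to match.
mgr = urllib2.HTTPPasswordMgr()
mgr.add_password("Members Area", "http://www.example.com/",
                 "alice", "secret")
exact = mgr.find_user_password("Members Area", page)
miss = mgr.find_user_password("Other Realm", page)

# Default-realm manager: an entry stored under None is the catch-all.
default_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
default_mgr.add_password(None, "http://www.example.com/",
                         "bob", "hunter2")
fallback = default_mgr.find_user_password("Other Realm", page)

print(exact)
print(miss)
print(fallback)
```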
class AbstractBasicAuthHandler([password_mgr])
This is a mixin class that helps with HTTP authentication, both
to the remote host and to a proxy.
password_mgr, if given, should be something that is compatible
with HTTPPasswordMgr; refer to section 18.6.7
for information on the interface that must be supported.
class HTTPBasicAuthHandler([password_mgr])
Handle authentication with the remote host.
password_mgr, if given, should be something that is compatible
with HTTPPasswordMgr; refer to section 18.6.7
for information on the interface that must be supported.
class ProxyBasicAuthHandler([password_mgr])
Handle authentication with the proxy.
password_mgr, if given, should be something that is compatible
with HTTPPasswordMgr; refer to section 18.6.7
for information on the interface that must be supported.
class AbstractDigestAuthHandler([password_mgr])
This is a mixin class that helps with HTTP authentication, both
to the remote host and to a proxy.
password_mgr, if given, should be something that is compatible
with HTTPPasswordMgr; refer to section 18.6.7
for information on the interface that must be supported.
class HTTPDigestAuthHandler([password_mgr])
Handle authentication with the remote host.
password_mgr, if given, should be something that is compatible
with HTTPPasswordMgr; refer to section 18.6.7
for information on the interface that must be supported.
class ProxyDigestAuthHandler([password_mgr])
Handle authentication with the proxy.
password_mgr, if given, should be something that is compatible
with HTTPPasswordMgr; refer to section 18.6.7
for information on the interface that must be supported.
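The four concrete authentication handlers above can share a single password manager, as the following minimal sketch suggests. The URL and credentials are placeholders, no request is made, and the passwd attribute is read only to show that each handler keeps the manager it was given; it is an implementation detail rather than a documented interface. The fallback import of urllib.request is an assumption for interpreters without urllib2.

```python
try:
    import urllib2                      # the module documented here
except ImportError:
    import urllib.request as urllib2    # assumption: Python 3 fallback

password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, "http://www.example.com/",
                          "alice", "secret")

# One manager, four handlers: basic and digest schemes, for the
# remote host and for a proxy.
auth_handlers = [
    urllib2.HTTPBasicAuthHandler(password_mgr),
    urllib2.ProxyBasicAuthHandler(password_mgr),
    urllib2.HTTPDigestAuthHandler(password_mgr),
    urllib2.ProxyDigestAuthHandler(password_mgr),
]
opener = urllib2.build_opener(*auth_handlers)

shared = [h.passwd is password_mgr for h in auth_handlers]
print(shared)
```

Requests made with opener.open() would then answer authentication challenges with the stored credentials.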
class HTTPHandler()
A class to handle opening of HTTP URLs.
class HTTPSHandler()
A class to handle opening of HTTPS URLs.
class FileHandler()
Open local files.
class FTPHandler()
Open FTP URLs.
class CacheFTPHandler()
Open FTP URLs, keeping a cache of open FTP connections to minimize
delays.