The standard library received many enhancements and bug fixes in Python 2.5. Here's a partial list of the most notable changes, sorted alphabetically by module name. Consult the Misc/NEWS file in the source tree for a more complete list of changes, or look through the SVN logs for all the details.
The first argument to defaultdict's constructor is a factory function that gets called whenever a key is requested but not found. This factory function receives no arguments, so you can use built-in type constructors such as list() or int(). For example, you can make an index of words based on their initial letter like this:
from collections import defaultdict

words = """Nel mezzo del cammin di nostra vita
mi ritrovai per una selva oscura
che la diritta via era smarrita""".lower().split()

index = defaultdict(list)

for w in words:
    init_letter = w[0]
    index[init_letter].append(w)
Printing index results in the following output:
defaultdict(<type 'list'>, {'c': ['cammin', 'che'], 'e': ['era'], 'd': ['del', 'di', 'diritta'], 'm': ['mezzo', 'mi'], 'l': ['la'], 'o': ['oscura'], 'n': ['nel', 'nostra'], 'p': ['per'], 's': ['selva', 'smarrita'], 'r': ['ritrovai'], 'u': ['una'], 'v': ['vita', 'via']})
(Contributed by Guido van Rossum.)
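Because the factory is called with no arguments, int() works as a factory that yields 0, which makes counting convenient. The following is a minimal sketch of that idea; the sample text and variable names are illustrative assumptions, not part of the original example.

```python
from collections import defaultdict

# Hypothetical sample data, chosen only for illustration.
text = "the quick brown fox jumps over the lazy dog the end"

counts = defaultdict(int)      # int() returns 0 for a missing key
for word in text.split():
    counts[word] += 1          # no KeyError the first time a word is seen

the_count = counts["the"]
fox_count = counts["fox"]
has_cat = "cat" in counts      # membership tests do not invoke the factory
```

Note that only item lookup triggers the factory; a membership test leaves the dictionary unchanged.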
cProfile.run('main()') to profile a function, can save profile data to a file, etc. It's not yet known if the Hotshot profiler, which is also written in C but doesn't match the profile module's interface, will continue to be maintained in future versions of Python. (Contributed by Armin Rigo.)
Also, the pstats module for analyzing the data measured by the profiler now supports directing the output to any file object by supplying a stream argument to the Stats constructor. (Contributed by Skip Montanaro.)
The CSV parser is now stricter about multi-line quoted fields. Previously, if a line ended within a quoted field without a terminating newline character, a newline would be inserted into the returned field. This behavior caused problems when reading files that contained carriage return characters within fields, so the code was changed to return the field without inserting newlines. As a consequence, if newlines embedded within fields are important, the input should be split into lines in a manner that preserves the newline characters.
(Contributed by Skip Montanaro and Andrew McNamara.)
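One way to split the input so that embedded newlines survive is to keep each line's terminator when splitting. This is only a sketch under assumptions: the sample record is made up, and the code is written for a current Python 3 interpreter.

```python
import csv

# Hypothetical input: one record whose second field spans two lines.
raw = 'id,comment\n1,"first line\nsecond line"\n'

# splitlines(True) keeps each line's trailing newline, so the newline
# embedded inside the quoted field is still there when the parser sees it.
lines = raw.splitlines(True)
rows = list(csv.reader(lines))
```

The reader joins the two physical lines back into a single field because the quote was still open at the end of the first one.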
from datetime import datetime

ts = datetime.strptime('10:13:15 2006-03-07',
                       '%H:%M:%S %Y-%m-%d')
SKIP option that keeps an example from being executed at all. This is intended for code snippets that are usage examples intended for the reader and aren't actually test cases.
An encoding parameter was added to the testfile() function and the DocFileSuite class to specify the file's encoding. This makes it easier to use non-ASCII characters in tests contained within a docstring. (Contributed by Bjorn Tillenius.)
"r"
was added to the input() function to allow opening files in binary or universal-newline mode. Another new parameter, openhook, lets you use a function other than open() to open the input files. Once you're iterating over the set of files, the FileInput object's new fileno() returns the file descriptor for the currently opened file. (Contributed by Georg Brandl.)
key keyword parameter similar to the one provided by the min()/max() functions and the sort() methods. For example:
>>> import heapq
>>> L = ["short", 'medium', 'longest', 'longer still']
>>> heapq.nsmallest(2, L)  # Return two lowest elements, lexicographically
['longer still', 'longest']
>>> heapq.nsmallest(2, L, key=len)   # Return two shortest elements
['short', 'medium']
(Contributed by Raymond Hettinger.)
None for the start and step arguments. This makes it more compatible with the attributes of slice objects, so that you can now write the following:
s = slice(5)     # Create slice object
itertools.islice(iterable, s.start, s.stop, s.step)
(Contributed by Raymond Hettinger.)
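A self-contained variant of the islice() call may make the behaviour easier to see. The iterable and the second slice are assumptions chosen for illustration.

```python
import itertools

numbers = range(10)                 # hypothetical iterable
s = slice(5)                        # start=None, stop=5, step=None
first_five = list(itertools.islice(numbers, s.start, s.stop, s.step))

s2 = slice(2, 9, 3)                 # every third item from index 2 up to 9
stepped = list(itertools.islice(numbers, s2.start, s2.stop, s2.step))
```

Passing the slice attributes straight through works because a None start or step is treated as "use the default".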
The format() function's val parameter could previously be a string as long as no more than one %char specifier appeared; now the parameter must be exactly one %char specifier with no surrounding text. An optional monetary parameter was also added which, if True, will use the locale's rules for formatting currency in placing a separator between groups of three digits.
To format strings with multiple %char specifiers, use the new format_string() function that works like format() but also supports mixing %char specifiers with arbitrary text.
A new currency() function was also added that formats a number according to the current locale's settings.
(Contributed by Georg Brandl.)
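As a rough sketch of format_string(), the snippet below mixes two specifiers with ordinary text. It assumes the portable "C" locale so the result does not depend on the machine's settings; the message text and values are invented for illustration.

```python
import locale

# Assumption: the portable "C" locale, which applies no digit grouping.
locale.setlocale(locale.LC_ALL, "C")

# format_string() accepts several % specifiers mixed with ordinary text.
line = locale.format_string("%d items at %.2f each", (3, 9.5))
```

Under another locale the decimal point, and any grouping you request, would follow that locale's conventions instead.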
import mailbox

# 'factory=None' uses email.Message.Message as the class representing
# individual messages.
src = mailbox.Maildir('maildir', factory=None)
dest = mailbox.mbox('/tmp/mbox')

for msg in src:
    dest.add(msg)
(Contributed by Gregory K. Johnson. Funding was provided by Google's 2005 Summer of Code.)
operator.attrgetter('a', 'b') will return a function that retrieves the a and b attributes. Combining this new feature with the sort() method's key parameter lets you easily sort lists using multiple fields.
(Contributed by Raymond Hettinger.)
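Sorting on multiple fields with a multi-attribute getter might look like the following sketch. The Employee class and its data are hypothetical and exist only to give the getter something to read.

```python
from operator import attrgetter

class Employee(object):
    # Hypothetical record type used only for this illustration.
    def __init__(self, dept, name):
        self.dept = dept
        self.name = name

staff = [Employee("ops", "Zoe"), Employee("dev", "Max"), Employee("dev", "Ann")]

# attrgetter("dept", "name") returns a (dept, name) tuple for each object,
# so the list is ordered by department first and by name within it.
staff.sort(key=attrgetter("dept", "name"))
ordered = [(e.dept, e.name) for e in staff]
```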
Constants named os.SEEK_SET, os.SEEK_CUR, and os.SEEK_END have been added; these are the parameters to the os.lseek() function. Two new constants for locking are os.O_SHLOCK and os.O_EXLOCK.
Two new functions, wait3() and wait4(), were added. They're similar to the waitpid() function, which waits for a child process to exit and returns a tuple of the process ID and its exit status, but wait3() and wait4() return additional information. wait3() doesn't take a process ID as input, so it waits for any child process to exit and returns a 3-tuple of process-id, exit-status, resource-usage as returned from the resource.getrusage() function. wait4(pid) does take a process ID. (Contributed by Chad J. Schroeder.)
On FreeBSD, the os.stat() function now returns times with nanosecond resolution, and the returned object now has st_gen and st_birthtime. The st_flags member is also available, if the platform supports it. (Contributed by Antti Louko and Diego Pettenò.)
None from the __reduce__() method; the method must return a tuple of arguments instead. The ability to return None was deprecated in Python 2.4, so this completes the removal of the feature.
sys.path, so unless your programs explicitly added the directory to sys.path, this removal shouldn't affect your code.
'/' and '/RPC2'. Setting rpc_paths to None or an empty tuple disables this path checking.
(pid, group_mask).
Two new methods on socket objects, recv_buf(buffer) and recvfrom_buf(buffer), store the received data in an object that supports the buffer protocol instead of returning the data as a string. This means you can put the data directly into an array or a memory-mapped file.
Socket objects also gained getfamily(), gettype(), and getproto() accessor methods to retrieve the family, type, and protocol values for the socket.
s = struct.Struct('ih3s')

data = s.pack(1972, 187, 'abc')
year, number, name = s.unpack(data)
You can also pack and unpack data to and from buffer objects directly using the pack_into(buffer, offset, v1, v2, ...) and unpack_from(buffer, offset) methods. This lets you store data directly into an array or a memory-mapped file.
(Struct objects were implemented by Bob Ippolito at the NeedForSpeed sprint. Support for buffer objects was added by Martin Blais, also at the NeedForSpeed sprint.)
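Packing into a writable buffer at an offset could be sketched as follows. The bytearray and the offset value are assumptions chosen for illustration, and the code is written for a current Python 3 interpreter, where the packed string is a bytes object.

```python
import struct

s = struct.Struct("ih3s")          # int, short, 3-byte string
offset = 4                         # hypothetical offset into the buffer
buf = bytearray(offset + s.size)   # writable object supporting the buffer protocol

# Write the packed values in place, then read them back from the same spot.
s.pack_into(buf, offset, 1972, 187, b"abc")
year, number, name = s.unpack_from(buf, offset)
```

No intermediate string is created on the way in, which is the point of the buffer-based methods.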
sys.subversion variable, a 3-tuple of (interpreter-name, branch-name, revision-range). For example, at the time of writing my copy of 2.5 was reporting ('CPython', 'trunk', '45313:45315').
This information is also available to C extensions via the Py_GetBuildInfo() function that returns a string of build information like this: "trunk:45355:45356M, Apr 13 2006, 07:42:19".
(Contributed by Barry Warsaw.)
The compression used for a tarfile opened in stream mode can now be autodetected using the mode 'r|*'.
(Contributed by Lars Gustäbel.)
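The sketch below builds a small gzip-compressed archive in memory and then reads it back in stream mode without naming the compression. The member name and payload are invented, and the code targets a current Python 3 interpreter.

```python
import io
import tarfile

payload = b"hello, tar"            # hypothetical file contents

# Build a gzip-compressed archive entirely in memory.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tf:
    info = tarfile.TarInfo(name="greeting.txt")
    info.size = len(payload)
    tf.addfile(info, io.BytesIO(payload))

# Reopen it as a stream; 'r|*' lets tarfile detect the compression itself.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r|*") as tf:
    names = [member.name for member in tf]
```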
>>> import uuid
>>> # make a UUID based on the host ID and current time
>>> uuid.uuid1()
UUID('a8098c1a-f86e-11da-bd1a-00112444be1e')

>>> # make a UUID using an MD5 hash of a namespace UUID and a name
>>> uuid.uuid3(uuid.NAMESPACE_DNS, 'python.org')
UUID('6fa459ea-ee8a-3ca4-894e-db77e160355e')

>>> # make a random UUID
>>> uuid.uuid4()
UUID('16fd2706-8baf-433b-82eb-8c7fada847da')

>>> # make a UUID using a SHA-1 hash of a namespace UUID and a name
>>> uuid.uuid5(uuid.NAMESPACE_DNS, 'python.org')
UUID('886313e1-3b8a-5372-9b90-0c9aee199e5d')
(Contributed by Ka-Ping Yee.)
python -m webbrowser, taking a URL as the argument; there are a number of switches to control the behaviour (-n for a new browser window, -t for a new tab). New module-level functions, open_new() and open_new_tab(), were added to support this. The module's open() function supports an additional feature, an autoraise parameter that signals whether to raise the open window when possible. A number of additional browsers were added to the supported list such as Firefox, Opera, Konqueror, and elinks. (Contributed by Oleg Broytmann and Georg Brandl.)
use_datetime=True to the loads() function or the Unmarshaller class to enable this feature.
(Contributed by Skip Montanaro.)
The ctypes package, written by Thomas Heller, has been added to the standard library. ctypes lets you call arbitrary functions in shared libraries or DLLs. Long-time users may remember the dl module, which provides functions for loading shared libraries and calling functions in them. The ctypes package is much fancier.
To load a shared library or DLL, you must create an instance of the CDLL class and provide the name or path of the shared library or DLL. Once that's done, you can call arbitrary functions by accessing them as attributes of the CDLL object.
import ctypes

libc = ctypes.CDLL('libc.so.6')
result = libc.printf("Line of output\n")
Type constructors for the various C types are provided: c_int, c_float, c_double, c_char_p (equivalent to char *), and so forth. Unlike Python's types, the C versions are all mutable; you can assign to their value attribute to change the wrapped value. Python integers and strings will be automatically converted to the corresponding C types, but for other types you must call the correct type constructor. (And I mean must; getting it wrong will often result in the interpreter crashing with a segmentation fault.)
You shouldn't use c_char_p with a Python string when the C function will be modifying the memory area, because Python strings are supposed to be immutable; breaking this rule will cause puzzling bugs. When you need a modifiable memory area, use create_string_buffer():
s = "this is a string"
buf = ctypes.create_string_buffer(s)
libc.strfry(buf)
C functions are assumed to return integers, but you can set the restype attribute of the function object to change this:
>>> libc.atof('2.71828')
-1783957616
>>> libc.atof.restype = ctypes.c_double
>>> libc.atof('2.71828')
2.71828
ctypes also provides a wrapper for Python's C API as the ctypes.pythonapi object. This object does not release the global interpreter lock before calling a function, because the lock must be held when calling into the interpreter's code. There's a py_object() type constructor that will create a PyObject * pointer. A simple usage:
import ctypes

d = {}
ctypes.pythonapi.PyObject_SetItem(ctypes.py_object(d),
          ctypes.py_object("abc"), ctypes.py_object(1))
# d is now {'abc': 1}.
Don't forget to use py_object(); if it's omitted you end up with a segmentation fault.
ctypes has been around for a while, but people still write and distribute hand-coded extension modules because you can't rely on ctypes being present. Perhaps developers will begin to write Python wrappers atop a library accessed through ctypes instead of extension modules, now that ctypes is included with core Python.
A subset of Fredrik Lundh's ElementTree library for processing XML has been added to the standard library as xml.etree. The available modules are ElementTree, ElementPath, and ElementInclude from ElementTree 1.2.6. The cElementTree accelerator module is also included.
The rest of this section will provide a brief overview of using ElementTree. Full documentation for ElementTree is available at http://effbot.org/zone/element-index.htm.
ElementTree represents an XML document as a tree of element nodes. The text content of the document is stored as the .text and .tail attributes of the element nodes. (This is one of the major differences between ElementTree and the Document Object Model; in the DOM there are many different types of node, including TextNode.)
The most commonly used parsing function is parse(), which takes either a string (assumed to contain a filename) or a file-like object and returns an ElementTree instance:
import urllib
from xml.etree import ElementTree as ET

tree = ET.parse('ex-1.xml')

feed = urllib.urlopen(
          'http://planet.python.org/rss10.xml')
tree = ET.parse(feed)
Once you have an ElementTree instance, you can call its getroot() method to get the root Element node.
There's also an XML() function that takes a string literal and returns an Element node (not an ElementTree). This function provides a tidy way to incorporate XML fragments, approaching the convenience of an XML literal:
svg = ET.XML("""<svg width="10px" version="1.0">
             </svg>""")
svg.set('height', '320px')
svg.append(elem1)
Each XML element supports some dictionary-like and some list-like access methods. Dictionary-like operations are used to access attribute values, and list-like operations are used to access child nodes.
Operation | Result |
---|---|
elem[n] | Returns n'th child element. |
elem[m:n] | Returns list of m'th through n'th child elements. |
len(elem) | Returns number of child elements. |
list(elem) | Returns list of child elements. |
elem.append(elem2) | Adds elem2 as a child. |
elem.insert(index, elem2) | Inserts elem2 at the specified location. |
del elem[n] | Deletes n'th child element. |
elem.keys() | Returns list of attribute names. |
elem.get(name) | Returns value of attribute name. |
elem.set(name, value) | Sets new value for attribute name. |
elem.attrib | Retrieves the dictionary containing attributes. |
del elem.attrib[name] | Deletes attribute name. |
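A few of the operations in the table, applied to a small document, might look like this. The XML fragment and the variable names are assumptions made for illustration only.

```python
from xml.etree import ElementTree as ET

# Hypothetical document, used only for illustration.
root = ET.XML('<feed version="1.0"><entry>first</entry><entry>second</entry></feed>')

child_count = len(root)                    # list-like: number of children
first_text = root[0].text                  # list-like: index a child
tags = [child.tag for child in root]       # iterate over the children

version = root.get("version")              # dictionary-like: read an attribute
root.set("lang", "en")                     # dictionary-like: set an attribute
attr_names = sorted(root.keys())

extra = ET.Element("entry")
extra.text = "third"
root.append(extra)                         # add a new child at the end
del root[0]                                # remove the first child
remaining = [child.text for child in root]
```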
Comments and processing instructions are also represented as Element nodes. To check if a node is a comment or a processing instruction:
if elem.tag is ET.Comment:
    ...
elif elem.tag is ET.ProcessingInstruction:
    ...
To generate XML output, you should call the ElementTree.write() method. Like parse(), it can take either a string or a file-like object:
# Encoding is US-ASCII
tree.write('output.xml')

# Encoding is UTF-8
f = open('output.xml', 'w')
tree.write(f, encoding='utf-8')
(Caution: the default encoding used for output is ASCII. For general XML work, where an element's name may contain arbitrary Unicode characters, ASCII isn't a very useful encoding because it will raise an exception if an element's name contains any characters with values greater than 127. Therefore, it's best to specify a different encoding such as UTF-8 that can handle any Unicode character.)
This section is only a partial description of the ElementTree interfaces. Please read the package's official documentation for more details.
A new hashlib module, written by Gregory P. Smith, has been added to replace the md5 and sha modules. hashlib adds support for additional secure hashes (SHA-224, SHA-256, SHA-384, and SHA-512). When available, the module uses OpenSSL for fast platform optimized implementations of algorithms.
The old md5 and sha modules still exist as wrappers around hashlib to preserve backwards compatibility. The new module's interface is very close to that of the old modules, but not identical. The most significant difference is that the constructor functions for creating new hashing objects are named differently.
# Old versions
h = md5.md5()
h = md5.new()

# New version
h = hashlib.md5()

# Old versions
h = sha.sha()
h = sha.new()

# New version
h = hashlib.sha1()

# Hashes that weren't previously available
h = hashlib.sha224()
h = hashlib.sha256()
h = hashlib.sha384()
h = hashlib.sha512()

# Alternative form
h = hashlib.new('md5')          # Provide algorithm as a string
Once a hash object has been created, its methods are the same as before: update(string) hashes the specified string into the current digest state, digest() and hexdigest() return the digest value as a binary string or a string of hex digits, and copy() returns a new hashing object with the same digest state.
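The methods described above can be exercised in a short sketch. The input text is arbitrary, and the code is written for a current Python 3 interpreter, where the data passed to update() must be a bytes object.

```python
import hashlib

h = hashlib.sha256()
h.update(b"Nel mezzo del cammin ")
snapshot = h.copy()                 # independent object with the same state
h.update(b"di nostra vita")

# Feeding the data in pieces gives the same digest as hashing it in one go.
one_shot = hashlib.sha256(b"Nel mezzo del cammin di nostra vita")
same = h.hexdigest() == one_shot.hexdigest()

# The copy was taken before the second update, so it reflects only the prefix.
prefix_only = snapshot.hexdigest() == hashlib.sha256(b"Nel mezzo del cammin ").hexdigest()

# The string-based constructor selects the same algorithm.
via_new = hashlib.new("sha256", b"abc").hexdigest() == hashlib.sha256(b"abc").hexdigest()
digest_len = len(h.digest())        # SHA-256 yields 32 bytes
```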
The pysqlite module (http://www.pysqlite.org), a wrapper for the SQLite embedded database, has been added to the standard library under the package name sqlite3.
SQLite is a C library that provides a SQL-language database that stores data in disk files without requiring a separate server process. pysqlite was written by Gerhard Häring and provides a SQL interface compliant with the DB-API 2.0 specification described by PEP 249. This means that it should be possible to write the first version of your applications using SQLite for data storage. If switching to a larger database such as PostgreSQL or Oracle is later necessary, the switch should be relatively easy.
If you're compiling the Python source yourself, note that the source tree doesn't include the SQLite code, only the wrapper module. You'll need to have the SQLite libraries and headers installed before compiling Python, and the build process will compile the module when the necessary headers are available.
To use the module, you must first create a Connection object that represents the database. Here the data will be stored in the /tmp/example file:
conn = sqlite3.connect('/tmp/example')
You can also supply the special name ":memory:" to create a database in RAM.
Once you have a Connection, you can create a Cursor object and call its execute() method to perform SQL commands:
c = conn.cursor()

# Create table
c.execute('''create table stocks
(date timestamp, trans varchar, symbol varchar,
 qty decimal, price decimal)''')

# Insert a row of data
c.execute("""insert into stocks
          values ('2006-01-05','BUY','RHAT',100,35.14)""")
Usually your SQL operations will need to use values from Python variables. You shouldn't assemble your query using Python's string operations because doing so is insecure; it makes your program vulnerable to an SQL injection attack.
Instead, use the DB-API's parameter substitution. Put "?" as a placeholder wherever you want to use a value, and then provide a tuple of values as the second argument to the cursor's execute() method. (Other database modules may use a different placeholder, such as "%s" or ":1".) For example:
# Never do this -- insecure!
symbol = 'IBM'
c.execute("... where symbol = '%s'" % symbol)

# Do this instead
t = (symbol,)
c.execute('select * from stocks where symbol=?', t)

# Larger example
for t in (('2006-03-28', 'BUY', 'IBM', 1000, 45.00),
          ('2006-04-05', 'BUY', 'MSOFT', 1000, 72.00),
          ('2006-04-06', 'SELL', 'IBM', 500, 53.00),
         ):
    c.execute('insert into stocks values (?,?,?,?,?)', t)
To retrieve data after executing a SELECT statement, you can either treat the cursor as an iterator, call the cursor's fetchone() method to retrieve a single matching row, or call fetchall() to get a list of the matching rows.
This example uses the iterator form:
>>> c = conn.cursor()
>>> c.execute('select * from stocks order by price')
>>> for row in c:
...    print row
...
(u'2006-01-05', u'BUY', u'RHAT', 100, 35.140000000000001)
(u'2006-03-28', u'BUY', u'IBM', 1000, 45.0)
(u'2006-04-06', u'SELL', u'IBM', 500, 53.0)
(u'2006-04-05', u'BUY', u'MSOFT', 1000, 72.0)
>>>
For more information about the SQL dialect supported by SQLite, see http://www.sqlite.org.
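The pieces above can be combined into one self-contained sketch that uses an in-memory database, "?" placeholders, and the fetchone() and fetchall() methods. The simplified column types and the use of executemany() are choices made for this illustration rather than part of the original example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")      # in-RAM database, nothing touches disk
c = conn.cursor()
c.execute("create table stocks "
          "(date text, trans text, symbol text, qty integer, price real)")

# Hypothetical rows, inserted with "?" placeholders rather than string formatting.
rows = [
    ("2006-03-28", "BUY", "IBM", 1000, 45.00),
    ("2006-04-05", "BUY", "MSOFT", 1000, 72.00),
    ("2006-04-06", "SELL", "IBM", 500, 53.00),
]
c.executemany("insert into stocks values (?,?,?,?,?)", rows)
conn.commit()

symbol = "IBM"
c.execute("select trans, qty from stocks where symbol=? order by date", (symbol,))
first = c.fetchone()        # a single matching row
rest = c.fetchall()         # list of the remaining matching rows
conn.close()
```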
The Web Server Gateway Interface (WSGI) v1.0 defines a standard interface between web servers and Python web applications and is described in PEP 333. The wsgiref package is a reference implementation of the WSGI specification.
The package includes a basic HTTP server that will run a WSGI application; this server is useful for debugging but isn't intended for production use. Setting up a server takes only a few lines of code:
from wsgiref import simple_server

wsgi_app = ...

host = ''
port = 8000
httpd = simple_server.make_server(host, port, wsgi_app)
httpd.serve_forever()
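For readers who have not seen a WSGI application, the sketch below shows what could stand in for wsgi_app. The hello_app function and the hand-written start_response callable are hypothetical; the application is invoked directly, so no socket is opened. The code targets a current Python 3 interpreter, where the response body is a sequence of bytes objects.

```python
from wsgiref.util import setup_testing_defaults

def hello_app(environ, start_response):
    # Hypothetical application: replies with a fixed plain-text body.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello, WSGI\n"]

# Stand-in for the callable a server would pass to the application.
captured = {}

def start_response(status, headers, exc_info=None):
    captured["status"] = status
    captured["headers"] = headers

environ = {}
setup_testing_defaults(environ)     # fills in the keys a WSGI server must supply
body = b"".join(hello_app(environ, start_response))
```

Passing hello_app as the third argument to make_server() would serve the same response over HTTP.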