Python does not currently have an equivalent to scanf().
Regular expressions are generally more powerful, though also more
verbose, than scanf() format strings. The table below
offers some more-or-less equivalent mappings between
scanf() format tokens and regular expressions.
    scanf() Token     Regular Expression
    --------------    ----------------------------------------
    %c                .
    %5c               .{5}
    %d                [-+]?\d+
    %e, %E, %f, %g    [-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?
    %i                [-+]?(0[xX][\dA-Fa-f]+|0[0-7]*|\d+)
    %o                [-+]?[0-7]+
    %s                \S+
    %u                \d+
    %x, %X            [-+]?(0[xX])?[\dA-Fa-f]+
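As a quick check of the %d and %e/%f/%g rows, the sketch below uses those two patterns unchanged (written as raw strings so the backslashes reach the re module intact) to classify and convert numeric strings. The helper name parse_number is hypothetical, and trying the integer pattern before the floating-point one is just one possible design choice.

```python
import re

# Patterns copied from the %d and %e/%E/%f/%g rows of the table.
INT_RE = re.compile(r"[-+]?\d+")
FLOAT_RE = re.compile(r"[-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?")

def parse_number(text):
    """Hypothetical helper: convert text to int (like %d) or float (like %f)."""
    # fullmatch() requires the whole string to fit the pattern.
    if INT_RE.fullmatch(text):
        return int(text)
    if FLOAT_RE.fullmatch(text):
        return float(text)
    raise ValueError(f"not a number: {text!r}")

for sample in ["42", "-7", "3.14", ".5", "1e10", "+2.5E-3"]:
    print(sample, "->", parse_number(sample))
```

Unlike scanf(), a regular expression only locates the text; the conversion to int or float is a separate, explicit step.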
To extract the filename and numbers from a string like

    /usr/sbin/sendmail - 0 errors, 4 warnings

you would use a scanf() format like

    %s - %d errors, %d warnings

The equivalent regular expression would be

    (\S+) - (\d+) errors, (\d+) warnings
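A minimal sketch of applying that regular expression in Python follows. The function name parse_report is hypothetical; it returns None when the line does not match, and converts the captured digit groups to integers, since match groups are always strings.

```python
import re

# The regular expression from the text, written as a raw string.
REPORT_RE = re.compile(r"(\S+) - (\d+) errors, (\d+) warnings")

def parse_report(line):
    """Hypothetical helper: return (filename, errors, warnings) or None."""
    m = REPORT_RE.match(line)
    if m is None:
        return None
    # Groups 2 and 3 play the role of the two %d conversions.
    return m.group(1), int(m.group(2)), int(m.group(3))

print(parse_report("/usr/sbin/sendmail - 0 errors, 4 warnings"))
# ('/usr/sbin/sendmail', 0, 4)
```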
Avoiding recursion
If you create regular expressions that require the engine to perform a
lot of recursion, you may encounter a RuntimeError exception with
the message "maximum recursion limit exceeded". For example:
    >>> import re
    >>> s = 'Begin ' + 1000*'a very long string ' + 'end'
    >>> re.match(r'Begin (\w| )*? end', s).end()
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
      File "/usr/local/lib/python2.5/re.py", line 132, in match
        return _compile(pattern, flags).match(string)
    RuntimeError: maximum recursion limit exceeded
You can often restructure your regular expression to avoid recursion.
Starting with Python 2.3, simple uses of the *? pattern, where the
repeated item matches a single character (such as a character class),
are special-cased to avoid recursion. Thus, the regular expression
above can avoid recursion by replacing the (\w| ) group with a
character class:

    Begin [a-zA-Z0-9_ ]*?end

As a further benefit, such regular expressions will typically run
faster than their recursive equivalents.
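The sketch below checks that the recast pattern finds the same match as the original on the example string. It assumes a current Python 3 interpreter, where the original pattern is assumed not to hit the recursion limit, so both can be run side by side. Note that in Python 3, \w in a str pattern is Unicode-aware, while [a-zA-Z0-9_] is ASCII-only; the two agree here only because the input is plain ASCII.

```python
import re

# The input string from the example above.
s = 'Begin ' + 1000 * 'a very long string ' + 'end'

# Original pattern: a lazy repeat over an alternation group.
original = re.compile(r'Begin (\w| )*? end')
# Recast pattern: a lazy repeat over a single character class.
recast = re.compile(r'Begin [a-zA-Z0-9_ ]*?end')

m1 = original.match(s)
m2 = recast.match(s)
# Both matches should extend to the final 'end', i.e. the whole string.
print(m1.end(), m2.end(), len(s))
```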