Python does not currently have an equivalent to scanf(). Regular expressions are generally more powerful, though also more verbose, than scanf() format strings. The table below offers some more-or-less equivalent mappings between scanf() format tokens and regular expressions.
scanf() Token | Regular Expression |
---|---|
`%c` | `.` |
`%5c` | `.{5}` |
`%d` | `[-+]?\d+` |
`%e`, `%E`, `%f`, `%g` | `[-+]?(\d+(\.\d*)?\|\.\d+)([eE][-+]?\d+)?` |
`%i` | `[-+]?(0[xX][\dA-Fa-f]+\|0[0-7]*\|\d+)` |
`%o` | `0[0-7]*` |
`%s` | `\S+` |
`%u` | `\d+` |
`%x`, `%X` | `0[xX][\dA-Fa-f]+` |
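As a rough illustration of how the table can be used, the sketch below stores a few of the mappings in a dictionary and checks whether a whole string has the shape a given token would accept. The names `SCANF_TO_REGEX` and `matches_token` are hypothetical helpers invented for this example, and `re.fullmatch()` assumes Python 3.4 or later.

```python
import re

# A few rows of the table above, keyed by scanf() token.
# (Hypothetical helper table, not part of the re module.)
SCANF_TO_REGEX = {
    "%d": r"[-+]?\d+",
    "%f": r"[-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?",
    "%s": r"\S+",
}

def matches_token(token, text):
    """Return True if the entire text matches the regex for the token."""
    return re.fullmatch(SCANF_TO_REGEX[token], text) is not None

print(matches_token("%d", "-42"))        # signed integer
print(matches_token("%f", "6.02e23"))    # float with exponent
print(matches_token("%s", "two words"))  # %s stops at whitespace
```

Unlike scanf(), these patterns do not skip leading whitespace or convert the matched text; the caller still has to apply `int()` or `float()` to the result.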
To extract the filename and numbers from a string like
/usr/sbin/sendmail - 0 errors, 4 warnings
you would use a scanf() format like
%s - %d errors, %d warnings
The equivalent regular expression would be
(\S+) - (\d+) errors, (\d+) warnings
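A minimal sketch of applying that expression with the `re` module follows. The function name `parse_report` is hypothetical; since every group is captured as a string, the numeric fields are converted explicitly with `int()`, which scanf() would have done implicitly.

```python
import re

# The regular expression from the text, with one group per scanf() token.
REPORT_RE = re.compile(r"(\S+) - (\d+) errors, (\d+) warnings")

def parse_report(line):
    """Return (filename, errors, warnings), or None if the line does not match."""
    m = REPORT_RE.search(line)
    if m is None:
        return None
    filename, errors, warnings = m.groups()
    # Groups are strings; convert the counts to integers.
    return filename, int(errors), int(warnings)

print(parse_report("/usr/sbin/sendmail - 0 errors, 4 warnings"))
```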
If you create regular expressions that require the engine to perform a lot of recursion, you may encounter a RuntimeError exception with the message "maximum recursion limit exceeded". For example,
>>> import re
>>> s = 'Begin ' + 1000*'a very long string ' + 'end'
>>> re.match('Begin (\w| )*? end', s).end()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/lib/python2.5/re.py", line 132, in match
    return _compile(pattern, flags).match(string)
RuntimeError: maximum recursion limit exceeded
You can often restructure your regular expression to avoid recursion.
Starting with Python 2.3, simple uses of the *? pattern are special-cased to avoid recursion. Thus, the above regular expression can avoid recursion by being recast as Begin [a-zA-Z0-9_ ]*?end. As a further benefit, such regular expressions will run faster than their recursive equivalents.
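The recast pattern can be tried on the same input as a quick check. This is a sketch written for a current Python 3 interpreter, where the behaviour of the original pattern may differ from the Python 2.5 traceback shown above; the variable names are illustrative only.

```python
import re

# Same test string as in the example above.
s = "Begin " + 1000 * "a very long string " + "end"

# Character class instead of the (\w| ) alternation group.
recast = re.compile(r"Begin [a-zA-Z0-9_ ]*?end")

m = recast.match(s)
# The lazy quantifier stops at the first "end", which here is the end of s.
print(m is not None and m.end() == len(s))
```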