roman.py, stage 5
Python from novice to pro
Now that fromRoman works properly with good input, it's time to fit in the last piece of the puzzle: making it work properly with bad input. That means finding a way to look at a string and determine if it's a valid Roman numeral. This is inherently more difficult than validating numeric input in toRoman, but you have a powerful tool at your disposal: regular expressions.
If you're not familiar with regular expressions and didn't read Chapter 7, Regular Expressions, now would be a good time.
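As you saw in Section 7.3, “Case Study: Roman Numerals”, there are several simple rules for constructing a Roman numeral, using the letters M, D, C, L, X, V, and I. Let's review the rules:

- Characters are additive. I is 1, II is 2, and III is 3. VI is 6 (“5 and 1”), VII is 7, and VIII is 8.
- The tens characters (I, X, C, and M) can be repeated up to three times. At 4, you need to subtract from the next highest fives character: 4 is IV (“1 less than 5”), never IIII, and 40 is XL.
- Similarly, at 9, you need to subtract from the next highest tens character: 9 is IX, not VIIII; 90 is XC; 900 is CM.
- The fives characters (V, L, and D) can not be repeated. 10 is always X, never VV; 100 is always C, never LL.
- Roman numerals are written from highest to lowest and read left to right, so the order of characters matters. DC is 600, but CD is 400 (“100 less than 500”). IC is not a valid Roman numeral at all, because you can't subtract 1 directly from 100; 99 is written XCIX.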
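The rules for a single decimal place are mechanical enough to write down as code. Here is a minimal sketch, assuming a hypothetical helper called digit_forms (it is not part of roman.py), that produces the Roman forms of the digits 0 through 9 for one place, given the characters for 1, 5, and 10 units of that place.

```python
# digit_forms is a hypothetical helper for illustration; roman.py does not define it.
def digit_forms(one, five, ten):
    """Return the Roman forms of the digits 0-9 for one decimal place.

    one, five and ten are the characters worth 1, 5 and 10 units of
    that place, e.g. ('I', 'V', 'X') for the ones place.
    """
    forms = []
    for d in range(10):
        if d == 9:
            forms.append(one + ten)             # 9: subtract from the next tens character
        elif d >= 5:
            forms.append(five + one * (d - 5))  # 5-8: fives character, then 0 to 3 ones
        elif d == 4:
            forms.append(one + five)            # 4: subtract from the next fives character
        else:
            forms.append(one * d)               # 0-3: repeat the tens character up to 3 times
    return forms

print(digit_forms('I', 'V', 'X'))
print(digit_forms('X', 'L', 'C'))
```

Every place (ones, tens, hundreds) follows the same shape; only the three characters change. That sameness is what lets a single regular expression describe all of them.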
This file is available in py/roman/stage5/ in the examples directory.
If you have not already done so, you can download this and other examples used in this book.
```python
"""Convert to and from Roman numerals"""
import re

#Define exceptions
class RomanError(Exception): pass
class OutOfRangeError(RomanError): pass
class NotIntegerError(RomanError): pass
class InvalidRomanNumeralError(RomanError): pass

#Define digit mapping
romanNumeralMap = (('M',  1000),
                   ('CM', 900),
                   ('D',  500),
                   ('CD', 400),
                   ('C',  100),
                   ('XC', 90),
                   ('L',  50),
                   ('XL', 40),
                   ('X',  10),
                   ('IX', 9),
                   ('V',  5),
                   ('IV', 4),
                   ('I',  1))

def toRoman(n):
    """convert integer to Roman numeral"""
    if not (0 < n < 4000):
        raise OutOfRangeError("number out of range (must be 1..3999)")
    if int(n) != n:
        raise NotIntegerError("non-integers can not be converted")

    result = ""
    for numeral, integer in romanNumeralMap:
        while n >= integer:       # greedily take the largest value that still fits
            result += numeral
            n -= integer
    return result

#Define pattern to detect valid Roman numerals
#(thousands, then hundreds, tens, and ones)
romanNumeralPattern = '^M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$'

def fromRoman(s):
    """convert Roman numeral to integer"""
    if not re.search(romanNumeralPattern, s):
        raise InvalidRomanNumeralError('Invalid Roman numeral: %s' % s)

    result = 0
    index = 0
    for numeral, integer in romanNumeralMap:
        while s[index:index+len(numeral)] == numeral:   # consume numerals from highest to lowest
            result += integer
            index += len(numeral)
    return result
```
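The romanNumeralPattern regular expression is just a continuation of the pattern you saw in Section 7.3, “Case Study: Roman Numerals”. The thousands place allows 0 to 3 M characters. The hundreds place is either CM (900), CD (400), or an optional D followed by 0 to 3 optional C characters. The tens place is either XC (90), XL (40), or an optional L followed by 0 to 3 optional X characters. The ones place is either IX (9), IV (4), or an optional V followed by 0 to 3 optional I characters.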
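To see that the hundreds, tens, and ones sections really are the same pattern with different letters, you can build the whole expression from one template. This is a sketch only; place_pattern is a hypothetical helper, and roman.py simply spells the pattern out in full.

```python
# place_pattern is hypothetical; roman.py writes the full pattern by hand.
def place_pattern(one, five, ten):
    """Regex fragment for one decimal place: 9, 4, or an optional five plus 0-3 ones."""
    return '(%s%s|%s%s|%s?%s?%s?%s?)' % (one, ten, one, five, five, one, one, one)

pattern = ('^M?M?M?'                          # thousands: 0 to 3 M characters
           + place_pattern('C', 'D', 'M')     # hundreds: CM, CD, or D?C?C?C?
           + place_pattern('X', 'L', 'C')     # tens:     XC, XL, or L?X?X?X?
           + place_pattern('I', 'V', 'X')     # ones:     IX, IV, or V?I?I?I?
           + '$')

print(pattern)
print(pattern == '^M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$')  # True
```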
With all that logic encoded in a regular expression, the code to check for invalid Roman numerals becomes trivial. If re.search returns a match object, the regular expression matched and the input is valid; if it returns None, the input is invalid.
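Here is a minimal sketch of that check on its own, using a hypothetical helper named isValidRoman (roman.py does the same test inline in fromRoman). The sample inputs are illustrative, not taken from the book's test suite.

```python
import re

romanNumeralPattern = '^M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$'

# isValidRoman is a hypothetical name; it is not part of roman.py.
def isValidRoman(s):
    # re.search returns a match object on success and None on failure
    return re.search(romanNumeralPattern, s) is not None

for candidate in ['MCMXC', 'MMMDCCCLXXXVIII', 'MCMC', 'IIII', 'VV', 'IC']:
    print('%s %s' % (candidate, isValidRoman(candidate)))
```

MCMXC (1990) and MMMDCCCLXXXVIII (3888) match; MCMC, IIII, VV, and IC all break one of the rules, so re.search returns None for each of them.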
At this point, you are allowed to be skeptical that this big, ugly regular expression could possibly catch every kind of invalid Roman numeral. But don't take my word for it; look at the results:
```
fromRoman should only accept uppercase input ... ok
toRoman should always return uppercase ... ok
fromRoman should fail with malformed antecedents ... ok
fromRoman should fail with repeated pairs of numerals ... ok
fromRoman should fail with too many repeated numerals ... ok
fromRoman should give known result with known input ... ok
toRoman should give known result with known input ... ok
fromRoman(toRoman(n))==n for all n ... ok
toRoman should fail with non-integer input ... ok
toRoman should fail with negative input ... ok
toRoman should fail with large input ... ok
toRoman should fail with 0 input ... ok

----------------------------------------------------------------------
Ran 12 tests in 2.864s

OK
```
One thing I didn't mention about regular expressions is that, by default, they are case-sensitive. Since romanNumeralPattern is written in uppercase characters, the re.search check rejects any input that isn't completely uppercase. So the uppercase input test passes.
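You can confirm the case sensitivity directly. This sketch uses the same pattern as roman.py and the standard re.IGNORECASE flag, only to show what would change if you turned case sensitivity off; roman.py deliberately does not use that flag, because the uppercase input test requires lowercase input to be rejected.

```python
import re

romanNumeralPattern = '^M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$'

# Case-sensitive by default: only all-uppercase input matches.
print(re.search(romanNumeralPattern, 'MCMXC') is not None)   # True
print(re.search(romanNumeralPattern, 'mcmxc') is not None)   # False
print(re.search(romanNumeralPattern, 'McMxC') is not None)   # False

# With re.IGNORECASE the lowercase form would be accepted,
# which is exactly what the uppercase-only test forbids.
print(re.search(romanNumeralPattern, 'mcmxc', re.IGNORECASE) is not None)  # True
```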
More importantly, the bad input tests pass. For instance, the malformed antecedents test checks cases like MCMC. As you've seen, this does not match the regular expression, so fromRoman raises an InvalidRomanNumeralError exception. That is exactly what the malformed antecedents test case is looking for, so the test passes.
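The rejection happens before any conversion is attempted. In this sketch, checkRoman is a hypothetical name for just the first two lines of fromRoman, and the inputs other than MCMC are sample malformed numerals chosen for illustration.

```python
import re

class RomanError(Exception): pass
class InvalidRomanNumeralError(RomanError): pass

romanNumeralPattern = '^M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$'

# checkRoman is hypothetical: it isolates the validation step of fromRoman.
def checkRoman(s):
    """Raise InvalidRomanNumeralError if s is not a valid Roman numeral."""
    if not re.search(romanNumeralPattern, s):
        raise InvalidRomanNumeralError('Invalid Roman numeral: %s' % s)

for s in ['MCMC', 'CMCM', 'IXIV', 'MCMXC']:
    try:
        checkRoman(s)
        print('%s accepted' % s)
    except InvalidRomanNumeralError as e:
        print('%s rejected: %s' % (s, e))
```

In MCMC, the regular expression can use the first C only once in the hundreds place (as part of CM), and nothing after that position allows another C, so the final C leaves the $ anchor unsatisfied.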
In fact, all the bad input tests pass. This regular expression catches every kind of invalid input you thought of when you wrote your test cases.
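If you want to experiment with bad input yourself, a small unittest case is enough. This is a sketch, not the book's romantest.py: the class name BadInputSketch and the sample inputs are hypothetical, and fromRoman here is a trimmed, self-contained copy of the version in roman.py above.

```python
import re
import unittest

class InvalidRomanNumeralError(Exception): pass

romanNumeralPattern = '^M?M?M?(CM|CD|D?C?C?C?)(XC|XL|L?X?X?X?)(IX|IV|V?I?I?I?)$'

romanNumeralMap = (('M', 1000), ('CM', 900), ('D', 500), ('CD', 400),
                   ('C', 100), ('XC', 90), ('L', 50), ('XL', 40),
                   ('X', 10), ('IX', 9), ('V', 5), ('IV', 4), ('I', 1))

def fromRoman(s):
    """Trimmed copy of roman.py's fromRoman: validate, then convert."""
    if not re.search(romanNumeralPattern, s):
        raise InvalidRomanNumeralError('Invalid Roman numeral: %s' % s)
    result, index = 0, 0
    for numeral, integer in romanNumeralMap:
        while s[index:index+len(numeral)] == numeral:
            result += integer
            index += len(numeral)
    return result

class BadInputSketch(unittest.TestCase):
    # Hypothetical sample inputs, one tuple per kind of bad input
    tooManyRepeats = ('MMMM', 'DD', 'CCCC', 'LL', 'XXXX', 'VV', 'IIII')
    repeatedPairs = ('CMCM', 'CDCD', 'XCXC', 'XLXL', 'IXIX', 'IVIV')
    malformedAntecedents = ('IIMXCC', 'VX', 'DCM', 'CMM', 'IXIV', 'MCMC',
                            'XCX', 'IVI', 'LM', 'LD', 'LC')

    def testBadInput(self):
        """every sample bad numeral should raise InvalidRomanNumeralError"""
        for s in self.tooManyRepeats + self.repeatedPairs + self.malformedAntecedents:
            self.assertRaises(InvalidRomanNumeralError, fromRoman, s)

    def testGoodInput(self):
        """a valid numeral should still convert"""
        self.assertEqual(fromRoman('MCMXC'), 1990)

# Run the two tests without calling sys.exit, so this also works in an interactive session.
suite = unittest.TestLoader().loadTestsFromTestCase(BadInputSketch)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```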
And the anticlimax award of the year goes to the word “OK”, which is printed by the unittest module when all the tests pass.
When all of your tests pass, stop coding.