The simplest example of reading a CSV file:
import csv reader = csv.reader(open("some.csv", "rb")) for row in reader: print row
Reading a file with an alternate format:
import csv reader = csv.reader(open("passwd", "rb"), delimiter=':', quoting=csv.QUOTE_NONE) for row in reader: print row
The corresponding simplest possible writing example is:
import csv writer = csv.writer(open("some.csv", "wb")) writer.writerows(someiterable)
Registering a new dialect:
import csv csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE) reader = csv.reader(open("passwd", "rb"), 'unixpwd')
A slightly more advanced use of the reader -- catching and reporting errors:
import csv, sys filename = "some.csv" reader = csv.reader(open(filename, "rb")) try: for row in reader: print row except csv.Error, e: sys.exit('file %s, line %d: %s' % (filename, reader.line_num, e))
And while the module doesn't directly support parsing strings, it can easily be done:
import csv for row in csv.reader(['one,two,three']): print row
The csv module doesn't directly support reading and writing Unicode, but it is 8-bit-clean save for some problems with ASCII NUL characters. So you can write functions or classes that handle the encoding and decoding for you as long as you avoid encodings like UTF-16 that use NULs. UTF-8 is recommended.
unicode_csv_reader below is a generator that wraps csv.reader to handle Unicode CSV data (a list of Unicode strings). utf_8_encoder is a generator that encodes the Unicode strings as UTF-8, one string (or row) at a time. The encoded strings are parsed by the CSV reader, and unicode_csv_reader decodes the UTF-8-encoded cells back into Unicode:
import csv def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs): # csv.py doesn't do Unicode; encode temporarily as UTF-8: csv_reader = csv.reader(utf_8_encoder(unicode_csv_data), dialect=dialect, **kwargs) for row in csv_reader: # decode UTF-8 back to Unicode, cell by cell: yield [unicode(cell, 'utf-8') for cell in row] def utf_8_encoder(unicode_csv_data): for line in unicode_csv_data: yield line.encode('utf-8')
For all other encodings the following UnicodeReader and UnicodeWriter classes can be used. They take an additional encoding parameter in their constructor and make sure that the data passes the real reader or writer encoded as UTF-8:
import csv, codecs, cStringIO class UTF8Recoder: """ Iterator that reads an encoded stream and reencodes the input to UTF-8 """ def __init__(self, f, encoding): self.reader = codecs.getreader(encoding)(f) def __iter__(self): return self def next(self): return self.reader.next().encode("utf-8") class UnicodeReader: """ A CSV reader which will iterate over lines in the CSV file "f", which is encoded in the given encoding. """ def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds): f = UTF8Recoder(f, encoding) self.reader = csv.reader(f, dialect=dialect, **kwds) def next(self): row = self.reader.next() return [unicode(s, "utf-8") for s in row] def __iter__(self): return self class UnicodeWriter: """ A CSV writer which will write rows to CSV file "f", which is encoded in the given encoding. """ def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds): # Redirect output to a queue self.queue = cStringIO.StringIO() self.writer = csv.writer(self.queue, dialect=dialect, **kwds) self.stream = f self.encoder = codecs.getincrementalencoder(encoding)() def writerow(self, row): self.writer.writerow([s.encode("utf-8") for s in row]) # Fetch UTF-8 output from the queue ... data = self.queue.getvalue() data = data.decode("utf-8") # ... and reencode it into the target encoding data = self.encoder.encode(data) # write to the target stream self.stream.write(data) # empty queue self.queue.truncate(0) def writerows(self, rows): for row in rows: self.writerow(row)