numarray Manual

12.2 Record array functions

array( buffer=None, formats=None, shape=0, names=None, byteorder=sys.byteorder)

The function array is, for most practical purposes, all a user needs to know to construct a record array.

formats is a string containing the format information of all fields. Each format can be a letter code, such as f4 or i2, or a longer name such as Float32 or Int16. For a list of the letter codes and the longer names, see Table 4.1 or use the letterCode() function. A field of strings is specified by the letter a followed by an integer giving the maximum length; thus a5 is the format for a field of strings of maximum length 5.

The formats are separated by commas. Each cell (an element in a field) can itself be a numarray; this is specified by placing a number or a tuple in front of the format specification. So if formats='i4,Float64,a5,3i2,(2,3)f4,Complex64,b1', the record array will have:

   1st field: (4-byte) integers
   2nd field: double precision floating point numbers
   3rd field: strings of length 5
   4th field: short (2-byte) integers, each element is an array of shape=(3,)
   5th field: single precision floating point numbers, each element is an 
       array of shape=(2,3)
   6th field: double precision complex numbers
   7th field: (1-byte) Booleans
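As a rough illustration of how such a formats string breaks down, the sketch below splits it into (cell shape, type code) pairs in plain Python 3. The helper name parse_formats and the returned representation are hypothetical and are not part of numarray; an empty tuple stands for a scalar cell.

```python
import re

# Hypothetical helper, not part of numarray: split a formats string into
# (cell_shape, type_code) pairs. An empty tuple means a scalar cell.
_FIELD = re.compile(r'(\([\d,\s]*\)|\d+)?\s*([A-Za-z]\w*)')

def parse_formats(formats):
    fields = []
    for repeat, code in _FIELD.findall(formats):
        # '3' -> (3,), '(2,3)' -> (2, 3), '' -> ()
        shape = tuple(int(n) for n in re.findall(r'\d+', repeat))
        fields.append((shape, code))
    return fields

for shape, code in parse_formats('i4,Float64,a5,3i2,(2,3)f4,Complex64,b1'):
    print(code, shape)
```

The sketch only tokenizes the string; it does not check that a type code is one that numarray accepts.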

The formats specification takes precedence over the data. For example, if a field holds integers in buffer but is specified as floats in formats, it will hold floats in the record array. If a field in the buffer cannot be converted to the corresponding data type in the formats specification, e.g. from strings to numbers (integers, floats, Booleans) or vice versa, an exception will be raised.

shape is the shape of the record array. It can be an integer, in which case it is the number of rows in a table. It can also be a tuple, in which case the record array is an N-D array with Records as its elements. shape must be consistent with the data in buffer for buffer types (5) and (6), explained below.

names is a string containing the names of the fields, separated by commas. If fewer names than formats are specified, default names are used for the remaining fields. For example, if there are five fields specified in formats but names=None (the default), the field names will be: c1, c2, c3, c4, c5. If names="a,b", the field names will be: a, b, c3, c4, c5.

If more names have been specified than there are formats, the extra names will be discarded. If duplicate names are specified, a ValueError will be raised. Field names are case sensitive, e.g. column ABC will not be found if it is referred to as abc or Abc (for example) when using the field() method.
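The naming rules above can be sketched in plain Python 3. The function fill_names is a hypothetical stand-in, not numarray's own code, and it assumes that duplicates are checked after any extra names have been discarded; the manual does not state the order.

```python
# Hypothetical helper, not part of numarray: apply the naming rules
# described above to a names string and a field count.
def fill_names(names, nfields):
    given = [n.strip() for n in names.split(',')] if names else []
    given = given[:nfields]  # extra names are discarded
    # Assumption: duplicates are checked after the extras are dropped.
    if len(set(given)) != len(given):
        raise ValueError('duplicate field names')
    # Default names use the 1-based field position: c1, c2, ...
    defaults = ['c%d' % (i + 1) for i in range(len(given), nfields)]
    return given + defaults

print(fill_names(None, 5))
print(fill_names('a,b', 5))
```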

byteorder is a string with the value big or little, selecting big-endian or little-endian byte order. This is useful when reading (binary) data from a string or a file. If it is not specified, the sys.byteorder value is used, so the result is platform dependent for string or file input.
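To see why the byte order matters for binary input, the following Python 3 sketch interprets the same two bytes both ways using only built-in functions. It does not involve numarray; the variable names are illustrative.

```python
import sys

raw = b'ab'  # two bytes: 0x61, 0x62

as_big = int.from_bytes(raw, 'big')          # read as 0x6162
as_little = int.from_bytes(raw, 'little')    # read as 0x6261
native = int.from_bytes(raw, sys.byteorder)  # depends on the platform

print(as_big, as_little, native)
```

Passing byteorder explicitly removes the platform dependence that the default introduces.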

The first argument, buffer, may be any one of the following:

(1) None (default). The data block in the record array will not be initialized. The user must assign valid data before trying to read the contents or before writing the record array to a disk file.

(2) a Python string containing binary data. For example:

   >>> import numarray.records as rec
   >>> r=rec.array('abcdefg'*100, formats='i2,a3,i4', shape=3, byteorder='big')
   >>> print r
   RecArray[ 
   (24930, 'cde', 1718051170),
   (25444, 'efg', 1633837924),
   (25958, 'gab', 1667523942)
   ]
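The values printed above can be cross-checked without numarray. The sketch below decodes the same bytes with the standard struct module in Python 3, assuming the records are packed with no padding (9 bytes each: 2 + 3 + 4), which is what the output above implies.

```python
import struct

data = b'abcdefg' * 100

# '>' = big-endian with no padding; h = i2, 3s = a3, i = i4
record = struct.Struct('>h3si')

# Decode the first three 9-byte records, one after another.
rows = [record.unpack_from(data, i * record.size) for i in range(3)]
for row in rows:
    print(row)
```

Only the first 27 of the 700 bytes are consumed, matching shape=3.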

(3) a Python file object for an open file. The data will be copied from the file, starting at the current position of the read pointer, with byte order as specified in byteorder.
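The effect of starting at the current read position can be illustrated with an in-memory file in Python 3. This is a sketch using the standard io and struct modules rather than numarray; the 4-byte header HDR0 is made up for the example.

```python
import io
import struct

# Big-endian, unpadded layout equivalent to formats='i2,a3,i4'
record = struct.Struct('>h3si')

# An in-memory "file": 4 made-up header bytes followed by the data
f = io.BytesIO(b'HDR0' + b'abcdefg' * 100)
f.seek(4)  # move the read pointer past the header

nrows = 3
chunk = f.read(record.size * nrows)  # read exactly three records
rows = list(record.iter_unpack(chunk))
print(rows, f.tell())
```

After the read, the file position has advanced by exactly the number of bytes consumed.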

(4) a record array. This results in a deep copy of the input record array; any other arguments to array() will be silently ignored.

(5) a list of numarrays. There must be one such numarray for each field. The formats and shape arguments to array() are not required, but if they are specified, they must be consistent with the input arrays. The shapes of all the input numarrays must also be consistent with one another.

   # this will have 3 rows, each cell in the 2nd field is an array of 4 elements
   # note that the formats specification needs to reflect the data shape
   >>> import numarray
   >>> arr1=numarray.arange(3)
   >>> arr2=numarray.arange(12,shape=(3,4))
   >>> r=rec.array([arr1, arr2],formats='i2,4f4')

In this example, arr2 is cast up to float.
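One way to picture the shape consistency requirement: each input's shape should split into a common row shape followed by that field's cell shape. The Python 3 sketch below checks this on nested lists; nested_shape and common_row_shape are hypothetical helpers, and numarray's actual checks may differ.

```python
def nested_shape(seq):
    """Shape of a regular nested list, e.g. [[0, 1], [2, 3]] -> (2, 2)."""
    shape = []
    while isinstance(seq, (list, tuple)):
        shape.append(len(seq))
        if not seq:
            break
        seq = seq[0]
    return tuple(shape)

def common_row_shape(arrays, cell_shapes):
    """Return the row shape shared by all fields, or raise ValueError."""
    row_shapes = set()
    for k, (arr, cell) in enumerate(zip(arrays, cell_shapes)):
        full = nested_shape(arr)
        split = len(full) - len(cell)
        # The trailing dimensions must equal the field's cell shape.
        if split < 0 or full[split:] != cell:
            raise ValueError('field %d does not end in cell shape %r' % (k, cell))
        row_shapes.add(full[:split])
    if len(row_shapes) != 1:
        raise ValueError('input arrays have inconsistent shapes')
    return row_shapes.pop()

arr1 = list(range(3))                                     # shape (3,)
arr2 = [list(range(4 * i, 4 * i + 4)) for i in range(3)]  # shape (3, 4)
print(common_row_shape([arr1, arr2], [(), (4,)]))
```

Here the cell shapes () and (4,) correspond to the formats i2 and 4f4, and both inputs agree on three rows.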

(6) a list of sequences. Each sequence contains the number(s)/string(s) of a record. The example in the introduction uses such input, sometimes called longhand input. The data types are automatically determined after comparing all input data. Data of the same field will be cast to the highest type:

   # the first field uses the highest data type: Float64
   >>> r=rec.array([[1,'abc'],(3.5, 'xx')]); print r
   RecArray[ 
   (1.0, 'abc'),
   (3.5, 'xx')
   ]

unless overruled by the formats argument:

   # overrule the first field to short integers, second field to shorter strings
   >>> r=rec.array([[1,'abc'],(3.5, 'xx')],formats='i2,a1'); print r
   RecArray[ 
   (1, 'a'),
   (3, 'x')
   ]

Inconsistent data in the same field will cause a ValueError:

   >>> r=rec.array([[1,'abc'],('a', 'xx')])
   ValueError: inconsistent data at row 1,field 0
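The type determination described for longhand input can be approximated in a few lines of Python 3. This is only a sketch of the idea (promote numbers to the highest type seen in a field, refuse to mix strings and numbers); column_types and its ranking table are hypothetical and do not reproduce numarray's implementation.

```python
# Numeric types ordered from lowest to highest for promotion purposes.
_RANK = {bool: 0, int: 1, float: 2, complex: 3}

def column_types(rows):
    """Return one Python type per field, promoted across all rows."""
    kinds = [None] * len(rows[0])
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            kind = str if isinstance(value, str) else type(value)
            if kind is not str and kind not in _RANK:
                raise TypeError('unsupported value %r' % (value,))
            if kinds[c] is None:
                kinds[c] = kind
            elif (kind is str) != (kinds[c] is str):
                # Strings and numbers cannot share a field.
                raise ValueError('inconsistent data at row %d,field %d' % (r, c))
            elif kind is not str and _RANK[kind] > _RANK[kinds[c]]:
                kinds[c] = kind  # promote to the higher numeric type
    return kinds

print(column_types([[1, 'abc'], (3.5, 'xx')]))
try:
    column_types([[1, 'abc'], ('a', 'xx')])
except ValueError as exc:
    print('ValueError:', exc)
```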

A record array with multi-dimensional numarray cells in a field can also be constructed by using nested sequences:

   >>> r=rec.array([[(11,12,13),'abc'],[(2,3,4), 'xx']]); print r
   RecArray[ 
   (array([11, 12, 13]), 'abc'),
   (array([2, 3, 4]), 'xx')
   ]

letterCode( ): This function lists the letter codes accepted by the formats argument of array().
