array(buffer=None, formats=None, shape=0, names=None, byteorder=sys.byteorder)
The function array is, for most practical purposes, all a user needs
to know to construct a record array.
formats is a string containing the format information of all fields.
Each format can be a letter code, such as f4 or i2, or a longer name
like Float32 or Int16. For a list of letter codes or the longer names,
see Table 4.1 or use the letterCode() function. A field of strings is
specified by the letter a, followed by an integer giving the maximum
length; thus a5 is the format for a field of strings of maximum length 5.
The formats are separated by commas, and each cell (element in a field)
can itself be a numarray, which is specified by attaching a number or a
tuple in front of the format specification. So if
formats='i4,Float64,a5,3i2,(2,3)f4,Complex64,b1', the record array
will have:
1st field: (4-byte) integers
2nd field: double precision floating point numbers
3rd field: strings of length 5
4th field: short (2-byte) integers, each element is an array of shape=(3,)
5th field: single precision floating point numbers, each element is an array of shape=(2,3)
6th field: double precision complex numbers
7th field: (1-byte) Booleans
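The structure of a formats string can be illustrated with a small parser. This is only a sketch in plain Python: parse_formats is a hypothetical helper, not a numarray function, and it does not validate the type codes against Table 4.1. It shows how an optional leading number or tuple becomes the cell shape and why the commas inside (2,3) must not be treated as field separators.

```python
import re

# Hypothetical helper (not part of numarray): split a formats string into
# (cell_shape, type_code) pairs. An optional leading integer or parenthesised
# tuple gives the shape of each cell; the rest is the letter code or long name.
_FIELD = re.compile(r"\s*(\([\d,\s]*\)|\d+)?\s*([A-Za-z]\w*)\s*")

def parse_formats(formats):
    fields = []
    pos = 0
    while pos < len(formats):
        m = _FIELD.match(formats, pos)
        if m is None:
            raise ValueError("bad format at position %d" % pos)
        repeat, code = m.group(1), m.group(2)
        if repeat is None:
            shape = ()                      # scalar cell
        elif repeat.startswith("("):
            shape = tuple(int(n) for n in re.findall(r"\d+", repeat))
        else:
            shape = (int(repeat),)          # e.g. 3i2 -> shape (3,)
        fields.append((shape, code))
        pos = m.end()
        if pos < len(formats):
            if formats[pos] != ",":
                raise ValueError("expected ',' at position %d" % pos)
            pos += 1                        # skip the field separator
    return fields

parsed = parse_formats("i4,Float64,a5,3i2,(2,3)f4,Complex64,b1")
for number, (shape, code) in enumerate(parsed, start=1):
    print(number, code, shape)
```

The fourth and fifth entries come out with shapes (3,) and (2, 3), matching the field list above; all other fields have an empty shape, meaning scalar cells.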
The formats specification takes precedence over the data. For example,
if a field is specified as integers in buffer, but is specified as floats
in formats, it will be floats in the record array. If a field in the
buffer is not convertible to the corresponding data type in the formats
specification, e.g. from strings to numbers (integers, floats, Booleans)
or vice versa, an exception will be raised.
shape is the shape of the record array. It can be an integer, in which
case it is equivalent to the number of rows in a table. It can also be a
tuple, in which case the record array is an N-D array with Records as its
elements. shape must be consistent with the data in buffer for buffer
types (5) and (6), explained below.
names is a string containing the names of the fields, separated by
commas. If there are more formats specified than names, then default
names will be used: if there are five fields specified in formats but
names=None (default), then the field names will be: c1, c2, c3, c4, c5.
If names="a,b", then the field names will be: a, b, c3, c4, c5.
If more names have been specified than there are formats, the extra names
will be discarded. If duplicate names are specified, a ValueError will be
raised. Field names are case sensitive, e.g. column ABC will not be found
if it is referred to as abc or Abc (for example) when using the field()
method.
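The naming rules can be summarised in a few lines of plain Python. The function resolve_names below is a hypothetical illustration, not numarray code. In particular, the text does not say whether duplicates are checked before or after extra names are discarded; this sketch assumes the check is made on the final list.

```python
def resolve_names(names, nfields):
    """Sketch of the rules above: trim extras, fill defaults, reject duplicates."""
    given = [n.strip() for n in names.split(",")] if names else []
    given = given[:nfields]                      # extra names are discarded
    # Missing names default to c<position>, with positions counted from 1.
    result = given + ["c%d" % (i + 1) for i in range(len(given), nfields)]
    if len(set(result)) != len(result):          # assumption: checked last
        raise ValueError("duplicate field names")
    return result

print(resolve_names(None, 5))     # ['c1', 'c2', 'c3', 'c4', 'c5']
print(resolve_names("a,b", 5))    # ['a', 'b', 'c3', 'c4', 'c5']
print(resolve_names("a,b,c", 2))  # ['a', 'b']

# Names are compared exactly, so lookups are case sensitive.
print("abc" in resolve_names("ABC", 1))  # False
```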
byteorder is a string of the value 'big' or 'little', referring to big
endian or little endian. This is useful when reading (binary) data from a
string or a file. If not specified, it will use the sys.byteorder value
and the result will be platform dependent for string or file input.
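The effect of byteorder on binary input can be reproduced with Python's standard struct module, without numarray. The sketch below assumes that the formats 'i2,a3,i4' describe a packed 9-byte record (2 + 3 + 4 bytes, no padding) and decodes the byte string 'abcdefg'*100 under both byte orders. It is written for Python 3, where binary data is a bytes object.

```python
import struct
import sys

data = b"abcdefg" * 100

# One record is an i2, an a3 and an i4 with no padding: 2 + 3 + 4 = 9 bytes.
big = struct.Struct(">h3si")      # big endian
little = struct.Struct("<h3si")   # little endian

# The same nine bytes give different integers depending on the byte order.
big_record = big.unpack_from(data, 0)
little_record = little.unpack_from(data, 0)
print(big_record)      # (24930, b'cde', 1718051170)
print(little_record)   # (25185, b'cde', 1650550630)

# The first three big-endian records, read back to back.
records = [big.unpack_from(data, i * big.size) for i in range(3)]
print(records)

# Leaving the byte order unspecified corresponds to using the platform's own.
native = big if sys.byteorder == "big" else little
print(native.unpack_from(data, 0) == (big_record if sys.byteorder == "big" else little_record))
```

Only the integer fields change with the byte order; the string field is the same three bytes either way.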
The first argument, buffer, may be any one of the following:
(1) None (default). The data block in the record array will not be
initialized. The user must assign valid data before trying to read the
contents or before writing the record array to a disk file.
(2) a Python string containing binary data. For example:
>>> r=rec.array('abcdefg'*100, formats='i2,a3,i4', shape=3, byteorder='big')
>>> print r
RecArray[
(24930, 'cde', 1718051170),
(25444, 'efg', 1633837924),
(25958, 'gab', 1667523942)
]
(3) a Python file object for an open file. The data will be copied from
the file, starting at the current position of the read pointer, with
byte order as specified in byteorder.
(4) a record array. This results in a deep copy of the input record array;
any other arguments to array() will be silently ignored.
(5) a list of numarrays. There must be one such numarray for each field.
The formats and shape arguments to array() are not required, but if they
are specified, they need to be consistent with the input arrays. The
shapes of all the input numarrays also need to be consistent with one
another.
# this will have 3 rows, each cell in the 2nd field is an array of 4 elements
# note that the formats specification needs to reflect the data shape
>>> arr1=numarray.arange(3)
>>> arr2=numarray.arange(12,shape=(3,4))
>>> r=rec.array([arr1, arr2],formats='i2,4f4')
In this example, arr2 is cast up to float.
(6) a list of sequences. Each sequence contains the number(s)/string(s) of a record. The example in the introduction uses such input, sometimes called longhand input. The data types are automatically determined after comparing all input data. Data of the same field will be cast to the highest type:
# the first field uses the highest data type: Float64
>>> r=rec.array([[1,'abc'],(3.5, 'xx')]); print r
RecArray[
(1.0, 'abc'),
(3.5, 'xx')
]
This automatic typing can be overruled by the formats argument:
# overrule the first field to short integers, second field to shorter strings
>>> r=rec.array([[1,'abc'],(3.5, 'xx')],formats='i2,a1'); print r
RecArray[
(1, 'a'),
(3, 'x')
]
Inconsistent data in the same field will cause a ValueError:
>>> r=rec.array([[1,'abc'],('a', 'xx')])
ValueError: inconsistent data at row 1,field 0
A record array with multi-dimensional numarray cells in a field can also be constructed by using nested sequences:
>>> r=rec.array([[(11,12,13),'abc'],[(2,3,4), 'xx']]); print r
RecArray[
(array([11, 12, 13]), 'abc'),
(array([2, 3, 4]), 'xx')
]
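The type determination described for buffer type (6) can be sketched in plain Python. The function infer_column_types is a hypothetical illustration, not numarray's implementation. It assumes only three kinds of scalar cell (int, float, str), promotes int to float within a field, and treats any mix of strings and numbers as inconsistent. Nested sequences (array cells) are not handled.

```python
# Numeric promotion order assumed by this sketch: int < float.
_RANK = {int: 0, float: 1}

def infer_column_types(rows):
    """Return one Python type per field, promoting int to float where needed."""
    types = []
    for field, column in enumerate(zip(*rows)):
        current = type(column[0])
        for row, value in enumerate(column[1:], start=1):
            t = type(value)
            if t is current:
                continue
            if t in _RANK and current in _RANK:
                # Both numeric: keep the higher of the two types.
                current = max(t, current, key=_RANK.get)
            else:
                raise ValueError(
                    "inconsistent data at row %d,field %d" % (row, field))
        types.append(current)
    return types

print(infer_column_types([[1, 'abc'], (3.5, 'xx')]))

try:
    infer_column_types([[1, 'abc'], ('a', 'xx')])
except ValueError as exc:
    print("ValueError:", exc)
```

For the first input the sketch reports float for the first field and str for the second; for the second input it raises the error at row 1, field 0, the same position reported in the example above.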
) |
formats
argument in array()
.