." .hy 0
." .na
.ce
About Arrays in Postgres v4.2
.pp
This note gives a short overview of how arrays are implemented,
with enough detail that writers of backend user-defined ADT code
can access Postgres multidimensional arrays directly.
.pp
Arrays in Postgres are implemented in a few compatible ways.
If the total size of the array is small enough to fit
on a single page, an array attribute can be stored like
any other attribute.
However, many arrays useful for scientific computation
exceed this limit, so array attributes may be stored in
Postgres large objects.
.pp
Another enhancement is the facility for rearranging
the way the data is stored to maximize access performance,
by identifying subsections or dimensions that are frequently
accessed together.
This is called "chunking"; it is the subject of Sunita Sarawagi's
dissertation research and is discussed in much greater detail
elsewhere.
.pp
All implementations share a common header, which is defined
in src/backend/utils/adt/array.h.
There are some utility macros defined in this file, for
convenient access.
Code implementing most array access is found in arrayfuncs.c and
arrayutils.c, and the chunking code is in chunk.c (also in
the utils/adt directory).
.pp
The header is a Postgres variable
length object whose initial segment is described by the following
C structure (C cannot declare members whose length depends on
another member, so the two trailing vectors appear only as comments):
.ip
.nf
typedef struct {
	int	size;
	int	ndim;		/* # of dimensions */
	int	flags;		/* implementation flags */
#ifdef for_show_only
/*	int	dims[ndim];	/* size of each dimension */
/*	int	lbounds[ndim];	/* starting value for each dim */
#endif
} ArrayType;
.fi
.pp
Like any other VARLENA structure in Postgres, the 4\-byte size
field gives the total number of bytes needed to store whatever
portion of the array is kept directly as a Postgres attribute in
the classes themselves.  When the array references a large object,
the size field does not include the number of bytes contained within
the large object.
The flags field encodes:
.ip \(bu
Whether the array data is stored externally in a large object
.ip \(bu
The type of large object, if so.
.ip \(bu
Whether the array is chunked.
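.pp
The three items above can be packed into one integer with bit masks.
The following is a minimal sketch only: the names (all prefixed sk_ or SK_)
and the bit positions are assumptions made for illustration, and the
actual definitions are the ones in array.h.
.nf
```c
#include <assert.h>

/* Hypothetical bit assignments, for illustration only. */
#define SK_LOB_FLAG	0x1	/* data lives in a large object */
#define SK_CHK_FLAG	0x2	/* array is chunked */
#define SK_OBJ_MASK	0x1c	/* large object type code */
#define SK_OBJ_SHIFT	2

int sk_is_lo(int flags)
{
	return (flags & SK_LOB_FLAG) != 0;
}

int sk_is_chunked(int flags)
{
	return (flags & SK_CHK_FLAG) != 0;
}

/* Large object type code; only meaningful for large object arrays. */
int sk_lo_type(int flags)
{
	assert(sk_is_lo(flags));
	return (flags & SK_OBJ_MASK) >> SK_OBJ_SHIFT;
}

/* Build a flags word from its three parts. */
int sk_make_flags(int is_lo, int lo_type, int chunked)
{
	int flags = 0;

	if (is_lo)
		flags |= SK_LOB_FLAG | ((lo_type << SK_OBJ_SHIFT) & SK_OBJ_MASK);
	if (chunked)
		flags |= SK_CHK_FLAG;
	return flags;
}
```
.fi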
.pp
There is no indication in the header of the underlying type
that the array is built on.  This must be obtained through
the attribute structures.
.pp
The common header is followed by a vector of 4-byte integers
giving the size of each dimension, followed by a vector of 4-byte
integers giving the lower bounds for each dimension.
Unlike C, the default lower bound is 1.
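.pp
Since the two vectors sit at fixed offsets from the start of the header,
they can be located with simple pointer arithmetic.
The sketch below assumes 4\-byte ints and no padding in the fixed header;
the names SkArrayHdr, sk_dims, sk_lbounds and sk_overhead are hypothetical
stand\-ins, not the macros that array.h provides.
.nf
```c
#include <assert.h>
#include <stddef.h>

/* Fixed part of the header, mirroring the structure shown earlier. */
typedef struct {
	int	size;
	int	ndim;
	int	flags;
} SkArrayHdr;			/* hypothetical name for this sketch */

/* The dimension sizes start right after the fixed header. */
int *sk_dims(SkArrayHdr *a)
{
	return (int *) ((char *) a + sizeof(SkArrayHdr));
}

/* The lower bounds start right after the ndim dimension sizes. */
int *sk_lbounds(SkArrayHdr *a)
{
	return sk_dims(a) + a->ndim;
}

/* Bytes taken by the fixed header plus both vectors. */
size_t sk_overhead(int ndim)
{
	assert(ndim >= 0);
	return sizeof(SkArrayHdr) + 2 * (size_t) ndim * sizeof(int);
}
```
.fi
.pp
Under these assumptions a two\-dimensional array carries seven ints of
overhead: three in the fixed header, two dimension sizes and two lower bounds.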
.pp
In the case where the data is stored in a large object,
the vectors of information are followed by a string name
giving the Postgres path name of the large object.
Otherwise the array data directly follows these vectors.
.pp
One\-dimensional arrays of variable\-size objects are stored
sequentially, so reaching an element presumably means stepping
over every element before it.  Thus, accessing individual elements
of an array of text may be time consuming!
For fixed size data, the conventions are the same as C\-language
arrays, with the last index cycling the fastest.
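.pp
For fixed size data, the layout rule above turns an index vector into a
linear element offset once the lower bounds are subtracted.
The following sketch illustrates the arithmetic; sk_offset is a
hypothetical helper written for this note, not a function from
arrayfuncs.c or arrayutils.c.
.nf
```c
#include <assert.h>

/*
 * Linear element offset for the index vector idx[], given the
 * dimension sizes and lower bounds.  The last index varies
 * fastest, as in C.  Returns -1 if any index is out of range.
 */
int sk_offset(int ndim, const int *dims, const int *lbounds, const int *idx)
{
	int i, off = 0;

	assert(ndim > 0);
	for (i = 0; i < ndim; i++) {
		int rel = idx[i] - lbounds[i];	/* zero-based position */

		if (rel < 0 || rel >= dims[i])
			return -1;
		off = off * dims[i] + rel;	/* earlier indices weigh more */
	}
	return off;
}
```
.fi
.pp
For a 3 by 4 array with both lower bounds equal to 1, element [2][1] is at
offset 4 and element [3][4] is at offset 11.
Multiplying the offset by the element size gives the byte position within
the data.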
.uh "An Example"
.pp
If you wanted to compute the total number of elements in an
array you would write the following C code:
.nf


#include "tmp/c.h"
#include "utils/log.h"
#include "utils/palloc.h"
#include "utils/adt/array.h"

int
ArEx(arg)
	ArrayType *arg;
{
	int ndims, *dimp, total;

	ndims = ARR_NDIM(arg);
	dimp = ARR_DIMS(arg);

	for (total = *dimp++; --ndims > 0; )
		total *= *dimp++;

	return (total);
}