.sh 1 "POSTQUEL" 2
.pp
The next two subsections will motivate the different issues that the 
optimizer must consider through several examples.
Then, it will indicate how these new features affect the optimizer. 
.sh 2 "An Extendible Type System"
.pp
One of the features that POSTGRES supports are abstract data types
(ADTs).
An ADT facility allow users to define their own data types,
simplifying representation of complex information.
For example, a user who must store box coordinates in his database
can define a data type called \*(lqboxcoordinates.\*(rq
From here, he can define a relation
BOX with a coordinates field of type \*(lqboxcoordinates.\*(rq
The unit square box shown in figure 2.1, 
therefore, would be represented as shown in figure 2.2.
This is in contrast to figure 2.3.
.(z
.hl
.GS
file figure2.1
.GE
.sp
.ce 2
\fBFigure 2.1\fR
.sp
.sz \n(ep
.TS
center;
|c||c|c|
c||c|c|.
_
BOX	boxid 	coordinates
_
	1	((-1,0), (0,1), (1,0), (0,-1))
	_	_
.TE
.sp
.ce 2
.sz \n(pp
\fBFigure 2.2\fR
Relation with user-defined type \*(lqboxcoordinates\*(rq
.sp 2
.sz \n(ep
.TS
center;
|c||c|c|c|c|c|c|c|c|c|
c||c|c|c|c|c|c|c|c|c|.
_
BOX	boxid	x1	y1	x2	y2	x3	y3	x4	y4
_
	1	-1	0	0	1	1	0	0	-1
	_	_	_	_	_	_	_	_	_
.TE
.sz \n(pp
.sp
.ce 2
\fBFigure 2.3\fR
Relation without user-defined types
.hl
.)z
.pp
POSTGRES also allows users to define operators
to be used in conjunction with user-defined data types.
By defining an operator \*(lqAREA,\*(rq a query to compute the area of the 
above box would be expressed as:
.(l
\fBretrieve\fR (a = AREA(BOX.coordinates)) \fBwhere\fR BOX.boxid = 1
.)l
rather than:
.(l
\fBretrieve\fR (a = sqrt (sqr (BOX.x1 - BOX.x2) + sqr (BOX.y1 - BOX.y2)) +
			sqrt (sqr (BOX.x1 - BOX.x4) + sqr (BOX.y1 - BOX.y4)))
	\fBwhere\fR BOX.boxid = 1.
.)l
.sp
In addition, an operator \*(lqAREAEQ\*(rq (area equals) can be defined
to find all boxes whose area is equal to some desired value.
For example,
.(l
\fBretrieve\fR (BOX.all) \fBwhere\fR BOX.coordinate AREAEQ 10
.)l
finds all boxes whose area equals ten.
Similarly, operators like \*(lqAREALT\*(rq (area less than) and
\*(lqAREAGT\*(rq (area greater than) can also be defined.
.pp
The operators, AREAEQ, AREAGT, and AREALT, are quite similar
to the conventional relational operators, =, >, and <.
Therefore, in the same way that users can specify access methods like
B-trees and hash tables to provide efficient access when used
with conventional operators,
users should also be able to specify access methods to efficiently
scan relations when non-standard operators
appear within restriction clauses.
One solution, which POSTGRES supports, is to allow users to extend 
existing access methods so they can be used with a new set of operators.
For example, by maintaining coordinate records within a B-Tree sorted 
in \*(lqAREALT\*(rq order rather than a more conventional \*(lq<\*(rq
or \*(lq>\*(rq order,
a user has an access method that
provides efficient access to queries whose restrictions contain
the AREAEQ, AREAGT, or AREALT operators.
Another example is a hash table that can be constructed using the operator
\*(lqAREAEQ\*(rq rather than \*(lq=.\*(rq
.pp
The technique just described is suitable provided there exists an
appropriate access method upon which extensions can be made.
However, suppose a user defines an operator contained-in, \*(lq$<<$,\*(rq that
returns true if the left operand is spatially contained within the
right operand.
To provide efficient access to this operator,
two-dimensional access methods are required, e.g.
an R-tree [GUTT84] or a K-D-B Tree [ROBI81].
Since conventional databases do not support two-dimensional access methods,
an extension of an existing access method is not possible.
To alleviate this problem,
POSTGRES allows users to define their own access methods.
Details on user-defined access methods are discussed in [STON86a].
.pp
In summary,
with an extendible type system, users can build data types to suit their
application needs, operators to manipulate the new types, and access methods
to provide efficient access to queries containing these new operators.
.sh 2 "Procedural Data Fields"
.pp
Existing relational databases do not provide good support for storage of
complex objects.
For example, if a complex object consists of a single box, circle, and
triangle, this information would be represented as shown in
figure 2.4.
As a result, three join queries must be executed to
retrieve all information about subobjects within this 
complex object.
.(z
.hl
.sz \n(ep
.in +2.5i
.nf
BOX (bid, bdata)
CIRCLE (cid, cdata)
TRIANGLE (tid, tdata)
.fi
.sp
.in -2.5i
.TS
center;
|c||c|c|c|
c||c|c|c|.
_
COBJECT	coid	objtype	oid
_
	1	box	2
	1	circle	3
	1	triangle	4
	_	_	_
.TE
.sp
.in +2i
.nf
\fBretrieve\fR (BOX.all) \fBwhere\fR
	BOX.bid = COBJECT.oid \fBand\fR
	COBJECT.objtype = \*(lqbox\*(rq \fBand\fR
	COBJECT.coid = 1

\fBretrieve\fR (CIRCLE.all) \fBwhere\fR
	CIRCLE.cid = COBJECT.oid \fBand\fR
	COBJECT.objtype = \*(lqcircle\*(rq \fBand\fR
	COBJECT.coid = 1

\fBretrieve\fR (TRIANGLE.all) \fBwhere\fR
	TRIANGLE.tid = COBJECT.oid \fBand\fR
	COBJECT.objtype = \*(lqtriangle\*(rq \fBand\fR
	COBJECT.coid = 1
.fi
.in -2i
.sz \n(pp
.sp
.ce 2
\fBFigure 2.4\fR
Storage of complex objects in a relational system
.hl
.)z
In the more general case where a complex object is composed of up to \fIn\fR
different subobject types, a user would have to execute \fIn\fR join queries.
Without extra information indicating which subobject types are actually
contained within a desired object, the user has no choice but to execute all
\fIn\fR queries.
This is quite inefficient, particularly when \fIn\fR is large, because
as indicated in the previous sentence, many of the join queries
are unnecessarily executed.
.pp
The basic problem here is that the relational model is not well-suited for
representing hierarchical relationships.
As a solution, Stonebraker has proposed embedding queries within data
fields and using these queries to express the hierarchical relationship
between the corresponding tuple and information elsewhere in the database
[STON84].
Using this idea, which POSTGRES supports, our complex object example is
now represented as shown in figure 2.5.
.(z
.hl
.sz \n(ep
.TS
center;
|c||c|c|
c||c|l|.
_
COBJECT	coid	components
_
	1	\fBretrieve\fR (BOX.all) \fBwhere\fR BOX.bid = 2
		\fBretrieve\fR (CIRCLE.all) \fBwhere\fR CIRCLE.cid = 3
		\fBretrieve\fR (TRIANGLE.all) \fBwhere\fR TRIANGLE.tid = 4
	_	_
.TE
.sz \n(pp
.sp
.ce 2
\fBFigure 2.5\fR
Storage of complex objects with procedural data fields
.hl
.)z
To retrieve information executed by the queries embedded within this data field,
the user would issue the following query:
.(l
\fBexecute\fR (COBJECT.components) \fBwhere\fR COBJECT.coid = 1.
.)l
Thus, \fIn\fR join queries reduce to a single \fBexecute\fR query.
In addition, users can selectively retrieve information linked through
these hierarchies by nesting attributes in a manner similar to the proposal
in GEM [ZANI83].
For example, to retrieve triangle information for a particular complex object,
a user would nest \*(lqtdata\*(rq within \*(lqcomponents\*(rq as shown below:
.(l
\fBretrieve\fR (COBJECT.components.tdata) \fBwhere\fR COBJECT.coid = 1.
.)l
In general, attributes can be nested to an arbitrary number of levels.
.sh 2 "The POSTGRES Optimizer"
.pp
Query optimization decisions are made based upon the characteristics of
operators appearing within queries as well as the index types defined
on relations.
In a conventional optimizer, information about
operators and access methods can be \*(lqhardwired\*(rq into the
optimization code because there are only a fixed number of
operators and access methods within the system.
Such a solution would not suffice in a system like POSTGRES where
an arbitrary number of operators and access methods are at a
user's disposal.
Consequently, this was an issue that had to be considered when designing the
POSTGRES optimizer.
.pp
The optimizer also must consider queries containing nested attributes.
As section 3.5 will describe, there is a clean and simple solution
to this problem, which only requires that the optimizer apply 
the basic planning algorithm once for each nesting level.
.pp
Rules and triggers will be processed using 
query modification [STON75].
This aspect of POSTGRES will not be discussed in this report because query
modification is being implemented in a module separate from the optimizer.
For details, see [STON86d].
.pp
Sophisticated algorithms have been proposed to optimize transitive closure
queries as well as sets of queries.
This is done by transforming sequences of database operations to
equivalent sequences that execute more efficiently.
This report, however, will not discuss these techniques any further because
the topic is outside the scope of this project.
For details, see [SELL85], [SELL86].
