.\"----------------------------------------------------------------------
.\" FILE
.\"	query-tree.t
.\"
.\" DESCRIPTION
.\"	Specification for the query (parser => planner) tree
.\"
.\" RCS ID
.\"	$Header: /usr/local/devel/postgres/src/newdoc/RCS/parse-tree.me,v 1.1 1993/08/06 23:25:27 avi Exp $
.\"----------------------------------------------------------------------
.ds ar <\\(em\\(em
.nr pp 11
.nr sp 12
.nr tp 12
.nr fp 10
.ll 6.5i
.(l C
.sz \n(sp
.ft B
QUERY TREE SPECIFICATION
(Printed: \*(td)
.ft R
.)l
.sp
.sz \n(pp
.sp
This document describes the query tree as output by the parser.  The
tree is represented internally as a LISP object, which can be displayed
by calling the procedure \fBlispDisplay\fR(lispObject).  In addition,
there are three backend flags that can be set to display the query
tree at different stages of processing :
.(l
\fBDebugPrintParse\fR, \fBDebugPrintRewrittenParsetree\fR and
\fBDebugPrintPlan\fR
.)l
All three are set when POSTGRES is run in debug mode.
.sp
For \fBretrieve\fR, \fBreplace\fR, \fBappend\fR and \fBdelete\fR
commands, the parser will produce a query tree as described below.
Other commands produce a list in which the first element is the
command name.  The format of the rest of the tree is command-specific
and will not be described here.
.sp
For notational purposes, anything appearing in italics is defined
elsewhere in this document.
Anything in bold face surrounded by quotes appears explicitly within a
runtime query tree/plan as a LISP string.
Anything appearing as a list is represented literally as a LISP list.
[] indicates something that is optional; {} indicates 0 or more.
.sp
.he 'Query Tree / Query Plan Specification''%'
.sh 1 "Query Tree Representation"
.sp
The parser will produce the following query tree for each query that
requires optimization:
.(l
(\fIRoot TargetList Qualification\fR)
.)l
Each of the above three components is a list containing further information.
.sh 2 "Root"
.sp
\fBRoot\fR looks like:
.(l
(\fINumLevels CommandType ResultRelation RangeTable Priority RuleInfo
UniqueFlag SortClause\fR)
.)l
This list contains information required by the planner as well as the
executor.
The planner itself does not modify any of these elements,
although some of its preprocessing routines may change portions of 
root before it is seen by the executor.
.sh 3 "NumLevels"
.sp
\fBNumLevels\fR indicates the maximum attribute nesting in a query.
That is, it indicates the maximum number of \*(lqnested dot\*(rq
fields found in any variable in that query.
For example, a query without range variables such as
.(l
retrieve (x = 1 + 2)
.)l
will have \fBNumLevels\fR = 0,
a query such as 
.(l
retrieve (pinhead.name) where pinhead.saying = "Yow!!!"
.)l
that contains normal range variables will have \fBNumLevels\fR = 1,
a query such as 
.(l
retrieve (sorority.all) where sorority.members.name = "Marti"
.)l
will have \fBNumLevels\fR = 2,
and so on.
.sh 3 "CommandType"
.sp
\fBCommandType\fR indicates what kind of query is being executed.
\fBCommandType\fR will be a single symbol from:
.(l
\fBretrieve append replace delete\fR
.)l
(other POSTQUEL commands are not optimized, so the planner should
never see them).
.sh 3 "ResultRelation"
.sp
\fBResultRelation\fR specifies the target relation of a 
\*(lqretrieve\*(rq
or the update relation for
\*(lqappend\*(rq, \*(lqreplace\*(rq, and \*(lqdelete\*(rq.
It may take one of three forms:
.np
If the query is a \*(lqretrieve\*(rq and no result relation is
specified, then \fBResultRelation\fR is nil.
.np
If the query is a \*(lqretrieve into\*(rq, then \fBResultRelation\fR
is a list containing the symbol \fBinto\fR and the result relation's
name.
.np
If the query is an update, \fBResultRelation\fR will always be the
index into the \fIrangetable\fR corresponding to the relation to be
updated.
.sh 3 "RangeTable"
.sp
\fBRangeTable\fR is a list containing a \fIrangetable entry\fR for
each instance of a relation (i.e., range variable) in a query.  Every
\fIrelid\fR and \fIvarno\fR within the query tree, and almost every one
within the resulting query plan, is an integer index within the
query's rangetable.  This index is 1-based (e.g., var nodes that refer
to the first relation in the rangetable are given varno ``1'').
.sp
A rangetable entry looks like:
.(l
( \fIInstanceVariableName RelationName RelationOID TimeRange Flags
RuleLocks\fR )
.)l
\fBInstanceVariableName\fR is the name used to identify the relation.
It may be a class name, a surrogate for a class name, or the symbol
new or current. \fBRelationName\fR is a string corresponding to the
relation's catalog name, \fBRelationOID\fR is an integer corresponding
to the OID of the RELATION relation tuple that describes the given
relation, \fBTimeRange\fR is an internal structure that describes the
historical time span over which this rangetable entry is valid,
\fBFlags\fR is a list of zero or more symbols from:
.(l
\fBarchive inheritance union version\fR
.)l
that specifies special treatment from the planner, and 
\fBRuleLocks\fR holds the rule lock information for that relation
(since this is generated by a planner preprocessor, the parser simply
puts nil here).
In the case of archival, inheritance, union and version queries,
the planner generates an \fBAPPEND\fR query plan node (see plan tree
documentation) that contains a set of rangetable entries which must be
substituted in turn for this rangetable entry, as well as separate
query plans to be executed.
.sh 3 "Priority"
.sp
\fBPriority\fR is simply an integer from 0..7 that indicates the
query priority.  Only rule plans can have priorities greater than 0.
.sh 3 "RuleInfo"
.sp
If the query is part of a \*(lqdefine rule\*(rq query, \fBRuleInfo\fR
is a containing a \fIrule identifier\fR and exactly one symbol from:
.(l
\fBalways once never\fR
.)l
The rule identifier is an object identifier from the RULE relation;
the symbol is the tag associated with the rule definition.
For example, \fBRuleInfo\fR might look like:
.(l
( 387457893 always )
.)l
The rule identifier will be filled in by a traffic cop or planner
preprocessor routine.  Coming out of the parser, it is just a list
containing the integer 0. For all other queries, \fBRuleInfo\fR is
nil. 
.sh 3 "UniqueFlag"
.sp
If the query is a \*(lqretrieve\*(rq and the unique tag is present,
this will be the symbol \fBunique\fR, otherwise it will be nil.
.sh 3 "SortClause"
.sp
If the query is a \*(lqretrieve\*(rq and the sort clause or the unique
tag is present, this is a list of three elements: the symbol
\fBsort\fR, a list of \fBresdom\fR nodes (see below) indicating which
attributes are to be used to perform the sort, and a list of operator
OIDs, one for each attribute used.  If the unique tag is present, all
attributes in the target list will be used in the sort.
.sh 2 "TargetList"
.sp
A targetlist describes the structure of a tuple
and consists of a list of (\fIresdom expr\fR) pairs, one for
each attribute in the tuple to be constructed.
A \fBresdom\fR (result domain) node contains information about the
attribute, and
an \fBexpr\fR (expression) describes how to set the value of the
attribute.
.sp
The initial targetlist from the query tree describes what the final
result tuples should look like (targetlists are also used in
query plans to describe intermediate results such as scan projections,
join results, and temporaries).
In the case of the delete command, the initial targetlist is always nil.
For the other update queries, the planner's preprocessor fills in any
missing attributes that the user has not specified, either with
constants corresponding to the default values of the missing
attributes\**
.(f
\** If no default value exists for an attribute type, the attribute
will have a null value.
.)f
(for \*(lqappend\*(rq) or with variables corresponding to
the unchanged attributes of a tuple (for \*(lqreplace\*(rq).
.sp
The structure of the \fBresdom\fR node is described below;
an \fBexpr\fR can consist of a combination of nodes:
.(l
\fIexpr\fR = \fIvar\fR | \fIconst\fR | \fIparam\fR | (\fIoper expr\fR
[\fIexpr\fR]) | (\fIfunc\fR {\fIexpr\fR} | \fI arrayref\fR) 
.)l
.sp
The structures of the different nodes in an expr are described
below, but the function of the nodes may be briefly summarized as:
.nr zz \n(ii
.nr ii 6m
.sp
.ip \fIvar\fR
Refers to some relation attribute entry which corresponds to the target
list entry (i.e., the contents of a relation's tuple).
.ip \fIconst\fR
Constant values.
.ip \fIparam\fR
Used as parameters \(em placeholders for constants \(em within a
query.  The planner treats them as if they are constants of unknown
value (i.e., null constants) for optimization purposes.
.ip \fIoper\fR
Describes the system- or user-defined unary or binary operator that
will be applied to the argument exprs to produce a new value.
.ip \fIfunc\fR
Correspond to calls to system- or user-defined functions, which
may have zero or more arguments.
.ip \fIarrayref\fR
Refers to an array reference within some array.  It contains a slot
describing how to produce the array.
.nr ii \n(zz
.lp
Example:  ((\fIresdom var\fR) (\fIresdom\fR (\fIoper var var\fR)))
.br
is a targetlist with two attributes, where the second element involves
a computation between two tuple attributes.
.sh 2 "Qualification"
.sp
A qualification consists of restriction and join clauses which are
evaluated before the final result is returned.
.sp
The qualifications produced by the parser are not normalized in any
way.
That is, the parsed qualification will be an arbitrary boolean
expression:
.(l
\fIqualification\fR = ({\fIqual\fR})
\fIqual\fR = \fIexpr\fR | (\*(lq\fBNOT\fR\*(rq \fIqual\fR) | (\*(lq\fBOR\fR\*(rq \fIqual \fR{\fIqual\fR}) | (\*(lq\fBAND\fR\*(rq \fIqual \fR{\fIqual\fR})
.)l
.sp
The qualification supplied to the planner must be normalized in
certain ways.\**
.(f
\** This normalization occurs in yet another preprocessing module.
.)f
A qualification must be specified in conjunctive normal form.
Furthermore, if it is at all possible, all \*(lqnot\*(rqs are removed
and all variables are placed on the left hand side of the clause
operator.
If it is not possible to remove a \*(lqnot\*(rq (perhaps because a
negator is unavailable),
then the \*(lqnot\*(rq must be pushed to the innermost comparison,
e.g.,
\*(lq(not ((a = b) and (c = d)))\*(rq
might become
\*(lq(not (a = b) or not (c = d))\*(rq
if the operator \*(lq!=\*(rq does not exist for the appropriate types.
.sp
A qualification as seen by the planner is:
.(l
\fIqualification\fR = ({\fIqual\fR | \fIorqual\fR})
\fIqual\fR = \fIexpr\fR | (\*(lq\fBNOT\fR\*(rq \fIexpr\fR)
\fIorqual\fR = (\*(lq\fBOR\fR\*(rq \fIqual qual \fR{\fIqual\fR})
.)l
.sp
Example: If a user specifies the following clause:
.(l
not (r.f = 1) or (r.f2 > 1 and 2 > r.f2),
.)l
then in conjunctive normal form with \*(lqnot\*(rqs removed
and variables on the left hand side, we have the following equivalent clause:
.(l
(r.f != 1 or r.f2 > 1) and (r.f != 1 or r.f2 < 2)
.)l
In LISP form, it would look like:
.(l
((or (!= r.f 1) (> r.f2 1)) (or (!= r.f 1) (< r.f2 2)))
.)l
.bp
.sh 1 "Primitive Nodes"
.sp
The following is a description of the primitive nodes, which are used
in both the input query tree and the final query plan.
.sp
Each type of node is implemented as a defstruct vector.  To access
slots within a node, use:
.(l
(get_\fIslotname node\fR)
.)l
where \fIslotname\fR is the desired slot name as described in this
section and \fInode\fR is an appropriate defstruct node.
.sp
To set a slot field, use:
.(l
(set_\fIslotname node new-slot-value\fR)
.)l
.sp
To create a node, use:
.(l
(Make\fINodename\fR {\fIslot-value-i\fR})
.)l
where \fINodename\fR is the name of the node with the first letter
capitalized, and \fIslot-value-i\fR is the value for the i-th slot in
the node.
.sp
Some of the slot fields in these nodes are not set by the parser.  In
addition, \fBfjoin\fR nodes are not created by the parser at all,
while \fBarray\fR nodes do not appear in query-trees, but they are
included here for completeness because they are also primitive nodes.
.sp
.sh 2 "RESDOM"
.ip "1)"
\fBresno\fR - result domain number.  This is equivalent to the
attribute number for a (temporary or result) tuple.  Just as attribute
numbers being at 1, a \fBresno\fR of 1 denotes the first attribute in
a targetlist.
.ip "2)"
\fBrestype\fR - either:
.br
(1) the OID corresponding to the type of the result, or
.br
(2) *UNION-TYPE-ID*, an identifier for a type internal to the planner
that describes multi-dot attributes (since it cannot determine the
final attribute type until the nesting is eliminated by the executor).
.ip "3)"
\fBrescomplex\fR - t if the result type is a complex (i.e. tuple) type
.ip "4)"
\fBreslen\fR - length of a domain element in bytes (-1 if it is
variable length).
.ip "5)"
\fBresname\fR - result domain name, either user-specified or the
corresponding name of an attribute (if applicable).
.ip "6)"
\fBreskey\fR - ordinality of the result domain as a sort or hash key,
e.g.,
0 if the domain won't be included as a sort or hash key;
1 if the domain is the first key,
2 if it is the second,
and so on.
.ip "7)"
\fBreskeyop\fR - OID of the operator on which \fBreskey\fR will be
sorted or hashed.
.ip "8)"
\fBresjunk\fR - t if the attribute is a junk attribute internal to the
executor
.sp
.sh 2 "VAR"
.ip "1)"
\fBvarno\fR - From the parser's standpoint, \fBvarno\fR is simply the
index into the rangetable corresponding to the relation an attribute
belongs to.
.sp
For the query processor, \fBvarno\fR may take on one of three values
depending on the attribute entry under consideration.
.br
(1) If \fBVAR\fR belongs to the relation currently being scanned
(either a cataloged or temporary relation, but not a materialized
one), \fBvarno\fR is still the \fIindex into the rangetable\fP
corresponding to the relation being scanned.
.br
(2) If \fBVAR\fR is associated with some join node, varno is the
\fIname of the temporary join relation\fP accessed.  If \fBVAR\fR
originates from the outer join relation, \fBvarno\fR =
\fB\*(lqOUTER\*(rq\fR.  If it originated from the inner join relation,
\fBvarno\fR = \fB\*(lqINNER\*(rq\fR.
.br
(3) For \fBVAR\fRs corresponding to entries within relations that are
materialized from a POSTQUEL field or attributes from some previous
nesting level, \fBvarno\fR is a \fIreference pair\fP, i.e., a list
(\fIlevelnum resno\fR), that corresponds to the attribute entry where
either the query field or desired value can be found.  \fIlevelnum\fR
refers to the processing level number (where \fI0\fR is the topmost),
and \fIresno\fR refers to the result domain number within the tuple
where the entry is stored.
.sp
Therefore, whether a relation should be materialized is implicit in
the \fBvarno\fR value and information in the attribute entry
corresponding to the query field.
.ip "2)"
\fBvarattno\fR - as with \fBvarno\fR, the meaning of this field
depends on the node's position in the plan tree.
.br
(1) For attributes at the topmost nesting level, \fBvarattno\fR
corresponds to the \fIattribute number\fR within a relation (or domain
number within a tuple if this is a join tuple \(em that is, if
\fBvarno\fR = \fB\*(lqOUTER\*(rq\fR or \fB\*(lqINNER\*(rq\fR).
.br
(2) If the attribute corresponds to some attribute from a previous
nesting level, \fBvarattno\fR = \fInil\fR.
.br
(3) For nested attributes, the value of \fBvarattno\fR is a
\fIstring\fR identical to the attribute name since the attribute
number will not be known until the POSTQUEL field is materialized.
Note that at the lowest nesting level, \fBvarattno\fR may be
\*(lq\fBALL\fR\*(rq (again, since neither the parser nor the planner
know what the attributes of the field's parent are \(em it must be
determined at run-time by the query processor when the appropriate
POSTQUEL field is retrieved).
.ip "3)"
\fBvartype\fR - either:
.br
(1) the OID of the type of the attribute referred to by \fBvarattno\fR,
or
.br
(2) *UNION-TYPE-ID*
.ip "4)"
\fBvarid\fR - a list containing \fBvarno\fR and \fBvarattno\fR.
.br
(\fIThis field is internal to the planner\fR)
.ip "5)"
\fBvarslot\fR - cached pointer to address of expression context slot.
.br
(\fIThis field is internal to the executor\fR)
.sp
.sh 2 "OPER"
.ip "1)"
\fBopno\fR - PG_OPERATOR OID of the operator to which this node corresponds.
.ip "2)"
\fBopid\fR - PG_PROC OID of the function to which this operator is
attached.  This will be determined in the planner.
.ip "3)"
\fBoprelationlevel\fR - t if an operator is a relation level operator.
.ip "4)"
\fBopresulttype\fR - OID of the result type of this particular
operator.
.ip "5)"
\fBopsize\fR - size of the return result
.ip "6)"
\fBop_fcache\fR - pointer to cached information on the function to
which the operator is attached.
.br
(\fIThis field is internal to the planner and executor\fR)
.sp
.sh 2 "CONST"
.ip "1)"
\fBconsttype\fR - OID of the constant's type.
.ip "2)"
\fBconstlen\fR - length in bytes (-1 if it is a variable-length type).
.ip "3)"
\fBconstvalue\fR - actual value of the constant (i.e., the internal
representation).
.br
\fBconstvalue\fR is nil if the constant is null.  Objects represented
by pointers must appear as palloc'ed LISP vectori-bytes.
.ip "4)"
\fBconstisnull\fR - t if the constant has the null value.
.br
This is set by the planner when filling in \*(lqreplace\*(rq and
\*(lqappend\*(rq targetlist entries which are unspecified by the user.
If the typdefault field of the \fBTYPE\fR relation is not specified,
\fBconstisnull\fR is set to t and \fBconstvalue\fR is set to nil.
.sp
.sh 2 "PARAM"
.ip "1)"
\fBparamkind\fR - specifies the kind of parameter. The possible values
for this field are specified in \*(lqrules/params.h\*(rq.  They are:
.br
\fBPARAM_NAMED\fR: The parameter has a name, i.e. something
like `$.salary' or `$.foobar'.  In this case field `paramname' must be
a valid Name.
\fBPARAM_NUM\fR:   The parameter has only a numeric identifier,
i.e. something like `$1', `$2' etc.  The number is contained in the
`paramid' field.
\fBPARAM_NEW\fR:   Used in an instance level rule, similar to PARAM_NAMED.
The `paramname' and `paramid' refer to the \*(lqNEW\*(rq tuple.
The `paramname' is the attribute name and `paramid'
is the attribute number.
PARAM_OLD:   Same as PARAM_NEW, but in this case we refer to
the \*(lqOLD\*(rq tuple.
.ip "2)"
\fBparamid\fR - numeric identifier for literal constant parameters
(\*(lq$1\*(rq).
.ip "3)"
\fBparamname\fR - attribute name for tuple substitution parameters
(\*(lq$.foo\*(rq).
.ip "4)"
\fBparamtype\fR - OID of the type of the parameter
.ip "5)"
\fBparam_tlist\fR - a \fItargetlist\fR describing the structure of the
parameter as if it were the target of a query.  This allows for
projection in a param node.
.sp
.sh 2 "FUNC"
.ip "1)"
\fBfuncid\fR - OID of the function to which this node corresponds.
.ip "2)"
\fBfunctype\fR - OID of the type of the function's return value.
.ip "3)"
\fBfuncisindex\fR - can an index be used to evaluate this function?
.br
(\fIIn the present system only operators can be evaluated using indices\fR)
.ip "4)"
\fBfuncsize\fR - size of the function's return result
.ip "5)"
\fBfunc_fcache\fR - pointer to cached information on the function
.ip "6)"
\fBfunc_tlist\fR - a \fItargetlist\fR describing the function's return
value
.ip "7)"
\fBfunc_planlist\fR - list of plan trees generated by planning the
function, if it's a POSTQUEL function.
.br
(\fIThis field is internal to the planner and executor\fR)
.sp
.sh 2 "ARRAYREF"
.sp
.ip "1)"
\fBrefattrlength\fR - length of array being referred to, or -1 if it's
a variable length array
.ip "2)"
\fBrefelemlength\fR - length in bytes of array element
.ip "3"
\fBrefelemtype\fR - OID of type of array element
.ip "4"
\fBrefelembyval\fR - true if the element type can be passed
by value
.ip "5"
\fBupperindexpr\fR - expression evaluating to upper array index
.ip "6"
\fBlowerindexpr\fR - expression evaluating to lower array index
.ip "7"
\fBrefexpr\fR - expression evaluating to the array being referred to
.ip "8"
\fBrefassgnexpr\fR - expression evaluating to the new value to be
assigned to the array in a \fBREPLACE\fR query
.sp
.sh 2 "ARRAY"
.sp
An \fBARRAY\fR node describes the structure of an array type.  It
appears in the tree for \fBCREATE\fR and \fBADD_ATTR\fR commands.
.ip "1)"
\fBarrayelemtype\fR - OID of type of array element
.ip "2)"
\fBarrayelemlength\fR - length in bytes of array element
.ip "3)"
\fBarrayelembyval\fR - true if the array element can be
passed by value
.ip "4)"
\fBarrayndim\fR - number of dimensions of the array
.ip "5)"
\fBarraylow\fR - base for array indexing
.ip "6)"
\fBarrayhigh\fR - limit for array indexing
.ip "7)"
\fBarraylen\fR - length of entire array (\fInot used\fR)
.sp
.lp
To create an \fBARRAY\fR node:
.sp
(MakeArray \fIarrayelemtype arrayelemlength arrayelembyval arrayndim
arraylow arrayhigh arraylen\fR)
.sp
.sh 2 "FJOIN"
.sp
An \fBFJOIN\fR node describes how to flatten a target list after
executing a join.  This node is internal to the planner and executor.
A list will be produced containing an \fBFJOIN\fR node for each node
included in the join.
.ip "1)"
\fBfj_initialized\fR - true if the node has been initialized
for the current target list evaluation
.ip "2)"
\fBfj_nnodes\fR - number of nodes to flatten
.ip "3)"
\fBfj_innerNode\fR - one node to flatten
.ip "4)"
\fBfj_results\fR - the complete flattened join result
.ip "5)"
\fBfl_alwaysDone\fR - true if the node never produces any results, and
therefore never needs to be reprocessed