QQUUEERRYY TTRREEEE SSPPEECCIIFFIICCAATTIIOONN ((PPrriinntteedd:: AAuugguusstt 1177,, 11999933)) This document describes the query tree as output by the parser. The tree is represented internally as a LISP object, which can be displayed by calling the procedure lliissppDDiissppllaayy(lispObject). In addition, there are three backend flags that can be set to dis- play the query tree at different stages of processing : DDeebbuuggPPrriinnttPPaarrssee, DDeebbuuggPPrriinnttRReewwrriitttteennPPaarrsseettrreeee and DDeebbuuggPPrriinnttPPllaann All three are set when POSTGRES is run in debug mode. For rreettrriieevvee, rreeppllaaccee, aappppeenndd and ddeelleettee commands, the parser will produce a query tree as described below. Other commands produce a list in which the first element is the command name. The format of the rest of the tree is command-specific and will not be described here. For notational purposes, anything appearing in italics is defined elsewhere in this document. Anything in bold face surrounded by quotes appears explicitly within a runtime query tree/plan as a LISP string. Anything appearing as a list is represented liter- ally as a LISP list. [] indicates something that is optional; {} indicates 0 or more. 11.. QQuueerryy TTrreeee RReepprreesseennttaattiioonn The parser will produce the following query tree for each query that requires optimization: (_R_o_o_t _T_a_r_g_e_t_L_i_s_t _Q_u_a_l_i_f_i_c_a_t_i_o_n) Each of the above three components is a list containing further information. 11..11.. RRoooott RRoooott looks like: (_N_u_m_L_e_v_e_l_s _C_o_m_m_a_n_d_T_y_p_e _R_e_s_u_l_t_R_e_l_a_t_i_o_n _R_a_n_g_e_T_a_b_l_e _P_r_i_o_r_i_t_y _R_u_l_e_I_n_f_o _U_n_i_q_u_e_F_l_a_g _S_o_r_t_C_l_a_u_s_e) This list contains information required by the planner as well as the executor. The planner itself does not modify any of these elements, although some of its preprocessing routines may change portions of root before it is seen by the executor. QQuueerryy TTrreeee // QQuueerryy PPllaann SSppeecciiffiiccaattiioonn 22 11..11..11.. NNuummLLeevveellss NNuummLLeevveellss indicates the maximum attribute nesting in a query. That is, it indicates the maximum number of "nested dot" fields found in any variable in that query. For example, a query with- out range variables such as retrieve (x = 1 + 2) will have NNuummLLeevveellss = 0, a query such as retrieve (pinhead.name) where pinhead.saying = "Yow!!!" that contains normal range variables will have NNuummLLeevveellss = 1, a query such as retrieve (sorority.all) where sorority.members.name = "Marti" will have NNuummLLeevveellss = 2, and so on. 11..11..22.. CCoommmmaannddTTyyppee CCoommmmaannddTTyyppee indicates what kind of query is being executed. CCoomm-- mmaannddTTyyppee will be a single symbol from: rreettrriieevvee aappppeenndd rreeppllaaccee ddeelleettee (other POSTQUEL commands are not optimized, so the planner should never see them). 11..11..33.. RReessuullttRReellaattiioonn RReessuullttRReellaattiioonn specifies the target relation of a "retrieve" or the update relation for "append", "replace", and "delete". It may take one of three forms: (1) If the query is a "retrieve" and no result relation is specified, then RReessuullttRReellaattiioonn is nil. (2) If the query is a "retrieve into", then RReessuullttRReellaattiioonn is a list containing the symbol iinnttoo and the result rela- tion's name. (3) If the query is an update, RReessuullttRReellaattiioonn will always be the index into the _r_a_n_g_e_t_a_b_l_e corresponding to the rela- tion to be updated. 11..11..44.. RRaannggeeTTaabbllee RRaannggeeTTaabbllee is a list containing a _r_a_n_g_e_t_a_b_l_e _e_n_t_r_y for each instance of a relation (i.e., range variable) in a query. Every _r_e_l_i_d and _v_a_r_n_o within the query tree, and almost every one QQuueerryy TTrreeee // QQuueerryy PPllaann SSppeecciiffiiccaattiioonn 33 within the resulting query plan, is an integer index within the query's rangetable. This index is 1-based (e.g., var nodes that refer to the first relation in the rangetable are given varno ``1''). A rangetable entry looks like: ( _I_n_s_t_a_n_c_e_V_a_r_i_a_b_l_e_N_a_m_e _R_e_l_a_t_i_o_n_N_a_m_e _R_e_l_a_t_i_o_n_O_I_D _T_i_m_e_R_a_n_g_e _F_l_a_g_s _R_u_l_e_L_o_c_k_s ) IInnssttaanncceeVVaarriiaabblleeNNaammee is the name used to identify the relation. It may be a class name, a surrogate for a class name, or the sym- bol new or current. RReellaattiioonnNNaammee is a string corresponding to the relation's catalog name, RReellaattiioonnOOIIDD is an integer corresponding to the OID of the RELATION relation tuple that describes the given relation, TTiimmeeRRaannggee is an internal structure that describes the historical time span over which this rangetable entry is valid, FFllaaggss is a list of zero or more symbols from: aarrcchhiivvee iinnhheerriittaannccee uunniioonn vveerrssiioonn that specifies special treatment from the planner, and RRuulleeLLoocckkss holds the rule lock information for that relation (since this is generated by a planner preprocessor, the parser simply puts nil here). In the case of archival, inheritance, union and version queries, the planner generates an AAPPPPEENNDD query plan node (see plan tree documentation) that contains a set of rangetable entries which must be substituted in turn for this rangetable entry, as well as separate query plans to be executed. 11..11..55.. PPrriioorriittyy PPrriioorriittyy is simply an integer from 0..7 that indicates the query priority. Only rule plans can have priorities greater than 0. 11..11..66.. RRuulleeIInnffoo If the query is part of a "define rule" query, RRuulleeIInnffoo is a con- taining a _r_u_l_e _i_d_e_n_t_i_f_i_e_r and exactly one symbol from: aallwwaayyss oonnccee nneevveerr The rule identifier is an object identifier from the RULE rela- tion; the symbol is the tag associated with the rule definition. For example, RRuulleeIInnffoo might look like: ( 387457893 always ) The rule identifier will be filled in by a traffic cop or planner preprocessor routine. Coming out of the parser, it is just a list containing the integer 0. For all other queries, RRuulleeIInnffoo is nil. QQuueerryy TTrreeee // QQuueerryy PPllaann SSppeecciiffiiccaattiioonn 44 11..11..77.. UUnniiqquueeFFllaagg If the query is a "retrieve" and the unique tag is present, this will be the symbol uunniiqquuee, otherwise it will be nil. 11..11..88.. SSoorrttCCllaauussee If the query is a "retrieve" and the sort clause or the unique tag is present, this is a list of three elements: the symbol ssoorrtt, a list of rreessddoomm nodes (see below) indicating which attributes are to be used to perform the sort, and a list of operator OIDs, one for each attribute used. If the unique tag is present, all attributes in the target list will be used in the sort. 11..22.. TTaarrggeettLLiisstt A targetlist describes the structure of a tuple and consists of a list of (_r_e_s_d_o_m _e_x_p_r) pairs, one for each attribute in the tuple to be constructed. A rreessddoomm (result domain) node contains infor- mation about the attribute, and an eexxpprr (expression) describes how to set the value of the attribute. The initial targetlist from the query tree describes what the final result tuples should look like (targetlists are also used in query plans to describe intermediate results such as scan pro- jections, join results, and temporaries). In the case of the delete command, the initial targetlist is always nil. For the other update queries, the planner's preprocessor fills in any missing attributes that the user has not specified, either with constants corresponding to the default values of the missing attributes1 (for "append") or with variables corresponding to the unchanged attributes of a tuple (for "replace"). The structure of the rreessddoomm node is described below; an eexxpprr can consist of a combination of nodes: _e_x_p_r = _v_a_r | _c_o_n_s_t | _p_a_r_a_m | (_o_p_e_r _e_x_p_r [_e_x_p_r]) | (_f_u_n_c {_e_x_p_r} | _a_r_r_a_y_r_e_f) The structures of the different nodes in an expr are described below, but the function of the nodes may be briefly summarized as: ____________________ 1 If no default value exists for an attribute type, the at- tribute will have a null value. QQuueerryy TTrreeee // QQuueerryy PPllaann SSppeecciiffiiccaattiioonn 55 _v_a_r Refers to some relation attribute entry which corresponds to the target list entry (i.e., the contents of a rela- tion's tuple). _c_o_n_s_t Constant values. _p_a_r_a_m Used as parameters -- placeholders for constants -- within a query. The planner treats them as if they are constants of unknown value (i.e., null constants) for optimization purposes. _o_p_e_r Describes the system- or user-defined unary or binary oper- ator that will be applied to the argument exprs to produce a new value. _f_u_n_c Correspond to calls to system- or user-defined functions, which may have zero or more arguments. _a_r_r_a_y_r_e_f Refers to an array reference within some array. It con- tains a slot describing how to produce the array. Example: ((_r_e_s_d_o_m _v_a_r) (_r_e_s_d_o_m (_o_p_e_r _v_a_r _v_a_r))) is a targetlist with two attributes, where the second element involves a computation between two tuple attributes. 11..33.. QQuuaalliiffiiccaattiioonn A qualification consists of restriction and join clauses which are evaluated before the final result is returned. The qualifications produced by the parser are not normalized in any way. That is, the parsed qualification will be an arbitrary boolean expression: _q_u_a_l_i_f_i_c_a_t_i_o_n = ({_q_u_a_l}) _q_u_a_l = _e_x_p_r | ("NNOOTT" _q_u_a_l) | ("OORR" _q_u_a_l {_q_u_a_l}) | ("AANNDD" _q_u_a_l {_q_u_a_l}) The qualification supplied to the planner must be normalized in certain ways.2 A qualification must be specified in conjunctive normal form. Furthermore, if it is at all possible, all "not"s are removed and all variables are placed on the left hand side of the clause operator. If it is not possible to remove a "not" (perhaps because a negator is unavailable), then the "not" must be pushed to the innermost comparison, e.g., "(not ((a = b) and (c = d)))" might become "(not (a = b) or not (c = d))" if the ____________________ 2 This normalization occurs in yet another preprocessing mod- ule. QQuueerryy TTrreeee // QQuueerryy PPllaann SSppeecciiffiiccaattiioonn 66 operator "!=" does not exist for the appropriate types. A qualification as seen by the planner is: _q_u_a_l_i_f_i_c_a_t_i_o_n = ({_q_u_a_l | _o_r_q_u_a_l}) _q_u_a_l = _e_x_p_r | ("NNOOTT" _e_x_p_r) _o_r_q_u_a_l = ("OORR" _q_u_a_l _q_u_a_l {_q_u_a_l}) Example: If a user specifies the following clause: not (r.f = 1) or (r.f2 > 1 and 2 > r.f2), then in conjunctive normal form with "not"s removed and variables on the left hand side, we have the following equivalent clause: (r.f != 1 or r.f2 > 1) and (r.f != 1 or r.f2 < 2) In LISP form, it would look like: ((or (!= r.f 1) (> r.f2 1)) (or (!= r.f 1) (< r.f2 2))) QQuueerryy TTrreeee // QQuueerryy PPllaann SSppeecciiffiiccaattiioonn 77 22.. PPrriimmiittiivvee NNooddeess The following is a description of the primitive nodes, which are used in both the input query tree and the final query plan. Each type of node is implemented as a defstruct vector. To access slots within a node, use: (get__s_l_o_t_n_a_m_e _n_o_d_e) where _s_l_o_t_n_a_m_e is the desired slot name as described in this sec- tion and _n_o_d_e is an appropriate defstruct node. To set a slot field, use: (set__s_l_o_t_n_a_m_e _n_o_d_e _n_e_w_-_s_l_o_t_-_v_a_l_u_e) To create a node, use: (Make_N_o_d_e_n_a_m_e {_s_l_o_t_-_v_a_l_u_e_-_i}) where _N_o_d_e_n_a_m_e is the name of the node with the first letter cap- italized, and _s_l_o_t_-_v_a_l_u_e_-_i is the value for the i-th slot in the node. Some of the slot fields in these nodes are not set by the parser. In addition, ffjjooiinn nodes are not created by the parser at all, while aarrrraayy nodes do not appear in query-trees, but they are included here for completeness because they are also primitive nodes. 22..11.. RREESSDDOOMM 1) rreessnnoo - result domain number. This is equivalent to the attribute number for a (temporary or result) tuple. Just as attribute numbers being at 1, a rreessnnoo of 1 denotes the first attribute in a targetlist. 2) rreessttyyppee - either: (1) the OID corresponding to the type of the result, or (2) *UNION-TYPE-ID*, an identifier for a type internal to the planner that describes multi-dot attributes (since it cannot determine the final attribute type until the nesting is eliminated by the executor). 3) rreessccoommpplleexx - t if the result type is a complex (i.e. tuple) type 4) rreesslleenn - length of a domain element in bytes (-1 if it is variable length). QQuueerryy TTrreeee // QQuueerryy PPllaann SSppeecciiffiiccaattiioonn 88 5) rreessnnaammee - result domain name, either user-specified or the corresponding name of an attribute (if applicable). 6) rreesskkeeyy - ordinality of the result domain as a sort or hash key, e.g., 0 if the domain won't be included as a sort or hash key; 1 if the domain is the first key, 2 if it is the second, and so on. 7) rreesskkeeyyoopp - OID of the operator on which rreesskkeeyy will be sorted or hashed. 8) rreessjjuunnkk - t if the attribute is a junk attribute internal to the executor 22..22.. VVAARR 1) vvaarrnnoo - From the parser's standpoint, vvaarrnnoo is simply the index into the rangetable corresponding to the relation an attribute belongs to. For the query processor, vvaarrnnoo may take on one of three val- ues depending on the attribute entry under consideration. (1) If VVAARR belongs to the relation currently being scanned (either a cataloged or temporary relation, but not a materi- alized one), vvaarrnnoo is still the _i_n_d_e_x _i_n_t_o _t_h_e _r_a_n_g_e_t_a_b_l_e corresponding to the relation being scanned. (2) If VVAARR is associated with some join node, varno is the _n_a_m_e _o_f _t_h_e _t_e_m_p_o_r_a_r_y _j_o_i_n _r_e_l_a_t_i_o_n accessed. If VVAARR origi- nates from the outer join relation, vvaarrnnoo = ""OOUUTTEERR"". If it originated from the inner join relation, vvaarrnnoo = ""IINNNNEERR"". (3) For VVAARRs corresponding to entries within relations that are materialized from a POSTQUEL field or attributes from some previous nesting level, vvaarrnnoo is a _r_e_f_e_r_e_n_c_e _p_a_i_r, i.e., a list (_l_e_v_e_l_n_u_m _r_e_s_n_o), that corresponds to the attribute entry where either the query field or desired value can be found. _l_e_v_e_l_n_u_m refers to the processing level number (where _0 is the topmost), and _r_e_s_n_o refers to the result domain number within the tuple where the entry is stored. Therefore, whether a relation should be materialized is implicit in the vvaarrnnoo value and information in the attribute entry corresponding to the query field. 2) vvaarraattttnnoo - as with vvaarrnnoo, the meaning of this field depends on the node's position in the plan tree. (1) For attributes at the topmost nesting level, vvaarraattttnnoo corresponds to the _a_t_t_r_i_b_u_t_e _n_u_m_b_e_r within a relation (or domain number within a tuple if this is a join tuple -- that is, if vvaarrnnoo = ""OOUUTTEERR"" or ""IINNNNEERR""). (2) If the attribute corresponds to some attribute from a QQuueerryy TTrreeee // QQuueerryy PPllaann SSppeecciiffiiccaattiioonn 99 previous nesting level, vvaarraattttnnoo = _n_i_l. (3) For nested attributes, the value of vvaarraattttnnoo is a _s_t_r_i_n_g identical to the attribute name since the attribute number will not be known until the POSTQUEL field is materialized. Note that at the lowest nesting level, vvaarraattttnnoo may be "AALLLL" (again, since neither the parser nor the planner know what the attributes of the field's parent are -- it must be determined at run-time by the query processor when the appropriate POSTQUEL field is retrieved). 3) vvaarrttyyppee - either: (1) the OID of the type of the attribute referred to by vvaarraattttnnoo, or (2) *UNION-TYPE-ID* 4) vvaarriidd - a list containing vvaarrnnoo and vvaarraattttnnoo. (_T_h_i_s _f_i_e_l_d _i_s _i_n_t_e_r_n_a_l _t_o _t_h_e _p_l_a_n_n_e_r) 5) vvaarrsslloott - cached pointer to address of expression context slot. (_T_h_i_s _f_i_e_l_d _i_s _i_n_t_e_r_n_a_l _t_o _t_h_e _e_x_e_c_u_t_o_r) 22..33.. OOPPEERR 1) ooppnnoo - PG_OPERATOR OID of the operator to which this node corresponds. 2) ooppiidd - PG_PROC OID of the function to which this operator is attached. This will be determined in the planner. 3) oopprreellaattiioonnlleevveell - t if an operator is a relation level oper- ator. 4) oopprreessuullttttyyppee - OID of the result type of this particular operator. 5) ooppssiizzee - size of the return result 6) oopp__ffccaacchhee - pointer to cached information on the function to which the operator is attached. (_T_h_i_s _f_i_e_l_d _i_s _i_n_t_e_r_n_a_l _t_o _t_h_e _p_l_a_n_n_e_r _a_n_d _e_x_e_c_u_t_o_r) 22..44.. CCOONNSSTT 1) ccoonnssttttyyppee - OID of the constant's type. 2) ccoonnssttlleenn - length in bytes (-1 if it is a variable-length type). QQuueerryy TTrreeee // QQuueerryy PPllaann SSppeecciiffiiccaattiioonn 1100 3) ccoonnssttvvaalluuee - actual value of the constant (i.e., the inter- nal representation). ccoonnssttvvaalluuee is nil if the constant is null. Objects repre- sented by pointers must appear as palloc'ed LISP vectori- bytes. 4) ccoonnssttiissnnuullll - t if the constant has the null value. This is set by the planner when filling in "replace" and "append" targetlist entries which are unspecified by the user. If the typdefault field of the TTYYPPEE relation is not specified, ccoonnssttiissnnuullll is set to t and ccoonnssttvvaalluuee is set to nil. 22..55.. PPAARRAAMM 1) ppaarraammkkiinndd - specifies the kind of parameter. The possible values for this field are specified in "rules/params.h". They are: PPAARRAAMM__NNAAMMEEDD: The parameter has a name, i.e. something like `$.salary' or `$.foobar'. In this case field `paramname' must be a valid Name. PPAARRAAMM__NNUUMM: The parameter has only a numeric identifier, i.e. something like `$1', `$2' etc. The number is contained in the `paramid' field. PPAARRAAMM__NNEEWW: Used in an instance level rule, similar to PARAM_NAMED. The `paramname' and `paramid' refer to the "NEW" tuple. The `paramname' is the attribute name and `paramid' is the attribute number. PARAM_OLD: Same as PARAM_NEW, but in this case we refer to the "OLD" tuple. 2) ppaarraammiidd - numeric identifier for literal constant parameters ("$1"). 3) ppaarraammnnaammee - attribute name for tuple substitution parameters ("$.foo"). 4) ppaarraammttyyppee - OID of the type of the parameter 5) ppaarraamm__ttlliisstt - a _t_a_r_g_e_t_l_i_s_t describing the structure of the parameter as if it were the target of a query. This allows for projection in a param node. 22..66.. FFUUNNCC 1) ffuunncciidd - OID of the function to which this node corresponds. 2) ffuunnccttyyppee - OID of the type of the function's return value. 3) ffuunncciissiinnddeexx - can an index be used to evaluate this func- tion? (_I_n _t_h_e _p_r_e_s_e_n_t _s_y_s_t_e_m _o_n_l_y _o_p_e_r_a_t_o_r_s _c_a_n _b_e _e_v_a_l_u_a_t_e_d _u_s_i_n_g QQuueerryy TTrreeee // QQuueerryy PPllaann SSppeecciiffiiccaattiioonn 1111 _i_n_d_i_c_e_s) 4) ffuunnccssiizzee - size of the function's return result 5) ffuunncc__ffccaacchhee - pointer to cached information on the function 6) ffuunncc__ttlliisstt - a _t_a_r_g_e_t_l_i_s_t describing the function's return value 7) ffuunncc__ppllaannlliisstt - list of plan trees generated by planning the function, if it's a POSTQUEL function. (_T_h_i_s _f_i_e_l_d _i_s _i_n_t_e_r_n_a_l _t_o _t_h_e _p_l_a_n_n_e_r _a_n_d _e_x_e_c_u_t_o_r) 22..77.. AARRRRAAYYRREEFF 1) rreeffaattttrrlleennggtthh - length of array being referred to, or -1 if it's a variable length array 2) rreeffeelleemmlleennggtthh - length in bytes of array element 3 rreeffeelleemmttyyppee - OID of type of array element 4 rreeffeelleemmbbyyvvaall - true if the element type can be passed by value 5 uuppppeerriinnddeexxpprr - expression evaluating to upper array index 6 lloowweerriinnddeexxpprr - expression evaluating to lower array index 7 rreeffeexxpprr - expression evaluating to the array being referred to 8 rreeffaassssggnneexxpprr - expression evaluating to the new value to be assigned to the array in a RREEPPLLAACCEE query 22..88.. AARRRRAAYY An AARRRRAAYY node describes the structure of an array type. It appears in the tree for CCRREEAATTEE and AADDDD__AATTTTRR commands. 1) aarrrraayyeelleemmttyyppee - OID of type of array element 2) aarrrraayyeelleemmlleennggtthh - length in bytes of array element 3) aarrrraayyeelleemmbbyyvvaall - true if the array element can be passed by value 4) aarrrraayynnddiimm - number of dimensions of the array QQuueerryy TTrreeee // QQuueerryy PPllaann SSppeecciiffiiccaattiioonn 1122 5) aarrrraayyllooww - base for array indexing 6) aarrrraayyhhiigghh - limit for array indexing 7) aarrrraayylleenn - length of entire array (_n_o_t _u_s_e_d) To create an AARRRRAAYY node: (MakeArray _a_r_r_a_y_e_l_e_m_t_y_p_e _a_r_r_a_y_e_l_e_m_l_e_n_g_t_h _a_r_r_a_y_e_l_e_m_b_y_v_a_l _a_r_r_a_y_n_d_i_m _a_r_r_a_y_l_o_w _a_r_r_a_y_h_i_g_h _a_r_r_a_y_l_e_n) 22..99.. FFJJOOIINN An FFJJOOIINN node describes how to flatten a target list after exe- cuting a join. This node is internal to the planner and execu- tor. A list will be produced containing an FFJJOOIINN node for each node included in the join. 1) ffjj__iinniittiiaalliizzeedd - true if the node has been initialized for the current target list evaluation 2) ffjj__nnnnooddeess - number of nodes to flatten 3) ffjj__iinnnneerrNNooddee - one node to flatten 4) ffjj__rreessuullttss - the complete flattened join result 5) ffll__aallwwaayyssDDoonnee - true if the node never produces any results, and therefore never needs to be reprocessed