TThhee PPOOSSTTGGRREESS UUsseerr MMaannuuaall _E_d_i_t_e_d _b_y _J_o_n _R_h_e_i_n_, _G_r_e_g _K_e_m_n_i_t_z _a_n_d _T_h_e _P_O_S_T_G_R_E_S _G_r_o_u_p _E_E_C_S _D_e_p_t_. _U_n_i_v_e_r_s_i_t_y _o_f _C_a_l_i_f_o_r_n_i_a_, _B_e_r_k_e_l_e_y 11.. OOVVEERRVVIIEEWW This document is the user manual for the POSTGRES database system under development at the University of California, Berkeley. This project, led by Professor Michael Stonebraker, is sponsored by the Defense Advanced Research Projects Agency (DARPA), the Army Research Office (ARO), the National Science Foundation (NSF), and ESL, Inc. 11..11.. DDIISSTTRRIIBBUUTTIIOONN This manual describes Version 4.1 of POSTGRES. The POSTGRES Group has compiled and tested Version 4.1 on the following platforms: | architecture | operating system ------------------+------------------ DECstation (MIPS) | ULTRIX V4.2 SPARC | SunOS 4.1.2 11..22.. PPEERRFFOORRMMAANNCCEE Version 4.1 has been tuned modestly. On the Wisconsin benchmark, one should expect performance about twice that of the public domain, University of California version of INGRES, a relational prototype from the late 1970's. 11..33.. AACCKKNNOOWWLLEEDDGGEEMMEENNTTSS POSTGRES has been constructed by a team of undergradu- ate, graduate, and staff programmers. The contributors (in alphabetical order) consisted of Paul Aoki, James Bell, Jennifer Caetta, Jolly Chen, Ron Choi, Adam Glass, Jeffrey Goh, Joey Hellerstein, Wei Hong, Anant Jhingran, Greg Kemnitz, Case Larsen, Jeff Meredith, Michael Olson, Nels Olson, Lay-Peng Ong, Spyros Potami- anos, Sunita Sarawagi and Cimarron Taylor. For version 4.1 Marc Teitelbaum served as chief pro- grammer and was responsible for overall coordination of the project. The above implementation team contributed significantly to this manual, as did Claire Mosher, Chandra Ghosh, 11 and Jim Frew. 22.. IINNTTRROODDUUCCTTIIOONN Traditional relational DBMSs support a data model con- sisting of a collection of named relations, each attribute of which has a specific type. In current commercial systems, possible types are floating point numbers, integers, character strings, money, and dates. It is commonly recognized that this model is inadequate for future data processing applications. The relational model succeeded in replacing previous models in part because of its simplicity. The POSTGRES data model offers substantial additional power by incorporating the following four additional basic con- structs: classes inheritance types functions The POSTGRES DBMS has been under construction since 1986. The initial concepts for the system were pre- sented in [STON86] and the initial data model appeared in [ROWE87]. The first rule system that was imple- mented is discussed in [STON88] and the storage manager concepts are detailed in [STON87]. The first "demo- ware" was operational in 1987, and we released Version 1 of POSTGRES to a few external users in June 1989. A critique of version 1 of POSTGRES appears in [STON90]. Version 2 followed in June 1990, and it included a new rule system documented in [STON90B]. Version 4.1, the current version of POSTGRES, is about 200,000 lines of code in the C programming language. POSTGRES is avail- able free of charge, and is being used by approximately 200 sites around the world at this writing. 33.. OORRGGAANNIIZZAATTIIOONN This manual discusses the POSTQUEL query language, including extensions such as user-defined types, opera- tors, and both query language and programming language functions. Arrays of types and functions of an instance are discussed, as well as the POSTGRES rule system. This manual concludes with a discussion on adding an operator class to POSTGRES for use in access methods. This manual describes the major concepts of the system and attempts to provide an accessible path into using the system. As such, it tries to give examples of the use of the major constructs, so a beginning user does not need to delve immediately into the Reference Man- ual. 22 44.. WWHHAATT YYOOUU SSHHOOUULLDD RREEAADD This manual is primarily intended to provide a broad overview of the system, as well as to illustrate how programmers would use functions to interact with the POSTGRES "backend." The POSTGRES Reference Manual dis- cusses additional aspects of the system, and provides full syntactic descriptions of every POSTGRES and POSTQUEL command in a format similar to that used in UNIX "man pages." If you are new to POSTGRES, you should probably read this manual first, followed by the parts of the POST- GRES Reference Manual necessary to build your applica- tion. In particular, you should read the section on LIBPQ if you intend to build a client application around POSTGRES, as this is not discussed at all in this manual. 55.. IInnssttaalllliinngg aanndd rruunnnniinngg PPOOSSTTGGRREESS For purposes of examples in this manual it is assumed that POSTGRES has been installed in the directory "/usr/local/postgres" with all the default settings. In practice, however, POSTGRES can be installed any- where. Also, in all the following examples it is required that the POSTGRES ppoossttmmaasstteerr be running. This is because all the commands, such as ccrreeaatteeddbb, mmoonniittoorr, and others all try to connect to the ppoossttmmaasstteerr. If you get the following error message from any of these commands SSttrreeaammOOppeenn:: ccoonnnneecctt:: eerrrrnnoo==6611 EErrrroorr:: FFaaiilleedd ttoo ccoonnnneecctt ttoo bbaacckkeenndd ((hhoosstt==xxxxxx,, ppoorrtt==44332211)) it is because the postmaster is not running. To start the postmaster, type the following command. %% ppoossttmmaasstteerr && 66.. TThhee PPOOSSTTQQUUEELL QQuueerryy LLaanngguuaaggee POSTQUEL is the query language used for interacting with POSTGRES. Here, we give an overview of how to use POSTQUEL to access data. In other sections, user extensions to POSTQUEL will be discussed. 66..11.. CCrreeaattiinngg aa ddaattaabbaassee Once POSTGRES has been installed at your site by fol- lowing the directions in the release notes, you can create a database named ffoooo using the following com- mand: %% ccrreeaatteeddbb ffoooo POSTGRES allows you to create any number of databases 33 at a given site and you automatically become the database administrator of the database just created. Database names must have an alphabetic first character and are limited to 16 characters in length. Once you have constructed a database, there are four ways to interact with it: +o You can run the POSTGRES terminal monitor which allows you to interactively enter, edit, and exe- cute commands in the query language POSTQUEL. +o You can interact with POSTGRES from a C program by using the LIBPQ library of subroutine and call facilities. This allows you to submit POSTQUEL commands from C and get answers and status mes- sages back to your program. This interface is discussed further in the LIBPQ section of the Ref- erence Manual. +o You can use the ffaasstt ppaatthh facility, which allows you to directly execute functions stored in the database. This facility is described in the Ref- erence Manual under "Fast Path." The terminal monitor can be activated for the ffoooo database by typing the command: %% mmoonniittoorr ffoooo (the ``%%'' is your UNIX shell prompt.) You will be greeted by the following message: WWeellccoommee ttoo tthhee CC PPOOSSTTGGRREESS tteerrmmiinnaall mmoonniittoorr GGoo ** The GGoo indicates the terminal monitor is listening to you and that you can type POSTQUEL commands into a workspace maintained by the monitor. The monitor indi- cates it is listening by typing ** as a prompt. Print- ing the workspace can be performed by typing: ** \\pp and it can be passed to POSTGRES for execution by typ- ing: ** \\gg If you make a typing mistake, you can invoke the vvii text editor by typing: ** \\ee The workspace will be passed to the editor, and you have the full power of vvii to make any necessary 44 changes. For more info on using vvii, type %% mmaann vvii Once you exit vvii, your edited query will be in the mon- itor's query buffer and you can submit it to POSTGRES by using the \\gg command described above. To get out of the monitor and return to UNIX, type ** \\qq and the monitor will respond: II lliivvee ttoo sseerrvvee yyoouu.. %% For a complete collection of monitor commands, see the manual page on mmoonniittoorr in the UNIX section of the Ref- erence Manual. If you are the database administrator for the database ffoooo, you can destroy it using the following UNIX com- mand: %% ddeessttrrooyyddbb ffoooo Other DBA commands include ccrreeaatteeuusseerr and ddeessttrrooyyuusseerr, which are discussed further in the UNIX section of the Reference Manual. 66..22.. CCllaasssseess aanndd tthhee QQuueerryy LLaanngguuaaggee PPOOSSTTQQUUEELL 66..22..11.. BBaassiicc CCaappaabbiilliittiieess The fundamental notion in POSTGRES is that of a ccllaassss,, which is a named collection of instances of objects. Each instance has the same collection of named attributes, and each attribute is of a specific type. Furthermore, each instance has an installation-wide unique (never-changing) oobbjjeecctt iiddeennttiiffiieerr or ooiidd. 66..22..22.. CCrreeaattiinngg aa NNeeww CCllaassss (In order to try out the following POSTQUEL examples, create the ffoooo database as described in the previous section, and start the terminal monitor.) A user can create a new class by specifying the class name, along with all attribute names and their types: ** ccrreeaattee EEMMPP ((nnaammee == tteexxtt,, ssaallaarryy == iinntt44,, aaggee == iinntt44,, ddeepptt == cchhaarr1166)) \\gg ** ccrreeaattee DDEEPPTT ((ddnnaammee == cchhaarr1166,, fflloooorr == iinntt44,, mmaannaaggeerr == tteexxtt)) \\gg The POSTQUEL base types used above are a variable- 55 length array of printable characters (tteexxtt), a 4-byte signed integer (iinntt44), and a fixed-length array of 16 characters (cchhaarr1166.)1 Spaces, tabs and newlines may be used freely in POSTQUEL queries. So far, the ccrreeaattee command looks exactly like the cre- ate statement in a traditional relational system. How- ever, we will presently see that classes have proper- ties that are extensions of the relational model, so we use a different word to describe them. 66..22..33.. PPooppuullaattiinngg aa CCllaassss wwiitthh IInnssttaanncceess To populate a class with instances, one can use the aappppeenndd command: ** aappppeenndd EEMMPP ((nnaammee == ""JJooee"",, ssaallaarryy == 11440000,, aaggee == 4400,, ddeepptt == ""sshhooee"")) \\gg ** aappppeenndd EEMMPP ((nnaammee == ""SSaamm"",, ssaallaarryy == 11220000,, aaggee == 2299,, ddeepptt == ""ttooyy"")) \\gg ** aappppeenndd EEMMPP ((nnaammee == ""BBiillll"",, ssaallaarryy == 11660000,, aaggee == 3366,, ddeepptt == ""ccaannddyy"")) \\gg This will add 3 instances to EEMMPP, one for each aappppeenndd command. 66..22..44.. QQuueerryyiinngg aa CCllaassss The EEMMPP class can be queried with normal selection and projection queries. For example, to find the employees under 35 years of age, one would type: ** rreettrriieevvee ((EEMMPP..nnaammee)) wwhheerree EEMMPP..aaggee << 3355 \\gg and the output would be: +-----+ |nnaammee | +-----+ |SSaamm | +-----+ Notice that parentheses are required around the ttaarrggeett lliisstt of returned attributes (e.g., EEMMPP..nnaammee.) POSTQUEL allows you to return computations in the tar- get list as long as they are given a name (e.g., rreessuulltt): ** rreettrriieevvee ((rreessuulltt == EEMMPP..ssaallaarryy // EEMMPP..aaggee)) wwhheerree EEMMPP..nnaammee == ""BBiillll"" \\gg ____________________ 1See "Built-In Types" in the Reference Manual. 66 66..22..55.. RReeddiirreeccttiinngg rreettrriieevvee qquueerriieess Any retrieve query can be redirected to a new class in the database, and arbitrary boolean operators (aanndd, oorr, nnoott) are allowed in the qualification of any query: ** rreettrriieevvee iinnttoo tteemmpp ((EEMMPP..nnaammee)) wwhheerree EEMMPP..aaggee << 3355 aanndd EEMMPP..ssaallaarryy >> 11000000 \\gg 66..22..66.. JJooiinnss To find the names of employees which are the same age, one could write: ** rreettrriieevvee ((EE11..nnaammee,, EE22..nnaammee)) ffrroomm EE11 iinn EEMMPP,, EE22 iinn EEMMPP wwhheerree EE11..aaggee == EE22..aaggee aanndd EE11..nnaammee !!== EE22..nnaammee \\gg In this case both E1 and E2 are ssuurrrrooggaatteess for an instance of the class EEMMPP, and both range over all instances of the class. A POSTQUEL query can contain an arbitrary number of class names and surrogates.2 66..22..77.. UUppddaatteess Updates are accomplished in POSTQUEL using the rreeppllaaccee command: ** rreeppllaaccee EEMMPP ((ssaallaarryy == EE..ssaallaarryy)) ffrroomm EE iinn EEMMPP wwhheerree EEMMPP..nnaammee == ""JJooee"" aanndd EE..nnaammee == ""SSaamm"" \\gg This command replaces the salary of Joe by that of Sam. 66..22..88.. DDeelleettiioonnss Deletions are done using the ddeelleettee command: ** ddeelleettee EEMMPP wwhheerree EEMMPP..ssaallaarryy >> 00 \\gg Since all employees have positive salaries, this com- mand will leave the EEMMPP class empty. 66..22..99.. AArrrraayyss POSTGRES supports both fixed-length and variable-length one-dimensional arrays. To illustrate their use, we first create a class with an array type. ____________________ 2The semantics of such a join are that the qualification is a truth expression defined for the Cartesian product of the classes indicated in the query. For those instances in the Cartesian product for which the qualification is true, POSTGRES must compute and return the target list. 77 ** ccrreeaattee SSAALL__EEMMPP ((nnaammee == cchhaarr[[]],, ppaayy__bbyy__qquuaarrtteerr == iinntt44[[44]])) \\gg The above query will create a class named SSAALL__EEMMPP with a variable-length array of tteexxtt strings (nnaammee), and an array of 4 iinntt44 integers (ppaayy__bbyy__qquuaarrtteerr), which repre- sents the employee's salary by quarter. Now we do some aappppeenndds; note that when appending to a non-character array, we enclose the values within braces and separate them by commas. ** aappppeenndd SSAALL__EEMMPP ((nnaammee == ""bbiillll"",, ppaayy__bbyy__qquuaarrtteerr == ""{{1100000000,, 1100000000,, 1100000000,, 1100000000}}"")) \\gg ** aappppeenndd SSAALL__EEMMPP ((nnaammee == ""jjaacckk"",, ppaayy__bbyy__qquuaarrtteerr == ""{{1100000000,, 1155000000,, 1155000000,, 1155000000}}"")) \\gg ** aappppeenndd SSAALL__EEMMPP ((nnaammee == ""jjooee"",, ppaayy__bbyy__qquuaarrtteerr == ""{{2200000000,, 2255000000,, 2255000000,, 2255000000}}"")) \\gg POSTGRES uses the FORTRAN numbering convention for arrays--that is, POSTGRES arrays start with array[1] and end with array[n]. Now, we can run some queries on SSAALL__EEMMPP. This query retrieves the names of the employees whose pay changed in the second quarter: ** rreettrriieevvee ((SSAALL__EEMMPP..nnaammee)) wwhheerree SSAALL__EEMMPP..ppaayy__bbyy__qquuaarrtteerr[[11]] !!== SSAALL__EEMMPP..ppaayy__bbyy__qquuaarrtteerr[[22]] \\gg This query retrieves the third quarter pay of all employees: ** rreettrriieevvee ((SSAALL__EEMMPP..ppaayy__bbyy__qquuaarrtteerr[[33]])) \\gg This query deletes everyone from SSAALL__EEMMPP whose name begins with the letter "j." SSAALL__EEMMPP should now contain only the employee named "bill": ** ddeelleettee SSAALL__EEMMPP wwhheerree SSAALL__EEMMPP..nnaammee[[11]] == ''jj'' \\gg Let's make sure (note that the attribute aallll may be used as a shorthand for all attributes of a class): ** rreettrriieevvee ((SSAALL__EEMMPP..aallll)) \\gg 88 +-----+---------------------------+ |nnaammee | ppaayy__bbyy__qquuaarrtteerr | +-----+---------------------------+ |bbiillll | {{1100000000,,1100000000,,1100000000,,1100000000}} | +-----+---------------------------+ POSTGRES supports arrays of base and user-defined types, as well as "arrays of arrays," as in the follow- ing example: ** ccrreeaattee mmaannaaggeerr ((nnaammee == cchhaarr1166,, eemmppllooyyeeeess == tteexxtt[[]])) \\gg ** aappppeenndd mmaannaaggeerr ((nnaammee == ""mmiikkee"",, eemmppllooyyeeeess == ""{{""wweeii"",, ""ggrreegg"",, ""jjeeffff""}}"")) \\gg ** aappppeenndd mmaannaaggeerr ((nnaammee == ""aalliiccee"",, eemmppllooyyeeeess == ""{{""bbiillll"",, ""jjooee""}}"")) \\gg ** aappppeenndd mmaannaaggeerr ((nnaammee == ""mmaarrggee"",, eemmppllooyyeeeess == ""{{""mmiikkee"",, ""aalliiccee""}}"")) \\gg This creates a class mmaannaaggeerr, and provides a list of employees. 66..33.. AAddvvaanncceedd PPOOSSTTQQUUEELL Now we have covered the basics of using POSTQUEL to access your data. In this section we will discuss those features of POSTGRES which distinguish it from other data managers, such as inheritance and time travel. In the next section we will cover how the user can extend the query language via query language func- tions and composite objects, as well as additional extensions to POSTGRES using user defined types, opera- tors, and programming language functions. 66..33..11.. IInnhheerriittaannccee First, re-populate the EEMMPP class by repeating the aappppeenndd commands in section 6.2.3. Then, create a sec- ond class SSTTUUDD__EEMMPP, and populate it as follows: ** ccrreeaattee SSTTUUDD__EEMMPP ((llooccaattiioonn == ppooiinntt)) iinnhheerriittss ((EEMMPP)) \\gg ** aappppeenndd SSTTUUDD__EEMMPP ((nnaammee == ""SSuunniittaa"",, ssaallaarryy == 44000000,, aaggee == 2233,, ddeepptt == ""eelleeccttrroonniiccss"",, llooccaattiioonn == ""((33,, 55))"")) \\gg In this case, an instance of SSTTUUDD__EEMMPP iinnhheerriittss all data fields (nnaammee, ssaallaarryy, aaggee, and ddeepptt) from its parent, EEMMPP. Furthermore, student employees have an extra field, llooccaattiioonn, that shows their address as a coordi- nate pair. In POSTGRES, a class can inherit from zero 99 or more other classes,3 and a query can reference either all instances of a class or all instances of a class plus all of its descendants. For example, the following query finds the employees over 20: ** rreettrriieevvee ((EE..nnaammee)) ffrroomm EE iinn EEMMPP wwhheerree EE..aaggee >> 2200 \\gg On the other hand, to find the names of all employees, including student employees, over age 20, the query is: ** rreettrriieevvee ((EE..nnaammee)) ffrroomm EE iinn EEMMPP** wwhheerree EE..aaggee >> 2200 \\gg which returns: +-------+ |nnaammee | +-------+ |JJooee | +-------+ |SSaamm | +-------+ |BBiillll | +-------+ |SSuunniittaa | +-------+ Here the ** after EEMMPP indicates that the query should be run over EEMMPP and all classes below EEMMPP in the inheri- tance hierarchy. Note that llooccaattiioonn in SSTTUUDD__EEMMPP is not a traditional relational data type. As we will see later, POSTGRES can be customized with an arbitrary number of user- defined data types. 66..33..22.. TTiimmee TTrraavveell POSTGRES supports the notion of ttiimmee ttrraavveell. This fea- ture allows a user to run historical queries. For example, to find Sam's current salary, one would query: ** rreettrriieevvee ((EE..ssaallaarryy)) ffrroomm EE iinn EEMMPP[[""nnooww""]] wwhheerree EE..nnaammee == ""SSaamm"" \\gg POSTGRES will automatically find the version of Sam's record valid at the correct time and get the appropri- ate salary. One can also give a time rraannggee. For example to see all the salaries that Sam has ever earned, one would query: ** rreettrriieevvee ((EE..ssaallaarryy)) ffrroomm EE iinn EEMMPP[[""JJaann 11 0000::0000::0000 11997700 GGMMTT"",, ""nnooww""]] ____________________ 3i.e., the inheritance hierarchy is a directed acyclic graph. 1100 wwhheerree EE..nnaammee == ""SSaamm"" \\gg If you have executed all of the examples so far, then the above query returns: +-------+ |ssaallaarryy | +-------+ |11220000 | +-------+ |11220000 | +-------+ There are two salaries for Sam, since he was deleted from and then re-appended to the EEMMPP class. The default beginning of a time range is the origin of the system clock (which just happens to be ``JJaann 11 0000::0000::0000 11997700 GGMMTT'' on UNIX systems), and the default end is the current time; thus, the above time range can be abbreviated as ``[[,,]].'' 77.. UUsseerr EExxtteennssiioonnss ttoo PPOOSSTTQQUUEELL Here, we will discuss user extensions to the POSTQUEL query language, query language functions, composite types, and user defined types, functions and operators. 77..11.. UUsseerr DDeeffiinneedd PPOOSSTTQQUUEELL FFuunnccttiioonnss POSTQUEL provides two types of functions: qquueerryy llaann-- gguuaaggee ffuunnccttiioonnss (functions written in POSTQUEL) and pprrooggrraammmmiinngg llaanngguuaaggee ffuunnccttiioonnss (functions written in a separately-compiled programming language such as C.) In this section we will cover POSTQUEL functions; pro- gramming language functions will be covered below with the discussion on user-defined types. Any collection of commands in the POSTQUEL query lan- guage can be packaged together and defined as a func- tion, usually returning either a set of instances or a set of base types. For example, the following function hhiigghh__ppaayy returns all employees in class EEMMPP whose salaries exceed 50,000: ** ddeeffiinnee ffuunnccttiioonn hhiigghh__ppaayy ((llaanngguuaaggee == ""ppoossttqquueell"",, rreettuurrnnttyyppee == sseettooff EEMMPP)) aass ""rreettrriieevvee ((EEMMPP..aallll)) wwhheerree EEMMPP..ssaallaarryy >> 5500000000"" \\gg POSTQUEL functions can also have parameters. The fol- lowing function llaarrggee__ppaayy allows the threshold salary to be specified as an argument: ** ddeeffiinnee ffuunnccttiioonn llaarrggee__ppaayy ((llaanngguuaaggee == ""ppoossttqquueell"",, rreettuurrnnttyyppee == sseettooff EEMMPP)) aarrgg iiss ((iinntt44)) aass ""rreettrriieevvee ((EEMMPP..aallll)) wwhheerree EEMMPP..ssaallaarryy >> $$11"" \\gg 1111 In addition to their obvious utility as "aliases" for commonly-used queries, POSTQUEL functions are useful for creating composite types, as described below. 77..22.. CCoommppoossiittee TTyyppeess Since POSTQUEL functions return instances or sets of instances, they are the mechanism used to assign values to composite types. For example, consider extending the EEMMPP class with a mmaannaaggeerr field. That is, for each instance of EEMMPP, we want to associate another instance of EEMMPP corresponding to the manager of the first instance. Specifically, we will define a POSTQUEL function mmaannaaggeerr: ** ddeeffiinnee ffuunnccttiioonn mmaannaaggeerr ((llaanngguuaaggee == ""ppoossttqquueell"",, rreettuurrnnttyyppee == EEMMPP)) aarrgg iiss ((EEMMPP)) aass ""rreettrriieevvee ((EE..aallll)) ffrroomm EE iinn EEMMPP wwhheerree EE..nnaammee == DDEEPPTT..mmaannaaggeerr aanndd DDEEPPTT..ddnnaammee == $$11..ddeepptt"" \\gg The function mmaannaaggeerr takes an instance as its only argument, so POSTQUEL allows referencing into it with the use of the nested dot notation. Whenever such a function is defined over a class, a user can utilize the cascaded dot notation to reference into (i.e. access the fields of) the objects returned by the func- tion. The following query finds all the employees who work for Joe: ** rreettrriieevvee ((EEMMPP..nnaammee)) wwhheerree EEMMPP..mmaannaaggeerr..nnaammee == ""JJooee"" \\gg This is exactly equivalent to: ** rreettrriieevvee ((EEMMPP..nnaammee)) wwhheerree nnaammee((mmaannaaggeerr((EEMMPP)))) == ""JJooee"" \\gg Here, we have essentially added an attribute to the EEMMPP class which is of type EEMMPP, i.e. it has a value which is an instance of the class EEMMPP. Since the value of mmaannaaggeerr has a record-oriented structure, we call it a ccoommppoossiittee oobbjjeecctt. Consequently, the user can think of the function mmaannaaggeerr as an attribute of EEMMPP and can reference it just like any other attribute, with the following two exceptions. First, one cannot do direct aappppeenndds--that is, ** aappppeenndd eemmpp ((eemmpp..mmaannaaggeerr..nnaammee == ""SSmmiitthh"")) \\gg wwoonn''tt work. Non-projected retrieves will also be rejected, i.e.: 1122 ** rreettrriieevvee ((eemmpp..mmaannaaggeerr)) \\gg will result in a warning from the POSTQUEL language parser. Note that mmaannaaggeerr is defined as returning a single instance of EEMMPP.. We can also write a POSTQUEL function that returns sets of instances. For example, consider the function ** ddeeffiinnee ffuunnccttiioonn cchhiillddrreenn ((llaanngguuaaggee == ""ppoossttqquueell"",, rreettuurrnnttyyppee == sseettooff KKIIDDSS)) aarrgg iiss ((EEMMPP)) aass ""rreettrriieevvee ((KKIIDDSS..aallll)) wwhheerree $$11..nnaammee == KKIIDDSS..ddaadd oorr $$11..nnaammee == KKIIDDSS..mmoomm""\\gg The cchhiillddrreenn function is defined as returning a set of instances, rather than a single instance. Given the query ** rreettrriieevvee((eemmpp..nnaammee,, eemmpp..cchhiillddrreenn..nnaammee)) if the query in the body of the cchhiillddrreenn function returns many instances, the retrieve query will return all of them, in a "flattened" form. If the query in the body of mmaannaaggeerr returns more than one instance, the mmaannaaggeerr function will return only one instance, arbi- trarily chosen from the set returned by the query in the function's body. See the POSTGRES Reference Man- ual's entry on the ddeeffiinnee ffuunnccttiioonn command for further details and examples. 88.. UUsseerr DDeeffiinneedd TTyyppeess,, OOppeerraattoorrss,, aanndd PPrrooggrraammmmiinngg LLaanngguuaaggee FFuunnccttiioonnss The central concept of extending POSTGRES lies in POST- GRES's ability to ddyynnaammiiccaallllyy llooaadd a binary object file created by the user. This allows POSTGRES to call arbitrary user functions which can be written in a standard programming language. These functions can then be used: +o to convert between iinntteerrnnaall (binary) and eexxtteerrnnaall (character string) representations of user-defined types; +o as operators; and +o to define ordering for indices on user-defined types. POSTGRES's concept of types includes bbuuiilltt--iinn types and uusseerr--ddeeffiinneedd types. Built-in types are those required by the system to bootstrap itself. User-defined types are those created by the user in the manner described below. There is no intrinsic performance difference between using a system type or user-defined type, other 1133 than the overhead due to the complexity of the type itself. 88..11.. IInntteerrnnaall ssttoorraaggee ooff ttyyppeess Internally, POSTGRES regards a user-defined type as a "blob of memory" upon which user-defined functions impose structure and meaning. POSTGRES will store and retrieve the data from disk and use user-defined func- tions to input, process, and output the data. 88..22.. FFuunnccttiioonnss nneeeeddeedd ffoorr aa uusseerr--ddeeffiinneedd ttyyppee A completely defined user type requires the following user-defined functions: +o iinnppuutt and oouuttppuutt functions for the type: These functions determine how the type appears in strings (for input by the user and output to the user) and how the type is organized in memory. These at least are necessary to define the type. +o ooppeerraattoorr functions for the type: These functions define the meanings of "equal," "less than," "greater than," etc., for your type. 88..33.. AAnn EExxaammppllee UUsseerr DDeeffiinneedd TTyyppee In this discussion, we will be defining a cciirrccllee type, using functions written in the C programming language. 88..33..11.. DDaattaa ssttrruuccttuurreess ffoorr oouurr ttyyppee Before we do anything, we have to decide on what a cir- cle looks like, both in string format and internally in memory. Circles have a center and a radius, so a rea- sonable string representation of a circle would be an ordered triple: (center_x, center_y, radius) where each element is a real number with arbitrary units, e.g.: ((55..00,, 1100..33,, 33)) This is what the input to the circle input function looks like, and what the output from the circle output function looks like. Now we have to come up with an internal representation for a circle in memory. The following declarations are legal and reasonable given the format we chose above: ttyyppeeddeeff ssttrruucctt {{ ddoouubbllee xx,, yy;; }} PPOOIINNTT;; ttyyppeeddeeff ssttrruucctt {{ PPOOIINNTT cceenntteerr;; 1144 ddoouubbllee rr;; }} CCIIRRCCLLEE;; Memory containing values of type CCIIRRCCLLEE will be written to disk and read from disk, so CCIIRRCCLLEE must be both ccoomm-- pplleettee and ccoonnttiigguuoouuss; that is, it cannot contain any pointers. The alternate declaration ttyyppeeddeeff ssttrruucctt {{ PPOOIINNTT **cceenntteerr ddoouubbllee rr;; }} CCIIRRCCLLEE;; will NNOOTT work, because only the address stored in cceenn-- tteerr would be written to disk, not the PPOOIINNTT structure that cceenntteerr presumably points to. POSTGRES cannot detect this kind of coding error; you must guard against it yourself. 88..33..22.. DDeeffiinniinngg tthhee iinnppuutt aanndd oouuttppuutt ffuunnccttiioonnss ffoorr oouurr ttyyppee Suppose in defining our type "circle," we have a C source file called cciirrccllee..cc, and a corresponding object code file //uussrr//llooccaall//ppoossttggrreess//cciirrccllee..oo. (All functions related to our cciirrccllee type have to be in the same object file.) For the sake of argument, suppose we our platform is a DECstation, where sizeof(double) is 8 bytes (this will be important later). We will create source file cciirrccllee..cc, containing C source code for the functions that support our CCIIRRCCLLEE type. cciirrccllee..cc contains three functions: +o cciirrccllee__iinn, which is the input function for cir- cles. It takes a string as an argument and returns a pointer to a CCIIRRCCLLEE. +o cciirrccllee__oouutt, which is the output function for cir- cles. It is takes a pointer to s CCIIRRCCLLEE as input and returns a string. The return value of cciirrccllee__iinn must be a legal argument to cciirrccllee__oouutt, and vice versa. +o eeqq__aarreeaa__cciirrccllee, which is the equality function for circles. For the purposes of this discussion, circles are equal if their areas are equal. The contents of cciirrccllee..cc are: ##iinncclluuddee <> ##iinncclluuddee <> ##iinncclluuddee <> ##iinncclluuddee ""ttmmpp//cc..hh"" //** ((aallwwaayyss)) **// ##iinncclluuddee ""uuttiillss//ggeeoo--ddeeccllss..hh"" //** ffoorr PPOOIINNTT ddeeccllaarraattiioonn **// ##iinncclluuddee ""uuttiillss//ppaalllloocc..hh"" //** ffoorr ppaalllloocc(()) ddeeccllaarraattiioonn **// ttyyppeeddeeff ssttrruucctt {{ 1155 PPOOIINNTT cceenntteerr;; ddoouubbllee rraaddiiuuss;; }} CCIIRRCCLLEE;; ##ddeeffiinnee LLDDEELLIIMM ''(('' ##ddeeffiinnee RRDDEELLIIMM ''))'' ##ddeeffiinnee NNAARRGGSS 33 CCIIRRCCLLEE ** cciirrccllee__iinn((ssttrr)) cchhaarr **ssttrr;; {{ cchhaarr **pp,, **ccoooorrdd[[NNAARRGGSS]];; iinntt ii;; CCIIRRCCLLEE **rreessuulltt;; iiff ((ssttrr ==== NNUULLLL)) rreettuurrnn((NNUULLLL));; ffoorr ((ii == 00,, pp == ssttrr;; **pp &&&& ii << NNAARRGGSS &&&& **pp !!== RRDDEELLIIMM;; pp++++)) {{ iiff ((**pp ==== '',,'' |||| ((**pp ==== LLDDEELLIIMM &&&& !!ii)))) ccoooorrdd[[ii++++]] == pp ++ 11;; }} iiff ((ii << NNAARRGGSS -- 11)) rreettuurrnn((NNUULLLL));; rreessuulltt == ((CCIIRRCCLLEE **)) ppaalllloocc((ssiizzeeooff((CCIIRRCCLLEE))));; rreessuulltt-->>cceenntteerr..xx == aattooff((ccoooorrdd[[00]]));; rreessuulltt-->>cceenntteerr..yy == aattooff((ccoooorrdd[[11]]));; rreessuulltt-->>rraaddiiuuss == aattooff((ccoooorrdd[[22]]));; rreettuurrnn((rreessuulltt));; }} cchhaarr ** cciirrccllee__oouutt((cciirrccllee)) CCIIRRCCLLEE **cciirrccllee;; {{ cchhaarr **rreessuulltt;; iiff ((cciirrccllee ==== NNUULLLL)) rreettuurrnn((NNUULLLL));; rreessuulltt == ((cchhaarr **)) ppaalllloocc((6600));; sspprriinnttff((rreessuulltt,, ""((%%gg,, %%gg,, %%gg))"",, cciirrccllee-->>cceenntteerr..xx,, cciirrccllee-->>cceenntteerr..yy,, cciirrccllee-->>rraaddiiuuss));; rreettuurrnn((rreessuulltt));; }} 1166 iinntt eeqq__aarreeaa__cciirrccllee((cciirrccllee11,, cciirrccllee22)) CCIIRRCCLLEE **cciirrccllee11,, **cciirrccllee22;; {{ rreettuurrnn((cciirrccllee11-->>rraaddiiuuss ==== cciirrccllee22-->>rraaddiiuuss));; }} Now that we have written these functions and compiled the source file,4 we have to let POSTGRES know that they exist. First, we run the following queries to define the input and output functions. These functions must be defined bbeeffoorree we define the type. POSTGRES will notify you that return type circle is not defined yet, but this is OK5: ** ddeeffiinnee ffuunnccttiioonn cciirrccllee__iinn ((llaanngguuaaggee == ""cc"",, rreettuurrnnttyyppee == cciirrccllee)) aarrgg iiss ((cchhaarr1166)) aass ""//uussrr//llooccaall//ppoossttggrreess//cciirrccllee..oo"" \\gg ** ddeeffiinnee ffuunnccttiioonn cciirrccllee__oouutt ((llaanngguuaaggee == ""cc"",, rreettuurrnnttyyppee == cchhaarr1166)) aarrgg iiss ((cciirrccllee)) aass ""//uussrr//llooccaall//ppoossttggrreess//cciirrccllee..oo"" \\gg Note that the full pathname of the object code file must be specified, so you would change //uussrr//llooccaall//ppoossttggrreess// to whatever is appropriate for your installation. Now we can define the cciirrccllee type: ** ddeeffiinnee ttyyppee cciirrccllee ((iinntteerrnnaalllleennggtthh == 2244,, iinnppuutt == cciirrccllee__iinn,, oouuttppuutt == cciirrccllee__oouutt)) \\gg where iinntteerrnnaalllleennggtthh is the size of the CCIIRRCCLLEE struc- ture in bytes. For circles, the type members are three ddoouubbllees, which on most platforms are 8 bytes each, with no additional alignment constraints. However, when ____________________ 4You will need to supply an option like --II//uussrr//llooccaall//ppoossttggrreess//ssrrcc//bbaacckkeenndd to your C compiler so it can find the POSTGRES ``..hh'' files. Also, various platform- specific compiler options may be required to support POST- GRES dynamic linking (for example, the DECstation ULTRIX compiler requires the ``--GG00'' option.) See "define func- tion" in the Reference Manual for details. 5By default, user-defined C functions use addresses in- stead of values for all but "small" (<= 4-byte) argument and return types, so we can use the POSTQUEL type cchhaarr1166 as a placeholder for the C type cchhaarr **. 1177 defining your own types, you should nnoott make assump- tions about structure sizes, but instead write a test program that does a pprriinnttff((""ssiizzee iiss %%dd\\nn"",, ssiizzeeooff ((MMYYTTYYPPEE))));; on your type. If iinntteerrnnaalllleennggtthh is defined incorrectly, you will encounter strange errors which may crash the data man- ager itself. If this were to happen with our CCIIRRCCLLEE type, we would have to do a ** rreemmoovvee ttyyppee cciirrccllee \\gg and then redefine the cciirrccllee type correctly. Note that we would nnoott have to redefine our functions, since their behavior would not have changed. 88..33..33.. DDeeffiinniinngg aann ooppeerraattoorr ffoorr oouurr ttyyppee Now that we have finished defining the cciirrccllee type, we can ccrreeaattee classes with circles in them, aappppeenndd records to them with circles defined, and rreettrriieevvee the values of the entire list of records. But we can do nothing else until we have some circle operators. To do this, we make use of the concept of ooppeerraattoorr oovveerrllooaaddiinngg, and in this case we will set the POSTGRES equality operator ``=='' to work for circles. First we have to tell POST- GRES that our circle equality function exists: ** ddeeffiinnee ffuunnccttiioonn eeqq__aarreeaa__cciirrccllee ((llaanngguuaaggee == ""cc"",, rreettuurrnnttyyppee == bbooooll)) aarrgg iiss ((cciirrccllee,, cciirrccllee)) aass ""//uussrr//llooccaall//ppoossttggrreess//cciirrccllee..oo"" \\gg We will now bind this function to the equality symbol with the following query: ** ddeeffiinnee ooppeerraattoorr == ((aarrgg11 == cciirrccllee,, aarrgg22 == cciirrccllee,, pprroocceedduurree == eeqq__aarreeaa__cciirrccllee)) \\gg 88..33..44.. UUssiinngg oouurr ttyyppee Let's create a class ttuuttoorriiaall that contains a cciirrccllee attribute, and run some queries against it: ** ccrreeaattee ttuuttoorriiaall((aa == cciirrccllee)) \\gg ** aappppeenndd ttuuttoorriiaall ((aa == ""((11..00,, 11..00,, 1100..00))""::::cciirrccllee)) \\gg ** aappppeenndd ttuuttoorriiaall ((aa == ""((22..00,, 22..00,, 55..00))""::::cciirrccllee)) \\gg ** aappppeenndd ttuuttoorriiaall ((aa == ""((00..00,, 11..88,, 1100..00))""::::cciirrccllee)) \\gg 1188 ** rreettrriieevvee ((ttuuttoorriiaall..aallll)) wwhheerree ttuuttoorriiaall..aa == ""((00..00,, 00..00,, 1100..00))""::::cciirrccllee \\gg which returns: +-----------------+ |aa | +-----------------+ |((11..00,, 11..00,, 1100..00)) | +-----------------+ |((00..00,, 11..88,, 1100..00)) | +-----------------+ Recall that we defined circles as being equal if their areas were equal. Other operators (less than, greater than, etc.) can be defined in a similar way. Note that the ``=='' symbol will still work for other types--it has merely had a new type added to the list of types it works on. Any string of "punctuation characters" other than brackets, braces, or parentheses can be used in defining an oper- ator. 88..44.. AAddddiittiioonnaall iinnffoo oonn ccrreeaattiinngg aa uusseerr--ddeeffiinneedd ffuunnccttiioonn 88..44..11.. UUssee ppaalllloocc aanndd nnoott mmaalllloocc In order for POSTGRES to correctly manage memory asso- ciated with processing your type, you must use the mem- ory allocator ppaalllloocc and avoid standard UNIX memory managers such as mmaalllloocc. If you do not, POSTGRES will chew up ever increasing amounts of memory. ppaalllloocc has the same arguments as mmaalllloocc, that is cchhaarr **ppaalllloocc((ssiizzee)) uunnssiiggnneedd lloonngg ssiizzee;; To free memory allocated with ppaalllloocc, use ppffrreeee, which is analogous to the UNIX library function ffrreeee: vvooiidd ppffrreeee((ppttrr)) cchhaarr **ppttrr;; 88..44..22.. RRee--llooaaddiinngg uusseerr ffuunnccttiioonnss In the process of creating a user-defined type, you may find it necessary to re-load a function in the course of debugging. This is nnoott done automatically when you edit or re-compile the file, but iiss done if you quit and restart the data manager. We would re-load our example functions by using the following command: ** llooaadd ""//uussrr//llooccaall//ppoossttggrreess//cciirrccllee..oo"" \\gg 1199 88..44..33.. WWrriittiinngg aa FFuunnccttiioonn ooff aann IInnssttaannccee We've already discussed user functions which take POST- GRES base or user defined types as arguments; in this section, we will discuss inheritable C functions or methods. C language methods are useful particularly when we want to make a function iinnhheerriittaabbllee; that is, to have the function process every instance in an inheritance hier- archy of classes. In using a function of an instance in qualifying an instance, POSTGRES defines the "current instance" to be the instance being qualified at the moment your func- tion is called. The instance itself will be passed in your function's parameter list as an opaque structure of type TUPLE, and you will use POSTGRES library rou- tines to access the data in the object as described below.6 Suppose we want to write a function to answer the query ** rreettrriieevvee ((EEMMPP..aallll)) wwhheerree oovveerrppaaiidd((EEMMPP)) \\gg In the query above, a reasonable oovveerrppaaiidd function might be: bbooooll oovveerrppaaiidd((tt)) TTUUPPLLEE tt;; //** tthhee ccuurrrreenntt iinnssttaannccee **// {{ eexxtteerrnn cchhaarr **GGeettAAttttrriibbuutteeBByyNNaammee(());; sshhoorrtt ssaallaarryy,, sseenniioorriittyy,, ppeerrffoorrmmaannccee;; ssaallaarryy == ((sshhoorrtt)) GGeettAAttttrriibbuutteeBByyNNaammee((tt,, ""ssaallaarryy""));; sseenniioorriittyy == ((sshhoorrtt)) GGeettAAttttrriibbuutteeBByyNNaammee((tt,, ""sseenniioorriittyy""));; ppeerrffoorrmmaannccee == ((sshhoorrtt)) GGeettAAttttrriibbuutteeBByyNNaammee((tt,, ""ppeerrffoorrmmaannccee""));; rreettuurrnn ((ssaallaarryy >> ((sseenniioorriittyy ** ppeerrffoorrmmaannccee))));; }} GGeettAAttttrriibbuutteeBByyNNaammee is the POSTGRES system function that returns attributes out of the current instance. It has two arguments: the argument of type TUPLE passed into the function, and the name of the desired attribute. GGeettAAttttrriibbuutteeBByyNNaammee will align data properly so you can cast its return value to the desired type. For exam- ple, if you have an attribute nnaammee which is of the POSTQUEL type cchhaarr1166, the GGeettAAttttrriibbuutteeBByyNNaammee call would look like: cchhaarr **ssttrr;; ____________________ 6In POSTGRES 4.1, TTUUPPLLEE is defined as vvooiidd **. 2200 ...... ssttrr == ((cchhaarr **)) GGeettAAttttrriibbuutteeBByyNNaammee((tt,, ""nnaammee"")) To let POSTGRES know about the oovveerrppaaiidd function, do: ** ddeeffiinnee ffuunnccttiioonn oovveerrppaaiidd ((llaanngguuaaggee == ""cc"",, rreettuurrnnttyyppee == bbooooll)) aarrgg iiss ((EEMMPP)) aass ""//uussrr//llooccaall//ppoossttggrreess//oovveerrppaaiidd..oo"" \\gg You can have additional complex, base or user-defined types as arguments to the inheritable function. Thus, ** rreettrriieevvee ((EEMMPP..aallll)) wwhheerree oovveerrppaaiidd22((EEMMPP,, DDEEPPTT,, ""bbiillll"",, 88)) \\gg could be written, and oovveerrppaaiidd22 would be declared: bbooooll oovveerrppaaiidd22((eemmpp,, ddeepptt,, nnaammee,, nnuummbbeerr)) TTUUPPLLEE eemmpp,, ddeepptt;; cchhaarr **nnaammee;; lloonngg nnuummbbeerr;; 88..55.. AArrrraayyss ooff ttyyppeess As discussed above, POSTGRES fully supports arrays of base types. Additionally, POSTGRES supports arrays of user-defined types as well. When you define a type, POSTGRES aauuttoommaattiiccaallllyy provides support for arrays of that type. 88..55..11.. AArrrraayyss ooff uusseerr--ddeeffiinneedd ttyyppeess Using the "circle" example discussed above, we will create a class containing an array of circles: ** ccrreeaattee cciirrcclleess ((lliisstt == cciirrccllee[[]])) \\gg and do some appends ** aappppeenndd cciirrcclleess ((lliisstt == ""{{""((11..00,, 11..00,, 55..00))"",, ""((22..00,, 22..00,, 1100..00))""}}"")) \\gg ** aappppeenndd cciirrcclleess ((lliisstt == ""{{""((22..00,, 33..00,, 1155..00))"",, ""((22..00,, 22..00,, 1100..00))""}}"")) \\gg ** aappppeenndd cciirrcclleess ((lliisstt == ""{{""((22..00,, 33..00,, 44..00))""}}"")) \\gg We can now run queries like: ** rreettrriieevvee ((cciirrcclleess..lliisstt[[11]])) \\gg which returns the first element of each lliisstt: 2211 +----------+ |lliisstt | +----------+ |((11,, 11,, 55)) | +----------+ |((22,, 33,, 44)) | +----------+ and ** rreettrriieevvee ((cciirrcclleess..aallll)) wwhheerree cciirrcclleess..lliisstt[[11]] == ""((00..00,, 00..00,, 44..00))"" \\gg which returns: +--------------+ |lliisstt | +--------------+ |{{""((22,, 33,, 44))""}} | +--------------+ Note the {{}}s, indicating that an array has been retrieved, as opposed to a single element. 88..55..22.. DDeeffiinniinngg aa nneeww aarrrraayy ttyyppee An array may be defined as an element of a class, as shown above, or it may be defined as a type in and of itself. This is useful for defining aarrrraayyss ooff aarrrraayyss. The special built-in functions aarrrraayy__iinn and aarrrraayy__oouutt are used by POSTGRES to input and output arrays of any existing type. Here, we define an array of integers: ** ddeeffiinnee ttyyppee iinntt__aarrrraayy ((eelleemmeenntt == iinntt44,, iinntteerrnnaalllleennggtthh == vvaarriiaabbllee,, iinnppuutt == aarrrraayy__iinn,, oouuttppuutt == aarrrraayy__oouutt)) \\gg The eelleemmeenntt parameter indicates that this is an array, and setting iinntteerrnnaalllleennggtthh to vvaarriiaabbllee indicates that the array is a variable-length attribute.7 We can use our type defined above to create an array of integer arrays: ** ddeeffiinnee ttyyppee iinntt__aarrrraayyss ((eelleemmeenntt == iinntt__aarrrraayy,, iinntteerrnnaalllleennggtthh == vvaarriiaabbllee,, iinnppuutt == aarrrraayy__iinn,, oouuttppuutt == aarrrraayy__oouutt)) \\gg ** ccrreeaattee ssttuuffff ((aa == iinntt__aarrrraayyss)) \\gg ** aappppeenndd ssttuuffff ((aa == ""{{{{11,, 22,, 33}} ,, {{44,, 55}},, {{66,, 77,, 88}}}}"")) \\gg ____________________ 7Note that any type using aarrrraayy__iinn and aarrrraayy__oouutt mmuusstt be variable-length. 2222 ** aappppeenndd ssttuuffff ((aa == ""{{{{8888,, 9999,, 33}}}}"")) \\gg ** aappppeenndd ssttuuffff ((aa == ""{{{{55,, 44,, 33}} ,, {{22,, 22}}}}"")) \\gg ** rreettrriieevvee ((ssttuuffff..aa[[11]])) wwhheerree ssttuuffff..aa[[11]][[11]] << ssttuuffff..aa[[11]][[22]] \\gg ** rreettrriieevvee ((ssttuuffff..aa)) wwhheerree ssttuuffff..aa[[33]][[11]] << ssttuuffff..aa[[11]][[22]] \\gg ** rreettrriieevvee ((ss..aallll)) ffrroomm ss iinn ssttuuffff wwhheerree ss..aa[[22]][[22]] == ssttuuffff..aa[[11]][[11]] \\gg We can also define operators for equality, less than, greater than, etc. which operate on our new array type as necessary. 88..55..33.. CCrreeaattiinngg aann aarrrraayy ttyyppee ffrroomm ssccrraattcchh There are many situations in which the above scheme for creating an array type is inappropriate, particularly when it is necessary to define a fixed-length array. In this section, we will create an array of four longs called qquuaarrtteerrllyy, and a variable-length array of longs called ssttaattss.8 The only special things we need to know when writing the input and output functions for qquuaarrtteerrllyy is that POSTGRES will pass a "simple" (i.e. fixed-length) array of lloonnggs to the output function and expect a simple array of lloonnggs in return from the input function. A simple array suitable for qquuaarrtteerrllyy can be declared: lloonngg qquuaarrtteerrllyy[[44]];; For the variable-length array ssttaattss, the situation is a little more complicated. Because POSTGRES will not know in advance how big the array is, POSTGRES will expect the length of the array (in bytes) to be encoded in the first four bytes of the memory which contains the array. The expected structure is: ttyyppeeddeeff ssttrruucctt {{ lloonngg lleennggtthh;; uunnssiiggnneedd cchhaarr bbyytteess[[11]];; //** FFoorrccee ccoonnttiigguuiittyy **// }} VVAARR__LLEENN__AATTTTRR;; The input function for the ssttaattss array will look some- thing like: VVAARR__LLEENN__AATTTTRR ** ____________________ We assume ssiizzeeooff((lloonngg)) == 4. 2233 ssttaattss__iinn((ss)) cchhaarr ss;; {{ VVAARR__LLEENN__AATTTTRR **ssttaattss;; lloonngg aarrrraayy__ssiizzee,, **aarrrraayypp,, nnbbyytteess;; //** ** nnbbyytteess iiss tthhee ttoottaall nnuummbbeerr ooff bbyytteess iinn ssttaattss,, ** IINNCCLLUUDDIINNGG tthhee bbyyttee ccoouunntt aatt tthhee bbeeggiinnnniinngg **// nnbbyytteess == aarrrraayy__ssiizzee ** ssiizzeeooff((lloonngg)) ++ ssiizzeeooff((lloonngg));; ssttaattss == ((VVAARR__LLEENN__AATTTTRR **)) ppaalllloocc((nnbbyytteess));; ssttaattss-->>lleennggtthh == nnbbyytteess;; aarrrraayypp == &&((ssttaattss-->>bbyytteess[[00]]));; //** ** ppuutt ccooddee hheerree tthhaatt llooaaddss iinntteerreessttiinngg ssttuuffff iinnttoo ** aarrrraayypp[[00]] .... aarrrraayypp[[aarrrraayy__ssiizzee]] **// rreettuurrnn((ssttaattss));; }} The output function for ssttaattss will get the same VVAARR__LLEENN__AATTTTRR structure. Now, assuming the functions are in //uussrr//llooccaall//ppoossttggrreess//ssttaattss..cc and //uussrr//llooccaall//ppoossttggrreess//qquuaarrtteerrllyy..cc, we can define our two arrays. First we will define the fixed-size array qquuaarrtteerrllyy.9 ** ddeeffiinnee ffuunnccttiioonn qquuaarrtteerrllyy__iinn ((llaanngguuaaggee == ""cc"",, rreettuurrnnttyyppee == qquuaarrtteerrllyy)) aarrgg iiss ((cchhaarr1166)) aass ""//uussrr//llooccaall//ppoossttggrreess//qquuaarrtteerrllyy..oo"" \\gg ** ddeeffiinnee ffuunnccttiioonn qquuaarrtteerrllyy__oouutt ((llaanngguuaaggee == ""cc"",, rreettuurrnnttyyppee == cchhaarr1166)) aarrgg iiss ((qquuaarrtteerrllyy)) aass ""//uussrr//llooccaall//ppoossttggrreess//qquuaarrtteerrllyy..oo"" \\gg ** ddeeffiinnee ttyyppee qquuaarrtteerrllyy ((eelleemmeenntt == iinntt44,, iinntteerrnnaalllleennggtthh == 1166,, iinnppuutt == qquuaarrtteerrllyy__iinn,, oouuttppuutt == qquuaarrtteerrllyy__oouutt)) \\gg ____________________ iinntteerrnnaalllleennggtthh == 16 follows from our assumption about ssiizzeeooff((lloonngg)). 2244 Now we define the ssttaattss array: ** ddeeffiinnee ffuunnccttiioonn ssttaattss__iinn ((llaanngguuaaggee == ""cc"",, rreettuurrnnttyyppee == ssttaattss)) aarrgg iiss ((cchhaarr1166)) aass ""//uussrr//llooccaall//ppoossttggrreess//ssttaattss..oo"" \\gg ** ddeeffiinnee ffuunnccttiioonn ssttaattss__oouutt ((llaanngguuaaggee == ""cc"",, rreettuurrnnttyyppee == cchhaarr1166)) aarrgg iiss ((ssttaattss)) aass ""//uussrr//llooccaall//ppoossttggrreess//ssttaattss..oo"" \\gg ** ddeeffiinnee ttyyppee ssttaattss ((eelleemmeenntt == iinntt44,, iinntteerrnnaalllleennggtthh == vvaarriiaabbllee,, iinnppuutt == ssttaattss__iinn,, oouuttppuutt == ssttaattss__oouutt)) \\gg Now we can run some queries: ** ccrreeaattee tteesstt ((aa == qquuaarrtteerrllyy,, bb == ssttaattss)) \\gg ** aappppeenndd tteesstt ((aa == ""11 22 33 44""::::qquuaarrtteerrllyy,, bb == ""55 66 77""::::ssttaattss)) \\gg ** aappppeenndd tteesstt ((aa == ""11 33 22 44""::::qquuaarrtteerrllyy,, bb == ""66 44""::::ssttaattss)) \\gg ** aappppeenndd tteesstt ((aa == ""77 1111 66 99""::::qquuaarrtteerrllyy,, bb == ""11 22""::::ssttaattss)) \\gg ** rreettrriieevvee ((tteesstt..aallll)) wwhheerree tteesstt..aa[[44]] == tteesstt..bb[[22]] \\gg which returns: +-----+----+ |aa | bb | +-----+----+ |11332244 | 6644 | +-----+----+ NNOOTTEE that when you use your own functions to input and output array types, yyoouurr ffuunnccttiioonn will define how to parse the external (string) representation. The braces notation is only a convention used by aarrrraayy__iinn and aarrrraayy__oouutt and is nnoott part of the formal POSTQUEL defi- nition. 88..66.. LLaarrggee OObbjjeecctt ttyyppeess The types discussed to this point are all ssmmaallll objects--that is, they are smaller than 8 Kbytes10 in size. If you require a larger type for something like ____________________ 108 * 1,024 == 8,192 bytes 2255 a document retrieval system or for storing bitmaps, you will need to use the POSTGRES llaarrggee oobbjjeecctt interface. The interface to large objects is quite similar to the UNIX file system interface. The particulars are detailed in Section 7 of the POSTGRES Reference Manual, which you should have available to consult as you read the following. 88..66..11.. DDeeffiinniinngg aa llaarrggee oobbjjeecctt Just like any other type, a large object type requires input and output functions. For the purposes of this discussion, we assume that two functions, llaarrggee__iinn and llaarrggee__oouutt have been written using the large object interface, and that the compiled functions are in //uussrr//llooccaall//ppoossttggrreess//llaarrggee..oo. We also presume that we are using the "file as an ADT" interface for large objects discussed in the Reference Manual. We define a large object which could be used for stor- ing map data: ** ddeeffiinnee ffuunnccttiioonn llaarrggee__iinn ((llaanngguuaaggee == ""cc"",, rreettuurrnnttyyppee == mmaapp)) aarrgg iiss ((cchhaarr1166)) aass ""//uussrr//llooccaall//ppoossttggrreess//llaarrggee..oo"" \\gg ** ddeeffiinnee ffuunnccttiioonn llaarrggee__oouutt ((llaanngguuaaggee == ""cc"",, rreettuurrnnttyyppee == cchhaarr1166)) aarrgg iiss ((mmaapp)) aass ""//uussrr//llooccaall//ppoossttggrreess//llaarrggee..oo"" \\gg ** ddeeffiinnee ttyyppee mmaapp ((iinntteerrnnaalllleennggtthh == vvaarriiaabbllee,, iinnppuutt == llaarrggee__iinn,, oouuttppuutt == llaarrggee__oouutt)) \\gg Note that large objects are aallwwaayyss variable-length. Now we can use our mmaapp object: ** ccrreeaattee mmaappss ((nnaammee == tteexxtt,, aa == mmaapp)) \\gg ** aappppeenndd mmaappss ((nnaammee == ""eeaarrtthh"",, aa == ""//uussrr//llooccaall//ppoossttggrreess//eeaarrtthh"")) \\gg ** aappppeenndd mmaappss ((nnaammee == ""mmoooonn"",, aa == ""//uussrr//llooccaall//ppoossttggrreess//mmoooonn"")) \\gg Notice that the above queries are identical in syntax to those we have been using all along to define types and such; the fact that this type is a large object is completely hidden in the large object interface and POSTGRES storage manager. 2266 88..66..22.. WWrriittiinngg ffuunnccttiioonnss aanndd ooppeerraattoorrss ffoorr llaarrggee oobbjjeecctt ttyyppeess Like any other POSTGRES type, you can define functions and operators for large object types. The only caveat is that, like any other functions which process a large object, they mmuusstt use the large object interface described in Section 7 of the POSTGRES Reference Man- ual. Possible queries which involve functions on large objects could include ** rreettrriieevvee ((eemmpp..nnaammee)) wwhheerree bbeeaarrdd((eemmpp..ppiiccttuurree)) == ""rreedd"" \\gg ** rreettrriieevvee ((mmoouunnttaaiinn..nnaammee)) wwhheerree hheeiigghhtt((mmoouunnttaaiinn..ttooppoommaapp)) >> 1100000000 \\gg Because all functionality is available to large objects, aannyy aspect of POSTGRES is available for use with them, including index access methods, if the appropriate operator classes have been defined. Opera- tor classes for index access methods will be discussed later in this manual. 99.. TThhee PPOOSSTTGGRREESS RRuullee SSyysstteemm The discussion in this section is intended to provide an overview of the POSTGRES rule system and point the user at helpful references and examples. POSTGRES actually has two rule systems, the iinnssttaannccee--lleevveell rule system and the qquueerryy rreewwrriittee rule system. 99..11.. TThhee IInnssttaannccee--lleevveell RRuullee SSyysstteemm The instance-level rule system uses markers placed in each instance in a class to "trigger" rules. Examples of the instance-level rule system are explained and illustrated in //uussrr//llooccaall//ppoossttggrreess//ssrrcc//rreeggrreessss//ddeemmoo, which is included with the POSTGRES distribution. Additional discussion of the instance-level rule system can be found in the Reference Manual under ddeeffiinnee rruullee. The theoretical foundations of the POSTGRES rule system can be found in [STON90]. 99..22.. TThhee QQuueerryy RReewwrriittee RRuullee SSyysstteemm The query rewrite rule system modifies queries to take rules into consideration, and then passes the modified query to the query optimizer for execution. It is very powerful, and can be used for many things such as query language procedures, views, and versions. Examples and discussion can be found in the demo in //uussrr//llooccaall//ppoossttggrreess//ssrrcc//rreeggrreessss//vviiddeeoo, and further dis- cussion is in the Reference Manual under ddeeffiinnee rruullee. The power of this rule system is discussed in [ONG90] and [STON90]. 2277 99..33.. WWhheenn ttoo uussee eeiitthheerr?? Since each rule system is architected quite differ- ently, they work best in different situations. The query rewrite system is best when rules affect most of the instances in a class, while the instance-level sys- tem is best when a rule affects only a few instances. 1100.. AAddmmiinniisstteerriinngg PPOOSSTTGGRREESS In this section, we will discuss aspects of POSTGRES of interest to those making extensive use of POSTGRES, or who are the database administrator for a group of POST- GRES users. 1100..11.. UUsseerr aaddmmiinniissttrraattiioonn The ccrreeaatteeuusseerr and ddeessttrrooyyuusseerr enable and disable access to POSTGRES by specific users on the host sys- tem. Please read the descriptions of these commands in the Reference Manual for specifics on their use. 1100..22.. MMoovviinngg ddaattaabbaassee ddiirreeccttoorriieess oouutt ooff //uussrr//llooccaall//ppoossttggrreess//ddaattaa//bbaassee By default, all POSTGRES databases are stored in sepa- rate subdirectories under //uussrr//llooccaall//ppoossttggrreess//ddaattaa//bbaassee//.11 To move a particular data base to an alternate directory (e.g., on a filesystem with more free space), do the following: +o Create the database (if it doesn't already exist) using the ccrreeaatteeddbb command. In the following steps we will assume the database is named ffoooo. +o Copy the directory //uussrr//llooccaall//ppoossttggrreess//ddaattaa//bbaassee//ffoooo and it contents to its ultimate destination. It should still be owned by the ppoossttggrreess user. +o Remove the directory //uussrr//llooccaall//ppoossttggrreess//ddaattaa//bbaassee//ffoooo. +o Make a symbolic link in //uussrr//llooccaall//ppoossttggrreess//ddaattaa//bbaassee to the new direc- tory. 1100..33.. TTrroouubblleesshhoooottiinngg PPOOSSTTGGRREESS Occasionally, POSTGRES will fail with cryptic error messages that are due to relatively simple problems. The following are a list of POSTGRES error messages and the likely fix. These messages are ones you would likely see in the monitor program. MMeessssaaggee:: sseemmggeett:: NNoo ssppaaccee lleefftt oonn ddeevviiccee ____________________ 11Data for certain classes may stored elsewhere if a non- standard storage manager was specified when they were creat- ed. 2288 EExxppllaannaattiioonn aanndd LLiikkeellyy FFiixx:: Either the kernel has not been configured for System V shared memory, or some other program is using it up. On most machines, the UNIX command iippccss will show shared memory and semaphore usage. To delete all shared memory and semaphores (may be nec- essary if a backend fails), run the iippcccclleeaann command. Note, however, that iippcccclleeaann deletes aallll semaphores belonging to the user running it, so the user should be certain that none of his/her non-POSTGRES processes are using semaphores before running this command. MMeessssaaggee:: UUnnaabbllee ttoo ggeett sshhaarreedd bbuuffffeerrss EExxppllaannaattiioonn aanndd LLiikkeellyy FFiixx:: This message means that a POSTGRES backend was expect- ing shared memory to be available and it was not. Usu- ally this is due to iippcccclleeaann being run while a ppoossttmmaass-- tteerr was also running. MMeessssaaggee:: CCaann''tt ccoonnnneecctt ttoo tthhee bbaacckkeenndd ((......)) EExxppllaannaattiioonn aanndd LLiikkeellyy FFiixx:: This message means that you are running a LIBPQ appli- cation but it could not link up with a ppoossttmmaasstteerr. If you see this error message, you should see if a ppoosstt-- mmaasstteerr is truly running. If one is running, the prob- lem is likely related to your network. 1111.. RREEFFEERREENNCCEESS [ONG90] Ong, L. and Goh, J., ``A Unified Framework for Version Modeling Using Production Rules in a Database System," Electronics Research Laboratory, University of California, Berke- ley, ERL Memo M90/33, April 1990. [ROWE87] Rowe, L. and Stonebraker, M., ``The POSTGRES Data Model,'' Proc. 1987 VLDB Conference, Brighton, England, Sept. 1987. [SCHA90] Shapiro, L., ``Join Processing in Database Systems with Large Main Memories,'' ACM- TODS, Sept. 1986. [STON86] (missing) [STON87] Stonebraker, M., ``The POSTGRES Storage Sys- tem,'' Proc. 1987 VLDB Conference, Brighton, England, Sept. 1987. [STON88] (missing) [STON90] Stonebraker, M. et. al., ``On Rules, Proce- dures, Caching and Views in Database Sys- tems,'' Proc. 1990 ACM-SIGMOD Conference on Management of Data, Atlantic City, N.J., 2299 June 1990. [STON90B] (missing) [WANG88] (missing) AAPPPPEENNDDIIXX:: UUsseerr ddeeffiinneedd ttyyppeess aanndd iinnddiicceess In this section, we will discuss how to extend POSTGRES to use a user-defined type and associated functions with existing access methods. This way, you can define a BTREE or RTREE index on your own type. To do this, we will discuss how to define a new operator class in POSTGRES for use with an existing access method. Our example will be to add a new operator class to the BTREE access method. The new operator class will sort integers in ascending absolute value order. This tuto- rial will describe how to define the operator class. If you work the example, you will be able to define and use indices that sort integer keys by absolute value. There are several POSTGRES system classes that are important in understanding how the access methods work. These will be discussed, and then a sample procedure for adding a new set of operators to an existing access method will be shown as an example. The ppgg__aamm class contains one instance for every user defined access method. Support for the HEAP access method is built into POSTGRES, but every other access method is described here. The schema is | aammnnaammee | name of the access method -------------+--------------------------------------------------------------- aammoowwnneerr | object id of the owner's instance in pg_user -------------+--------------------------------------------------------------- aammkkiinndd | not used at present, but set to 'o' as a place holder -------------+--------------------------------------------------------------- aammssttrraatteeggiieess | number of strategies for this access method (see below) -------------+--------------------------------------------------------------- aammssuuppppoorrtt | number of support routines for this access method (see below) -------------+--------------------------------------------------------------- aamm** | procedure identifiers for interface routines to the access | method. For example, rreeggpprroocc ids for opening, closing, and | getting instances from the access method appear here. The object ID of the instance in ppgg__aamm is used as a foreign key in lots of other classes. For BTREES, this object ID is 403. You don't need to add a new instance to this class; all you're interested in is the object ID of the access method instance you want to extend: ** rreettrriieevvee ((ppgg__aamm..ooiidd)) wwhheerree ppgg__aamm..aammnnaammee == ""bbttrreeee"" \\gg 3300 +----+ |ooiidd | +----+ |440033 | +----+ The aammssttrraatteeggiieess attribute exists to standardize com- parisons across data types. For example, BTREES impose a strict ordering on keys, less to greater. Since POSTGRES allows the user to define operators, POSTGRES cannot in general look at the name of an operator (eg, >>, <<) and tell what kind of comparison it is. In fact, some access methods (like rtrees) don't impose a less- to-greater ordering, but some other ordering, like con- tainment. POSTGRES needs some consistent way of taking a scan qualification, looking at the operator, deciding if a usable index exists, and rewriting the query qual- ification in order to improve access speeds. This implies that POSTGRES needs to know, for example, that <<== and >> partition a BTREE. Strategies is the way that we do this. Defining a new set of strategies is beyond the scope of this discussion, but how the BTREE strategies work will be explained, since you'll need to know that to add a new operator class. In the ppgg__aamm class, the aammssttrraattee-- ggiieess attribute is the number of strategies defined for this access method. For BTREES, this number is 5. These strategies correspond to | less than | 1 ----------------------+--- less than or equal | 2 ----------------------+--- equal | 3 ----------------------+--- greater than or equal | 4 ----------------------+--- greater than | 5 The idea is that you'll add procedures corresponding to the comparisons above to the ppgg__aammoopp relation (see below). The access method code can use these numbers, regardless of data type, to figure out how to partition the BTREE, compute selectivity, and so on. Don't worry about the details of adding procedures yet; just under- stand that there's a set of these for iinntt22, iinntt44, ooiidd, and every other data type on which a BTREE can operate. Strategies are used by all of the POSTGRES access meth- ods. Some access methods require other support rou- tines in order to work. For example, the BTREE access method must be able to compare two keys and determine whether one is greater than, equal to, or less than the 3311 other. Similarly, the RTREE access method must be able to compute intersections, unions, and sizes of rectan- gles. These operations do not correspond to user qual- ifications in POSTQUEL queries; they are administrative routines used by the access methods, internally. In order to manage diverse support routines consis- tently across all POSTGRES access methods, ppgg__aamm includes a field called aammssuuppppoorrtt. This field records the number of support routines used by an access method. For BTREES, this number is one--the routine to take two keys and return -1, 0, or +1, depending on whether the first key is less than, equal to, or greater than the second. The aammssttrraatteeggiieess entry in ppgg__aamm is just the number of strategies defined for the access method in question. The procedures for less than, less equal, and so on don't appear in ppgg__aamm. Similarly, aammssuuppppoorrtt is just the number of support routines required by the access method. The actual routines are listed elsewhere. The next class of interest is ppgg__ooppccllaassss.. This class exists only to associate a name with an ooiidd. In ppgg__aammoopp, every BTREE operator class has a set of proce- dures, one through five, above. Some existing opclasses are iinntt22__ooppss, andiinntt44__ooppss, ooiidd__ooppss. You need to add an instance with your opclass name (for example, iinntt44__aabbss__ooppss) to ppgg__ooppccllaassss. The ooiidd of this instance is a foreign key in other classes. ** aappppeenndd ppgg__ooppccllaassss ((ooppccnnaammee == ""iinntt44__aabbss__ooppss"")) \\gg ** rreettrriieevvee ((ccll..ooiidd,, ccll..ooppccnnaammee)) ffrroomm ccll iinn ppgg__ooppccllaassss wwhheerree ccll..ooppccnnaammee == ""iinntt44__aabbss__ooppss"" \\gg +------+--------------+ |ooiidd | ooppccnnaammee | +------+--------------+ |1177331144 | iinntt44__aabbss__ooppss | +------+--------------+ NNOOTTEE:: The ooiidd for your ppgg__ooppccllaassss instance mmaayy bbee ddiiff-- ffeerreenntt! You should substitute your value for 17314 wherever it appears in this discussion. So now we have an access method and an operator class. We still need a set of operators; the procedure for defining operators was discussed earlier in this man- ual. For the iinntt44__aabbss__ooppss operator class on BTREES, the operators we require are: absolute value less-than absolute value less-than-or-equal absolute value equal absolute value greater-than-or-equal 3322 absolute value greater-than Suppose the code that implements the functions defined is stored in the file //uussrr//llooccaall//ppoossttggrreess//iinntt44__aabbss..cc. The code is //** ** iinntt44__aabbss..cc ---- aabbssoolluuttee vvaalluuee ccoommppaarriissoonn ffuunnccttiioonnss ** ffoorr iinntt44 ddaattaa **// ##iinncclluuddee ""ttmmpp//cc..hh"" ##ddeeffiinnee AABBSS((aa)) aa == ((((aa << 00)) ?? --aa :: aa)) bbooooll iinntt44__aabbss__lltt((aa,, bb)) iinntt3322 aa,, bb;; {{ AABBSS((aa));; AABBSS((bb));; rreettuurrnn ((aa << bb));; }} bbooooll iinntt44__aabbss__llee((aa,, bb)) iinntt3322 aa,, bb;; {{ AABBSS((aa));; AABBSS((bb));; rreettuurrnn ((aa <<== bb));; }} bbooooll iinntt44__aabbss__eeqq((aa,, bb)) iinntt3322 aa,, bb;; {{ AABBSS((aa));; AABBSS((bb));; rreettuurrnn ((aa ==== bb));; }} bbooooll iinntt44__aabbss__ggee((aa,, bb)) iinntt3322 aa,, bb;; {{ AABBSS((aa));; AABBSS((bb));; rreettuurrnn ((aa >>== bb));; }} bbooooll iinntt44__aabbss__ggtt((aa,, bb)) iinntt3322 aa,, bb;; {{ AABBSS((aa));; AABBSS((bb));; rreettuurrnn ((aa >> bb));; }} There are a couple of important things that are happen- ing below. First, note that operators for less, less equal, equal, greater equal, and greater for iinntt44 are being defined. All of these operators are already defined for iinntt44 under the names <<, <<==, ==, >>==, and >>. The new operators behave differently, of course. In order to guarantee that POSTGRES uses these new opera- tors rather than the old ones, they need to be named differently from the old ones. This is a key point: you can overload operators in POSTGRES, but only if the operator isn't already defined for the argument types. That is, if you have << defined for (iinntt44, iinntt44), you can't define it again. POSTGRES ddooeessnn''tt check this when you define your operator, so be careful. To avoid this problem, odd names will be used for the operators. If you get this wrong, the access methods are likely to crash when you try to do scans. The other important point is that all the functions return bboooolleeaann values; the access methods rely on this fact. ** ddeeffiinnee ffuunnccttiioonn iinntt44__aabbss__lltt ((llaanngguuaaggee == ""cc"",, rreettuurrnnttyyppee == bbooooll)) 3333 aarrgg iiss ((iinntt44,, iinntt44)) aass ""//uussrr//llooccaall//ppoossttggrreess//iinntt44__aabbss..oo"" \\gg ** ddeeffiinnee ffuunnccttiioonn iinntt44__aabbss__llee ((llaanngguuaaggee == ""cc"",, rreettuurrnnttyyppee == bbooooll)) aarrgg iiss ((iinntt44,, iinntt44)) aass ""//uussrr//llooccaall//ppoossttggrreess//iinntt44__aabbss..oo"" \\gg ** ddeeffiinnee ffuunnccttiioonn iinntt44__aabbss__eeqq ((llaanngguuaaggee == ""cc"",, rreettuurrnnttyyppee == bbooooll)) aarrgg iiss ((iinntt44,, iinntt44)) aass ""//uussrr//llooccaall//ppoossttggrreess//iinntt44__aabbss..oo"" \\gg ** ddeeffiinnee ffuunnccttiioonn iinntt44__aabbss__ggee ((llaanngguuaaggee == ""cc"",, rreettuurrnnttyyppee == bbooooll)) aarrgg iiss ((iinntt44,, iinntt44)) aass ""//uussrr//llooccaall//ppoossttggrreess//iinntt44__aabbss..oo"" \\gg ** ddeeffiinnee ffuunnccttiioonn iinntt44__aabbss__ggtt ((llaanngguuaaggee == ""cc"",, rreettuurrnnttyyppee == bbooooll)) aarrgg iiss ((iinntt44,, iinntt44)) aass ""//uussrr//llooccaall//ppoossttggrreess//iinntt44__aabbss..oo"" \\gg Now define the operators that use them. As noted, the operator names must be unique for two iinntt44 operands. You can do a query on ppgg__ooppeerraattoorr: ** rreettrriieevvee ((ppgg__ooppeerraattoorr..aallll)) \\gg to see if your name is taken for the types you want. The important things here are the procedure (which are the C functions defined above) and the restriction and join selectivity functions. You should just use the ones used below--note that there are different such functions for the less-than, equal, and greater-than cases. These mmuusstt be supplied, or the access method will die when it tries to use the operator. You should copy the names for rreessttrriicctt and jjooiinn, but use the pro- cedure names you defined in the last step. ** ddeeffiinnee ooppeerraattoorr <<<<&& ((aarrgg11 == iinntt44,, aarrgg22 == iinntt44,, pprroocceedduurree==iinntt44__aabbss__lltt,, aassssoocciiaattiivviittyy == lleefftt,, rreessttrriicctt == iinnttllttsseell,, jjooiinn == iinnttllttjjooiinnsseell)) \\gg ** ddeeffiinnee ooppeerraattoorr <<==&& ((aarrgg11 == iinntt44,, aarrgg22 == iinntt44,, pprroocceedduurree == iinntt44__aabbss__llee,, aassssoocciiaattiivviittyy == lleefftt,, rreessttrriicctt == iinnttllttsseell,, jjooiinn == iinnttllttjjooiinnsseell)) \\gg ** ddeeffiinnee ooppeerraattoorr ====&& ((aarrgg11 == iinntt44,, aarrgg22 == iinntt44,, pprroocceedduurree == iinntt44__aabbss__eeqq,, aassssoocciiaattiivviittyy == lleefftt,, rreessttrriicctt == eeqqsseell,, 3344 jjooiinn == eeqqjjooiinnsseell)) \\gg ** ddeeffiinnee ooppeerraattoorr >>==&& ((aarrgg11 == iinntt44,, aarrgg22 == iinntt44,, pprroocceedduurree == iinntt44__aabbss__ggee,, aassssoocciiaattiivviittyy == lleefftt,, rreessttrriicctt == iinnttggttsseell,, jjooiinn == iinnttggttjjooiinnsseell)) \\gg ** ddeeffiinnee ooppeerraattoorr >>>>&& ((aarrgg11 == iinntt44,, aarrgg22 == iinntt44,, pprroocceedduurree == iinntt44__aabbss__ggtt,, aassssoocciiaattiivviittyy == lleefftt,, rreessttrriicctt == iinnttggttsseell,, jjooiinn == iinnttggttjjooiinnsseell)) \\gg Notice that five operators corresponding to less, less equal, equal, greater, and greater equal are defined. We're just about finished. the last thing we need to do is to update the ppgg__aammoopp relation. To do this, we need the following attributes: | aammooppiidd | the ooiidd of the ppgg__aamm | instance for BTREE | (== 400, see above) ------------+---------------------- aammooppccllaaiidd | the ooiidd of the | ppgg__ooppccllaassss instance | for iinntt44__aabbss__ooppss (== | whatever you got in- | stead of 17314, see | above) ------------+---------------------- aammooppoopprr | the ooiidds of the op- | erators for the op- | class (which we'll | get in just a | minute) ------------+---------------------- aammooppsseelleecctt,, | cost functions. aammooppnnppaaggeess | The cost functions are used by the query optimizer to decide whether or not to use a given index in a scan. Fortunately, these already exist. The two functions we'll use are bbttrreeeesseell,, which estimates the selectivity of the btree, and bbttrreeeennppaaggee,, which estimates the num- ber of pages a search will touch in the tree. So we need the ooiidds of the operators we just defined. We'll look up the names of all the operators that take two iinntt44s, and pick ours out: ** rreettrriieevvee ((oo..ooiidd,, oo..oopprrnnaammee)) ffrroomm oo iinn ppgg__ooppeerraattoorr,, tt iinn ppgg__ttyyppee wwhheerree oo..oopprrlleefftt == tt..ooiidd aanndd oo..oopprrrriigghhtt == tt..ooiidd aanndd tt..ttyyppnnaammee == ""iinntt44"" \\gg 3355 which returns: +------+---------+ |ooiidd | oopprrnnaammee | +------+---------+ |9966 | == | +------+---------+ |9977 | << | +------+---------+ |551144 | ** | +------+---------+ |551188 | !!== | +------+---------+ |552211 | >> | +------+---------+ |552233 | <<== | +------+---------+ |552255 | >>== | +------+---------+ |552288 | // | +------+---------+ |553300 | %% | +------+---------+ |555511 | ++ | +------+---------+ |555555 | -- | +------+---------+ |1177332211 | <<<<&& | +------+---------+ |1177332222 | <<==&& | +------+---------+ |1177332233 | ====** | +------+---------+ |1177332244 | >>==&& | +------+---------+ |1177332255 | >>>>&& | +------+---------+ (Note that your ooiidd numbers may be different.) The operators we are interested in are those with ooiidds 17321 through 17325. The values you get will probably be different, and you should substitute them for the values below. We can look at the operator names and pick out the ones we just added. (Of course, there are lots of other queries we could used to get the oids we wanted.) Now we're ready to update ppgg__aammoopp with our new operator class. The most important thing in this entire discus- sion is that the operators are ordered, from less equal through greater equal, in ppgg__aammoopp. Recall that the BTREE instance's ooiidd is 400 and iinntt44__aabbss__ooppss is ooiidd 17314. Then we add the instances we need: 3366 ** aappppeenndd ppgg__aammoopp ((aammooppiidd == ""440000""::::ooiidd,, //** bbttrreeee ooiidd **// aammooppccllaaiidd == ""1177331144""::::ooiidd,, //** ppgg__ooppccllaassss ttuuppllee **// aammooppoopprr == ""1177332211""::::ooiidd,, //** <<<<&& ttuupp ooiidd **// aammooppssttrraatteeggyy == ""11""::::iinntt22,, //** 11 iiss <<<<&& **// aammooppsseelleecctt == ""bbttrreeeesseell""::::rreeggpprroocc,, aammooppnnppaaggeess == ""bbttrreeeennppaaggee""::::rreeggpprroocc)) \\gg ** aappppeenndd ppgg__aammoopp ((aammooppiidd == ""440000""::::ooiidd,, aammooppccllaaiidd == ""1177331144""::::ooiidd,, aammooppoopprr == ""1177332222""::::ooiidd,, aammooppssttrraatteeggyy == ""22""::::iinntt22,, aammooppsseelleecctt == ""bbttrreeeesseell""::::rreeggpprroocc,, aammooppnnppaaggeess == ""bbttrreeeennppaaggee""::::rreeggpprroocc)) \\gg ** aappppeenndd ppgg__aammoopp ((aammooppiidd == ""440000""::::ooiidd,, aammooppccllaaiidd == ""1177331144""::::ooiidd,, aammooppoopprr == ""1177332233""::::ooiidd,, aammooppssttrraatteeggyy == ""33""::::iinntt22,, aammooppsseelleecctt == ""bbttrreeeesseell""::::rreeggpprroocc,, aammooppnnppaaggeess == ""bbttrreeeennppaaggee""::::rreeggpprroocc)) \\gg ** aappppeenndd ppgg__aammoopp ((aammooppiidd == ""440000""::::ooiidd,, aammooppccllaaiidd == ""1177331144""::::ooiidd,, aammooppoopprr == ""1177332244""::::ooiidd,, aammooppssttrraatteeggyy == ""44""::::iinntt22,, aammooppsseelleecctt == ""bbttrreeeesseell""::::rreeggpprroocc,, aammooppnnppaaggeess == ""bbttrreeeennppaaggee""::::rreeggpprroocc)) \\gg ** aappppeenndd ppgg__aammoopp ((aammooppiidd == ""440000""::::ooiidd,, aammooppccllaaiidd == ""1177331144""::::ooiidd,, aammooppoopprr == ""1177332255""::::ooiidd,, aammooppssttrraatteeggyy == ""55""::::iinntt22,, aammooppsseelleecctt == ""bbttrreeeesseell""::::rreeggpprroocc,, aammooppnnppaaggeess == ""bbttrreeeennppaaggee""::::rreeggpprroocc)) \\gg NOTE the order: "less" is 1, "less equal" is 2, "equal" is 3, "greater equal" is 4, and "greater" is 5. Okay, now it's time to test the new opclass. First we'll create and populate a class: ** ccrreeaattee ppaaiirrss ((nnaammee == cchhaarr1166,, nnuummbbeerr == iinntt44)) \\gg ** aappppeenndd ppaaiirrss ((nnaammee == ""mmiikkee"",, nnuummbbeerr == --1100000000)) \\gg ** aappppeenndd ppaaiirrss ((nnaammee == ""ggrreegg"",, nnuummbbeerr == 33000000)) \\gg ** aappppeenndd ppaaiirrss ((nnaammee == ""llaayy ppeenngg"",, nnuummbbeerr == 55000000)) \\gg ** aappppeenndd ppaaiirrss ((nnaammee == ""jjeeffff"",, nnuummbbeerr == --22000000)) \\gg ** aappppeenndd ppaaiirrss ((nnaammee == ""mmaaoo"",, nnuummbbeerr == 77000000)) \\gg 3377 ** aappppeenndd ppaaiirrss ((nnaammee == ""cciimmaarrrroonn"",, nnuummbbeerr == --33000000)) \\gg ** rreettrriieevvee ((ppaaiirrss..aallll)) \\gg +---------+--------+ |nnaammee | nnuummbbeerr | +---------+--------+ |mmiikkee | --1100000000 | +---------+--------+ |ggrreegg | 33000000 | +---------+--------+ |llaayy ppeenngg | 55000000 | +---------+--------+ |jjeeffff | --22000000 | +---------+--------+ |mmaaoo | 77000000 | +---------+--------+ |cciimmaarrrroonn | --33000000 | +---------+--------+ Okay, looks pretty random. Define an index using the new opclass: ** ddeeffiinnee iinnddeexx ppaaiirrssiinndd oonn ppaaiirrss uussiinngg bbttrreeee ((nnuummbbeerr iinntt44__aabbss__ooppss)) \\gg Now run a query that doesn't use one of our new opera- tors. What we're trying to do here is to run a query that wwoonn''tt use our index, so that we can tell the dif- ference when we see a query that ddooeess use the index. This query won't use the index because the operator we use in the qualification isn't one that appears in the list of strategies for our index. ** rreettrriieevvee ((ppaaiirrss..aallll)) wwhheerree ppaaiirrss..nnuummbbeerr << 99000000 \\gg +---------+--------+ |nnaammee | nnuummbbeerr | +---------+--------+ |mmiikkee | --1100000000 | +---------+--------+ |ggrreegg | 33000000 | +---------+--------+ |llaayy ppeenngg | 55000000 | +---------+--------+ |jjeeffff | --22000000 | +---------+--------+ |mmaaoo | 77000000 | +---------+--------+ |cciimmaarrrroonn | --33000000 | +---------+--------+ 3388 Yup, just as random; that didn't use the index. Okay, let's run a query that ddooeess use the index: ** rreettrriieevvee ((ppaaiirrss..aallll)) wwhheerree ppaaiirrss..nnuummbbeerr <<<<&& 99000000 \\gg +---------+--------+ |nnaammee | nnuummbbeerr | +---------+--------+ |jjeeffff | --22000000 | +---------+--------+ |cciimmaarrrroonn | --33000000 | +---------+--------+ |ggrreegg | 33000000 | +---------+--------+ |llaayy ppeenngg | 55000000 | +---------+--------+ |mmaaoo | 77000000 | +---------+--------+ Note that the nnuummbbeerr values are in order of increasing absolute value (as they should be, since the index was used for this scan) and that we got the right answer-- the instance for mmiikkee doesn't appear, because -10000 >=& 9000. 3399