.\".he '\*(dA'DRAFT'\*(tI' \" comment out in production version .\"======================================================== .\"------------------------------------ .\" /home1/frew/s2k/postgres/manual/RCS/manual.me,v 1.10 1992/07/07 17:44:44 frew Exp .\"------------------------------------ .\" XXX standard disclaimer belongs here.... .\"--------------------------------------------------------------------------- .de cW \" arg3arg1arg2, constant-width arg1 \&\\$3\\fC\\$1\\fP\\$2 .\"\&\\$3\\fC\\s-1\\$1\\s0\\fP\\$2 .. .\"------------------------------------ .de xP \" ip for references .ip \\$1 \\w'[STON90B]'u+2n .. .\"------------------------------------ .de (P \" prologue for constant-width block .ft C .ps -1 .vs -1 .. .\"----------------- .de )P \" epilogue for constant-width block .vs +1 .ps +1 .ft P .. .\"------------------------------------ .de (T \" prologue for constant-width table .(P .in +\\n(biu .. .\"----------------- .de )T \" epilogue for constant-width table .in -\\n(biu .)P .. .\"------------------------------------ .de (C \" begin constant-width list .(l .(P .. .\"----------------- .de )C \" end constant-width list .)P .)l .. .\"--------------------------------------------------------------------------- . ds II \s-1INGRES\s0 . ds PP \s-1POSTGRES\s0 . ds UU \s-1UNIX\s0 . ds PQ \s-1POSTQUEL\s0 . ds LP \s-1LIBPQ\s0 . ds PV \s-14.0\s0 . ds OF \s-1PICASSO\s0 .\"------------------------------------ .ps 11 .vs 13 .\"----------------- .nr pp \n(.s .nr sp \n(.s+1 \" +1-pt section headers .\"----------------- .nr bs 1v .nr ps 0.3v .nr ss 1v .\"----------------- .fo ''\\s+2%\\s0'' \" +2-pt page numbers in center footers .\"----------------- .nr $i .5i \" main text indented .nr so -\n($i \" section headers un-indented .\"------------------------------------ .rm xX \" scratch register .\"------------------------------------ .\"!Gexpand .\"--------------------------------------------------------------------------- .(l C .b \s+3The \*(PP User Manual\s0 .sp 2 .i Edited by Jon Rhein, Greg Kemnitz and The \*(PP Group EECS Dept. University of California, Berkeley .r .)l .sp 3 .\"--------------------------------------------------------------------------- .sh 1 "OVERVIEW" .lp This document is the user manual for the \*(PP database system under development at the University of California, Berkeley. This project, led by Professor Michael Stonebraker, is sponsored by the Defense Advanced Research Projects Agency (DARPA), the Army Research Office (ARO), the National Science Foundation (NSF), and ESL, Inc. .\"------------------------------------ .sh 2 "DISTRIBUTION" .lp This manual describes Version \*(PV of \*(PP. The \*(PP Group has compiled and tested Version \*(PV on the following platforms: .TS center tab(|); c | c l | l . architecture|operating system = DECstation (MIPS)|ULTRIX V4.2 SPARC|SunOS 4.1.2 Sequent Symmetry (386)|DYNIX V3.0 .TE .\"------------------------------------ .sh 2 "PERFORMANCE" .lp Version \*(PV has been tuned modestly. On the Wisconsin benchmark, one should expect performance about twice that of the public domain, University of California version of \*(II, a relational prototype from the late 1970's. .\"------------------------------------ .sh 2 "ACKNOWLEDGEMENTS" .lp \*(PP has been constructed by a team of undergraduate, graduate, and staff programmers. The contributors (in alphabetical order) consisted of James Bell, Jennifer Caetta, Jolly Chen, Ron Choi, Jeffrey Goh, Joey Hellerstein, Wei Hong, Anant Jhingran, Greg Kemnitz, Case Larsen, Jeff Meredith, Michael Olson, Lay-Peng Ong, Spyros Potamianos, Sunita Sarawagi and Cimarron Taylor. .lp For version \*(PV Jeff Meredith served as chief programmer and was responsible for overall coordination of the project and for individually implementing the .q "everything else" portion of the system. .lp The above implementation team contributed significantly to this manual, as did Claire Mosher, Chandra Ghosh, and Jim Frew. .\"--------------------------------------------------------------------------- .sh 1 "INTRODUCTION" .lp Traditional relational DBMSs support a data model consisting of a collection of named relations, each attribute of which has a specific type. In current commercial systems, possible types are floating point numbers, integers, character strings, money, and dates. It is commonly recognized that this model is inadequate for future data processing applications. .lp The relational model succeeded in replacing previous models in part because of its simplicity. The \*(PP data model offers substantial additional power by incorporating the following four additional basic constructs: .(l classes inheritance types functions .)l The \*(PP DBMS has been under construction since 1986. The initial concepts for the system were presented in [STON86] and the initial data model appeared in [ROWE87]. The first rule system that was implemented is discussed in [STON88] and the storage manager concepts are detailed in [STON87]. The first .q demo-ware was operational in 1987, and we released Version 1 of \*(PP to a few external users in June 1989. A critique of version 1 of \*(PP appears in [STON90]. Version 2 followed in June 1990, and it included a new rule system documented in [STON90B]. Version \*(PV, the current version of \*(PP, is about 200,000 lines of code in the C programming language. \*(PP is available free of charge, and is being used by approximately 200 sites around the world at this writing. .\"--------------------------------------------------------------------------- .sh 1 "ORGANIZATION" .lp This manual discusses the \*(PQ query language, including extensions such as user-defined types, operators, and both query language and programming language functions. Arrays of types and functions of an instance are discussed, as well as the \*(PP rule system. This manual concludes with a discussion on adding an operator class to \*(PP for use in access methods. .lp This manual describes the major concepts of the system and attempts to provide an accessible path into using the system. As such, it tries to give examples of the use of the major constructs, so a beginning user does not need to delve immediately into the Reference Manual. .\"--------------------------------------------------------------------------- .sh 1 "WHAT YOU SHOULD READ" .lp This manual is primarily intended to provide a broad overview of the system, as well as to illustrate how programmers would use functions to interact with the \*(PP .q backend. The \*(PP Reference Manual discusses additional aspects of the system, and provides full syntactic descriptions of every \*(PP and \*(PQ command in a format similar to that used in \*(UU .q "man pages." .lp If you are new to \*(PP, you should probably read this manual first, followed by the parts of the \*(PP Reference Manual necessary to build your application. In particular, you should read the section on \*(LP if you intend to build a client application around \*(PP, as this is not discussed at all in this manual. .\"--------------------------------------------------------------------------- .sh 1 "The \*(PQ Query Language" .lp \*(PQ is the query language used for interacting with \*(PP. Here, we give an overview of how to use \*(PQ to access data. In other sections, user extensions to \*(PQ will be discussed. .\"------------------------------------ .sh 2 "Creating a database" .lp Once \*(PP has been installed at your site by following the directions in the release notes, you can create a database named .cW foo using the following command: .(C % createdb foo .)C \*(PP allows you to create any number of databases at a given site and you automatically become the database administrator of the database just created. Database names must have an alphabetic first character and are limited to 16 characters in length. .lp Once you have constructed a database, there are four ways to interact with it: .ip \(bu You can run the \*(PP terminal monitor which allows you to interactively enter, edit, and execute commands in the query language \*(PQ. .ip \(bu You can interact with \*(PP from a C program by using the \*(LP library of subroutine and call facilities. This allows you to submit \*(PQ commands from C and get answers and status messages back to your program. This interface is discussed further in the \*(LP section of the Reference Manual. .ip \(bu You can use the .b "fast path" facility, which allows you to directly execute functions stored in the database. This facility is described in the Reference Manual under .q "Fast Path." .ip \(bu \*(PP is accessible from the \*(OF programming environment. \*(OF is a graphical user interface (GUI) toolkit that allows a user to build sophisticated DBMS-oriented applications. \*(OF is a separate research project described in a collection of reports [WANG88, SCHA90] and is not treated further in this manual. .lp The terminal monitor can be activated for the .cW foo database by typing the command\**: .(f \**You may first need to set the .cW POSTGRESHOME environment variable to the name of the \*(PP root directory at your site, if it is not the default .cW /usr/postgres . If the \*(PP you wish to access is on a remote host, then you will also need to set the .cW PGHOST environment variable to the name of the remote host. .)f .(C % monitor foo .)C (the .cW % '' `` is your \*(UU shell prompt.) You will be greeted by the following message: .(C Welcome to the C POSTGRES terminal monitor Go * .)C The .cW Go indicates the terminal monitor is listening to you and that you can type \*(PQ commands into a workspace maintained by the monitor. The monitor indicates it is listening by typing .cW * as a prompt. Printing the workspace can be performed by typing: .(C * \\p .)C and it can be passed to \*(PP for execution by typing: .(C * \\g .)C If you make a typing mistake, you can invoke the .cW vi text editor by typing: .(C * \\e .)C The workspace will be passed to the editor, and you have the full power of .cW vi to make any necessary changes. For more info on using .cW vi , type .(C % man vi .)C Once you exit .cW vi , your edited query will be in the monitor's query buffer and you can submit it to \*(PP by using the .cW \eg command described above. .lp To get out of the monitor and return to \*(UU, type .(C * \\q .)C and the monitor will respond: .(C I live to serve you. % .)C For a complete collection of monitor commands, see the manual page on .cW monitor in the \*(UU section of the Reference Manual. .lp If you are the database administrator for the database .cW foo , you can destroy it using the following \*(UU command: .(C % destroydb foo .)C Other DBA commands include .cW createuser and .cW destroyuser , which are discussed further in the \*(UU section of the Reference Manual. .\"------------------------------------ .sh 2 "Classes and the Query Language \*(PQ" .\"----------------- .sh 3 "Basic Capabilities" .lp The fundamental notion in \*(PP is that of a .b class, which is a named collection of instances of objects. Each instance has the same collection of named attributes, and each attribute is of a specific type. Furthermore, each instance has an installation-wide unique (never-changing) .b "object identifier" or .b oid . .\"----------------- .sh 3 "Creating a New Class" .lp (In order to try out the following \*(PQ examples, create the .cW foo database as described in the previous section, and start the terminal monitor.) .lp A user can create a new class by specifying the class name, along with all attribute names and their types: .(C * create EMP (name = text, salary = int4, age = int4, dept = char16) \\g * create DEPT (dname = char16, floor = int4, manager = text) \\g .)C The \*(PQ base types used above are a variable-length array of printable characters .cW text ), ( a 4-byte signed integer .cW int4 ), ( and a fixed-length array of 16 characters .cW char16 .)\** ( .(f \**See .q "Built-In Types" in the Reference Manual. .)f Spaces, tabs and newlines may be used freely in \*(PQ queries. .lp So far, the .cW create command looks exactly like the create statement in a traditional relational system. However, we will presently see that classes have properties that are extensions of the relational model, so we use a different word to describe them. .\"----------------- .sh 3 "Populating a Class with Instances" .ds xX \n($1.\n($2.\n($3 .lp To populate a class with instances, one can use the .cW append command: .(C * append EMP (name = "Joe", salary = 1400, age = 40, dept = "shoe") \\g * append EMP (name = "Sam", salary = 1200, age = 29, dept = "toy") \\g * append EMP (name = "Bill", salary = 1600, age = 36, dept = "candy") \\g .)C This will add 3 instances to .cW EMP , one for each .cW append command. .\"----------------- .sh 3 "Querying a Class" .lp The .cW EMP class can be queried with normal selection and projection queries. For example, to find the employees under 35 years of age, one would type: .(C * retrieve (EMP.name) where EMP.age < 35 \\g .)C and the output would be: .(T .TS allbox; l. name Sam .TE .)T Notice that parentheses are required around the .b "target list" of returned attributes (e.g., .cW EMP.name .) .lp \*(PQ allows you to return computations in the target list as long as they are given a name (e.g., .cW result ): .(C * retrieve (result = EMP.salary / EMP.age) where EMP.name = "Bill" \\g .)C .\"----------------- .sh 3 "Redirecting retrieve queries" .lp Any retrieve query can be redirected to a new class in the database, and arbitrary boolean operators .cW and , ( .cW or , .cW not ) are allowed in the qualification of any query: .(C * retrieve into temp (EMP.name) where EMP.age < 35 and EMP.salary > 1000 \\g .)C .\"----------------- .sh 3 "Joins" .lp To find the names of employees which are the same age, one could write: .(C * retrieve (E1.name, E2.name) from E1 in EMP, E2 in EMP where E1.age = E2.age and E1.name != E2.name \\g .)C In this case both E1 and E2 are .b surrogates for an instance of the class .cW EMP , and both range over all instances of the class. A \*(PQ query can contain an arbitrary number of class names and surrogates.\** .(f \**The semantics of such a join are that the qualification is a truth expression defined for the Cartesian product of the classes indicated in the query. For those instances in the Cartesian product for which the qualification is true, \*(PP must compute and return the target list. .)f .\"----------------- .sh 3 "Updates" .lp Updates are accomplished in \*(PQ using the .cW replace command: .(C * replace EMP (salary = E.salary) from E in EMP where EMP.name = "Joe" and E.name = "Sam" \\g .)C This command replaces the salary of Joe by that of Sam. .\"----------------- .sh 3 "Deletions" .lp Deletions are done using the .cW delete command: .(C * delete EMP where EMP.salary > 0 \\g .)C Since all employees have positive salaries, this command will leave the .cW EMP class empty. .\"----------------- .sh 3 "Arrays" .lp \*(PP supports both fixed-length and variable-length one-dimensional arrays. To illustrate their use, we first create a class with an array type. .(C * create SAL_EMP (name = char[], pay_by_quarter = int4[4]) \\g .)C The above query will create a class named .cW SAL_EMP with a variable-length array of .cW text strings .cW name ), ( and an array of 4 .cW int4 integers .cW pay_by_quarter ), ( which represents the employee's salary by quarter. Now we do some .cW append s; note that when appending to a non-character array, we enclose the values within braces and separate them by commas. .(C * append SAL_EMP (name = "bill", pay_by_quarter = "{10000, 10000, 10000, 10000}") \\g * append SAL_EMP (name = "jack", pay_by_quarter = "{10000, 15000, 15000, 15000}") \\g * append SAL_EMP (name = "joe", pay_by_quarter = "{20000, 25000, 25000, 25000}") \\g .)C \*(PP uses the FORTRAN numbering convention for arrays\(emthat is, \*(PP arrays start with array[1] and end with array[n]. .lp Now, we can run some queries on .cW SAL_EMP . This query retrieves the names of the employees whose pay changed in the second quarter: .(C * retrieve (SAL_EMP.name) where SAL_EMP.pay_by_quarter[1] != SAL_EMP.pay_by_quarter[2] \\g .)C This query retrieves the third quarter pay of all employees: .(C * retrieve (SAL_EMP.pay_by_quarter[3]) \\g .)C This query deletes everyone from .cW SAL_EMP whose name begins with the letter .q j. .cW SAL_EMP should now contain only the employee named .q bill : .(C * delete SAL_EMP where SAL_EMP.name[1] = 'j' \\g .)C Let's make sure (note that the attribute .cW all may be used as a shorthand for all attributes of a class): .(C * retrieve (SAL_EMP.all) \\g .TS allbox tab(|); l l. name|pay_by_quarter bill|{10000,10000,10000,10000} .TE .)C \*(PP supports arrays of base and user-defined types, as well as .q "arrays of arrays," as in the following example: .(C * create manager (name = char16, employees = text[]) \\g * append manager (name = "mike", employees = "{"wei", "greg", "jeff"}") \\g * append manager (name = "alice", employees = "{"bill", "joe"}") \\g * append manager (name = "marge", employees = "{"mike", "alice"}") \\g .)C This creates a class .cW manager , and provides a list of employees. .\"------------------------------------ .sh 2 "Advanced \*(PQ" .lp Now we have covered the basics of using \*(PQ to access your data. In this section we will discuss those features of \*(PP which distinguish it from other data managers, such as inheritance and time travel. In the next section we will cover how the user can extend the query language via query language functions and composite objects, as well as additional extensions to \*(PP using user defined types, operators, and programming language functions. .\"----------------- .sh 3 "Inheritance" .lp First, re-populate the .cW EMP class by repeating the .cW append commands in section \*(xX. Then, create a second class .cW STUD_EMP , and populate it as follows: .(C * create STUD_EMP (location = point) inherits (EMP) \\g * append STUD_EMP (name = "Sunita", salary = 1300, age = 41, dept = "electronics", location = "(3, 5)") \\g .)C In this case, an instance of .cW STUD_EMP .b inherits all data fields .cW name , ( .cW salary , .cW age , and .cW dept ) from its parent, .cW EMP . Furthermore, student employees have an extra field, .cW location , that shows their address as a coordinate pair. In \*(PP, a class can inherit from zero or more other classes,\** .(f \**i.e., the inheritance hierarchy is a directed acyclic graph. .)f and a query can reference either all instances of a class or all instances of a class plus all of its descendants. For example, the following query finds the employees over 39: .(C * retrieve (E.name) from E in EMP where E.age > 39 \\g .)C On the other hand, to find the names of all employees, including student employees, over age 40, the query is: .(C * retrieve (E.name) from E in EMP* where E.age > 39 \\g .)C which returns: .(T .TS allbox; l. name Joe Sunita .TE .)T Here the .cW * after .cW EMP indicates that the query should be run over .cW EMP and all classes below .cW EMP in the inheritance hierarchy. .lp Note that .cW location in .cW STUD_EMP is not a traditional relational data type. As we will see later, \*(PP can be customized with an arbitrary number of user-defined data types. .\"----------------- .sh 3 "Time Travel" .lp \*(PP supports the notion of .b "time travel" . This feature allows a user to run historical queries. For example, to find Sam's current salary, one would query: .(C * retrieve (E.salary) from E in EMP["now"] where E.name = "Sam" \\g .)C \*(PP will automatically find the version of Sam's record valid at the correct time and get the appropriate salary. .lp One can also give a time .b range . For example to see all the salaries that Sam has ever earned, one would query: .(C * retrieve (E.salary) from E in EMP["Jan 1 00:00:00 1970 GMT", "now"] where E.name = "Sam" \\g .)C If you have executed all of the examples so far, then the above query returns: .(T .TS allbox; l. salary 1200 1200 .TE .)T There are two salaries for Sam, since he was deleted from and then re-appended to the .cW EMP class. .lp The default beginning of a time range is the origin of the system clock (which just happens to be .cW "Jan 1 00:00:00 1970 GMT" '' `` on \*(UU systems), and the default end is the current time; thus, the above time range can be abbreviated as .cW [,] .'' `` .\"--------------------------------------------------------------------------- .sh 1 "User Extensions to \*(PQ" .lp Here, we will discuss user extensions to the \*(PQ query language, query language functions, composite types, and user defined types, functions and operators. .\"------------------------------------ .sh 2 "User Defined \*(PQ Functions" .lp \*(PQ provides two types of functions: .b "query language functions" (functions written in \*(PQ) and .b "programming language functions" (functions written in a separately-compiled programming language such as C.) In this section we will cover \*(PQ functions; programming language functions will be covered below with the discussion on user-defined types. .lp Any collection of commands in the \*(PQ query language can be packaged together and defined as a function, usually returning either a set of instances or a set of base types. For example, the following function .cW high_pay returns all employees in class .cW EMP whose salaries exceed 50,000: .(C * define function high_pay (language = "postquel", returntype = setof EMP) as retrieve (EMP.all) where EMP.salary > 50000 \\g .)C \*(PQ functions can also have parameters. The following function .cW large_pay allows the threshold salary to be specified as an argument: .(C * define function large_pay (language = "postquel", returntype = setof EMP) arg is (int4) as retrieve (EMP.all) where EMP.salary > $1 \\g .)C In addition to their obvious utility as .q aliases for commonly-used queries, \*(PQ functions are useful for creating composite types, as described below. .\"------------------------------------ .sh 2 "Composite Types" .lp Since \*(PQ functions return instances or sets of instances, they are the mechanism used to assign values to composite types. For example, consider extending the .cW EMP class with a .cW manager field. That is, for each instance of .cW EMP , we want to associate another instance of .cW EMP corresponding to the manager of the first instance. Specifically, we will define a \*(PQ function .cW manager : .(C * define function manager (language = "postquel", returntype = EMP) arg is (EMP) as "retrieve (E.all) from E in EMP where E.name = DEPT.manager and DEPT.name = $1.dept" \\g .)C The function .cW manager takes an instance as its only argument, so \*(PQ allows referencing into it with the use of the nested dot notation. Whenever such a function is defined over a class, a user can utilize the cascaded dot notation to reference into (i.e. access the fields of) the objects returned by the function. .lp The following query finds all the employees who work for Joe: .(C * retrieve (EMP.name) where EMP.manager.name = "Joe" \\g .)C This is exactly equivalent to: .(C * retrieve (EMP.name) where name(manager(EMP)) = "Joe" \\g .)C Here, we have essentially added an attribute to the .cW EMP class which is of type .cW EMP , i.e. it has a value which is an instance of the class .cW EMP . Since the value of .cW manager has a record-oriented structure, we call it a .b "composite object" . Consequently, the user can think of the function .cW manager as an attribute of .cW EMP and can reference it just like any other attribute, with the following two exceptions. First, one cannot do direct .cW append s\(emthat is, .(C * append emp (emp.manager.name = "Smith") \\g .)C .b won't work. Non-projected retrieves will also be rejected, i.e.: .(C * retrieve (emp.manager) \\g .)C will result in a warning from the \*(PQ language parser. .lp Note that .cW manager is defined as returning a single instance of .cW EMP. We can also write a \*(PQ function that returns sets of instances. For example, consider the function .(C * define function children (language = "postquel", returntype = setof KIDS) arg is (EMP) as "retrieve (KIDS.all) where $1.name = KIDS.dad or $1.name = KIDS.mom"\\g .)C The .cW children function is defined as returning a set of instances, rather than a single instance. Given the query .(C * retrieve(emp.name, emp.children.name) .)C if the query in the body of the .cW children function returns many instances, the retrieve query will return all of them, in a .q flattened form. If the query in the body of .cW manager returns more than one instance, the .cW manager function will return only one instance, arbitrarily chosen from the set returned by the query in the function's body. See the \*(PP Reference Manual's entry on the .cW "define function" command for further details and examples. .\"--------------------------------------------------------------------------- .sh 1 "User Defined Types, Operators, and Programming Language Functions" .lp The central concept of extending \*(PP lies in \*(PP's ability to .b "dynamically load" a binary object file created by the user. This allows \*(PP to call arbitrary user functions which can be written in a standard programming language. These functions can then be used: .ip \(bu to convert between .b internal (binary) and .b external (character string) representations of user-defined types; .ip \(bu as operators; and .ip \(bu to define ordering for indices on user-defined types. .lp \*(PP's concept of types includes .b built-in types and .b user-defined types. Built-in types are those required by the system to bootstrap itself. User-defined types are those created by the user in the manner described below. There is no intrinsic performance difference between using a system type or user-defined type, other than the overhead due to the complexity of the type itself. .\"------------------------------------ .sh 2 "Internal storage of types" .lp Internally, \*(PP regards a user-defined type as a .q "blob of memory" upon which user-defined functions impose structure and meaning. \*(PP will store and retrieve the data from disk and use user-defined functions to input, process, and output the data. .\"------------------------------------ .sh 2 "Functions needed for a user-defined type" .lp A completely defined user type requires the following user-defined functions: .ip \(bu .b input and .b output functions for the type: These functions determine how the type appears in strings (for input by the user and output to the user) and how the type is organized in memory. These at least are necessary to define the type. .ip \(bu .b operator functions for the type: These functions define the meanings of .q equal, .q "less than," .q "greater than," etc., for your type. .\"------------------------------------ .sh 2 "An Example User Defined Type" .lp In this discussion, we will be defining a .cW circle type, using functions written in the C programming language. .\"----------------- .sh 3 "Data structures for our type" .lp Before we do anything, we have to decide on what a circle looks like, both in string format and internally in memory. Circles have a center and a radius, so a reasonable string representation of a circle would be an ordered triple: .(l (center_x, center_y, radius) .)l where each element is a real number with arbitrary units, e.g.: .(C (5.0, 10.3, 3) .)C This is what the input to the circle input function looks like, and what the output from the circle output function looks like. .lp Now we have to come up with an internal representation for a circle in memory. The following declarations are legal and reasonable given the format we chose above: .(C typedef struct { double x, y; } POINT; typedef struct { POINT center; double r; } CIRCLE; .)C Memory containing values of type .cW CIRCLE will be written to disk and read from disk, so .cW CIRCLE must be both .b complete and .b contiguous ; that is, it cannot contain any pointers. The alternate declaration .(C typedef struct { POINT *center double r; } CIRCLE; .)C will .b NOT work, because only the address stored in .cW center would be written to disk, not the .cW POINT structure that .cW center presumably points to. \*(PP cannot detect this kind of coding error; you must guard against it yourself. .\"----------------- .sh 3 "Defining the input and output functions for our type" .lp Suppose in defining our type .q circle, we have a C source file called .cW circle.c , and a corresponding object code file .cW /usr/postgres/tutorial/circle.o . (All functions related to our .cW circle type have to be in the same object file.) For the sake of argument, suppose we our platform is a DECstation, where sizeof(double) is 8 bytes (this will be important later). .lp We will create source file .cW circle.c , containing C source code for the functions that support our .cW CIRCLE type. .cW circle.c contains three functions: .ip \(bu .cW circle_in , which is the input function for circles. It takes a string as an argument and returns a pointer to a .cW CIRCLE . .ip \(bu .cW circle_out , which is the output function for circles. It is takes a pointer to s .cW CIRCLE as input and returns a string. .ip The return value of .cW circle_in must be a legal argument to .cW circle_out , and vice versa. .ip \(bu .cW eq_area_circle , which is the equality function for circles. For the purposes of this discussion, circles are equal if their areas are equal. .lp The contents of .cW circle.c are: .(C #include #include #include #include "tmp/c.h" /* (always) */ #include "utils/geo-decls.h" /* for POINT declaration */ #include "utils/palloc.h" /* for palloc() declaration */ typedef struct { POINT center; double radius; } CIRCLE; #define LDELIM '(' #define RDELIM ')' #define NARGS 3 CIRCLE * circle_in(str) char *str; { char *p, *coord[NARGS]; int i; CIRCLE *result; if (str == NULL) return(NULL); for (i = 0, p = str; *p && i < NARGS && *p != RDELIM; p++) { if (*p == ',' || (*p == LDELIM && !i)) coord[i++] = p + 1; } if (i < NARGS - 1) return(NULL); result = (CIRCLE *) palloc(sizeof(CIRCLE)); result->center.x = atof(coord[0]); result->center.y = atof(coord[1]); result->radius = atof(coord[2]); return(result); } char * circle_out(circle) CIRCLE *circle; { char *result; if (circle == NULL) return(NULL); result = (char *) palloc(60); sprintf(result, "(%g, %g, %g)", circle->center.x, circle->center.y, circle->radius); return(result); } int eq_area_circle(circle1, circle2) CIRCLE *circle1, *circle2; { return(circle1->radius == circle2->radius); } .)C .lp Now that we have written these functions and compiled the source file,\** .(f \**You will need to supply an option like .cW -I$POSTGRESHOME/src/lib/H to your C compiler so it can find the \*(PP .cW .h '' `` files. Also, various platform-specific compiler options may be required to support \*(PP dynamic linking (for example, the DECstation ULTRIX compiler requires the .cW "-G0" '' `` option.) See .q "define function" in the Reference Manual for details. .)f we have to let \*(PP know that they exist. First, we run the following queries to define the input and output functions. These functions must be defined .b before we define the type. \*(PP will notify you that return type circle is not defined yet, but this is OK\**: .(f \**By default, user-defined C functions use addresses instead of values for all but .q small (<= 4-byte) argument and return types, so we can use the \*(PQ type .cW char16 as a placeholder for the C type .cW "char *" . .)f .(C * define function circle_in (language = "c", returntype = circle) arg is (char16) as "/usr/postgres/tutorial/circle.o" \\g * define function circle_out (language = "c", returntype = char16) arg is (circle) as "/usr/postgres/tutorial/circle.o" \\g .)C Note that the full pathname of the object code file must be specified, so you would change .cW /usr/postgres/tutorial to whatever is appropriate for your installation. .lp Now we can define the .cW circle type: .(C * define type circle (internallength = 24, input = circle_in, output = circle_out) \\g .)C where .cW internallength is the size of the .cW CIRCLE structure in bytes. For circles, the type members are three .cW double s, which on most platforms are 8 bytes each, with no additional alignment constraints. However, when defining your own types, you should .b not make assumptions about structure sizes, but instead write a test program that does a .(C printf("size is %d\en", sizeof (MYTYPE)); .)C on your type. .lp If .cW internallength is defined incorrectly, you will encounter strange errors which may crash the data manager itself. If this were to happen with our .cW CIRCLE type, we would have to do a .(C * remove type circle \\g .)C and then redefine the .cW circle type correctly. Note that we would .b not have to redefine our functions, since their behavior would not have changed. .\"----------------- .sh 3 "Defining an operator for our type" .lp Now that we have finished defining the .cW circle type, we can .cW create classes with circles in them, .cW append records to them with circles defined, and .cW retrieve the values of the entire list of records. But we can do nothing else until we have some circle operators. To do this, we make use of the concept of .b "operator overloading" , and in this case we will set the \*(PP equality operator .cW = '' `` to work for circles. First we have to tell \*(PP that our circle equality function exists: .(C * define function eq_area_circle (language = "c", returntype = bool) arg is (circle, circle) as "/usr/postgres/tutorial/circle.o" \\g .)C We will now bind this function to the equality symbol with the following query: .(C * define operator = (arg1 = circle, arg2 = circle, procedure = eq_area_circle) \\g .)C .\"----------------- .sh 3 "Using our type" .lp Let's create a class .cW tutorial that contains a .cW circle attribute, and run some queries against it: .(C * create tutorial(a = circle) \\g * append tutorial (a = "(1.0, 1.0, 10.0)"::circle) \\g * append tutorial (a = "(2.0, 2.0, 5.0)"::circle) \\g * append tutorial (a = "(0.0, 1.8, 10.0)"::circle) \\g * retrieve (tutorial.all) where tutorial.a = "(0.0, 0.0, 10.0)"::circle \\g .)C which returns: .(T .TS allbox; l. a (1.0, 1.0, 10.0) (0.0, 1.8, 10.0) .TE .)T Recall that we defined circles as being equal if their areas were equal. .lp Other operators (less than, greater than, etc.) can be defined in a similar way. Note that the .cW = '' `` symbol will still work for other types\(emit has merely had a new type added to the list of types it works on. Any string of .q "punctuation characters" other than brackets, braces, or parentheses can be used in defining an operator. .\"------------------------------------ .sh 2 "Additional info on creating a user-defined function" .\"----------------- .sh 3 "Use palloc and not malloc" .lp In order for \*(PP to correctly manage memory associated with processing your type, you must use the memory allocator .cW palloc and avoid standard \*(UU memory managers such as .cW malloc . If you do not, \*(PP will chew up ever increasing amounts of memory. .cW palloc has the same arguments as .cW malloc , that is .(C char *palloc(size) unsigned long size; .)C To free memory allocated with .cW palloc , use .cW pfree , which is analogous to the \*(UU library function .cW free : .(C void pfree(ptr) char *ptr; .)C .\"----------------- .sh 3 "Re-loading user functions" .lp In the process of creating a user-defined type, you may find it necessary to re-load a function in the course of debugging. This is .b not done automatically when you edit or re-compile the file, but .b is done if you quit and restart the data manager. .lp We would re-load our example functions by using the following command: .(C * load "/usr/postgres/tutorial/circle.o" \\g .)C .\"----------------- .sh 3 "Writing a Function of an Instance" .lp We've already discussed user functions which take \*(PP base or user defined types as arguments; in this section, we will discuss inheritable C functions or methods. .lp C language methods are useful particularly when we want to make a function .b inheritable ; that is, to have the function process every instance in an inheritance hierarchy of classes. .lp In using a function of an instance in qualifying an instance, \*(PP defines the .q "current instance" to be the instance being qualified at the moment your function is called. The instance itself will be passed in your function's parameter list as an opaque structure of type TUPLE, and you will use \*(PP library routines to access the data in the object as described below.\** .(f \**In \*(PP \*(PV, .cW TUPLE is defined as .cW "void *" . .)f .cW .lp Suppose we want to write a function to answer the query .(C * retrieve (EMP.all) where overpaid(EMP) \\g .)C In the query above, a reasonable .cW overpaid function might be: .(C bool overpaid(t) TUPLE t; /* the current instance */ { extern char *GetAttributeByName(); short salary, seniority, performance; salary = (short) GetAttributeByName(t, "salary"); seniority = (short) GetAttributeByName(t, "seniority"); performance = (short) GetAttributeByName(t, "performance"); return (salary > (seniority * performance)); } .)C .cW GetAttributeByName is the \*(PP system function that returns attributes out of the current instance. It has two arguments: the argument of type TUPLE passed into the function, and the name of the desired attribute. .cW GetAttributeByName will align data properly so you can cast its return value to the desired type. For example, if you have an attribute .cW name which is of the \*(PQ type .cW char16 , the .cW GetAttributeByName call would look like: .(C char *str; \&... str = (char *) GetAttributeByName(t, "name") .)C .lp To let \*(PP know about the .cW overpaid function, do: .(C * define function overpaid (language = "c", returntype = bool) arg is (EMP) as "/usr/postgres/tutorial/overpaid.o" \\g .)C .lp You can have additional complex, base or user-defined types as arguments to the inheritable function. Thus, .(C * retrieve (EMP.all) where overpaid2(EMP, DEPT, "bill", 8) \\g .)C could be written, and .cW overpaid2 would be declared: .(C bool overpaid2(emp, dept, name, number) TUPLE emp, dept; char *name; long number; .)C .\"------------------------------------ .sh 2 "Arrays of types" .lp As discussed above, \*(PP fully supports arrays of base types. Additionally, \*(PP supports arrays of user-defined types as well. When you define a type, \*(PP .b automatically provides support for arrays of that type. .\"----------------- .sh 3 "Arrays of user-defined types" .lp Using the .q circle example discussed above, we will create a class containing an array of circles: .(C * create circles (list = circle[]) \\g .)C and do some appends .(C * append circles (list = "{"(1.0, 1.0, 5.0)", "(2.0, 2.0, 10.0)"}") \\g * append circles (list = "{"(2.0, 3.0, 15.0)", "(2.0, 2.0, 10.0)"}") \\g * append circles (list = "{"(2.0, 3.0, 4.0)"}") \\g .)C We can now run queries like: .(C * retrieve (circles.list[1]) \\g .)C which returns the first element of each .cW list : .(T .TS allbox; l. list (1, 1, 5) (2, 3, 4) .TE .)T and .(C * retrieve (circles.all) where circles.list[1] = "(0.0, 0.0, 4.0)" \\g .)C which returns: .(T .TS allbox; l. list {"(2, 3, 4)"} .TE .)T Note the .cW {} s, indicating that an array has been retrieved, as opposed to a single element. .\"----------------- .sh 3 "Defining a new array type" .lp An array may be defined as an element of a class, as shown above, or it may be defined as a type in and of itself. This is useful for defining .b "arrays of arrays" . .lp The special built-in functions .cW array_in and .cW array_out are used by \*(PP to input and output arrays of any existing type. Here, we define an array of integers: .(C * define type int_array (element = int4, internallength = variable, input = array_in, output = array_out) \\g .)C The .cW element parameter indicates that this is an array, and setting .cW internallength to .cW variable indicates that the array is a variable-length attribute.\** .(f \**Note that any type using .cW array_in and .cW array_out .b must be variable-length. .)f .lp We can use our type defined above to create an array of integer arrays: .(C * define type int_arrays (element = int_array, internallength = variable, input = array_in, output = array_out) \\g * create stuff (a = int_arrays) \\g * append stuff (a = "{{1, 2, 3} , {4, 5}, {6, 7, 8}}") \\g * append stuff (a = "{{88, 99, 3}}") \\g * append stuff (a = "{{5, 4, 3} , {2, 2}}") \\g * retrieve (stuff.a[1]) where stuff.a[1][1] < stuff.a[1][2] \\g * retrieve (stuff.a) where stuff.a[3][1] < stuff.a[1][2] \\g * retrieve (s.all) from s in stuff where s.a[2][2] = stuff.a[1][1] \\g .)C We can also define operators for equality, less than, greater than, etc. which operate on our new array type as necessary. .\"----------------- .sh 3 "Creating an array type from scratch" .lp There are many situations in which the above scheme for creating an array type is inappropriate, particularly when it is necessary to define a fixed-length array. In this section, we will create an array of four longs called .cW quarterly , and a variable-length array of longs called .cW stats .\** .(f We assume .cW sizeof(long) == 4. .)f .lp The only special things we need to know when writing the input and output functions for .cW quarterly is that \*(PP will pass a .q simple (i.e. fixed-length) array of .cW long s to the output function and expect a simple array of .cW long s in return from the input function. A simple array suitable for .cW quarterly can be declared: .(C long quarterly[4]; .)C For the variable-length array .cW stats , the situation is a little more complicated. Because \*(PP will not know in advance how big the array is, \*(PP will expect the length of the array (in bytes) to be encoded in the first four bytes of the memory which contains the array. The expected structure is: .(C typedef struct { long length; unsigned char bytes[1]; /* Force contiguity */ } VAR_LEN_ATTR; .)C The input function for the .cW stats array will look something like: .(C VAR_LEN_ATTR * stats_in(s) char s; { VAR_LEN_ATTR *stats; long array_size, *arrayp, nbytes; /* * nbytes is the total number of bytes in stats, * INCLUDING the byte count at the beginning */ nbytes = array_size * sizeof(long) + sizeof(long); stats = (VAR_LEN_ATTR *) palloc(nbytes); stats->length = nbytes; arrayp = &(stats->bytes[0]); /* * put code here that loads interesting stuff into * arrayp[0] .. arrayp[array_size] */ return(stats); } .)C The output function for .cW stats will get the same .cW VAR_LEN_ATTR structure. .lp Now, assuming the functions are in .cW /usr/postgres/tutorial/stats.c and .cW /usr/postgres/tutorial/quarterly.c , we can define our two arrays. First we will define the fixed-size array .cW quarterly .\** .(f .cW internallength == 16 follows from our assumption about .cW sizeof(long) . .)f .(C * define function quarterly_in (language = "c", returntype = quarterly) arg is (char16) as "/usr/postgres/tutorial/quarterly.o" \\g * define function quarterly_out (language = "c", returntype = char16) arg is (quarterly) as "/usr/postgres/tutorial/quarterly.o" \\g * define type quarterly (element = int4, internallength = 16, input = quarterly_in, output = quarterly_out) \\g .)C Now we define the .cW stats array: .(C * define function stats_in (language = "c", returntype = stats) arg is (char16) as "/usr/postgres/tutorial/stats.o" \\g * define function stats_out (language = "c", returntype = char16) arg is (stats) as "/usr/postgres/tutorial/stats.o" \\g * define type stats (element = int4, internallength = variable, input = stats_in, output = stats_out) \\g .)C Now we can run some queries: .(C * create test (a = quarterly, b = stats) \\g * append test (a = "1 2 3 4"::quarterly, b = "5 6 7"::stats) \\g * append test (a = "1 3 2 4"::quarterly, b = "6 4"::stats) \\g * append test (a = "7 11 6 9"::quarterly, b = "1 2"::stats) \\g * retrieve (test.all) where test.a[4] = test.b[2] \\g .)C which returns: .(T .TS tab(|) allbox; l l. a|b 1324|64 .TE .)T .b NOTE that when you use your own functions to input and output array types, .b "your function" will define how to parse the external (string) representation. The braces notation is only a convention used by .cW array_in and .cW array_out and is .b not part of the formal \*(PQ definition. .\"------------------------------------ .sh 2 "Large Object types" .lp The types discussed to this point are all .b small objects\(emthat is, they are smaller than 8 Kbytes\** .(f \**8 * 1,024 == 8,192 bytes .)f in size. If you require a larger type for something like a document retrieval system or for storing bitmaps, you will need to use the \*(PP .b "large object" interface. The interface to large objects is quite similar to the \*(UU file system interface. The particulars are detailed in Section 7 of the \*(PP Reference Manual, which you should have available to consult as you read the following. .\"----------------- .sh 3 "Defining a large object" .lp Just like any other type, a large object type requires input and output functions. For the purposes of this discussion, we assume that two functions, .cW large_in and .cW large_out have been written using the large object interface, and that the compiled functions are in .cW /usr/postgres/tutorial/large.o . We also presume that we are using the .q "file as an ADT" interface for large objects discussed in the Reference Manual. .lp We define a large object which could be used for storing map data: .(C * define function large_in (language = "c", returntype = map) arg is (char16) as "/usr/postgres/tutorial/large.o" \\g * define function large_out (language = "c", returntype = char16) arg is (map) as "/usr/postgres/tutorial/large.o" \\g * define type map (internallength = variable, input = large_in, output = large_out) \\g .)C Note that large objects are .b always variable-length. .lp Now we can use our .cW map object: .(C * create maps (name = text, a = map) \\g * append maps (name = "earth", a = "/usr/postgres/maps/earth") \\g * append maps (name = "moon", a = "/usr/postgres/maps/moon") \\g .)C Notice that the above queries are identical in syntax to those we have been using all along to define types and such; the fact that this type is a large object is completely hidden in the large object interface and \*(PP storage manager. .\"----------------- .sh 3 "Writing functions and operators for large object types" .lp Like any other \*(PP type, you can define functions and operators for large object types. The only caveat is that, like any other functions which process a large object, they .b must use the large object interface described in Section 7 of the \*(PP Reference Manual. Possible queries which involve functions on large objects could include .(C * retrieve (emp.name) where beard(emp.picture) = "red" \\g * retrieve (mountain.name) where height(mountain.topomap) > 10000 \\g .)C Because all functionality is available to large objects, .b any aspect of \*(PP is available for use with them, including index access methods, if the appropriate operator classes have been defined. Operator classes for index access methods will be discussed later in this manual. .\"--------------------------------------------------------------------------- .sh 1 "The \*(PP Rule System" .lp The discussion in this section is intended to provide an overview of the \*(PP rule system and point the user at helpful references and examples. \*(PP actually has two rule systems, the .b instance-level rule system and the .b "query rewrite" rule system. .\"------------------------------------ .sh 2 "The Instance-level Rule System" .lp The instance-level rule system uses markers placed in each instance in a class to .q trigger rules. Examples of the instance-level rule system are explained and illustrated in .cW $POSTGRESHOME/demo , which is included with the \*(PP distribution. Additional discussion of the instance-level rule system can be found in the Reference Manual under .cW "define rule" . The theoretical foundations of the \*(PP rule system can be found in [STON90]. .\"------------------------------------ .sh 2 "The Query Rewrite Rule System" .lp The query rewrite rule system modifies queries to take rules into consideration, and then passes the modified query to the query optimizer for execution. It is very powerful, and can be used for many things such as query language procedures, views, and versions. Examples and discussion can be found in the demo in .cW $POSTGRESHOME/video , and further discussion is in the Reference Manual under .cW "define rule" . The power of this rule system is discussed in [ONG90] and [STON90]. .\"------------------------------------ .sh 2 "When to use either?" .lp Since each rule system is architected quite differently, they work best in different situations. The query rewrite system is best when rules affect most of the instances in a class, while the instance-level system is best when a rule affects only a few instances. .\"--------------------------------------------------------------------------- .sh 1 "Administering \*(PP" .lp In this section, we will discuss aspects of \*(PP of interest to those making extensive use of \*(PP, or who are the database administrator for a group of \*(PP users. .\"------------------------------------ .sh 2 "User administration" .lp The .cW createuser and .cW destroyuser enable and disable access to \*(PP by specific users on the host system. Please read the descriptions of these commands in the Reference Manual for specifics on their use. .\"------------------------------------ .sh 2 "Moving database directories out of $POSTGRESHOME/data/base" .lp By default, all \*(PP databases are stored in separate subdirectories under .cW $POSTGRESHOME/data/base/ .\** .(f \**Data for certain classes may stored elsewhere if a non-standard storage manager was specified when they were created. .)f To move a particular data base to an alternate directory (e.g., on a filesystem with more free space), do the following: .ip \(bu Create the database (if it doesn't already exist) using the .b createdb command. In the following steps we will assume the database is named .cW foo . .ip \(bu Copy the directory .cW $POSTGRESHOME/data/base/foo and it contents to its ultimate destination. It should still be owned by the .cW postgres user. .ip \(bu Remove the directory .cW $POSTGRESHOME/data/base/foo . .ip \(bu Make a symbolic link in .cW $POSTGRESHOME/data/base to the new directory. .\"------------------------------------ .sh 2 "Troubleshooting \*(PP" .lp Occasionally, \*(PP will fail with cryptic error messages that are due to relatively simple problems. The following are a list of \*(PP error messages and the likely fix. These messages are ones you would likely see in the monitor program. .(C Message: semget: No space left on device Explanation and Likely Fix: .)C Either the kernel has not been configured for System V shared memory, or some other program is using it up. On most machines, the \*(UU command .cW ipcs will show shared memory and semaphore usage. .lp To delete all shared memory and semaphores (may be necessary if a backend fails), run the .cW ipcclean command. Note, however, that .cW ipcclean deletes .b all semaphores belonging to the user running it, so the user should be certain that none of his/her non-\*(PP processes are using semaphores before running this command. .(C Message: Unable to get shared buffers Explanation and Likely Fix: .)C This message means that a \*(PP backend was expecting shared memory to be available and it was not. Usually this is due to .cW ipcclean being run while a .cW postmaster was also running. .(C Message: Can't connect to the backend (...) Explanation and Likely Fix: .)C This message means that you are running a \*(LP application but it could not link up with a .cW postmaster . If you see this error message, you should see if a .cW postmaster is truly running. If one is running, the problem is likely related to your network. .\"--------------------------------------------------------------------------- .sh 1 "REFERENCES" .\"------------------------------------ .xP [ONG90] Ong, L. and Goh, J., ``A Unified Framework for Version Modeling Using Production Rules in a Database System," Electronics Research Laboratory, University of California, Berkeley, ERL Memo M90/33, April 1990. .\"------------------------------------ .xP [ROWE87] Rowe, L. and Stonebraker, M., ``The POSTGRES Data Model,'' Proc. 1987 VLDB Conference, Brighton, England, Sept. 1987. .\"------------------------------------ .xP [SCHA90] Shapiro, L., ``Join Processing in Database Systems with Large Main Memories,'' ACM-TODS, Sept. 1986. .\"------------------------------------ .xP [STON86] (missing) .\"------------------------------------ .xP [STON87] Stonebraker, M., ``The POSTGRES Storage System,'' Proc. 1987 VLDB Conference, Brighton, England, Sept. 1987. .\"------------------------------------ .xP [STON88] (missing) .\"------------------------------------ .xP [STON90] Stonebraker, M. et. al., ``On Rules, Procedures, Caching and Views in Database Systems,'' Proc. 1990 ACM-SIGMOD Conference on Management of Data, Atlantic City, N.J., June 1990. .\"------------------------------------ .xP [STON90B] (missing) .\"------------------------------------ .xP [WANG88] (missing) .\"--------------------------------------------------------------------------- .uh "APPENDIX: User defined types and indices" .lp In this section, we will discuss how to extend \*(PP to use a user-defined type and associated functions with existing access methods. This way, you can define a BTREE or RTREE index on your own type. To do this, we will discuss how to define a new operator class in \*(PP for use with an existing access method. .lp Our example will be to add a new operator class to the BTREE access method. The new operator class will sort integers in ascending absolute value order. This tutorial will describe how to define the operator class. If you work the example, you will be able to define and use indices that sort integer keys by absolute value. .lp There are several \*(PP system classes that are important in understanding how the access methods work. These will be discussed, and then a sample procedure for adding a new set of operators to an existing access method will be shown as an example. .lp The .cW pg_am class contains one instance for every user defined access method. Support for the HEAP access method is built into \*(PP, but every other access method is described here. The schema is .TS center tab(|); lf(C)|l. amname|name of the access method _ amowner|object id of the owner's instance in pg_user _ amkind|not used at present, but set to 'o' as a place holder _ amstrategies|number of strategies for this access method (see below) _ amsupport|number of support routines for this access method (see below) _ am*|T{ procedure identifiers for interface routines to the access method. For example, .cW regproc ids for opening, closing, and getting instances from the access method appear here. T} .TE The object ID of the instance in .cW pg_am is used as a foreign key in lots of other classes. For BTREES, this object ID is 403. You don't need to add a new instance to this class; all you're interested in is the object ID of the access method instance you want to extend: .(C * retrieve (pg_am.oid) where pg_am.amname = "btree" \\g .TS allbox; l. oid 403 .TE .)C The .cW amstrategies attribute exists to standardize comparisons across data types. For example, BTREES impose a strict ordering on keys, less to greater. Since \*(PP allows the user to define operators, \*(PP cannot in general look at the name of an operator (eg, .cW > , .cW < ) and tell what kind of comparison it is. In fact, some access methods (like rtrees) don't impose a less-to-greater ordering, but some other ordering, like containment. \*(PP needs some consistent way of taking a scan qualification, looking at the operator, deciding if a usable index exists, and rewriting the query qualification in order to improve access speeds. This implies that \*(PP needs to know, for example, that .cW <= and .cW > partition a BTREE. Strategies is the way that we do this. .lp Defining a new set of strategies is beyond the scope of this discussion, but how the BTREE strategies work will be explained, since you'll need to know that to add a new operator class. In the .cW pg_am class, the .cW amstrategies attribute is the number of strategies defined for this access method. For BTREES, this number is 5. These strategies correspond to .TS center tab(|); l|l. less than|1 _ less than or equal|2 _ equal|3 _ greater than or equal|4 _ greater than|5 .TE The idea is that you'll add procedures corresponding to the comparisons above to the .cW pg_amop relation (see below). The access method code can use these numbers, regardless of data type, to figure out how to partition the BTREE, compute selectivity, and so on. Don't worry about the details of adding procedures yet; just understand that there's a set of these for .cW int2 , .cW int4 , .cW oid , and every other data type on which a BTREE can operate. .lp Strategies are used by all of the \*(PP access methods. Some access methods require other support routines in order to work. For example, the BTREE access method must be able to compare two keys and determine whether one is greater than, equal to, or less than the other. Similarly, the RTREE access method must be able to compute intersections, unions, and sizes of rectangles. These operations do not correspond to user qualifications in \*(PQ queries; they are administrative routines used by the access methods, internally. .lp In order to manage diverse support routines consistently across all \*(PP access methods, .cW pg_am includes a field called .cW amsupport . This field records the number of support routines used by an access method. For BTREES, this number is one\(emthe routine to take two keys and return \(mi\^1, 0, or \(pl\^1, depending on whether the first key is less than, equal to, or greater than the second. .lp The .cW amstrategies entry in .cW pg_am is just the number of strategies defined for the access method in question. The procedures for less than, less equal, and so on don't appear in .cW pg_am . Similarly, .cW amsupport is just the number of support routines required by the access method. The actual routines are listed elsewhere. .lp The next class of interest is .cW pg_opclass. This class exists only to associate a name with an .cW oid . In .cW pg_amop , every BTREE operator class has a set of procedures, one through five, above. Some existing opclasses are .cW int2_ops , .cW int4_ops , and .cW oid_ops . You need to add an instance with your opclass name (for example, .cW int4_abs_ops ) to .cW pg_opclass . The .cW oid of this instance is a foreign key in other classes. .(C * append pg_opclass (opcname = "int4_abs_ops") \\g * retrieve (cl.oid, cl.opcname) from cl in pg_opclass where cl.opcname = "int4_abs_ops" \\g .TS tab(|) allbox; l l. oid|opcname 17314|int4_abs_ops .TE .)C .b NOTE: The .cW oid for your .cW pg_opclass instance .b "may be different" ! You should substitute your value for 17314 wherever it appears in this discussion. .lp So now we have an access method and an operator class. We still need a set of operators; the procedure for defining operators was discussed earlier in this manual. For the .cW int4_abs_ops operator class on BTREES, the operators we require are: .(l absolute value less-than absolute value less-than-or-equal absolute value equal absolute value greater-than-or-equal absolute value greater-than .)l Suppose the code that implements the functions defined is stored in the file .cW /usr/postgres/tutorial/int4_abs.c . The code is .(C /* * int4_abs.c -- absolute value comparison functions * for int4 data */ #include "tmp/c.h" #define ABS(a) a = ((a < 0) ? -a : a) bool int4_abs_lt(a, b) int32 a, b; { ABS(a); ABS(b); return (a < b); } bool int4_abs_le(a, b) int32 a, b; { ABS(a); ABS(b); return (a <= b); } bool int4_abs_eq(a, b) int32 a, b; { ABS(a); ABS(b); return (a == b); } bool int4_abs_ge(a, b) int32 a, b; { ABS(a); ABS(b); return (a >= b); } bool int4_abs_gt(a, b) int32 a, b; { ABS(a); ABS(b); return (a > b); } .)C There are a couple of important things that are happening below. First, note that operators for less, less equal, equal, greater equal, and greater for .cW int4 are being defined. All of these operators are already defined for .cW int4 under the names .cW < , .cW <= , .cW = , .cW >= , and .cW > . The new operators behave differently, of course. In order to guarantee that \*(PP uses these new operators rather than the old ones, they need to be named differently from the old ones. This is a key point: you can overload operators in \*(PP, but only if the operator isn't already defined for the argument types. .lp That is, if you have .cW < defined for .cW int4 , ( .cW int4 ), you can't define it again. \*(PP .b doesn't check this when you define your operator, so be careful. To avoid this problem, odd names will be used for the operators. If you get this wrong, the access methods are likely to crash when you try to do scans. .lp The other important point is that all the functions return .b boolean values; the access methods rely on this fact. .(C * define function int4_abs_lt (language = "c", returntype = bool) arg is (int4, int4) as "/usr/postgres/tutorial/int4_abs.o" \\g * define function int4_abs_le (language = "c", returntype = bool) arg is (int4, int4) as "/usr/postgres/tutorial/int4_abs.o" \\g * define function int4_abs_eq (language = "c", returntype = bool) arg is (int4, int4) as "/usr/postgres/tutorial/int4_abs.o" \\g * define function int4_abs_ge (language = "c", returntype = bool) arg is (int4, int4) as "/usr/postgres/tutorial/int4_abs.o" \\g * define function int4_abs_gt (language = "c", returntype = bool) arg is (int4, int4) as "/usr/postgres/tutorial/int4_abs.o" \\g .)C Now define the operators that use them. As noted, the operator names must be unique for two .cW int4 operands. You can do a query on .cW pg_operator : .(C * retrieve (pg_operator.all) \\g .)C to see if your name is taken for the types you want. The important things here are the procedure (which are the C functions defined above) and the restriction and join selectivity functions. You should just use the ones used below\(emnote that there are different such functions for the less-than, equal, and greater-than cases. These .b must be supplied, or the access method will die when it tries to use the operator. You should copy the names for .cW restrict and .cW join , but use the procedure names you defined in the last step. .(C * define operator <<& (arg1 = int4, arg2 = int4, procedure=int4_abs_lt, associativity = left, restrict = intltsel, join = intltjoinsel) \\g * define operator <=& (arg1 = int4, arg2 = int4, procedure = int4_abs_le, associativity = left, restrict = intltsel, join = intltjoinsel) \\g * define operator ==& (arg1 = int4, arg2 = int4, procedure = int4_abs_eq, associativity = left, restrict = eqsel, join = eqjoinsel) \\g * define operator >=& (arg1 = int4, arg2 = int4, procedure = int4_abs_ge, associativity = left, restrict = intgtsel, join = intgtjoinsel) \\g * define operator >>& (arg1 = int4, arg2 = int4, procedure = int4_abs_gt, associativity = left, restrict = intgtsel, join = intgtjoinsel) \\g .)C Notice that five operators corresponding to less, less equal, equal, greater, and greater equal are defined. .lp We're just about finished. the last thing we need to do is to update the .cW pg_amop relation. To do this, we need the following attributes: .TS center tab(|); lf(C)|l. amopid|T{ the .cW oid of the .cW pg_am instance for BTREE (== 400, see above) T} _ amopclaid|T{ the .cW oid of the .cW pg_opclass instance for .cW int4_abs_ops (== whatever you got instead of 17314, see above) T} _ amopopr|T{ the .cW oid s of the operators for the opclass (which we'll get in just a minute) T} _ T{ amopselect, .br amopnpages T}|cost functions. .TE The cost functions are used by the query optimizer to decide whether or not to use a given index in a scan. Fortunately, these already exist. The two functions we'll use are .cW btreesel, which estimates the selectivity of the btree, and .cW btreenpage, which estimates the number of pages a search will touch in the tree. .lp So we need the .cW oid s of the operators we just defined. We'll look up the names of all the operators that take two .cW int4 s, and pick ours out: .(C * retrieve (o.oid, o.oprname) from o in pg_operator, t in pg_type where o.oprleft = t.oid and o.oprright = t.oid and t.typname = "int4" \\g .)C which returns: .(T .TS tab(|) allbox; l l. oid|oprname 96|\\= 97|< 514|* 518|!= 521|> 523|<= 525|>= 528|/ 530|% 551|+ 555|- 17321|<<& 17322|<=& 17323|==* 17324|>=& 17325|>>& .TE .)T (Note that your .cW oid numbers may be different.) The operators we are interested in are those with .cW oid s 17321 through 17325. The values you get will probably be different, and you should substitute them for the values below. We can look at the operator names and pick out the ones we just added. (Of course, there are lots of other queries we could used to get the oids we wanted.) .lp Now we're ready to update .cW pg_amop with our new operator class. The most important thing in this entire discussion is that the operators are ordered, from less equal through greater equal, in .cW pg_amop . Recall that the BTREE instance's .cW oid is 400 and .cW int4_abs_ops is .cW oid 17314. Then we add the instances we need: .(C * append pg_amop (amopid = "400"::oid, /* btree oid */ amopclaid = "17314"::oid, /* pg_opclass tuple */ amopopr = "17321"::oid, /* <<& tup oid */ amopstrategy = "1"::int2, /* 1 is <<& */ amopselect = "btreesel"::regproc, amopnpages = "btreenpage"::regproc) \\g * append pg_amop (amopid = "400"::oid, amopclaid = "17314"::oid, amopopr = "17322"::oid, amopstrategy = "2"::int2, amopselect = "btreesel"::regproc, amopnpages = "btreenpage"::regproc) \\g * append pg_amop (amopid = "400"::oid, amopclaid = "17314"::oid, amopopr = "17323"::oid, amopstrategy = "3"::int2, amopselect = "btreesel"::regproc, amopnpages = "btreenpage"::regproc) \\g * append pg_amop (amopid = "400"::oid, amopclaid = "17314"::oid, amopopr = "17324"::oid, amopstrategy = "4"::int2, amopselect = "btreesel"::regproc, amopnpages = "btreenpage"::regproc) \\g * append pg_amop (amopid = "400"::oid, amopclaid = "17314"::oid, amopopr = "17325"::oid, amopstrategy = "5"::int2, amopselect = "btreesel"::regproc, amopnpages = "btreenpage"::regproc) \\g .)C NOTE the order: .q less is 1, .q "less equal" is 2, .q equal is 3, .q "greater equal" is 4, and .q greater is 5. .lp Okay, now it's time to test the new opclass. First we'll create and populate a class: .(C * create pairs (name = char16, number = int4) \\g * append pairs (name = "mike", number = -10000) \\g * append pairs (name = "greg", number = 3000) \\g * append pairs (name = "lay peng", number = 5000) \\g * append pairs (name = "jeff", number = -2000) \\g * append pairs (name = "mao", number = 7000) \\g * append pairs (name = "cimarron", number = -3000) \\g * retrieve (pairs.all) \\g .TS tab(|) allbox; l l. name|number mike|-10000 greg|3000 lay peng|5000 jeff|-2000 mao|7000 cimarron|-3000 .TE .)C Okay, looks pretty random. Define an index using the new opclass: .(C * define index pairsind on pairs using btree (number int4_abs_ops) \\g .)C Now run a query that doesn't use one of our new operators. What we're trying to do here is to run a query that .b won't use our index, so that we can tell the difference when we see a query that .b does use the index. This query won't use the index because the operator we use in the qualification isn't one that appears in the list of strategies for our index. .(C * retrieve (pairs.all) where pairs.number < 9000 \\g .TS tab(|) allbox; l l. name|number mike|-10000 greg|3000 lay peng|5000 jeff|-2000 mao|7000 cimarron|-3000 .TE .)C Yup, just as random; that didn't use the index. Okay, let's run a query that .b does use the index: .(C * retrieve (pairs.all) where pairs.number <<& 9000 \\g .TS tab(|) allbox; l l. name|number jeff|-2000 cimarron|-3000 greg|3000 lay peng|5000 mao|7000 .TE .)C Note that the .cW number values are in order of increasing absolute value (as they should be, since the index was used for this scan) and that we got the right answer\(emthe instance for .cW mike doesn't appear, because \(mi10000 >=& 9000.