.\".he '\*(dA'DRAFT'\*(tI'		\" comment out in production version
.\"========================================================
.\"------------------------------------
.\" /home1/frew/s2k/postgres/manual/RCS/manual.me,v 1.10 1992/07/07 17:44:44 frew Exp
.\"------------------------------------
.\" XXX standard disclaimer belongs here....
.\"---------------------------------------------------------------------------
.de cW					\" arg3arg1arg2, constant-width arg1
\&\\$3\\fC\\$1\\fP\\$2
.\"\&\\$3\\fC\\s-1\\$1\\s0\\fP\\$2
..
.\"------------------------------------
.de xP					\" ip for references
.ip \\$1 \\w'[STON90B]'u+2n
..
.\"------------------------------------
.de (P					\" prologue for constant-width block
.ft C
.ps -1
.vs -1
..
.\"-----------------
.de )P					\" epilogue for constant-width block
.vs +1
.ps +1
.ft P
..
.\"------------------------------------
.de (T					\" prologue for constant-width table
.(P
.in +\\n(biu
..
.\"-----------------
.de )T					\" epilogue for constant-width table
.in -\\n(biu
.)P
..
.\"------------------------------------
.de (C					\" begin constant-width list
.(l
.(P
..
.\"-----------------
.de )C					\" end constant-width list
.)P
.)l
..
.\"---------------------------------------------------------------------------
.    ds II \s-1INGRES\s0
.    ds PP \s-1POSTGRES\s0
.    ds UU \s-1UNIX\s0
.    ds PQ \s-1POSTQUEL\s0
.    ds LP \s-1LIBPQ\s0
.    ds PV \s-14.0\s0
.    ds OF \s-1PICASSO\s0
.\"------------------------------------
.ps 11
.vs 13
.\"-----------------
.nr pp \n(.s
.nr sp \n(.s+1				\" +1-pt section headers
.\"-----------------
.nr bs 1v
.nr ps 0.3v
.nr ss 1v
.\"-----------------
.fo ''\\s+2%\\s0''			\" +2-pt page numbers in center footers
.\"-----------------
.nr $i .5i				\" main text indented
.nr so -\n($i				\" section headers un-indented
.\"------------------------------------
.rm xX					\" scratch register
.\"------------------------------------
.\"!Gexpand
.\"---------------------------------------------------------------------------
.(l C
.b
\s+3The \*(PP User Manual\s0
.sp 2
.i
Edited by Jon Rhein, Greg Kemnitz and The \*(PP Group
EECS Dept.
University of California, Berkeley
.r
.)l
.sp 3
.\"---------------------------------------------------------------------------
.sh 1 "OVERVIEW"
.lp
This document is the user manual for the \*(PP database
system under development at the University of California,
Berkeley.
This project, led by Professor Michael Stonebraker, is
sponsored by the Defense Advanced Research Projects Agency
(DARPA), the Army Research Office (ARO), the National Science
Foundation (NSF), and ESL, Inc.
.\"------------------------------------
.sh 2 "DISTRIBUTION"
.lp
This manual describes Version \*(PV of \*(PP.
The \*(PP Group has compiled and tested Version \*(PV on the following
platforms:
.TS
center tab(|);
c | c
l | l .
architecture|operating system
=
DECstation (MIPS)|ULTRIX V4.2
SPARC|SunOS 4.1.2
Sequent Symmetry (386)|DYNIX V3.0
.TE
.\"------------------------------------
.sh 2 "PERFORMANCE"
.lp
Version \*(PV has been tuned modestly.
On the
Wisconsin benchmark, one should expect performance about twice
that of the public domain, University of California version of
\*(II, a relational prototype from the late 1970's.
.\"------------------------------------
.sh 2 "ACKNOWLEDGEMENTS"
.lp
\*(PP has been constructed by a team of undergraduate,
graduate, and staff programmers.
The contributors (in
alphabetical order) consisted of James Bell, Jennifer Caetta, Jolly Chen,
Ron Choi, Jeffrey Goh, Joey Hellerstein, Wei Hong,
Anant Jhingran, Greg Kemnitz, Case Larsen, Jeff Meredith, Michael Olson,
Lay-Peng Ong, Spyros Potamianos, Sunita Sarawagi and Cimarron Taylor.
.lp
For version \*(PV Jeff Meredith served as chief programmer and was
responsible for overall coordination of the project and for individually
implementing the
.q "everything else"
portion of the system.
.lp
The above implementation team contributed significantly to
this manual, as did Claire Mosher, Chandra Ghosh, and Jim Frew.
.\"---------------------------------------------------------------------------
.sh 1 "INTRODUCTION"
.lp
Traditional relational DBMSs support a data model consisting
of a collection of named relations, each attribute of which has a
specific type.
In current commercial systems, possible types are
floating point numbers, integers, character strings, money, and
dates.
It is commonly recognized that this model is inadequate
for future data processing applications.
.lp
The relational model succeeded in
replacing previous models in part because of its simplicity.
The \*(PP data model offers substantial
additional power by incorporating the following four additional basic
constructs:
.(l
classes
inheritance
types
functions
.)l
The \*(PP DBMS has been under construction since 1986.
The initial concepts for the system were presented in [STON86]
and the initial data model appeared in [ROWE87].
The first rule
system that was implemented is discussed in [STON88] and the
storage manager concepts are detailed in [STON87].
The first
.q demo-ware
was operational in 1987, and we released Version 1 of
\*(PP to a few external users in June 1989.
A critique of
version 1 of \*(PP appears in [STON90].
Version 2 followed in
June 1990, and it included a new rule system documented in
[STON90B].
Version \*(PV, the current version of \*(PP,
is about 200,000 lines of code in the C programming language.
\*(PP is available free of charge, and is being used by approximately
200 sites around the world at this writing.
.\"---------------------------------------------------------------------------
.sh 1 "ORGANIZATION"
.lp
This manual discusses the \*(PQ query language, including extensions such
as user-defined types, operators, and both query language and programming
language functions.
Arrays of types and functions of an instance are
discussed, as well as the \*(PP rule system.
This manual concludes with
a discussion on adding an operator class to \*(PP for use in access methods.
.lp
This manual describes the major
concepts of the system
and attempts to provide an accessible path into using the system.
As such, it tries to give examples of the use of the major
constructs, so a beginning user does not need to delve
immediately into the Reference Manual.
.\"---------------------------------------------------------------------------
.sh 1 "WHAT YOU SHOULD READ"
.lp
This manual is primarily intended to provide a broad overview of the system,
as well as to illustrate how programmers would use functions to interact with
the \*(PP
.q backend.
The \*(PP Reference Manual discusses additional aspects
of
the system, and provides full syntactic descriptions of every \*(PP and \*(PQ
command in a format similar to that used in \*(UU
.q "man pages."
.lp
If you are new to \*(PP, you should probably read this manual first, followed
by the parts of the \*(PP Reference Manual necessary to build your application.
In particular, you should read the section on \*(LP if you intend to build
a client application around \*(PP, as this is not discussed at all in this
manual.
.\"---------------------------------------------------------------------------
.sh 1 "The \*(PQ Query Language"
.lp
\*(PQ is the query language used for interacting with \*(PP.
Here, we
give an overview of how to use \*(PQ to access data.
In other sections,
user extensions to \*(PQ will be discussed.
.\"------------------------------------
.sh 2 "Creating a database"
.lp
Once \*(PP has been installed at your site by following
the directions in the release notes, you can create a database named
.cW foo
using the following command:
.(C
% createdb foo
.)C
\*(PP allows you to create any number of databases at a
given site and you automatically become the database
administrator of the database just created.
Database names must
have an alphabetic first character and are limited to 16
characters in length.
.lp
Once you have constructed a database, there are four ways to
interact with it:
.ip \(bu
You can run the \*(PP terminal
monitor which allows you to interactively enter, edit, and
execute commands in the query language \*(PQ.
.ip \(bu
You can
interact with \*(PP from a C program by using the \*(LP
library of subroutine and call facilities.
This allows you to submit
\*(PQ commands from C and get answers and status messages back
to your program.
This interface is discussed further in the
\*(LP section of the Reference Manual.
.ip \(bu
You can
use the
.b "fast path"
facility,
which
allows you to directly execute functions stored in the database.
This facility is described in the Reference Manual under
.q "Fast Path."
.ip \(bu
\*(PP is accessible from the \*(OF programming
environment.
\*(OF is a graphical user interface (GUI) toolkit
that allows a user to build sophisticated DBMS-oriented
applications.
\*(OF is a separate research project described
in a collection of reports
[WANG88, SCHA90] and is not treated further in this manual.
.lp
The terminal monitor can be activated for
the
.cW foo
database by
typing the command\**:
.(f
\**You may first need to set the
.cW POSTGRESHOME
environment variable to the name of the \*(PP root directory at your site,
if it is not the default
.cW /usr/postgres .
If the \*(PP you wish to access is on a remote host, then you will also need to
set the
.cW PGHOST
environment variable to the name of the remote host.
.)f
.(C
% monitor foo
.)C
(the
.cW % '' ``
is your \*(UU shell prompt.)
You will be greeted by the following message:
.(C
Welcome to the C POSTGRES terminal monitor

Go
*
.)C
The
.cW Go
indicates the terminal monitor is listening to you and
that you can type \*(PQ commands into a workspace maintained
by the monitor.
The monitor indicates it is listening by typing
.cW *
as a prompt.
Printing the workspace can be performed by
typing:
.(C
* \\p
.)C
and it can be passed to \*(PP for execution by typing:
.(C
* \\g
.)C
If you make a typing mistake, you can invoke the
.cW vi
text editor
by typing:
.(C
* \\e
.)C
The workspace will be passed to the editor, and you have the full
power of
.cW vi
to make any necessary changes.
For more info on using
.cW vi ,
type
.(C
% man vi
.)C
Once you exit
.cW vi ,
your edited query will be in the monitor's query
buffer and you can submit it to \*(PP by using the
.cW \eg
command
described above.
.lp
To get out of the monitor and return to \*(UU, type
.(C
* \\q
.)C
and the monitor will respond:
.(C
I live to serve you.
%
.)C
For a complete collection of monitor commands, see the manual page
on
.cW monitor
in the \*(UU section of the Reference Manual.
.lp
If you are the database administrator for the database
.cW foo ,
you
can destroy it using the following \*(UU command:
.(C
% destroydb foo
.)C
Other DBA commands include
.cW createuser
and
.cW destroyuser ,
which are discussed further in the \*(UU section of the Reference Manual.
.\"------------------------------------
.sh 2 "Classes and the Query Language \*(PQ"
.\"-----------------
.sh 3 "Basic Capabilities"
.lp
The fundamental notion in \*(PP is that of a
.b class,
which is a named collection of instances of objects.
Each
instance has the same collection of named attributes, and each
attribute is of a specific type.
Furthermore, each instance has
an installation-wide unique (never-changing)
.b "object identifier"
or
.b oid .
.\"-----------------
.sh 3 "Creating a New Class"
.lp
(In order to try out the following \*(PQ examples, create the
.cW foo
database
as described in the previous section, and start the terminal
monitor.)
.lp
A user can create a new class by specifying the class name,
along with all attribute names and their types:
.(C
* create EMP (name = text, salary = int4,
              age = int4, dept = char16) \\g

* create DEPT (dname = char16, floor = int4,
               manager = text) \\g
.)C
The \*(PQ base types used above are
a variable-length array of printable characters
.cW text ), (
a 4-byte signed integer
.cW int4 ), (
and a fixed-length array of 16 characters
.cW char16 .)\** (
.(f
\**See
.q "Built-In Types"
in the Reference Manual.
.)f
Spaces, tabs and newlines may be used freely in \*(PQ queries.
.lp
So far, the
.cW create
command looks exactly like the create
statement in a traditional relational system.
However, we will presently see
that classes have properties that are extensions of the
relational model, so we use a different word to describe them.
.\"-----------------
.sh 3 "Populating a Class with Instances"
.ds xX \n($1.\n($2.\n($3
.lp
To populate a class with instances, one can use the
.cW append
command:
.(C
* append EMP (name = "Joe", salary = 1400,
              age = 40, dept = "shoe") \\g

* append EMP (name = "Sam", salary = 1200,
              age = 29, dept = "toy") \\g

* append EMP (name = "Bill", salary = 1600,
              age = 36, dept = "candy") \\g
.)C
This will add 3 instances to
.cW EMP ,
one for each
.cW append
command.
.\"-----------------
.sh 3 "Querying a Class"
.lp
The
.cW EMP
class can be queried with normal selection and
projection queries.
For example, to find the employees under 35 years of age,
one would type:
.(C
* retrieve (EMP.name) where EMP.age < 35 \\g
.)C
and the output would be:
.(T
.TS
allbox;
l.
name
Sam
.TE
.)T
Notice that parentheses are required around
the
.b "target list"
of returned attributes
(e.g.,
.cW EMP.name .)
.lp
\*(PQ allows you to return
computations in the target list as long as they are given a name
(e.g.,
.cW result ):
.(C
* retrieve (result = EMP.salary / EMP.age)
      where EMP.name = "Bill" \\g
.)C
.\"-----------------
.sh 3 "Redirecting retrieve queries"
.lp
Any retrieve query can be redirected to a
new class in the database, and arbitrary boolean operators
.cW and , (
.cW or ,
.cW not )
are allowed in the qualification of any query:
.(C
* retrieve into temp (EMP.name)
      where EMP.age < 35 and EMP.salary > 1000 \\g
.)C
.\"-----------------
.sh 3 "Joins"
.lp
To find the names of employees which are the same age, one
could write:
.(C
* retrieve (E1.name, E2.name)
      from E1 in EMP, E2 in EMP
      where E1.age = E2.age and E1.name != E2.name \\g
.)C
In this case both E1 and E2 are
.b surrogates
for an instance of the
class
.cW EMP ,
and both range over all instances of the class.
A \*(PQ
query can contain an arbitrary number of class names and
surrogates.\**
.(f
\**The semantics of such a join are that
the qualification is a truth expression defined
for the Cartesian product of the classes indicated in the query.
For those instances in the Cartesian product for which the
qualification is true, \*(PP must compute and return the
target list.
.)f
.\"-----------------
.sh 3 "Updates"
.lp
Updates are accomplished in \*(PQ using the
.cW replace
command:
.(C
* replace EMP (salary = E.salary)
      from E in EMP
      where EMP.name = "Joe" and E.name = "Sam" \\g
.)C
This command replaces the salary of Joe by that of Sam.
.\"-----------------
.sh 3 "Deletions"
.lp
Deletions are done using the
.cW delete
command:
.(C
* delete EMP where EMP.salary > 0 \\g
.)C
Since all employees have positive salaries, this command will
leave the
.cW EMP
class empty.
.\"-----------------
.sh 3 "Arrays"
.lp
\*(PP supports both fixed-length and variable-length
one-dimensional arrays.
To illustrate their use, we first create a class with an array type.
.(C
* create SAL_EMP (name = char[],
                  pay_by_quarter = int4[4]) \\g
.)C
The above query will create a class named
.cW SAL_EMP
with a variable-length array of
.cW text
strings
.cW name ), (
and an array of 4
.cW int4
integers
.cW pay_by_quarter ), (
which represents the employee's salary by quarter.
Now we do some
.cW append s;
note that when appending to a non-character array, we enclose the values
within braces and separate them by commas.
.(C
* append SAL_EMP (name = "bill",
      pay_by_quarter = "{10000, 10000, 10000, 10000}") \\g

* append SAL_EMP (name = "jack",
      pay_by_quarter = "{10000, 15000, 15000, 15000}") \\g

* append SAL_EMP (name = "joe",
      pay_by_quarter = "{20000, 25000, 25000, 25000}") \\g
.)C
\*(PP uses the FORTRAN numbering convention for arrays\(emthat is, \*(PP
arrays start with array[1] and end with array[n].
.lp
Now, we can run some
queries on
.cW SAL_EMP .
This query retrieves the names of the employees whose pay changed in the
second quarter:
.(C
* retrieve (SAL_EMP.name)
      where SAL_EMP.pay_by_quarter[1] !=
            SAL_EMP.pay_by_quarter[2] \\g
.)C
This query retrieves the third quarter pay of all employees:
.(C
* retrieve (SAL_EMP.pay_by_quarter[3]) \\g
.)C
This query deletes everyone from
.cW SAL_EMP
whose name begins with the letter
.q j.
.cW SAL_EMP
should now contain only the employee named
.q bill :
.(C
* delete SAL_EMP where SAL_EMP.name[1] = 'j' \\g
.)C
Let's make sure (note that the attribute
.cW all
may be used as a shorthand for all attributes of a class):
.(C
* retrieve (SAL_EMP.all) \\g
.TS
allbox tab(|);
l l.
name|pay_by_quarter
bill|{10000,10000,10000,10000}
.TE
.)C
\*(PP supports arrays of base and user-defined types, as well as
.q "arrays of arrays,"
as in the following example:
.(C
* create manager (name = char16, employees = text[]) \\g

* append manager (name = "mike",
                  employees = "{"wei", "greg", "jeff"}") \\g

* append manager (name = "alice",
                  employees = "{"bill", "joe"}") \\g

* append manager (name = "marge",
                  employees = "{"mike", "alice"}") \\g
.)C
This creates a class
.cW manager ,
and provides a list of employees.
.\"------------------------------------
.sh 2 "Advanced \*(PQ"
.lp
Now we have covered the basics of using \*(PQ to access your data.
In this section we will discuss those features of \*(PP which
distinguish it from other data managers, such as
inheritance
and
time travel.
In the next section we will cover how the user can extend
the query language via
query language functions
and
composite objects,
as well
as additional extensions to \*(PP using
user defined types,
operators,
and
programming language functions.
.\"-----------------
.sh 3 "Inheritance"
.lp
First, re-populate the
.cW EMP
class by repeating the
.cW append
commands in section \*(xX.
Then, create a second class
.cW STUD_EMP ,
and populate it as follows:
.(C
* create STUD_EMP (location = point) inherits (EMP) \\g

* append STUD_EMP (name = "Sunita", salary = 1300,
                   age = 41, dept = "electronics",
                   location = "(3, 5)") \\g
.)C
In this case, an instance of
.cW STUD_EMP
.b inherits
all data fields
.cW name , (
.cW salary ,
.cW age ,
and
.cW dept )
from its parent,
.cW EMP .
Furthermore, student employees have an extra field,
.cW location ,
that shows their address as a coordinate pair.
In
\*(PP, a class can inherit from zero or more other classes,\**
.(f
\**i.e.,
the inheritance hierarchy is a directed acyclic graph.
.)f
and a query can reference either all
instances of a class or all instances of a class plus all of its
descendants.
For example, the following query finds the
employees over 39:
.(C
* retrieve (E.name) from E in EMP where E.age > 39 \\g
.)C
On the other hand, to find the names of all
employees, including student employees, over age 40, the query is:
.(C
* retrieve (E.name) from E in EMP* where E.age > 39 \\g
.)C
which returns:
.(T
.TS
allbox;
l.
name
Joe
Sunita
.TE
.)T
Here the
.cW *
after
.cW EMP
indicates that the query should be run over
.cW EMP
and all classes below
.cW EMP
in the inheritance hierarchy.
.lp
Note that
.cW location
in
.cW STUD_EMP
is not a traditional
relational data type.
As we will see later, \*(PP can be
customized with an arbitrary number of user-defined data types.
.\"-----------------
.sh 3 "Time Travel"
.lp
\*(PP supports the notion of
.b "time travel" .
This
feature allows a user to run historical queries.
For example, to
find Sam's current salary, one would query:
.(C
* retrieve (E.salary) from E in EMP["now"]
      where E.name = "Sam" \\g
.)C
\*(PP will automatically find the version of Sam's record
valid at the correct time and get the appropriate salary.
.lp
One can also
give a time
.b range .
For example to see all the salaries that Sam has
ever earned, one would query:
.(C
* retrieve (E.salary)
      from E in EMP["Jan 1 00:00:00 1970 GMT", "now"]
      where E.name = "Sam" \\g
.)C
If you have executed all of the examples so far, then the above query returns:
.(T
.TS
allbox;
l.
salary
1200
1200
.TE
.)T
There are two salaries for Sam, since he was deleted from and then
re-appended to the
.cW EMP
class.
.lp
The default beginning of a time range is the origin of the system clock
(which just happens to be
.cW "Jan 1 00:00:00 1970 GMT" '' ``
on \*(UU systems),
and the default end is the current time;
thus, the above time range can be abbreviated as
.cW [,] .'' ``
.\"---------------------------------------------------------------------------
.sh 1 "User Extensions to \*(PQ"
.lp
Here, we will discuss user extensions to the \*(PQ query language,
query language functions, composite types, and user defined types, functions
and operators.
.\"------------------------------------
.sh 2 "User Defined \*(PQ Functions"
.lp
\*(PQ provides two types of functions:
.b "query language functions"
(functions written in \*(PQ)
and
.b "programming language functions"
(functions written in a separately-compiled programming language such as C.)
In this section we will cover \*(PQ functions; programming language
functions will be covered below with the discussion on user-defined types.
.lp
Any collection of commands in the \*(PQ
query language can be packaged together and defined as a
function, usually returning either a set of instances or a set of base types.
For example, the following function
.cW high_pay
returns all employees in class
.cW EMP
whose salaries exceed 50,000:
.(C
* define function high_pay
      (language = "postquel", returntype = setof EMP)
      as retrieve (EMP.all) where EMP.salary > 50000 \\g
.)C
\*(PQ functions can also have parameters.
The following function
.cW large_pay
allows the threshold salary to be specified as an argument:
.(C
* define function large_pay
      (language = "postquel", returntype = setof EMP)
      arg is (int4)
      as retrieve (EMP.all) where EMP.salary > $1 \\g
.)C
In addition to their obvious utility as
.q aliases
for commonly-used queries,
\*(PQ functions are useful for creating composite types, as described
below.
.\"------------------------------------
.sh 2 "Composite Types"
.lp
Since \*(PQ functions return instances or sets of instances,
they are the mechanism used to assign values to composite
types.
For example, consider extending the
.cW EMP
class with a
.cW manager
field.
That is, for each instance of
.cW EMP ,
we want to associate another
instance of
.cW EMP 
corresponding to the manager of the first instance.
Specifically, we will
define a \*(PQ function
.cW manager :
.(C
* define function manager
      (language = "postquel", returntype = EMP)
      arg is (EMP)
      as "retrieve (E.all) from E in EMP
              where E.name = DEPT.manager
              and DEPT.name = $1.dept" \\g
.)C
The function
.cW manager
takes an instance as its only argument, so \*(PQ allows referencing
into it with the use of the nested dot notation.
Whenever such a
function is defined over a class, a user can utilize the cascaded dot
notation to reference into (i.e. access the fields of) the objects
returned by the function.
.lp
The following query finds all the employees who work for
Joe:
.(C
* retrieve (EMP.name) where EMP.manager.name = "Joe" \\g
.)C
This is exactly equivalent to:
.(C
* retrieve (EMP.name)
      where name(manager(EMP)) = "Joe" \\g
.)C
Here, we have essentially added an attribute to the
.cW EMP
class which is of
type
.cW EMP ,
i.e. it has a value which is an instance of
the class
.cW EMP .
Since the value of
.cW manager
has a record-oriented structure, we call it a
.b "composite object" .
Consequently, the user can think of the function
.cW manager
as an attribute of
.cW EMP
and can reference it just like any other
attribute, with the following two exceptions.
First, one cannot do
direct
.cW append s\(emthat
is,
.(C
* append emp (emp.manager.name = "Smith") \\g
.)C
.b won't
work.
Non-projected
retrieves will also be rejected,
i.e.:
.(C
* retrieve (emp.manager) \\g
.)C
will result
in a warning from the \*(PQ language parser.
.lp
Note that 
.cW manager
is defined as returning a single instance of
.cW EMP.
We can also write a \*(PQ function that returns sets of
instances.
For example, consider the function
.(C
* define function children
      (language = "postquel", returntype = setof KIDS)
      arg is (EMP)
      as "retrieve (KIDS.all)
              where $1.name = KIDS.dad
              or $1.name = KIDS.mom"\\g
.)C
The
.cW children
function is defined as returning a set of instances, rather than a
single instance.
Given the query
.(C
* retrieve(emp.name, emp.children.name)
.)C
if the query in the body of the
.cW children
function returns many instances, the retrieve query will return all of
them, in a
.q flattened
form.
If the query in the body of
.cW manager
returns more than one instance,
the
.cW manager
function will return only one instance, arbitrarily chosen from the
set returned by the query in the function's body.
See the \*(PP
Reference Manual's entry on the 
.cW "define function"
command for further details and examples.

.\"---------------------------------------------------------------------------
.sh 1 "User Defined Types, Operators, and Programming Language Functions"
.lp
The central concept of extending \*(PP lies in \*(PP's ability to
.b "dynamically load"
a binary object file created by the user.
This allows \*(PP to call
arbitrary user functions which can be written in a standard programming
language.
These functions can then be used:
.ip \(bu
to convert between
.b internal
(binary) and
.b external
(character string) representations of
user-defined
types;
.ip \(bu
as operators; and
.ip \(bu
to
define ordering for indices on user-defined types.
.lp
\*(PP's concept of types includes
.b built-in
types and
.b user-defined
types.
Built-in types are those required by the system to bootstrap itself.
User-defined types are those created by the user in the manner described
below.
There is no intrinsic performance difference between using a system
type or user-defined type, other than the overhead due to the complexity of
the type itself.
.\"------------------------------------
.sh 2 "Internal storage of types"
.lp
Internally,
\*(PP regards a user-defined type as a
.q "blob of memory"
upon which
user-defined functions impose structure and meaning.
\*(PP will store and retrieve the data from disk and use user-defined
functions to input, process, and output the data.
.\"------------------------------------
.sh 2 "Functions needed for a user-defined type"
.lp
A completely defined user type requires the following user-defined functions:
.ip \(bu
.b input
and
.b output
functions for the type:
These functions determine how the type appears in strings
(for input by the user and output to the user) and how the type is organized
in memory.
These at least are necessary to define the type.
.ip \(bu
.b operator
functions for the type:
These functions define the meanings of
.q equal,
.q "less than,"
.q "greater than,"
etc., for your type.
.\"------------------------------------
.sh 2 "An Example User Defined Type"
.lp
In this discussion, we will be defining a
.cW circle
type, using functions written in the C programming language.
.\"-----------------
.sh 3 "Data structures for our type"
.lp
Before we do anything, we have to decide on what a circle looks
like, both in string format and internally in memory.
Circles
have a center and a radius, so a reasonable string representation of a circle
would be an ordered triple:
.(l
(center_x, center_y, radius)
.)l
where each element is a real number with arbitrary units,
e.g.:
.(C
(5.0, 10.3, 3)
.)C
This is what
the input to the circle input function looks like,
and what the output from the circle output function looks
like.
.lp
Now we have to come up with an internal representation for
a circle in memory.
The following
declarations are legal
and reasonable given the format we chose above:
.(C
typedef struct {
    double x, y;
} POINT;

typedef struct {
    POINT center;
    double r;
} CIRCLE;
.)C
Memory containing values of type
.cW CIRCLE
will be written to
disk and read from disk, so
.cW CIRCLE
must be both
.b complete
and
.b contiguous ;
that is, it cannot contain any pointers.
The alternate declaration
.(C
typedef struct {
    POINT *center
    double r;
} CIRCLE;
.)C
will
.b NOT
work,
because only the address stored in
.cW center
would be written to disk, not the
.cW POINT
structure that
.cW center
presumably points to.
\*(PP cannot detect this kind of coding error;
you must guard against it yourself.
.\"-----------------
.sh 3 "Defining the input and output functions for our type"
.lp
Suppose in defining our type
.q circle,
we have a C source file called
.cW circle.c ,
and a corresponding object code file
.cW /usr/postgres/tutorial/circle.o .
(All functions related to our
.cW circle
type have to be in the same
object file.)
For the sake of argument, suppose we our platform is a DECstation,
where sizeof(double) is 8 bytes (this will be important later).
.lp
We will create source file
.cW circle.c ,
containing C source code for the functions that support our
.cW CIRCLE
type.
.cW circle.c
contains
three functions:
.ip \(bu
.cW circle_in ,
which is the input function for circles.
It
takes a string as an argument and returns a pointer to a
.cW CIRCLE .
.ip \(bu
.cW circle_out ,
which is the output function for circles.
It is
takes a pointer to s
.cW CIRCLE
as input and returns a string.
.ip
The return value of
.cW circle_in
must be a legal argument to
.cW circle_out ,
and vice versa.
.ip \(bu
.cW eq_area_circle ,
which is the equality function for
circles.
For the purposes of this discussion, circles are equal
if their areas are equal.
.lp
The contents of
.cW circle.c
are:
.(C
#include <math.h>
#include <stdio.h>
#include <string.h>

#include "tmp/c.h"           /* (always)                 */
#include "utils/geo-decls.h" /* for POINT declaration    */
#include "utils/palloc.h"    /* for palloc() declaration */

typedef struct {
    POINT  center;
    double radius;
} CIRCLE;

#define LDELIM '('
#define RDELIM ')'
#define NARGS  3

CIRCLE *
circle_in(str)
    char   *str;
{
    char   *p, *coord[NARGS];
    int    i;
    CIRCLE *result;

    if (str == NULL) return(NULL);

    for (i = 0, p = str;
         *p && i < NARGS && *p != RDELIM;
         p++)
    {
        if (*p == ',' || (*p == LDELIM && !i))
            coord[i++] = p + 1;
    }

    if (i < NARGS - 1) return(NULL);

    result = (CIRCLE *) palloc(sizeof(CIRCLE));

    result->center.x = atof(coord[0]);
    result->center.y = atof(coord[1]);
    result->radius = atof(coord[2]);

    return(result);
}

char *
circle_out(circle)
    CIRCLE *circle;
{
    char   *result;

    if (circle == NULL) return(NULL);

    result = (char *) palloc(60);

    sprintf(result, "(%g, %g, %g)",
            circle->center.x, circle->center.y,
            circle->radius);

    return(result);
}

int
eq_area_circle(circle1, circle2)
    CIRCLE *circle1, *circle2;
{
    return(circle1->radius == circle2->radius);
}
.)C
.lp
Now that we have written these functions and compiled the source file,\**
.(f
\**You will need to supply an option like
.cW -I$POSTGRESHOME/src/lib/H
to your C compiler so it can find the \*(PP
.cW .h '' ``
files.
Also, various platform-specific compiler options may be required to support
\*(PP dynamic linking (for example, the DECstation ULTRIX compiler requires
the
.cW "-G0" '' ``
option.)
See
.q "define function"
in the Reference Manual for details.
.)f
we have to let \*(PP know that they exist.
First, we
run the following queries to define the input and
output functions.
These functions must be defined
.b before
we define the
type.
\*(PP will
notify you that return type circle is not defined yet, but this
is OK\**:
.(f
\**By default,
user-defined C functions use addresses instead of values for all but
.q small
(<= 4-byte) argument and return types,
so we can use the \*(PQ type
.cW char16
as a placeholder for the C type
.cW "char *" .
.)f
.(C
* define function circle_in
      (language = "c", returntype = circle)
      arg is (char16)
      as "/usr/postgres/tutorial/circle.o" \\g

* define function circle_out
      (language = "c", returntype = char16)
      arg is (circle)
      as "/usr/postgres/tutorial/circle.o" \\g
.)C
Note that the full pathname of the object code file must be specified,
so you would change
.cW /usr/postgres/tutorial
to whatever is appropriate for your installation.
.lp
Now we can define the
.cW circle
type:
.(C
* define type circle
      (internallength = 24,
       input = circle_in, output = circle_out) \\g
.)C
where
.cW internallength
is the size of the
.cW CIRCLE
structure
in bytes.
For circles, the type members are three
.cW double s,
which on most platforms are 8 bytes each, with no additional alignment
constraints.
However,
when defining your own types, you should
.b not
make assumptions about structure sizes,
but instead
write a test program
that does a
.(C
printf("size is %d\en", sizeof (MYTYPE));
.)C
on your type.
.lp
If
.cW internallength
is defined incorrectly, you will encounter strange errors which may crash
the data manager itself.
If this were to happen with our
.cW CIRCLE
type,
we would have to do a
.(C
* remove type circle \\g
.)C
and then redefine the
.cW circle
type correctly.
Note that we would
.b not
have to redefine our functions,
since their behavior would not have changed.
.\"-----------------
.sh 3 "Defining an operator for our type"
.lp
Now that we have finished defining the
.cW circle
type, we can
.cW create
classes with circles in them,
.cW append
records to them
with circles defined, and
.cW retrieve
the values of the entire list
of records.
But we can do nothing else until we have some circle
operators.
To do this, we make use of the concept of
.b "operator overloading" ,
and in this case we will set the \*(PP equality
operator
.cW = '' ``
to work for circles.
First we have to tell \*(PP
that our circle equality function exists:
.(C
* define function eq_area_circle
      (language = "c", returntype = bool)
      arg is (circle, circle)
      as "/usr/postgres/tutorial/circle.o" \\g
.)C
We will now bind this function to the
equality symbol with the following query:
.(C
* define operator =
      (arg1 = circle, arg2 = circle,
       procedure = eq_area_circle) \\g
.)C
.\"-----------------
.sh 3 "Using our type"
.lp
Let's create a class
.cW tutorial
that contains a
.cW circle
attribute,
and run some queries against it:
.(C
* create tutorial(a = circle) \\g

* append tutorial (a = "(1.0, 1.0, 10.0)"::circle) \\g

* append tutorial (a = "(2.0, 2.0, 5.0)"::circle) \\g

* append tutorial (a = "(0.0, 1.8, 10.0)"::circle) \\g

* retrieve (tutorial.all)
      where tutorial.a = "(0.0, 0.0, 10.0)"::circle \\g
.)C
which returns:
.(T
.TS
allbox;
l.
a
(1.0, 1.0, 10.0)
(0.0, 1.8, 10.0)
.TE
.)T
Recall that we defined circles as being equal if their areas were
equal.
.lp
Other operators (less than, greater than, etc.) can be defined in
a similar way.
Note that the
.cW = '' ``
symbol will still work for
other types\(emit has merely had a new type added to the list of
types it works on.
Any string of
.q "punctuation characters"
other than
brackets,
braces, or parentheses can be used in defining an operator.
.\"------------------------------------
.sh 2 "Additional info on creating a user-defined function"
.\"-----------------
.sh 3 "Use palloc and not malloc"
.lp
In order for \*(PP to correctly manage memory associated with processing your
type, you
must use the memory allocator
.cW palloc
and avoid standard \*(UU memory managers
such as
.cW malloc .
If you do not, \*(PP will chew up ever increasing amounts of
memory.
.cW palloc
has the same arguments as
.cW malloc ,
that is
.(C
char *palloc(size)
unsigned long size;
.)C
To free memory allocated with
.cW palloc ,
use
.cW pfree ,
which is analogous to
the \*(UU library function
.cW free :
.(C
void pfree(ptr)
char *ptr;
.)C
.\"-----------------
.sh 3 "Re-loading user functions"
.lp
In the process of creating a user-defined type, you may find it necessary to
re-load a function in the course of debugging.
This is
.b not
done automatically when you edit or re-compile the file, but
.b is
done if you quit and restart the data manager.
.lp
We would re-load our example functions by using the following command:
.(C
* load "/usr/postgres/tutorial/circle.o" \\g
.)C
.\"-----------------
.sh 3 "Writing a Function of an Instance"
.lp
We've already discussed user functions which take \*(PP base or user defined
types as arguments; in this section, we will discuss inheritable C functions
or methods.
.lp
C language methods are useful particularly when we want to make a function
.b inheritable ;
that is, to have the function process every instance in an inheritance
hierarchy of classes.
.lp
In using a function of an instance in qualifying an instance,
\*(PP defines the
.q "current instance"
to be the instance being qualified at the moment your function is
called.
The instance itself will be passed in your function's
parameter list as an opaque structure of type TUPLE, and you will use
\*(PP library routines to access the data in the object as described
below.\**
.(f
\**In \*(PP \*(PV,
.cW TUPLE
is defined as
.cW "void *" .
.)f
.cW 
.lp
Suppose we want to write a function to answer the query
.(C
* retrieve (EMP.all) where overpaid(EMP) \\g
.)C
In the query above, a reasonable
.cW overpaid
function might be:
.(C
bool
overpaid(t) 
TUPLE t;   /* the current instance */
{
    extern char *GetAttributeByName();
    short salary, seniority, performance;

    salary = (short) GetAttributeByName(t, "salary");
    seniority = (short) GetAttributeByName(t, "seniority");
    performance = (short) GetAttributeByName(t, "performance");

    return (salary > (seniority * performance));
}
.)C
.cW GetAttributeByName
is the \*(PP system function that returns
attributes out of the current instance.
It has two arguments: the argument of type TUPLE passed into the function,
and the name of the desired attribute.
.cW GetAttributeByName
will align data properly so you can cast its return value to
the desired type.
For example, if you have an attribute
.cW name
which is of the \*(PQ type
.cW char16 ,
the
.cW GetAttributeByName
call would look like:
.(C
char *str;
\&...
str = (char *) GetAttributeByName(t, "name")
.)C
.lp
To let \*(PP know about the
.cW overpaid
function,
do:
.(C
* define function overpaid
      (language = "c", returntype = bool)
      arg is (EMP)
      as "/usr/postgres/tutorial/overpaid.o" \\g
.)C
.lp
You can have additional complex, base or user-defined types as
arguments to the inheritable function.
Thus,
.(C
* retrieve (EMP.all)
      where overpaid2(EMP, DEPT, "bill", 8) \\g
.)C
could be written, and
.cW overpaid2
would be declared:
.(C
bool
overpaid2(emp, dept, name, number)
    TUPLE emp, dept;
    char *name;
    long number;
.)C
.\"------------------------------------
.sh 2 "Arrays of types"
.lp
As discussed above, \*(PP fully supports arrays of base types.
Additionally,
\*(PP supports arrays of user-defined types as well.
When you define a type,
\*(PP
.b automatically
provides support for arrays of that type.
.\"-----------------
.sh 3 "Arrays of user-defined types"
.lp
Using the
.q circle
example discussed above, we will create a class containing
an array of
circles:
.(C
* create circles (list = circle[]) \\g
.)C
and do some appends
.(C
* append circles (list = "{"(1.0, 1.0, 5.0)",
                         "(2.0, 2.0, 10.0)"}") \\g

* append circles (list = "{"(2.0, 3.0, 15.0)",
                         "(2.0, 2.0, 10.0)"}") \\g

* append circles (list = "{"(2.0, 3.0, 4.0)"}") \\g
.)C
We can now run queries like:
.(C
* retrieve (circles.list[1]) \\g
.)C
which returns the first element of each
.cW list :
.(T
.TS
allbox;
l.
list
(1, 1, 5)
(2, 3, 4)
.TE
.)T
and
.(C
* retrieve (circles.all)
      where circles.list[1] = "(0.0, 0.0, 4.0)" \\g
.)C
which returns:
.(T
.TS
allbox;
l.
list
{"(2, 3, 4)"}
.TE
.)T
Note the
.cW {} s,
indicating that an array has been retrieved, as opposed to a single element.
.\"-----------------
.sh 3 "Defining a new array type"
.lp
An array may be defined as an element of a class, as shown above, or it may
be defined as a type in and of itself.
This is useful for defining
.b "arrays of arrays" .
.lp
The special built-in functions
.cW array_in
and
.cW array_out
are used by \*(PP to input and output arrays of any existing type.
Here,
we define an array of integers:
.(C
* define type int_array
      (element = int4, internallength = variable,
       input = array_in, output = array_out) \\g
.)C
The
.cW element
parameter indicates that this is an array, and setting
.cW internallength
to
.cW variable
indicates that the array is a variable-length attribute.\**
.(f
\**Note
that any type using
.cW array_in
and
.cW array_out
.b must
be variable-length.
.)f
.lp
We can use our type defined above to create an array of integer arrays:
.(C
* define type int_arrays
      (element = int_array, internallength = variable,
       input = array_in, output = array_out) \\g

* create stuff (a = int_arrays) \\g

* append stuff (a = "{{1, 2, 3} , {4, 5}, {6, 7, 8}}") \\g

* append stuff (a = "{{88, 99, 3}}") \\g

* append stuff (a = "{{5, 4, 3} , {2, 2}}") \\g

* retrieve (stuff.a[1])
      where stuff.a[1][1] < stuff.a[1][2] \\g

* retrieve (stuff.a)
      where stuff.a[3][1] < stuff.a[1][2] \\g

* retrieve (s.all) from s in stuff
      where s.a[2][2] = stuff.a[1][1] \\g
.)C
We can also define operators for equality, less than, greater than, etc. which
operate on our new array type as necessary.
.\"-----------------
.sh 3 "Creating an array type from scratch"
.lp
There are many situations in which the above scheme for creating an array
type is inappropriate, particularly when it is necessary to define a
fixed-length array.
In this section, we will create an array of four longs called
.cW quarterly ,
and a variable-length array of longs called
.cW  stats .\**
.(f
We assume
.cW sizeof(long)
== 4.
.)f
.lp
The only special things we need to know when writing the input and output
functions for
.cW quarterly
is that \*(PP will pass a
.q simple
(i.e. fixed-length) array of
.cW long s
to the
output function and expect a simple array of
.cW long s
in return from the input
function.
A simple array suitable for
.cW quarterly
can be declared:
.(C
long quarterly[4];
.)C
For the variable-length array
.cW stats ,
the situation is a little more
complicated.
Because \*(PP will not know in advance how big the array is,
\*(PP will expect the length of the array (in bytes) to be encoded in the
first four bytes of the memory which contains the array.
The expected
structure is:
.(C
typedef struct {
    long length;
    unsigned char bytes[1]; /* Force contiguity */
} VAR_LEN_ATTR;
.)C
The input function for the
.cW stats
array will look something like:
.(C
VAR_LEN_ATTR *
stats_in(s)
    char s;
{
    VAR_LEN_ATTR *stats;
    long array_size, *arrayp, nbytes;

    /*
     * nbytes is the total number of bytes in stats,
     * INCLUDING the byte count at the beginning
     */
    nbytes = array_size * sizeof(long)  +  sizeof(long);

    stats = (VAR_LEN_ATTR *) palloc(nbytes);

    stats->length = nbytes;

    arrayp = &(stats->bytes[0]);

    /*
     * put code here that loads interesting stuff into
     * arrayp[0] .. arrayp[array_size]
     */

    return(stats);
}
.)C
The output function for
.cW stats
will get the same
.cW VAR_LEN_ATTR
structure.
.lp
Now,
assuming the functions are in
.cW /usr/postgres/tutorial/stats.c
and
.cW /usr/postgres/tutorial/quarterly.c ,
we
can define our two arrays.
First we will define the fixed-size array
.cW quarterly .\**
.(f
.cW internallength
== 16 follows from our assumption about
.cW sizeof(long) .
.)f
.(C
* define function quarterly_in
      (language = "c", returntype = quarterly)
      arg is (char16)
      as "/usr/postgres/tutorial/quarterly.o" \\g

* define function quarterly_out
      (language = "c", returntype = char16)
      arg is (quarterly)
      as "/usr/postgres/tutorial/quarterly.o" \\g

* define type quarterly
      (element = int4, internallength = 16,
       input = quarterly_in, output = quarterly_out) \\g
.)C
Now we define the
.cW stats
array:
.(C
* define function stats_in
      (language = "c", returntype = stats)
      arg is (char16)
      as "/usr/postgres/tutorial/stats.o" \\g

* define function stats_out
      (language = "c", returntype = char16)
      arg is (stats)
      as "/usr/postgres/tutorial/stats.o" \\g

* define type stats
      (element = int4, internallength = variable,
       input = stats_in, output = stats_out) \\g
.)C
Now we can run some queries:
.(C
* create test (a = quarterly, b = stats) \\g

* append test (a = "1 2 3 4"::quarterly,
               b = "5 6 7"::stats) \\g

* append test (a = "1 3 2 4"::quarterly,
               b = "6 4"::stats) \\g

* append test (a = "7 11 6 9"::quarterly,
               b = "1 2"::stats) \\g

* retrieve (test.all) where test.a[4] = test.b[2] \\g
.)C
which returns:
.(T
.TS
tab(|) allbox;
l l.
a|b
1324|64
.TE
.)T
.b NOTE
that when you use your own functions to input and output array types,
.b "your function"
will define how to parse the external (string) representation.
The braces
notation is only a
convention used by
.cW array_in
and
.cW array_out
and is
.b not
part of the formal \*(PQ definition.
.\"------------------------------------
.sh 2 "Large Object types"
.lp
The types discussed to this point are all
.b small
objects\(emthat is, they are
smaller than 8 Kbytes\**
.(f
\**8 * 1,024 == 8,192 bytes
.)f
in size.
If you require a larger type for something like
a document retrieval system or for storing bitmaps, you will need to use
the \*(PP
.b "large object"
interface.
The interface to large objects is quite similar to the
\*(UU file system interface.
The particulars are detailed in
Section 7 of the \*(PP Reference Manual, which you should have available to
consult as you read the following.
.\"-----------------
.sh 3 "Defining a large object"
.lp
Just like any other type, a large object type requires input and output
functions.
For the purposes of this discussion, we assume that two functions,
.cW large_in
and
.cW large_out
have been written using the large object interface, and that the compiled
functions
are in
.cW /usr/postgres/tutorial/large.o .
We also presume that we are using
the
.q "file as an ADT"
interface for large objects discussed in the Reference
Manual.
.lp
We define a large object which could be used for storing map data:
.(C
* define function large_in
      (language = "c", returntype = map)
      arg is (char16)
      as "/usr/postgres/tutorial/large.o" \\g

* define function large_out
      (language = "c", returntype = char16)
      arg is (map)
      as "/usr/postgres/tutorial/large.o" \\g

* define type map
      (internallength = variable,
       input = large_in, output = large_out) \\g
.)C
Note that
large objects are
.b always
variable-length.
.lp
Now we can use our
.cW map
object:
.(C
* create maps (name = text, a = map) \\g

* append maps (name = "earth",
               a = "/usr/postgres/maps/earth") \\g

* append maps (name = "moon",
               a = "/usr/postgres/maps/moon") \\g
.)C
Notice that the above queries are identical in syntax to those we have been
using
all along to define types and such; the fact that this type is a large object
is completely hidden in the large object interface and \*(PP storage manager.
.\"-----------------
.sh 3 "Writing functions and operators for large object types"
.lp
Like any other \*(PP type, you can define functions and operators for large
object types.
The only caveat is that, like any other functions which process
a large object, they
.b must
use the large object interface described in Section 7 of the \*(PP Reference
Manual.
Possible queries which involve functions on large objects could include
.(C
* retrieve (emp.name) where beard(emp.picture) = "red" \\g

* retrieve (mountain.name)
      where height(mountain.topomap) > 10000 \\g
.)C
Because all functionality is available
to large objects,
.b any
aspect of \*(PP is available for use with them, including index access
methods, if the appropriate operator classes have been defined.
Operator
classes for index access methods will be discussed later in this manual.
.\"---------------------------------------------------------------------------
.sh 1 "The \*(PP Rule System"
.lp
The discussion in this section is intended to provide an overview of the
\*(PP rule system and point the user at helpful references and examples.
\*(PP actually has two rule systems, the
.b instance-level
rule system and the
.b "query rewrite"
rule system.
.\"------------------------------------
.sh 2 "The Instance-level Rule System"
.lp
The instance-level rule system uses markers placed in each instance in a
class to
.q trigger
rules.
Examples of the instance-level rule system are
explained and illustrated in
.cW $POSTGRESHOME/demo ,
which is included with the
\*(PP distribution.
Additional discussion of the instance-level rule system
can be found in the Reference Manual under
.cW "define rule" .
The theoretical
foundations
of the \*(PP rule system can be found in [STON90].
.\"------------------------------------
.sh 2 "The Query Rewrite Rule System"
.lp
The query rewrite rule system modifies queries to take rules into
consideration, and then passes the modified query to the query optimizer for
execution.
It is very powerful, and can be used for many things such as
query language procedures, views, and versions.
Examples and discussion can
be found in the demo in
.cW $POSTGRESHOME/video ,
and further discussion is in the
Reference Manual under
.cW "define rule" .
The power of this rule system is
discussed in
[ONG90] and [STON90].
.\"------------------------------------
.sh 2 "When to use either?"
.lp
Since each rule system is architected quite differently, they work best in
different situations.
The query rewrite system is best when rules affect
most of the instances in a class, while the instance-level system is best when
a rule affects only a few instances.
.\"---------------------------------------------------------------------------
.sh 1 "Administering \*(PP"
.lp
In this section, we will discuss aspects of \*(PP of interest to those
making extensive use of \*(PP, or who are the database administrator for
a group of \*(PP users.
.\"------------------------------------
.sh 2 "User administration"
.lp
The
.cW createuser
and
.cW destroyuser
enable and disable access to \*(PP by specific users on the host system.
Please read the descriptions of these
commands in the Reference Manual for specifics on their use.
.\"------------------------------------
.sh 2 "Moving database directories out of $POSTGRESHOME/data/base"
.lp
By default,
all \*(PP databases are stored in separate subdirectories under
.cW $POSTGRESHOME/data/base/ .\**
.(f
\**Data for certain classes may stored elsewhere if
a non-standard storage manager was specified
when they were created.
.)f
To move a particular data base to an alternate directory (e.g., on a
filesystem with more free space),
do the following:
.ip \(bu
Create the database
(if it doesn't already exist)
using the
.b createdb
command.
In the following steps
we will assume the database is named
.cW foo .
.ip \(bu
Copy
the directory
.cW $POSTGRESHOME/data/base/foo
and it contents
to its ultimate destination.
It should still be owned by the
.cW postgres
user.
.ip \(bu
Remove the directory
.cW $POSTGRESHOME/data/base/foo .
.ip \(bu
Make a symbolic link in
.cW $POSTGRESHOME/data/base
to the new directory.
.\"------------------------------------
.sh 2 "Troubleshooting \*(PP"
.lp
Occasionally, \*(PP will fail with cryptic error messages that are due to
relatively simple problems.
The following are a list of \*(PP error
messages and the likely fix.
These messages are ones you would likely see
in the
monitor
program.
.(C
Message: semget: No space left on device

Explanation and Likely Fix:
.)C
Either the kernel has not been configured for System V shared memory, or some
other program is using it up.
On most machines, the \*(UU command
.cW ipcs
will
show shared memory and semaphore usage.
.lp
To delete all shared memory and
semaphores (may be necessary if a backend fails), run the
.cW ipcclean
command.
Note, however, that
.cW ipcclean
deletes
.b all
semaphores belonging
to the user running it, so the user should be certain that none of his/her
non-\*(PP
processes are using semaphores before running this command.
.(C
Message: Unable to get shared buffers

Explanation and Likely Fix:
.)C
This message means that a \*(PP backend was expecting shared memory to be
available and it was not.
Usually this is due to
.cW ipcclean
being run while a
.cW postmaster
was also running.
.(C
Message: Can't connect to the backend (...)

Explanation and Likely Fix:
.)C
This message means that you are running a \*(LP application but it could
not link up with a
.cW postmaster .
If you see this error message, you should
see if a
.cW postmaster
is truly running.
If one is running, the problem is
likely related to your network.
.\"---------------------------------------------------------------------------
.sh 1 "REFERENCES"
.\"------------------------------------
.xP [ONG90]
Ong, L. and Goh, J.,
``A Unified Framework for Version Modeling Using Production Rules in a Database
System,"
Electronics Research Laboratory,
University of California, Berkeley,
ERL Memo M90/33,
April 1990.
.\"------------------------------------
.xP [ROWE87]
Rowe, L. and Stonebraker, M.,
``The POSTGRES Data Model,''
Proc. 1987 VLDB Conference,
Brighton, England,
Sept. 1987.
.\"------------------------------------
.xP [SCHA90]
Shapiro, L.,
``Join Processing in Database Systems with Large Main Memories,''
ACM-TODS,
Sept. 1986.
.\"------------------------------------
.xP [STON86]
(missing)
.\"------------------------------------
.xP [STON87]
Stonebraker, M.,
``The POSTGRES Storage System,''
Proc. 1987 VLDB Conference,
Brighton, England,
Sept. 1987.
.\"------------------------------------
.xP [STON88]
(missing)
.\"------------------------------------
.xP [STON90]
Stonebraker, M. et. al.,
``On Rules, Procedures, Caching and Views in Database Systems,''
Proc. 1990 ACM-SIGMOD Conference on Management of Data,
Atlantic City, N.J.,
June 1990.
.\"------------------------------------
.xP [STON90B]
(missing)
.\"------------------------------------
.xP [WANG88]
(missing)
.\"---------------------------------------------------------------------------
.uh "APPENDIX: User defined types and indices"
.lp
In this section, we will discuss how to extend \*(PP to use a user-defined
type and associated functions with existing access methods.
This way, you
can define a BTREE or RTREE index on your own type.
To do this, we will
discuss how to define a new operator class in \*(PP
for use with an existing access method.
.lp
Our example will be to add a new operator class to the BTREE access method.
The new operator class will sort integers in ascending absolute value
order.
This tutorial will describe how to define the operator class.
If you work the example,
you will be able to define and use indices that sort integer
keys by absolute value.
.lp
There are several \*(PP system classes that are important in understanding how
the
access methods work.
These will be discussed, and then a sample procedure
for adding a new set of operators to an existing access method will be shown
as an example.
.lp
The
.cW pg_am
class contains one instance for every user defined access method.
Support for the HEAP access method is built into \*(PP,
but every other access method is described here.
The schema is
.TS
center tab(|);
lf(C)|l.
amname|name of the access method
_
amowner|object id of the owner's instance in pg_user
_
amkind|not used at present, but set to 'o' as a place holder
_
amstrategies|number of strategies for this access method (see below)
_
amsupport|number of support routines for this access method (see below)
_
am*|T{
procedure identifiers for interface
routines to the access method.
For
example,
.cW regproc
ids for opening, closing,
and getting instances from the access
method appear here.
T}
.TE
The object ID of the instance in
.cW pg_am
is used as a foreign key in lots of
other classes.
For BTREES, this object ID is 403.
You don't need to add
a new instance to this class; all you're interested in is the object ID of
the access method instance you want to extend:
.(C
* retrieve (pg_am.oid) where pg_am.amname = "btree" \\g
.TS
allbox;
l.
oid
403
.TE
.)C
The
.cW amstrategies
attribute exists to standardize comparisons across data
types.
For example, BTREES impose a strict ordering on keys, less to
greater.
Since \*(PP allows the user to define operators, \*(PP cannot
in general look at the name of an operator (eg,
.cW > ,
.cW < )
and tell what kind
of comparison it is.
In fact, some access methods (like rtrees) don't impose
a less-to-greater ordering, but some other ordering, like containment.
\*(PP needs some consistent way of taking a scan qualification, looking at
the operator, deciding if a usable index exists, and rewriting the query
qualification in order to improve access speeds.
This implies
that \*(PP needs to know, for example, that
.cW <=
and
.cW >
partition
a BTREE.
Strategies is the way that we do this.
.lp
Defining a new set of strategies is beyond the scope of this discussion, but
how the BTREE strategies work will be explained, since you'll need to know that
to add a new operator class.
In the
.cW pg_am
class, the
.cW amstrategies
attribute is the number of strategies defined for this access method.
For BTREES, this
number is 5.
These strategies correspond to
.TS
center tab(|);
l|l.
less than|1
_
less than or equal|2
_
equal|3
_
greater than or equal|4
_
greater than|5
.TE
The idea is that you'll add procedures corresponding to the comparisons
above to the
.cW pg_amop
relation (see below).
The access method code can use
these numbers, regardless of data type, to figure out how to partition the
BTREE, compute selectivity, and so on.
Don't worry about the details of adding
procedures yet; just understand that there's a set of these for
.cW int2 ,
.cW int4 ,
.cW oid ,
and every other data type on which a BTREE can operate.
.lp
Strategies are used by all of the \*(PP access methods.
Some access methods require other support routines in order to work.
For example,
the BTREE access method must be able to compare two keys and determine
whether one is greater than, equal to, or less than the other.
Similarly,
the RTREE access method must be able to compute intersections,
unions,
and sizes of rectangles.
These operations do not correspond to user qualifications in
\*(PQ queries;
they are administrative routines used by the access methods,
internally.
.lp
In order to manage diverse support routines consistently across
all \*(PP access methods,
.cW pg_am
includes a field called
.cW amsupport .
This field records the number of support routines used by
an access method.
For BTREES, this number is one\(emthe routine to take two keys and return
\(mi\^1,
0,
or
\(pl\^1,
depending on whether the first key is less than,
equal to,
or greater than the second.
.lp
The
.cW amstrategies
entry in
.cW pg_am
is just the number of strategies defined
for the access method in question.
The procedures for less than, less equal, and so on don't appear in
.cW pg_am .
Similarly,
.cW amsupport
is just the number of support routines required by the access method.
The actual routines are listed elsewhere.
.lp
The next class of interest is
.cW pg_opclass.
This class exists only to associate a name with an
.cW oid .
In
.cW pg_amop ,
every BTREE operator class has a set of procedures,
one through five, above.
Some existing opclasses are
.cW int2_ops ,
.cW int4_ops , and
.cW oid_ops .
You need to add an instance with your opclass name (for example,
.cW int4_abs_ops )
to
.cW pg_opclass .
The
.cW oid
of this instance is a foreign key in other classes.
.(C
* append pg_opclass (opcname = "int4_abs_ops") \\g

* retrieve (cl.oid, cl.opcname) from cl in pg_opclass
      where cl.opcname = "int4_abs_ops" \\g
.TS
tab(|) allbox;
l l.
oid|opcname
17314|int4_abs_ops
.TE
.)C
.b NOTE:
The
.cW oid
for your
.cW pg_opclass
instance
.b "may be different" !
You should
substitute your value for 17314
wherever it appears in this discussion.
.lp
So now we have an access method and an operator class.
We still need a set of operators;
the procedure for defining operators was discussed earlier in this manual.
For the
.cW int4_abs_ops
operator class on BTREES,
the operators we require are:
.(l
absolute value less-than
absolute value less-than-or-equal
absolute value equal
absolute value greater-than-or-equal
absolute value greater-than
.)l
Suppose the code that implements the functions defined is stored in the file
.cW /usr/postgres/tutorial/int4_abs.c .
The code is
.(C
/*
 * int4_abs.c -- absolute value comparison functions
 *               for int4 data
 */

#include "tmp/c.h"

#define ABS(a) a = ((a < 0) ? -a : a)

bool int4_abs_lt(a, b) int32 a, b;
     { ABS(a); ABS(b); return (a < b); }

bool int4_abs_le(a, b) int32 a, b;
     { ABS(a); ABS(b); return (a <= b); }

bool int4_abs_eq(a, b) int32 a, b;
     { ABS(a); ABS(b); return (a == b); }

bool int4_abs_ge(a, b) int32 a, b;
     { ABS(a); ABS(b); return (a >= b); }

bool int4_abs_gt(a, b) int32 a, b;
     { ABS(a); ABS(b); return (a > b); }
.)C
There are a couple of important things that are happening below.
First, note that operators for less, less equal, equal, greater equal,
and greater for
.cW int4
are being defined.
All of these operators are already defined
for
.cW int4
under the names
.cW < ,
.cW <= ,
.cW = ,
.cW >= ,
and
.cW > .
The new operators behave differently, of course.
In order to guarantee that \*(PP uses
these new operators rather than the old ones,
they need to be named differently from the old ones.
This is a key point:  you can overload operators in
\*(PP, but only if the operator isn't already defined for the argument
types.
.lp
That is, if you have
.cW <
defined for
.cW int4 , (
.cW int4 ),
you can't define it again.
\*(PP
.b doesn't
check this when you define your operator,
so be careful.
To avoid this problem,
odd names will be used for the operators.
If you get this wrong,
the access methods are likely to crash when you try to do scans.
.lp
The other important point is that all the functions return
.b boolean
values;
the access methods rely on this fact.
.(C
* define function int4_abs_lt
      (language = "c", returntype = bool)
      arg is (int4, int4)
      as "/usr/postgres/tutorial/int4_abs.o" \\g

* define function int4_abs_le
      (language = "c", returntype = bool)
      arg is (int4, int4)
      as "/usr/postgres/tutorial/int4_abs.o" \\g

* define function int4_abs_eq
      (language = "c", returntype = bool)
      arg is (int4, int4)
      as "/usr/postgres/tutorial/int4_abs.o" \\g

* define function int4_abs_ge
      (language = "c", returntype = bool)
      arg is (int4, int4)
      as "/usr/postgres/tutorial/int4_abs.o" \\g

* define function int4_abs_gt
      (language = "c", returntype = bool)
      arg is (int4, int4)
      as "/usr/postgres/tutorial/int4_abs.o" \\g
.)C
Now define the operators that use them.
As noted,
the operator names must be unique for two
.cW int4
operands.
You can do a query on
.cW pg_operator :
.(C
* retrieve (pg_operator.all) \\g
.)C
to see if your name is taken for the types you want.
The important things here are the procedure
(which are the C functions defined above)
and the restriction and join selectivity functions.
You should just use the ones used below\(emnote that there are different such
functions for the less-than,
equal,
and greater-than cases.
These
.b must
be supplied,
or the access method will die when it tries to use the operator.
You should copy the names for
.cW restrict
and
.cW join ,
but use the procedure names you defined in the last step.
.(C
* define operator <<&
      (arg1 = int4, arg2 = int4, procedure=int4_abs_lt,
       associativity = left, restrict = intltsel,
       join = intltjoinsel) \\g

* define operator <=&
      (arg1 = int4, arg2 = int4, procedure = int4_abs_le,
       associativity = left, restrict = intltsel,
       join = intltjoinsel) \\g

* define operator ==&
      (arg1 = int4, arg2 = int4, procedure = int4_abs_eq,
       associativity = left, restrict = eqsel,
       join = eqjoinsel) \\g

* define operator >=&
      (arg1 = int4, arg2 = int4, procedure = int4_abs_ge,
       associativity = left, restrict = intgtsel,
       join = intgtjoinsel) \\g

* define operator >>&
      (arg1 = int4, arg2 = int4, procedure = int4_abs_gt,
       associativity = left, restrict = intgtsel,
       join = intgtjoinsel) \\g
.)C
Notice that five operators corresponding to less, less equal, equal, greater,
and greater equal are defined.
.lp
We're just about finished. the last thing we
need to do is to update the
.cW pg_amop
relation.
To do this, we need the
following attributes:
.TS
center tab(|);
lf(C)|l.
amopid|T{
the
.cW oid
of the
.cW pg_am
instance for BTREE (== 400, see above)
T}
_
amopclaid|T{
the
.cW oid
of the
.cW pg_opclass
instance for
.cW int4_abs_ops
(== whatever you got instead
of 17314,
see above)
T}
_
amopopr|T{
the
.cW oid s
of the operators for the
opclass (which we'll get in just a
minute)
T}
_
T{
amopselect,
.br
amopnpages
T}|cost functions.
.TE
The cost functions are used by the query optimizer to decide whether
or not to use a given index in a scan.
Fortunately, these already
exist.
The two functions we'll use are
.cW btreesel,
which estimates the
selectivity of the btree, and
.cW btreenpage,
which estimates the number of
pages a search will touch in the tree.
.lp
So we need the
.cW oid s
of the operators we just defined.
We'll look up the
names of all the operators that take two
.cW int4 s,
and pick ours out:
.(C
* retrieve (o.oid, o.oprname)
      from o in pg_operator, t in pg_type
      where o.oprleft = t.oid and o.oprright = t.oid
          and t.typname = "int4" \\g
.)C
which returns:
.(T
.TS
tab(|) allbox;
l l.
oid|oprname
96|\\=
97|<
514|*
518|!=
521|>
523|<=
525|>=
528|/
530|%
551|+
555|-
17321|<<&
17322|<=&
17323|==*
17324|>=&
17325|>>&
.TE
.)T
(Note
that your
.cW oid
numbers may be different.)
The operators we are interested in are those with
.cW oid s
17321 through 17325.
The values you get will probably be different, and you should substitute
them for the values below.
We can look at the operator names and pick out the ones we just added.
(Of course, there are lots of other queries we could used to get the oids
we wanted.)
.lp
Now we're ready to update
.cW pg_amop
with our new operator class.
The most
important thing in this entire discussion is that the operators are
ordered, from less equal through greater equal, in
.cW pg_amop .
Recall that
the BTREE instance's
.cW oid
is 400
and
.cW int4_abs_ops
is
.cW oid
17314.
Then we
add the instances we need:
.(C
* append pg_amop
      (amopid = "400"::oid,          /* btree oid        */
       amopclaid = "17314"::oid,     /* pg_opclass tuple */
       amopopr = "17321"::oid,       /* <<& tup oid      */
       amopstrategy = "1"::int2,     /* 1 is <<&         */
       amopselect = "btreesel"::regproc,
       amopnpages = "btreenpage"::regproc) \\g

* append pg_amop (amopid = "400"::oid,
                  amopclaid = "17314"::oid,
                  amopopr = "17322"::oid,
                  amopstrategy = "2"::int2,
                  amopselect = "btreesel"::regproc,
                  amopnpages = "btreenpage"::regproc) \\g

* append pg_amop (amopid = "400"::oid,
                  amopclaid = "17314"::oid,
                  amopopr = "17323"::oid,
                  amopstrategy = "3"::int2,
                  amopselect = "btreesel"::regproc,
                  amopnpages = "btreenpage"::regproc) \\g

* append pg_amop (amopid = "400"::oid,
                  amopclaid = "17314"::oid,
                  amopopr = "17324"::oid,
                  amopstrategy = "4"::int2,
                  amopselect = "btreesel"::regproc,
                  amopnpages = "btreenpage"::regproc) \\g

* append pg_amop (amopid = "400"::oid,
                  amopclaid = "17314"::oid,
                  amopopr = "17325"::oid,
                  amopstrategy = "5"::int2,
                  amopselect = "btreesel"::regproc,
                  amopnpages = "btreenpage"::regproc) \\g
.)C
NOTE the order:
.q less
is 1,
.q "less equal"
is 2,
.q equal
is 3,
.q "greater equal"
is 4,
and
.q greater
is 5.
.lp
Okay, now it's time to test the new opclass.
First we'll create and
populate a class:
.(C
* create pairs (name = char16, number = int4) \\g

* append pairs (name = "mike", number = -10000) \\g

* append pairs (name = "greg", number = 3000) \\g

* append pairs (name = "lay peng", number = 5000) \\g

* append pairs (name = "jeff", number = -2000) \\g

* append pairs (name = "mao", number = 7000) \\g

* append pairs (name = "cimarron", number = -3000) \\g

* retrieve (pairs.all) \\g
.TS
tab(|) allbox;
l l.
name|number
mike|-10000
greg|3000
lay peng|5000
jeff|-2000
mao|7000
cimarron|-3000
.TE
.)C
Okay, looks pretty random.
Define an index using the new opclass:
.(C
* define index pairsind on pairs
      using btree (number int4_abs_ops) \\g
.)C
Now run a query that doesn't use one of our new operators.
What we're
trying to do here is to run a query that
.b won't
use our index, so that
we can tell the difference when we see a query that
.b does
use the index.
This query won't use the index because the operator we use in the qualification
isn't one that appears in the list of strategies for our index.
.(C
* retrieve (pairs.all) where pairs.number < 9000 \\g
.TS
tab(|) allbox;
l l.
name|number
mike|-10000
greg|3000
lay peng|5000
jeff|-2000
mao|7000
cimarron|-3000
.TE
.)C
Yup, just as random; that didn't use the index.
Okay, let's run a query
that
.b does
use the index:
.(C
* retrieve (pairs.all) where pairs.number <<& 9000 \\g
.TS
tab(|) allbox;
l l.
name|number
jeff|-2000
cimarron|-3000
greg|3000
lay peng|5000
mao|7000
.TE
.)C
Note that the
.cW number
values are in order of increasing absolute value
(as they should be, since the index was used for this scan) and that
we got the right answer\(emthe instance for
.cW mike
doesn't appear, because \(mi10000 >=& 9000.