Postgres Architecture and Historical Lore
        1. Introduction
          In the beginning, there was Ingres. And then, after ways to improve the relational model made them-
        selves known, and as ways to implement object-oriented techniques of data organization without com-
        pletely throwing out time-honored ways of doing things became necessary, Postgres came into being.
          The original release of Postgres was partially written in Franz Lisp and partially written in C. How-
        ever, this proved to be a rather difﬁcult environment in which to work and debug, and was quite a problem
        for any potential users - it was painfully slow, and required expensive third-party licenses for the Lisp
        environment. Because of this, Postgres was ported to C, and the ﬁrst C-only release went out in late 1988.
          Subsequent releases of the Postgres software have stressed improvements and complete rewrites of
        various features including the Postgres rule system, the transaction system, the executor, access methods,
        as well as several ports to different machines.
          In the following discussions, a general overview of the internal structure of Postgres will be
        presented, together with references to directories where these structures may be found. This is intended to
        be a rather quick "tour" of the Postgres code, stressing general principles and ideas rather than details such
        as function names. For that level of detail, the code itself is the best (and only) reference. It must be
        admitted here that precision is being avoided as much to keep this document from rapidly getting out of
        date as much as anything else.
        2. General overview of Postgres at the communication level
          Thethree main programs in the Postgres environment are the following:
            User Applications
            The"postmaster"
            ThePostgres backend
        2.1. User Applications
          AUser Application is any program that uses LIBPQ to send and recieve data from a Postgres back-
        end. A User Application can run on any host that can access the Postgres server from a TCP-IP network.
        User applications included in the Postgres distribution include the terminal monitor (monitor), the database
        creation program (createdb), and the database deletion program (destroydb).
        2.2. The postmaster
          The Postmaster handles most of the network initialization and connection activities. Once the Post-
        master has handled the network stuff for a user application, it forks off a Postgres backend. Once this is
        done, the Postmaster is no longer involved, and goes back to waiting for connections from new user appli-
        cations.
          The ﬁrst time the Postmaster is invoked, it allocates shared memory and semaphores used by
        Postgres backends for locking and buffer pools.
          The postmaster and the backend have to run on the database server, and the database directories
        should be on local disks.
        2.3. The Postgres backend
          The Postgres backend is the true database engine. Once it has been forked by a postmaster, it is
        ready to recieve queries from and send back answers to the user application.
          For every user application, there is one Postgres backend forked by the Postmaster.
        3. Overview of the Postgres Main Modules
          In this section, the general program ﬂow in Postgres will be discussed, with a brief description of
        each program module, together with instructions on where to ﬁnd the source code for that module. All
        modules discussed here are part of the Postgres backend, and are relatively tightly coupled at this time.
        ThePostgres main modules include:
            o The Parser
            o The Query Rewrite Rule System
                                                       o The Planner/Optimizer
                                                       o The Executor
                                                       o The Instance Level Rule System
                                                       o The Access Methods
                                                       o Housekeeping functions
                                                       o Lock managers, cache utilities, and other miscellaneous functions.
                                                In the execution of a Retrieve query, the following general ﬂow is followed. Appends and other
                                   queries that access data (as opposed to "housekeeping" queries such as createdb, copy, etc) follow a similar
                                   ﬂow.
                                                                     ----------------
                                                        |   1. Parser |
                                                        ----------------
                                                                 |
                                                                 | (Initial Query Parse Tree)
                                                                 |
                                                                 v
                                                +-------------------------------+
                                                | 2. Query Rewrite Rule System |
                                                +-------------------------------+
                                                                 |
                                                                 | (Re-written Query Parse Tree)
                                                                 |
                                                                 |
                                                                 v
                                                     +-----------------------+
                                                     | 3. Planner/Optimizer |
                                                     +-----------------------+
                                                                 |
                                                                 | (Execution Plan Tree)
                                                                 |
                                                                 v
                                                       +------------------+
                                                       | 4. Executor                    |
                                                       +------------------+
                                                                 |
                                                                 | (Open request)
                                                                 |
                                                                 |
                                                                 |             (get instances until done)
                                                                 v
                                                        +-------------------+                                   +-----------------+
                                                        | 5. Access Methods |<-------------| 4. Executor                                                   |
                                                        +-------------------+                                   +-----------------+
                                                                 |                                           ˆ
                                                                 | (instance)                                       |
                                                                 |                                           | (instance)
                                                                 v                                            |
                                                        +-------------------------------+                                     |
                                                        | 6. Instance Level Rule System |---------+
                                                        +-------------------------------+
                                   3.1. The Parser
                                                The Parser parses a Postquel query and generates a parse-tree. As long as a correct parse-tree is gen-
                                   erated, other parsers for other query languages (ie SQL) can be "dropped in" to the Postgres system. The
                                   Postquel query parser lives in:
                         ˜/src/parser
                3.2. The Query Rewrite rule system
                      The Query Rewrite rule system is essentially part of the parser. It generally works in the following
                manner: If there is a query-rewrite rule on this class of instances, add the rule to the initial user query
                through the use of boolean algebra. If not, just let the query go through.
                      The Query Rewrite rule system is used to implement not only obvious rules, but also versions, views,
                and postquel procedures. The Query Rewrite system lives in:
                         ˜/src/rewrite
                3.3. The Planner/Optimizer
                      The Planner/Optimizer takes a parse-tree and, using various cost functions and heuristics, generates
                an execution plan. The Planner/Optimizer also "drives" the executor in that it calls it once to initialize
                things and then calls it subsequently to fetch instances. The planner lives in:
                         ˜/src/planner
                3.4. The Executor
                      The ﬁrst time the Executor is invoked by the Planner, it initializes itself, the access methods, and the
                instance-level rule system. In subsequent calls by the Planner, it walks the execution plan, fetching
                instances by calling the access methods and checking them against the query qualiﬁcation (which is part of
                the execution plan) to see if they are part of the "answer" to the user query. Before doing this, however, it
                checks to see if a fetched instance triggers a instance-level rule. The Executor lives in:
                         ˜/src/executor
                3.5. The Access Methods
                      These are the low-level routines that hit the disk and handle any indices (ie btrees, rtrees) that the
                user may have deﬁned. The Heap access method is used as the primary access method, and other access
                methods are deﬁned on indices. The Executor calls these routines to fetch instances - the Access Methods
                then use a scanning mechanism deﬁned by the Planner to get them and fetch them back to the executor.
                TheAccess Methods primarily live in:
                         ˜/src/access/common    (Code used by all access methods)
                         ˜/src/access/heap    (Heap access method - the lowest level)
                         ˜/src/access/{index index-btree index-ftree index-rtree}
                                        (Code to handle indices, and
                                         Various different types of indices)
                         ˜/src/access/transam  (The Postgres Transaction System)
                3.6. The Instance-level Rule System
                      The Instance-level rule system operates in the following manner: in the "system data" part of each
                instance, there is a ﬁeld for instance level "rule locks". If a fetched instance has a rule lock, the associated
                rule(s) is executed. Each instance-level rule has an associated execution plan, so the executor will be ran
                from within the instance-level rule system. The modiﬁed instance, if any, is then handed back to the exe-
                cutor for further processing. The Instance-level Rule System lives in:
                         ˜/src/rules/prs2 (Postgres Rule System 2 - the bulk of the rule system)
                         ˜/src/rules/stubs
                3.7. Housekeeping Functions
                      For queries that are not directly related to retrieving or appending, such as creating databases,
                deﬁning new operators, rules, and registering user functions, the parser simply bypasses the planner and
                executor and directly calls utility routines to handle these special commands. The code for these routines is
                    in:
                                ˜/src/commands
                    3.8. Lock managers, cache utilities, and other miscellaneous functions
                           There are numerous other functions that are used in Postgres; these include functions to manage sys-
                    tem attributes, lock managers, buffer and cache managers, hash table handlers, as well as the system built-
                    in functions for pre-deﬁned types. For lack of anything better to call these, these live in the UTIL module,
                    and live in:
                                ˜/src/utils/adt  (built-in functions)
                                ˜/src/utils/cache (cache handlers)
                                ˜/src/utils/error (error handlers)
                                ˜/src/utils/fmgr (the "Function Manager" - handles ADT’s and user functions)
                                ˜/src/utils/hash (hash table handlers)
                                ˜/src/utils/init (initialization code)
                                ˜/src/utils/mmgr (memory manager code)
                                ˜/src/utils/sort (sort code)
                                ˜/src/utils/time (time range qualiﬁcation handlers)
                    3.9. ’Main’ Programs
                           The "main programs" for the Postgres backend, the Postmaster, and other utilities like the vacuum
                    daemon, etc live in
                                ˜/src/support (everything but the backend)
                                ˜/src/tcop   (the backend main program - in postgres.c)
                    4. Postgres internal data structures
                           Since Postgres was originally written in Lisp, and for a time was a Lisp-C hybrid, many of its inter-
                    nal data structures are rather opaque to the C programmer. However, they are not overly difﬁcult to under-
                    stand and use once the basic ideas behind them are made clear. The discussion in this section will attempt
                    to make these ideas clear, without dwelling excessively on details and trivia.
                    4.1. The CONS-cell abstraction
                           Internally, Postgres passes data about primarily in trees made of Lisp-like CONS-cell structures.
                    These CONS-cells can point to interesting data only, to interesting data and another CONS-cell, or to two
                    other CONS-cells. In particualar, parse-trees and execution plans are encoded in these CONS-cell struc-
                    tures. Utility functions for handling the Lisp-like structures are in
                                ˜/src/lib/C
                    4.2. The Postgres Node System
                           Acollection of data structures that is central to most Postgres operations is the Postgres "node sys-
                    tem". This is a collection of data structures that encode various directives for the executor and planner, as
                    well as numerous other "utility" purposes. Nodes are declared using a C++-like syntax that is used by a
                    shell script to generate initialization and accessor functions. C preprocessor magic is used to ultimately
                    turn this declaration into an ordinary C typedef.
                    4.2.1. An Example Node
                           AnexampleofaPostgresnodeisthefollowing:
                                class (Oper) public (Expr) {
                                /* private: */
                                          inherits(Expr);
                                          InstanceId          opno;
                                          InstanceId          opid;
                                          bool            oprelationlevel;
                                          InstanceId          opresulttype;
                                          int            opsize;
                                 FunctionCachePtr  op_fcache;
                         /* public: */
                         };
                         (this particular node is from ˜/src/lib/H/nodes/primnodes.h)
                     This particular node is used in execution plans to indicate that an operator must be executed. The
                "inherits" macro is used to inherit ﬁelds from the previously deﬁned Expr (expression) node, and the expli-
                cit ﬁelds are ﬁelds that are unique to the Oper node itself. Declarations for all Postgres nodes are in header
                ﬁles in
                         ˜/src/lib/H/nodes
                     As stated earlier, shell scripts read the node declarations and generate initializer and accessor func-
                tions for them. As long as the above declaration scheme is used, these scripts will work for new nodes as
                well. These scripts are in
                         ˜/src/lib/Gen
                4.2.2. Usage of the Node System
                     The different nodes are primarily used to give directives to the Executor in the execution plan.
                Nodes are used to indicate the scan type to be used, whether sorting is to be used or not, and are used to
                indicate how query qualiﬁcations are to be handled.
                4.2.3. Printing Nodes and Node Structures
                     The best place to ﬁgure out what a structure using nodes and CONS-cells is like is to look carefully
                at the code in
                         ˜/src/lib/C
                     The code in this directory prints node structures into strings and dumps them into strings. This code
                is primarily used by the Instance Level Rule System to save and restore execution plans that are associated
                with rules, which it stores on disk in a system class in string format. The function
                         LispDisplay(CONS-cell, 0)
                     displays the contents of a CONS-cell structure in a human readable but rather unfriendly format.
                4.3. Other Important Data Structures
                     Aside from the CONS-cell and Node structures, there are very few other important global data struc-
                tures, although there are numerous local structures that are important if one is modifying variosu functions.
                One exception is the HeapTuple data structure and its associated structures, since it is the type returned by
                virtually all the access methods, including those that handle system classes. The declaration for the Heap-
                Tuple type is in
                         ˜/src/access/htup.h
                5. The Postgres Function Manager
                     An integral part of Postgres is its ability to use user-deﬁned functions and operators. The Postgres
                Function Manager is responsible for handling this aspect of Postgres.
                5.1. The Function Manager
                     The Function Manager is used by all parts of Postgres that examine the internals of user data. It con-
                sists of several parts:
                         o The Function Manager Interface
                         o Data structures containing information about dynamically loaded ﬁles
                         o The dynamic loader
        5.1.1. The Function Manager Interface
          The basic reason for the existance of the Function Manager is that due to the nature of user-deﬁned
        functions, calls to them cannot be hardcoded anywere in the Postgres backend. A mechanism must exist
        which obtains addresses of user-deﬁned and builtin functions (there is very little difference between these),
        handles the task of calling user-deﬁned functions with appropriate arguments, and returning an appropriate
        value. The Function Manager Interface provides the abstraction from the details of calling user-deﬁned
        functions from the rest of Postgres. The source to the Function Manager Interface is in
            ˜/src/utils/fmgr/fmgr.c
        5.1.2. Data Structures used for Function Execution
          In the Function Manager, there are two in-memory data structures used for looking up function
        addresses. One is an array of builtin functions that is sorted by function OID. Another is a list of dynami-
        cally loaded ﬁles, which contains the addresses of each function in these ﬁles. Functions that populate and
        access these data structures are in
            ˜/src/utils/fmgr/dfmgr.c
            ˜/src/port/*/dynloader.c
          and the data structures themselves are deﬁned in
            ˜/src/lib/H/utils/dynamic_loader.h
            ˜/src/lib/H/utils/fmgr.h
            ˜/src/lib/H/utils/builtins.h
            ˜/src/lib/H/catalog/pg_type.h
          and
            ˜/src/utils/fmgr/fmgr.c
        5.1.3. The Dynamic Loader
          The Dynamic Loader loads a user-deﬁned function from a ﬁle and determines the names and
        addresses of each function in that ﬁle. It then populates appropriate data structures with this information.
        Thesource to the Dynamic Loader is in
            ˜/src/utils/fmgr/dfmgr.c
            ˜/src/port/*/dynloader.c
        5.2. General Algorithm
          The Postgres Function Manager uses the following general algorithm for determining the addresses
        of functions. Once the address is found, the Function Manager Interface has functions which actually call
        the function.
          Generalalgorithm
          Is function in builtin list? If it is, return its address from this list
          If the function is not in the list of builtins, it is a dynamically loaded function. If so, look for the
        function name in the list of dynamically loaded functions. If it is, return its address.
          If the function has not yet been found, dynamically load it and determine its address. Return this
        address.
        6. Howdoesauser-deﬁnedfunctionget executed?
          The program ﬂow discussed in Section 3 - with the exception of the rule systems - is essentially simi-
        lar to that of any data manager. How, then, is Postgres so different? The main differences between
        Postgres and other data managers is in its concept of what data is. In Postgres, data is whatever the user
        says it is - the Postgres backend itself imposes no arbitrary deﬁnitions on data other than that it is a blob of
        memoryofaknownsize.
          The user deﬁnes what data is by deﬁning functions and binding them to various operators. These
        operators are then bound to user-deﬁned types, and thus the system is complete. In this section, how a
        user-deﬁned function is executed will be discussed.
          For the sake of this discussion (since operators will be discussed later), assume that Postgres is
        operating on the following query
            * retrieve (emp.all) where overpaid(emp.salary)
          and that overpaid(x) is a user-deﬁned function taking an integer as an argument.
          Initially, the Parser will walk this query, and generate a parse tree. While doing this, it will examine
        the following system classes:
            o PG_PROC-informationaboutPostgresfunctions is stored here.
            o PG_TYPE-informationaboutPostgrestypes is stored here.
          PG_PROCcontains information about the C ﬁle containing the function (if any), its return value, its
        number of arguments, its argument types, and other information. The Parser will also check PG_TYPE to
        make sure that emp.salary is an appropriate argument for overpaid(). Once everything has been found to
        be proper, appropriate nodes containing information about the function will be inserted into the parsetree.
          After the Planner has turned the parsetree into an execution plan, the Executor will walk the plan,
        executing overpaid() on each instance fetched from the "emp" class.
          The ﬁrst time overpaid() is executed, the Executor will call the Postgres Function Manager to obtain
        the address of the function. (See discussion in Section 5 for details on the Function Manager). In subse-
        quent calls, it will simply give Function Manager the address of the function so it can be called.
          Auserfunction like overpaid() is "registered" in Postgres using the Postquel
            DEFINECFUNCTION
          query.
        7. HowdoesaUserDeﬁnedOperatorgetexecuted?
          When a user deﬁnes an operator, a user-deﬁned function is associated with it. Therefore, execution
        of an operator is little different than execution of a function, with only one major exception: the Planner
        knowshowtooptimizeoperator queries but does not know how to deal with functions.
          Themajorsystemclass dealing with operators is
            PG_OPERATOR
          Additionally, the same classes discussed in Section 6 are used in the execution of operators. Binding
        of user functions to operators is handled with the Postquel
            DEFINEOPERATOR
          query.
        8. HowisaUserDeﬁnedTypehandled?
          AUser Deﬁned Type minimally consists of two functions, the input function for the type, and the
        output function for the type. Additionally, ordering (less than, greater than) and equality functions are
        necessary if the type is to be used in qualiﬁed queries, or if an index is to be deﬁned on the type.
          In the system class
            PG_TYPE
          is information related to user-deﬁned types, such as
            o The names of the registered functions for input and output of the type.
            o The size of the type
          ThePostquel query
            DEFINETYPE
          populates the PG_TYPE class.