Delivery-Date: Fri, 06 May 94 16:56:15 -0700
Resent-From: POSTGRES mailing list <postman@postgres.Berkeley.EDU>
Resent-Message-Id: <199405062120.OAA12838@nobozo.CS.Berkeley.EDU>
Sender: owner-postman@postgres.Berkeley.EDU
Date: Fri, 6 May 1994 16:20:46 -0500
From: Brent <Brent.Burton@math.tamu.edu>
Message-Id: <199405062120.QAA10082@fourier.math.tamu.edu>
To: postgres@postgres.Berkeley.EDU
Subject: DB design with large objects
Cc: brentb@fourier.math.tamu.edu
Resent-To: postgres-redist@postgres.Berkeley.EDU
Resent-Date: Fri, 06 May 94 14:20:55 -0700
Resent-XMts: smtp


Hello, all.

My present task is to take an existing raw text file and install
it into the database system.  We are using Postgres 4.1 on SunOS 4.1.3.
I have a few questions that I hope people can help with.  Some
are database design concerns while others are postgres-specific.

The database I'm manipulating is for the Math Reviews DB managed
by AMS.  The records consist of indexing and bibliographic information
for a paper.  Also, the text of the review as well as TeX-format names
are part of each record.

There are 4 main parts of the database:
1) Indexing/control information (unique MR #'s, bibliographic info, etc)
2) Plain text (ASCII) fields
3) TeX-format fields
4) The text of the review.

Parts 2 and 3 are cousins: they contain the same information but 3)
is TeX-formatted.  Part 4, being the raw text of the review, can be
up to 50k.  All told, there are 102 or so attributes to manage.

The fields are textual information and the AMS file specification for
the raw text database gives maximum lengths of the fields.  If I used
character arrays for the attributes and used their max lengths, one
record could occupy over 19k, not counting the review text.  However,
I'm planning on using postgres' text (variable length) type.

Database Design
---------------
Since each paper is identifiable by a unique MR number, I can use
this as a key.  I was thinking about making 4 classes for this database,
corresponding with the above 4 attribute groups.  The MR would link
the class instances.

Does this sound reasonable?

Further, by splitting the text like this, I can shrink each instance
below PG's 8k barrier.  I think this may work.  Oh, the review text
will be a problem -- see below.

Managing Review Text
--------------------
The goal of this project is to have a browsable database for the faculty
here, and provide a nice X interface for it.  Ideally, I want to add
the capability for a user to fetch a paper's information, display the
raw text immediately, and simultaneously send the TeX information to a
tempfile, TeX it, then preview it with xdvi.

With that said, how do I get around the 8k limit to manage the review
text?  From reading the manual, it seems I have to define my own data
type and write some code to link that together.  That's not a problem,
but does that mean the review text should be stored *outside* of the
database proper, and perhaps referenced via a pathname from the database?

One idea I had was to take the text (for example) from paper #92a:00001
and stash it into /usr/local/lib/MR/reviews/mr.92a:00001.  Then, that
path would be an attribute's value.

I assume another tactic would be to split the review text into chunks
(< 8k), assign each chunk a sequence number and store that into a class.
Thus, the attributes for this class would be ( MR#, Seq#, Text).  I.e.:
  92a:00001   1   The quick brown ....
  92a:00001   2   fox jumped over ....
  92a:00001   3   the lazy dog.
  93b:19483   1   ...
I don't like this scheme because it's too cumbersome.


I would appreciate any comments that people may have about this.  This
is my first real database design problem and is complicated by the
8k limit (or so I think).

Feel free to email me for any more information.

-Brent

===============================================================================
    To add/remove yourself from the POSTGRES mailing list: send mail with 
    the subject line ADD or DEL to "postgres-request@postgres.Berkeley.EDU"

    If this fails, send mail to "post_questions@postgres.Berkeley.EDU" and
    a human will deal with it.  DO NOT post to the "postgres" mailing list.
===============================================================================