DB design with large objects

agora inbox for postgres@postgres.berkeley.edu  
help / color / mirror / Atom feed

From: Brent <Brent.Burton@math.tamu.edu>
To: postgres@postgres.Berkeley.EDU
Cc: brentb@fourier.math.tamu.edu
Subject: DB design with large objects
Date: Fri, 6 May 1994 16:20:46 -0500
Message-ID: <199405062120.QAA10082@fourier.math.tamu.edu> (raw)

Hello, all.

My present task is to take an existing raw text file and install
it into the database system.  We are using Postgres 4.1 on SunOS 4.1.3.
I have a few questions that I hope people can help with.  Some
are database design concerns while others are postgres-specific.

The database I'm manipulating is for the Math Reviews DB managed
by AMS.  The records consist of indexing and bibliographic information
for a paper.  Also, the text of the review as well as TeX-format names
are part of each record.

There are 4 main parts of the database:
1) Indexing/control information (unique MR #'s, bibliographic info, etc)
2) Plain text (ASCII) fields
3) TeX-format fields
4) The text of the review.

Parts 2 and 3 are cousins: they contain the same information but 3)
is TeX-formatted.  Part 4, being the raw text of the review, can be
up to 50k.  All told, there are 102 or so attributes to manage.

The fields are textual information and the AMS file specification for
the raw text database gives maximum lengths of the fields.  If I used
character arrays for the attributes and used their max lengths, one
record could occupy over 19k, not counting the review text.  However,
I'm planning on using postgres' text (variable length) type.

Database Design
---------------
Since each paper is identifiable by a unique MR number, I can use
this as a key.  I was thinking about making 4 classes for this database,
corresponding with the above 4 attribute groups.  The MR would link
the class instances.

Does this sound reasonable?

Further, by splitting the text like this, I can shrink each instance
below PG's 8k barrier.  I think this may work.  Oh, the review text
will be a problem -- see below.

Managing Review Text
--------------------
The goal of this project is to have a browsable database for the faculty
here, and provide a nice X interface for it.  Ideally, I want to add
the capability for a user to fetch a paper's information, display the
raw text immediately, and simultaneously send the TeX information to a
tempfile, TeX it, then preview it with xdvi.

With that said, how do I get around the 8k limit to manage the review
text?  From reading the manual, it seems I have to define my own data
type and write some code to link that together.  That's not a problem,
but does that mean the review text should be stored *outside* of the
database proper, and perhaps referenced via a pathname from the database?

One idea I had was to take the text (for example) from paper #92a:00001
and stash it into /usr/local/lib/MR/reviews/mr.92a:00001.  Then, that
path would be an attribute's value.

I assume another tactic would be to split the review text into chunks
(< 8k), assign each chunk a sequence number and store that into a class.
Thus, the attributes for this class would be ( MR#, Seq#, Text).  I.e.:
  92a:00001   1   The quick brown ....
  92a:00001   2   fox jumped over ....
  92a:00001   3   the lazy dog.
  93b:19483   1   ...
I don't like this scheme because it's too cumbersome.

I would appreciate any comments that people may have about this.  This
is my first real database design problem and is complicated by the
8k limit (or so I think).

Feel free to email me for any more information.

-Brent

===============================================================================
    To add/remove yourself from the POSTGRES mailing list: send mail with 
    the subject line ADD or DEL to "postgres-request@postgres.Berkeley.EDU"

    If this fails, send mail to "post_questions@postgres.Berkeley.EDU" and
    a human will deal with it.  DO NOT post to the "postgres" mailing list.
===============================================================================

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: postgres@postgres.berkeley.edu
  Cc: Brent.Burton@math.tamu.edu, brentb@fourier.math.tamu.edu
  Subject: Re: DB design with large objects
  In-Reply-To: <199405062120.QAA10082@fourier.math.tamu.edu>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox