Return-Path: owner-postman Delivery-Date: Fri, 06 May 94 16:56:15 -0700 Return-Path: owner-postman Received: from localhost (localhost [127.0.0.1]) by nobozo.CS.Berkeley.EDU (8.6.4/8.6.3) with SMTP id OAA12838 for postgres-redist; Fri, 6 May 1994 14:20:55 -0700 Resent-From: POSTGRES mailing list Resent-Message-Id: <199405062120.OAA12838@nobozo.CS.Berkeley.EDU> X-Authentication-Warning: nobozo.CS.Berkeley.EDU: Host localhost didn't use HELO protocol Sender: owner-postman@postgres.Berkeley.EDU X-Return-Path: owner-postman Received: from fourier.math.tamu.edu (fourier.math.tamu.edu [128.194.7.40]) by nobozo.CS.Berkeley.EDU (8.6.4/8.6.3) with ESMTP id OAA12828 for ; Fri, 6 May 1994 14:20:53 -0700 Received: (brentb@localhost) by fourier.math.tamu.edu (8.6.8/8.6.4) id QAA10082; Fri, 6 May 1994 16:20:46 -0500 Date: Fri, 6 May 1994 16:20:46 -0500 From: Brent Message-Id: <199405062120.QAA10082@fourier.math.tamu.edu> To: postgres@postgres.Berkeley.EDU Subject: DB design with large objects Cc: brentb@fourier.math.tamu.edu Resent-To: postgres-redist@postgres.Berkeley.EDU Resent-Date: Fri, 06 May 94 14:20:55 -0700 Resent-XMts: smtp Hello, all. My present task is to take an existing raw text file and install it into the database system. We are using Postgres 4.1 on SunOS 4.1.3. I have a few questions that I hope people can help with. Some are database design concerns while others are postgres-specific. The database I'm manipulating is for the Math Reviews DB managed by AMS. The records consist of indexing and bibliographic information for a paper. Also, the text of the review as well as TeX-format names are part of each record. There are 4 main parts of the database: 1) Indexing/control information (unique MR #'s, bibliographic info, etc) 2) Plain text (ASCII) fields 3) TeX-format fields 4) The text of the review. Parts 2 and 3 are cousins: they contain the same information but 3) is TeX-formatted. Part 4, being the raw text of the review, can be up to 50k. All told, there are 102 or so attributes to manage. The fields are textual information and the AMS file specification for the raw text database gives maximum lengths of the fields. If I used character arrays for the attributes and used their max lengths, one record could occupy over 19k, not counting the review text. However, I'm planning on using postgres' text (variable length) type. Database Design --------------- Since each paper is identifiable by a unique MR number, I can use this as a key. I was thinking about making 4 classes for this database, corresponding with the above 4 attribute groups. The MR would link the class instances. Does this sound reasonable? Further, by splitting the text like this, I can shrink each instance below PG's 8k barrier. I think this may work. Oh, the review text will be a problem -- see below. Managing Review Text -------------------- The goal of this project is to have a browsable database for the faculty here, and provide a nice X interface for it. Ideally, I want to add the capability for a user to fetch a paper's information, display the raw text immediately, and simultaneously send the TeX information to a tempfile, TeX it, then preview it with xdvi. With that said, how do I get around the 8k limit to manage the review text? From reading the manual, it seems I have to define my own data type and write some code to link that together. That's not a problem, but does that mean the review text should be stored *outside* of the database proper, and perhaps referenced via a pathname from the database? One idea I had was to take the text (for example) from paper #92a:00001 and stash it into /usr/local/lib/MR/reviews/mr.92a:00001. Then, that path would be an attribute's value. I assume another tactic would be to split the review text into chunks (< 8k), assign each chunk a sequence number and store that into a class. Thus, the attributes for this class would be ( MR#, Seq#, Text). I.e.: 92a:00001 1 The quick brown .... 92a:00001 2 fox jumped over .... 92a:00001 3 the lazy dog. 93b:19483 1 ... I don't like this scheme because it's too cumbersome. I would appreciate any comments that people may have about this. This is my first real database design problem and is complicated by the 8k limit (or so I think). Feel free to email me for any more information. -Brent =============================================================================== To add/remove yourself from the POSTGRES mailing list: send mail with the subject line ADD or DEL to "postgres-request@postgres.Berkeley.EDU" If this fails, send mail to "post_questions@postgres.Berkeley.EDU" and a human will deal with it. DO NOT post to the "postgres" mailing list. ===============================================================================