agora inbox for postgres@postgres.berkeley.edu  
help / color / mirror / Atom feed
From: Joerg Stadler <joerg@itap.physik.uni-stuttgart.de>
To: postgres@postgres.Berkeley.EDU
Subject: Beginners questions
Date: Tue, 10 Oct 1995 13:31:01 +0100
Message-ID: <199510101231.NAA03056@titania> (raw)

Hello postgres users,

my task is to implement a (insitiution wide) database that handles
BibTeX citation entries. Here are some BibTex examples:

=========================== examples ====================================

TechReport{dongarra93,
  author =       "Dongarra, Jack J.",
  title =        "Performance of Various Computers Using Standar
                  Linear Equations Software",
  institution =  "Computer Science Department, University of Tennessee
                  and Mathematical Sciences Section, Oak Ridge
                  National Laboratory",
  year =         1993,
  key =          "Ma Sonstiges",
  note =         "per email zu erhalten von netlib@ornl.gov"
}

@BOOK{wall90,
   AUTHOR = {Larry Wall and Randall Schwartz},
   EDITOR = {O'Reilly },
   PUBLISHER = {O'Reilly},
   TITLE = {Programming perl},
   YEAR = {1990},
   KEYWORD = {Ma Sonstiges}
}

@ARTICLE{black54,
   AUTHOR = {Black,, P.J.},
   JOURNAL = {Acta. Cryst.},
   KEY = {Qk Approximanten},
   NUMBER = {8},
   PAGES = {1955},
   TITLE = {{T}he {S}tructure of {FeAl$_{3}$}},
   YEAR = {1954}
}

=========================== examples end ===============================


 I somehow found out about postgres and decided to give it a
try. 

Now, as seen from the examples, the BibTeX format is field-oriented,
but you can't tell in advance how many different fields there
are. This depends on the type of the citation, and also users sre
allowed to define thier own fields. 

Since I can't map BibTeX field names directly to postgres attributes,
I came up with the following database structure:

create ENTRIES ( citekey=text,   // Unique key for each citation: e.g. Wall1990b
                 author =text,   // Name of first author: e.g. Wall 
                 year   =int4,   // Year of publication: e.g. 1990
                 uid    =int4,   // Who added this citation?
                 ctype  =text )  // ARTICLE, BOOK, ....

create BIBDATA ( citekey=text,   // See above
                 field  =text,   // BibTeX field name: AUTHOR, TITLE etc.
                 data   =text )  // BibTeX field's contents

There is one entry in the ENTRIES relation for every citation and
there is one entry in the BIBDATA relation for every field in a given
citation.

I wrote some pgperl scripts that add, delete, replace and retrieve
(given a citekey) citations from this database. This works quite
well. 

But one (most important) question remains: How do I formulate
efficient queries that search multiple data fields in BIBDATA?

If I just wan't to know about one field, everything works fine. I do:

	retrieve (ENTRIES.citekey)
	where BIBDATA.citekey=ENTRIES.citekey 
	  and BIBDATA.field="AUTHOR" and BIBDATA.data~"Wall"

Adding a second search condition works, but is terribly slow:

	retrieve (ENTRIES.citekey)
	from B1, B2 in BIBDATA
	where B1.citekey=ENTRIES.citekey 
	  and B2.citekey=ENTRIES.citekey 
	  and B1.field="AUTHOR" and B1.data~"Wall"
	  and B2.field="TITLE"  and B2.data~"Programming"

Three conditions don't work at all: The query

	retrieve (ENTRIES.citekey)
	from B1, B2, B3 in BIBDATA
	where B1.citekey=ENTRIES.citekey 
	  and B2.citekey=ENTRIES.citekey 
	  and B3.citekey=ENTRIES.citekey
	  and B1.field="AUTHOR" and B1.data~"Wall"
	  and B2.field="TITLE"  and B2.data~"Programming"
	  and B3.field="YEAR"   and B3.data~"1990"

takes forever and gives a (wrong) null result. 

I'm no expert in database systems and this maybe too naive an
approach. 

My questions are:

Is there a better (more efficient) way to formulate these
queries? How can I have queries with more than two search conditions?
Is there a better way to store BibTeX databases in postgres?

I'm grateful for any hints.

Greetings from southern Germany

joerg

J"org Stadler                                joerg@itap.physik.uni-stuttgart.de
Inst. f. Theoretische und Angewandte Physik  Phone: 0711 / 685 - 5268 
Universit"at Stuttgart                       Pfaffenwaldring 57 Room 6.346
70550 Stuttgart				     finger for pgp public key



==============================================================================
   To add/remove yourself to/from the POSTGRES mailing list: send mail with 
   the subject line ADD or DEL to "postgres-request@postgres.Berkeley.EDU".
   If this fails, send mail to "post_questions@postgres.Berkeley.EDU" and
   a human will deal with it.  DO NOT post to the "postgres" mailing list.
==============================================================================
              URL: http://s2k-ftp.CS.Berkeley.EDU:8000/postgres/



reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: postgres@postgres.berkeley.edu
  Cc: joerg@itap.physik.uni-stuttgart.de
  Subject: Re: Beginners questions
  In-Reply-To: <199510101231.NAA03056@titania>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox