Return-Path: owner-postman Received: from localhost.Berkeley.EDU (localhost.Berkeley.EDU [127.0.0.1]) by nobozo.CS.Berkeley.EDU (8.6.10/8.6.3) with SMTP id FAA05671 for postgres-redist; Tue, 10 Oct 1995 05:34:28 -0700 Resent-From: POSTGRES mailing list Resent-Message-Id: <199510101234.FAA05671@nobozo.CS.Berkeley.EDU> X-Authentication-Warning: nobozo.CS.Berkeley.EDU: Host localhost.Berkeley.EDU didn't use HELO protocol Sender: owner-postman@postgres.Berkeley.EDU X-Return-Path: owner-postman Received: from dfunms.rus.uni-stuttgart.de (dfunms.rus.uni-stuttgart.de [129.69.1.162]) by nobozo.CS.Berkeley.EDU (8.6.10/8.6.3) with SMTP id FAA25230 for ; Tue, 10 Oct 1995 05:31:10 -0700 Received: from titania (titania.itap.physik.uni-stuttgart.de) by dfunms.rus.uni-stuttgart.de with SMTP id AA29397 (5.65c8/DFUE-M1.0 for ); Tue, 10 Oct 1995 13:31:02 +0100 Received: (from joerg@localhost) by titania (940816.SGI.8.6.9/8.6.11) id NAA03056; Tue, 10 Oct 1995 13:31:01 +0100 Date: Tue, 10 Oct 1995 13:31:01 +0100 From: Joerg Stadler Message-Id: <199510101231.NAA03056@titania> To: postgres@postgres.Berkeley.EDU Subject: Beginners questions Reply-To: joerg.stadler@itap.physik.uni-stuttgart.de Resent-To: postgres-redist@postgres.Berkeley.EDU Resent-Date: Tue, 10 Oct 95 05:34:28 -0700 Resent-XMts: smtp Hello postgres users, my task is to implement a (insitiution wide) database that handles BibTeX citation entries. Here are some BibTex examples: =========================== examples ==================================== TechReport{dongarra93, author = "Dongarra, Jack J.", title = "Performance of Various Computers Using Standar Linear Equations Software", institution = "Computer Science Department, University of Tennessee and Mathematical Sciences Section, Oak Ridge National Laboratory", year = 1993, key = "Ma Sonstiges", note = "per email zu erhalten von netlib@ornl.gov" } @BOOK{wall90, AUTHOR = {Larry Wall and Randall Schwartz}, EDITOR = {O'Reilly }, PUBLISHER = {O'Reilly}, TITLE = {Programming perl}, YEAR = {1990}, KEYWORD = {Ma Sonstiges} } @ARTICLE{black54, AUTHOR = {Black,, P.J.}, JOURNAL = {Acta. Cryst.}, KEY = {Qk Approximanten}, NUMBER = {8}, PAGES = {1955}, TITLE = {{T}he {S}tructure of {FeAl$_{3}$}}, YEAR = {1954} } =========================== examples end =============================== I somehow found out about postgres and decided to give it a try. Now, as seen from the examples, the BibTeX format is field-oriented, but you can't tell in advance how many different fields there are. This depends on the type of the citation, and also users sre allowed to define thier own fields. Since I can't map BibTeX field names directly to postgres attributes, I came up with the following database structure: create ENTRIES ( citekey=text, // Unique key for each citation: e.g. Wall1990b author =text, // Name of first author: e.g. Wall year =int4, // Year of publication: e.g. 1990 uid =int4, // Who added this citation? ctype =text ) // ARTICLE, BOOK, .... create BIBDATA ( citekey=text, // See above field =text, // BibTeX field name: AUTHOR, TITLE etc. data =text ) // BibTeX field's contents There is one entry in the ENTRIES relation for every citation and there is one entry in the BIBDATA relation for every field in a given citation. I wrote some pgperl scripts that add, delete, replace and retrieve (given a citekey) citations from this database. This works quite well. But one (most important) question remains: How do I formulate efficient queries that search multiple data fields in BIBDATA? If I just wan't to know about one field, everything works fine. I do: retrieve (ENTRIES.citekey) where BIBDATA.citekey=ENTRIES.citekey and BIBDATA.field="AUTHOR" and BIBDATA.data~"Wall" Adding a second search condition works, but is terribly slow: retrieve (ENTRIES.citekey) from B1, B2 in BIBDATA where B1.citekey=ENTRIES.citekey and B2.citekey=ENTRIES.citekey and B1.field="AUTHOR" and B1.data~"Wall" and B2.field="TITLE" and B2.data~"Programming" Three conditions don't work at all: The query retrieve (ENTRIES.citekey) from B1, B2, B3 in BIBDATA where B1.citekey=ENTRIES.citekey and B2.citekey=ENTRIES.citekey and B3.citekey=ENTRIES.citekey and B1.field="AUTHOR" and B1.data~"Wall" and B2.field="TITLE" and B2.data~"Programming" and B3.field="YEAR" and B3.data~"1990" takes forever and gives a (wrong) null result. I'm no expert in database systems and this maybe too naive an approach. My questions are: Is there a better (more efficient) way to formulate these queries? How can I have queries with more than two search conditions? Is there a better way to store BibTeX databases in postgres? I'm grateful for any hints. Greetings from southern Germany joerg J"org Stadler joerg@itap.physik.uni-stuttgart.de Inst. f. Theoretische und Angewandte Physik Phone: 0711 / 685 - 5268 Universit"at Stuttgart Pfaffenwaldring 57 Room 6.346 70550 Stuttgart finger for pgp public key ============================================================================== To add/remove yourself to/from the POSTGRES mailing list: send mail with the subject line ADD or DEL to "postgres-request@postgres.Berkeley.EDU". If this fails, send mail to "post_questions@postgres.Berkeley.EDU" and a human will deal with it. DO NOT post to the "postgres" mailing list. ============================================================================== URL: http://s2k-ftp.CS.Berkeley.EDU:8000/postgres/