Resent-From: POSTGRES mailing list <postman@postgres.Berkeley.EDU>
Resent-Message-Id: <199402072237.OAA12955@nobozo.CS.Berkeley.EDU>
Sender: owner-postman@postgres.Berkeley.EDU
Message-Id: <199402072236.OAA05351@faerie.CS.Berkeley.EDU>
From: aoki@postgres.Berkeley.EDU (Paul M. Aoki)
To: Dale Arntson <dale@hush.lib.uchicago.edu>
Cc: postgres@postgres.Berkeley.EDU
Subject: Re: Problem with Postgres finding intersection of lists 
Reply-To: aoki@postgres.Berkeley.EDU (Paul M. Aoki)
In-reply-to: Your message of Mon, 7 Feb 1994 12:56:41 -0600 
	     <199402071856.MAA28137@hush.lib.uchicago.edu> 
Date: Mon, 07 Feb 1994 14:36:55 -0800
Resent-To: postgres-dist@postgres.Berkeley.EDU
Resent-Date: Mon, 07 Feb 94 14:37:06 -0800
Resent-XMts: smtp

Dale Arntson <dale@hush.lib.uchicago.edu> writes:
> retrieve unique (c.txt) from
> c in cite,
> i1 in invert,
> i2 in invert,
> i3 in invert,
> i4 in invert,
> i5 in invert
> where
> i1.wd = "russell" and
> i2.wd = "principia" and
> i3.wd = "mathematica" and
> i4.wd = "cambridge" and
> i5.wd = "press" and
> i1.rn = i2.rn and
> i1.rn = i3.rn and
> i1.rn = i4.rn and
> i1.rn = i5.rn and
> c.rn = i1.rn

i executed this as-is and it blows out the query optimizer (it runs
out of memory -- it uses a variant of the exponential-complexity 
optimization algorithm from IBM's System R and, because it was
originally written in lisp, leaks memory like a sieve).

on a wild guess, i then re-ran it on our development system using 
chained join clauses, i.e.,

i1.rn = i2.rn and
i2.rn = i3.rn and
i3.rn = i4.rn and
i4.rn = i5.rn and
c.rn = i1.rn

and the query completed relatively quickly.  (i also made your keyword 
table bigger and threw an index over the columns so that the cost-based
optimizer would be *sure* to use the index.. scans of base tables are
*very* slow.  where they can be used, indices are always always always 
a good idea.)

the optimizer is not super-smart and doesn't compute all of the possible
join clauses it could invent based on the closure of these clauses 
(i1=i2,i1=i3...i3=i4,i3=i5,...).  writing the query this way basically
forces a particular join order, which is ok in this particular case 
because they're all on the same table anyway.  but the original query 
has too many join order possibilities for the optimizer to handle.. it 
got to 140MB on our alpha server before dying..
--
  Paul M. Aoki  |  CS Div., Dept. of EECS, UCB  |  aoki@postgres.Berkeley.EDU
                |  Berkeley, CA 94720           |  ...!uunet!ucbvax!aoki