Return-Path: owner-postman
Received: from localhost (localhost [127.0.0.1]) by nobozo.CS.Berkeley.EDU (8.6.4/8.6.3) with SMTP id LAA05017 for postgres-dist; Fri, 25 Mar 1994 11:04:19 -0800
Resent-From: POSTGRES mailing list <postman@postgres.Berkeley.EDU>
Resent-Message-Id: <199403251904.LAA05017@nobozo.CS.Berkeley.EDU>
Sender: owner-postman@postgres.Berkeley.EDU
X-Return-Path: owner-postman
Received: from faerie.CS.Berkeley.EDU (faerie.CS.Berkeley.EDU [128.32.149.14]) by nobozo.CS.Berkeley.EDU (8.6.4/8.6.3) with ESMTP id LAA05008 for <postgres@postgres.Berkeley.EDU>; Fri, 25 Mar 1994 11:04:18 -0800
Received: from localhost (localhost [127.0.0.1]) by faerie.CS.Berkeley.EDU (8.6.4/8.1B) with SMTP id LAA02384; Fri, 25 Mar 1994 11:04:12 -0800
Message-Id: <199403251904.LAA02384@faerie.CS.Berkeley.EDU>
X-Authentication-Warning: faerie.CS.Berkeley.EDU: Host localhost didn't use HELO protocol
From: aoki@postgres.Berkeley.EDU (Paul M. Aoki)
To: suresh@nlm.nih.gov (Suresh Srinivasan)
Cc: postgres@postgres.Berkeley.EDU, rodgers@hume.nlm.nih.gov
Subject: Re: Retrieve question 
Reply-To: aoki@postgres.Berkeley.EDU (Paul M. Aoki)
In-reply-to: Your message of Fri, 25 Mar 94 10:05:56 EST 
	     <9403251505.AA10406@hume.csb> 
Date: Fri, 25 Mar 94 11:04:12 -0800
X-Sender: aoki@postgres.Berkeley.EDU
Resent-To: postgres-dist@postgres.Berkeley.EDU
X-Mts: smtp
Resent-Date: Fri, 25 Mar 94 11:04:19 -0800
Resent-XMts: smtp

suresh@nlm.nih.gov (Suresh Srinivasan) writes:
> It seems that if the query is qualified by
> even a simple Boolean expression, the search
> defaults to a sequential scan rather than
> use the index.

you can have fairly complex AND/NOT expressions and use indices.
the key in your query is the OR.  OR is a classic hard spot for 
optimizers because you really have to look closely at how clauses 
partition the data and believe in your stats.

a lot of classic papers, such as the system r paper, don't 
consider OR because their estimates/stats are assumed to be 
so poor that it doesn't make sense to do so.  ours fall into
the same category -- we have *no* key value stats.  if you keep 
better stats (like histograms) on key values and are willing 
to code ten million corner cases (as do ask/ingres and rdb)
you can do better.

you may think, well, a=b OR a=c will *always* be better done
by an index scan.  what if the column has very few distinct
values?  then, even with equality clauses, using the index
may not make a lot of sense.  so even "no brainer" cases have
problems.

notice that this problem is much, much harder in an extensible 
system where many types have no ordering, selectivity functions
may be really bizarre (at least, intelligent ones) for new access
methods like r-trees, etc.  hard-coded assumptions about b-trees 
and key ordering make no sense.  what stats should you keep for
polygons and r-trees?
--
  Paul M. Aoki  |  CS Div., Dept. of EECS, UCB  |  aoki@postgres.Berkeley.EDU
                |  Berkeley, CA 94720           |  ...!uunet!ucbvax!aoki
