Sender: owner-postgres95@postgres.Berkeley.EDU
X-Return-Path: andrew_yu
Received: from lounix4.conc.tdsnet.com (root@a07.conc.tdsnet.com [204.246.2.200]) by nobozo.CS.Berkeley.EDU (8.6.10/8.6.3) with ESMTP id FAA00177 for <postgres95@nobozo.CS.Berkeley.EDU>; Wed, 8 Nov 1995 05:33:35 -0800
Received: from lounix4.conc.tdsnet.com (lou@localhost [127.0.0.1]) by lounix4.conc.tdsnet.com (8.6.12/8.6.9) with SMTP id IAA00126; Wed, 8 Nov 1995 08:34:14 -0500
Date: Wed, 8 Nov 1995 08:34:11 -0500 (EST)
From: Lou Sortman <lou@ncinfo.iog.unc.edu>
To: Robert Patrick <Robert_Patrick@methi.ndim.edrc.cmu.edu>
cc: "Bryon S. Lape" <blape@utk.edu>, postgres95@postgres.Berkeley.EDU
Subject: Re: (fwd) 
In-Reply-To: <199511072216.OAA31902@nobozo.CS.Berkeley.EDU>
Message-ID: <Pine.LNX.3.91.951108081803.108B-100000@lounix4.conc.tdsnet.com>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Resent-To: postgres95-redist
Resent-Date: Wed, 08 Nov 95 05:33:39 -0800
Resent-From: pglite
Resent-XMts: smtp

On Tue, 7 Nov 1995, Robert Patrick wrote:

> Date: Tue, 07 Nov 1995 17:16:06 -0500
> From: Robert Patrick <Robert_Patrick@methi.ndim.edrc.cmu.edu>
> To: "Bryon S. Lape" <blape@utk.edu>, postgres95@nobozo.CS.Berkeley.EDU
> Subject: Re: (fwd) 
> 
> > >
> > > SELECT * 
> > >   FROM foo 
> > >  WHERE foo.bar ~ '.*[^a-zA-Z1-9]user_word[^a-zA-Z1-9].*'
> > >     OR foo.bar ~ 'user_word[^a-zA-Z1-9].*'
> > >     OR foo.bar ~ '.*[^a-zA-Z1-9]user_word'
> > 
> >         I tried the using the OR's and it returned too much.  Using the
> > first line of reg exp does seem to work though.
> 
> Dang it, I did it again...
> 
> SELECT * 
>   FROM foo 
>  WHERE foo.bar ~ '.*[^a-zA-Z1-9]user_word[^a-zA-Z1-9].*'
>     OR foo.bar ~ '^user_word[^a-zA-Z1-9].*'
>     OR foo.bar ~ '.*[^a-zA-Z1-9]user_word$'

First of all, I think you probably want [0-9] rather than [1-9].
Also, I think that the .* at the beginning and end of these patterns is 
superfluous.
The regexes aren't anchored to the beginning or the end unless you use ^ 
or $.
Regexes are bad for performance anyway, but the extra symbols can't be 
helping.
I sent Bryon a regex which, I believe, combines the above three regexes 
(with the two changes above) into one.

'(^|[^a-zA-Z0-9])user_word([^a-zA-Z0-9]|$)'

Hopefully, the slightly more complex regex will run faster than three 
simpler ones.
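
For anyone who wants to check the reasoning outside the database, here is a 
rough sketch in Python.  It is only an illustration: Python's re module is 
not the regex library Postgres95 uses, the sample strings are made up, and 
"user_word" is just the placeholder from the queries above.  It compares 
the three OR'ed patterns (with [0-9]) against the single combined pattern, 
and shows what [1-9] does with a trailing 0.

```python
import re

WORD = "user_word"  # placeholder from the queries above, not a real value

# The three OR'ed patterns, with [0-9] substituted for [1-9].
three = [
    r".*[^a-zA-Z0-9]" + WORD + r"[^a-zA-Z0-9].*",
    r"^" + WORD + r"[^a-zA-Z0-9].*",
    r".*[^a-zA-Z0-9]" + WORD + r"$",
]

# The single combined pattern, with no leading or trailing .*
combined = r"(^|[^a-zA-Z0-9])" + WORD + r"([^a-zA-Z0-9]|$)"


def match_three(s):
    """True if any of the three patterns matches somewhere in s."""
    return any(re.search(p, s) for p in three)


def match_one(s):
    """True if the combined pattern matches somewhere in s."""
    return re.search(combined, s) is not None


# Hypothetical sample values for foo.bar.
samples = [
    "a user_word b",   # delimited on both sides
    "user_word here",  # at the start of the value
    "see user_word",   # at the end of the value
    "xuser_word",      # glued to a preceding letter
    "user_words",      # glued to a following letter
    "user_word0",      # glued to a following zero
    "user_word",       # the value is exactly the word
]

for s in samples:
    print(repr(s), match_three(s), match_one(s))

# With [1-9], a trailing 0 is treated as a delimiter.
old_tail = WORD + r"([^a-zA-Z1-9]|$)"
print(re.search(old_tail, "user_word0") is not None)
```

In this sketch the two approaches agree on every sample except the last 
one: when the value is exactly the word, only the combined pattern matches, 
because none of the three OR'ed patterns allows ^ and $ at the same time.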

Of course, I've now embarrassed myself publicly if it doesn't work, but 
I'm pretty sure it did the last time I tried it.

A note about regexes:
I was playing with regexes in Postgres95 fairly extensively a while back.  
I found that when my regexes got beyond a certain size, my search results 
were erroneous.  As I recall, my queries came back empty.  I dug through 
the code and found that the buffer for compiled regular expressions is a 
constant (256) and my expressions compiled to be longer than that.  
Doubling the constant just staved off the inevitable.  At least on my 
system, specifying a NULL pointer as the buffer address causes the regular 
expression compiling function to allocate a buffer of the correct size to 
hold the expression.  That solved the problem for me.  I told Andrew 
about this (I think it was him) and he said that this feature does not 
exist in all of the systems that Postgres95 runs on, so it is not a 
change that can be made to the distribution.  (An #ifdef might be nice.)

===============================================================================
  To unsubscribe from the Postgres95 mailing list, send mail with the subject
  line "DEL" to "postgres95-request@postgres.Berkeley.EDU". 
============  URL: http://s2k-ftp.CS.Berkeley.EDU:8000/postgres95/  ===========
