Return-Path: pg_adm@postgres.berkeley.edu
Received: by postgres.Berkeley.EDU (5.61/1.29)
	id AA16082; Wed, 29 Jul 92 17:04:11 -0700
Message-Id: <9207300004.AA16082@postgres.Berkeley.EDU>
From: Spyros Potamianos <potamian@hplsjps.hpl.hp.com>
Subject: Re: no subject (file transmission)
To: postgres@postgres.berkeley.edu
Sender: pg_adm@postgres.berkeley.edu
To: rogers@cis.ksu.edu
Date: Wed, 29 Jul 92 17:13:18 PDT
Cc: postgres@postgres.Berkeley.EDU
In-Reply-To: <9207291859.AA12616@postgres.Berkeley.EDU>; from "David S Rogers" at Jul 29, 92 2:11 pm
Mailer: Elm [revision: 70.30]

> Hello,
> 	I have noticed a bug concerning the rules system, given the following
> rule and query incorrect results are obtained.
> 
> * define rule rule1 is 
>     on retrieve to emp.salary do
>     append emplog (name=pg_username(), query = "retrieve", field = "salary")
>     \g
> 
> * retrieve (emp.name)\g
> 
> 	The results are that rule1 is triggered for each record even though
> emp.salary has not been retrieved.

You are right. What happens, is that whenever a tuple is returned by the
access methods, the rule manager fires ALL the "on retrieve" rules defined
in this tuple (i.e. it assumes that all the attributes of the tuple
are being retrieved).
The alternative of checking which attributes will be projected later on
and discard the appropriate rules, would have complicated the
implementation (and semantics!) beyond belief.

To understand why, consider the following query,

    retrieve (emp.name)
    where emp.desk = "wood"

and the following rule:

    RULE 1:
    on retrieve to emp.salary
    do instead retrieve(salary=1000)

The query only retrieves the "name" attribute and uses the "wood"
attribute in the qualification. It would seem that the "salary" is not
retrieved and therefore RULE 1 must not be activated.
However if we add a second rule:

    RULE 2:
    on retrieve to emp.desk
    where current.salary = 1000
    do instead retrieve(desk = "wood")

then in order to calculate the value of "desk", we have to first
calculate the value of "salary" and therefore activate RULE 1.

It is possible to have a complex data structure on each attribute of
each tuple describing which "on retrieve" rules affect it, and fire these
rules only when absolutely necessary, but as this would have been too
complex we decided to go with the not so elegant but simplest solution of
blindly activate all "on retrieve" rules as soon as a tuple is retrieved
from disk.

> 	Also if a second rule is defined thus:
> 
> * define rule rule2 is
>     on retrieve to emp.salary where emp.salary>2000 do
>     instead nothing\g
> 

ooops! I presume that you wanted to write:
    on retrieve to emp.salary where current.salary>2000 do
    instead nothing		    ^^^^^^^

The "emp" of the "on" part of the rule, and the "emp" of the "where" part
are considered to be different tuple variables (yes, I know, it's not very
intuitive....). As a result your query means:

    activate the rule when we retrieve the salary of an employee
    and there is at least one employee with a salary
    greater than 2000

while mine (which uses 'current' instead of 'emp') means

    activate the rule when we retrieve the salary of an employee
    and HIS salary is greater than 2000

>     and followed by:
> 
> * retrieve (emp.name)\g
> 
> 	The result is 58 appends done to emplog for database of only 10 
> objects/employees.  This level of complexity seems unreasonable. 
> 
> 	rules	elements	appends
> 	0	10		0
> 	1	10		10
> 	2	10		58
> 	3	10		383
> 
> I've been juggling the rules and number of elements of the database trying
> to estimate an order of complexity and, although I'm not positive, it appears 
> the worst case scenario has complexity of:
> 
> 				(r-1)
> 			      (2     )
>                              n
> 
> where n is the number of tuples in the database and 
> r is the number of rules.

Well, the way you have written your rule, for every tuple retrieved
it will be activated and try to see if there is any employee with a salary
greater than 2000, so its starts a new scan of 'emp'! (think of it as a
nested-loop join). This scan will stop the first time we retrieve a tuple
that satisfies the qualification (salary > 2000) (that's why the numbers
of appends you got are not multiples of 10).
And of course if you have more similar rules, they will be activated too,
and in turn activate the other ones etc. etc. Things will become VERY
slow, but at least they will terminate (there is a simple loop detection
mechanism)

> 
> If this is the case, a large set of rules (10+) may be impractical even with 
> small databases.
> 
> Also is there a way to delete and/or view rules once they are entered?

You can remove rules with the following command:
    remove rule rule_name
(note: do not put quotes around the rule's name)

If you want to view a rule, try "retrieve (pg_prs2rule.all)".
The "prs2text" column of this relation contains the text of the
"define rule" command that was used to insert the rule.

> Thank You for your time and effort.
> 
> -Dave Rogers

Hope that helps!

Spyros Potamianos
The Ex POSTGRES-Rule-Person!
potamian@hpl.hp.com
