MIME-Version: 1.0
References: <20230601235909.0e1572c27e59112f9d0cbe86@sraoss.co.jp>
 <20230601034703.9e4f81f5d92ae6e3949b84d2@sraoss.co.jp> <CACJufxE8Kct5=c1KKi_JgP09WApCk0Myk6=a5xo05_J_mB7hRQ@mail.gmail.com>
 <20230628170604.505955118ac2f91abd554f13@sraoss.co.jp>
In-Reply-To: <20230628170604.505955118ac2f91abd554f13@sraoss.co.jp>
From: jian he <jian.universality@gmail.com>
Date: Thu, 29 Jun 2023 00:40:45 +0800
Message-ID: <CACJufxEA-V+0Fa3Q6xQrFwG6wLs4DsX7E4exdfA=rO-70svmBw@mail.gmail.com>
Subject: Re: Incremental View Maintenance, take 2
To: Yugo NAGATA <nagata@sraoss.co.jp>
Cc: pgsql-hackers@postgresql.org
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://www.postgresql.org/message-id/CACJufxEA-V%2B0Fa3Q6xQrFwG6wLs4DsX7E4exdfA%3DrO-70svmBw%40mail.gmail.com>
Precedence: bulk

On Wed, Jun 28, 2023 at 4:06 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
>
> On Wed, 28 Jun 2023 00:01:02 +0800
> jian he <jian.universality@gmail.com> wrote:
>
> > On Thu, Jun 1, 2023 at 2:47 AM Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> > >
> > > On Thu, 1 Jun 2023 23:59:09 +0900
> > > Yugo NAGATA <nagata@sraoss.co.jp> wrote:
> > >
> > > > Hello hackers,
> > > >
> > > > Here's a rebased version of the patch-set adding Incremental View
> > > > Maintenance support for PostgreSQL. That was discussed in [1].
> > >
> > > > > [1] https://www.postgresql.org/message-id/flat/20181227215726.4d166b4874f8983a641123f5%40sraoss.co.jp
> > >
> > > ---------------------------------------------------------------------=
------------------
> > > * Overview
> > >
> > > > Incremental View Maintenance (IVM) is a way to make materialized views
> > > up-to-date by computing only incremental changes and applying them on
> > > views. IVM is more efficient than REFRESH MATERIALIZED VIEW when
> > > only small parts of the view are changed.
> > >
> > > ** Feature
> > >
> > > > The attached patchset provides a feature that allows materialized views
> > > > to be updated automatically and incrementally just after an underlying
> > > > table is modified.
> > >
> > > > You can create an incrementally maintainable materialized view (IMMV)
> > > > by using the CREATE INCREMENTAL MATERIALIZED VIEW command.
> > > >
> > > > The following are supported in view definition queries:
> > > - SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins)
> > > - some built-in aggregate functions (count, sum, avg, min, max)
> > > - GROUP BY clause
> > > - DISTINCT clause
> > >
> > > > Views can contain multiple tuples with the same content (duplicate tuples).
> > >
> > > ** Restriction
> > >
> > > The following are not supported in a view definition:
> > > - Outer joins
> > > > - Aggregates other than above, window functions, HAVING
> > > - Sub-queries, CTEs
> > > - Set operations (UNION, INTERSECT, EXCEPT)
> > > - DISTINCT ON, ORDER BY, LIMIT, OFFSET
> > >
> > > > Also, a view definition query cannot contain other views, materialized views,
> > > > foreign tables, partitioned tables, partitions, VALUES, non-immutable functions,
> > > > system columns, or expressions that contain aggregates.
> > >
> > > ---------------------------------------------------------------------=
------------------
> > > * Design
> > >
> > > An IMMV is maintained using statement-level AFTER triggers.
> > > > When an IMMV is created, triggers are automatically created on all base
> > > > tables contained in the view definition query.
> > >
> > > > When a table is modified, changes that occurred in the table are extracted
> > > > as transition tables in the AFTER triggers. Then, changes that will occur in
> > > > the view are calculated by a rewritten view query in which the modified table
> > > > is replaced with the transition table.
> > >
> > > > For example, if the view is defined as "SELECT * FROM R, S", and tuples inserted
> > > > into R are stored in a transition table dR, the tuples that will be inserted into
> > > > the view are calculated as the result of "SELECT * FROM dR, S".
> > >
> > > ** Multiple Tables Modification
> > >
> > > > Multiple tables can be modified in a statement when using triggers, foreign key
> > > > constraints, or modifying CTEs. When multiple tables are modified, we need
> > > > the state of tables before the modification.
> > >
> > > > For example, when some tuples, dR and dS, are inserted into R and S respectively,
> > > > the tuples that will be inserted into the view are calculated by the following
> > > > two queries:
> > >
> > >  "SELECT * FROM dR, S_pre"
> > >  "SELECT * FROM R, dS"
> > >
> > > > where S_pre is the table before the modification, and R is the current state of
> > > > the table, that is, after the modification. This pre-update state of the table
> > > > is calculated by filtering inserted tuples and appending deleted tuples.
> > > > The subquery that represents pre-update state is generated in get_prestate_rte().
> > > > Specifically, the inserted tuples are filtered by calling IVM_visible_in_prestate()
> > > > in WHERE clause. This function checks the visibility of tuples by using
> > > > the snapshot taken before table modification. The deleted tuples are contained
> > > > in the old transition table, and this table is appended using UNION ALL.
> > >
> > > > Transition tables for each modification are collected in each AFTER trigger
> > > > function call. Then, the view maintenance is performed in the last call of
> > > > the trigger.
> > >
> > > > In the original PostgreSQL, tuplestores of transition tables are freed at the
> > > > end of each nested query. However, their lifespan needs to be prolonged to
> > > > the end of the outermost query in order to maintain the view in the last AFTER
> > > > trigger. For this purpose, SetTransitionTablePreserved is added in trigger.c.
> > >
> > > > ** Duplicate Tuples
> > > >
> > > > When calculating changes that will occur in the view (= delta tables),
> > > > multiplicity of tuples is calculated by using count(*).
> > >
> > > > When deleting tuples from the view, tuples to be deleted are identified by
> > > > joining the delta table with the view, and as many tuples as the specified
> > > > multiplicity are deleted, numbering them with the row_number() function.
> > > > This is implemented in apply_old_delta().
> > >
> > > > When inserting tuples into the view, each tuple is duplicated to the
> > > > specified multiplicity using generate_series() function. This is implemented
> > > > in apply_new_delta().
> > >
> > > ** DISTINCT clause
> > >
> > > > When DISTINCT is used, the view has a hidden column __ivm_count__ that
> > > > stores multiplicity for tuples. When tuples are deleted from or inserted into
> > > > the view, the value of the __ivm_count__ column is decreased or increased by
> > > > the specified multiplicity. Eventually, when the value becomes zero, the
> > > > corresponding tuple is deleted from the view.  This is implemented in
> > > > apply_old_delta_with_count() and apply_new_delta_with_count().
> > >
> > > ** Aggregates
> > >
> > > > Built-in count, sum, avg, min, and max are supported. Whether a given
> > > aggregate function can be used or not is checked by using its OID in
> > > check_aggregate_supports_ivm().
> > >
> > > When creating a materialized view containing aggregates, in addition
> > > to __ivm_count__, more than one hidden columns for each aggregate are
> > > added to the target list. For example, columns for storing sum(x),
> > > count(x) are added if we have avg(x). When the view is maintained,
> > > aggregated values are updated using these hidden columns, also hidden
> > > columns are updated at the same time.
> > >
> > > The maintenance of aggregated view is performed in
> > > > apply_old_delta_with_count() and apply_new_delta_with_count(). The SET
> > > > clauses for updating columns are generated by append_set_clause_*().
> > >
> > > If the view has min(x) or max(x) and the minimum or maximal value is
> > > deleted from a table, we need to update the value to the new min/max
> > > > recalculated from the tables rather than incremental computation. This
> > > > is performed in recalc_and_set_values().
> > >
> > > ---------------------------------------------------------------------=
------------------
> > > * Details of the patch-set (v28)
> > >
> > > > The patch-set consists of the following eleven patches.
> > >
> > > > In the previous version, the number of patches was nine.
> > > In the latest patch-set, the patches are divided more finely
> > > aiming to make the review easier.
> > >
> > > > > - 0001: Add a syntax to create Incrementally Maintainable Materialized Views
> > >
> > > > The proposed syntax to create an incrementally maintainable materialized
> > > > view (IMMV) is:
> > > >
> > > >  CREATE INCREMENTAL MATERIALIZED VIEW AS SELECT .....;
> > > >
> > > > However, this syntax is tentative, so any suggestions are welcome.
> > >
> > > > - 0002: Add relisivm column to pg_class system catalog
> > >
> > > We add a new field in pg_class to indicate a relation is IMMV.
> > > Another alternative is to add a new catalog for managing materialized
> > > views including IMMV, but I am not sure if we want this.
> > >
> > > > > - 0003: Allow to prolong life span of transition tables until transaction end
> > >
> > > This patch fixes the trigger system to allow to prolong lifespan of
> > > tuple stores for transition tables until the transaction end. We need
> > > > this because multiple transition tables have to be preserved until the
> > > > end of the outermost query when multiple tables are modified by nested
> > > > triggers. (as explained above in Design - Multiple Tables Modification)
> > > >
> > > > If we don't want to change the trigger system in such a way, the alternative
> > > > is to copy the contents of transition tables to other tuplestores, although
> > > > it needs more time and memory.
> > >
> > > > - 0004: Add Incremental View Maintenance support to pg_dump
> > >
> > > This patch enables pg_dump to output IMMV using the new syntax.
> > >
> > > > - 0005: Add Incremental View Maintenance support to psql
> > >
> > > This patch implements tab-completion for the new syntax and adds
> > > information of IMMV to \d meta-command results.
> > >
> > > > - 0006: Add Incremental View Maintenance support
> > >
> > > This patch implements the basic IVM feature.
> > > DISTINCT and aggregate are not supported here.
> > >
> > > When an IMMV is created, the view query is checked, and if any
> > > non-supported feature is used, it raises an error. If it is ok,
> > > > triggers are created on base tables and a unique index is
> > > created on the view if possible.
> > >
> > > In BEFORE trigger, an entry is created for each IMMV and the number
> > > of trigger firing is counted. Also, the snapshot just before the
> > > table modification is stored.
> > >
> > > > In AFTER triggers, each transition table is preserved. The number
> > > of trigger firing is counted also here, and when the firing number of
> > > BEFORE and AFTER trigger reach the same, it is deemed the final AFTER
> > > trigger call.
> > >
> > > In the final AFTER trigger, the IMMV is maintained. Rewritten view
> > > query is executed to generate delta tables, and deltas are applied
> > > to the view. If multiple tables are modified simultaneously, this
> > > process is iterated for each modified table. Tables before processed
> > > are represented in "pre-update-state", processed tables are
> > > "post-update-state" in the rewritten query.
> > >
> > > > - 0007: Add DISTINCT support for IVM
> > >
> > > This patch adds DISTINCT clause support.
> > >
> > > When an IMMV including DISTINCT is created, a hidden column
> > > "__ivm_count__" is added to the target list. This column has the
> > > number of duplicity of the same tuples. The duplicity is calculated
> > > by adding "count(*)" and GROUP BY to the view query.
> > >
> > > > When an IMMV is maintained, the duplicity in __ivm_count__ is updated,
> > > > and tuples whose duplicity becomes zero can be deleted from the view.
> > > This logic is implemented by SQL in apply_old_delta_with_count and
> > > apply_new_delta_with_count.
> > >
> > > > Columns starting with "__ivm_" are deemed hidden columns that don't
> > > appear when a view is accessed by "SELECT * FROM ....".  This is
> > > implemented by fixing parse_relation.c.
> > >
> > > > - 0008: Add aggregates support in IVM
> > >
> > > This patch provides codes for aggregates support, specifically
> > > for builtin count, sum, and avg.
> > >
> > > > When an IMMV containing an aggregate is created, it is checked if this
> > > > aggregate function is supported, and if it is ok, some hidden columns
> > > > are added to the target list.
> > > >
> > > > When the IMMV is maintained, the aggregated value is updated as well as
> > > > related hidden columns. The way of update depends on the type of aggregate
> > > > functions, and SET clause string is generated for each aggregate.
> > >
> > > > - 0009: Add support for min/max aggregates for IVM
> > >
> > > This patch adds min/max aggregates support.
> > >
> > > This is separated from #0008 because min/max needs more complicated
> > > work than count, sum, and avg.
> > >
> > > If the view has min(x) or max(x) and the minimum or maximal value is
> > > deleted from a table, we need to update the value to the new min/max
> > > recalculated from the tables rather than incremental computation.
> > > This is performed in recalc_and_set_values().
> > >
> > > TIDs and keys of tuples that need re-calculation are returned as a
> > > result of the query that deleted min/max values from the view using
> > > > RETURNING clause. The plan to recalculate and set the new min/max value
> > > > is stored and reused.
> > >
> > > > - 0010: regression tests
> > >
> > > This patch provides regression tests for IVM.
> > >
> > > > - 0011: documentation
> > >
> > > > This patch provides documentation for IVM.
> > >
> > > ---------------------------------------------------------------------=
------------------
> > > * Changes from the Previous Version (v27)
> > >
> > > - Allow TRUNCATE on base tables
> > >
> > > When a base table is truncated, the view content will be empty if the
> > > > view definition query does not contain an aggregate without a GROUP BY clause.
> > > > Therefore, such views can be truncated.
> > > >
> > > > Aggregate views without a GROUP BY clause always have one row. Therefore,
> > > > if a base table is truncated, the view will not be empty and will contain
> > > > a row with NULL value (or 0 for count()). So, in this case, we refresh the
> > > > view instead of truncating it.
> > >
> > > - Fix bugs reported by huyajun [1]
> > >
> > > > [1] https://www.postgresql.org/message-id/tencent_FCAF11BCA5003FD16BDDFDDA5D6A19587809%40qq.com
> > >
> > > ---------------------------------------------------------------------=
------------------
> > > * Discussion
> > >
> > > ** Aggregate support
> > >
> > > > There were a few suggestions that general aggregate functions should be
> > > > supported [2][3], which may be possible by extending pg_aggregate catalog.
> > > > However, we decided to leave supporting general aggregates to future work [4]
> > > > because it would need substantial work and make the patch more complex and
> > > > bigger.
> > >
> > > > There has been no opposing opinion on this. However, if we need more discussion
> > > > on the design of aggregate support, we can omit aggregate support for the first
> > > > release of IVM.
> > >
> > > > [2] https://www.postgresql.org/message-id/20191128140333.GA25947%40alvherre.pgsql
> > > > [3] https://www.postgresql.org/message-id/CAM-w4HOvDrL4ou6m%3D592zUiKGVzTcOpNj-d_cJqzL00fdsS5kg%40mail.gmail.com
> > > > [4] https://www.postgresql.org/message-id/20201016193034.9a4c44c79fc1eca7babe093e%40sraoss.co.jp
> > >
> > > ** Hidden columns
> > >
> > > > In order to support DISTINCT or aggregates, our implementation uses hidden columns.
> > > >
> > > > Columns starting with "__ivm_" are hidden columns that don't appear when a
> > > > view is accessed by "SELECT * FROM ....". For this aim, parse_relation.c is
> > > > fixed. There was a proposal to enable hidden columns by adding a new flag to
> > > > pg_attribute [5], but this thread is no longer active, so we decided to check
> > > > the hidden column by its name [6].
> > >
> > > > [5] https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEnVbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com
> > > > [6] https://www.postgresql.org/message-id/20201016193034.9a4c44c79fc1eca7babe093e%40sraoss.co.jp
> > >
> > > ** Concurrent Transactions
> > >
> > > > When the view definition has more than one table, we acquire an exclusive
> > > > lock before the view maintenance in order to avoid inconsistent results.
> > > > This behavior was explained in [7]. The lock was improved to use a weaker lock
> > > > when the view has only one table based on a suggestion from Konstantin Knizhnik [8].
> > > > However, due to the implementation that uses ctid for identifying target tuples,
> > > > we still have to use an exclusive lock for DELETE and UPDATE.
> > >
> > > > [7] https://www.postgresql.org/message-id/20200909092752.c91758a1bec3479668e82643%40sraoss.co.jp
> > > > [8] https://www.postgresql.org/message-id/5663f5f0-48af-686c-bf3c-62d279567e2a%40postgrespro.ru
> > >
> > > ** Automatic Index Creation
> > >
> > > When a view is created, a unique index is automatically created if
> > > possible, that is, if the view definition query has a GROUP BY or
> > > DISTINCT, or if the view contains all primary key attributes of
> > > its base tables in the target list. It is necessary for efficient
> > > view maintenance. This feature is based on a suggestion from
> > > Konstantin Knizhnik [9].
> > >
> > > > [9] https://www.postgresql.org/message-id/89729da8-9042-7ea0-95af-e415df6da14d%40postgrespro.ru
> > >
> > >
> > > ** Trigger and Transition Tables
> > >
> > > We implemented IVM based on triggers. This is because we want to use
> > > transition tables to extract changes on base tables. Also, there are
> > > > other constraints that use triggers in their implementation, like
> > > > foreign references. However, if we can use a transition-table-like feature
> > > > without relying on triggers, we don't have to insist on using triggers and we
> > > > might implement IVM in the executor directly, similarly to declarative
> > > > partitioning.
> > >
> > > ** Feature to be Supported in the First Release
> > >
> > > > The current patch-set supports DISTINCT and aggregates for built-in count,
> > > > sum, avg, min and max. Do we need all these features for the first IVM release?
> > > > Supporting DISTINCT and aggregates needs discussion on hidden columns, and
> > > > for supporting min/max we need to discuss the re-calculation method. Before
> > > > handling such relatively advanced features, maybe we should focus on the design
> > > > and implementation of the basic feature of IVM?
> > >
> > >
> > > > Any suggestions and discussion are welcome!
> > >
> > > Regards,
> > > Yugo Nagata
> > >
> > > --
> > > Yugo NAGATA <nagata@sraoss.co.jp>
> > >
> > >
> >
> >
> > > > The following are supported in view definition queries:
> > > - SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins)
> >
> >
> > > > Also, a view definition query cannot contain other views, materialized views,
> > > > foreign tables, partitioned tables, partitions, VALUES, non-immutable functions,
> > > > system columns, or expressions that contain aggregates.
> >
> > Does this also apply to tableoid?  but tableoid is a constant, so it
> > should be fine?
> > can following two queries apply to this feature.
> > select tableoid, unique1 from tenk1;
>
> Currently, this is not allowed because tableoid is a system column.
> As you say, tableoid is a constant, so we can allow. Should we do this?
>
> > select 1 as constant, unique1 from tenk1;
>
> This is allowed, of course.
>
> > I didn't apply the patch.(will do later, for someone to test, it would
> > be a better idea to dump a whole file separately....).
>
> Thank you! I'm looking forward to your feedback.
> (I didn't attach a whole patch separately because I wouldn't like
> cfbot to be unhappy...)
>
> Regards,
> Yugo Nagata
>
> --
> Yugo NAGATA <nagata@sraoss.co.jp>

I played around with the first half of the regression test patch.
All of the following queries fail.

CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS
    SELECT DISTINCT * , 1 as "__ivm_count__" FROM mv_base_a;

CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS
    SELECT DISTINCT * , 1 as "__ivm_countblablabla" FROM mv_base_a;

CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS
    SELECT DISTINCT * , 1 as "__ivm_count" FROM mv_base_a;

CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS
    SELECT DISTINCT * , 1 as "__ivm_count_____" FROM mv_base_a;

CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS
    SELECT DISTINCT * , 1 as "__ivm_countblabla" FROM mv_base_a;

So is the reserved pattern for hidden columns "__ivm_count.*"? That would be a lot....
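
The description quoted above says that columns starting with "__ivm_" are
treated as hidden, so my guess is that the check is a plain prefix match and
not something specific to "__ivm_count". A small sketch of that guess
(is_reserved_ivm_name is a made-up name, not a function in the patch, and I
have not checked how the patch actually tests this):

```python
RESERVED_PREFIX = "__ivm_"  # prefix taken from the patch description

def is_reserved_ivm_name(column_name: str) -> bool:
    """Hypothetical check: does this alias collide with the hidden-column prefix?"""
    return column_name.startswith(RESERVED_PREFIX)

# The five aliases from the failing CREATE statements above.
rejected_aliases = [
    "__ivm_count__",
    "__ivm_countblablabla",
    "__ivm_count",
    "__ivm_count_____",
    "__ivm_countblabla",
]

all_rejected = all(is_reserved_ivm_name(name) for name in rejected_aliases)
print(all_rejected)
print(is_reserved_ivm_name("ivm_count"))
```

If the check really is a prefix match, every alias beginning with "__ivm_" is
reserved, which is even broader than "__ivm_count.*".
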

select * from pg_matviews where matviewname = 'mv_ivm_1';
doesn't have a relisivm column. Would it be reasonable to add it to the pg_matviews view?