Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1qEYEB-0004ke-IM for pgsql-hackers@arkaria.postgresql.org; Wed, 28 Jun 2023 16:41:11 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.92) (envelope-from ) id 1qEYEA-0004WP-4g for pgsql-hackers@arkaria.postgresql.org; Wed, 28 Jun 2023 16:41:10 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1qEYE9-0004S0-M6 for pgsql-hackers@lists.postgresql.org; Wed, 28 Jun 2023 16:41:09 +0000 Received: from mail-ua1-x92b.google.com ([2607:f8b0:4864:20::92b]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1qEYE4-0010Ci-5i for pgsql-hackers@postgresql.org; Wed, 28 Jun 2023 16:41:09 +0000 Received: by mail-ua1-x92b.google.com with SMTP id a1e0cc1a2514c-79470b88d88so209687241.0 for ; Wed, 28 Jun 2023 09:41:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1687970462; x=1690562462; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=Nlh7zgqmG+zR7gDmlLfyD1dlNJgGyglS9Rf76taV0bs=; b=mOIvLtKH4b8xG5DfcwgpRM1JpZt/tbIufZ0x/1XoUS6tpD6D4mwFmVrLWGPc6X3VJb H6ChDmlrXAp9ZJWtcnfvxKzlIhocIdIB3DUf5E7IErF69Y16Fy8S8YbiaTDVdb+VscJg fhex/YwZhlLFscPP6VsKoRczoG2Ai4NsdOhH60PZIBeJ+gC4myr9NLfFmAKdE2E+iyVh PaDvC98SfPNS+nwpqYoITv/u4UlrHEr9zFakZfX7xnKo1lnHnGHjMkFZCjv170cdiptK Bys+TJf0Mmik8df8xaVuOiyDS6tB5J0PS57XsDcbK+p0SJIUU2TM6NdJe+i2NCa0JKCL 6K9g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1687970462; x=1690562462; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=Nlh7zgqmG+zR7gDmlLfyD1dlNJgGyglS9Rf76taV0bs=; b=SS1VabLAKbIm0m9/ot62FqWSTQoVTh807KoDZ03OQ4pSEe9vaFyohxLFzEfR2/FI9e zs3otV3Uu+CyFuGmTSDyJI/rqq9olJOVlL4jsTwIuCW+MCf7X5XWXt6d1J+EjGlAqqph nL4stsZ62IbsaceDf32mKEelAOm2dBYh6ASozFZrA8096QzLAaFXAQOW9nMK/YRZ1pkw 5vGDQIPQ0QUom70VFfo5/5vRqeIht1dnB/7kEU9tZSmPdhGAGqk3VQt9JLYJBy5UJ1w4 L4nV25JFOBpOF7/E3IwYiB7m3WLmKtBE7ddHTnxp9WuElFA6iXFY1I2rmg1DGz0oU/c1 5+rw== X-Gm-Message-State: AC+VfDyVYL4N9CcSwc0K8yNcJT8dHeHeJCmttbBkLR6grSOzOTiWVEGS 7h7kAz83ItksLaZJo5DsKXKvKmrr6tKZt2ymAR9TRHNFqyDjaw== X-Google-Smtp-Source: ACHHUZ5PZtRza3Estlda66zOwlbZ+rooS1+Lf5Aftq0bueNi7+SC+Ad6NkOXRJkh+zJobhWF4TRL/t3BIY8iHGsXwPE= X-Received: by 2002:a67:ea8f:0:b0:443:6b64:aed4 with SMTP id f15-20020a67ea8f000000b004436b64aed4mr5731061vso.26.1687970461678; Wed, 28 Jun 2023 09:41:01 -0700 (PDT) MIME-Version: 1.0 References: <20230601235909.0e1572c27e59112f9d0cbe86@sraoss.co.jp> <20230601034703.9e4f81f5d92ae6e3949b84d2@sraoss.co.jp> <20230628170604.505955118ac2f91abd554f13@sraoss.co.jp> In-Reply-To: <20230628170604.505955118ac2f91abd554f13@sraoss.co.jp> From: jian he Date: Thu, 29 Jun 2023 00:40:45 +0800 Message-ID: Subject: Re: Incremental View Maintenance, take 2 To: Yugo NAGATA Cc: pgsql-hackers@postgresql.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Wed, Jun 28, 2023 at 4:06=E2=80=AFPM Yugo NAGATA w= rote: > > On Wed, 28 Jun 2023 00:01:02 +0800 > jian he wrote: > > > On Thu, Jun 1, 2023 at 2:47=E2=80=AFAM Yugo NAGATA wrote: > > > > > > On Thu, 1 Jun 2023 23:59:09 +0900 > > > Yugo NAGATA wrote: > > > > > > > Hello hackers, > > > > > > > > Here's a rebased version of the patch-set adding Incremental View > > > > Maintenance support for PostgreSQL. That was discussed in [1]. > > > > > > > [1] https://www.postgresql.org/message-id/flat/20181227215726.4d166= b4874f8983a641123f5%40sraoss.co.jp > > > > > > ---------------------------------------------------------------------= ------------------ > > > * Overview > > > > > > Incremental View Maintenance (IVM) is a way to make materialized view= s > > > up-to-date by computing only incremental changes and applying them on > > > views. IVM is more efficient than REFRESH MATERIALIZED VIEW when > > > only small parts of the view are changed. > > > > > > ** Feature > > > > > > The attached patchset provides a feature that allows materialized vie= ws > > > to be updated automatically and incrementally just after a underlying > > > table is modified. > > > > > > You can create an incementally maintainable materialized view (IMMV) > > > by using CREATE INCREMENTAL MATERIALIZED VIEW command. > > > > > > The followings are supported in view definition queries: > > > - SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins) > > > - some built-in aggregate functions (count, sum, avg, min, max) > > > - GROUP BY clause > > > - DISTINCT clause > > > > > > Views can contain multiple tuples with the same content (duplicate tu= ples). > > > > > > ** Restriction > > > > > > The following are not supported in a view definition: > > > - Outer joins > > > - Aggregates otehr than above, window functions, HAVING > > > - Sub-queries, CTEs > > > - Set operations (UNION, INTERSECT, EXCEPT) > > > - DISTINCT ON, ORDER BY, LIMIT, OFFSET > > > > > > Also, a view definition query cannot contain other views, materialize= d views, > > > foreign tables, partitioned tables, partitions, VALUES, non-immutable= functions, > > > system columns, or expressions that contains aggregates. > > > > > > ---------------------------------------------------------------------= ------------------ > > > * Design > > > > > > An IMMV is maintained using statement-level AFTER triggers. > > > When an IMMV is created, triggers are automatically created on all ba= se > > > tables contained in the view definition query. > > > > > > When a table is modified, changes that occurred in the table are extr= acted > > > as transition tables in the AFTER triggers. Then, changes that will o= ccur in > > > the view are calculated by a rewritten view dequery in which the modi= fied table > > > is replaced with the transition table. > > > > > > For example, if the view is defined as "SELECT * FROM R, S", and tupl= es inserted > > > into R are stored in a transiton table dR, the tuples that will be in= serted into > > > the view are calculated as the result of "SELECT * FROM dR, S". > > > > > > ** Multiple Tables Modification > > > > > > Multiple tables can be modified in a statement when using triggers, f= oreign key > > > constraint, or modifying CTEs. When multiple tables are modified, we = need > > > the state of tables before the modification. > > > > > > For example, when some tuples, dR and dS, are inserted into R and S r= espectively, > > > the tuples that will be inserted into the view are calculated by the = following > > > two queries: > > > > > > "SELECT * FROM dR, S_pre" > > > "SELECT * FROM R, dS" > > > > > > where S_pre is the table before the modification, R is the current st= ate of > > > table, that is, after the modification. This pre-update states of tab= le > > > is calculated by filtering inserted tuples and appending deleted tupl= es. > > > The subquery that represents pre-update state is generated in get_pre= state_rte(). > > > Specifically, the insterted tuples are filtered by calling IVM_visibl= e_in_prestate() > > > in WHERE clause. This function checks the visibility of tuples by usi= ng > > > the snapshot taken before table modification. The deleted tuples are = contained > > > in the old transition table, and this table is appended using UNION A= LL. > > > > > > Transition tables for each modification are collected in each AFTER t= rigger > > > function call. Then, the view maintenance is performed in the last ca= ll of > > > the trigger. > > > > > > In the original PostgreSQL, tuplestores of transition tables are free= d at the > > > end of each nested query. However, their lifespan needs to be prolong= ed to > > > the end of the out-most query in order to maintain the view in the la= st AFTER > > > trigger. For this purpose, SetTransitionTablePreserved is added in tr= igger.c. > > > > > > ** Duplicate Tulpes > > > > > > When calculating changes that will occur in the view (=3D delta table= s), > > > multiplicity of tuples are calculated by using count(*). > > > > > > When deleting tuples from the view, tuples to be deleted are identifi= ed by > > > joining the delta table with the view, and tuples are deleted as many= as > > > specified multiplicity by numbered using row_number() function. > > > This is implemented in apply_old_delta(). > > > > > > When inserting tuples into the view, each tuple is duplicated to the > > > specified multiplicity using generate_series() function. This is impl= emented > > > in apply_new_delta(). > > > > > > ** DISTINCT clause > > > > > > When DISTINCT is used, the view has a hidden column __ivm_count__ tha= t > > > stores multiplicity for tuples. When tuples are deleted from or inser= ted into > > > the view, the values of __ivm_count__ column is decreased or increase= d as many > > > as specified multiplicity. Eventually, when the values becomes zero, = the > > > corresponding tuple is deleted from the view. This is implemented in > > > apply_old_delta_with_count() and apply_new_delta_with_count(). > > > > > > ** Aggregates > > > > > > Built-in count sum, avg, min, and max are supported. Whether a given > > > aggregate function can be used or not is checked by using its OID in > > > check_aggregate_supports_ivm(). > > > > > > When creating a materialized view containing aggregates, in addition > > > to __ivm_count__, more than one hidden columns for each aggregate are > > > added to the target list. For example, columns for storing sum(x), > > > count(x) are added if we have avg(x). When the view is maintained, > > > aggregated values are updated using these hidden columns, also hidden > > > columns are updated at the same time. > > > > > > The maintenance of aggregated view is performed in > > > apply_old_delta_with_count() and apply_new_delta_with_count(). The SE= T > > > clauses for updating columns are generated by append_set_clause_*(). > > > > > > If the view has min(x) or max(x) and the minimum or maximal value is > > > deleted from a table, we need to update the value to the new min/max > > > recalculated from the tables rather than incremental computation. Thi= s > > > is performed in recalc_and_set_values(). > > > > > > ---------------------------------------------------------------------= ------------------ > > > * Details of the patch-set (v28) > > > > > > > The patch-set consists of the following eleven patches. > > > > > > In the previous version, the number of patches were nine. > > > In the latest patch-set, the patches are divided more finely > > > aiming to make the review easier. > > > > > > > - 0001: Add a syntax to create Incrementally Maintainable Materiali= zed Views > > > > > > The prposed syntax to create an incrementally maintainable materializ= ed > > > view (IMMV) is; > > > > > > CREATE INCREMENTAL MATERIALIZED VIEW AS SELECT .....; > > > > > > However, this syntax is tentative, so any suggestions are welcomed. > > > > > > > - 0002: Add relisivm column to pg_class system catalog > > > > > > We add a new field in pg_class to indicate a relation is IMMV. > > > Another alternative is to add a new catalog for managing materialized > > > views including IMMV, but I am not sure if we want this. > > > > > > > - 0003: Allow to prolong life span of transition tables until trans= action end > > > > > > This patch fixes the trigger system to allow to prolong lifespan of > > > tuple stores for transition tables until the transaction end. We need > > > this because multiple transition tables have to be preserved until th= e > > > end of the out-most query when multiple tables are modified by nested > > > triggers. (as explained above in Design - Multiple Tables Modificatio= n) > > > > > > If we don't want to change the trigger system in such way, the altern= ative > > > is to copy the contents of transition tables to other tuplestores, al= though > > > it needs more time and memory. > > > > > > > - 0004: Add Incremental View Maintenance support to pg_dump > > > > > > This patch enables pg_dump to output IMMV using the new syntax. > > > > > > > - 0005: Add Incremental View Maintenance support to psql > > > > > > This patch implements tab-completion for the new syntax and adds > > > information of IMMV to \d meta-command results. > > > > > > > - 0006: Add Incremental View Maintenance support > > > > > > This patch implements the basic IVM feature. > > > DISTINCT and aggregate are not supported here. > > > > > > When an IMMV is created, the view query is checked, and if any > > > non-supported feature is used, it raises an error. If it is ok, > > > triggers are created on base tables and an unique index is > > > created on the view if possible. > > > > > > In BEFORE trigger, an entry is created for each IMMV and the number > > > of trigger firing is counted. Also, the snapshot just before the > > > table modification is stored. > > > > > > In AFTER triggers, each transition tables are preserved. The number > > > of trigger firing is counted also here, and when the firing number of > > > BEFORE and AFTER trigger reach the same, it is deemed the final AFTER > > > trigger call. > > > > > > In the final AFTER trigger, the IMMV is maintained. Rewritten view > > > query is executed to generate delta tables, and deltas are applied > > > to the view. If multiple tables are modified simultaneously, this > > > process is iterated for each modified table. Tables before processed > > > are represented in "pre-update-state", processed tables are > > > "post-update-state" in the rewritten query. > > > > > > > - 0007: Add DISTINCT support for IVM > > > > > > This patch adds DISTINCT clause support. > > > > > > When an IMMV including DISTINCT is created, a hidden column > > > "__ivm_count__" is added to the target list. This column has the > > > number of duplicity of the same tuples. The duplicity is calculated > > > by adding "count(*)" and GROUP BY to the view query. > > > > > > When an IMMV is maintained, the duplicity in __ivm_count__ is updated= , > > > and a tuples whose duplicity becomes zero can be deleted from the vie= w. > > > This logic is implemented by SQL in apply_old_delta_with_count and > > > apply_new_delta_with_count. > > > > > > Columns starting with "__ivm_" are deemed hidden columns that doesn't > > > appear when a view is accessed by "SELECT * FROM ....". This is > > > implemented by fixing parse_relation.c. > > > > > > > - 0008: Add aggregates support in IVM > > > > > > This patch provides codes for aggregates support, specifically > > > for builtin count, sum, and avg. > > > > > > When an IMMV containing an aggregate is created, it is checked if thi= s > > > aggregate function is supported, and if it is ok, some hidden columns > > > are added to the target list. > > > > > > When the IMMV is maintained, the aggregated value is updated as well = as > > > related hidden columns. The way of update depends the type of aggrega= te > > > functions, and SET clause string is generated for each aggregate. > > > > > > > - 0009: Add support for min/max aggregates for IVM > > > > > > This patch adds min/max aggregates support. > > > > > > This is separated from #0008 because min/max needs more complicated > > > work than count, sum, and avg. > > > > > > If the view has min(x) or max(x) and the minimum or maximal value is > > > deleted from a table, we need to update the value to the new min/max > > > recalculated from the tables rather than incremental computation. > > > This is performed in recalc_and_set_values(). > > > > > > TIDs and keys of tuples that need re-calculation are returned as a > > > result of the query that deleted min/max values from the view using > > > RETURNING clause. The plan to recalculate and set the new min/max val= ue > > > are stored and reused. > > > > > > > - 0010: regression tests > > > > > > This patch provides regression tests for IVM. > > > > > > > - 0011: documentation > > > > > > This patch provides documantation for IVM. > > > > > > ---------------------------------------------------------------------= ------------------ > > > * Changes from the Previous Version (v27) > > > > > > - Allow TRUNCATE on base tables > > > > > > When a base table is truncated, the view content will be empty if the > > > view definition query does not contain an aggregate without a GROUP c= lause. > > > Therefore, such views can be truncated. > > > > > > Aggregate views without a GROUP clause always have one row. Therefore= , > > > if a base table is truncated, the view will not be empty and will con= tain > > > a row with NULL value (or 0 for count()). So, in this case, we refres= h the > > > view instead of truncating it. > > > > > > - Fix bugs reported by huyajun [1] > > > > > > [1] https://www.postgresql.org/message-id/tencent_FCAF11BCA5003FD16BD= DFDDA5D6A19587809%40qq.com > > > > > > ---------------------------------------------------------------------= ------------------ > > > * Discussion > > > > > > ** Aggregate support > > > > > > There were a few suggestions that general aggregate functions should = be > > > supported [2][3], which may be possible by extending pg_aggregate cat= alog. > > > However, we decided to leave supporting general aggregates to the fut= ure work [4] > > > because it would need substantial works and make the patch more compl= ex and > > > bigger. > > > > > > There has been no opposite opinion on this. However, if we need more = discussion > > > on the design of aggregate support, we can omit aggregate support for= the first > > > release of IVM. > > > > > > [2] https://www.postgresql.org/message-id/20191128140333.GA25947%40al= vherre.pgsql > > > [3] https://www.postgresql.org/message-id/CAM-w4HOvDrL4ou6m%3D592zUiK= GVzTcOpNj-d_cJqzL00fdsS5kg%40mail.gmail.com > > > [4] https://www.postgresql.org/message-id/20201016193034.9a4c44c79fc1= eca7babe093e%40sraoss.co.jp > > > > > > ** Hidden columns > > > > > > In order to support DISTINCT or aggregates, our implementation uses h= idden columns. > > > > > > Columns starting with "__ivm_" are hidden columns that doesn't appear= when a > > > view is accessed by "SELECT * FROM ....". For this aim, parse_relatio= n.c is > > > fixed. There was a proposal to enable hidden columns by adding a new = flag to > > > pg_attribute [5], but this thread is no longer active, so we decided = to check > > > the hidden column by its name [6]. > > > > > > [5] https://www.postgresql.org/message-id/flat/CAEepm%3D3ZHh%3Dp0nEEn= Vbs1Dig_UShPzHUcMNAqvDQUgYgcDo-pA%40mail.gmail.com > > > [6] https://www.postgresql.org/message-id/20201016193034.9a4c44c79fc1= eca7babe093e%40sraoss.co.jp > > > > > > ** Concurrent Transactions > > > > > > When the view definition has more than one table, we acquire an exclu= sive > > > lock before the view maintenance in order to avoid inconsistent resul= ts. > > > This behavior was explained in [7]. The lock was improved to use weak= er lock > > > when the view has only one table based on a suggestion from Konstanti= n Knizhnik [8]. > > > However, due to the implementation that uses ctid for identifying tar= get tuples, > > > we still have to use an exclusive lock for DELETE and UPDATE. > > > > > > [7] https://www.postgresql.org/message-id/20200909092752.c91758a1bec3= 479668e82643%40sraoss.co.jp > > > [8] https://www.postgresql.org/message-id/5663f5f0-48af-686c-bf3c-62d= 279567e2a%40postgrespro.ru > > > > > > ** Automatic Index Creation > > > > > > When a view is created, a unique index is automatically created if > > > possible, that is, if the view definition query has a GROUP BY or > > > DISTINCT, or if the view contains all primary key attributes of > > > its base tables in the target list. It is necessary for efficient > > > view maintenance. This feature is based on a suggestion from > > > Konstantin Knizhnik [9]. > > > > > > [9] https://www.postgresql.org/message-id/89729da8-9042-7ea0-95af-e41= 5df6da14d%40postgrespro.ru > > > > > > > > > ** Trigger and Transition Tables > > > > > > We implemented IVM based on triggers. This is because we want to use > > > transition tables to extract changes on base tables. Also, there are > > > other constraint that are using triggers in its implementation, like > > > foreign references. However, if we can use transition table like feat= ure > > > without relying triggers, we don't have to insist to use triggers and= we > > > might implement IVM in the executor directly as similar as declarativ= e > > > partitioning. > > > > > > ** Feature to be Supported in the First Release > > > > > > The current patch-set supports DISTINCT and aggregates for built-in c= ount, > > > sum, avg, min and max. Do we need all these feature for the first IVM= release? > > > Supporting DISTINCT and aggregates needs discussion on hidden columns= , and > > > for supporting min/max we need to discuss on re-calculation method. = Before > > > handling such relatively advanced feature, maybe, should we focus to = design > > > and implement of the basic feature of IVM? > > > > > > > > > Any suggestion and discussion are welcomed! > > > > > > Regards, > > > Yugo Nagata > > > > > > -- > > > Yugo NAGATA > > > > > > > > > > > > > The followings are supported in view definition queries: > > > - SELECT ... FROM ... WHERE ..., joins (inner joins, self-joins) > > > > > > > Also, a view definition query cannot contain other views, materialize= d views, > > > foreign tables, partitioned tables, partitions, VALUES, non-immutable= functions, > > > system columns, or expressions that contains aggregates. > > > > Does this also apply to tableoid? but tableoid is a constant, so it > > should be fine? > > can following two queries apply to this feature. > > select tableoid, unique1 from tenk1; > > Currently, this is not allowed because tableoid is a system column. > As you say, tableoid is a constant, so we can allow. Should we do this? > > > select 1 as constant, unique1 from tenk1; > > This is allowed, of course. > > > I didn't apply the patch.(will do later, for someone to test, it would > > be a better idea to dump a whole file separately....). > > Thank you! I'm looking forward to your feedback. > (I didn't attach a whole patch separately because I wouldn't like > cfbot to be unhappy...) > > Regards, > Yugo Nagata > > -- > Yugo NAGATA I played around first half of regress patch. these all following queries fails. CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS SELECT DISTINCT * , 1 as "__ivm_count__" FROM mv_base_a; CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS SELECT DISTINCT * , 1 as "__ivm_countblablabla" FROM mv_base_a; CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS SELECT DISTINCT * , 1 as "__ivm_count" FROM mv_base_a; CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS SELECT DISTINCT * , 1 as "__ivm_count_____" FROM mv_base_a; CREATE INCREMENTAL MATERIALIZED VIEW mv_ivm_rename AS SELECT DISTINCT * , 1 as "__ivm_countblabla" FROM mv_base_a; so the hidden column reserved pattern "__ivm_count.*"? that would be a lot.= ... select * from pg_matviews where matviewname =3D 'mv_ivm_1'; don't have relisivm option. it's reasonable to make it in view pg_matviews?