Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY

public inbox for pgsql-bugs@postgresql.org  
help / color / mirror / Atom feed

From: surya poondla <suryapoondla4@gmail.com>
To: Rafia Sabih <rafia.pghackers@gmail.com>
Cc: Giuliano Gagliardi <gogi@gogi.tv>
Cc: pgsql-bugs@lists.postgresql.org
Subject: Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY
Date: Mon, 2 Mar 2026 14:45:32 -0800
Message-ID: <CAOVWO5p2U8rMvh3pJ4WaNN7ES4SbiEKXLV_0pZ-UdYp0kACa7g@mail.gmail.com> (raw)
In-Reply-To: <CAOVWO5rfJGpxRDn496gAHmAxW7L_72==KkOcz0q16YOvqm4-=g@mail.gmail.com>
References: <40d694df-39fd-4a4a-9459-9d6489165f60@gogi.tv>
	<CAOVWO5p7PieGTm3GJC8NAYgSEQLoBdZw9Se4u=EbPrr1mW7d5Q@mail.gmail.com>
	<CAOVWO5rWkGXx37tpsAuTwrY5muPePHG44ke0xwYcakWVfhkPaw@mail.gmail.com>
	<CAOVWO5qchazoL8H+-d4eZ_zm8jFUH4YF3WvWnXqcDEh8i7hbhw@mail.gmail.com>
	<CAOVWO5qF2_FV7M=USdSgjjPO124YMpw7TY+M4+mnkNPOA2Bstg@mail.gmail.com>
	<CA+FpmFe7pgJRi16zP8Liz5vuUkVdTnPy30Zr=mXzZA3shKgvCw@mail.gmail.com>
	<CAOVWO5rfJGpxRDn496gAHmAxW7L_72==KkOcz0q16YOvqm4-=g@mail.gmail.com>

Hi All,

Thank you Rafia for the suggestions.
I split both the bugs in 2 different commits, attaching the patches here.

For bug1, I added the test case for NULL values too.

For bug 2, I only changed matview.c and added no test case as the timings
are not constant.

I ran the regression tests for both the patches and all tests succeeded in
both cases.

Regards,
Surya Poondla

>


Attachments:

  [application/octet-stream] 0003-Fix-REFRESH-MATERIALIZED-VIEW-CONCURRENTLY-to-detect_bug1.patch (6.7K, 3-0003-Fix-REFRESH-MATERIALIZED-VIEW-CONCURRENTLY-to-detect_bug1.patch)
  download | inline diff:
From a74bf7261d82f84b27396221e60cec440045dad5 Mon Sep 17 00:00:00 2001
From: spoondla <s_poondla@apple.com>
Date: Mon, 2 Mar 2026 11:22:11 -0800
Subject: [PATCH v3] Fix REFRESH MATERIALIZED VIEW CONCURRENTLY to detect
 duplicate rows with NULLs

REFRESH MATERIALIZED VIEW CONCURRENTLY doesn't throw expected error as REFRESH MATERIALIZED VIEW
Issue:
REFRESH MATERIALIZED VIEW CONCURRENTLY was incorrectly skipping duplicate
detection for rows containing any NULL values. This would cause the refresh
to silently succeed while leaving the materialized view with stale data,
rather than properly reporting duplicate row errors.

The bug occurred because the duplicate check used "WHERE newdata.* IS NOT NULL"
which returns false if any column contains NULL. When duplicates existed in
rows with NULLs (e.g., two rows of ('test', NULL)), the check was skipped
entirely. The subsequent FULL OUTER JOIN would then match both duplicate rows
to the same old row, producing an empty diff, causing no updates to be applied.

Fix:
The fix removes the "IS NOT NULL" preconditions from the duplicate detection
query. The query now correctly checks all rows using the record equality
operator (*=), which treats NULL as equal to NULL. This matches the same
equality semantics used by the FULL OUTER JOIN in the diff query, ensuring
consistent duplicate detection.

Added a regression test covering the case of duplicate rows that contain
NULL values.

Note:
The non-concurrent REFRESH was unaffected since it rebuilds indexes from
scratch, which properly detects duplicates during index creation.
---
 src/backend/commands/matview.c        | 20 +++++++++++---------
 src/test/regress/expected/matview.out | 16 +++++++++++++++-
 src/test/regress/sql/matview.sql      | 10 ++++++++++
 3 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 81a55a33ef2..e90b10972ef 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -635,11 +635,13 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 		elog(ERROR, "SPI_exec failed: %s", querybuf.data);
 
 	/*
-	 * We need to ensure that there are not duplicate rows without NULLs in
-	 * the new data set before we can count on the "diff" results.  Check for
-	 * that in a way that allows showing the first duplicated row found.  Even
-	 * after we pass this test, a unique index on the materialized view may
-	 * find a duplicate key problem.
+	 * We need to ensure that there are not duplicate rows in the new data set
+	 * before we can count on the "diff" results.  Check for that in a way
+	 * that allows showing the first duplicated row found.  We check for
+	 * duplicates using the record equality operator (*=), which treats NULLs
+	 * as equal to each other --- the same semantics used by the FULL OUTER
+	 * JOIN in the diff query below.  Even after we pass this test, a unique
+	 * index on the materialized view may find a duplicate key problem.
 	 *
 	 * Note: here and below, we use "tablename.*::tablerowtype" as a hack to
 	 * keep ".*" from being expanded into multiple columns in a SELECT list.
@@ -648,9 +650,9 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 	resetStringInfo(&querybuf);
 	appendStringInfo(&querybuf,
 					 "SELECT newdata.*::%s FROM %s newdata "
-					 "WHERE newdata.* IS NOT NULL AND EXISTS "
-					 "(SELECT 1 FROM %s newdata2 WHERE newdata2.* IS NOT NULL "
-					 "AND newdata2.* OPERATOR(pg_catalog.*=) newdata.* "
+					 "WHERE EXISTS "
+					 "(SELECT 1 FROM %s newdata2 "
+					 "WHERE newdata2.* OPERATOR(pg_catalog.*=) newdata.* "
 					 "AND newdata2.ctid OPERATOR(pg_catalog.<>) "
 					 "newdata.ctid)",
 					 tempname, tempname, tempname);
@@ -667,7 +669,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 		 */
 		ereport(ERROR,
 				(errcode(ERRCODE_CARDINALITY_VIOLATION),
-				 errmsg("new data for materialized view \"%s\" contains duplicate rows without any null columns",
+				 errmsg("new data for materialized view \"%s\" contains duplicate rows",
 						RelationGetRelationName(matviewRel)),
 				 errdetail("Row: %s",
 						   SPI_getvalue(SPI_tuptable->vals[0], SPI_tuptable->tupdesc, 1))));
diff --git a/src/test/regress/expected/matview.out b/src/test/regress/expected/matview.out
index 0355720dfc6..3f66ada1b6b 100644
--- a/src/test/regress/expected/matview.out
+++ b/src/test/regress/expected/matview.out
@@ -396,10 +396,24 @@ REFRESH MATERIALIZED VIEW mvtest_mv;
 ERROR:  could not create unique index "mvtest_mv_a_idx"
 DETAIL:  Key (a)=(1) is duplicated.
 REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;
-ERROR:  new data for materialized view "mvtest_mv" contains duplicate rows without any null columns
+ERROR:  new data for materialized view "mvtest_mv" contains duplicate rows
 DETAIL:  Row: (1,10)
 DROP TABLE mvtest_foo CASCADE;
 NOTICE:  drop cascades to materialized view mvtest_mv
+-- test that duplicate rows containing NULLs are also detected (bug fix)
+CREATE TABLE mvtest_foo(a text, b text);
+INSERT INTO mvtest_foo VALUES('test', NULL);
+CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
+CREATE UNIQUE INDEX ON mvtest_mv(a);
+INSERT INTO mvtest_foo VALUES('test', NULL);
+REFRESH MATERIALIZED VIEW mvtest_mv;
+ERROR:  could not create unique index "mvtest_mv_a_idx"
+DETAIL:  Key (a)=(test) is duplicated.
+REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;
+ERROR:  new data for materialized view "mvtest_mv" contains duplicate rows
+DETAIL:  Row: (test,)
+DROP TABLE mvtest_foo CASCADE;
+NOTICE:  drop cascades to materialized view mvtest_mv
 -- make sure that all columns covered by unique indexes works
 CREATE TABLE mvtest_foo(a, b, c) AS VALUES(1, 2, 3);
 CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
diff --git a/src/test/regress/sql/matview.sql b/src/test/regress/sql/matview.sql
index 934426b9ae8..d14a5b95384 100644
--- a/src/test/regress/sql/matview.sql
+++ b/src/test/regress/sql/matview.sql
@@ -135,6 +135,16 @@ REFRESH MATERIALIZED VIEW mvtest_mv;
 REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;
 DROP TABLE mvtest_foo CASCADE;
 
+-- test that duplicate rows containing NULLs are also detected (bug fix)
+CREATE TABLE mvtest_foo(a text, b text);
+INSERT INTO mvtest_foo VALUES('test', NULL);
+CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
+CREATE UNIQUE INDEX ON mvtest_mv(a);
+INSERT INTO mvtest_foo VALUES('test', NULL);
+REFRESH MATERIALIZED VIEW mvtest_mv;
+REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;
+DROP TABLE mvtest_foo CASCADE;
+
 -- make sure that all columns covered by unique indexes works
 CREATE TABLE mvtest_foo(a, b, c) AS VALUES(1, 2, 3);
 CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
-- 
2.39.5 (Apple Git-154)



  [application/octet-stream] 0003-Fix-REFRESH-MATERIALIZED-VIEW-CONCURRENTLY-performance_bug2.patch (5.0K, 4-0003-Fix-REFRESH-MATERIALIZED-VIEW-CONCURRENTLY-performance_bug2.patch)
  download | inline diff:
From 982d721013abf04b5eba36ce1abab350894883b2 Mon Sep 17 00:00:00 2001
From: spoondla <s_poondla@apple.com>
Date: Mon, 2 Mar 2026 11:23:17 -0800
Subject: [PATCH v3] Fix REFRESH MATERIALIZED VIEW CONCURRENTLY performance
 with nullable indexed columns

Issue:
When a materialized view has a unique index on a nullable column, REFRESH
MATERIALIZED VIEW CONCURRENTLY included that column in the FULL OUTER JOIN
condition used to detect changes.  This caused severe performance degradation
because standard equality (col = col) evaluates to NULL when either side is
NULL, causing unchanged rows to appear as both deleted and re-inserted.

Example:
A materialized view with 1M rows where a nullable indexed column
contains all NULLs. When data is unchanged, the refresh should produce an
empty diff (0 rows). However, because NULL = NULL evaluates to NULL (not TRUE),
the join fails to match any rows. The diff incorrectly shows all 1M rows as
deleted and all 1M rows as newly inserted, so 2M changes in total, causing unnecessary
use of resources which drastically increased the time for "refresh materialized view concurrently" operation.

Fix:
Skip nullable columns when building the FULL OUTER JOIN equality conditions.
Only include columns with NOT NULL constraints from unique indexes in the join
predicate.
This is semantically correct because nullable columns in unique
indexes do not guarantee row uniqueness for NULL values (multiple NULLs are allowed).
The record equality operator (*=), which is always appended to the
join predicate, already handles all columns including nullable ones (i.e treating NULL as equal to NULL.)

The fix adds a new variable 'addedAnyQuals' to track whether any column conditions
were actually added to the query, separate from 'foundUniqueIndex' which tracks
whether a usable unique index exists (for validation).
This ensures correct SQL syntax when all columns in an index are nullable.
---
 src/backend/commands/matview.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 81a55a33ef2..236c233914c 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -602,6 +602,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 	char	   *nsp;
 	TupleDesc	tupdesc;
 	bool		foundUniqueIndex;
+	bool		addedAnyQuals;
 	List	   *indexoidlist;
 	ListCell   *indexoidscan;
 	int16		relnatts;
@@ -715,6 +716,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 	tupdesc = matviewRel->rd_att;
 	opUsedForQual = palloc0_array(Oid, relnatts);
 	foundUniqueIndex = false;
+	addedAnyQuals = false;
 
 	indexoidlist = RelationGetIndexList(matviewRel);
 
@@ -732,6 +734,8 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 			Datum		indclassDatum;
 			int			i;
 
+			foundUniqueIndex = true;
+
 			/* Must get indclass the hard way. */
 			indclassDatum = SysCacheGetAttrNotNull(INDEXRELID,
 												   indexRel->rd_indextuple,
@@ -753,6 +757,18 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 				const char *leftop;
 				const char *rightop;
 
+				/*
+				 * Skip nullable columns.  Nullable columns in unique indexes
+				 * don't provide row uniqueness for NULL values (multiple NULLs
+				 * are allowed), so they cannot reliably identify matching rows.
+				 * Including them in the join condition causes NULL = NULL
+				 * comparisons which evaluate to NULL, making unchanged rows
+				 * appear different.  The record equality operator (*=) appended
+				 * below handles all columns including nullable ones.
+				 */
+				if (!attr->attnotnull)
+					continue;
+
 				/*
 				 * Identify the equality operator associated with this index
 				 * column.  First we need to look up the column's opclass.
@@ -788,7 +804,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 				/*
 				 * Actually add the qual, ANDed with any others.
 				 */
-				if (foundUniqueIndex)
+				if (addedAnyQuals)
 					appendStringInfoString(&querybuf, " AND ");
 
 				leftop = quote_qualified_identifier("newdata",
@@ -801,7 +817,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 										 op,
 										 rightop, attrtype);
 
-				foundUniqueIndex = true;
+				addedAnyQuals = true;
 			}
 		}
 
@@ -826,8 +842,16 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 				errmsg("could not find suitable unique index on materialized view \"%s\"",
 					   RelationGetRelationName(matviewRel)));
 
+	/*
+	 * The record equality operator (*=) is always included in the join
+	 * predicate.  It handles all columns correctly including nullable ones
+	 * that were skipped above, since *=  treats NULL as equal to NULL.
+	 */
+	if (addedAnyQuals)
+		appendStringInfoString(&querybuf, " AND ");
+
 	appendStringInfoString(&querybuf,
-						   " AND newdata.* OPERATOR(pg_catalog.*=) mv.*) "
+						   "newdata.* OPERATOR(pg_catalog.*=) mv.*) "
 						   "WHERE newdata.* IS NULL OR mv.* IS NULL "
 						   "ORDER BY tid");
 
-- 
2.39.5 (Apple Git-154)

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: pgsql-bugs@postgresql.org
  Cc: suryapoondla4@gmail.com, rafia.pghackers@gmail.com, gogi@gogi.tv, pgsql-bugs@lists.postgresql.org
  Subject: Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY
  In-Reply-To: <CAOVWO5p2U8rMvh3pJ4WaNN7ES4SbiEKXLV_0pZ-UdYp0kACa7g@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox