Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY

public inbox for pgsql-bugs@postgresql.org  
help / color / mirror / Atom feed

Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY
11+ messages / 3 participants
[nested] [flat]

* Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY
@ 2026-02-18 22:32 surya poondla <suryapoondla4@gmail.com>
  2026-03-02 22:45 ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
  0 siblings, 1 reply; 11+ messages in thread

From: surya poondla @ 2026-02-18 22:32 UTC (permalink / raw)
  To: Rafia Sabih <rafia.pghackers@gmail.com>; +Cc: Giuliano Gagliardi <gogi@gogi.tv>; pgsql-bugs@lists.postgresql.org

Hi All,


Also, for issue 1, additional test case should be added.
>
Sure, I will add test cases for issue 1.


For issue 2, it would be helpful if you may share some performance numbers
> to confirm if this solution is only improving the performance and not
> causing any regressions.
>
I ran check, check-world and didn't see any regressions.

Here is the output and performance improvement:
>>
>> postgres=# \timing on
>>
>> Timing is on.
>>
>> postgres=# DROP MATERIALIZED VIEW IF EXISTS s CASCADE;
>>
>> NOTICE:  materialized view "s" does not exist, skipping
>>
>> DROP MATERIALIZED VIEW
>>
>> Time: 0.858 ms
>>
>> postgres=#
>>
>> postgres=# CREATE MATERIALIZED VIEW s AS SELECT generate_series as x,
>> null as y FROM generate_series(1, 1000000);
>>
>> SELECT 1000000
>>
>> Time: 1076.254 ms (00:01.076)
>>
>> postgres=#
>>
>> postgres=# CREATE UNIQUE INDEX ON s(x);
>>
>> CREATE INDEX
>>
>> Time: 375.026 ms
>>
>> postgres=# REFRESH MATERIALIZED VIEW CONCURRENTLY s;
>>
>> REFRESH MATERIALIZED VIEW
>>
>> Time: 3807.143 ms (00:03.807)
>>
>> postgres=# CREATE UNIQUE INDEX ON s(y);
>>
>> CREATE INDEX
>>
>> Time: 331.382 ms
>>
>> postgres=# REFRESH MATERIALIZED VIEW CONCURRENTLY s;
>>
>> REFRESH MATERIALIZED VIEW
>>
>> Time: 3636.049 ms (00:03.636)
>> postgres=#
>>
>> As we can see the REFRESH MATERIALIZED VIEW CONCURRENTLY now takes 3636.049
>> ms
>>
> Regrading the performance, (quoting the output from my previous message)
with unique index having NULL values we see that both "REFRESH MATERIALIZED
VIEW CONCURRENTLY s;" operations (operation 1 was after CREATE UNIQUE INDEX
ON s(x); and operation 2 was after CREATE UNIQUE INDEX ON s(x);) take about
the same time. Without the patch, operation 2 was taking around ~11000
ms, due to NULL = NULL comparison checks and this was causing the
degradation.

Regarding different commits to each issue, I don't have any
particular opinion but since both the issues are related to the same
function and NULL comparison, I feel we can have a single commit, but open
to create 2 commits too.

Regards,
Surya Poondla


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY
  2026-02-18 22:32 Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
@ 2026-03-02 22:45 ` surya poondla <suryapoondla4@gmail.com>
  2026-03-05 01:01   ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY Rafia Sabih <rafia.pghackers@gmail.com>
  0 siblings, 1 reply; 11+ messages in thread

From: surya poondla @ 2026-03-02 22:45 UTC (permalink / raw)
  To: Rafia Sabih <rafia.pghackers@gmail.com>; +Cc: Giuliano Gagliardi <gogi@gogi.tv>; pgsql-bugs@lists.postgresql.org

Hi All,

Thank you Rafia for the suggestions.
I split both the bugs in 2 different commits, attaching the patches here.

For bug1, I added the test case for NULL values too.

For bug 2, I only changed matview.c and added no test case as the timings
are not constant.

I ran the regression tests for both the patches and all tests succeeded in
both cases.

Regards,
Surya Poondla

>


Attachments:

  [application/octet-stream] 0003-Fix-REFRESH-MATERIALIZED-VIEW-CONCURRENTLY-to-detect_bug1.patch (6.7K, 3-0003-Fix-REFRESH-MATERIALIZED-VIEW-CONCURRENTLY-to-detect_bug1.patch)
  download | inline diff:
From a74bf7261d82f84b27396221e60cec440045dad5 Mon Sep 17 00:00:00 2001
From: spoondla <s_poondla@apple.com>
Date: Mon, 2 Mar 2026 11:22:11 -0800
Subject: [PATCH v3] Fix REFRESH MATERIALIZED VIEW CONCURRENTLY to detect
 duplicate rows with NULLs

REFRESH MATERIALIZED VIEW CONCURRENTLY doesn't throw expected error as REFRESH MATERIALIZED VIEW
Issue:
REFRESH MATERIALIZED VIEW CONCURRENTLY was incorrectly skipping duplicate
detection for rows containing any NULL values. This would cause the refresh
to silently succeed while leaving the materialized view with stale data,
rather than properly reporting duplicate row errors.

The bug occurred because the duplicate check used "WHERE newdata.* IS NOT NULL"
which returns false if any column contains NULL. When duplicates existed in
rows with NULLs (e.g., two rows of ('test', NULL)), the check was skipped
entirely. The subsequent FULL OUTER JOIN would then match both duplicate rows
to the same old row, producing an empty diff, causing no updates to be applied.

Fix:
The fix removes the "IS NOT NULL" preconditions from the duplicate detection
query. The query now correctly checks all rows using the record equality
operator (*=), which treats NULL as equal to NULL. This matches the same
equality semantics used by the FULL OUTER JOIN in the diff query, ensuring
consistent duplicate detection.

Added a regression test covering the case of duplicate rows that contain
NULL values.

Note:
The non-concurrent REFRESH was unaffected since it rebuilds indexes from
scratch, which properly detects duplicates during index creation.
---
 src/backend/commands/matview.c        | 20 +++++++++++---------
 src/test/regress/expected/matview.out | 16 +++++++++++++++-
 src/test/regress/sql/matview.sql      | 10 ++++++++++
 3 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 81a55a33ef2..e90b10972ef 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -635,11 +635,13 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 		elog(ERROR, "SPI_exec failed: %s", querybuf.data);
 
 	/*
-	 * We need to ensure that there are not duplicate rows without NULLs in
-	 * the new data set before we can count on the "diff" results.  Check for
-	 * that in a way that allows showing the first duplicated row found.  Even
-	 * after we pass this test, a unique index on the materialized view may
-	 * find a duplicate key problem.
+	 * We need to ensure that there are not duplicate rows in the new data set
+	 * before we can count on the "diff" results.  Check for that in a way
+	 * that allows showing the first duplicated row found.  We check for
+	 * duplicates using the record equality operator (*=), which treats NULLs
+	 * as equal to each other --- the same semantics used by the FULL OUTER
+	 * JOIN in the diff query below.  Even after we pass this test, a unique
+	 * index on the materialized view may find a duplicate key problem.
 	 *
 	 * Note: here and below, we use "tablename.*::tablerowtype" as a hack to
 	 * keep ".*" from being expanded into multiple columns in a SELECT list.
@@ -648,9 +650,9 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 	resetStringInfo(&querybuf);
 	appendStringInfo(&querybuf,
 					 "SELECT newdata.*::%s FROM %s newdata "
-					 "WHERE newdata.* IS NOT NULL AND EXISTS "
-					 "(SELECT 1 FROM %s newdata2 WHERE newdata2.* IS NOT NULL "
-					 "AND newdata2.* OPERATOR(pg_catalog.*=) newdata.* "
+					 "WHERE EXISTS "
+					 "(SELECT 1 FROM %s newdata2 "
+					 "WHERE newdata2.* OPERATOR(pg_catalog.*=) newdata.* "
 					 "AND newdata2.ctid OPERATOR(pg_catalog.<>) "
 					 "newdata.ctid)",
 					 tempname, tempname, tempname);
@@ -667,7 +669,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 		 */
 		ereport(ERROR,
 				(errcode(ERRCODE_CARDINALITY_VIOLATION),
-				 errmsg("new data for materialized view \"%s\" contains duplicate rows without any null columns",
+				 errmsg("new data for materialized view \"%s\" contains duplicate rows",
 						RelationGetRelationName(matviewRel)),
 				 errdetail("Row: %s",
 						   SPI_getvalue(SPI_tuptable->vals[0], SPI_tuptable->tupdesc, 1))));
diff --git a/src/test/regress/expected/matview.out b/src/test/regress/expected/matview.out
index 0355720dfc6..3f66ada1b6b 100644
--- a/src/test/regress/expected/matview.out
+++ b/src/test/regress/expected/matview.out
@@ -396,10 +396,24 @@ REFRESH MATERIALIZED VIEW mvtest_mv;
 ERROR:  could not create unique index "mvtest_mv_a_idx"
 DETAIL:  Key (a)=(1) is duplicated.
 REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;
-ERROR:  new data for materialized view "mvtest_mv" contains duplicate rows without any null columns
+ERROR:  new data for materialized view "mvtest_mv" contains duplicate rows
 DETAIL:  Row: (1,10)
 DROP TABLE mvtest_foo CASCADE;
 NOTICE:  drop cascades to materialized view mvtest_mv
+-- test that duplicate rows containing NULLs are also detected (bug fix)
+CREATE TABLE mvtest_foo(a text, b text);
+INSERT INTO mvtest_foo VALUES('test', NULL);
+CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
+CREATE UNIQUE INDEX ON mvtest_mv(a);
+INSERT INTO mvtest_foo VALUES('test', NULL);
+REFRESH MATERIALIZED VIEW mvtest_mv;
+ERROR:  could not create unique index "mvtest_mv_a_idx"
+DETAIL:  Key (a)=(test) is duplicated.
+REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;
+ERROR:  new data for materialized view "mvtest_mv" contains duplicate rows
+DETAIL:  Row: (test,)
+DROP TABLE mvtest_foo CASCADE;
+NOTICE:  drop cascades to materialized view mvtest_mv
 -- make sure that all columns covered by unique indexes works
 CREATE TABLE mvtest_foo(a, b, c) AS VALUES(1, 2, 3);
 CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
diff --git a/src/test/regress/sql/matview.sql b/src/test/regress/sql/matview.sql
index 934426b9ae8..d14a5b95384 100644
--- a/src/test/regress/sql/matview.sql
+++ b/src/test/regress/sql/matview.sql
@@ -135,6 +135,16 @@ REFRESH MATERIALIZED VIEW mvtest_mv;
 REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;
 DROP TABLE mvtest_foo CASCADE;
 
+-- test that duplicate rows containing NULLs are also detected (bug fix)
+CREATE TABLE mvtest_foo(a text, b text);
+INSERT INTO mvtest_foo VALUES('test', NULL);
+CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
+CREATE UNIQUE INDEX ON mvtest_mv(a);
+INSERT INTO mvtest_foo VALUES('test', NULL);
+REFRESH MATERIALIZED VIEW mvtest_mv;
+REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;
+DROP TABLE mvtest_foo CASCADE;
+
 -- make sure that all columns covered by unique indexes works
 CREATE TABLE mvtest_foo(a, b, c) AS VALUES(1, 2, 3);
 CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
-- 
2.39.5 (Apple Git-154)



  [application/octet-stream] 0003-Fix-REFRESH-MATERIALIZED-VIEW-CONCURRENTLY-performance_bug2.patch (5.0K, 4-0003-Fix-REFRESH-MATERIALIZED-VIEW-CONCURRENTLY-performance_bug2.patch)
  download | inline diff:
From 982d721013abf04b5eba36ce1abab350894883b2 Mon Sep 17 00:00:00 2001
From: spoondla <s_poondla@apple.com>
Date: Mon, 2 Mar 2026 11:23:17 -0800
Subject: [PATCH v3] Fix REFRESH MATERIALIZED VIEW CONCURRENTLY performance
 with nullable indexed columns

Issue:
When a materialized view has a unique index on a nullable column, REFRESH
MATERIALIZED VIEW CONCURRENTLY included that column in the FULL OUTER JOIN
condition used to detect changes.  This caused severe performance degradation
because standard equality (col = col) evaluates to NULL when either side is
NULL, causing unchanged rows to appear as both deleted and re-inserted.

Example:
A materialized view with 1M rows where a nullable indexed column
contains all NULLs. When data is unchanged, the refresh should produce an
empty diff (0 rows). However, because NULL = NULL evaluates to NULL (not TRUE),
the join fails to match any rows. The diff incorrectly shows all 1M rows as
deleted and all 1M rows as newly inserted, so 2M changes in total, causing unnecessary
use of resources which drastically increased the time for "refresh materialized view concurrently" operation.

Fix:
Skip nullable columns when building the FULL OUTER JOIN equality conditions.
Only include columns with NOT NULL constraints from unique indexes in the join
predicate.
This is semantically correct because nullable columns in unique
indexes do not guarantee row uniqueness for NULL values (multiple NULLs are allowed).
The record equality operator (*=), which is always appended to the
join predicate, already handles all columns including nullable ones (i.e treating NULL as equal to NULL.)

The fix adds a new variable 'addedAnyQuals' to track whether any column conditions
were actually added to the query, separate from 'foundUniqueIndex' which tracks
whether a usable unique index exists (for validation).
This ensures correct SQL syntax when all columns in an index are nullable.
---
 src/backend/commands/matview.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 81a55a33ef2..236c233914c 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -602,6 +602,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 	char	   *nsp;
 	TupleDesc	tupdesc;
 	bool		foundUniqueIndex;
+	bool		addedAnyQuals;
 	List	   *indexoidlist;
 	ListCell   *indexoidscan;
 	int16		relnatts;
@@ -715,6 +716,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 	tupdesc = matviewRel->rd_att;
 	opUsedForQual = palloc0_array(Oid, relnatts);
 	foundUniqueIndex = false;
+	addedAnyQuals = false;
 
 	indexoidlist = RelationGetIndexList(matviewRel);
 
@@ -732,6 +734,8 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 			Datum		indclassDatum;
 			int			i;
 
+			foundUniqueIndex = true;
+
 			/* Must get indclass the hard way. */
 			indclassDatum = SysCacheGetAttrNotNull(INDEXRELID,
 												   indexRel->rd_indextuple,
@@ -753,6 +757,18 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 				const char *leftop;
 				const char *rightop;
 
+				/*
+				 * Skip nullable columns.  Nullable columns in unique indexes
+				 * don't provide row uniqueness for NULL values (multiple NULLs
+				 * are allowed), so they cannot reliably identify matching rows.
+				 * Including them in the join condition causes NULL = NULL
+				 * comparisons which evaluate to NULL, making unchanged rows
+				 * appear different.  The record equality operator (*=) appended
+				 * below handles all columns including nullable ones.
+				 */
+				if (!attr->attnotnull)
+					continue;
+
 				/*
 				 * Identify the equality operator associated with this index
 				 * column.  First we need to look up the column's opclass.
@@ -788,7 +804,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 				/*
 				 * Actually add the qual, ANDed with any others.
 				 */
-				if (foundUniqueIndex)
+				if (addedAnyQuals)
 					appendStringInfoString(&querybuf, " AND ");
 
 				leftop = quote_qualified_identifier("newdata",
@@ -801,7 +817,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 										 op,
 										 rightop, attrtype);
 
-				foundUniqueIndex = true;
+				addedAnyQuals = true;
 			}
 		}
 
@@ -826,8 +842,16 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 				errmsg("could not find suitable unique index on materialized view \"%s\"",
 					   RelationGetRelationName(matviewRel)));
 
+	/*
+	 * The record equality operator (*=) is always included in the join
+	 * predicate.  It handles all columns correctly including nullable ones
+	 * that were skipped above, since *=  treats NULL as equal to NULL.
+	 */
+	if (addedAnyQuals)
+		appendStringInfoString(&querybuf, " AND ");
+
 	appendStringInfoString(&querybuf,
-						   " AND newdata.* OPERATOR(pg_catalog.*=) mv.*) "
+						   "newdata.* OPERATOR(pg_catalog.*=) mv.*) "
 						   "WHERE newdata.* IS NULL OR mv.* IS NULL "
 						   "ORDER BY tid");
 
-- 
2.39.5 (Apple Git-154)



^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY
  2026-02-18 22:32 Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
  2026-03-02 22:45 ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
@ 2026-03-05 01:01   ` Rafia Sabih <rafia.pghackers@gmail.com>
  2026-03-09 13:57     ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY Rafia Sabih <rafia.pghackers@gmail.com>
  0 siblings, 1 reply; 11+ messages in thread

From: Rafia Sabih @ 2026-03-05 01:01 UTC (permalink / raw)
  To: surya poondla <suryapoondla4@gmail.com>; +Cc: Giuliano Gagliardi <gogi@gogi.tv>; pgsql-bugs@lists.postgresql.org

On Mon, 2 Mar 2026 at 14:45, surya poondla <suryapoondla4@gmail.com> wrote:

> Hi All,
>
> Thank you Rafia for the suggestions.
> I split both the bugs in 2 different commits, attaching the patches here.
>
> For bug1, I added the test case for NULL values too.
>
> Thanks for working on this. This looks good to me.
+-- test that duplicate rows containing NULLs are also detected (bug fix)
I wouldn't use bug fix here, it looks fine without it.

>

-- 
Regards,
Rafia Sabih
CYBERTEC PostgreSQL International GmbH


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY
  2026-02-18 22:32 Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
  2026-03-02 22:45 ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
  2026-03-05 01:01   ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY Rafia Sabih <rafia.pghackers@gmail.com>
@ 2026-03-09 13:57     ` Rafia Sabih <rafia.pghackers@gmail.com>
  2026-03-12 00:45       ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
  0 siblings, 1 reply; 11+ messages in thread

From: Rafia Sabih @ 2026-03-09 13:57 UTC (permalink / raw)
  To: surya poondla <suryapoondla4@gmail.com>; +Cc: Giuliano Gagliardi <gogi@gogi.tv>; pgsql-bugs@lists.postgresql.org

In order to take this further, I think it would be good to add this to
commitfest.

On Wed, 4 Mar 2026 at 17:01, Rafia Sabih <rafia.pghackers@gmail.com> wrote:

>
>
> On Mon, 2 Mar 2026 at 14:45, surya poondla <suryapoondla4@gmail.com>
> wrote:
>
>> Hi All,
>>
>> Thank you Rafia for the suggestions.
>> I split both the bugs in 2 different commits, attaching the patches here.
>>
>> For bug1, I added the test case for NULL values too.
>>
>> Thanks for working on this. This looks good to me.
> +-- test that duplicate rows containing NULLs are also detected (bug fix)
> I wouldn't use bug fix here, it looks fine without it.
>
>>
>
> --
> Regards,
> Rafia Sabih
> CYBERTEC PostgreSQL International GmbH
>


-- 
Regards,
Rafia Sabih
CYBERTEC PostgreSQL International GmbH


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY
  2026-02-18 22:32 Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
  2026-03-02 22:45 ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
  2026-03-05 01:01   ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY Rafia Sabih <rafia.pghackers@gmail.com>
  2026-03-09 13:57     ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY Rafia Sabih <rafia.pghackers@gmail.com>
@ 2026-03-12 00:45       ` surya poondla <suryapoondla4@gmail.com>
  2026-03-12 11:33         ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY =?utf-8?B?Y2NhNTUwNw==?= <cca5507@qq.com>
  0 siblings, 1 reply; 11+ messages in thread

From: surya poondla @ 2026-03-12 00:45 UTC (permalink / raw)
  To: Rafia Sabih <rafia.pghackers@gmail.com>; +Cc: Giuliano Gagliardi <gogi@gogi.tv>; pgsql-bugs@lists.postgresql.org

Hi Rafia,
Thank you for the suggestion.

I created the commit fest entries:
https://commitfest.postgresql.org/patch/6579/ (for bug 1)
https://commitfest.postgresql.org/patch/6580/ (for the performance
improvement)

Regards,
Surya Poondla


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY
  2026-02-18 22:32 Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
  2026-03-02 22:45 ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
  2026-03-05 01:01   ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY Rafia Sabih <rafia.pghackers@gmail.com>
  2026-03-09 13:57     ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY Rafia Sabih <rafia.pghackers@gmail.com>
  2026-03-12 00:45       ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
@ 2026-03-12 11:33         ` =?utf-8?B?Y2NhNTUwNw==?= <cca5507@qq.com>
  2026-03-13 07:20           ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY =?utf-8?B?Y2NhNTUwNw==?= <cca5507@qq.com>
  0 siblings, 1 reply; 11+ messages in thread

From: =?utf-8?B?Y2NhNTUwNw==?= @ 2026-03-12 11:33 UTC (permalink / raw)
  To: =?utf-8?B?c3VyeWEgcG9vbmRsYQ==?= <suryapoondla4@gmail.com>; =?utf-8?B?UmFmaWEgU2FiaWg=?= <rafia.pghackers@gmail.com>; +Cc: =?utf-8?B?R2l1bGlhbm8gR2FnbGlhcmRp?= <gogi@gogi.tv>; =?utf-8?B?cGdzcWwtYnVncw==?= <pgsql-bugs@lists.postgresql.org>

Hi,

I think we might want the "*=" operator treat NULL as not equal to NULL and
this is why we add "IS NOT NULL" to the duplicate detection query.

Your patch treats NULL as equal to NULL, which is different from the SQL
standard, may confuse users.

So I think we should make the "*=" operator treat NULL as not equal to NULL
or add a new operator to implement it. Thoughts?

--
Regards,
ChangAo Chen


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY
  2026-02-18 22:32 Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
  2026-03-02 22:45 ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
  2026-03-05 01:01   ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY Rafia Sabih <rafia.pghackers@gmail.com>
  2026-03-09 13:57     ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY Rafia Sabih <rafia.pghackers@gmail.com>
  2026-03-12 00:45       ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
  2026-03-12 11:33         ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY =?utf-8?B?Y2NhNTUwNw==?= <cca5507@qq.com>
@ 2026-03-13 07:20           ` =?utf-8?B?Y2NhNTUwNw==?= <cca5507@qq.com>
  2026-03-20 18:01             ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
  0 siblings, 1 reply; 11+ messages in thread

From: =?utf-8?B?Y2NhNTUwNw==?= @ 2026-03-13 07:20 UTC (permalink / raw)
  To: =?utf-8?B?c3VyeWEgcG9vbmRsYQ==?= <suryapoondla4@gmail.com>; =?utf-8?B?UmFmaWEgU2FiaWg=?= <rafia.pghackers@gmail.com>; +Cc: =?utf-8?B?R2l1bGlhbm8gR2FnbGlhcmRp?= <gogi@gogi.tv>; =?utf-8?B?cGdzcWwtYnVncw==?= <pgsql-bugs@lists.postgresql.org>

> I think we might want the "*=" operator treat NULL as not equal to NULL and
> this is why we add "IS NOT NULL" to the duplicate detection query.
> 
> Your patch treats NULL as equal to NULL, which is different from the SQL
> standard, may confuse users.
> 
> So I think we should make the "*=" operator treat NULL as not equal to NULL
> or add a new operator to implement it. Thoughts?

Attach a patch. I add a new built-in function called record_image_eq_variant
which considers two NULLs not equal so that each row can match at most one
row during the full join.

--
Regards,
ChangAo Chen


Attachments:

  [application/octet-stream] v4-0001-Fix-wrong-result-of-refresh-matview-concurrently.patch (7.1K, 2-v4-0001-Fix-wrong-result-of-refresh-matview-concurrently.patch)
  download | inline diff:
From cde9f3c1af6aa1c2c33db651af31a81fd5f9a724 Mon Sep 17 00:00:00 2001
From: ChangAo Chen <cca5507@qq.com>
Date: Fri, 13 Mar 2026 14:53:21 +0800
Subject: [PATCH v4] Fix wrong result of refresh matview concurrently.

During populating the diff table, we use the "*=" operator which
considers two NULLs equal. This can lead to one row may match more
than one row during the full join. If this happens, we will get a
wrong diff table. To fix it, we add a new built-in function called
record_image_eq_variant which considers two NULLs not equal and use
it when populating the diff table. This makes sure that each row can
match at most one row during the full join.
---
 src/backend/commands/matview.c        |  6 +++-
 src/backend/utils/adt/rowtypes.c      | 46 +++++++++++++++++----------
 src/include/catalog/catversion.h      |  2 +-
 src/include/catalog/pg_proc.dat       |  5 +++
 src/test/regress/expected/matview.out | 18 +++++++++++
 src/test/regress/sql/matview.sql      | 17 ++++++++++
 6 files changed, 75 insertions(+), 19 deletions(-)

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 81a55a33ef2..fc63e86eac1 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -826,8 +826,12 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 				errmsg("could not find suitable unique index on materialized view \"%s\"",
 					   RelationGetRelationName(matviewRel)));
 
+	/*
+	 * We use record_image_eq_variant which considers two NULLs not equal
+	 * so that each row can match at most one row during the full join.
+	 */
 	appendStringInfoString(&querybuf,
-						   " AND newdata.* OPERATOR(pg_catalog.*=) mv.*) "
+						   " AND pg_catalog.record_image_eq_variant(newdata.*, mv.*)) "
 						   "WHERE newdata.* IS NULL OR mv.* IS NULL "
 						   "ORDER BY tid");
 
diff --git a/src/backend/utils/adt/rowtypes.c b/src/backend/utils/adt/rowtypes.c
index e4eb7111ee7..64592162fab 100644
--- a/src/backend/utils/adt/rowtypes.c
+++ b/src/backend/utils/adt/rowtypes.c
@@ -1583,16 +1583,19 @@ record_image_cmp(FunctionCallInfo fcinfo)
 }
 
 /*
- * record_image_eq :
+ * record_image_eq_internal :
  *		  compares two records for identical contents, based on byte images
  * result :
  *		  returns true if the records are identical, false otherwise.
  *
  * Note: we do not use record_image_cmp here, since we can avoid
  * de-toasting for unequal lengths this way.
+ *
+ * Note: we consider two NULLs equal if null_equals_null is true, otherwise
+ * not equal.
  */
-Datum
-record_image_eq(PG_FUNCTION_ARGS)
+static bool
+record_image_eq_internal(FunctionCallInfo fcinfo, bool null_equals_null)
 {
 	HeapTupleHeader record1 = PG_GETARG_HEAPTUPLEHEADER(0);
 	HeapTupleHeader record2 = PG_GETARG_HEAPTUPLEHEADER(1);
@@ -1720,21 +1723,18 @@ record_image_eq(PG_FUNCTION_ARGS)
 							j + 1)));
 
 		/*
-		 * We consider two NULLs equal; NULL > not-NULL.
+		 * We consider two NULLs equal if null_equals_null is
+		 * true, otherwise not equal.
 		 */
-		if (!nulls1[i1] || !nulls2[i2])
-		{
-			if (nulls1[i1] || nulls2[i2])
-			{
-				result = false;
-				break;
-			}
-
-			/* Compare the pair of elements */
+		if (nulls1[i1] && nulls2[i2])
+			result = null_equals_null;
+		else if (nulls1[i1] || nulls2[i2])
+			result = false;
+		else
 			result = datum_image_eq(values1[i1], values2[i2], att1->attbyval, att2->attlen);
-			if (!result)
-				break;
-		}
+
+		if (!result)
+			break;
 
 		/* equal, so continue to next column */
 		i1++, i2++, j++;
@@ -1764,7 +1764,19 @@ record_image_eq(PG_FUNCTION_ARGS)
 	PG_FREE_IF_COPY(record1, 0);
 	PG_FREE_IF_COPY(record2, 1);
 
-	PG_RETURN_BOOL(result);
+	return result;
+}
+
+Datum
+record_image_eq(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_BOOL(record_image_eq_internal(fcinfo, true));
+}
+
+Datum
+record_image_eq_variant(PG_FUNCTION_ARGS)
+{
+	PG_RETURN_BOOL(record_image_eq_internal(fcinfo, false));
 }
 
 Datum
diff --git a/src/include/catalog/catversion.h b/src/include/catalog/catversion.h
index 90f46b03502..3da6a75ff87 100644
--- a/src/include/catalog/catversion.h
+++ b/src/include/catalog/catversion.h
@@ -57,6 +57,6 @@
  */
 
 /*							yyyymmddN */
-#define CATALOG_VERSION_NO	202603101
+#define CATALOG_VERSION_NO	202603131
 
 #endif
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 361e2cfffeb..42e7d5f946d 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -10856,6 +10856,11 @@
   proname => 'btequalimage', prorettype => 'bool', proargtypes => 'oid',
   prosrc => 'btequalimage' },
 
+# used for REFRESH MATERIALIZED VIEW CONCURRENTLY
+{ oid => '9145', descr => 'variant of record_image_eq',
+  proname => 'record_image_eq_variant', prorettype => 'bool',
+  proargtypes => 'record record', prosrc => 'record_image_eq_variant' },
+
 # Extensions
 { oid => '3082', descr => 'list available extensions',
   proname => 'pg_available_extensions', procost => '10', prorows => '100',
diff --git a/src/test/regress/expected/matview.out b/src/test/regress/expected/matview.out
index 0355720dfc6..ebd9d806698 100644
--- a/src/test/regress/expected/matview.out
+++ b/src/test/regress/expected/matview.out
@@ -400,6 +400,24 @@ ERROR:  new data for materialized view "mvtest_mv" contains duplicate rows witho
 DETAIL:  Row: (1,10)
 DROP TABLE mvtest_foo CASCADE;
 NOTICE:  drop cascades to materialized view mvtest_mv
+-- test that two NULLs not equal when populating the diff table
+CREATE TABLE mvtest_foo(a int, b int);
+INSERT INTO mvtest_foo VALUES(1, NULL);
+CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
+CREATE UNIQUE INDEX ON mvtest_mv(a);
+INSERT INTO mvtest_foo SELECT * FROM mvtest_foo;
+DO $$
+BEGIN
+  BEGIN
+    REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;
+  EXCEPTION
+    WHEN unique_violation THEN RAISE NOTICE 'unique violation';
+  END;
+END;
+$$;
+NOTICE:  unique violation
+DROP TABLE mvtest_foo CASCADE;
+NOTICE:  drop cascades to materialized view mvtest_mv
 -- make sure that all columns covered by unique indexes works
 CREATE TABLE mvtest_foo(a, b, c) AS VALUES(1, 2, 3);
 CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
diff --git a/src/test/regress/sql/matview.sql b/src/test/regress/sql/matview.sql
index 934426b9ae8..a27f81c349d 100644
--- a/src/test/regress/sql/matview.sql
+++ b/src/test/regress/sql/matview.sql
@@ -135,6 +135,23 @@ REFRESH MATERIALIZED VIEW mvtest_mv;
 REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;
 DROP TABLE mvtest_foo CASCADE;
 
+-- test that two NULLs not equal when populating the diff table
+CREATE TABLE mvtest_foo(a int, b int);
+INSERT INTO mvtest_foo VALUES(1, NULL);
+CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
+CREATE UNIQUE INDEX ON mvtest_mv(a);
+INSERT INTO mvtest_foo SELECT * FROM mvtest_foo;
+DO $$
+BEGIN
+  BEGIN
+    REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;
+  EXCEPTION
+    WHEN unique_violation THEN RAISE NOTICE 'unique violation';
+  END;
+END;
+$$;
+DROP TABLE mvtest_foo CASCADE;
+
 -- make sure that all columns covered by unique indexes works
 CREATE TABLE mvtest_foo(a, b, c) AS VALUES(1, 2, 3);
 CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
-- 
2.34.1



^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY
  2026-02-18 22:32 Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
  2026-03-02 22:45 ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
  2026-03-05 01:01   ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY Rafia Sabih <rafia.pghackers@gmail.com>
  2026-03-09 13:57     ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY Rafia Sabih <rafia.pghackers@gmail.com>
  2026-03-12 00:45       ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
  2026-03-12 11:33         ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY =?utf-8?B?Y2NhNTUwNw==?= <cca5507@qq.com>
  2026-03-13 07:20           ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY =?utf-8?B?Y2NhNTUwNw==?= <cca5507@qq.com>
@ 2026-03-20 18:01             ` surya poondla <suryapoondla4@gmail.com>
  2026-03-21 06:19               ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY =?utf-8?B?Y2NhNTUwNw==?= <cca5507@qq.com>
  0 siblings, 1 reply; 11+ messages in thread

From: surya poondla @ 2026-03-20 18:01 UTC (permalink / raw)
  To: cca5507 <cca5507@qq.com>; +Cc: Rafia Sabih <rafia.pghackers@gmail.com>; Giuliano Gagliardi <gogi@gogi.tv>; pgsql-bugs <pgsql-bugs@lists.postgresql.org>

Hi ChangAo,

Thank you for the detailed review.

For issue 1, my fix removes the IS NOT NULL guard from the pre-check so
that *= can detect all duplicate rows, including those containing NULLs.
(Note: The semantics of *=  has always treated NULL as equal to NULL.)

The reasoning is straightforward: the JOIN uses *= to match newdata rows
against MV rows. If newdata contains two *=-equal rows, both would match
the same MV row in the JOIN, producing a wrong diff. The pre-check must
therefore use the same *= semantics to catch exactly those duplicates
which is what my fix does by removing the IS NOT NULL guard.
The IS NOT NULL guard was the bug as it was hiding real duplicates from
detection.

Your approach leaves the pre-check unchanged and instead replaces *= in
the JOIN with record_image_eq_variant (NULL != NULL). I see two concerns:
1. record_image_eq_variant applies NULL != NULL globally to all rows in
   the JOIN, not just duplicate ones. This means any unchanged row
   containing any NULL in any column will never match its counterpart
   during the JOIN, causing a DELETE + INSERT for that row on every
   refresh even when the data has not changed. The original issue 2 was
   specifically about nullable indexed columns, your fix extends the
   performance problem to all nullable columns anywhere in the row,
   which makes the performance worse than issue 2.
2. The error surfaced becomes a unique_violation from the index rather
   than the explicit "contains duplicate rows" message, which is harder
   for users to diagnose.

Regards,
Surya Poondla


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY
  2026-02-18 22:32 Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
  2026-03-02 22:45 ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
  2026-03-05 01:01   ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY Rafia Sabih <rafia.pghackers@gmail.com>
  2026-03-09 13:57     ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY Rafia Sabih <rafia.pghackers@gmail.com>
  2026-03-12 00:45       ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
  2026-03-12 11:33         ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY =?utf-8?B?Y2NhNTUwNw==?= <cca5507@qq.com>
  2026-03-13 07:20           ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY =?utf-8?B?Y2NhNTUwNw==?= <cca5507@qq.com>
  2026-03-20 18:01             ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
@ 2026-03-21 06:19               ` =?utf-8?B?Y2NhNTUwNw==?= <cca5507@qq.com>
  2026-03-25 22:45                 ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
  0 siblings, 1 reply; 11+ messages in thread

From: =?utf-8?B?Y2NhNTUwNw==?= @ 2026-03-21 06:19 UTC (permalink / raw)
  To: =?utf-8?B?c3VyeWEgcG9vbmRsYQ==?= <suryapoondla4@gmail.com>; +Cc: =?utf-8?B?UmFmaWEgU2FiaWg=?= <rafia.pghackers@gmail.com>; =?utf-8?B?R2l1bGlhbm8gR2FnbGlhcmRp?= <gogi@gogi.tv>; =?utf-8?B?cGdzcWwtYnVncw==?= <pgsql-bugs@lists.postgresql.org>

Hi Surya,

> For issue 1, my fix removes the IS NOT NULL guard from the pre-check so
> that *= can detect all duplicate rows, including those containing NULLs.
> (Note: The semantics of *=  has always treated NULL as equal to NULL.)
> 
> The reasoning is straightforward: the JOIN uses *= to match newdata rows
> against MV rows. If newdata contains two *=-equal rows, both would match
> the same MV row in the JOIN, producing a wrong diff. The pre-check must
> therefore use the same *= semantics to catch exactly those duplicates 
> which is what my fix does by removing the IS NOT NULL guard.
> The IS NOT NULL guard was the bug as it was hiding real duplicates from detection.

If I understand correctly, your fix will break the following case which works well currently:

CREATE TABLE t (a int, b int);
INSERT INTO t VALUES (null, null);
CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
CREATE UNIQUE INDEX ON m(a);
INSERT INTO t VALUES (null, null);
REFRESH MATERIALIZED VIEW CONCURRENTLY m;

Your fix will report an error because of the two (null, null) rows. On master, this case
works well because of the join condition "mv.a = newdada.a" which considers two
NULLs not equal, so we will get a correct diff table.

> Your approach leaves the pre-check unchanged and instead replaces *= in
> the JOIN with record_image_eq_variant (NULL != NULL). I see two concerns:
> 1. record_image_eq_variant applies NULL != NULL globally to all rows in
>    the JOIN, not just duplicate ones. This means any unchanged row
>    containing any NULL in any column will never match its counterpart
>    during the JOIN, causing a DELETE + INSERT for that row on every
>    refresh even when the data has not changed. The original issue 2 was
>    specifically about nullable indexed columns, your fix extends the
>    performance problem to all nullable columns anywhere in the row,
>    which makes the performance worse than issue 2.

Yeah, you're right. Any row containing any NULL in any column will get into the
diff table. But it's for correctness. Maybe user should avoid using CONCURRENTLY
with a lot of rows containing NULL.

> 2. The error surfaced becomes a unique_violation from the index rather
>    than the explicit "contains duplicate rows" message, which is harder
>    for users to diagnose.

I don't think it's a big issue.

--
Regards,
ChangAo Chen


^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY
  2026-02-18 22:32 Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
  2026-03-02 22:45 ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
  2026-03-05 01:01   ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY Rafia Sabih <rafia.pghackers@gmail.com>
  2026-03-09 13:57     ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY Rafia Sabih <rafia.pghackers@gmail.com>
  2026-03-12 00:45       ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
  2026-03-12 11:33         ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY =?utf-8?B?Y2NhNTUwNw==?= <cca5507@qq.com>
  2026-03-13 07:20           ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY =?utf-8?B?Y2NhNTUwNw==?= <cca5507@qq.com>
  2026-03-20 18:01             ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
  2026-03-21 06:19               ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY =?utf-8?B?Y2NhNTUwNw==?= <cca5507@qq.com>
@ 2026-03-25 22:45                 ` surya poondla <suryapoondla4@gmail.com>
  2026-05-18 16:48                   ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
  0 siblings, 1 reply; 11+ messages in thread

From: surya poondla @ 2026-03-25 22:45 UTC (permalink / raw)
  To: cca5507 <cca5507@qq.com>; +Cc: Rafia Sabih <rafia.pghackers@gmail.com>; Giuliano Gagliardi <gogi@gogi.tv>; pgsql-bugs <pgsql-bugs@lists.postgresql.org>

Hi ChangAo,

Thank you for the continued review, your test case helped me find a
real issue in v3 patch, and I updated the fix accordingly.

You were correct that v3 was wrong. Removing IS NOT NULL and using *= alone
in the pre-check was too aggressive: *= treats NULL=NULL=true, which caused
(NULL,NULL)×2 with a unique index on a nullable column to be flagged
as duplicates even though the unique index allows multiple NULLs.

The updated patch takes a different approach. Instead of removing the guard
and relying on *= alone, the pre-check now uses the same per-column equality
operators as the FULL OUTER JOIN accumulated during the same
index-scanning loop. These operators treat NULL=NULL as false, which is
consistent with how
unique indexes actually work (NULLs are distinct).

For (test, NULL)×2 with a single index on a (non-null):
newdata2.a = newdata.a i.e 'test'='test' that is TRUE
newdata2.* *= newdata.* that is TRUE
Thus the duplicate is caught and error is raised

For (NULL, NULL)×2 with a single index on a (nullable):
newdata2.a = newdata.a i.e NULL=NULL that is NULL (false)
Thus no duplicate is caught, the refresh correctly succeeds, MV gets 2 rows

For (test, NULL)×2 with a composite index on (a, b):
newdata2.a = newdata.a i.e TRUE
newdata2.b = newdata.b i.e NULL=NULL that is NULL (false)
Combined: NULL is not caught, refresh correctly succeeds

Regarding your record_image_eq_variant approach: it correctly handles the
NULL-in-indexed-column case, but it introduces a performance regression for
unchanged rows where any non-indexed column contains NULL.
For example, an unchanged row (1, NULL) with a unique index on non-null
index on a would require a DELETE+INSERT on every CONCURRENTLY refresh
because
record_image_eq_variant((1,NULL),(1,NULL)) returns false.
This makes CONCURRENTLY impractical for any table where rows contain NULLs
in non-indexed columns.

The updated patch for bug 1.

Here is some additional tests I did:
postgres=# SET client_min_messages = WARNING;
SET
postgres=# -- Test 1: (test,NULL)×2, single index on 'a' should ERROR
postgres=#   CREATE TABLE t(a text, b text);
  INSERT INTO t VALUES('test', NULL);
  CREATE MATERIALIZED VCREATE TABLE
postgres=#   INSERT INTO t VALUES('test', NULL);
INSERT 0 1
postgres=#   CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
  CREATE UNIQUE INDEX ON m(a);
  INSERT INTO t VALUES('test', NULL);
  REFSELECT 1
postgres=#   CREATE UNIQUE INDEX ON m(a);
CREATE INDEX
postgres=#   INSERT INTO t VALUES('test', NULL);
INSERT 0 1
postgres=#   REFRESH MATERIALIZED VIEW CONCURRENTLY m;  -- must error
ERROR:  new data for materialized view "m" contains duplicate rows
DETAIL:  Row: (test,)
postgres=#   DROP TABLE t CASCADE;
DROP TABLE

postgres=# -- Test 2: (NULL,NULL)×2, single index on 'a' should SUCCEED
postgres=#   CREATE TABLE t(a int, b int);
CREATE TABLE
postgres=#   INSERT INTO t VALUES(NULL, NULL);
INSERT 0 1
postgres=#   CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
SELECT 1
postgres=#   CREATE UNIQUE INDEX ON m(a);
CREATE INDEX
postgres=#   INSERT INTO t VALUES(NULL, NULL);
INSERT 0 1
postgres=#   REFRESH MATERIALIZED VIEW CONCURRENTLY m;  -- must succeed
  SELECT COUNT(*) FROM m;
  DROP TABLE t CASCADE;REFRESH MATERIALIZED VIEW
postgres=#   SELECT COUNT(*) FROM m;  -- should be 2
 count
-------
     2
(1 row)
postgres=#   DROP TABLE t CASCADE;
DROP TABLE

postgres=# --Test 3: (test,NULL)×2, composite index (a,b) should SUCCEED
postgres=#   CREATE TABLE t(a text, b text);
CREATE TABLE
postgres=#   INSERT INTO t VALUES('test', NULL);
INSERT 0 1
postgres=#   CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
SELECT 1
postgres=#   CREATE UNIQUE INDEX ON m(a, b);
  INSERT INTO t VALUES('test', NUL  CREATE UNIQUE INDEX ON m(a, b);
CREATE INDEX
postgres=#   INSERT INTO t VALUES('test', NULL);
INSERT 0 1
postgres=#   REFRESH MATERIALIZED VIEW CONCURRENTLY m;  -- must succeed
REFRESH MATERIALIZED VIEW
postgres=#   SELECT COUNT(*) FROM m;
  DROP TABLE t CASCADE;
  SELECT COUNT(*) FROM m;  -- must be 2
 count
-------
     2
(1 row)
postgres=#   DROP TABLE t CASCADE;
DROP TABLE

postgres=# -- Test 4: unchanged (1,NULL), index on 'a' should SUCCEED.
postgres=#   CREATE TABLE t(a int, b int);
CREATE TABLE
postgres=#   INSERT INTO t VALUES(1, NULL);
INSERT 0 1
postgres=#   CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
SELECT 1
postgres=#   CREATE UNIQUE INDEX ON m(a);
CREATE INDEX
postgres=#   REFRESH MATERIALIZED VIEW CONCURRENTLY m;  -- must succeed
REFRESH MATERIALIZED VIEW
postgres=#   SELECT * FROM m;  -- must still show (1,)
 a | b
---+---
 1 |
(1 row)
postgres=#   DROP TABLE t CASCADE;
DROP TABLE

postgres=# -- Test 5: (1,NULL)×2, separate index on a AND b, should ERROR
postgres=#   CREATE TABLE t(a int, b int);
CREATE TABLE
postgres=#   INSERT INTO t VALUES(1, NULL);
INSERT 0 1
postgres=#   CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
SELECT 1
postgres=#   CREATE UNIQUE INDEX ON m(a);
CREATE INDEX
postgres=#   CREATE UNIQUE INDEX ON m(b);
CREATE INDEX
postgres=#   INSERT INTO t VALUES(1, NULL);
INSERT 0 1
postgres=#   REFRESH MATERIALIZED VIEW CONCURRENTLY m;  -- must error
ERROR:  duplicate key value violates unique constraint "m_a_idx"
DETAIL:  Key (a)=(1) already exists.
CONTEXT:  SQL statement "INSERT INTO public.m SELECT (diff.newdata).* FROM
pg_temp_2.pg_temp_16535_2 diff WHERE tid IS NULL"
postgres=#   DROP TABLE t CASCADE;
DROP TABLE
postgres=#

Made some minor changes to bug2 patch too.

Regards,
Surya Poondla


Attachments:

  [application/octet-stream] 0005-Fix-REFRESH-MATERIALIZED-VIEW-CONCURRENTLY-to-detect_bug1.patch (11.4K, 3-0005-Fix-REFRESH-MATERIALIZED-VIEW-CONCURRENTLY-to-detect_bug1.patch)
  download | inline diff:
From f8419fe83c6e6831ff987b53f2bcae2f3173029b Mon Sep 17 00:00:00 2001
From: spoondla <s_poondla@apple.com>
Date: Mon, 2 Mar 2026 11:22:11 -0800
Subject: [PATCH v5] Fix REFRESH MATERIALIZED VIEW CONCURRENTLY to detect
 duplicate rows

REFRESH MATERIALIZED VIEW CONCURRENTLY was incorrectly skipping
duplicate detection for rows containing NULL values. The pre-check
query used "WHERE newdata.* IS NOT NULL" which caused rows with any
NULL column to bypass duplicate detection entirely.

The fix rebuilds the pre-check using the same per-column equality
operators used by the diff JOIN (one condition per unique-indexed
column).  These operators treat NULL as not equal to NULL, matching
unique index semantics where NULLs are considered distinct.  Only
rows whose indexed columns are non-null and equal are flagged as
duplicates --- the same rows that would cause join ambiguity.

This correctly handles two cases that the old approach got wrong:
  - (test, NULL) x2 with index on a: a='test' is non-null and
    duplicated, so the duplicate is correctly detected and an error
    is raised.
  - (NULL, NULL) x2 with index on a: a=NULL, and unique indexes
    allow multiple NULLs (each is treated as distinct), so the
    refresh correctly succeeds and updates the view to two rows.

Added regression tests covering both cases.


---
 src/backend/commands/matview.c        | 122 ++++++++++++++++++--------
 src/test/regress/expected/matview.out |  33 ++++++-
 src/test/regress/sql/matview.sql      |  22 +++++
 3 files changed, 137 insertions(+), 40 deletions(-)

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 81a55a33ef2..2a1694f3b4e 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -606,6 +606,8 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 	ListCell   *indexoidscan;
 	int16		relnatts;
 	Oid		   *opUsedForQual;
+	StringInfoData precheck_cond_buf;
+	bool		precheck_has_cond;
 
 	initStringInfo(&querybuf);
 	matviewRel = table_open(matviewOid, NoLock);
@@ -634,45 +636,6 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 	if (SPI_exec(querybuf.data, 0) != SPI_OK_UTILITY)
 		elog(ERROR, "SPI_exec failed: %s", querybuf.data);
 
-	/*
-	 * We need to ensure that there are not duplicate rows without NULLs in
-	 * the new data set before we can count on the "diff" results.  Check for
-	 * that in a way that allows showing the first duplicated row found.  Even
-	 * after we pass this test, a unique index on the materialized view may
-	 * find a duplicate key problem.
-	 *
-	 * Note: here and below, we use "tablename.*::tablerowtype" as a hack to
-	 * keep ".*" from being expanded into multiple columns in a SELECT list.
-	 * Compare ruleutils.c's get_variable().
-	 */
-	resetStringInfo(&querybuf);
-	appendStringInfo(&querybuf,
-					 "SELECT newdata.*::%s FROM %s newdata "
-					 "WHERE newdata.* IS NOT NULL AND EXISTS "
-					 "(SELECT 1 FROM %s newdata2 WHERE newdata2.* IS NOT NULL "
-					 "AND newdata2.* OPERATOR(pg_catalog.*=) newdata.* "
-					 "AND newdata2.ctid OPERATOR(pg_catalog.<>) "
-					 "newdata.ctid)",
-					 tempname, tempname, tempname);
-	if (SPI_execute(querybuf.data, false, 1) != SPI_OK_SELECT)
-		elog(ERROR, "SPI_exec failed: %s", querybuf.data);
-	if (SPI_processed > 0)
-	{
-		/*
-		 * Note that this ereport() is returning data to the user.  Generally,
-		 * we would want to make sure that the user has been granted access to
-		 * this data.  However, REFRESH MAT VIEW is only able to be run by the
-		 * owner of the mat view (or a superuser) and therefore there is no
-		 * need to check for access to data in the mat view.
-		 */
-		ereport(ERROR,
-				(errcode(ERRCODE_CARDINALITY_VIOLATION),
-				 errmsg("new data for materialized view \"%s\" contains duplicate rows without any null columns",
-						RelationGetRelationName(matviewRel)),
-				 errdetail("Row: %s",
-						   SPI_getvalue(SPI_tuptable->vals[0], SPI_tuptable->tupdesc, 1))));
-	}
-
 	/*
 	 * Create the temporary "diff" table.
 	 *
@@ -715,6 +678,8 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 	tupdesc = matviewRel->rd_att;
 	opUsedForQual = palloc0_array(Oid, relnatts);
 	foundUniqueIndex = false;
+	initStringInfo(&precheck_cond_buf);
+	precheck_has_cond = false;
 
 	indexoidlist = RelationGetIndexList(matviewRel);
 
@@ -802,6 +767,30 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 										 rightop, attrtype);
 
 				foundUniqueIndex = true;
+
+				/*
+				 * Also accumulate the same condition for the duplicate-row
+				 * pre-check, comparing newdata2 against newdata (instead of
+				 * newdata against mv).  This lets us detect rows that are
+				 * duplicates with respect to the unique index semantics ---
+				 * i.e. rows whose indexed columns are non-null and equal ---
+				 * without falsely flagging rows whose indexed columns are
+				 * NULL (which unique indexes treat as distinct).
+				 */
+				if (precheck_has_cond)
+					appendStringInfoString(&precheck_cond_buf, " AND ");
+
+				leftop = quote_qualified_identifier("newdata2",
+														NameStr(attr->attname));
+				rightop = quote_qualified_identifier("newdata",
+														 NameStr(attr->attname));
+
+				generate_operator_clause(&precheck_cond_buf,
+										 leftop, attrtype,
+										 op,
+										 rightop, attrtype);
+
+				precheck_has_cond = true;
 			}
 		}
 
@@ -831,6 +820,61 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 						   "WHERE newdata.* IS NULL OR mv.* IS NULL "
 						   "ORDER BY tid");
 
+	/*
+	 * Before populating the diff table, check for duplicate rows in the
+	 * new data set.  We look for rows that are equal in all unique index
+	 * columns (using the same operators as the join above, where
+	 * NULL = NULL is false) AND are also *=-equal overall.  Such rows
+	 * would both match the same materialized view row in the join,
+	 * producing an ambiguous diff.
+	 *
+	 * Using index column operators rather than *= alone is important: it
+	 * correctly excludes rows whose indexed columns are NULL, because
+	 * unique indexes treat NULLs as distinct so those rows do not cause
+	 * join ambiguity.
+	 *
+	 * Note: here and below, we use "tablename.*::tablerowtype" as a hack
+	 * to keep ".*" from being expanded into multiple columns in a SELECT
+	 * list.  Compare ruleutils.c's get_variable().
+	 *
+	 * Even after we pass this test, a unique index on the materialized
+	 * view may find a duplicate key problem.
+	 */
+	{
+		StringInfoData precheck_querybuf;
+
+		initStringInfo(&precheck_querybuf);
+		appendStringInfo(&precheck_querybuf,
+						 "SELECT newdata.*::%s FROM %s newdata "
+						 "WHERE EXISTS "
+						 "(SELECT 1 FROM %s newdata2 WHERE %s"
+						 " AND newdata2.* OPERATOR(pg_catalog.*=) newdata.*"
+						 " AND newdata2.ctid OPERATOR(pg_catalog.<>) newdata.ctid)",
+						 tempname, tempname, tempname, precheck_cond_buf.data);
+
+		if (SPI_execute(precheck_querybuf.data, false, 1) != SPI_OK_SELECT)
+			elog(ERROR, "SPI_exec failed: %s", precheck_querybuf.data);
+
+		if (SPI_processed > 0)
+		{
+			/*
+			 * Note that this ereport() is returning data to the user.
+			 * Generally, we would want to make sure that the user has been
+			 * granted access to this data.  However, REFRESH MAT VIEW is
+			 * only able to be run by the owner of the mat view (or a
+			 * superuser) and therefore there is no need to check for access
+			 * to data in the mat view.
+			 */
+			ereport(ERROR,
+					(errcode(ERRCODE_CARDINALITY_VIOLATION),
+					 errmsg("new data for materialized view \"%s\" contains duplicate rows",
+							RelationGetRelationName(matviewRel)),
+					 errdetail("Row: %s",
+							   SPI_getvalue(SPI_tuptable->vals[0],
+										   SPI_tuptable->tupdesc, 1))));
+		}
+	}
+
 	/* Populate the temporary "diff" table. */
 	if (SPI_exec(querybuf.data, 0) != SPI_OK_INSERT)
 		elog(ERROR, "SPI_exec failed: %s", querybuf.data);
diff --git a/src/test/regress/expected/matview.out b/src/test/regress/expected/matview.out
index 0355720dfc6..1dcff26a2e4 100644
--- a/src/test/regress/expected/matview.out
+++ b/src/test/regress/expected/matview.out
@@ -396,8 +396,39 @@ REFRESH MATERIALIZED VIEW mvtest_mv;
 ERROR:  could not create unique index "mvtest_mv_a_idx"
 DETAIL:  Key (a)=(1) is duplicated.
 REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;
-ERROR:  new data for materialized view "mvtest_mv" contains duplicate rows without any null columns
+ERROR:  new data for materialized view "mvtest_mv" contains duplicate rows
 DETAIL:  Row: (1,10)
+DROP TABLE mvtest_foo CASCADE;
+NOTICE:  drop cascades to materialized view mvtest_mv
+-- test that duplicate rows containing NULLs in non-indexed columns are detected
+CREATE TABLE mvtest_foo(a text, b text);
+INSERT INTO mvtest_foo VALUES('test', NULL);
+CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
+CREATE UNIQUE INDEX ON mvtest_mv(a);
+INSERT INTO mvtest_foo VALUES('test', NULL);
+REFRESH MATERIALIZED VIEW mvtest_mv;
+ERROR:  could not create unique index "mvtest_mv_a_idx"
+DETAIL:  Key (a)=(test) is duplicated.
+REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;
+ERROR:  new data for materialized view "mvtest_mv" contains duplicate rows
+DETAIL:  Row: (test,)
+DROP TABLE mvtest_foo CASCADE;
+NOTICE:  drop cascades to materialized view mvtest_mv
+-- test that rows with NULLs in the indexed column are not false positives:
+-- unique indexes treat NULLs as distinct, so (NULL,NULL)x2 is a valid state
+-- and CONCURRENTLY should succeed and update the view to reflect both rows
+CREATE TABLE mvtest_foo(a int, b int);
+INSERT INTO mvtest_foo VALUES(NULL, NULL);
+CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
+CREATE UNIQUE INDEX ON mvtest_mv(a);
+INSERT INTO mvtest_foo VALUES(NULL, NULL);
+REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;
+SELECT COUNT(*) FROM mvtest_mv;
+ count 
+-------
+     2
+(1 row)
+
 DROP TABLE mvtest_foo CASCADE;
 NOTICE:  drop cascades to materialized view mvtest_mv
 -- make sure that all columns covered by unique indexes works
diff --git a/src/test/regress/sql/matview.sql b/src/test/regress/sql/matview.sql
index 934426b9ae8..e65cd4fa94b 100644
--- a/src/test/regress/sql/matview.sql
+++ b/src/test/regress/sql/matview.sql
@@ -135,6 +135,28 @@ REFRESH MATERIALIZED VIEW mvtest_mv;
 REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;
 DROP TABLE mvtest_foo CASCADE;
 
+-- test that duplicate rows containing NULLs in non-indexed columns are detected
+CREATE TABLE mvtest_foo(a text, b text);
+INSERT INTO mvtest_foo VALUES('test', NULL);
+CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
+CREATE UNIQUE INDEX ON mvtest_mv(a);
+INSERT INTO mvtest_foo VALUES('test', NULL);
+REFRESH MATERIALIZED VIEW mvtest_mv;
+REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;
+DROP TABLE mvtest_foo CASCADE;
+
+-- test that rows with NULLs in the indexed column are not false positives:
+-- unique indexes treat NULLs as distinct, so (NULL,NULL)x2 is a valid state
+-- and CONCURRENTLY should succeed and update the view to reflect both rows
+CREATE TABLE mvtest_foo(a int, b int);
+INSERT INTO mvtest_foo VALUES(NULL, NULL);
+CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
+CREATE UNIQUE INDEX ON mvtest_mv(a);
+INSERT INTO mvtest_foo VALUES(NULL, NULL);
+REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;
+SELECT COUNT(*) FROM mvtest_mv;
+DROP TABLE mvtest_foo CASCADE;
+
 -- make sure that all columns covered by unique indexes works
 CREATE TABLE mvtest_foo(a, b, c) AS VALUES(1, 2, 3);
 CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
-- 
2.39.5 (Apple Git-154)



  [application/octet-stream] 0005-Fix-REFRESH-MATERIALIZED-VIEW-CONCURRENTLY-performance_bug2.patch (10.0K, 4-0005-Fix-REFRESH-MATERIALIZED-VIEW-CONCURRENTLY-performance_bug2.patch)
  download | inline diff:
From 15fb766ee6cf7245092fabd7c506b880f9f0b6b6 Mon Sep 17 00:00:00 2001
From: spoondla <s_poondla@apple.com>
Date: Mon, 2 Mar 2026 11:23:17 -0800
Subject: [PATCH v5] Fix REFRESH MATERIALIZED VIEW CONCURRENTLY performance
 with nullable indexed columns

When a materialized view has a unique index on a nullable column, the join
condition used to detect changes included NULL = NULL comparisons which
evaluate to NULL (false), causing every row with a NULL indexed value to
appear as changed on every CONCURRENTLY refresh even when the data was
unchanged.

The fix skips nullable columns from the per-column join conditions when the
index has at least one NOT NULL column.  The NOT NULL column(s) are sufficient
to identify matching rows uniquely, so omitting the nullable ones avoids the
unnecessary NULL = NULL comparisons.  The record equality operator (*=),
appended to the join condition unconditionally, handles all columns including
nullable ones and ensures changed rows are still detected correctly.

When all columns of a unique index are nullable, no columns are skipped.
Relying on *= alone in that case would be incorrect: *= treats NULL as equal
to NULL, which can cause join ambiguity or silent data loss when rows contain
all-null values.  Falling back to including all columns preserves the original
(slower but correct) behavior for that edge case.

Added regression tests covering both the optimization path (index with at
least one NOT NULL column) and the fallback path (all-nullable index).
---
 src/backend/commands/matview.c        | 60 +++++++++++++++++++++++++--
 src/test/regress/expected/matview.out | 29 +++++++++++++
 src/test/regress/sql/matview.sql      | 24 +++++++++++
 3 files changed, 110 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 81a55a33ef2..b0d7c59444a 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -602,6 +602,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 	char	   *nsp;
 	TupleDesc	tupdesc;
 	bool		foundUniqueIndex;
+	bool		addedAnyQuals;
 	List	   *indexoidlist;
 	ListCell   *indexoidscan;
 	int16		relnatts;
@@ -715,6 +716,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 	tupdesc = matviewRel->rd_att;
 	opUsedForQual = palloc0_array(Oid, relnatts);
 	foundUniqueIndex = false;
+	addedAnyQuals = false;
 
 	indexoidlist = RelationGetIndexList(matviewRel);
 
@@ -731,6 +733,9 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 			oidvector  *indclass;
 			Datum		indclassDatum;
 			int			i;
+			bool		index_has_nonnull;
+
+			foundUniqueIndex = true;
 
 			/* Must get indclass the hard way. */
 			indclassDatum = SysCacheGetAttrNotNull(INDEXRELID,
@@ -738,6 +743,29 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 												   Anum_pg_index_indclass);
 			indclass = (oidvector *) DatumGetPointer(indclassDatum);
 
+			/*
+			 * Pre-scan: check whether this index has at least one NOT NULL
+			 * column.  If it does, we can safely skip nullable columns in
+			 * the join condition below (the NOT NULL column(s) uniquely
+			 * identify rows, so omitting nullable columns does not introduce
+			 * join ambiguity).  If all columns are nullable we must include
+			 * all of them to preserve correctness: skipping everything and
+			 * relying on *= alone can cause join ambiguity or data loss
+			 * because *= treats NULL as equal to NULL.
+			 */
+			index_has_nonnull = false;
+			for (i = 0; i < indnkeyatts; i++)
+			{
+				Form_pg_attribute attr_check =
+					TupleDescAttr(tupdesc, indexStruct->indkey.values[i] - 1);
+
+				if (attr_check->attnotnull)
+				{
+					index_has_nonnull = true;
+					break;
+				}
+			}
+
 			/* Add quals for all columns from this index. */
 			for (i = 0; i < indnkeyatts; i++)
 			{
@@ -753,6 +781,24 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 				const char *leftop;
 				const char *rightop;
 
+				/*
+				 * Skip nullable columns when this index has at least one NOT
+				 * NULL column.  Nullable columns in unique indexes allow
+				 * multiple NULLs and NULL = NULL evaluates to NULL (false) in
+				 * the join, making unchanged NULL-containing rows appear as
+				 * changed on every refresh.  The NOT NULL column(s) are
+				 * sufficient to identify matching rows, so we can safely omit
+				 * nullable ones without introducing join ambiguity.
+				 *
+				 * When all columns are nullable we must not skip any of them:
+				 * with no per-column conditions only *= would remain, which
+				 * can cause join ambiguity or data loss (see pre-scan comment
+				 * above).  We fall back to including all columns so the join
+				 * behaves as in the original code.
+				 */
+				if (index_has_nonnull && !attr->attnotnull)
+					continue;
+
 				/*
 				 * Identify the equality operator associated with this index
 				 * column.  First we need to look up the column's opclass.
@@ -788,7 +834,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 				/*
 				 * Actually add the qual, ANDed with any others.
 				 */
-				if (foundUniqueIndex)
+				if (addedAnyQuals)
 					appendStringInfoString(&querybuf, " AND ");
 
 				leftop = quote_qualified_identifier("newdata",
@@ -801,7 +847,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 										 op,
 										 rightop, attrtype);
 
-				foundUniqueIndex = true;
+				addedAnyQuals = true;
 			}
 		}
 
@@ -826,8 +872,16 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 				errmsg("could not find suitable unique index on materialized view \"%s\"",
 					   RelationGetRelationName(matviewRel)));
 
+	/*
+	 * The record equality operator (*=) is always included in the join
+	 * predicate.  It handles all columns correctly including nullable ones
+	 * that were skipped above, since *=  treats NULL as equal to NULL.
+	 */
+	if (addedAnyQuals)
+		appendStringInfoString(&querybuf, " AND ");
+
 	appendStringInfoString(&querybuf,
-						   " AND newdata.* OPERATOR(pg_catalog.*=) mv.*) "
+						   "newdata.* OPERATOR(pg_catalog.*=) mv.*) "
 						   "WHERE newdata.* IS NULL OR mv.* IS NULL "
 						   "ORDER BY tid");
 
diff --git a/src/test/regress/expected/matview.out b/src/test/regress/expected/matview.out
index 0355720dfc6..98db1e08185 100644
--- a/src/test/regress/expected/matview.out
+++ b/src/test/regress/expected/matview.out
@@ -418,6 +418,35 @@ CREATE MATERIALIZED VIEW mvtest_mv2 AS SELECT * FROM mvtest_mv1
   WHERE col1 = (SELECT LEAST(col1) FROM mvtest_mv1) WITH NO DATA;
 DROP MATERIALIZED VIEW mvtest_mv1 CASCADE;
 NOTICE:  drop cascades to materialized view mvtest_mv2
+-- test that nullable indexed columns are skipped in the join condition when
+-- the index has at least one NOT NULL column.  The NOT NULL column alone
+-- identifies matching rows, so the nullable column does not need to be in
+-- the join and its NULL values do not cause unnecessary DELETE+INSERT churn.
+CREATE TABLE mvtest_foo(a int NOT NULL, b int);
+INSERT INTO mvtest_foo VALUES(1, NULL), (2, NULL), (3, NULL);
+CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
+CREATE UNIQUE INDEX ON mvtest_mv(a);  -- a is NOT NULL, b is nullable
+-- Add a second index on the nullable column alone to make it exercise the
+-- all-nullable fallback path through the pre-scan.
+CREATE UNIQUE INDEX ON mvtest_mv(b);
+REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;  -- must succeed
+DROP TABLE mvtest_foo CASCADE;
+NOTICE:  drop cascades to materialized view mvtest_mv
+-- test that CONCURRENTLY still works when all indexed columns are nullable
+-- (falls back to including all columns, which is slower but correct).
+CREATE TABLE mvtest_foo(a int, b int);
+INSERT INTO mvtest_foo VALUES(NULL, 1);
+CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
+CREATE UNIQUE INDEX ON mvtest_mv(a);  -- all-nullable index
+REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;  -- must succeed
+SELECT * FROM mvtest_mv;
+ a | b 
+---+---
+   | 1
+(1 row)
+
+DROP TABLE mvtest_foo CASCADE;
+NOTICE:  drop cascades to materialized view mvtest_mv
 -- make sure that types with unusual equality tests work
 CREATE TABLE mvtest_boxes (id serial primary key, b box);
 INSERT INTO mvtest_boxes (b) VALUES
diff --git a/src/test/regress/sql/matview.sql b/src/test/regress/sql/matview.sql
index 934426b9ae8..624056a4dc5 100644
--- a/src/test/regress/sql/matview.sql
+++ b/src/test/regress/sql/matview.sql
@@ -153,6 +153,30 @@ CREATE MATERIALIZED VIEW mvtest_mv2 AS SELECT * FROM mvtest_mv1
   WHERE col1 = (SELECT LEAST(col1) FROM mvtest_mv1) WITH NO DATA;
 DROP MATERIALIZED VIEW mvtest_mv1 CASCADE;
 
+-- test that nullable indexed columns are skipped in the join condition when
+-- the index has at least one NOT NULL column.  The NOT NULL column alone
+-- identifies matching rows, so the nullable column does not need to be in
+-- the join and its NULL values do not cause unnecessary DELETE+INSERT churn.
+CREATE TABLE mvtest_foo(a int NOT NULL, b int);
+INSERT INTO mvtest_foo VALUES(1, NULL), (2, NULL), (3, NULL);
+CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
+CREATE UNIQUE INDEX ON mvtest_mv(a);  -- a is NOT NULL, b is nullable
+-- Add a second index on the nullable column alone to make it exercise the
+-- all-nullable fallback path through the pre-scan.
+CREATE UNIQUE INDEX ON mvtest_mv(b);
+REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;  -- must succeed
+DROP TABLE mvtest_foo CASCADE;
+
+-- test that CONCURRENTLY still works when all indexed columns are nullable
+-- (falls back to including all columns, which is slower but correct).
+CREATE TABLE mvtest_foo(a int, b int);
+INSERT INTO mvtest_foo VALUES(NULL, 1);
+CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
+CREATE UNIQUE INDEX ON mvtest_mv(a);  -- all-nullable index
+REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;  -- must succeed
+SELECT * FROM mvtest_mv;
+DROP TABLE mvtest_foo CASCADE;
+
 -- make sure that types with unusual equality tests work
 CREATE TABLE mvtest_boxes (id serial primary key, b box);
 INSERT INTO mvtest_boxes (b) VALUES
-- 
2.39.5 (Apple Git-154)



^ permalink  raw  reply  [nested|flat] 11+ messages in thread

* Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY
  2026-02-18 22:32 Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
  2026-03-02 22:45 ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
  2026-03-05 01:01   ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY Rafia Sabih <rafia.pghackers@gmail.com>
  2026-03-09 13:57     ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY Rafia Sabih <rafia.pghackers@gmail.com>
  2026-03-12 00:45       ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
  2026-03-12 11:33         ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY =?utf-8?B?Y2NhNTUwNw==?= <cca5507@qq.com>
  2026-03-13 07:20           ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY =?utf-8?B?Y2NhNTUwNw==?= <cca5507@qq.com>
  2026-03-20 18:01             ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
  2026-03-21 06:19               ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY =?utf-8?B?Y2NhNTUwNw==?= <cca5507@qq.com>
  2026-03-25 22:45                 ` Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
@ 2026-05-18 16:48                   ` surya poondla <suryapoondla4@gmail.com>
  0 siblings, 0 replies; 11+ messages in thread

From: surya poondla @ 2026-05-18 16:48 UTC (permalink / raw)
  To: cca5507 <cca5507@qq.com>; +Cc: Rafia Sabih <rafia.pghackers@gmail.com>; Giuliano Gagliardi <gogi@gogi.tv>; pgsql-bugs <pgsql-bugs@lists.postgresql.org>

Hi All,

I rebased the patches to the latest code.

Regards,
Surya Poondla

>


Attachments:

  [application/octet-stream] 0006-Fix-REFRESH-MATERIALIZED-VIEW-CONCURRENTLY-to-detect.patch (11.4K, 3-0006-Fix-REFRESH-MATERIALIZED-VIEW-CONCURRENTLY-to-detect.patch)
  download | inline diff:
From 19873f4a02509a6870623140ed5f63a8dd390b90 Mon Sep 17 00:00:00 2001
From: spoondla <s_poondla@apple.com>
Date: Mon, 2 Mar 2026 11:22:11 -0800
Subject: [PATCH v6] Fix REFRESH MATERIALIZED VIEW CONCURRENTLY to detect
 duplicate rows

REFRESH MATERIALIZED VIEW CONCURRENTLY was incorrectly skipping
duplicate detection for rows containing NULL values. The pre-check
query used "WHERE newdata.* IS NOT NULL" which caused rows with any
NULL column to bypass duplicate detection entirely.

The fix rebuilds the pre-check using the same per-column equality
operators used by the diff JOIN (one condition per unique-indexed
column).  These operators treat NULL as not equal to NULL, matching
unique index semantics where NULLs are considered distinct.  Only
rows whose indexed columns are non-null and equal are flagged as
duplicates --- the same rows that would cause join ambiguity.

This correctly handles two cases that the old approach got wrong:
  - (test, NULL) x2 with index on a: a='test' is non-null and
    duplicated, so the duplicate is correctly detected and an error
    is raised.
  - (NULL, NULL) x2 with index on a: a=NULL, and unique indexes
    allow multiple NULLs (each is treated as distinct), so the
    refresh correctly succeeds and updates the view to two rows.

Added regression tests covering both cases.
---
 src/backend/commands/matview.c        | 122 ++++++++++++++++++--------
 src/test/regress/expected/matview.out |  33 ++++++-
 src/test/regress/sql/matview.sql      |  22 +++++
 3 files changed, 137 insertions(+), 40 deletions(-)

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index f7d8007f796..b806ab46647 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -606,6 +606,8 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 	ListCell   *indexoidscan;
 	int16		relnatts;
 	Oid		   *opUsedForQual;
+	StringInfoData precheck_cond_buf;
+	bool		precheck_has_cond;
 
 	initStringInfo(&querybuf);
 	matviewRel = table_open(matviewOid, NoLock);
@@ -634,45 +636,6 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 	if (SPI_exec(querybuf.data, 0) != SPI_OK_UTILITY)
 		elog(ERROR, "SPI_exec failed: %s", querybuf.data);
 
-	/*
-	 * We need to ensure that there are not duplicate rows without NULLs in
-	 * the new data set before we can count on the "diff" results.  Check for
-	 * that in a way that allows showing the first duplicated row found.  Even
-	 * after we pass this test, a unique index on the materialized view may
-	 * find a duplicate key problem.
-	 *
-	 * Note: here and below, we use "tablename.*::tablerowtype" as a hack to
-	 * keep ".*" from being expanded into multiple columns in a SELECT list.
-	 * Compare ruleutils.c's get_variable().
-	 */
-	resetStringInfo(&querybuf);
-	appendStringInfo(&querybuf,
-					 "SELECT newdata.*::%s FROM %s newdata "
-					 "WHERE newdata.* IS NOT NULL AND EXISTS "
-					 "(SELECT 1 FROM %s newdata2 WHERE newdata2.* IS NOT NULL "
-					 "AND newdata2.* OPERATOR(pg_catalog.*=) newdata.* "
-					 "AND newdata2.ctid OPERATOR(pg_catalog.<>) "
-					 "newdata.ctid)",
-					 tempname, tempname, tempname);
-	if (SPI_execute(querybuf.data, false, 1) != SPI_OK_SELECT)
-		elog(ERROR, "SPI_exec failed: %s", querybuf.data);
-	if (SPI_processed > 0)
-	{
-		/*
-		 * Note that this ereport() is returning data to the user.  Generally,
-		 * we would want to make sure that the user has been granted access to
-		 * this data.  However, REFRESH MAT VIEW is only able to be run by the
-		 * owner of the mat view (or a superuser) and therefore there is no
-		 * need to check for access to data in the mat view.
-		 */
-		ereport(ERROR,
-				(errcode(ERRCODE_CARDINALITY_VIOLATION),
-				 errmsg("new data for materialized view \"%s\" contains duplicate rows without any null columns",
-						RelationGetRelationName(matviewRel)),
-				 errdetail("Row: %s",
-						   SPI_getvalue(SPI_tuptable->vals[0], SPI_tuptable->tupdesc, 1))));
-	}
-
 	/*
 	 * Create the temporary "diff" table.
 	 *
@@ -715,6 +678,8 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 	tupdesc = matviewRel->rd_att;
 	opUsedForQual = palloc0_array(Oid, relnatts);
 	foundUniqueIndex = false;
+	initStringInfo(&precheck_cond_buf);
+	precheck_has_cond = false;
 
 	indexoidlist = RelationGetIndexList(matviewRel);
 
@@ -802,6 +767,30 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 										 rightop, attrtype);
 
 				foundUniqueIndex = true;
+
+				/*
+				 * Also accumulate the same condition for the duplicate-row
+				 * pre-check, comparing newdata2 against newdata (instead of
+				 * newdata against mv).  This lets us detect rows that are
+				 * duplicates with respect to the unique index semantics ---
+				 * i.e. rows whose indexed columns are non-null and equal ---
+				 * without falsely flagging rows whose indexed columns are
+				 * NULL (which unique indexes treat as distinct).
+				 */
+				if (precheck_has_cond)
+					appendStringInfoString(&precheck_cond_buf, " AND ");
+
+				leftop = quote_qualified_identifier("newdata2",
+														NameStr(attr->attname));
+				rightop = quote_qualified_identifier("newdata",
+														 NameStr(attr->attname));
+
+				generate_operator_clause(&precheck_cond_buf,
+										 leftop, attrtype,
+										 op,
+										 rightop, attrtype);
+
+				precheck_has_cond = true;
 			}
 		}
 
@@ -831,6 +820,61 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 						   "WHERE newdata.* IS NULL OR mv.* IS NULL "
 						   "ORDER BY tid");
 
+	/*
+	 * Before populating the diff table, check for duplicate rows in the
+	 * new data set.  We look for rows that are equal in all unique index
+	 * columns (using the same operators as the join above, where
+	 * NULL = NULL is false) AND are also *=-equal overall.  Such rows
+	 * would both match the same materialized view row in the join,
+	 * producing an ambiguous diff.
+	 *
+	 * Using index column operators rather than *= alone is important: it
+	 * correctly excludes rows whose indexed columns are NULL, because
+	 * unique indexes treat NULLs as distinct so those rows do not cause
+	 * join ambiguity.
+	 *
+	 * Note: here and below, we use "tablename.*::tablerowtype" as a hack
+	 * to keep ".*" from being expanded into multiple columns in a SELECT
+	 * list.  Compare ruleutils.c's get_variable().
+	 *
+	 * Even after we pass this test, a unique index on the materialized
+	 * view may find a duplicate key problem.
+	 */
+	{
+		StringInfoData precheck_querybuf;
+
+		initStringInfo(&precheck_querybuf);
+		appendStringInfo(&precheck_querybuf,
+						 "SELECT newdata.*::%s FROM %s newdata "
+						 "WHERE EXISTS "
+						 "(SELECT 1 FROM %s newdata2 WHERE %s"
+						 " AND newdata2.* OPERATOR(pg_catalog.*=) newdata.*"
+						 " AND newdata2.ctid OPERATOR(pg_catalog.<>) newdata.ctid)",
+						 tempname, tempname, tempname, precheck_cond_buf.data);
+
+		if (SPI_execute(precheck_querybuf.data, false, 1) != SPI_OK_SELECT)
+			elog(ERROR, "SPI_exec failed: %s", precheck_querybuf.data);
+
+		if (SPI_processed > 0)
+		{
+			/*
+			 * Note that this ereport() is returning data to the user.
+			 * Generally, we would want to make sure that the user has been
+			 * granted access to this data.  However, REFRESH MAT VIEW is
+			 * only able to be run by the owner of the mat view (or a
+			 * superuser) and therefore there is no need to check for access
+			 * to data in the mat view.
+			 */
+			ereport(ERROR,
+					(errcode(ERRCODE_CARDINALITY_VIOLATION),
+					 errmsg("new data for materialized view \"%s\" contains duplicate rows",
+							RelationGetRelationName(matviewRel)),
+					 errdetail("Row: %s",
+							   SPI_getvalue(SPI_tuptable->vals[0],
+										   SPI_tuptable->tupdesc, 1))));
+		}
+	}
+
 	/* Populate the temporary "diff" table. */
 	if (SPI_exec(querybuf.data, 0) != SPI_OK_INSERT)
 		elog(ERROR, "SPI_exec failed: %s", querybuf.data);
diff --git a/src/test/regress/expected/matview.out b/src/test/regress/expected/matview.out
index 0355720dfc6..1dcff26a2e4 100644
--- a/src/test/regress/expected/matview.out
+++ b/src/test/regress/expected/matview.out
@@ -396,8 +396,39 @@ REFRESH MATERIALIZED VIEW mvtest_mv;
 ERROR:  could not create unique index "mvtest_mv_a_idx"
 DETAIL:  Key (a)=(1) is duplicated.
 REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;
-ERROR:  new data for materialized view "mvtest_mv" contains duplicate rows without any null columns
+ERROR:  new data for materialized view "mvtest_mv" contains duplicate rows
 DETAIL:  Row: (1,10)
+DROP TABLE mvtest_foo CASCADE;
+NOTICE:  drop cascades to materialized view mvtest_mv
+-- test that duplicate rows containing NULLs in non-indexed columns are detected
+CREATE TABLE mvtest_foo(a text, b text);
+INSERT INTO mvtest_foo VALUES('test', NULL);
+CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
+CREATE UNIQUE INDEX ON mvtest_mv(a);
+INSERT INTO mvtest_foo VALUES('test', NULL);
+REFRESH MATERIALIZED VIEW mvtest_mv;
+ERROR:  could not create unique index "mvtest_mv_a_idx"
+DETAIL:  Key (a)=(test) is duplicated.
+REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;
+ERROR:  new data for materialized view "mvtest_mv" contains duplicate rows
+DETAIL:  Row: (test,)
+DROP TABLE mvtest_foo CASCADE;
+NOTICE:  drop cascades to materialized view mvtest_mv
+-- test that rows with NULLs in the indexed column are not false positives:
+-- unique indexes treat NULLs as distinct, so (NULL,NULL)x2 is a valid state
+-- and CONCURRENTLY should succeed and update the view to reflect both rows
+CREATE TABLE mvtest_foo(a int, b int);
+INSERT INTO mvtest_foo VALUES(NULL, NULL);
+CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
+CREATE UNIQUE INDEX ON mvtest_mv(a);
+INSERT INTO mvtest_foo VALUES(NULL, NULL);
+REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;
+SELECT COUNT(*) FROM mvtest_mv;
+ count 
+-------
+     2
+(1 row)
+
 DROP TABLE mvtest_foo CASCADE;
 NOTICE:  drop cascades to materialized view mvtest_mv
 -- make sure that all columns covered by unique indexes works
diff --git a/src/test/regress/sql/matview.sql b/src/test/regress/sql/matview.sql
index 934426b9ae8..e65cd4fa94b 100644
--- a/src/test/regress/sql/matview.sql
+++ b/src/test/regress/sql/matview.sql
@@ -135,6 +135,28 @@ REFRESH MATERIALIZED VIEW mvtest_mv;
 REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;
 DROP TABLE mvtest_foo CASCADE;
 
+-- test that duplicate rows containing NULLs in non-indexed columns are detected
+CREATE TABLE mvtest_foo(a text, b text);
+INSERT INTO mvtest_foo VALUES('test', NULL);
+CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
+CREATE UNIQUE INDEX ON mvtest_mv(a);
+INSERT INTO mvtest_foo VALUES('test', NULL);
+REFRESH MATERIALIZED VIEW mvtest_mv;
+REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;
+DROP TABLE mvtest_foo CASCADE;
+
+-- test that rows with NULLs in the indexed column are not false positives:
+-- unique indexes treat NULLs as distinct, so (NULL,NULL)x2 is a valid state
+-- and CONCURRENTLY should succeed and update the view to reflect both rows
+CREATE TABLE mvtest_foo(a int, b int);
+INSERT INTO mvtest_foo VALUES(NULL, NULL);
+CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
+CREATE UNIQUE INDEX ON mvtest_mv(a);
+INSERT INTO mvtest_foo VALUES(NULL, NULL);
+REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;
+SELECT COUNT(*) FROM mvtest_mv;
+DROP TABLE mvtest_foo CASCADE;
+
 -- make sure that all columns covered by unique indexes works
 CREATE TABLE mvtest_foo(a, b, c) AS VALUES(1, 2, 3);
 CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
-- 
2.39.5 (Apple Git-154)



  [application/octet-stream] 0006-Fix-REFRESH-MATERIALIZED-VIEW-CONCURRENTLY-performan.patch (10.0K, 4-0006-Fix-REFRESH-MATERIALIZED-VIEW-CONCURRENTLY-performan.patch)
  download | inline diff:
From 662da36edbe9b190d78478c4fe128a93728f952f Mon Sep 17 00:00:00 2001
From: spoondla <s_poondla@apple.com>
Date: Mon, 2 Mar 2026 11:23:17 -0800
Subject: [PATCH v6] Fix REFRESH MATERIALIZED VIEW CONCURRENTLY performance
 with nullable indexed columns

When a materialized view has a unique index on a nullable column, the join
condition used to detect changes included NULL = NULL comparisons which
evaluate to NULL (false), causing every row with a NULL indexed value to
appear as changed on every CONCURRENTLY refresh even when the data was
unchanged.

The fix skips nullable columns from the per-column join conditions when the
index has at least one NOT NULL column.  The NOT NULL column(s) are sufficient
to identify matching rows uniquely, so omitting the nullable ones avoids the
unnecessary NULL = NULL comparisons.  The record equality operator (*=),
appended to the join condition unconditionally, handles all columns including
nullable ones and ensures changed rows are still detected correctly.

When all columns of a unique index are nullable, no columns are skipped.
Relying on *= alone in that case would be incorrect: *= treats NULL as equal
to NULL, which can cause join ambiguity or silent data loss when rows contain
all-null values.  Falling back to including all columns preserves the original
(slower but correct) behavior for that edge case.

Added regression tests covering both the optimization path (index with at
least one NOT NULL column) and the fallback path (all-nullable index).
---
 src/backend/commands/matview.c        | 60 +++++++++++++++++++++++++--
 src/test/regress/expected/matview.out | 29 +++++++++++++
 src/test/regress/sql/matview.sql      | 24 +++++++++++
 3 files changed, 110 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index f7d8007f796..a47504992d6 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -602,6 +602,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 	char	   *nsp;
 	TupleDesc	tupdesc;
 	bool		foundUniqueIndex;
+	bool		addedAnyQuals;
 	List	   *indexoidlist;
 	ListCell   *indexoidscan;
 	int16		relnatts;
@@ -715,6 +716,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 	tupdesc = matviewRel->rd_att;
 	opUsedForQual = palloc0_array(Oid, relnatts);
 	foundUniqueIndex = false;
+	addedAnyQuals = false;
 
 	indexoidlist = RelationGetIndexList(matviewRel);
 
@@ -731,6 +733,9 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 			oidvector  *indclass;
 			Datum		indclassDatum;
 			int			i;
+			bool		index_has_nonnull;
+
+			foundUniqueIndex = true;
 
 			/* Must get indclass the hard way. */
 			indclassDatum = SysCacheGetAttrNotNull(INDEXRELID,
@@ -738,6 +743,29 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 												   Anum_pg_index_indclass);
 			indclass = (oidvector *) DatumGetPointer(indclassDatum);
 
+			/*
+			 * Pre-scan: check whether this index has at least one NOT NULL
+			 * column.  If it does, we can safely skip nullable columns in
+			 * the join condition below (the NOT NULL column(s) uniquely
+			 * identify rows, so omitting nullable columns does not introduce
+			 * join ambiguity).  If all columns are nullable we must include
+			 * all of them to preserve correctness: skipping everything and
+			 * relying on *= alone can cause join ambiguity or data loss
+			 * because *= treats NULL as equal to NULL.
+			 */
+			index_has_nonnull = false;
+			for (i = 0; i < indnkeyatts; i++)
+			{
+				Form_pg_attribute attr_check =
+					TupleDescAttr(tupdesc, indexStruct->indkey.values[i] - 1);
+
+				if (attr_check->attnotnull)
+				{
+					index_has_nonnull = true;
+					break;
+				}
+			}
+
 			/* Add quals for all columns from this index. */
 			for (i = 0; i < indnkeyatts; i++)
 			{
@@ -753,6 +781,24 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 				const char *leftop;
 				const char *rightop;
 
+				/*
+				 * Skip nullable columns when this index has at least one NOT
+				 * NULL column.  Nullable columns in unique indexes allow
+				 * multiple NULLs and NULL = NULL evaluates to NULL (false) in
+				 * the join, making unchanged NULL-containing rows appear as
+				 * changed on every refresh.  The NOT NULL column(s) are
+				 * sufficient to identify matching rows, so we can safely omit
+				 * nullable ones without introducing join ambiguity.
+				 *
+				 * When all columns are nullable we must not skip any of them:
+				 * with no per-column conditions only *= would remain, which
+				 * can cause join ambiguity or data loss (see pre-scan comment
+				 * above).  We fall back to including all columns so the join
+				 * behaves as in the original code.
+				 */
+				if (index_has_nonnull && !attr->attnotnull)
+					continue;
+
 				/*
 				 * Identify the equality operator associated with this index
 				 * column.  First we need to look up the column's opclass.
@@ -788,7 +834,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 				/*
 				 * Actually add the qual, ANDed with any others.
 				 */
-				if (foundUniqueIndex)
+				if (addedAnyQuals)
 					appendStringInfoString(&querybuf, " AND ");
 
 				leftop = quote_qualified_identifier("newdata",
@@ -801,7 +847,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 										 op,
 										 rightop, attrtype);
 
-				foundUniqueIndex = true;
+				addedAnyQuals = true;
 			}
 		}
 
@@ -826,8 +872,16 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 				errmsg("could not find suitable unique index on materialized view \"%s\"",
 					   RelationGetRelationName(matviewRel)));
 
+	/*
+	 * The record equality operator (*=) is always included in the join
+	 * predicate.  It handles all columns correctly including nullable ones
+	 * that were skipped above, since *=  treats NULL as equal to NULL.
+	 */
+	if (addedAnyQuals)
+		appendStringInfoString(&querybuf, " AND ");
+
 	appendStringInfoString(&querybuf,
-						   " AND newdata.* OPERATOR(pg_catalog.*=) mv.*) "
+						   "newdata.* OPERATOR(pg_catalog.*=) mv.*) "
 						   "WHERE newdata.* IS NULL OR mv.* IS NULL "
 						   "ORDER BY tid");
 
diff --git a/src/test/regress/expected/matview.out b/src/test/regress/expected/matview.out
index 0355720dfc6..98db1e08185 100644
--- a/src/test/regress/expected/matview.out
+++ b/src/test/regress/expected/matview.out
@@ -418,6 +418,35 @@ CREATE MATERIALIZED VIEW mvtest_mv2 AS SELECT * FROM mvtest_mv1
   WHERE col1 = (SELECT LEAST(col1) FROM mvtest_mv1) WITH NO DATA;
 DROP MATERIALIZED VIEW mvtest_mv1 CASCADE;
 NOTICE:  drop cascades to materialized view mvtest_mv2
+-- test that nullable indexed columns are skipped in the join condition when
+-- the index has at least one NOT NULL column.  The NOT NULL column alone
+-- identifies matching rows, so the nullable column does not need to be in
+-- the join and its NULL values do not cause unnecessary DELETE+INSERT churn.
+CREATE TABLE mvtest_foo(a int NOT NULL, b int);
+INSERT INTO mvtest_foo VALUES(1, NULL), (2, NULL), (3, NULL);
+CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
+CREATE UNIQUE INDEX ON mvtest_mv(a);  -- a is NOT NULL, b is nullable
+-- Add a second index on the nullable column alone to make it exercise the
+-- all-nullable fallback path through the pre-scan.
+CREATE UNIQUE INDEX ON mvtest_mv(b);
+REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;  -- must succeed
+DROP TABLE mvtest_foo CASCADE;
+NOTICE:  drop cascades to materialized view mvtest_mv
+-- test that CONCURRENTLY still works when all indexed columns are nullable
+-- (falls back to including all columns, which is slower but correct).
+CREATE TABLE mvtest_foo(a int, b int);
+INSERT INTO mvtest_foo VALUES(NULL, 1);
+CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
+CREATE UNIQUE INDEX ON mvtest_mv(a);  -- all-nullable index
+REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;  -- must succeed
+SELECT * FROM mvtest_mv;
+ a | b 
+---+---
+   | 1
+(1 row)
+
+DROP TABLE mvtest_foo CASCADE;
+NOTICE:  drop cascades to materialized view mvtest_mv
 -- make sure that types with unusual equality tests work
 CREATE TABLE mvtest_boxes (id serial primary key, b box);
 INSERT INTO mvtest_boxes (b) VALUES
diff --git a/src/test/regress/sql/matview.sql b/src/test/regress/sql/matview.sql
index 934426b9ae8..624056a4dc5 100644
--- a/src/test/regress/sql/matview.sql
+++ b/src/test/regress/sql/matview.sql
@@ -153,6 +153,30 @@ CREATE MATERIALIZED VIEW mvtest_mv2 AS SELECT * FROM mvtest_mv1
   WHERE col1 = (SELECT LEAST(col1) FROM mvtest_mv1) WITH NO DATA;
 DROP MATERIALIZED VIEW mvtest_mv1 CASCADE;
 
+-- test that nullable indexed columns are skipped in the join condition when
+-- the index has at least one NOT NULL column.  The NOT NULL column alone
+-- identifies matching rows, so the nullable column does not need to be in
+-- the join and its NULL values do not cause unnecessary DELETE+INSERT churn.
+CREATE TABLE mvtest_foo(a int NOT NULL, b int);
+INSERT INTO mvtest_foo VALUES(1, NULL), (2, NULL), (3, NULL);
+CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
+CREATE UNIQUE INDEX ON mvtest_mv(a);  -- a is NOT NULL, b is nullable
+-- Add a second index on the nullable column alone to make it exercise the
+-- all-nullable fallback path through the pre-scan.
+CREATE UNIQUE INDEX ON mvtest_mv(b);
+REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;  -- must succeed
+DROP TABLE mvtest_foo CASCADE;
+
+-- test that CONCURRENTLY still works when all indexed columns are nullable
+-- (falls back to including all columns, which is slower but correct).
+CREATE TABLE mvtest_foo(a int, b int);
+INSERT INTO mvtest_foo VALUES(NULL, 1);
+CREATE MATERIALIZED VIEW mvtest_mv AS SELECT * FROM mvtest_foo;
+CREATE UNIQUE INDEX ON mvtest_mv(a);  -- all-nullable index
+REFRESH MATERIALIZED VIEW CONCURRENTLY mvtest_mv;  -- must succeed
+SELECT * FROM mvtest_mv;
+DROP TABLE mvtest_foo CASCADE;
+
 -- make sure that types with unusual equality tests work
 CREATE TABLE mvtest_boxes (id serial primary key, b box);
 INSERT INTO mvtest_boxes (b) VALUES
-- 
2.39.5 (Apple Git-154)



^ permalink  raw  reply  [nested|flat] 11+ messages in thread

end of thread, other threads:[~2026-05-18 16:48 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-02-18 22:32 Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY surya poondla <suryapoondla4@gmail.com>
2026-03-02 22:45 ` surya poondla <suryapoondla4@gmail.com>
2026-03-05 01:01   ` Rafia Sabih <rafia.pghackers@gmail.com>
2026-03-09 13:57     ` Rafia Sabih <rafia.pghackers@gmail.com>
2026-03-12 00:45       ` surya poondla <suryapoondla4@gmail.com>
2026-03-12 11:33         ` =?utf-8?B?Y2NhNTUwNw==?= <cca5507@qq.com>
2026-03-13 07:20           ` =?utf-8?B?Y2NhNTUwNw==?= <cca5507@qq.com>
2026-03-20 18:01             ` surya poondla <suryapoondla4@gmail.com>
2026-03-21 06:19               ` =?utf-8?B?Y2NhNTUwNw==?= <cca5507@qq.com>
2026-03-25 22:45                 ` surya poondla <suryapoondla4@gmail.com>
2026-05-18 16:48                   ` surya poondla <suryapoondla4@gmail.com>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox