Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY

public inbox for pgsql-bugs@postgresql.org  
help / color / mirror / Atom feed

From: surya poondla <suryapoondla4@gmail.com>
To: Giuliano Gagliardi <gogi@gogi.tv>
Cc: pgsql-bugs@lists.postgresql.org
Subject: Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY
Date: Wed, 11 Feb 2026 16:34:25 -0800
Message-ID: <CAOVWO5rWkGXx37tpsAuTwrY5muPePHG44ke0xwYcakWVfhkPaw@mail.gmail.com> (raw)
In-Reply-To: <CAOVWO5p7PieGTm3GJC8NAYgSEQLoBdZw9Se4u=EbPrr1mW7d5Q@mail.gmail.com>
References: <40d694df-39fd-4a4a-9459-9d6489165f60@gogi.tv>
	<CAOVWO5p7PieGTm3GJC8NAYgSEQLoBdZw9Se4u=EbPrr1mW7d5Q@mail.gmail.com>

Hi Giuliano,

Regarding the issue 1,
>
> I noticed the following two issues while looking at the code that handles
> REFRESH MATERIALIZED VIEW CONCURRENTLY (refresh_by_match_merge() in
> matview.c):
>
> 1.
>
> At the beginning of the function, there is some code that checks for
> duplicate
> rows, but it does not catch the following case:
>
> CREATE TABLE t(a text, b text);
> INSERT INTO t VALUES('test', null);
> CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
> CREATE UNIQUE INDEX ON m(a);
> INSERT INTO t VALUES('test', null); -- t now contains two identical rows
>
> REFRESH MATERIALIZED VIEW CONCURRENTLY m;
>      -> no error, but m still contains only one row!
> REFRESH MATERIALIZED VIEW m;
>      -> error (as expected)
>
>
Thank you for the pointers, I made a patch in refresh_by_match_merge()
which reports an error in the REFRESH MATERIALIZED VIEW CONCURRENTLY case
too.

The issue was REFRESH MATERIALIZED VIEW CONCURRENTLY was incorrectly
skipping duplicate detection for rows containing any NULL values. This was
happening because the "WHERE newdata.* IS NOT NULL" condition returns false
if any column contains NULL.
My patch removes the "IS NOT NULL" preconditions from the duplicate
detection query. The query now correctly checks all rows using the record
equality operator (*=), which treats NULL as equal to NULL (i.e True).

Here is the output with my patch:
postgres=# CREATE TABLE t(a text, b text);
CREATE TABLE
postgres=#   INSERT INTO t VALUES('test', null);
INSERT 0 1
postgres=#
postgres=#   CREATE MATERIALIZED VIEW m AS SELECT * FROM t;
SELECT 1
postgres=#
postgres=# CREATE UNIQUE INDEX ON m(a);
CREATE INDEX
postgres=# SELECT * FROM m;
  a   | b
------+---
 test |
(1 row)

postgres=# SELECT * FROM t;
  a   | b
------+---
 test |
(1 row)

postgres=# INSERT INTO t VALUES('test', null);
INSERT 0 1
postgres=# SELECT * FROM t;
  a   | b
------+---
 test |
 test |
(2 rows)

postgres=# SELECT * FROM m;
  a   | b
------+---
 test |
(1 row)

postgres=# REFRESH MATERIALIZED VIEW CONCURRENTLY m;
2026-02-11 15:57:46.751 PST [39510] ERROR:  new data for materialized view
"m" contains duplicate rows
2026-02-11 15:57:46.751 PST [39510] DETAIL:  Row: (test,)
2026-02-11 15:57:46.751 PST [39510] STATEMENT:  REFRESH MATERIALIZED VIEW
CONCURRENTLY m;
ERROR:  new data for materialized view "m" contains duplicate rows
DETAIL:  Row: (test,)
postgres=#
postgres=# REFRESH MATERIALIZED VIEW m;
2026-02-11 15:57:55.877 PST [39510] ERROR:  could not create unique index
"m_a_idx"
2026-02-11 15:57:55.877 PST [39510] DETAIL:  Key (a)=(test) is duplicated.
2026-02-11 15:57:55.877 PST [39510] STATEMENT:  REFRESH MATERIALIZED VIEW m;
ERROR:  could not create unique index "m_a_idx"
DETAIL:  Key (a)=(test) is duplicated.
postgres=#

Regards,
Surya Poondla

Attachments:

  [application/octet-stream] 0001-Fix-REFRESH-MATERIALIZED-VIEW-CONCURRENTLY-to-detect.patch (3.9K, 3-0001-Fix-REFRESH-MATERIALIZED-VIEW-CONCURRENTLY-to-detect.patch)
  download | inline diff:
From d26e084ed00cde21d917cd4dc52f2ab1467fdc19 Mon Sep 17 00:00:00 2001
From: spoondla <s_poondla@apple.com>
Date: Wed, 11 Feb 2026 16:06:56 -0800
Subject: [PATCH] Fix REFRESH MATERIALIZED VIEW CONCURRENTLY to detect NULL
 containing duplicates.

Issue:
REFRESH MATERIALIZED VIEW CONCURRENTLY was incorrectly skipping duplicate
detection for rows containing any NULL values. This would cause the refresh
to silently succeed while leaving the materialized view with stale data,
rather than properly reporting duplicate row errors.

The bug occurred because the duplicate check used "WHERE newdata.* IS NOT NULL"
which returns false if any column contains NULL. When duplicates existed in
rows with NULLs (e.g., two rows of ('test', NULL)), the check was skipped
entirely. The subsequent FULL OUTER JOIN would then match both duplicate rows
to the same old row, producing an empty diff, causing no updates to be applied.

Fix:
The fix removes the "IS NOT NULL" preconditions from the duplicate detection
query. The query now correctly checks all rows using the record equality
operator (*=), which treats NULL as equal to NULL. This matches the same
equality semantics used by the FULL OUTER JOIN in the diff query, ensuring
consistent duplicate detection.

Note:
The non-concurrent REFRESH was unaffected since it rebuilds indexes from
scratch, which properly detects duplicates during index creation.
---
 src/backend/commands/matview.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 81a55a33ef2..6aaf7a68c20 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -635,11 +635,13 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 		elog(ERROR, "SPI_exec failed: %s", querybuf.data);

 	/*
-	 * We need to ensure that there are not duplicate rows without NULLs in
-	 * the new data set before we can count on the "diff" results.  Check for
-	 * that in a way that allows showing the first duplicated row found.  Even
-	 * after we pass this test, a unique index on the materialized view may
-	 * find a duplicate key problem.
+	 * We need to ensure that there are not duplicate rows in the new data set
+	 * before we can count on the "diff" results.  Check for that in a way
+	 * that allows showing the first duplicated row found.  We check for
+	 * duplicates using the record equality operator (*=), which treats NULLs
+	 * as equal to each other - the same semantics used by the FULL OUTER JOIN
+	 * in the diff query below.  Even after we pass this test, a unique index
+	 * on the materialized view may find a duplicate key problem.
 	 *
 	 * Note: here and below, we use "tablename.*::tablerowtype" as a hack to
 	 * keep ".*" from being expanded into multiple columns in a SELECT list.
@@ -648,9 +650,9 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 	resetStringInfo(&querybuf);
 	appendStringInfo(&querybuf,
 					 "SELECT newdata.*::%s FROM %s newdata "
-					 "WHERE newdata.* IS NOT NULL AND EXISTS "
-					 "(SELECT 1 FROM %s newdata2 WHERE newdata2.* IS NOT NULL "
-					 "AND newdata2.* OPERATOR(pg_catalog.*=) newdata.* "
+					 "WHERE EXISTS "
+					 "(SELECT 1 FROM %s newdata2 "
+					 "WHERE newdata2.* OPERATOR(pg_catalog.*=) newdata.* "
 					 "AND newdata2.ctid OPERATOR(pg_catalog.<>) "
 					 "newdata.ctid)",
 					 tempname, tempname, tempname);
@@ -667,7 +669,7 @@ refresh_by_match_merge(Oid matviewOid, Oid tempOid, Oid relowner,
 		 */
 		ereport(ERROR,
 				(errcode(ERRCODE_CARDINALITY_VIOLATION),
-				 errmsg("new data for materialized view \"%s\" contains duplicate rows without any null columns",
+				 errmsg("new data for materialized view \"%s\" contains duplicate rows",
 						RelationGetRelationName(matviewRel)),
 				 errdetail("Row: %s",
 						   SPI_getvalue(SPI_tuptable->vals[0], SPI_tuptable->tupdesc, 1))));
-- 
2.39.5 (Apple Git-154)

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: pgsql-bugs@postgresql.org
  Cc: suryapoondla4@gmail.com, gogi@gogi.tv, pgsql-bugs@lists.postgresql.org
  Subject: Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY
  In-Reply-To: <CAOVWO5rWkGXx37tpsAuTwrY5muPePHG44ke0xwYcakWVfhkPaw@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox