public inbox for pgsql-bugs@postgresql.org  
help / color / mirror / Atom feed
From: surya poondla <suryapoondla4@gmail.com>
To: cca5507 <cca5507@qq.com>
Cc: Rafia Sabih <rafia.pghackers@gmail.com>
Cc: Giuliano Gagliardi <gogi@gogi.tv>
Cc: pgsql-bugs <pgsql-bugs@lists.postgresql.org>
Subject: Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY
Date: Fri, 20 Mar 2026 11:01:22 -0700
Message-ID: <CAOVWO5pQ4jyR1M6XYkjkg7c1KshrzaG-9mng7AGBGghZpYKR-Q@mail.gmail.com> (raw)
In-Reply-To: <tencent_7FA4F0B1D63BE55A8608EFF600CD4E371405@qq.com>
References: <40d694df-39fd-4a4a-9459-9d6489165f60@gogi.tv>
	<CAOVWO5p7PieGTm3GJC8NAYgSEQLoBdZw9Se4u=EbPrr1mW7d5Q@mail.gmail.com>
	<CAOVWO5rWkGXx37tpsAuTwrY5muPePHG44ke0xwYcakWVfhkPaw@mail.gmail.com>
	<CAOVWO5qchazoL8H+-d4eZ_zm8jFUH4YF3WvWnXqcDEh8i7hbhw@mail.gmail.com>
	<CAOVWO5qF2_FV7M=USdSgjjPO124YMpw7TY+M4+mnkNPOA2Bstg@mail.gmail.com>
	<CA+FpmFe7pgJRi16zP8Liz5vuUkVdTnPy30Zr=mXzZA3shKgvCw@mail.gmail.com>
	<CAOVWO5rfJGpxRDn496gAHmAxW7L_72==KkOcz0q16YOvqm4-=g@mail.gmail.com>
	<CAOVWO5p2U8rMvh3pJ4WaNN7ES4SbiEKXLV_0pZ-UdYp0kACa7g@mail.gmail.com>
	<CA+FpmFeLJDWZSx1TSRVodFbQOLWa-SXBCiG0rspu--HtadaJ+w@mail.gmail.com>
	<CA+FpmFdCtzBO56sAg9hdJ79gALNTfBDc0OxLah2wLAimeORt0A@mail.gmail.com>
	<CAOVWO5ouwJTgGsXguP=pZn6wB2fL+xErK9+pdg98vCApq7F10g@mail.gmail.com>
	<tencent_175CBC8C83733BB7D09242AFDAD814749907@qq.com>
	<tencent_7FA4F0B1D63BE55A8608EFF600CD4E371405@qq.com>

Hi ChangAo,

Thank you for the detailed review.

For issue 1, my fix removes the IS NOT NULL guard from the pre-check so
that *= can detect all duplicate rows, including those containing NULLs.
(Note: The semantics of *=  has always treated NULL as equal to NULL.)

The reasoning is straightforward: the JOIN uses *= to match newdata rows
against MV rows. If newdata contains two *=-equal rows, both would match
the same MV row in the JOIN, producing a wrong diff. The pre-check must
therefore use the same *= semantics to catch exactly those duplicates
which is what my fix does by removing the IS NOT NULL guard.
The IS NOT NULL guard was the bug as it was hiding real duplicates from
detection.

Your approach leaves the pre-check unchanged and instead replaces *= in
the JOIN with record_image_eq_variant (NULL != NULL). I see two concerns:
1. record_image_eq_variant applies NULL != NULL globally to all rows in
   the JOIN, not just duplicate ones. This means any unchanged row
   containing any NULL in any column will never match its counterpart
   during the JOIN, causing a DELETE + INSERT for that row on every
   refresh even when the data has not changed. The original issue 2 was
   specifically about nullable indexed columns, your fix extends the
   performance problem to all nullable columns anywhere in the row,
   which makes the performance worse than issue 2.
2. The error surfaced becomes a unique_violation from the index rather
   than the explicit "contains duplicate rows" message, which is harder
   for users to diagnose.

Regards,
Surya Poondla


reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: pgsql-bugs@postgresql.org
  Cc: suryapoondla4@gmail.com, cca5507@qq.com, rafia.pghackers@gmail.com, gogi@gogi.tv, pgsql-bugs@lists.postgresql.org
  Subject: Re: Two issues with REFRESH MATERIALIZED VIEW CONCURRENTLY
  In-Reply-To: <CAOVWO5pQ4jyR1M6XYkjkg7c1KshrzaG-9mng7AGBGghZpYKR-Q@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox