Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wOafR-0001HD-1f for pgsql-bugs@arkaria.postgresql.org; Sun, 17 May 2026 12:32:25 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wOaeO-000675-16 for pgsql-bugs@arkaria.postgresql.org; Sun, 17 May 2026 12:31:21 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wOBwF-002fnQ-1H for pgsql-bugs@lists.postgresql.org; Sat, 16 May 2026 10:08:07 +0000 Received: from mahout.postgresql.org ([2001:4800:3e1:1::227]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1wOBwC-00000000pny-2CaT for pgsql-bugs@lists.postgresql.org; Sat, 16 May 2026 10:08:06 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=postgresql.org; s=20171124; h=Message-ID:Date:Reply-To:Cc:From:To:Subject: Content-Transfer-Encoding:MIME-Version:Content-Type:Sender:Content-ID: Content-Description:In-Reply-To:References; bh=afTvN2lpKv/4w9XabHCdLPFuTvn8kvQDSixmwC+oj/Q=; b=R0WH4bTlpSPNbKqwhBpG3+tgnL fSA45Lxt+LzhWWVr54wp4u9AqTXd/81qBwZgIhuW0cco8/1T4mIcSJgEC7pw7x8Q+C6CbexgHh/Uw 01Y4e2chY6plc20d3yPMjQmlbxVC6NnUUysiOerTeZroo6DnU5V/I2jcBklN0MT0coCELhbD7evVk Ev23125qB/3HEtCfDk+hEMCQYsPpY0ObJqQy8OwLipTwHS/eEUMYC0zXGXzrWs11dTpB4gDNx9G61 5Un/7X68QyyG0/fVXHmXHvN9T8V5qKoIfHlUVDIyFmoPrLtqsd9Qj8/vu0VASJauwVGLUOjgWjGae GH0ri5EA==; Received: from wrigleys.postgresql.org ([2a02:16a8:dc51::60]) by mahout.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wOBwC-002SdF-07 for pgsql-bugs@lists.postgresql.org; Sat, 16 May 2026 10:08:04 +0000 Received: from localhost ([127.0.0.1] helo=wrigleys.postgresql.org) by wrigleys.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wOBwA-004zke-33 for pgsql-bugs@lists.postgresql.org; Sat, 16 May 2026 10:08:02 +0000 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Subject: BUG #19481: multivariate MCV expression stats not applied for equivalent predicates on nullable side of LEFT JOI To: pgsql-bugs@lists.postgresql.org From: PG Bug reporting form Cc: yankairong@ruc.edu.cn Reply-To: yankairong@ruc.edu.cn, pgsql-bugs@lists.postgresql.org Date: Sat, 16 May 2026 10:07:20 +0000 Message-ID: <19481-4afc34accc9043fe@postgresql.org> X-Auto-Response-Suppress: All Auto-Submitted: auto-generated List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk The following bug has been logged on the website: Bug reference: 19481 Logged by: muyehu Email address: yankairong@ruc.edu.cn PostgreSQL version: 18.4 Operating system: ubuntu22.04 Description: =20 I found what appears to be a planner selectivity issue involving multivariate MCV statistics on expressions. I can reproduce a case where expression MCV stats are applied for a plain scan, but apparently not for an equivalent predicate when the same relation appears on the nullable side of a LEFT JOIN. Tested version: PostgreSQL 18.4 on x86_64-pc-linux-gnu Minimal self-contained repro: DROP TABLE IF EXISTS mcv_bug; CREATE TABLE mcv_bug (a int, b int, c int); INSERT INTO mcv_bug SELECT i, i, i FROM generate_series(1,1000) AS g(i); CREATE STATISTICS mcv_bug_stats (mcv) ON (coalesce(mod(a,20),1)), (coalesce(mod(b,10),1)), (coalesce(mod(c,5),1)) FROM mcv_bug; ANALYZE mcv_bug; EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM mcv_bug WHERE coalesce(mod(a,20),1) =3D 1 AND coalesce(mod(b,10),1) =3D 1 AND coalesce(mod(c,5),1) =3D 1; EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM (VALUES (1)) AS d(x) LEFT JOIN mcv_bug m ON true WHERE coalesce(mod(m.a,20),1) =3D 1 AND coalesce(mod(m.b,10),1) =3D 1 AND coalesce(mod(m.c,5),1) =3D 1; Observed output on my system: Seq Scan on public.mcv_bug (cost=3D0.00..31.00 rows=3D50 width=3D12) (actu= al time=3D0.013..0.074 rows=3D50 loops=3D1) Output: a, b, c Filter: ((COALESCE(mod(mcv_bug.a, 20), 1) =3D 1) AND (COALESCE(mod(mcv_bug.b, 10), 1) =3D 1) AND (COALESCE(mod(mcv_bug.c, 5), 1)= =3D 1)) Rows Removed by Filter: 950 Nested Loop Left Join (cost=3D0.00..41.01 rows=3D1 width=3D16) (actual time=3D0.013..0.140 rows=3D50 loops=3D1) Output: 1, m.a, m.b, m.c Filter: ((COALESCE(mod(m.a, 20), 1) =3D 1) AND (COALESCE(mod(m.b, 10), 1)= =3D 1) AND (COALESCE(mod(m.c, 5), 1) =3D 1)) Rows Removed by Filter: 950 -> Result (cost=3D0.00..0.01 rows=3D1 width=3D0) (actual time=3D0.001..= 0.001 rows=3D1 loops=3D1) -> Seq Scan on public.mcv_bug m (cost=3D0.00..16.00 rows=3D1000 width= =3D12) (actual time=3D0.010..0.065 rows=3D1000 loops=3D1) Output: m.a, m.b, m.c What seems wrong is that the plain query gets the expected estimate (rows=3D50, matching the actual result), but the LEFT JOIN variant with the same filter conditions falls back to a severe underestimate (rows=3D1, while the actual result is 50). Expected behavior: I would expect the second query to receive a selectivity estimate comparable to the first one, or at least to continue benefiting from the same expression MCV statistics, since the predicate expressions are otherwise equivalent. Possible implementation hint: This looks related to expression matching for extended statistics after outer-join nullability decoration. In selfuncs.c, examine_variable() already has logic and comments about stripping nullingrels before trying to match expressions to extended statistics or expression indexes. However, the extended-statistics path used for clause or expression matching appears to still rely on direct expression equality in a way that no longer matches once Vars on the nullable side of an outer join carry nullingrel markings. This may be related to commit e28033fe1af8037e0fec8bb3a32fabbe18ac06b1 ("Ignore nullingrels when looking up statistics"), which appears to fix a closely related class of nullingrels-related statistics lookup issues. However, the attached testcase still reproduces on PostgreSQL 18.4, suggesting there may be a remaining gap specifically in multivariate expression MCV clause matching on the nullable side of an outer join. So this may be a residual nullingrels issue specific to multivariate expression MCV matching. If useful, I can test a patch or provide additional reduced cases. ## Reduced SQL Testcase Compared to the original repro, this version removes the first `ANALYZE` because the later `ANALYZE mcv_bug;` is sufficient to collect both regular and extended statistics after `CREATE STATISTICS`. ```sql DROP TABLE IF EXISTS mcv_bug; CREATE TABLE mcv_bug (a int, b int, c int); INSERT INTO mcv_bug SELECT i, i, i FROM generate_series(1,1000) AS g(i); CREATE STATISTICS mcv_bug_stats (mcv) ON (coalesce(mod(a,20),1)), (coalesce(mod(b,10),1)), (coalesce(mod(c,5),1)) FROM mcv_bug; ANALYZE mcv_bug; EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM mcv_bug WHERE coalesce(mod(a,20),1) =3D 1 AND coalesce(mod(b,10),1) =3D 1 AND coalesce(mod(c,5),1) =3D 1; EXPLAIN (ANALYZE, VERBOSE) SELECT * FROM (VALUES (1)) AS d(x) LEFT JOIN mcv_bug m ON true WHERE coalesce(mod(m.a,20),1) =3D 1 AND coalesce(mod(m.b,10),1) =3D 1 AND coalesce(mod(m.c,5),1) =3D 1; ``` ## Short Summary The issue still reproduces on PostgreSQL 18.4: - Plain scan estimate: `rows=3D50`, actual `rows=3D50` - `LEFT JOIN` variant estimate: `rows=3D1`, actual `rows=3D50` That strongly suggests the extended expression MCV stats are still not being matched for equivalent predicates on the nullable side of the outer join.