public inbox for pgsql-bugs@postgresql.org  
help / color / mirror / Atom feed
BUG #19481: multivariate MCV expression stats not applied for equivalent predicates on nullable side of LEFT JOI
2+ messages / 2 participants
[nested] [flat]

* BUG #19481: multivariate MCV expression stats not applied for equivalent predicates on nullable side of LEFT JOI
@ 2026-05-16 10:07 PG Bug reporting form <noreply@postgresql.org>
  2026-05-18 06:40 ` Re: BUG #19481: multivariate MCV expression stats not applied for equivalent predicates on nullable side of LEFT JOI Andrei Lepikhov <lepihov@gmail.com>
  0 siblings, 1 reply; 2+ messages in thread

From: PG Bug reporting form @ 2026-05-16 10:07 UTC (permalink / raw)
  To: pgsql-bugs@lists.postgresql.org; +Cc: yankairong@ruc.edu.cn

The following bug has been logged on the website:

Bug reference:      19481
Logged by:          muyehu
Email address:      yankairong@ruc.edu.cn
PostgreSQL version: 18.4
Operating system:   ubuntu22.04
Description:        

I found what appears to be a planner selectivity issue involving
multivariate MCV statistics on expressions.

I can reproduce a case where expression MCV stats are applied for a plain
scan, but apparently not for an equivalent predicate when the same relation
appears on the nullable side of a LEFT JOIN.

Tested version:

PostgreSQL 18.4 on x86_64-pc-linux-gnu

Minimal self-contained repro:

DROP TABLE IF EXISTS mcv_bug;
CREATE TABLE mcv_bug (a int, b int, c int);
INSERT INTO mcv_bug
SELECT i, i, i
FROM generate_series(1,1000) AS g(i);

CREATE STATISTICS mcv_bug_stats (mcv)
ON (coalesce(mod(a,20),1)),
   (coalesce(mod(b,10),1)),
   (coalesce(mod(c,5),1))
FROM mcv_bug;

ANALYZE mcv_bug;

EXPLAIN (ANALYZE, VERBOSE)
SELECT *
FROM mcv_bug
WHERE coalesce(mod(a,20),1) = 1
  AND coalesce(mod(b,10),1) = 1
  AND coalesce(mod(c,5),1) = 1;

EXPLAIN (ANALYZE, VERBOSE)
SELECT *
FROM (VALUES (1)) AS d(x)
LEFT JOIN mcv_bug m ON true
WHERE coalesce(mod(m.a,20),1) = 1
  AND coalesce(mod(m.b,10),1) = 1
  AND coalesce(mod(m.c,5),1) = 1;

Observed output on my system:

Seq Scan on public.mcv_bug  (cost=0.00..31.00 rows=50 width=12) (actual
time=0.013..0.074 rows=50 loops=1)
  Output: a, b, c
  Filter: ((COALESCE(mod(mcv_bug.a, 20), 1) = 1) AND
(COALESCE(mod(mcv_bug.b, 10), 1) = 1) AND (COALESCE(mod(mcv_bug.c, 5), 1) =
1))
  Rows Removed by Filter: 950

Nested Loop Left Join  (cost=0.00..41.01 rows=1 width=16) (actual
time=0.013..0.140 rows=50 loops=1)
  Output: 1, m.a, m.b, m.c
  Filter: ((COALESCE(mod(m.a, 20), 1) = 1) AND (COALESCE(mod(m.b, 10), 1) =
1) AND (COALESCE(mod(m.c, 5), 1) = 1))
  Rows Removed by Filter: 950
  ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001
rows=1 loops=1)
  ->  Seq Scan on public.mcv_bug m  (cost=0.00..16.00 rows=1000 width=12)
(actual time=0.010..0.065 rows=1000 loops=1)
        Output: m.a, m.b, m.c

What seems wrong is that the plain query gets the expected estimate
(rows=50, matching the actual result), but the LEFT JOIN variant with the
same filter conditions falls back to a severe underestimate (rows=1, while
the actual result is 50).

Expected behavior:

I would expect the second query to receive a selectivity estimate comparable
to the first one, or at least to continue benefiting from the same
expression MCV statistics, since the predicate expressions are otherwise
equivalent.

Possible implementation hint:

This looks related to expression matching for extended statistics after
outer-join nullability decoration. In selfuncs.c, examine_variable() already
has logic and comments about stripping nullingrels before trying to match
expressions to extended statistics or expression indexes. However, the
extended-statistics path used for clause or expression matching appears to
still rely on direct expression equality in a way that no longer matches
once Vars on the nullable side of an outer join carry nullingrel markings.

This may be related to commit e28033fe1af8037e0fec8bb3a32fabbe18ac06b1
("Ignore nullingrels when looking up statistics"), which appears to fix a
closely related class of nullingrels-related statistics lookup issues.
However, the attached testcase still reproduces on PostgreSQL 18.4,
suggesting there may be a remaining gap specifically in multivariate
expression MCV clause matching on the nullable side of an outer join.

So this may be a residual nullingrels issue specific to multivariate
expression MCV matching.

If useful, I can test a patch or provide additional reduced cases.

## Reduced SQL Testcase

Compared to the original repro, this version removes the first `ANALYZE`
because the later `ANALYZE mcv_bug;` is sufficient to collect both regular
and extended statistics after `CREATE STATISTICS`.

```sql
DROP TABLE IF EXISTS mcv_bug;
CREATE TABLE mcv_bug (a int, b int, c int);

INSERT INTO mcv_bug
SELECT i, i, i
FROM generate_series(1,1000) AS g(i);

CREATE STATISTICS mcv_bug_stats (mcv)
ON (coalesce(mod(a,20),1)),
   (coalesce(mod(b,10),1)),
   (coalesce(mod(c,5),1))
FROM mcv_bug;

ANALYZE mcv_bug;

EXPLAIN (ANALYZE, VERBOSE)
SELECT *
FROM mcv_bug
WHERE coalesce(mod(a,20),1) = 1
  AND coalesce(mod(b,10),1) = 1
  AND coalesce(mod(c,5),1) = 1;

EXPLAIN (ANALYZE, VERBOSE)
SELECT *
FROM (VALUES (1)) AS d(x)
LEFT JOIN mcv_bug m ON true
WHERE coalesce(mod(m.a,20),1) = 1
  AND coalesce(mod(m.b,10),1) = 1
  AND coalesce(mod(m.c,5),1) = 1;
```

## Short Summary

The issue still reproduces on PostgreSQL 18.4:

- Plain scan estimate: `rows=50`, actual `rows=50`
- `LEFT JOIN` variant estimate: `rows=1`, actual `rows=50`

That strongly suggests the extended expression MCV stats are still not being
matched for equivalent predicates on the nullable side of the outer join.








^ permalink  raw  reply  [nested|flat] 2+ messages in thread

* Re: BUG #19481: multivariate MCV expression stats not applied for equivalent predicates on nullable side of LEFT JOI
  2026-05-16 10:07 BUG #19481: multivariate MCV expression stats not applied for equivalent predicates on nullable side of LEFT JOI PG Bug reporting form <noreply@postgresql.org>
@ 2026-05-18 06:40 ` Andrei Lepikhov <lepihov@gmail.com>
  0 siblings, 0 replies; 2+ messages in thread

From: Andrei Lepikhov @ 2026-05-18 06:40 UTC (permalink / raw)
  To: yankairong@ruc.edu.cn; pgsql-bugs@lists.postgresql.org; PG Bug reporting form <noreply@postgresql.org>

On 16/05/2026 12:07, PG Bug reporting form wrote:
> That strongly suggests the extended expression MCV stats are still not being
> matched for equivalent predicates on the nullable side of the outer join.
The query explain tells the whole story:

 Nested Loop Left Join
   Filter: ((COALESCE(mod(m.a, 20), 1) = 1) AND (COALESCE(mod(m.b, 10), 1) = 1)
	    AND (COALESCE(mod(m.c, 5), 1) = 1))
   ->  Result
   ->  Seq Scan on mcv_bug m

As you can see, this clause is a JOIN clause. Your coalesce filter applies to
the result of the join. But m.a,m.b, and m.c might be changed (nullified) in
this join. So, your extended statistic isn't applicable here. To be relevant, it
should calculate the number of not-matched LHS tuples and account for them.

In your case, the ON clause is 'true', so all tuples will be matched. It is a
degenerate case and might potentially be improved, but it doesn't look like a bug.

-- 
regards, Andrei Lepikhov,
pgEdge






^ permalink  raw  reply  [nested|flat] 2+ messages in thread


end of thread, other threads:[~2026-05-18 06:40 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-05-16 10:07 BUG #19481: multivariate MCV expression stats not applied for equivalent predicates on nullable side of LEFT JOI PG Bug reporting form <noreply@postgresql.org>
2026-05-18 06:40 ` Andrei Lepikhov <lepihov@gmail.com>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox