From: Thomas Munro <thomas.munro@gmail.com>
To: Tomas Vondra <tomas@vondra.me>
Cc: Tom Lane <tgl@sss.pgh.pa.us>
Cc: Adrian Mönnich <adrian.moennich@cern.ch>
Cc: Andres Freund <andres@anarazel.de>
Cc: pgsql-bugs@lists.postgresql.org
Cc: Tomas Vondra <tv@fuzzy.cz>
Subject: Re: BUG #19449: Massive performance degradation for complex query on Postgres 16+ (few seconds -> multiple hours)
Date: Thu, 16 Apr 2026 17:25:01 +1200
Message-ID: <CA+hUKGL=tHLCbVHMEno2AQentOYL0jTS=iLE0BEgdWPaUDASiA@mail.gmail.com>
In-Reply-To: <9f6b7a6d-62db-4a63-9fb7-5deee702a24f@vondra.me>
References: <19449-4fac687c06cc7def@postgresql.org>
	<dihw6lynx3p75sv5fbgqjlsu3kfeagcnm4px2r7mgsvf4w2sf5@53udqm4e5wid>
	<43225458.20260402160627@cern.ch>
	<jivwllcuyvd7m4ceydwwpjptmadfe3cfbw47hqnej7yjfkleej@2q33rbrfybm4>
	<94712944.20260402164957@cern.ch>
	<2747373b-d188-43b1-8e49-66f9e23e3c24@vondra.me>
	<e43f543b-fac2-46da-9a4c-951c038ac0bc@vondra.me>
	<bbaf15a4-d743-47df-92fe-a1c5e94165ba@vondra.me>
	<3675338.1775169816@sss.pgh.pa.us>
	<9f6b7a6d-62db-4a63-9fb7-5deee702a24f@vondra.me>

On Sun, Apr 5, 2026 at 2:45 AM Tomas Vondra <tomas@vondra.me> wrote:
> At this point I was suspecting the data distributions for the join
> columns may be somewhat weird, causing issues for the hashjoin batching.
> For events.contributions.id it's perfectly fine - it's entirely unique,
> with each ID having 1 entry. Unsurprisingly, because it's the PK. But
> for attachments.folders.contribution_id I see this:
>
> SELECT contribution_id, count(*) FROM attachments.folders
>  GROUP BY contribution_id ORDER BY 2 DESC;
>
>  contribution_id | count
> -----------------+--------
>                  | 464515
>          5492978 |     67
>          4117499 |     42
>          4045045 |     41
>              ...
>
> So there's ~500k entries with NULL, that can't possibly match to
> anything (right)? I assume we still add them to the hash, though.

Those are also the conditions required to prevent the
"stop-partitioning-it's-not-working" logic from triggering.  That's
the check where we know we need to pick a better threshold, lower
than 100%.  But what?
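
To make that concrete, here is a toy model of an all-or-nothing
check.  It is a sketch under stated assumptions, not PostgreSQL
source: it assumes every NULL key hashes to 0, that other keys hash
uniformly, and that growth is disabled only when a split moves 0% or
100% of the in-memory tuples.  The function name and parameters are
made up for illustration.

```python
import random


def simulate(n_null, n_other, limit, max_doublings=16, seed=0):
    """Toy model: keep doubling nbatch while batch 0 exceeds `limit`.

    Returns (nbatch, grow_enabled, history), where history holds one
    (nbatch, moved, kept) tuple per split of batch 0.
    """
    rng = random.Random(seed)
    # Assumption: all NULL keys share hash value 0; others are uniform.
    batch0 = [0] * n_null + [rng.getrandbits(32) for _ in range(n_other)]
    nbatch, grow_enabled, history = 1, True, []
    while grow_enabled and len(batch0) > limit and nbatch < 2 ** max_doublings:
        nbatch *= 2
        # Tuples stay in batch 0 only if their hash still maps there.
        kept = [h for h in batch0 if h % nbatch == 0]
        moved = len(batch0) - len(kept)
        history.append((nbatch, moved, len(kept)))
        # All-or-nothing check: give up only if the split moved
        # none of the tuples or all of them.
        if moved == 0 or moved == len(batch0):
            grow_enabled = False
        batch0 = kept
    return nbatch, grow_enabled, history


if __name__ == "__main__":
    # Only duplicates: the first split moves nothing, so growth stops.
    print(simulate(1000, 0, 500))
    # Duplicates plus other keys: each split moves a few tuples, so the
    # check stays quiet while batch 0 never drops below the NULL count.
    nbatch, enabled, history = simulate(1000, 1000, 500)
    print(nbatch, enabled, history[-1])
```

In this model the pure-duplicate case stops after one split, while
the mixed case keeps doubling nbatch even though batch 0 never
shrinks below the number of NULL-key tuples.  A threshold below 100%
would stop it sooner, which is the open question above.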

Did this commit help?

commit 1811f1af98fb237fdd5adb588cd4b57c433b75f8
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Thu Mar 19 15:21:36 2026 -0400

    Improve hash join's handling of tuples with null join keys.





