Re: BUG #19449: Massive performance degradation for complex query on Postgres 16+ (few seconds -> multiple hours)

public inbox for pgsql-bugs@postgresql.org  
help / color / mirror / Atom feed

From: Tomas Vondra <tomas@vondra.me>
To: Adrian Mönnich <adrian.moennich@cern.ch>
To: Andres Freund <andres@anarazel.de>
Cc: pgsql-bugs@lists.postgresql.org
Cc: Tomas Vondra <tv@fuzzy.cz>
Cc: Thomas Munro <thomas.munro@gmail.com>
Subject: Re: BUG #19449: Massive performance degradation for complex query on Postgres 16+ (few seconds -> multiple hours)
Date: Thu, 2 Apr 2026 21:00:48 +0200
Message-ID: <e43f543b-fac2-46da-9a4c-951c038ac0bc@vondra.me> (raw)
In-Reply-To: <2747373b-d188-43b1-8e49-66f9e23e3c24@vondra.me>
References: <19449-4fac687c06cc7def@postgresql.org>
	<dihw6lynx3p75sv5fbgqjlsu3kfeagcnm4px2r7mgsvf4w2sf5@53udqm4e5wid>
	<43225458.20260402160627@cern.ch>
	<jivwllcuyvd7m4ceydwwpjptmadfe3cfbw47hqnej7yjfkleej@2q33rbrfybm4>
	<94712944.20260402164957@cern.ch>
	<2747373b-d188-43b1-8e49-66f9e23e3c24@vondra.me>



On 4/2/26 20:12, Tomas Vondra wrote:
> Hi,
> 
> I can reproduce the performance getting much worse in 16, using the
> provided SQL scripts. This is what I see:
> 
>   14: 1551.363 ms
>   15: 1385.414 ms
>   16: 161571.400 ms
>   17: 156434.543 ms
>   18: 159095.001 ms
> 
> I'm attaching the explains for 15+16. I don't know what's causing it,
> but I have a couple interesting observations.
> 
> 1) If I disable parallel query, the timings change to
> 
>   14: 3990.439 ms
>   15: 3518.453 ms
>   16: 3606.460 ms
>   17: 3591.039 ms
>   18: 3617.872 ms
> 
> So no regression in this case. It seems to be related to parallelism.
> 
> 
> 2) There seems to be an explosion of temporary files. We don't have that
> in explain, but I queried pg_stat_database before/after the query, and
> there's huge difference. Both start at
> 
>   temp_files               | 112
>   temp_bytes               | 1942275280
> 
> so 112 files, ~2GB disk space. But after the query, 15 says
> 
>   temp_files               | 721
>   temp_bytes               | 2755839184
> 
> while 16 has
> 
>   temp_files               | 2078995
>   temp_bytes               | 70607906000
> 
> 2M files and 70GB? Wow!
> 
> 
> 3) Indeed, before the query completes the pgsql_tmp directory has this:
> 
>   63M	pgsql_tmp3499395.0.fileset
>   63G	pgsql_tmp3499395.1.fileset
>   95M	pgsql_tmp3499395.2.fileset
>   95M	pgsql_tmp3499395.3.fileset
>   127M	pgsql_tmp3499395.4.fileset
> 
> So I guess that's one of the parallel hash joins doing something, and
> consuming 63GB of disk space? I don't see anything suspicious in the
> plan, but I assume parallel HJ may not report the relevant stats.
> 
> FWIW bumping up work_mem (to 64MB) solved this with the sample data.
> 
> I suspect this is going to be something like the hash join explosion,
> where we just happen to add more and more batches. I don't have time to
> investigate this more at the moment.
> 

FWIW I think that's what's happening. If I add an elog(WARNING) into
ExecParallelHashJoinSetUpBatches, I see this:

    WARNING:  0x55dbe375a5e8 initializing 16 batches
    WARNING:  0x7f3868a3a978 initializing 32 batches
    WARNING:  0x7f3868a3ab80 initializing 4 batches
    WARNING:  0x55dbe36148c0 initializing 4 batches
    WARNING:  0x7f3868a3b230 initializing 16 batches
    WARNING:  0x7f3868a3a978 initializing 64 batches
    WARNING:  0x55dbe36144b0 initializing 128 batches
    WARNING:  0x55dbe36144b0 initializing 256 batches
    WARNING:  0x55dbe36144b0 initializing 512 batches
    WARNING:  0x55dbe36144b0 initializing 1024 batches
    WARNING:  0x7f3868a3a978 initializing 2048 batches
    WARNING:  0x7f3868a3a978 initializing 4096 batches
    WARNING:  0x55dbe36144b0 initializing 8192 batches
    WARNING:  0x55dbe36144b0 initializing 16384 batches
    WARNING:  0x55dbe36144b0 initializing 32768 batches
    WARNING:  0x7f3868a3a978 initializing 65536 batches
    WARNING:  0x55dbe36144b0 initializing 131072 batches
    WARNING:  0x7f3868a3a978 initializing 262144 batches

so we're ending with 256k batches, for this one join. I'm not sure how
exactly this maps to the 2M files from pg_stat_database, but it means
~0.5M tuplestores and ~10GB virtual memory (at lest per top).

I don't know what triggers the batch increase, but I still suspect it's
similar to the explosion we fixed (or mitigated) in PG18, but only for
serial (non-parallel) joins.

regards

-- 
Tomas Vondra

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: pgsql-bugs@postgresql.org
  Cc: tomas@vondra.me, adrian.moennich@cern.ch, andres@anarazel.de, pgsql-bugs@lists.postgresql.org, tv@fuzzy.cz, thomas.munro@gmail.com
  Subject: Re: BUG #19449: Massive performance degradation for complex query on Postgres 16+ (few seconds -> multiple hours)
  In-Reply-To: <e43f543b-fac2-46da-9a4c-951c038ac0bc@vondra.me>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox