Message-ID: <e43f543b-fac2-46da-9a4c-951c038ac0bc@vondra.me>
Date: Thu, 2 Apr 2026 21:00:48 +0200
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: BUG #19449: Massive performance degradation for complex query on
 Postgres 16+ (few seconds -> multiple hours)
From: Tomas Vondra <tomas@vondra.me>
To: =?UTF-8?Q?Adrian_M=C3=B6nnich?= <adrian.moennich@cern.ch>,
 Andres Freund <andres@anarazel.de>
Cc: pgsql-bugs@lists.postgresql.org, Tomas Vondra <tv@fuzzy.cz>,
 Thomas Munro <thomas.munro@gmail.com>
References: <19449-4fac687c06cc7def@postgresql.org>
 <dihw6lynx3p75sv5fbgqjlsu3kfeagcnm4px2r7mgsvf4w2sf5@53udqm4e5wid>
 <43225458.20260402160627@cern.ch>
 <jivwllcuyvd7m4ceydwwpjptmadfe3cfbw47hqnej7yjfkleej@2q33rbrfybm4>
 <94712944.20260402164957@cern.ch>
 <2747373b-d188-43b1-8e49-66f9e23e3c24@vondra.me>
Content-Language: en-US
In-Reply-To: <2747373b-d188-43b1-8e49-66f9e23e3c24@vondra.me>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Archived-At: <https://www.postgresql.org/message-id/e43f543b-fac2-46da-9a4c-951c038ac0bc%40vondra.me>
Precedence: bulk


On 4/2/26 20:12, Tomas Vondra wrote:
> Hi,
> 
> I can reproduce the performance getting much worse in 16, using the
> provided SQL scripts. This is what I see:
> 
>   14: 1551.363 ms
>   15: 1385.414 ms
>   16: 161571.400 ms
>   17: 156434.543 ms
>   18: 159095.001 ms
> 
> I'm attaching the explains for 15+16. I don't know what's causing it,
> but I have a couple interesting observations.
> 
> 1) If I disable parallel query, the timings change to
> 
>   14: 3990.439 ms
>   15: 3518.453 ms
>   16: 3606.460 ms
>   17: 3591.039 ms
>   18: 3617.872 ms
> 
> So no regression in this case. It seems to be related to parallelism.
> 
> 
> 2) There seems to be an explosion of temporary files. We don't have that
> in explain, but I queried pg_stat_database before/after the query, and
> there's huge difference. Both start at
> 
>   temp_files               | 112
>   temp_bytes               | 1942275280
> 
> so 112 files, ~2GB disk space. But after the query, 15 says
> 
>   temp_files               | 721
>   temp_bytes               | 2755839184
> 
> while 16 has
> 
>   temp_files               | 2078995
>   temp_bytes               | 70607906000
> 
> 2M files and 70GB? Wow!
> 
> 
> 3) Indeed, before the query completes the pgsql_tmp directory has this:
> 
>   63M	pgsql_tmp3499395.0.fileset
>   63G	pgsql_tmp3499395.1.fileset
>   95M	pgsql_tmp3499395.2.fileset
>   95M	pgsql_tmp3499395.3.fileset
>   127M	pgsql_tmp3499395.4.fileset
> 
> So I guess that's one of the parallel hash joins doing something, and
> consuming 63GB of disk space? I don't see anything suspicious in the
> plan, but I assume parallel HJ may not report the relevant stats.
> 
> FWIW bumping up work_mem (to 64MB) solved this with the sample data.
> 
> I suspect this is going to be something like the hash join explosion,
> where we just happen to add more and more batches. I don't have time to
> investigate this more at the moment.
> 

FWIW I think that's what's happening. If I add an elog(WARNING) into
ExecParallelHashJoinSetUpBatches, I see this:

    WARNING:  0x55dbe375a5e8 initializing 16 batches
    WARNING:  0x7f3868a3a978 initializing 32 batches
    WARNING:  0x7f3868a3ab80 initializing 4 batches
    WARNING:  0x55dbe36148c0 initializing 4 batches
    WARNING:  0x7f3868a3b230 initializing 16 batches
    WARNING:  0x7f3868a3a978 initializing 64 batches
    WARNING:  0x55dbe36144b0 initializing 128 batches
    WARNING:  0x55dbe36144b0 initializing 256 batches
    WARNING:  0x55dbe36144b0 initializing 512 batches
    WARNING:  0x55dbe36144b0 initializing 1024 batches
    WARNING:  0x7f3868a3a978 initializing 2048 batches
    WARNING:  0x7f3868a3a978 initializing 4096 batches
    WARNING:  0x55dbe36144b0 initializing 8192 batches
    WARNING:  0x55dbe36144b0 initializing 16384 batches
    WARNING:  0x55dbe36144b0 initializing 32768 batches
    WARNING:  0x7f3868a3a978 initializing 65536 batches
    WARNING:  0x55dbe36144b0 initializing 131072 batches
    WARNING:  0x7f3868a3a978 initializing 262144 batches

so we're ending with 256k batches, for this one join. I'm not sure how
exactly this maps to the 2M files from pg_stat_database, but it means
~0.5M tuplestores and ~10GB virtual memory (at lest per top).

I don't know what triggers the batch increase, but I still suspect it's
similar to the explosion we fixed (or mitigated) in PG18, but only for
serial (non-parallel) joins.

regards

-- 
Tomas Vondra