Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w8NHk-000TDN-03 for pgsql-bugs@arkaria.postgresql.org; Thu, 02 Apr 2026 19:00:56 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w8NHi-007gwS-2b for pgsql-bugs@arkaria.postgresql.org; Thu, 02 Apr 2026 19:00:55 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w8NHi-007gwB-1b for pgsql-bugs@lists.postgresql.org; Thu, 02 Apr 2026 19:00:54 +0000 Received: from relay2-d.mail.gandi.net ([2001:4b98:dc4:8::222]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1w8NHg-00000000FOu-0Ije for pgsql-bugs@lists.postgresql.org; Thu, 02 Apr 2026 19:00:54 +0000 Received: by mail.gandi.net (Postfix) with ESMTPSA id 2C32B3EB9F; Thu, 2 Apr 2026 19:00:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=vondra.me; s=gm1; t=1775156451; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=MkGFPk9wfu1YSqgZtly7dQwmPcBugjjRdsvd1APx9Zs=; b=BmtNrqdNUjV9XUZqD7ycIZ7r0fz+zbBEwbhPgOjJbN1V5YQKKii73wFJXav3qgPRn9TEmt T2ls3dm78xJvdnmthcq9su/Ak7UeUq0VRjOjdEUf94/PHLxzxB7rzMGpuXomH5BpHzDZI/ TlH6dSi3F4qaG9yQaZTL4YZPvO4DiHwbRnTBawcdWs51f03MIDENj9XcGB5z4AiVJ91jCT 1X1NKraPZK/IQ5dqTy/JEL0Jh+WO1NgFxNHl3/jxw8DluQrfJyTdcxk6hCkpr+a3VjsmyR /hJPvjLEYTjROdAfZdEs6ul6CuC7fBP07cFP/mKA5qYw/EDtDU3jbF2cG2FRcA== Message-ID: Date: Thu, 2 Apr 2026 21:00:48 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: BUG #19449: Massive performance degradation for complex query on Postgres 16+ (few seconds -> multiple hours) From: Tomas Vondra To: =?UTF-8?Q?Adrian_M=C3=B6nnich?= , Andres Freund Cc: pgsql-bugs@lists.postgresql.org, Tomas Vondra , Thomas Munro References: <19449-4fac687c06cc7def@postgresql.org> <43225458.20260402160627@cern.ch> <94712944.20260402164957@cern.ch> <2747373b-d188-43b1-8e49-66f9e23e3c24@vondra.me> Content-Language: en-US In-Reply-To: <2747373b-d188-43b1-8e49-66f9e23e3c24@vondra.me> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-GND-Sasl: tomas@vondra.me X-GND-Cause: dmFkZTEHLR0Wc5fJGpoGXX+/bfhXdnAt7lzLDAIgH/kGJV1/INfz4jIwu2s6tq2V7nLLzfBdqhIkKkvohzGpGL98K4K85z7BeSwFZuJypIXnrwikU6QAGpzD/DA2jU6Jggl/QCQO6gsL1I7Mj/pzDqXgSrkPBBlhXf7364yUr9ObADW1BJzkaccmiUz31P7lOAj2Y2Hl7lKZ7tBED+F5aVW7a2La79xUHl5kCmTVgUX9T2NvjBMF8u7dPMrRn6Vxp4MsxkSLzMZe671BMlV3Gp9IwidIqfZpt5SNh+C6bZvdtBybPkvX+ymhc9K02y636G7M9pL4Pe4TMwEI9hZngg/gvKLePZGPmXIEVNkH3SPxPF8ur+aI3zkeN+bJaXJgTUx44MuMkPKU5A40CEPqcID9qe1AgRQSv+5UapwTkyAheCU+QqRPMK2T4VOFlZ1v03RP3zL81uVYuM9XTXDhW4ajDO77+np1FEjsoc0dvJVMXIDXiv0KI2vnTLuV1s5MrjypTpfK9l9NGXIkhuDEDfcb96DKuQsZYO65Mzv0xN0y+6C4XQ0RhLt4Fyh0YbKwWkvjfFZPIixsw3Jq9uklzb2UNlhv47XCZmeat1WHCfohxYQ1xK66gU5gURYRvpyIeve49ALqRBESppm3dKHg2K/MxR/4Xk1t79iDpoY5oDqkDcfp6w X-GND-State: clean X-GND-Score: -100 List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On 4/2/26 20:12, Tomas Vondra wrote: > Hi, > > I can reproduce the performance getting much worse in 16, using the > provided SQL scripts. This is what I see: > > 14: 1551.363 ms > 15: 1385.414 ms > 16: 161571.400 ms > 17: 156434.543 ms > 18: 159095.001 ms > > I'm attaching the explains for 15+16. I don't know what's causing it, > but I have a couple interesting observations. > > 1) If I disable parallel query, the timings change to > > 14: 3990.439 ms > 15: 3518.453 ms > 16: 3606.460 ms > 17: 3591.039 ms > 18: 3617.872 ms > > So no regression in this case. It seems to be related to parallelism. > > > 2) There seems to be an explosion of temporary files. We don't have that > in explain, but I queried pg_stat_database before/after the query, and > there's huge difference. Both start at > > temp_files | 112 > temp_bytes | 1942275280 > > so 112 files, ~2GB disk space. But after the query, 15 says > > temp_files | 721 > temp_bytes | 2755839184 > > while 16 has > > temp_files | 2078995 > temp_bytes | 70607906000 > > 2M files and 70GB? Wow! > > > 3) Indeed, before the query completes the pgsql_tmp directory has this: > > 63M pgsql_tmp3499395.0.fileset > 63G pgsql_tmp3499395.1.fileset > 95M pgsql_tmp3499395.2.fileset > 95M pgsql_tmp3499395.3.fileset > 127M pgsql_tmp3499395.4.fileset > > So I guess that's one of the parallel hash joins doing something, and > consuming 63GB of disk space? I don't see anything suspicious in the > plan, but I assume parallel HJ may not report the relevant stats. > > FWIW bumping up work_mem (to 64MB) solved this with the sample data. > > I suspect this is going to be something like the hash join explosion, > where we just happen to add more and more batches. I don't have time to > investigate this more at the moment. > FWIW I think that's what's happening. If I add an elog(WARNING) into ExecParallelHashJoinSetUpBatches, I see this: WARNING: 0x55dbe375a5e8 initializing 16 batches WARNING: 0x7f3868a3a978 initializing 32 batches WARNING: 0x7f3868a3ab80 initializing 4 batches WARNING: 0x55dbe36148c0 initializing 4 batches WARNING: 0x7f3868a3b230 initializing 16 batches WARNING: 0x7f3868a3a978 initializing 64 batches WARNING: 0x55dbe36144b0 initializing 128 batches WARNING: 0x55dbe36144b0 initializing 256 batches WARNING: 0x55dbe36144b0 initializing 512 batches WARNING: 0x55dbe36144b0 initializing 1024 batches WARNING: 0x7f3868a3a978 initializing 2048 batches WARNING: 0x7f3868a3a978 initializing 4096 batches WARNING: 0x55dbe36144b0 initializing 8192 batches WARNING: 0x55dbe36144b0 initializing 16384 batches WARNING: 0x55dbe36144b0 initializing 32768 batches WARNING: 0x7f3868a3a978 initializing 65536 batches WARNING: 0x55dbe36144b0 initializing 131072 batches WARNING: 0x7f3868a3a978 initializing 262144 batches so we're ending with 256k batches, for this one join. I'm not sure how exactly this maps to the 2M files from pg_stat_database, but it means ~0.5M tuplestores and ~10GB virtual memory (at lest per top). I don't know what triggers the batch increase, but I still suspect it's similar to the explosion we fixed (or mitigated) in PG18, but only for serial (non-parallel) joins. regards -- Tomas Vondra