[PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits

public inbox for pgsql-hackers@postgresql.org  
help / color / mirror / Atom feed

[PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
6+ messages / 4 participants
[nested] [flat]

* [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
@ 2026-04-19 05:46  JoongHyuk Shin <sjh910805@gmail.com>
  0 siblings, 1 reply; 6+ messages in thread

From: JoongHyuk Shin @ 2026-04-19 05:46 UTC (permalink / raw)
  To: pgsql-hackers@lists.postgresql.org <pgsql-hackers@lists.postgresql.org>

In ResolveRecoveryConflictWithBufferPin(), when deadlock_timeout fires,
the function sends RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK and returns.
The caller (LockBufferForCleanup) loops back, sets up another
deadlock_timeout,
and the signal gets sent again every interval.

The lock-conflict path had the same problem and was fixed in 8900b5a9d59a
by adding a second ProcWaitForSignal() after the deadlock-check signal.
The buffer-pin path was left with an XXX comment asking "should we fix
this?".

The attached patch applies the same fix: after sending the deadlock-check
signal, reset got_standby_deadlock_timeout and call ProcWaitForSignal()
so the startup process waits for UnpinBuffer() rather than looping
and re-signaling.

Patch attached.

Attachments:

  [application/octet-stream] 0001-Prevent-repeated-deadlock-check-signals-in-standby-b.patch (2.4K, 3-0001-Prevent-repeated-deadlock-check-signals-in-standby-b.patch)
  download | inline diff:
From 58239700edf0c669f4807da6140a595c2a1e8a5e Mon Sep 17 00:00:00 2001
From: JoongHyuk Shin <sjh910805@gmail.com>
Date: Fri, 17 Apr 2026 16:58:31 +0900
Subject: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin
 waits

After sending RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK, the startup process
returned without waiting, so the caller's loop would fire another
deadlock_timeout and re-send the signal every interval.  This added
unnecessary overhead in both the startup process and backends.

Fix by adding a ProcWaitForSignal() call after the deadlock-check
signal, mirroring the approach already used in the lock-conflict path
(commit 8900b5a9d59a).  This ensures the signal is sent at most once
per deadlock_timeout period rather than repeatedly.

Also remove the XXX comment that noted this problem.
---
 src/backend/storage/ipc/standby.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 29af7733948..9744db5715c 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -851,17 +851,19 @@ ResolveRecoveryConflictWithBufferPin(void)
 		/*
 		 * Send out a request for hot-standby backends to check themselves for
 		 * deadlocks.
-		 *
-		 * XXX The subsequent ResolveRecoveryConflictWithBufferPin() will wait
-		 * to be signaled by UnpinBuffer() again and send a request for
-		 * deadlocks check if deadlock_timeout happens. This causes the
-		 * request to continue to be sent every deadlock_timeout until the
-		 * buffer is unpinned or ltime is reached. This would increase the
-		 * workload in the startup process and backends. In practice it may
-		 * not be so harmful because the period that the buffer is kept pinned
-		 * is basically no so long. But we should fix this?
 		 */
 		SendRecoveryConflictWithBufferPin(RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK);
+
+		/*
+		 * Wait here to be signaled by UnpinBuffer(), to prevent the
+		 * subsequent ResolveRecoveryConflictWithBufferPin() call (from the
+		 * caller's loop) from firing another deadlock_timeout and re-sending
+		 * the deadlock-check signal.  Without this, the signal would be sent
+		 * every deadlock_timeout interval until the buffer is unpinned or
+		 * ltime is reached.
+		 */
+		got_standby_deadlock_timeout = false;
+		ProcWaitForSignal(WAIT_EVENT_BUFFER_CLEANUP);
 	}

 	/*
-- 
2.52.0

^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
@ 2026-04-21 05:42  Fujii Masao <masao.fujii@gmail.com>
  parent: JoongHyuk Shin <sjh910805@gmail.com>
  0 siblings, 1 reply; 6+ messages in thread

From: Fujii Masao @ 2026-04-21 05:42 UTC (permalink / raw)
  To: JoongHyuk Shin <sjh910805@gmail.com>; +Cc: pgsql-hackers@lists.postgresql.org <pgsql-hackers@lists.postgresql.org>

On Sun, Apr 19, 2026 at 2:47 PM JoongHyuk Shin <sjh910805@gmail.com> wrote:
>
> In ResolveRecoveryConflictWithBufferPin(), when deadlock_timeout fires,
> the function sends RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK and returns.
> The caller (LockBufferForCleanup) loops back, sets up another deadlock_timeout,
> and the signal gets sent again every interval.
>
> The lock-conflict path had the same problem and was fixed in 8900b5a9d59a
> by adding a second ProcWaitForSignal() after the deadlock-check signal.
> The buffer-pin path was left with an XXX comment asking "should we fix this?".
>
> The attached patch applies the same fix: after sending the deadlock-check
> signal, reset got_standby_deadlock_timeout and call ProcWaitForSignal()
> so the startup process waits for UnpinBuffer() rather than looping
> and re-signaling.
>
> Patch attached.

Thanks for the patch! LGTM.

Since this change improves recovery-conflict behavior rather than fixing a bug,
it doesn't seem to need backpatching and we may need to wait until v20
development opens (probably July) before committing it.

While reading the patch and ResolveRecoveryConflictWithBufferPin(), I also
noticed that got_standby_delay_timeout is not initialized to false before
enabling the timeout. This is unrelated to the patch, and I think it is
harmless in the current code, but would it be better to initialize it there,
as we already do for got_standby_deadlock_timeout?

                    if (ltime != 0)
                    {
    +                       got_standby_delay_timeout = false;
                            timeouts[cnt].id = STANDBY_TIMEOUT;
                            timeouts[cnt].type = TMPARAM_AT;
                            timeouts[cnt].fin_time = ltime;

Regards,

-- 
Fujii Masao





^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
@ 2026-04-21 05:55  Michael Paquier <michael@paquier.xyz>
  parent: Fujii Masao <masao.fujii@gmail.com>
  0 siblings, 2 replies; 6+ messages in thread

From: Michael Paquier @ 2026-04-21 05:55 UTC (permalink / raw)
  To: Fujii Masao <masao.fujii@gmail.com>; +Cc: JoongHyuk Shin <sjh910805@gmail.com>; pgsql-hackers@lists.postgresql.org <pgsql-hackers@lists.postgresql.org>

On Tue, Apr 21, 2026 at 02:42:38PM +0900, Fujii Masao wrote:
> Since this change improves recovery-conflict behavior rather than fixing a bug,
> it doesn't seem to need backpatching and we may need to wait until v20
> development opens (probably July) before committing it.

Yeah, this one is an improvement, not an actual bug, so let's wait for
v20 if worth doing (I did not check it). 
--
Michael


Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
@ 2026-04-27 11:07  JoongHyuk Shin <sjh910805@gmail.com>
  parent: Michael Paquier <michael@paquier.xyz>
  1 sibling, 0 replies; 6+ messages in thread

From: JoongHyuk Shin @ 2026-04-27 11:07 UTC (permalink / raw)
  To: Michael Paquier <michael@paquier.xyz>; +Cc: Fujii Masao <masao.fujii@gmail.com>; pgsql-hackers@lists.postgresql.org <pgsql-hackers@lists.postgresql.org>

Thanks for the review.
v2 attached, with the suggested initialization added for symmetry.

Agreed this is an improvement rather than a bug fix,
so I've updated the CF tag to Performance accordingly.

I also verified the fix locally on a primary-standby setup,
using the buffer-pin conflict scenario from src/test/recovery/t/
031_recovery_conflict.pl
(aborted INSERT + cursor on standby + VACUUM FREEZE on primary).
On master, strace showed 9 SIGUSR1 broadcasts to the conflicting backend
over a 10-second window (one per deadlock_timeout).
With the patch applied, only 1 broadcast over the same window.

Patch attached.

--
JoongHyuk Shin

On Tue, Apr 21, 2026 at 2:55 PM Michael Paquier <michael@paquier.xyz> wrote:

> On Tue, Apr 21, 2026 at 02:42:38PM +0900, Fujii Masao wrote:
> > Since this change improves recovery-conflict behavior rather than fixing
> a bug,
> > it doesn't seem to need backpatching and we may need to wait until v20
> > development opens (probably July) before committing it.
>
> Yeah, this one is an improvement, not an actual bug, so let's wait for
> v20 if worth doing (I did not check it).
> --
> Michael
>


Attachments:

  [application/octet-stream] v2-0001-Prevent-repeated-deadlock-check-signals-in-standb.patch (2.8K, 3-v2-0001-Prevent-repeated-deadlock-check-signals-in-standb.patch)
  download | inline diff:
From e70a2fc74a63d4c1e3d1277e27a37b1e26710fff Mon Sep 17 00:00:00 2001
From: JoongHyuk Shin <sjh910805@gmail.com>
Date: Fri, 17 Apr 2026 16:58:31 +0900
Subject: [PATCH v2] Prevent repeated deadlock-check signals in standby buffer
 pin waits

After sending RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK, the startup process
returned without waiting, so the caller's loop would fire another
deadlock_timeout and re-send the signal every interval.  This added
unnecessary overhead in both the startup process and backends.

Fix by adding a ProcWaitForSignal() call after the deadlock-check
signal, mirroring the approach already used in the lock-conflict path
(commit 8900b5a9d59a).  This ensures the signal is sent at most once
per deadlock_timeout period rather than repeatedly.

Additionally, reset got_standby_delay_timeout to false before enabling
STANDBY_TIMEOUT, for symmetry with the existing got_standby_deadlock_timeout
reset.

Remove the XXX comment that noted this problem.
---
 src/backend/storage/ipc/standby.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 29af7733948..0dba1fb4289 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -818,6 +818,7 @@ ResolveRecoveryConflictWithBufferPin(void)
 
 		if (ltime != 0)
 		{
+			got_standby_delay_timeout = false;
 			timeouts[cnt].id = STANDBY_TIMEOUT;
 			timeouts[cnt].type = TMPARAM_AT;
 			timeouts[cnt].fin_time = ltime;
@@ -851,17 +852,19 @@ ResolveRecoveryConflictWithBufferPin(void)
 		/*
 		 * Send out a request for hot-standby backends to check themselves for
 		 * deadlocks.
-		 *
-		 * XXX The subsequent ResolveRecoveryConflictWithBufferPin() will wait
-		 * to be signaled by UnpinBuffer() again and send a request for
-		 * deadlocks check if deadlock_timeout happens. This causes the
-		 * request to continue to be sent every deadlock_timeout until the
-		 * buffer is unpinned or ltime is reached. This would increase the
-		 * workload in the startup process and backends. In practice it may
-		 * not be so harmful because the period that the buffer is kept pinned
-		 * is basically no so long. But we should fix this?
 		 */
 		SendRecoveryConflictWithBufferPin(RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK);
+
+		/*
+		 * Wait here to be signaled by UnpinBuffer(), to prevent the
+		 * subsequent ResolveRecoveryConflictWithBufferPin() call (from the
+		 * caller's loop) from firing another deadlock_timeout and re-sending
+		 * the deadlock-check signal.  Without this, the signal would be sent
+		 * every deadlock_timeout interval until the buffer is unpinned or
+		 * ltime is reached.
+		 */
+		got_standby_deadlock_timeout = false;
+		ProcWaitForSignal(WAIT_EVENT_BUFFER_CLEANUP);
 	}
 
 	/*
-- 
2.52.0



^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
@ 2026-05-18 20:32  Álvaro Herrera <alvherre@kurilemu.de>
  parent: Michael Paquier <michael@paquier.xyz>
  1 sibling, 1 reply; 6+ messages in thread

From: Álvaro Herrera @ 2026-05-18 20:32 UTC (permalink / raw)
  To: Michael Paquier <michael@paquier.xyz>; +Cc: Fujii Masao <masao.fujii@gmail.com>; JoongHyuk Shin <sjh910805@gmail.com>; pgsql-hackers@lists.postgresql.org <pgsql-hackers@lists.postgresql.org>; Vitaly Davydov <v.davydov@postgrespro.ru>

Hello,

On 2026-Apr-21, Michael Paquier wrote:

> On Tue, Apr 21, 2026 at 02:42:38PM +0900, Fujii Masao wrote:
> > Since this change improves recovery-conflict behavior rather than fixing a bug,
> > it doesn't seem to need backpatching and we may need to wait until v20
> > development opens (probably July) before committing it.
> 
> Yeah, this one is an improvement, not an actual bug, so let's wait for
> v20 if worth doing (I did not check it). 

Hmm, is this related to 
https://postgr.es/m/44c24dcf-5710-410f-b1b6-d10b315f3d51@postgrespro.ru ?
In there, Vitaly claims to be reporting a bug that goes back to pg15,
which contradicts this assessment.

Regards,

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"¿Qué importan los años?  Lo que realmente importa es comprobar que
a fin de cuentas la mejor edad de la vida es estar vivo"  (Mafalda)






^ permalink  raw  reply  [nested|flat] 6+ messages in thread

* Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
@ 2026-05-22 08:41  JoongHyuk Shin <sjh910805@gmail.com>
  parent: Álvaro Herrera <alvherre@kurilemu.de>
  0 siblings, 0 replies; 6+ messages in thread

From: JoongHyuk Shin @ 2026-05-22 08:41 UTC (permalink / raw)
  To: Álvaro Herrera <alvherre@kurilemu.de>; +Cc: Michael Paquier <michael@paquier.xyz>; Fujii Masao <masao.fujii@gmail.com>; pgsql-hackers@lists.postgresql.org <pgsql-hackers@lists.postgresql.org>; Vitaly Davydov <v.davydov@postgrespro.ru>

Hello.

Same function, different races, I think.
Vitaly reports a missed wake-up where deadlock_timeout never fires
(spurious SIGALRM from log_startup_progress_interval plus the lazy
setitimer in 09cf1d52).
This patch addresses the opposite,
deadlock_timeout does fire, but LockBufferForCleanup loops back and re-arms
it,
so the signal repeats once per second.

The two fixes do not overlap
(the added ProcWaitForSignal sits inside the deadlock branch that Vitaly's
scenario never reaches).

-- 
JH Shin

On Wed, May 20, 2026 at 7:15 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:

> Hello,
>
> On 2026-Apr-21, Michael Paquier wrote:
>
> > On Tue, Apr 21, 2026 at 02:42:38PM +0900, Fujii Masao wrote:
> > > Since this change improves recovery-conflict behavior rather than
> fixing a bug,
> > > it doesn't seem to need backpatching and we may need to wait until v20
> > > development opens (probably July) before committing it.
> >
> > Yeah, this one is an improvement, not an actual bug, so let's wait for
> > v20 if worth doing (I did not check it).
>
> Hmm, is this related to
> https://postgr.es/m/44c24dcf-5710-410f-b1b6-d10b315f3d51@postgrespro.ru ?
> In there, Vitaly claims to be reporting a bug that goes back to pg15,
> which contradicts this assessment.
>
> Regards,
>
> --
> Álvaro Herrera        Breisgau, Deutschland  —
> https://www.EnterpriseDB.com/
> "¿Qué importan los años?  Lo que realmente importa es comprobar que
> a fin de cuentas la mejor edad de la vida es estar vivo"  (Mafalda)
>


^ permalink  raw  reply  [nested|flat] 6+ messages in thread

end of thread, other threads:[~2026-05-22 08:41 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-04-19 05:46 [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits JoongHyuk Shin <sjh910805@gmail.com>
2026-04-21 05:42 ` Fujii Masao <masao.fujii@gmail.com>
2026-04-21 05:55   ` Michael Paquier <michael@paquier.xyz>
2026-04-27 11:07     ` JoongHyuk Shin <sjh910805@gmail.com>
2026-05-18 20:32     ` Álvaro Herrera <alvherre@kurilemu.de>
2026-05-22 08:41       ` JoongHyuk Shin <sjh910805@gmail.com>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox