public inbox for pgsql-hackers@postgresql.org
help / color / mirror / Atom feed[PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
6+ messages / 4 participants
[nested] [flat]
* [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
@ 2026-04-19 05:46 JoongHyuk Shin <sjh910805@gmail.com>
2026-04-21 05:42 ` Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits Fujii Masao <masao.fujii@gmail.com>
0 siblings, 1 reply; 6+ messages in thread
From: JoongHyuk Shin @ 2026-04-19 05:46 UTC (permalink / raw)
To: pgsql-hackers@lists.postgresql.org <pgsql-hackers@lists.postgresql.org>
In ResolveRecoveryConflictWithBufferPin(), when deadlock_timeout fires,
the function sends RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK and returns.
The caller (LockBufferForCleanup) loops back, sets up another
deadlock_timeout,
and the signal gets sent again every interval.
The lock-conflict path had the same problem and was fixed in 8900b5a9d59a
by adding a second ProcWaitForSignal() after the deadlock-check signal.
The buffer-pin path was left with an XXX comment asking "should we fix
this?".
The attached patch applies the same fix: after sending the deadlock-check
signal, reset got_standby_deadlock_timeout and call ProcWaitForSignal()
so the startup process waits for UnpinBuffer() rather than looping
and re-signaling.
Patch attached.
Attachments:
[application/octet-stream] 0001-Prevent-repeated-deadlock-check-signals-in-standby-b.patch (2.4K, 3-0001-Prevent-repeated-deadlock-check-signals-in-standby-b.patch)
download | inline diff:
From 58239700edf0c669f4807da6140a595c2a1e8a5e Mon Sep 17 00:00:00 2001
From: JoongHyuk Shin <sjh910805@gmail.com>
Date: Fri, 17 Apr 2026 16:58:31 +0900
Subject: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin
waits
After sending RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK, the startup process
returned without waiting, so the caller's loop would fire another
deadlock_timeout and re-send the signal every interval. This added
unnecessary overhead in both the startup process and backends.
Fix by adding a ProcWaitForSignal() call after the deadlock-check
signal, mirroring the approach already used in the lock-conflict path
(commit 8900b5a9d59a). This ensures the signal is sent at most once
per deadlock_timeout period rather than repeatedly.
Also remove the XXX comment that noted this problem.
---
src/backend/storage/ipc/standby.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 29af7733948..9744db5715c 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -851,17 +851,19 @@ ResolveRecoveryConflictWithBufferPin(void)
/*
* Send out a request for hot-standby backends to check themselves for
* deadlocks.
- *
- * XXX The subsequent ResolveRecoveryConflictWithBufferPin() will wait
- * to be signaled by UnpinBuffer() again and send a request for
- * deadlocks check if deadlock_timeout happens. This causes the
- * request to continue to be sent every deadlock_timeout until the
- * buffer is unpinned or ltime is reached. This would increase the
- * workload in the startup process and backends. In practice it may
- * not be so harmful because the period that the buffer is kept pinned
- * is basically no so long. But we should fix this?
*/
SendRecoveryConflictWithBufferPin(RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK);
+
+ /*
+ * Wait here to be signaled by UnpinBuffer(), to prevent the
+ * subsequent ResolveRecoveryConflictWithBufferPin() call (from the
+ * caller's loop) from firing another deadlock_timeout and re-sending
+ * the deadlock-check signal. Without this, the signal would be sent
+ * every deadlock_timeout interval until the buffer is unpinned or
+ * ltime is reached.
+ */
+ got_standby_deadlock_timeout = false;
+ ProcWaitForSignal(WAIT_EVENT_BUFFER_CLEANUP);
}
/*
--
2.52.0
^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
2026-04-19 05:46 [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits JoongHyuk Shin <sjh910805@gmail.com>
@ 2026-04-21 05:42 ` Fujii Masao <masao.fujii@gmail.com>
2026-04-21 05:55 ` Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits Michael Paquier <michael@paquier.xyz>
0 siblings, 1 reply; 6+ messages in thread
From: Fujii Masao @ 2026-04-21 05:42 UTC (permalink / raw)
To: JoongHyuk Shin <sjh910805@gmail.com>; +Cc: pgsql-hackers@lists.postgresql.org <pgsql-hackers@lists.postgresql.org>
On Sun, Apr 19, 2026 at 2:47 PM JoongHyuk Shin <sjh910805@gmail.com> wrote:
>
> In ResolveRecoveryConflictWithBufferPin(), when deadlock_timeout fires,
> the function sends RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK and returns.
> The caller (LockBufferForCleanup) loops back, sets up another deadlock_timeout,
> and the signal gets sent again every interval.
>
> The lock-conflict path had the same problem and was fixed in 8900b5a9d59a
> by adding a second ProcWaitForSignal() after the deadlock-check signal.
> The buffer-pin path was left with an XXX comment asking "should we fix this?".
>
> The attached patch applies the same fix: after sending the deadlock-check
> signal, reset got_standby_deadlock_timeout and call ProcWaitForSignal()
> so the startup process waits for UnpinBuffer() rather than looping
> and re-signaling.
>
> Patch attached.
Thanks for the patch! LGTM.
Since this change improves recovery-conflict behavior rather than fixing a bug,
it doesn't seem to need backpatching and we may need to wait until v20
development opens (probably July) before committing it.
While reading the patch and ResolveRecoveryConflictWithBufferPin(), I also
noticed that got_standby_delay_timeout is not initialized to false before
enabling the timeout. This is unrelated to the patch, and I think it is
harmless in the current code, but would it be better to initialize it there,
as we already do for got_standby_deadlock_timeout?
if (ltime != 0)
{
+ got_standby_delay_timeout = false;
timeouts[cnt].id = STANDBY_TIMEOUT;
timeouts[cnt].type = TMPARAM_AT;
timeouts[cnt].fin_time = ltime;
Regards,
--
Fujii Masao
^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
2026-04-19 05:46 [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits JoongHyuk Shin <sjh910805@gmail.com>
2026-04-21 05:42 ` Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits Fujii Masao <masao.fujii@gmail.com>
@ 2026-04-21 05:55 ` Michael Paquier <michael@paquier.xyz>
2026-04-27 11:07 ` Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits JoongHyuk Shin <sjh910805@gmail.com>
2026-05-18 20:32 ` Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits Álvaro Herrera <alvherre@kurilemu.de>
0 siblings, 2 replies; 6+ messages in thread
From: Michael Paquier @ 2026-04-21 05:55 UTC (permalink / raw)
To: Fujii Masao <masao.fujii@gmail.com>; +Cc: JoongHyuk Shin <sjh910805@gmail.com>; pgsql-hackers@lists.postgresql.org <pgsql-hackers@lists.postgresql.org>
On Tue, Apr 21, 2026 at 02:42:38PM +0900, Fujii Masao wrote:
> Since this change improves recovery-conflict behavior rather than fixing a bug,
> it doesn't seem to need backpatching and we may need to wait until v20
> development opens (probably July) before committing it.
Yeah, this one is an improvement, not an actual bug, so let's wait for
v20 if worth doing (I did not check it).
--
Michael
Attachments:
[application/pgp-signature] signature.asc (833B, 2-signature.asc)
download
^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
2026-04-19 05:46 [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits JoongHyuk Shin <sjh910805@gmail.com>
2026-04-21 05:42 ` Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits Fujii Masao <masao.fujii@gmail.com>
2026-04-21 05:55 ` Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits Michael Paquier <michael@paquier.xyz>
@ 2026-04-27 11:07 ` JoongHyuk Shin <sjh910805@gmail.com>
1 sibling, 0 replies; 6+ messages in thread
From: JoongHyuk Shin @ 2026-04-27 11:07 UTC (permalink / raw)
To: Michael Paquier <michael@paquier.xyz>; +Cc: Fujii Masao <masao.fujii@gmail.com>; pgsql-hackers@lists.postgresql.org <pgsql-hackers@lists.postgresql.org>
Thanks for the review.
v2 attached, with the suggested initialization added for symmetry.
Agreed this is an improvement rather than a bug fix,
so I've updated the CF tag to Performance accordingly.
I also verified the fix locally on a primary-standby setup,
using the buffer-pin conflict scenario from src/test/recovery/t/
031_recovery_conflict.pl
(aborted INSERT + cursor on standby + VACUUM FREEZE on primary).
On master, strace showed 9 SIGUSR1 broadcasts to the conflicting backend
over a 10-second window (one per deadlock_timeout).
With the patch applied, only 1 broadcast over the same window.
Patch attached.
--
JoongHyuk Shin
On Tue, Apr 21, 2026 at 2:55 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Apr 21, 2026 at 02:42:38PM +0900, Fujii Masao wrote:
> > Since this change improves recovery-conflict behavior rather than fixing
> a bug,
> > it doesn't seem to need backpatching and we may need to wait until v20
> > development opens (probably July) before committing it.
>
> Yeah, this one is an improvement, not an actual bug, so let's wait for
> v20 if worth doing (I did not check it).
> --
> Michael
>
Attachments:
[application/octet-stream] v2-0001-Prevent-repeated-deadlock-check-signals-in-standb.patch (2.8K, 3-v2-0001-Prevent-repeated-deadlock-check-signals-in-standb.patch)
download | inline diff:
From e70a2fc74a63d4c1e3d1277e27a37b1e26710fff Mon Sep 17 00:00:00 2001
From: JoongHyuk Shin <sjh910805@gmail.com>
Date: Fri, 17 Apr 2026 16:58:31 +0900
Subject: [PATCH v2] Prevent repeated deadlock-check signals in standby buffer
pin waits
After sending RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK, the startup process
returned without waiting, so the caller's loop would fire another
deadlock_timeout and re-send the signal every interval. This added
unnecessary overhead in both the startup process and backends.
Fix by adding a ProcWaitForSignal() call after the deadlock-check
signal, mirroring the approach already used in the lock-conflict path
(commit 8900b5a9d59a). This ensures the signal is sent at most once
per deadlock_timeout period rather than repeatedly.
Additionally, reset got_standby_delay_timeout to false before enabling
STANDBY_TIMEOUT, for symmetry with the existing got_standby_deadlock_timeout
reset.
Remove the XXX comment that noted this problem.
---
src/backend/storage/ipc/standby.c | 21 ++++++++++++---------
1 file changed, 12 insertions(+), 9 deletions(-)
diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c
index 29af7733948..0dba1fb4289 100644
--- a/src/backend/storage/ipc/standby.c
+++ b/src/backend/storage/ipc/standby.c
@@ -818,6 +818,7 @@ ResolveRecoveryConflictWithBufferPin(void)
if (ltime != 0)
{
+ got_standby_delay_timeout = false;
timeouts[cnt].id = STANDBY_TIMEOUT;
timeouts[cnt].type = TMPARAM_AT;
timeouts[cnt].fin_time = ltime;
@@ -851,17 +852,19 @@ ResolveRecoveryConflictWithBufferPin(void)
/*
* Send out a request for hot-standby backends to check themselves for
* deadlocks.
- *
- * XXX The subsequent ResolveRecoveryConflictWithBufferPin() will wait
- * to be signaled by UnpinBuffer() again and send a request for
- * deadlocks check if deadlock_timeout happens. This causes the
- * request to continue to be sent every deadlock_timeout until the
- * buffer is unpinned or ltime is reached. This would increase the
- * workload in the startup process and backends. In practice it may
- * not be so harmful because the period that the buffer is kept pinned
- * is basically no so long. But we should fix this?
*/
SendRecoveryConflictWithBufferPin(RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK);
+
+ /*
+ * Wait here to be signaled by UnpinBuffer(), to prevent the
+ * subsequent ResolveRecoveryConflictWithBufferPin() call (from the
+ * caller's loop) from firing another deadlock_timeout and re-sending
+ * the deadlock-check signal. Without this, the signal would be sent
+ * every deadlock_timeout interval until the buffer is unpinned or
+ * ltime is reached.
+ */
+ got_standby_deadlock_timeout = false;
+ ProcWaitForSignal(WAIT_EVENT_BUFFER_CLEANUP);
}
/*
--
2.52.0
^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
2026-04-19 05:46 [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits JoongHyuk Shin <sjh910805@gmail.com>
2026-04-21 05:42 ` Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits Fujii Masao <masao.fujii@gmail.com>
2026-04-21 05:55 ` Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits Michael Paquier <michael@paquier.xyz>
@ 2026-05-18 20:32 ` Álvaro Herrera <alvherre@kurilemu.de>
2026-05-22 08:41 ` Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits JoongHyuk Shin <sjh910805@gmail.com>
1 sibling, 1 reply; 6+ messages in thread
From: Álvaro Herrera @ 2026-05-18 20:32 UTC (permalink / raw)
To: Michael Paquier <michael@paquier.xyz>; +Cc: Fujii Masao <masao.fujii@gmail.com>; JoongHyuk Shin <sjh910805@gmail.com>; pgsql-hackers@lists.postgresql.org <pgsql-hackers@lists.postgresql.org>; Vitaly Davydov <v.davydov@postgrespro.ru>
Hello,
On 2026-Apr-21, Michael Paquier wrote:
> On Tue, Apr 21, 2026 at 02:42:38PM +0900, Fujii Masao wrote:
> > Since this change improves recovery-conflict behavior rather than fixing a bug,
> > it doesn't seem to need backpatching and we may need to wait until v20
> > development opens (probably July) before committing it.
>
> Yeah, this one is an improvement, not an actual bug, so let's wait for
> v20 if worth doing (I did not check it).
Hmm, is this related to
https://postgr.es/m/44c24dcf-5710-410f-b1b6-d10b315f3d51@postgrespro.ru ?
In there, Vitaly claims to be reporting a bug that goes back to pg15,
which contradicts this assessment.
Regards,
--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"¿Qué importan los años? Lo que realmente importa es comprobar que
a fin de cuentas la mejor edad de la vida es estar vivo" (Mafalda)
^ permalink raw reply [nested|flat] 6+ messages in thread
* Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits
2026-04-19 05:46 [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits JoongHyuk Shin <sjh910805@gmail.com>
2026-04-21 05:42 ` Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits Fujii Masao <masao.fujii@gmail.com>
2026-04-21 05:55 ` Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits Michael Paquier <michael@paquier.xyz>
2026-05-18 20:32 ` Re: [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits Álvaro Herrera <alvherre@kurilemu.de>
@ 2026-05-22 08:41 ` JoongHyuk Shin <sjh910805@gmail.com>
0 siblings, 0 replies; 6+ messages in thread
From: JoongHyuk Shin @ 2026-05-22 08:41 UTC (permalink / raw)
To: Álvaro Herrera <alvherre@kurilemu.de>; +Cc: Michael Paquier <michael@paquier.xyz>; Fujii Masao <masao.fujii@gmail.com>; pgsql-hackers@lists.postgresql.org <pgsql-hackers@lists.postgresql.org>; Vitaly Davydov <v.davydov@postgrespro.ru>
Hello.
Same function, different races, I think.
Vitaly reports a missed wake-up where deadlock_timeout never fires
(spurious SIGALRM from log_startup_progress_interval plus the lazy
setitimer in 09cf1d52).
This patch addresses the opposite,
deadlock_timeout does fire, but LockBufferForCleanup loops back and re-arms
it,
so the signal repeats once per second.
The two fixes do not overlap
(the added ProcWaitForSignal sits inside the deadlock branch that Vitaly's
scenario never reaches).
--
JH Shin
On Wed, May 20, 2026 at 7:15 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> Hello,
>
> On 2026-Apr-21, Michael Paquier wrote:
>
> > On Tue, Apr 21, 2026 at 02:42:38PM +0900, Fujii Masao wrote:
> > > Since this change improves recovery-conflict behavior rather than
> fixing a bug,
> > > it doesn't seem to need backpatching and we may need to wait until v20
> > > development opens (probably July) before committing it.
> >
> > Yeah, this one is an improvement, not an actual bug, so let's wait for
> > v20 if worth doing (I did not check it).
>
> Hmm, is this related to
> https://postgr.es/m/44c24dcf-5710-410f-b1b6-d10b315f3d51@postgrespro.ru ?
> In there, Vitaly claims to be reporting a bug that goes back to pg15,
> which contradicts this assessment.
>
> Regards,
>
> --
> Álvaro Herrera Breisgau, Deutschland —
> https://www.EnterpriseDB.com/
> "¿Qué importan los años? Lo que realmente importa es comprobar que
> a fin de cuentas la mejor edad de la vida es estar vivo" (Mafalda)
>
^ permalink raw reply [nested|flat] 6+ messages in thread
end of thread, other threads:[~2026-05-22 08:41 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-04-19 05:46 [PATCH] Prevent repeated deadlock-check signals in standby buffer pin waits JoongHyuk Shin <sjh910805@gmail.com>
2026-04-21 05:42 ` Fujii Masao <masao.fujii@gmail.com>
2026-04-21 05:55 ` Michael Paquier <michael@paquier.xyz>
2026-04-27 11:07 ` JoongHyuk Shin <sjh910805@gmail.com>
2026-05-18 20:32 ` Álvaro Herrera <alvherre@kurilemu.de>
2026-05-22 08:41 ` JoongHyuk Shin <sjh910805@gmail.com>
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox