public inbox for pgsql-bugs@postgresql.org  
From: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
To: Radim Marek <radim@boringsql.com>
To: Andrey Borodin <x4mmm@yandex-team.ru>
To: Heikki Linnakangas <hlinnaka@iki.fi>
Cc: Marko Tiikkaja <marko@joh.to>
Cc: PostgreSQL mailing lists <pgsql-bugs@lists.postgresql.org>
Subject: Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
Date: Fri, 22 May 2026 22:21:32 +0530
Message-ID: <CAJTYsWU6tdEvVh4YKLxz7+amZ7+Wb7_s-FBjsMMeLNj1fKeSNg@mail.gmail.com>
In-Reply-To: <CAJgoLkKCu0wCwPQZSo5no=XATU-4LMK4QfKBwV928o2uKcxe=g@mail.gmail.com>
References: <19490-9c59c6a583513b99@postgresql.org>
	<46FE61C9-F273-45FD-BED7-0F8CDA6EB992@yandex-team.ru>
	<CAL9smLBMxKBCmsA9UGcmf93bT2_MsZ+POH-oHREuwKdmMU7jfQ@mail.gmail.com>
	<46DB3CAB-EA1C-41A5-9D6D-5F913A2AAF66@yandex-team.ru>
	<CAJgoLkJfFgL-V+pYB7=R81AbURTE6sMhzVHDQDhVGnfXRSJ9Wg@mail.gmail.com>
	<CAJgoLkKCu0wCwPQZSo5no=XATU-4LMK4QfKBwV928o2uKcxe=g@mail.gmail.com>

Hi,

On Thu, 21 May 2026 at 14:36, Radim Marek <radim@boringsql.com> wrote:

> Although the culprit is known, I've got more data as requested.
>
> #0  0x00007f20e9bdb687 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #1  0x00007f20e9bdbc8c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #2  0x00007f20e9be6920 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
> #3  0x000055a71796e3ca in PGSemaphoreLock (sema=0x7f20de6d0e38) at
> ./build/src/backend/port/pg_sema.c:327
> #4  0x000055a7179f57ed in LWLockAcquire (lock=0x7f20de6d1800,
> mode=mode@entry=LW_EXCLUSIVE) at
> ./build/../src/backend/storage/lmgr/lwlock.c:1314
> #5  0x000055a71772dfb2 in SimpleLruWriteAll (ctl=ctl@entry=0x55a717e83040
> <MultiXactOffsetCtlData>, allow_redirtied=allow_redirtied@entry=false) at
> ./build/../src/backend/access/transam/slru.c:1174
> #6  0x000055a717727b6f in RecordNewMultiXact (multi=79871, offset=218449,
> nmembers=2, members=members@entry=0x7f20de6831ec) at
> ./build/../src/backend/access/transam/multixact.c:944
> #7  0x000055a71772a983 in multixact_redo (record=0x55a73a8d0fc8) at
> ./build/../src/backend/access/transam/multixact.c:3464
> #8  0x000055a71774d9b8 in ApplyWalRecord (xlogreader=<optimized out>,
> record=0x7f20de6831b0, replayTLI=<synthetic pointer>) at
> ./build/../src/backend/access/transam/xlogrecovery.c:1951
> #9  PerformWalRecovery () at
> ./build/../src/backend/access/transam/xlogrecovery.c:1782
> #10 0x000055a717740def in StartupXLOG () at
> ./build/../src/backend/access/transam/xlog.c:5452
> #11 0x000055a71797c7e4 in StartupProcessMain () at
> ./build/../src/backend/postmaster/startup.c:282
> #12 0x000055a717972b20 in AuxiliaryProcessMain (auxtype=auxtype@entry=StartupProcess)
> at ./build/../src/backend/postmaster/auxprocess.c:141
> #13 0x000055a717977db3 in StartChildProcess (type=StartupProcess) at
> ./build/../src/backend/postmaster/postmaster.c:5381
> #14 0x000055a71797bfb8 in PostmasterMain (argc=argc@entry=1,
> argv=argv@entry=0x55a73a8d0590) at
> ./build/../src/backend/postmaster/postmaster.c:1463
> #15 0x000055a7176a05bc in main (argc=1, argv=0x55a73a8d0590) at
> ./build/../src/backend/main/main.c:200
>
> and WAL dump
>
> rmgr: Btree       len (rec/tot):     64/    64, tx:     336098, lsn:
> 1/32DE75F0, prev 1/32DE7580, desc: INSERT_LEAF off: 244, blkref #0: rel
> 1663/16384/16432 blk 536
> rmgr: MultiXact   len (rec/tot):     54/    54, tx:     336098, lsn:
> 1/32DE7630, prev 1/32DE75F0, desc: CREATE_ID 79871 offset 218449 nmembers
> 2: 336089 (keysh)
> 336098 (keysh)
> rmgr: Heap        len (rec/tot):     54/    54, tx:     336098, lsn:
> 1/32DE7668, prev 1/32DE7630, desc: LOCK xmax: 79871, off: 1, infobits:
> [IS_MULTI, LOCK_ONLY,
> KEYSHR_LOCK], flags: 0x00, blkref #0: rel 1663/16384/16418 blk 0
> rmgr: Heap        len (rec/tot):     72/    72, tx:     336096, lsn:
> 1/32DE76A0, prev 1/32DE7668, desc: HOT_UPDATE old_xmax: 336096, old_off:
> 52, old_infobits: [],
> flags: 0x20, new_xmax: 0, new_off: 149, blkref #0: rel 1663/16384/16401
> blk 22
> rmgr: Heap        len (rec/tot):     71/    71, tx:     336096, lsn:
> 1/32DE76E8, prev 1/32DE76A0, desc: HOT_UPDATE old_xmax: 336096, old_off:
> 149, old_infobits: [],
> flags: 0x60, new_xmax: 0, new_off: 209, blkref #0: rel 1663/16384/16399
> blk 6
> rmgr: Heap        len (rec/tot):     79/    79, tx:     336096, lsn:
> 1/32DE7730, prev 1/32DE76E8, desc: INSERT off: 150, flags: 0x00, blkref #0:
> rel 1663/16384/16417
> blk 741
> rmgr: Heap        len (rec/tot):     72/    72, tx:     336097, lsn:
> 1/32DE7780, prev 1/32DE7730, desc: HOT_UPDATE old_xmax: 336097, old_off:
> 243, old_infobits: [],
> flags: 0x20, new_xmax: 0, new_off: 228, blkref #0: rel 1663/16384/16401
> blk 26
> rmgr: Transaction len (rec/tot):     34/    34, tx:     336096, lsn:
> 1/32DE77C8, prev 1/32DE7780, desc: COMMIT 2026-05-21 08:43:07.003572 UTC
>
> Radim
>

Thanks for the additional backtrace and WAL dump.  That makes the failure
mode much clearer.

The latest trace shows the startup process here:

  SimpleLruWriteAll(MultiXactOffsetCtl, false)
  RecordNewMultiXact(multi=79871, offset=218449, nmembers=2, ...)
  multixact_redo()

The WAL dump also shows the matching record:

  rmgr: MultiXact ... desc: CREATE_ID 79871 offset 218449 nmembers 2

79871 is the last multixact on its offsets page, so replaying that record
enters the next_pageno != pageno compatibility path added by 77dff5d937b.
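
To spell out the page arithmetic, here is a small sketch.  It assumes the
default 8 kB block size and a 4-byte MultiXactOffset, which gives 2048
offsets per page; the toy_* names are made up for illustration and are not
the macros from multixact.c.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: default block size; a non-default BLCKSZ changes the result. */
#define TOY_BLCKSZ 8192u
/* Assumption: each offsets-page entry is a 4-byte offset. */
#define TOY_OFFSETS_PER_PAGE ((uint32_t) (TOY_BLCKSZ / sizeof(uint32_t)))

static_assert(TOY_OFFSETS_PER_PAGE == 2048, "expected 2048 offsets per page");

/* Offsets page that holds the entry for a given multixact ID. */
static uint32_t
toy_offset_page(uint32_t multi)
{
	return multi / TOY_OFFSETS_PER_PAGE;
}

/* Slot of that multixact ID within its offsets page. */
static uint32_t
toy_offset_entry(uint32_t multi)
{
	return multi % TOY_OFFSETS_PER_PAGE;
}

/* True when multi + 1 lives on a different page than multi. */
static int
toy_crosses_page(uint32_t multi)
{
	return toy_offset_page(multi + 1) != toy_offset_page(multi);
}
```

Under those assumptions 79871 sits in slot 2047 of page 38 and 79872 is the
first entry of page 39, so toy_crosses_page(79871) is true.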

On REL_14 through REL_16, RecordNewMultiXact() already holds
MultiXactOffsetSLRULock while executing that code.  SimpleLruWriteAll() then
tries to acquire MultiXactOffsetCtl's SLRU control lock, which is the same
MultiXactOffsetSLRULock on those branches.  That explains the standby startup
process waiting forever on LWLock:MultiXactOffsetSLRU, with no corresponding
SLRU I/O activity.
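
The call pattern can be modelled with a toy, single-threaded sketch.  This is
not PostgreSQL code: ToyLock and the toy_* functions are hypothetical
stand-ins, and where a real non-reentrant lock would block forever, the toy
acquire simply returns false so the problem can be observed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a non-reentrant exclusive lock; not an LWLock. */
typedef struct ToyLock
{
	bool		held;
} ToyLock;

/*
 * Returns true on success.  Returns false where a real non-reentrant lock
 * would wait forever; in this single-threaded model the only possible
 * holder is the caller itself.
 */
static bool
toy_acquire(ToyLock *lock)
{
	if (lock->held)
		return false;
	lock->held = true;
	return true;
}

static void
toy_release(ToyLock *lock)
{
	assert(lock->held);
	lock->held = false;
}

/* Models a flush routine that takes the control lock itself. */
static bool
toy_write_all(ToyLock *ctl_lock)
{
	if (!toy_acquire(ctl_lock))
		return false;			/* would self-deadlock */
	/* ... dirty buffers would be written here ... */
	toy_release(ctl_lock);
	return true;
}

/* Models a caller that already holds the lock when it asks for a flush. */
static bool
toy_record_new_multixact(ToyLock *ctl_lock, bool flush_first)
{
	bool		ok = true;

	if (!toy_acquire(ctl_lock))
		return false;
	if (flush_first)
		ok = toy_write_all(ctl_lock);	/* same lock, already held */
	toy_release(ctl_lock);
	return ok;
}
```

With flush_first set, toy_record_new_multixact() reports failure because the
nested acquire targets the lock it already holds; without the flush it
succeeds.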

I think the right fix is to remove that SimpleLruWriteAll() call while
keeping the missing-page initialization logic.  The flush is only meant to
make SimpleLruDoesPhysicalPageExist() see pages that exist in SLRU buffers
but have not reached disk.  In this fallback path, I don't see a way for
the tested next_pageno to be in that state: if RecordNewMultiXact() itself
initializes the page, it writes it synchronously with SimpleLruWritePage()
before setting last_initialized_offsets_page.

I attached a small patch for REL_16_STABLE.  The same self-deadlock pattern
is also present on PG 14 and 15.  PG 17 and 18 have the same compatibility
call, but SLRU locking is banked there, and RecordNewMultiXact() does not
appear to hold the relevant bank lock before calling SimpleLruWriteAll().  So
I would not describe those branches as having this exact self-deadlock, but
that needs more analysis.

I have added both Andrey and Heikki to the To: list, since I'm not sure
whether this is more severe than the multixact offset issue we had with
16.12 or on par with it.

Regards,
Ayush


Attachments:

  [application/octet-stream] v1-0001-Avoid-self-deadlock-on-MultiXactOffsetSLRULock-dur.patch (2.5K, 3-v1-0001-Avoid-self-deadlock-on-MultiXactOffsetSLRULock-dur.patch)
From b33abeede0847edac3603b87a478a832be1784f8 Mon Sep 17 00:00:00 2001
From: Ayush Tiwari <ayushtiwari.slg01@gmail.com>
Date: Thu, 21 May 2026 07:39:28 +0000
Subject: [PATCH REL_16_STABLE v1] Avoid self-deadlock on
 MultiXactOffsetSLRULock during WAL replay

Commit 77dff5d937b added a compatibility check in RecordNewMultiXact()
that can call SimpleLruWriteAll(MultiXactOffsetCtl, false) while already
holding MultiXactOffsetSLRULock.  In REL_16, SimpleLruWriteAll() tries
to acquire the same SLRU control lock, so WAL replay can self-deadlock
with the startup process waiting on LWLock:MultiXactOffsetSLRU.

The flush is not needed for the page tested in this fallback path.  If
RecordNewMultiXact() initializes that offsets page, it writes it
synchronously with SimpleLruWritePage() before updating
last_initialized_offsets_page.  Drop the unsafe flush and keep the
existing missing-page initialization logic.

Reported-by: Radim Marek <radim@boringsql.com>
Reported-by: Marko Tiikkaja <marko@joh.to>
Diagnosed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/19490-9c59c6a583513b99@postgresql.org
---
 src/backend/access/transam/multixact.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/src/backend/access/transam/multixact.c b/src/backend/access/transam/multixact.c
index f825579e888..5b6b48eb79c 100644
--- a/src/backend/access/transam/multixact.c
+++ b/src/backend/access/transam/multixact.c
@@ -934,16 +934,17 @@ RecordNewMultiXact(MultiXactId multi, MultiXactOffset offset,
 		 * seen any XLOG_MULTIXACT_ZERO_OFF_PAGE records yet, which should
 		 * happen at most once after starting WAL recovery.
 		 *
-		 * As an extra safety measure, if we do resort to
-		 * SimpleLruDoesPhysicalPageExist(), flush the SLRU buffers first so
-		 * that it will return an accurate result.
+		 *
+		 * We cannot call SimpleLruWriteAll() to flush the SLRU buffers
+		 * here, because that would self-deadlock on MultiXactOffsetSLRULock,
+		 * which we already hold.  Fortunately we do not need to: every
+		 * page that this code path initializes is synchronously flushed via
+		 * SimpleLruWritePage() below before this lock is released, so there
+		 * are no relevant dirty pages.
 		 *----------
 		 */
 		if (last_initialized_offsets_page == -1)
-		{
-			SimpleLruWriteAll(MultiXactOffsetCtl, false);
 			init_needed = !SimpleLruDoesPhysicalPageExist(MultiXactOffsetCtl, next_pageno);
-		}
 		else
 			init_needed = (last_initialized_offsets_page == pageno);
 
-- 
2.43.0


