From: Radim Marek <radim@boringsql.com>
Date: Thu, 21 May 2026 11:06:18 +0200
Subject: Re: BUG #19490: Streaming standby on 16.14 stops applying WAL on MultiXactOffsetSLRU when primary is 16.8
To: Andrey Borodin <x4mmm@yandex-team.ru>
Cc: Marko Tiikkaja <marko@joh.to>, PostgreSQL mailing lists <pgsql-bugs@lists.postgresql.org>

Although the culprit is known, I've got more data as requested.

#0  0x00007f20e9bdb687 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f20e9bdbc8c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f20e9be6920 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x000055a71796e3ca in PGSemaphoreLock (sema=0x7f20de6d0e38) at ./build/src/backend/port/pg_sema.c:327
#4  0x000055a7179f57ed in LWLockAcquire (lock=0x7f20de6d1800, mode=mode@entry=LW_EXCLUSIVE) at ./build/../src/backend/storage/lmgr/lwlock.c:1314
#5  0x000055a71772dfb2 in SimpleLruWriteAll (ctl=ctl@entry=0x55a717e83040 <MultiXactOffsetCtlData>, allow_redirtied=allow_redirtied@entry=false) at ./build/../src/backend/access/transam/slru.c:1174
#6  0x000055a717727b6f in RecordNewMultiXact (multi=79871, offset=218449, nmembers=2, members=members@entry=0x7f20de6831ec) at ./build/../src/backend/access/transam/multixact.c:944
#7  0x000055a71772a983 in multixact_redo (record=0x55a73a8d0fc8) at ./build/../src/backend/access/transam/multixact.c:3464
#8  0x000055a71774d9b8 in ApplyWalRecord (xlogreader=<optimized out>, record=0x7f20de6831b0, replayTLI=<synthetic pointer>) at ./build/../src/backend/access/transam/xlogrecovery.c:1951
#9  PerformWalRecovery () at ./build/../src/backend/access/transam/xlogrecovery.c:1782
#10 0x000055a717740def in StartupXLOG () at ./build/../src/backend/access/transam/xlog.c:5452
#11 0x000055a71797c7e4 in StartupProcessMain () at ./build/../src/backend/postmaster/startup.c:282
#12 0x000055a717972b20 in AuxiliaryProcessMain (auxtype=auxtype@entry=StartupProcess) at ./build/../src/backend/postmaster/auxprocess.c:141
#13 0x000055a717977db3 in StartChildProcess (type=StartupProcess) at ./build/../src/backend/postmaster/postmaster.c:5381
#14 0x000055a71797bfb8 in PostmasterMain (argc=argc@entry=1, argv=argv@entry=0x55a73a8d0590) at ./build/../src/backend/postmaster/postmaster.c:1463
#15 0x000055a7176a05bc in main (argc=1, argv=0x55a73a8d0590) at ./build/../src/backend/main/main.c:200

and WAL dump

rmgr: Btree       len (rec/tot):     64/    64, tx:     336098, lsn: 1/32DE75F0, prev 1/32DE7580, desc: INSERT_LEAF off: 244, blkref #0: rel 1663/16384/16432 blk 536
rmgr: MultiXact   len (rec/tot):     54/    54, tx:     336098, lsn: 1/32DE7630, prev 1/32DE75F0, desc: CREATE_ID 79871 offset 218449 nmembers 2: 336089 (keysh) 336098 (keysh)
rmgr: Heap        len (rec/tot):     54/    54, tx:     336098, lsn: 1/32DE7668, prev 1/32DE7630, desc: LOCK xmax: 79871, off: 1, infobits: [IS_MULTI, LOCK_ONLY, KEYSHR_LOCK], flags: 0x00, blkref #0: rel 1663/16384/16418 blk 0
rmgr: Heap        len (rec/tot):     72/    72, tx:     336096, lsn: 1/32DE76A0, prev 1/32DE7668, desc: HOT_UPDATE old_xmax: 336096, old_off: 52, old_infobits: [], flags: 0x20, new_xmax: 0, new_off: 149, blkref #0: rel 1663/16384/16401 blk 22
rmgr: Heap        len (rec/tot):     71/    71, tx:     336096, lsn: 1/32DE76E8, prev 1/32DE76A0, desc: HOT_UPDATE old_xmax: 336096, old_off: 149, old_infobits: [], flags: 0x60, new_xmax: 0, new_off: 209, blkref #0: rel 1663/16384/16399 blk 6
rmgr: Heap        len (rec/tot):     79/    79, tx:     336096, lsn: 1/32DE7730, prev 1/32DE76E8, desc: INSERT off: 150, flags: 0x00, blkref #0: rel 1663/16384/16417 blk 741
rmgr: Heap        len (rec/tot):     72/    72, tx:     336097, lsn: 1/32DE7780, prev 1/32DE7730, desc: HOT_UPDATE old_xmax: 336097, old_off: 243, old_infobits: [], flags: 0x20, new_xmax: 0, new_off: 228, blkref #0: rel 1663/16384/16401 blk 26
rmgr: Transaction len (rec/tot):     34/    34, tx:     336096, lsn: 1/32DE77C8, prev 1/32DE7780, desc: COMMIT 2026-05-21 08:43:07.003572 UTC

Radim

On Thu, 21 May 2026 at 10:34, Radim Marek <radim@boringsql.com> wrote:

> Thank you for the follow-up. In the meantime I can confirm that commit
> 77dff5d937b1 might be the source of the originally reported issue.
>
> Unfortunately, pinning the version down to 16.12 only avoids the
> MultiXactOffsetSLRU self-deadlock; the standby then fails recovery
> after 12+ hours.
>
> FATAL: could not access status of transaction 24958976
> DETAIL: Could not read from file "pg_multixact/offsets/017C" at offset 221184: read too few bytes.
> CONTEXT: WAL redo at 14770/873268E8 for MultiXact/CREATE_ID: 24958975 offset 61500431 nmembers 2: 3058927188 (fornokeyupd) 3058927189 (keysh)
>
> We are going to try pinning 16.13 before we can safely upgrade the
> primary or are confident we have working PITR recovery available should
> we need it.
>
> Radim
>
> PS: Once I have some time I will try to set up a Docker-based harness to
> replicate the original problem for later testing of the fix.
>
> On Thu, 21 May 2026 at 09:25, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
>> > On 21 May 2026, at 00:12, Marko Tiikkaja <marko@joh.to> wrote:
>> >
>> > #8  0x0000654c8ae2acba in SimpleLruWriteAll (ctl=0x654c8b63e400
>>
>> Thanks!
>>
>> This clearly points to SimpleLruWriteAll() added in 77dff5d937b1.
>> If by chance you have a backtrace of another deadlocking process,
>> please post it.
>>
>> But it's not strictly necessary for analysis; I think we can figure out
>> what happened from the backtrace you already posted.
>>
>> Best regards, Andrey Borodin.
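For anyone cross-checking the numbers in the FATAL/DETAIL lines quoted above, here is a minimal sketch of how a MultiXactId could be mapped to a segment file and byte offset under pg_multixact/offsets. It assumes default build constants that are not stated anywhere in this thread (8 kB pages, 4-byte offset entries, 32 pages per SLRU segment), and the helper name locate_offset_entry is hypothetical, not a PostgreSQL function.

```python
# Assumptions (not taken from the thread): default PostgreSQL 16 build
# with 8 kB pages, 4-byte MultiXactOffset entries, 32 pages per segment.
BLCKSZ = 8192
OFFSET_ENTRY_SIZE = 4
PAGES_PER_SEGMENT = 32
ENTRIES_PER_PAGE = BLCKSZ // OFFSET_ENTRY_SIZE  # 2048 entries per page


def locate_offset_entry(multi: int) -> dict:
    """Hypothetical helper: locate a MultiXactId in pg_multixact/offsets."""
    page = multi // ENTRIES_PER_PAGE            # global SLRU page number
    entry_in_page = multi % ENTRIES_PER_PAGE    # slot within that page
    segment = page // PAGES_PER_SEGMENT         # which segment file
    page_in_segment = page % PAGES_PER_SEGMENT  # page index inside the file
    return {
        "segment_file": f"{segment:04X}",       # four hex digits, e.g. 017C
        "page_byte_offset": page_in_segment * BLCKSZ,
        "entry_in_page": entry_in_page,
    }


if __name__ == "__main__":
    # IDs taken from the error message and the backtrace in this thread,
    # plus the ID immediately following the one in the backtrace.
    for multi in (24958975, 24958976, 79871, 79872):
        print(multi, locate_offset_entry(multi))
```

Under these assumptions, 24958976 maps to segment file 017C at byte offset 221184, which matches the DETAIL line. The record being replayed, 24958975, would be the last entry (slot 2047) of the preceding page, and multi=79871 from the backtrace would likewise be the last entry of its page. Whether that page-boundary position matters for the analysis is not something this sketch can establish.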