* Re: [PATCH] Release replication slot on error in SQL-callable slot functions
@ 2026-05-21 06:49 vignesh C <vignesh21@gmail.com>
2026-05-21 14:38 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
0 siblings, 1 reply; 3+ messages in thread
From: vignesh C @ 2026-05-21 06:49 UTC (permalink / raw)
To: Fujii Masao <masao.fujii@gmail.com>; +Cc: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>; PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
On Mon, 11 May 2026 at 08:31, Fujii Masao <masao.fujii@gmail.com> wrote:
>
> On Sun, May 10, 2026 at 5:45 AM SATYANARAYANA NARLAPURAM
> <satyanarlapuram@gmail.com> wrote:
> >
> > Hi Hackers,
> >
> > SQL-callable replication slot functions acquire a slot (setting
> > the process-global MyReplicationSlot) but can then ERROR before reaching
> > ReplicationSlotRelease(). If such an error is caught by a PL/pgSQL
> > EXCEPTION block (which uses a subtransaction), MyReplicationSlot remains
> > set because there is no subtransaction-level cleanup hook for replication
> > slots.
> >
> > Any subsequent slot operation in the same session then hits
> > Assert(MyReplicationSlot == NULL) and crashes the backend on assert
> > enabled builds. In release builds the stale MyReplicationSlot is silently overwritten,
> > permanently orphaning the old slot as "active." The orphaned slot blocks any other
> > session from acquiring it, vacuum and WAL deletion.
> >
> > Repro:
> >
> > SELECT pg_create_logical_replication_slot('adv_test', 'test_decoding');
> >
> > DO $$ BEGIN
> > PERFORM pg_replication_slot_advance('adv_test', '0/1'::pg_lsn);
> > EXCEPTION WHEN others THEN
> > RAISE NOTICE 'caught: %', SQLERRM;
> > END $$;
> >
> > SELECT count(*) FROM pg_logical_slot_get_changes('adv_test', NULL, NULL);
> >
> > 2026-05-09 19:45:06.619 UTC [1096805] STATEMENT: SELECT pg_create_logical_replication_slot('adv_test', 'test_decoding');
> > TRAP: failed Assert("MyReplicationSlot == NULL"), File: "slot.c", Line: 638, PID: 1096805
> >
> >
> > Attached a patch to address this by wrapping error-prone paths in PG_TRY/PG_CATCH blocks
> > and call ReplicationSlotRelease().
>
> Thanks for the report and the patch!
>
> I think wrapping the slot-processing code with PG_TRY()/PG_CATCH() seems
> a good direction for addressing the issue you reported.
>
>
> + PG_CATCH();
> + {
> + ReplicationSlotRelease();
>
> When create_logical_replication_slot() is called with temporary = true,
> the created logical replication slot has RS_TEMPORARY persistency. Such a slot
> is not dropped by ReplicationSlotRelease(), whereas an RS_EPHEMERAL slot is
> dropped via ReplicationSlotDropAcquired().
>
> So even with the v1 patch, a temporary logical replication slot can remain
> unexpectedly if pg_create_logical_replication_slot() throws an error.
> In this case, should create_logical_replication_slot() explicitly drop the slot
> with ReplicationSlotDropAcquired(), or temporarily change the slot persistency
> to RS_EPHEMERAL before calling ReplicationSlotRelease()?
>
>
> Does a newly created logical replication slot created by
> pg_copy_logical_replication_slot() have the same issue?

Additionally, pg_logical_slot_get_changes() has the same issue. It can
be reproduced as follows:

SELECT pg_create_logical_replication_slot('test_slot_1', 'test_decoding');

DO $$
BEGIN
    -- This raises an ERROR after the slot has been acquired, because
    -- test_decoding does not recognize the option.
    PERFORM 1 FROM pg_logical_slot_get_changes('test_slot_1', NULL,
        NULL, 'nonexistent-option', 'val');
EXCEPTION WHEN others THEN
    RAISE NOTICE 'caught: %', SQLERRM;
END $$;

SELECT count(*) FROM pg_logical_slot_get_changes('test_slot_1', NULL, NULL);

TRAP: failed Assert("MyReplicationSlot == NULL"), File: "slot.c", Line: 638, PID: 80308
postgres: vignesh postgres [local] SELECT(ExceptionalCondition+0xba) [0x642e7b2ebae1]
postgres: vignesh postgres [local] SELECT(ReplicationSlotAcquire+0x6e) [0x642e7b00d732]

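
To make the failure mode concrete, here is a small self-contained C model. It is not PostgreSQL source: setjmp/longjmp stands in for PG_TRY()/PG_CATCH(), and every name in it (Slot, my_slot, catch_point, slot_acquire, advance_unpatched, advance_patched, run_in_exception_block) is hypothetical. It only sketches why a caught error leaves the process-global pointer set, and how a release-then-rethrow handler would avoid that, assuming the fix takes roughly that shape.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of this is PostgreSQL code. */
typedef struct Slot
{
    const char *name;
    int         active;
} Slot;

static Slot    *my_slot = NULL;      /* models the global MyReplicationSlot */
static jmp_buf *catch_point = NULL;  /* models the innermost error handler */

/* Like a release build, this silently overwrites a stale pointer. */
void slot_acquire(Slot *s) { s->active = 1; my_slot = s; }

void slot_release(void)
{
    assert(my_slot != NULL);
    my_slot->active = 0;
    my_slot = NULL;
}

/* Models ereport(ERROR): jump to the innermost handler. */
void raise_error(void)
{
    assert(catch_point != NULL);
    longjmp(*catch_point, 1);
}

/* Unpatched shape: an error skips the release call. */
void advance_unpatched(Slot *s, int fail)
{
    slot_acquire(s);
    if (fail)
        raise_error();
    slot_release();
}

/* Patched shape: a local handler releases the slot, then re-throws. */
void advance_patched(Slot *s, int fail)
{
    jmp_buf  local;
    jmp_buf *outer = catch_point;

    slot_acquire(s);
    catch_point = &local;
    if (setjmp(local) == 0)
    {
        if (fail)
            raise_error();
        catch_point = outer;
    }
    else
    {
        catch_point = outer;
        slot_release();
        raise_error();          /* re-throw to the outer handler */
    }
    slot_release();
}

/*
 * Models a PL/pgSQL EXCEPTION block: the error is caught and the session
 * carries on. Returns 1 if a slot is still held afterwards.
 */
int run_in_exception_block(void (*fn) (Slot *, int), Slot *s)
{
    jmp_buf  handler;
    jmp_buf *outer = catch_point;

    catch_point = &handler;
    if (setjmp(handler) == 0)
        fn(s, 1);
    catch_point = outer;
    return my_slot != NULL;
}
```

In this model, run_in_exception_block(advance_unpatched, &a) returns 1 and leaves a.active set, which corresponds to the orphaned "active" slot described in the report. Running advance_patched on a second slot afterwards returns 0, but the first slot stays marked active because the stale pointer was simply overwritten.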
Regards,
Vignesh
* Re: [PATCH] Release replication slot on error in SQL-callable slot functions
2026-05-21 06:49 Re: [PATCH] Release replication slot on error in SQL-callable slot functions vignesh C <vignesh21@gmail.com>
@ 2026-05-21 14:38 ` SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
2026-05-22 09:15 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions shveta malik <shveta.malik@gmail.com>
0 siblings, 1 reply; 3+ messages in thread
From: SATYANARAYANA NARLAPURAM @ 2026-05-21 14:38 UTC (permalink / raw)
To: vignesh C <vignesh21@gmail.com>; +Cc: Fujii Masao <masao.fujii@gmail.com>; PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Hi
On Wed, May 20, 2026 at 11:49 PM vignesh C <vignesh21@gmail.com> wrote:
> Additionally pg_logical_slot_get_changes also has the same issue, it
> can be reproduced by the following:
> SELECT pg_create_logical_replication_slot('test_slot_1', 'test_decoding');
>
> DO $$
> BEGIN
> -- This will ERROR if the slot_get changes fails for the slot.
> PERFORM 1 FROM pg_logical_slot_get_changes('test_slot_1', NULL,
> NULL, 'nonexistent-option', 'val');
> EXCEPTION WHEN others THEN
> RAISE NOTICE 'caught: %', SQLERRM;
> END $$;
>
> SELECT count(*) FROM pg_logical_slot_get_changes('test_slot_1', NULL,
> NULL);
>
> TRAP: failed Assert("MyReplicationSlot == NULL"), File: "slot.c",
> Line: 638, PID: 80308
> postgres: vignesh postgres [local] SELECT(ExceptionalCondition+0xba)
> [0x642e7b2ebae1]
> postgres: vignesh postgres [local] SELECT(ReplicationSlotAcquire+0x6e)
> [0x642e7b00d732]

Thank you for letting me know. I will fix these cases in the next update
and send it shortly.
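
For reference while the update is pending, the two options Fujii raised for temporary slots can be compared with a toy C model. This is only a sketch under stated assumptions, not the upcoming patch and not PostgreSQL source: ToySlot, held, toy_acquire, toy_release, toy_drop_acquired, cleanup_by_drop and cleanup_by_demote are all hypothetical names. The one behavior it takes from the thread is that releasing an RS_EPHEMERAL slot drops it, while releasing an RS_TEMPORARY slot does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Models RS_PERSISTENT, RS_EPHEMERAL and RS_TEMPORARY. */
typedef enum { PERSISTENT, EPHEMERAL, TEMPORARY } Persistency;

/* Hypothetical toy slot; not the real ReplicationSlot struct. */
typedef struct
{
    bool        in_use;       /* slot exists */
    bool        active;       /* slot is held by a session */
    Persistency persistency;
} ToySlot;

static ToySlot *held = NULL;  /* models MyReplicationSlot */

void toy_acquire(ToySlot *s)
{
    assert(held == NULL);
    s->in_use = true;
    s->active = true;
    held = s;
}

/* Models ReplicationSlotDropAcquired(): the slot ceases to exist. */
void toy_drop_acquired(void)
{
    assert(held != NULL);
    held->in_use = false;
    held->active = false;
    held = NULL;
}

/* Models ReplicationSlotRelease(): only ephemeral slots are dropped. */
void toy_release(void)
{
    assert(held != NULL);
    if (held->persistency == EPHEMERAL)
    {
        toy_drop_acquired();
        return;
    }
    held->active = false;
    held = NULL;
}

/* Option A: the error path drops the slot explicitly. */
void cleanup_by_drop(void)
{
    toy_drop_acquired();
}

/* Option B: demote to ephemeral first, so that release drops it. */
void cleanup_by_demote(void)
{
    assert(held != NULL);
    held->persistency = EPHEMERAL;
    toy_release();
}
```

In this model a plain toy_release() on a TEMPORARY slot leaves in_use set, which is the leftover slot Fujii described. Both cleanup_by_drop() and cleanup_by_demote() clear it, so the choice between them is about which reads better in the real error path.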
Thanks,
Satya
* Re: [PATCH] Release replication slot on error in SQL-callable slot functions
2026-05-21 06:49 Re: [PATCH] Release replication slot on error in SQL-callable slot functions vignesh C <vignesh21@gmail.com>
2026-05-21 14:38 ` Re: [PATCH] Release replication slot on error in SQL-callable slot functions SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>
@ 2026-05-22 09:15 ` shveta malik <shveta.malik@gmail.com>
0 siblings, 0 replies; 3+ messages in thread
From: shveta malik @ 2026-05-22 09:15 UTC (permalink / raw)
To: SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>; +Cc: vignesh C <vignesh21@gmail.com>; Fujii Masao <masao.fujii@gmail.com>; PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>; shveta malik <shvetamalik@gmail.com>
Thanks for reporting the issue. I could reproduce the same issue with
each of these functions as well:

pg_logical_slot_peek_changes
pg_logical_slot_get_binary_changes
pg_logical_slot_peek_binary_changes

Thanks,
Shveta