public inbox for pgsql-general@postgresql.org  
help / color / mirror / Atom feed
From: Amit Kapila <amit.kapila16@gmail.com>
To: Masahiko Sawada <sawada.mshk@gmail.com>
Cc: Vitaly Davydov <v.davydov@postgrespro.ru>
Cc: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Cc: PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Subject: Re: Support logical replication of DDLs, take2
Date: Tue, 28 Apr 2026 12:02:17 +0530
Message-ID: <CAA4eK1KMDya3brZdgDwKCoLCKM11t=sqP4QNCEbLai_NRKQF=A@mail.gmail.com> (raw)
In-Reply-To: <CAD21AoCzT3sytVbimRNdjRF=N3R-8ddaWKW95EzsdderXqcm4g@mail.gmail.com>
References: <OSZPR01MB63102C42A24D59FACF6D9CD6FD4A9@OSZPR01MB6310.jpnprd01.prod.outlook.com>
	<ZIjmGhPWdoJUfjjE@paquier.xyz>
	<CAJpy0uDaubBHyqPc1k0OysuBYDOVdoUgTWG4jXDCYj-OVSU8hg@mail.gmail.com>
	<CAA4eK1+K8KMsB=+jJO6wDUSt7wF1RiXKtF-HN48nCOEOv-J-3Q@mail.gmail.com>
	<CAJpy0uDLLBYAOzCePYObZ51k1epBU0hef4vbfcujKJprJVsEcQ@mail.gmail.com>
	<CAJpy0uAhLjQZ0Dh0KWDFP8mrnG0rbx99_heavwn8Ke8ZuD-Umg@mail.gmail.com>
	<OS0PR01MB5716A47D23EFAF988475D4A99431A@OS0PR01MB5716.jpnprd01.prod.outlook.com>
	<CAD21AoCXCAQ5QyXu9-xs30ViUHtUxQMmf-818d8GX--5pTmZ7g@mail.gmail.com>
	<OS0PR01MB57163E6487EFF7378CB8E17C9438A@OS0PR01MB5716.jpnprd01.prod.outlook.com>
	<B38B7239-92B0-4212-A8BF-FE506C3B0F66@yandex-team.ru>
	<CAA4eK1JpAcvnfOqF2DQo79pf7Cqp5=3HU5UDwBonWXW4V9ot=w@mail.gmail.com>
	<080f9394-c127-4cef-865e-10f2f997125c@postgrespro.ru>
	<CAD21AoCzjzr5VqVkpVWkN7V6vrFhac=yXSNMz6rkSE+KOPPU0w@mail.gmail.com>
	<CAExHW5ufb1CAcEP1o03M3c=q6AgCzdSLOFaxbC-dJN-k7ymqaA@mail.gmail.com>
	<38690b0e-f91b-46fa-b72a-57775612e463@postgrespro.ru>
	<CAD21AoCTdgHGcEQLwbqXR1E6aC1VyO8wOB4S4U6nU50bZQ+NzQ@mail.gmail.com>
	<CAD21AoCzT3sytVbimRNdjRF=N3R-8ddaWKW95EzsdderXqcm4g@mail.gmail.com>

On Tue, Apr 21, 2026 at 4:45 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Feb 23, 2026 at 5:21 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> >
> > One idea I'm experimenting with is that we define an abstract data
> > type that can represent a DDL (like CollectedCommand) and write it to
> > a new WAL record so that logical decoding processes it. For CREATE
> > DDLs, we can use pg_get_xxx_def() function while using a historical
> > snapshot to get the DDLs. We would need to implement the codes to
> > generate DROP and ALTER DDLs from the data. I believe DROP DDLs would
> > not be hard. For ALTER DDLs, we would incur the initial implementation
> > costs, but we would not change these codes often.
> >
>
> DDL support for logical replication is one of the biggest missing
> pieces in logical replication. I'd like to resume this work for PG20.
>
> We made a lot of effort on this feature through 2022 and 2023, but the
> development is currently inactive. The last patch was submitted on Jul
> 18, 2023. I've reviewed the previous patches and discussions, and I
> would like to summarize how DDL replication was implemented, the main
> reasons it stalled, and propose an alternative design to address those
> problems.
>
> The overall idea of the previous patch set was to implement DDL
> deparsing and utilize it for DDL replication. It converted a parse
> tree into a JSON string. For instance, if a user executes "DROP TABLE
> t1", the deparser generates from its parse tree:
>
> {DROPSTMT :objects (("t1")) :removeType 41 :behavior 0 :missing_ok
> false :concurrent false}
>
> to:
>
> {"fmt": "DROP TABLE %{objidentity}s", "objidentity": "public.t1"}
>
> This JSON string is self-documenting, meaning someone who gets it can
> easily reconstruct the original DDL with schema-qualified object
> names. In a dedicated event trigger for logical replication, we
> deparsed the parse tree of a DDL, wrote it into a WAL record, and then
> the logical decoding processed it similarly to DML changes.
>
> While there are several benefits to the JSON data approach mentioned
> in the wiki [1] -- most notably the flexibility to easily remap
> schemas (e.g., mapping "schema A" on the publisher to "schema B" on
> the subscriber) -- there was a major concern: the huge maintenance
> burden.
>

Yes, there will be a maintenance cost of JSON-based deparsing
approach. But note that multiple senior people (Alvaro Herrera, Robert
Haas) [1] seems to favor that approach. So, I am not sure we can
conclude to abandon that approach without those people or some other
senior people agreeing to abandon it. To be clear, I am not against
considering a new/different approach for DDL replication but just that
it is not clear that old/existing approach can be ruled out without
more discussion on it,

 We would need to maintain the JSON serialization code whenever
> creating or modifying parse nodes, regardless of whether the changes
> were related to DDL replication. IIUC, this was the primary reason the
> feature didn't cross the finish line.
>
> Additionally, I think there is another design issue: it is not
> output-plugin agnostic. Since the deparsed DDL was written by a
> logical-replication-specific event trigger, third-party logical
> decoding plugins cannot easily detect DDL events.
>

Why RmgrId like RM_LOGICALDDLMSG_ID and XLOG_LOGICAL_DDL_MESSAGE wal
info is not sufficient for this? Decoder will add a message like
REORDER_BUFFER_CHANGE_DDL which can be used to detect DDL message, no?

> Ideally, we should
> write DDL information into a WAL record natively when
> wal_level='logical' (or additionally when a GUC enables DDL events
> WAL-logging) so that all decoding plugins can detect them. This also
> allows us to test DDL logical decoding with test_decoding without
> setting up a full logical replication subscription.
>
> To address these two points, I'd like to propose an alternative
> approach: we introduce a new data type, say DDLCommand, that is
> self-contained to represent a DDL (like CollectedCommand), and don't
> rely on event triggers. It would have the command type (and subtype if
> required), the OIDs of the target object and its namespace, and the
> OID of the user who executed the DDL. We write it to a new WAL record
> at appropriate places during DDL execution, and the logical decoding
> layer passes the data to output plugins. That way, any logical
> decoding plugin can detect DDL changes, and it's up to the plugins how
> to decode the DDL information.
>
> In pgoutput, for CREATE DDLs, we can use the pg_get_xxx_ddl()
> functions while using a historical snapshot to get the DDLs, saving
> maintenance costs. We would still need to implement the code to
> generate DROP and ALTER DDLs from the data. I believe DROP DDLs would
> not be hard. For ALTER DDLs, we would incur an initial implementation
> cost, but we would not need to change this code often. We can
> implement the DDL generation code in a way that improves ddlutils.c.
>
> Also, because DDLCommand is separated from parse nodes, we only need
> to change the DDL deparse/replication code when it is actually needed.
> Additionally, this approach would eliminate the code around the
> two-step process (using DCT_TableDropStart and DCT_TableDropEnd) for
> DROP TABLE. While it would miss the flexibility benefits that the JSON
> deparsing approach has, I guess it would not be very hard to implement
> the mapping in the deparse layer even without the JSON data.
>

Possible but the point was flexibility and ease with which users can
implement mapping with JSON approach.

>
> FYI I've experimented with auto-generation approaches too. For
> instance, gen_node_support.pl generates C code that converts parse
> nodes to the corresponding text representations. Or
> gen_node_support.pl generates C code that makes all objects in the
> given SQL query text fully-schema qualified. While these ideas are
> promising they didn't help reduce the maintenance burden much as the
> parse node definitions are already complex and vary on nodes much.
>

Yeah, this is my recollection of a previous attempt for
auto_generating the deparsing code.

[1]: https://www.postgresql.org/message-id/CA%2BTgmoauXRQ3yDZNGTzXv_m1kdUnH1Ww%2BhwKmKUSjtyBh0Em2Q%40mail...
-- 
With Regards,
Amit Kapila.






reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: pgsql-general@postgresql.org
  Cc: amit.kapila16@gmail.com, sawada.mshk@gmail.com, v.davydov@postgrespro.ru, ashutosh.bapat.oss@gmail.com, pgsql-hackers@lists.postgresql.org
  Subject: Re: Support logical replication of DDLs, take2
  In-Reply-To: <CAA4eK1KMDya3brZdgDwKCoLCKM11t=sqP4QNCEbLai_NRKQF=A@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox