MIME-Version: 1.0
References: 
 <CAAD30ULtoGp8L_GKbV15Wnm+X5r=SE7MOnYHuqBr396m26jJSA@mail.gmail.com>
 <202203162206.7spggyktx63e@alvherre.pgsql>
 <CAAD30UKRUusq8JyyHzAv71=ncN22OE8OkOOyAWvRHW3wXNjyyA@mail.gmail.com>
 <CAAD30UKTp87+kvGZYL3M2Suxq=WEvFUG24ZRT0yT9rqdkP=uMA@mail.gmail.com>
 <MEYP282MB1669863D5C31D7F6A1D996D8B6139@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM>
 <CAAD30UKc=GiGQzE8H7+Ofo18hwMOfK4qUm_KUyw6c09q4JvA5Q@mail.gmail.com>
 <MEYP282MB16691E383140844437FB0633B6139@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM>
 <CAAD30U+ZTBXLH0wWsW9+Zu2RECGKeaQNynLs7wKA0o86w8C-fw@mail.gmail.com>
 <MEYP282MB166926E46397CBFC113B4A7EB6189@MEYP282MB1669.AUSP282.PROD.OUTLOOK.COM>
 <CAA4eK1J4AekmEKgmfp6e-zZz4M02m7w7uxvC2tjqmjF-LDSGDA@mail.gmail.com>
 <CAAD30UKvv5=k6BY+JAF1fWzrYNbGcB0DEdNi1FMokULzOwSxcQ@mail.gmail.com>
 <CAAD30U+CRgUgkAg33KzNBKwCbsgiSc5z3NYvxNzEfS0Zg2S1WA@mail.gmail.com>
 <CAD21AoAv_wsBEK8jcqjBpatspiP=5E+qLokw9zCESBSvCAiRMg@mail.gmail.com>
 <CAAD30UK6T8bfW1JMaSSRDSynB6W05HjNrmvSp+tvXp-jdu9xFQ@mail.gmail.com>
 <CAA4eK1JQhz4y-1rYxwFxHYEAN-1JKeO0iT+Nip0N7jJUj_g7RA@mail.gmail.com>
 <CAD21AoCnGwx2F+Ph3dpoJVq0YR8ke3P59XCs439pW=BRfdzgTQ@mail.gmail.com>
 <OS0PR01MB571695EDF9EAB2422FBF2C1094DE9@OS0PR01MB5716.jpnprd01.prod.outlook.com>
 <CAAD30ULe-cZTELQJbcAahsOFoUO-ftMxorwBTmj7uYK=_=mwxg@mail.gmail.com>
 <OS0PR01MB571684CBF660D05B63B4412C94AB9@OS0PR01MB5716.jpnprd01.prod.outlook.com>
In-Reply-To: 
 <OS0PR01MB571684CBF660D05B63B4412C94AB9@OS0PR01MB5716.jpnprd01.prod.outlook.com>
From: Zheng Li <zhengli10@gmail.com>
Date: Tue, 14 Jun 2022 20:14:14 -0400
Message-ID: 
 <CAAD30U+oi6e6Vh_zAzhuXzkqUhagmLGD+_iyn2N9w_sNRKsoag@mail.gmail.com>
Subject: Re: Support logical replication of DDLs
To: "houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>
Cc: Masahiko Sawada <sawada.mshk@gmail.com>,
 Amit Kapila <amit.kapila16@gmail.com>,
	Japin Li <japinli@hotmail.com>, Alvaro Herrera <alvherre@alvh.no-ip.org>,
	Dilip Kumar <dilipbalaut@gmail.com>,
 rajesh singarapu <rajesh.rs0541@gmail.com>,
	PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
Content-Type: text/plain; charset="UTF-8"
Archived-At: 
 <https://www.postgresql.org/message-id/CAAD30U%2Boi6e6Vh_zAzhuXzkqUhagmLGD%2B_iyn2N9w_sNRKsoag%40mail.gmail.com>
Precedence: bulk

> Thanks for providing this idea.
>
> I looked at the string that is used for replication:
>
> """
> {ALTERTABLESTMT :relation {RANGEVAR :schemaname public :relname foo
> :inh true :relpersistence p :alias <> :location 12} :cmds ({ALTERTABLECMD
> :subtype 0 :name <> :num 0 :newowner <> :def {COLUMNDEF :colname b
> :typeName {TYPENAME :names ("public" "timestamptz") :typeOid 0 :setof
> false :pct_type false :typmods <> :typemod -1 :arrayBounds <> :location
> 29} :compression <> :inhcount 0 :is_local true :is_not_null false
> :is_from_type false :storage <> :raw_default <> :cooked_default <>
> :identity <> :identitySequence <> :generated <> :collClause <> :collOid 0
> :constraints <> :fdwoptions <> :location 27} :behavior 0 :missing_ok
> false}) :objtype 41 :missing_ok false}
> """
>
> I think the converted parsetree string includes lots of internal
> objects(e.g typeOid/pct_type/objtype/collOid/location/...). These are
> unnecessary stuff for replication and we cannot make sure all the internal
> stuff are consistent among pub/sub. So I am not sure whether replicating
> this string is better.
>
> Besides, replicating the string from nodetostring() means we would need to
> deal with the structure difference between the publisher and the
> subscriber if any related structure has been changed which seems not good.

Yeah, this existing format is not designed to be portable between different
major versions. So it can't directly be used for replication without
serious modification.

> IMO, The advantages of the deparsing approach(as implemented in the POC
> patch set[1]) are:
>
> 1) We can generate a command representation that can be
> parsed/processed/transformed arbitrarily by the subscriber using generic
> rules it(for example: user can easily replace the schema name in it) while
> the results of nodetostring() seems not a standard json string, so I am
> not sure can user reuse it without traversing the parsetree again.
>
> 2) With event_trigger + deparser, we can filter the unpublished objects
> easier. For example: "DROP TABLE table_pub, table_unpub;". We can deparse
> it into two commands "DROP TABLE table_pub" and "DROP TABLE table_pub" and
> only publish the first one.
>
> 3) With deparser, we are able to query the catalog in the deparser to
> build a complete command(filled with schemaname...) which user don't need
> to do any other work for it. We don't need to force the subscriber to set
> the same search_path as the publisher which give user more flexibility.
>
> 4) For CREATE TABLE AS, we can separate out the CREATE TABLE part with the
> help of deparser and event trigger. This can avoid executing the subquery
> on subcriber.
>
> 5) For ALTER TABLE command. We might want to filter out the DDL which use
> volatile function as discussed in [2]. We can achieve this easier by
> extending the deparser to check the functions used. We can even rebuild a
> command without unsupported functions to replicate by using deparser.
>
> There may be more cases I am missing as we are still analyzing other DDLs.

How does the deparser deparses CREATE FUNCTION STATEMENT? Will it schema qualify
objects inside the function definition?

While I agree that the deparser is needed to handle the potential
syntax differences between
the pub/sub, I think it's only relevant for the use cases where only a
subset of tables in the database
are replicated. For other use cases where all tables, functions and
other objects need to be replicated,
(for example, creating a logical replica for major version upgrade)
there won't be any syntax difference to
handle and the schemas are supposed to match exactly between the
pub/sub. In other words the user seeks to create an identical replica
of the source database and the DDLs should be replicated
as is in this case. So I think it's an overkill to use deparser for
such use cases. It also costs more space and
time using deparsing. For example, the following simple ALTER TABLE
command incurs 11 times more space
in the WAL record if we were to use the format from the deparser,
there will also be time and CPU overhead from the deparser.

ALTER TABLE t1 ADD c INT; serach_path: "$user", public
VS
{\"fmt\": \"ALTER TABLE %{identity}D %{subcmds:, }s\", \"subcmds\":
[{\"fmt\": \"ADD COLUMN %{definition}s\", \"type\": \"add column\",
\"definiti
on\": {\"fmt\": \"%{name}I %{coltype}T %{default}s %{not_null}s
%{collation}s\", \"name\": \"c\", \"type\": \"column\", \"coltype\":
{\"typmod\": \"\", \"typarray
\": false, \"typename\": \"int4\", \"schemaname\": \"pg_catalog\"},
\"default\": {\"fmt\": \"DEFAULT %{default}s\", \"present\": false},
\"not_null\": \"\", \"col
lation\": {\"fmt\": \"COLLATE %{name}D\", \"present\": false}}}],
\"identity\": {\"objname\": \"t1\", \"schemaname\": \"public\"}}

So I think it's better to define DDL replication levels [1] to tailor
for the two different use cases. We can use different logging format
based on the DDL replication level. For example,
we can simply log the DDL query string and the search_path for
database level DDL replication. But for table level DDL replication we
need to use the deparser format in order to
handle the potential syntax differences and schema mapping requests.

[1] https://www.postgresql.org/message-id/CAAD30U%2BpVmfKwUKy8cbZOnUXyguJ-uBNejwD75Kyo%3DOjdQGJ9g%40mail.gmail.com

Thoughts?

With Regards,
Zheng