MIME-Version: 1.0
References: 
 <CAAD30ULCxqOJp0sffm_y9jNC4BVPYv7Q7_va_JE8qyfRXkfu+g@mail.gmail.com>
 <CAFPTHDaaewvYUznZD1YjUQnvHycgvgMKNK4w=V8Q-8MKTpVDrw@mail.gmail.com>
 <OS3PR01MB6275025C397CC22E446B1F589EBC9@OS3PR01MB6275.jpnprd01.prod.outlook.com>
 <OS0PR01MB5716765B2D38786DA3943E0294809@OS0PR01MB5716.jpnprd01.prod.outlook.com>
 <CAFPTHDbAJPccMcZnOraBy14hM6qBJxYqcRwK6iV6=gtL6VZQwQ@mail.gmail.com>
 <CALDaNm3NUO8ofK64N7HMtNmUP=52R8_jWzrekqAm7m7wqZjwaQ@mail.gmail.com>
 <CALDaNm3XUKfD+nD1AVvSuZyUY_zRk_eyz+Pt9t13N8WXViR6pw@mail.gmail.com>
 <3032112.1679865718@sss.pgh.pa.us>
In-Reply-To: <3032112.1679865718@sss.pgh.pa.us>
From: Amit Kapila <amit.kapila16@gmail.com>
Date: Mon, 27 Mar 2023 12:07:55 +0530
Message-ID: 
 <CAA4eK1K3VXfTWXbLADcH81J==7ussvNdqLFHN68sEokDPueu7w@mail.gmail.com>
Subject: Re: Support logical replication of DDLs
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: vignesh C <vignesh21@gmail.com>, Ajin Cherian <itsajin@gmail.com>,
	"houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>,
 "wangw.fnst@fujitsu.com" <wangw.fnst@fujitsu.com>,
	Runqi Tian <runqidev@gmail.com>, Peter Smith <smithpb2250@gmail.com>,
 li jie <ggysxcq@gmail.com>,
	Dilip Kumar <dilipbalaut@gmail.com>,
 Alvaro Herrera <alvherre@alvh.no-ip.org>,
	Masahiko Sawada <sawada.mshk@gmail.com>, Japin Li <japinli@hotmail.com>,
	rajesh singarapu <rajesh.rs0541@gmail.com>,
	PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>,
 Zheng Li <zhengli10@gmail.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: 
 <https://www.postgresql.org/message-id/CAA4eK1K3VXfTWXbLADcH81J%3D%3D7ussvNdqLFHN68sEokDPueu7w%40mail.gmail.com>
Precedence: bulk

On Mon, Mar 27, 2023 at 2:52 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> vignesh C <vignesh21@gmail.com> writes:
> > [ YA patch set ]
>
...
>
> I'm also less than sold on the technical details, specifically
> the notion of "let's translate utility parse trees into JSON and
> send that down the wire".  You can probably make that work for now,
> but I wonder if it will be any more robust against cross-version
> changes than just shipping the outfuncs.c representation.  (Perhaps
> it can be made more robust than the raw parse trees, but I see no
> evidence that anyone's thought much about how.)
>

AFAIR, we have discussed this aspect. For example, email [1] and its
follow-ups discuss the benefits of using JSON over outfuncs.c. Also,
various senior members seem to favor the JSON format because of the
flexibility it brings [2]. The points I could gather from the
discussion are as follows: (a) the JSON format is convenient to
transform, for example, if one wants to change the schema in the
command before applying it on the downstream node; (b) the parse-tree
representation would be less portable across versions than the JSON
format, say if a node name or some other field is changed in the
parse tree; (c) a JSON string would be easier to understand for
logical replication consumers that don't understand the original
parse tree; (d) as mentioned in [1], we sometimes need to transform
the command into multiple sub-commands or filter part of it, which I
think will be difficult to achieve with the parse tree and outfuncs.c.
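
To make point (a) concrete, here is a minimal sketch of remapping the
schema in a JSON-encoded command before applying it downstream. The
JSON layout (the "fmt" and "identity" keys and their fields) and the
helper names are hypothetical, chosen only for illustration; they are
not the format the patch set emits, and the sketch does no identifier
quoting.

```python
import json

# Hypothetical deparsed form of a CREATE TABLE command. The keys "fmt",
# "identity", "schemaname" and "objname" are assumptions made for this
# sketch, not the patch's actual output format.
deparsed = json.dumps({
    "fmt": "CREATE TABLE {schemaname}.{objname} (id int)",
    "identity": {"schemaname": "public", "objname": "orders"},
})


def remap_schema(command_json, mapping):
    """Return the command JSON with the schema name replaced per `mapping`."""
    command = json.loads(command_json)
    identity = command["identity"]
    # Schemas that are not in the mapping are left unchanged.
    identity["schemaname"] = mapping.get(
        identity["schemaname"], identity["schemaname"]
    )
    return json.dumps(command)


def expand(command_json):
    """Render command text from the hypothetical format string."""
    command = json.loads(command_json)
    return command["fmt"].format(**command["identity"])


remapped = remap_schema(deparsed, {"public": "replica"})
print(expand(remapped))  # CREATE TABLE replica.orders (id int)
```

The subscriber-side rewrite is a dictionary update on a named field;
doing the same on a serialized parse tree would require knowing the
node layout of the publisher's version.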

> And TBH, I don't think that I quite believe the premise in the
> first place.  The whole point of using logical rather than physical
> replication is that the subscriber installation(s) aren't exactly like
> the publisher.  Given that, how can we expect that automated DDL
> replication is going to do the right thing often enough to be a useful
> tool rather than a disastrous foot-gun?
>

One of the major use cases mentioned in the initial email was online
version upgrades. Also, people would be happy to automatically sync
the schema in cases where logical replication is set up to get a
subset of the data via features like row filters. Having said that, I
agree with you that it is very important to define the scope of this
feature if we want to see it become reality.

> The more you expand the scope
> of what gets replicated, the worse that problem becomes --- for
> example, I don't buy for one second that "let's replicate roles"
> is a credible solution for the problems that come from the roles
> not being the same on publisher and subscriber.
>
> I'm not sure how we get from here to a committable and useful feature,
> but I don't think we're close to that yet, and I'm not sure that minor
> iterations on a 2MB patchset will accomplish much.  I'm afraid that
> a whole lot of work is going to end up going down the drain, which
> would be a shame because surely there are use-cases here.
>

I think the idea was to build a POC to see what kind of difficulties
we may face down the road. I also don't think we can get all of this
into one version, and some of it may not be required at all, but OTOH
it gives us a good idea of the problems we may need to solve and
allows us to evaluate whether the base design is extensible enough.

> I suggest taking a couple of steps back from the minutiae of the
> patch, and spending some hard effort thinking about how the thing
> would be controlled in a useful fashion (that is, a real design for
> the filtering that was mentioned at the very outset), and about the
> security issues, and about how we could get to a committable patch.
>

Agreed. I'll try to summarize the discussion we have had so far on
this and share my thoughts on it in a separate email.

Thanks for paying attention to this work!

[1] - https://www.postgresql.org/message-id/OS0PR01MB571684CBF660D05B63B4412C94AB9%40OS0PR01MB5716.jpnprd01.prod.outlook.com
[2] - https://www.postgresql.org/message-id/CA%2BTgmoauXRQ3yDZNGTzXv_m1kdUnH1Ww%2BhwKmKUSjtyBh0Em2Q%40mail.gmail.com

-- 
With Regards,
Amit Kapila.