From: Robert Treat <rob@xzilla.net>
To: Bruce Momjian <bruce@momjian.us>
To: Paul A Jungwirth <pj@illuminatedcomputing.com>
To: Laurenz Albe <laurenz.albe@cybertec.at>
Cc: pgsql-docs@lists.postgresql.org
Subject: Re: Streaming Replication vs Logical
Date: Wed, 19 Feb 2025 23:15:38 -0500
Message-ID: <CAJSLCQ1MiUw6S982GuJ+FH6b7=vR68T+RrUMdqYs7Wp+At6E_A@mail.gmail.com>
In-Reply-To: <Zwqlw6RGs4mCnHz9@momjian.us>
References: <CA+renyULt3VBS1cRFKUfT2=5dr61xBOZdAZ-CqX3XLGXqY-aTQ@mail.gmail.com>
	<2c392993640661b817c5c779f6aaf44c103510bf.camel@cybertec.at>
	<Zwqlw6RGs4mCnHz9@momjian.us>

On Wed, Feb 19, 2025 at 10:07 PM Bruce Momjian <bruce@momjian.us> wrote:

> On Sat, Oct 12, 2024 at 07:01:31AM +0200, Laurenz Albe wrote:
> > On Fri, 2024-10-11 at 15:53 -0700, Paul A Jungwirth wrote:
> > > Our docs seem to contrast "streaming replication" to logical, but
> > > these are not really opposites. Sometimes when they say "streaming"
> > > they mean "physical".
> > >
> > > Probably this is historical: at first physical replication was the
> > > only kind of streaming we had.
> > >
> > > Personally this has caused me a lot of confusion. For example,
> > > recently when I read "Synchronous replication (see Section 26.2.8) is
> > > only supported on replication slots used over the streaming
> > > replication interface," I took it to mean synchronous replication only
> > > worked for physical replication, not logical.
> >
> > What you are saying makes a lot of sense, and improving some of this
> > is a good thing.
> >
> > Our current terminology is a mess.  There are some places in the
> > documentation that speak of physical vs. logical replication, while
> > most places use the term "streaming replication" for physical
> > replication.  I myself consequently speak of "streaming replication"
> > vs. "logical replication", even though both stream data.  The protocol
> > section of the documentation describes the "streaming replication
> > protocol" and the "logical streaming replication protocol".
> >
> > This is confusing, and I am also sometimes confused in the way you
> > described above.
> >
> > I think the mess is too well established to be really cleaned up.  But
> > adding some clarity is a good thing, so +1.
>
>
The attached patch expands on Paul's original patch, further
consolidating around the terms "streaming physical replication" and
"streaming logical replication" in places where it makes sense. I would
note that there are places where plain "streaming replication" is the
right term (when the text applies to both types), and places where plain
"physical replication" is the right term (when we could be talking about
either streaming or WAL shipping), so I don't think we can completely
eliminate those, but hopefully this improves what we have.
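
For anyone who wants to look for the remaining unqualified uses, here is
a rough sketch of the kind of scan that could help. It is not part of the
patch. The function name, the labels "bare" and "old-order", and the
sample text are all made up for illustration. It flags "streaming
replication" with no physical/logical qualifier, plus the older word
order ("logical streaming replication"), and it matches across line
breaks because the SGML sources wrap mid-phrase.

```python
import re

# Hypothetical helper, not part of the patch: finds "streaming replication"
# and records whether it was preceded by "physical" or "logical".
PATTERN = re.compile(
    r"(?:\b(physical|logical)\s+)?\bstreaming\s+replication\b",
    re.IGNORECASE,
)


def audit(text):
    """Return (line_number, kind, phrase) for each hit in text.

    kind is "old-order" for "physical/logical streaming replication"
    and "bare" for an unqualified "streaming replication".
    """
    hits = []
    for m in PATTERN.finditer(text):
        # Line number of the first word of the match (1-based).
        lineno = text.count("\n", 0, m.start()) + 1
        kind = "old-order" if m.group(1) else "bare"
        # Collapse any line break inside the phrase to a single space.
        hits.append((lineno, kind, " ".join(m.group(0).split())))
    return hits


# Made-up sample text loosely modelled on the wording touched by the patch.
sample = (
    "limits the amount of memory used by logical streaming replication\n"
    "or streamed with streaming replication will be archived\n"
    "received via streaming physical replication\n"
)

for hit in audit(sample):
    print(hit)
```

A "bare" hit is not necessarily wrong, since the plain term is still
correct where the text covers both kinds, so each hit needs a human
look.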


> I don't think our current setup is sustainable so I think it does need
> to be cleaned up.  Also, physical/logical replication slots need
> help, I think.
>
>
I took a look through some of the replication slot stuff, and ISTM that it
basically gets the streaming logical/physical replication distinctions
correct. I *think* it gets the slot distinctions correct as well, but to
the degree there might be some issue there, I think it could be addressed
separately.

Robert Treat
https://xzilla.net


Attachments:

  [application/octet-stream] v2-0001-Distinguish-between-streaming-replication-and-phy.patch (10.8K, 3-v2-0001-Distinguish-between-streaming-replication-and-phy.patch)
From 8e80fd6a889544756863a9641b23eae8484ebb4e Mon Sep 17 00:00:00 2001
From: Robert Treat <rob@xzilla.net>
Date: Fri, 24 Jan 2025 00:12:45 -0500
Subject: [PATCH v2] Distinguish between streaming replication and physical
Content-Type: text/plain; charset="utf-8"

Our docs still use the term "streaming replication" in places where they
really mean physical replication. These changes try to clarify the
language around streaming replication, physical replication, and logical
replication. In particular we should avoid suggesting that "streaming"
and "logical" are opposites or alternatives.
---
 doc/src/sgml/config.sgml              | 29 ++++++++++++++-------------
 doc/src/sgml/high-availability.sgml   | 12 +++++++++--
 doc/src/sgml/logical-replication.sgml |  6 +++---
 doc/src/sgml/logicaldecoding.sgml     |  6 +++---
 4 files changed, 31 insertions(+), 22 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a782f10998..c4cceb8466 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2026,7 +2026,7 @@ include_dir 'conf.d'
        <para>
         Specifies the maximum amount of memory to be used by logical decoding,
         before some of the decoded changes are written to local disk. This
-        limits the amount of memory used by logical streaming replication
+        limits the amount of memory used by streaming logical replication
         connections. It defaults to 64 megabytes (<literal>64MB</literal>).
         Since each replication connection only uses a single buffer of this size,
         and an installation normally doesn't have many such connections
@@ -3592,7 +3592,7 @@ include_dir 'conf.d'
         difference between the two modes, but when set to <literal>always</literal>
         the WAL archiver is enabled also during archive recovery or standby
         mode. In <literal>always</literal> mode, all files restored from the archive
-        or streamed with streaming replication will be archived (again). See
+        or streamed with streaming physical replication will be archived (again). See
         <xref linkend="continuous-archiving-in-standby"/> for details.
        </para>
        <para>
@@ -3698,7 +3698,7 @@ include_dir 'conf.d'
         full files.  Therefore, it is unwise to use a very short
         <varname>archive_timeout</varname> &mdash; it will bloat your archive
         storage.  <varname>archive_timeout</varname> settings of a minute or so are
-        usually reasonable.  You should consider using streaming replication,
+        usually reasonable.  You should consider using streaming physical replication,
         instead of archiving, if you want data to be copied off the primary
         server more quickly than that.
         If this value is specified without units, it is taken as seconds.
@@ -3723,7 +3723,7 @@ include_dir 'conf.d'
 
     <para>
      This section describes the settings that apply to recovery in general,
-     affecting crash recovery, streaming replication and archive-based
+     affecting crash recovery, streaming physical replication and archive-based
      replication.
     </para>
 
@@ -3833,7 +3833,7 @@ include_dir 'conf.d'
        <para>
         The local shell command to execute to retrieve an archived segment of
         the WAL file series. This parameter is required for archive recovery,
-        but optional for streaming replication.
+        but optional for streaming physical replication.
         Any <literal>%f</literal> in the string is
         replaced by the name of the file to retrieve from the archive,
         and any <literal>%p</literal> is replaced by the copy destination path name
@@ -4259,15 +4259,16 @@ restore_command = 'copy "C:\\server\\archivedir\\%f" "%p"'  # Windows
     <title>Replication</title>
 
     <para>
-     These settings control the behavior of the built-in
-     <firstterm>streaming replication</firstterm> feature (see
-     <xref linkend="streaming-replication"/>), and the built-in
-     <firstterm>logical replication</firstterm> feature (see
+     These settings control the behavior of
+     <firstterm>streaming replication</firstterm>,
+     both <firstterm>physical replication</firstterm>
+     (see <xref linkend="streaming-replication"/>) and
+     <firstterm>logical replication</firstterm> (see
      <xref linkend="logical-replication"/>).
     </para>
 
     <para>
-     For <emphasis>streaming replication</emphasis>, servers will be either a
+     For <emphasis>physical replication</emphasis>, servers will be either a
      primary or a standby server.  Primaries can send data, while standbys
      are always receivers of replicated data.  When cascading replication
      (see <xref linkend="cascading-replication"/>) is used, standby servers
@@ -4664,7 +4665,7 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       These settings control the behavior of a
       <link linkend="standby-server-operation">standby server</link>
       that is
-      to receive replication data.  Their values on the primary server
+      to receive physical replication data.  Their values on the primary server
       are irrelevant.
      </para>
 
@@ -4802,7 +4803,7 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
         conflict with about-to-be-applied WAL entries, as described in
         <xref linkend="hot-standby-conflict"/>.
         <varname>max_standby_streaming_delay</varname> applies when WAL data is
-        being received via streaming replication.
+        being received via streaming physical replication.
         If this value is specified without units, it is taken as milliseconds.
         The default is 30 seconds.
         A value of -1 allows the standby to wait forever for conflicting
@@ -4931,7 +4932,7 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
       <listitem>
        <para>
         Specifies how long the standby server should wait when WAL data is not
-        available from any sources (streaming replication,
+        available from any sources (streaming physical replication,
         local <filename>pg_wal</filename> or WAL archive) before trying
         again to retrieve WAL data.
         If this value is specified without units, it is taken as milliseconds.
@@ -5008,7 +5009,7 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
         <filename>pg_wal</filename> directory.
        </para>
        <para>
-        This parameter is intended for use with streaming replication deployments;
+        This parameter is intended for use with streaming physical replication deployments;
         however, if the parameter is specified it will be honored in all cases
         except crash recovery.
 
diff --git a/doc/src/sgml/high-availability.sgml b/doc/src/sgml/high-availability.sgml
index acf3ac0601..331f90e452 100644
--- a/doc/src/sgml/high-availability.sgml
+++ b/doc/src/sgml/high-availability.sgml
@@ -151,7 +151,7 @@ protocol to make nodes agree on a serializable transactional order.
     </para>
     <para>
      A standby server can be implemented using file-based log shipping
-     (<xref linkend="warm-standby"/>) or streaming replication (see
+     (<xref linkend="warm-standby"/>) or streaming physical replication (see
      <xref linkend="streaming-replication"/>), or a combination of both. For
      information on hot standby, see <xref linkend="hot-standby"/>.
     </para>
@@ -628,7 +628,7 @@ protocol to make nodes agree on a serializable transactional order.
     In standby mode, the server continuously applies WAL received from the
     primary server. The standby server can read WAL from a WAL archive
     (see <xref linkend="guc-restore-command"/>) or directly from the primary
-    over a TCP connection (streaming replication). The standby server will
+    over a TCP connection (streaming physical replication). The standby server will
     also attempt to restore any WAL found in the standby cluster's
     <filename>pg_wal</filename> directory. That typically happens after a server
     restart, when the standby replays again WAL that was streamed from the
@@ -772,6 +772,14 @@ archive_cleanup_command = 'pg_archivecleanup /path/to/archive %r'
     generated, without waiting for the WAL file to be filled.
    </para>
 
+   <note>
+    <para>
+     This discussion of streaming replication assumes physical replication.
+     Although you could treat a logical replication subscriber as a warm standby,
+     it would require some differences to what is described here.
+    </para>
+   </note>
+
    <para>
     Streaming replication is asynchronous by default
     (see <xref linkend="synchronous-replication"/>), in which case there is
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index ab683cf111..b0361fc061 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -6,7 +6,7 @@
  <para>
   Logical replication is a method of replicating data objects and their
   changes, based upon their replication identity (usually a primary key).  We
-  use the term logical in contrast to physical replication, which uses exact
+  use the term logical replication in contrast to physical replication, which uses exact
   block addresses and byte-by-byte replication.  PostgreSQL supports both
   mechanisms concurrently, see <xref linkend="high-availability"/>.  Logical
   replication allows fine-grained control over both data replication and
@@ -2057,8 +2057,8 @@ CONTEXT:  processing remote data for replication origin "pg_16395" during "INSER
   <title>Monitoring</title>
 
   <para>
-   Because logical replication is based on a similar architecture as
-   <link linkend="streaming-replication">physical streaming replication</link>,
+   Because streaming logical replication is based on a similar architecture as
+   <link linkend="streaming-replication">streaming physical replication</link>,
    the monitoring on a publication node is similar to monitoring of a
    physical replication primary
    (see <xref linkend="streaming-replication-monitoring"/>).
diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 1c4ae38f1b..706f0ea6fb 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -275,9 +275,9 @@ postgres=# select * from pg_logical_slot_get_changes('regression_slot', NULL, NU
     </para>
 
     <note>
-     <para><productname>PostgreSQL</productname> also has streaming replication slots
-     (see <xref linkend="streaming-replication"/>), but they are used somewhat
-     differently there.
+     <para><productname>PostgreSQL</productname> can also use streaming replication slots
+     to maintain a standby server (see <xref linkend="streaming-replication"/>), but
+     typically those use physical replication, not logical.
      </para>
     </note>
 
-- 
2.24.3 (Apple Git-128)


