public inbox for pgsql-hackers@postgresql.org
help / color / mirror / Atom feedFrom: Chao Li <li.evan.chao@gmail.com>
To: Michael Paquier <michael@paquier.xyz>
Cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
Cc: Michael Paquier <michael.paquier@gmail.com>
Cc: Xuneng Zhou <xunengzhou@gmail.com>
Subject: Re: Fix pg_stat_wal_receiver to show CONNECTING status
Date: Fri, 22 May 2026 10:06:33 +0800
Message-ID: <93526C6D-DE0A-4B7D-B908-366735FC211D@gmail.com> (raw)
In-Reply-To: <1B9D2BAF-C73A-4A79-852C-17FB2A474AC0@gmail.com>
References: <EF91FF76-1E2B-4F3B-9162-290B4DC517FF@gmail.com>
<agxr29Hsz7FjxzlN@paquier.xyz>
<1F153E64-B791-42FA-A60A-64813B20B81E@gmail.com>
<ag00OeH1sbt5ie_6@paquier.xyz>
<75CDE990-29D5-4D5C-BFE1-3840F19C0163@gmail.com>
<ag4dCGAPBc5VFhCi@paquier.xyz>
<1B695040-F544-447C-A6A8-C8BFF7F799D1@gmail.com>
<E2A0E5FB-E841-4D4B-96F8-B4016359CDF0@gmail.com>
<ag711gjcxuaBqpp8@paquier.xyz>
<1B9D2BAF-C73A-4A79-852C-17FB2A474AC0@gmail.com>
> On May 21, 2026, at 20:29, Chao Li <li.evan.chao@gmail.com> wrote:
>
>
>
>> On May 21, 2026, at 20:08, Michael Paquier <michael@paquier.xyz> wrote:
>>
>> On Thu, May 21, 2026 at 03:20:13PM +0800, Chao Li wrote:
>>> I spent more time here, and found that it is still possible to leak
>>> conninfo in the WAL receiver reuse path:
>>>
>>> * WalRcvWaitForStartPosition() sets the state to WALRCV_WAITING.
>>> * Then RequestXLogStreaming() copies raw conninfo into
>>> * walrcv->conninfo and sets the state to WALRCV_RESTARTING.
>>> * WalRcvWaitForStartPosition() then moves the state to
>>> * WALRCV_CONNECTING, but this path does not clear walrcv->conninfo
>>> * again.
>>>
>>> The attached nocfbot_test.diff demonstrates the leak.
>>
>> File is missing, but I get it. This is a legit bug from what I can
>> see, that also affects all the stable branches, not only HEAD.
>>
>>> Initially I thought we could also set ready_to_display to false when
>>> setting the state to WALRCV_WAITING in WalRcvWaitForStartPosition(),
>>> and set it back to true when switching back to
>>> WALRCV_CONNECTING. However, that would make the WALRCV_WAITING and
>>> WALRCV_RESTARTING states invisible in pg_stat_wal_receiver.
>>
>> Nah, we should not do that. We want to track the waiting and
>> restarting states in the view.
>>
>>> I ended up with a solution that copies the primary connection info
>>> to walrcv->conninfo only when RequestXLogStreaming() is switching to
>>> WALRCV_STARTING. In the WALRCV_WAITING reuse path, the WAL receiver
>>> keeps using the existing wrconn, so it does not need raw conninfo to
>>> be copied into shared memory again. See the attached
>>> nocfbot_walreceiverfuncs.c.diff.
>>
>> Ah, yeah. This solution to this problem makes sense. We should not
>> clobber conninfo either in this case, or we'd lose the
>> user-displayable string returned by walrcv_get_conninfo() (conninfo
>> cannot be NULL based on the in-core callers of RequestXLogStreaming()
>> AFAIK, but who knows for things out there). As mentioned above, this
>> is a different issue than the visibility of the connection information
>> while we are connecting, and it should be backpatched. Would you like
>> to send a patch?
>> --
>> Michael
>
> Sorry for missing the attachments. Please take a look first. It’s late here, I can spend more time tomorrow.
>
> Best regards,
> --
> Chao Li (Evan)
> HighGo Software Co., Ltd.
> https://www.highgo.com/
>
>
>
>
> <nocfbot_test.diff><nocfbot_walreceiverfuncs.c.diff>
Here comes the patch set:
* v3-0001 is the exactly same as v2-0001
* In v3-0002, the change in walreceiverfuncs.c is the same as the previous diff, and I tuned the test change a little bit.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
Attachments:
[application/octet-stream] v3-0001-Improve-pg_stat_wal_receiver-for-CONNECTING-statu.patch (3.4K, 2-v3-0001-Improve-pg_stat_wal_receiver-for-CONNECTING-statu.patch)
download | inline diff:
From f0091c46ddfba03cfc7f7a1e4154041694edcda6 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Tue, 19 May 2026 22:52:38 +0900
Subject: [PATCH v3 1/2] Improve pg_stat_wal_receiver for CONNECTING status
Commit a36164e7465 added a CONNECTING status for the WAL receiver, but
pg_stat_wal_receiver returned no information while the connection to the
primary was attempted, limiting the usability of the feature in
high-latency environments where the connection attempt to the primary
could take time.
This commit improves the report of the status by splitting the way the
shared memory state of the WAL receiver is filled before and after the
connection to the primary is attempted:
- Before the attempt, reset all the connection fields, switch
ready_to_display to true.
- After the attempt, fill in the connection fields.
This change means two spinlock acquisitions instead of one, but at least
monitoring tools can know about the connection attempt before its
completion, enlarging the usability of the feature.
Reported-by: Chao Li <li.evan.chao@gmail.com>
Author: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/XXX
---
src/backend/replication/walreceiver.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 07eac07b9ce..d19317703c1 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -267,6 +267,20 @@ WalReceiverMain(const void *startup_data, size_t startup_data_len)
/* Unblock signals (they were blocked when the postmaster forked us) */
sigprocmask(SIG_SETMASK, &UnBlockSig, NULL);
+ /*
+ * Switch the WAL receiver state as ready for display before doing a
+ * connection attempt, so as its connecting state is visible before
+ * attempting to contact the primary server. Note that this resets the
+ * original conninfo, sender_port and sender_host, for security. These
+ * fields are filled once the connection is fully established.
+ */
+ SpinLockAcquire(&walrcv->mutex);
+ memset(walrcv->conninfo, 0, MAXCONNINFO);
+ memset(walrcv->sender_host, 0, NI_MAXHOST);
+ walrcv->sender_port = 0;
+ walrcv->ready_to_display = true;
+ SpinLockRelease(&walrcv->mutex);
+
/* Establish the connection to the primary for XLOG streaming */
appname = cluster_name[0] ? cluster_name : "walreceiver";
wrconn = walrcv_connect(conninfo, true, false, false, appname, &err);
@@ -277,23 +291,17 @@ WalReceiverMain(const void *startup_data, size_t startup_data_len)
appname, err)));
/*
- * Save user-visible connection string. This clobbers the original
- * conninfo, for security. Also save host and port of the sender server
- * this walreceiver is connected to.
+ * Save user-visible connection string, now that the connection has been
+ * achieved.
*/
tmp_conninfo = walrcv_get_conninfo(wrconn);
walrcv_get_senderinfo(wrconn, &sender_host, &sender_port);
SpinLockAcquire(&walrcv->mutex);
- memset(walrcv->conninfo, 0, MAXCONNINFO);
if (tmp_conninfo)
strlcpy(walrcv->conninfo, tmp_conninfo, MAXCONNINFO);
-
- memset(walrcv->sender_host, 0, NI_MAXHOST);
if (sender_host)
strlcpy(walrcv->sender_host, sender_host, NI_MAXHOST);
-
walrcv->sender_port = sender_port;
- walrcv->ready_to_display = true;
SpinLockRelease(&walrcv->mutex);
if (tmp_conninfo)
--
2.50.1 (Apple Git-155)
[application/octet-stream] v3-0002-Avoid-exposing-raw-WAL-receiver-conninfo-during-t.patch (3.9K, 3-v3-0002-Avoid-exposing-raw-WAL-receiver-conninfo-during-t.patch)
download | inline diff:
From 4dfab1d904ac6b86a7452d011e86379fba86623b Mon Sep 17 00:00:00 2001
From: "Chao Li (Evan)" <lic@highgo.com>
Date: Fri, 22 May 2026 09:46:16 +0800
Subject: [PATCH v3 2/2] Avoid exposing raw WAL receiver conninfo during
timeline jumps
When reusing an existing WAL receiver after it has reached
WALRCV_WAITING, RequestXLogStreaming() copied PrimaryConnInfo into
WalRcv->conninfo before switching the state to WALRCV_RESTARTING. At
that point ready_to_display could still be true, so pg_stat_wal_receiver
could expose the raw connection string, including sensitive fields.
WALRCV_RESTARTING does not establish a new connection. The waiting WAL
receiver reuses its existing connection and only needs a new startpoint
and timeline, so there is no need to copy the raw connection string into
shared memory again. Only copy conninfo when launching a new WAL receiver
from WALRCV_STOPPED.
Add coverage to the timeline-switch test to verify that the WAL receiver
process remains visible in pg_stat_wal_receiver across the jump, while a
raw password in primary_conninfo is not exposed.
Author: Chao Li <lic@highgo.com>
Reviewed-by:
Discussion: https://postgr.es/m/EF91FF76-1E2B-4F3B-9162-290B4DC517FF@gmail.com
---
src/backend/replication/walreceiverfuncs.c | 9 ++++-----
src/test/recovery/t/004_timeline_switch.pl | 16 ++++++++++++++--
2 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c
index a0ed853e2f6..279c6c8a7e1 100644
--- a/src/backend/replication/walreceiverfuncs.c
+++ b/src/backend/replication/walreceiverfuncs.c
@@ -281,11 +281,6 @@ RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr, const char *conninfo,
Assert(walrcv->walRcvState == WALRCV_STOPPED ||
walrcv->walRcvState == WALRCV_WAITING);
- if (conninfo != NULL)
- strlcpy(walrcv->conninfo, conninfo, MAXCONNINFO);
- else
- walrcv->conninfo[0] = '\0';
-
/*
* Use configured replication slot if present, and ignore the value of
* create_temp_slot as the slot name should be persistent. Otherwise, use
@@ -307,6 +302,10 @@ RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr, const char *conninfo,
{
launch = true;
walrcv->walRcvState = WALRCV_STARTING;
+ if (conninfo != NULL)
+ strlcpy(walrcv->conninfo, conninfo, MAXCONNINFO);
+ else
+ walrcv->conninfo[0] = '\0';
}
else
walrcv->walRcvState = WALRCV_RESTARTING;
diff --git a/src/test/recovery/t/004_timeline_switch.pl b/src/test/recovery/t/004_timeline_switch.pl
index 5afd2f44466..eb7cfa9f8e0 100644
--- a/src/test/recovery/t/004_timeline_switch.pl
+++ b/src/test/recovery/t/004_timeline_switch.pl
@@ -47,11 +47,15 @@ $node_standby_1->psql(
stdout => \$psql_out);
is($psql_out, 't', "promotion of standby with pg_promote");
-# Switch standby 2 to replay from standby 1
+# Switch standby 2 to replay from standby 1. During the timeline switch,
+# the WAL receiver process on standby 2 should not be stopped, and the
+# new primary connection string should not be visible
+# in pg_stat_wal_receiver.
+my $secret = 'dont_show_me';
my $connstr_1 = $node_standby_1->connstr;
$node_standby_2->append_conf(
'postgresql.conf', qq(
-primary_conninfo='$connstr_1'
+primary_conninfo='$connstr_1 password=$secret'
));
# Rotate logfile before restarting, for the log checks done below.
@@ -93,6 +97,14 @@ my $wr_pid_after_switch = $node_standby_2->safe_psql('postgres',
is($wr_pid_before_switch, $wr_pid_after_switch,
'WAL receiver PID matches across timeline jumps');
+my $raw_conninfo_count = $node_standby_2->safe_psql(
+ 'postgres',
+ "SELECT count(*) FROM pg_stat_wal_receiver WHERE conninfo LIKE '%$secret%'");
+
+is(
+ $raw_conninfo_count, '0',
+ 'raw primary_conninfo password is not visible after timeline jumps');
+
# Ensure that a standby is able to follow a primary on a newer timeline
# when WAL archiving is enabled.
--
2.50.1 (Apple Git-155)
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: pgsql-hackers@postgresql.org
Cc: li.evan.chao@gmail.com, michael@paquier.xyz, michael.paquier@gmail.com, xunengzhou@gmail.com
Subject: Re: Fix pg_stat_wal_receiver to show CONNECTING status
In-Reply-To: <93526C6D-DE0A-4B7D-B908-366735FC211D@gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox