public inbox for pgsql-hackers@postgresql.org
help / color / mirror / Atom feedFrom: Chao Li <li.evan.chao@gmail.com>
To: Michael Paquier <michael@paquier.xyz>
Cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
Cc: Michael Paquier <michael.paquier@gmail.com>
Cc: Xuneng Zhou <xunengzhou@gmail.com>
Subject: Re: Fix pg_stat_wal_receiver to show CONNECTING status
Date: Thu, 21 May 2026 20:29:35 +0800
Message-ID: <1B9D2BAF-C73A-4A79-852C-17FB2A474AC0@gmail.com> (raw)
In-Reply-To: <ag711gjcxuaBqpp8@paquier.xyz>
References: <EF91FF76-1E2B-4F3B-9162-290B4DC517FF@gmail.com>
<agxr29Hsz7FjxzlN@paquier.xyz>
<1F153E64-B791-42FA-A60A-64813B20B81E@gmail.com>
<ag00OeH1sbt5ie_6@paquier.xyz>
<75CDE990-29D5-4D5C-BFE1-3840F19C0163@gmail.com>
<ag4dCGAPBc5VFhCi@paquier.xyz>
<1B695040-F544-447C-A6A8-C8BFF7F799D1@gmail.com>
<E2A0E5FB-E841-4D4B-96F8-B4016359CDF0@gmail.com>
<ag711gjcxuaBqpp8@paquier.xyz>
> On May 21, 2026, at 20:08, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, May 21, 2026 at 03:20:13PM +0800, Chao Li wrote:
>> I spent more time here, and found that it is still possible to leak
>> conninfo in the WAL receiver reuse path:
>>
>> * WalRcvWaitForStartPosition() sets the state to WALRCV_WAITING.
>> * Then RequestXLogStreaming() copies raw conninfo into
>> * walrcv->conninfo and sets the state to WALRCV_RESTARTING.
>> * WalRcvWaitForStartPosition() then moves the state to
>> * WALRCV_CONNECTING, but this path does not clear walrcv->conninfo
>> * again.
>>
>> The attached nocfbot_test.diff demonstrates the leak.
>
> File is missing, but I get it. This is a legit bug from what I can
> see, that also affects all the stable branches, not only HEAD.
>
>> Initially I thought we could also set ready_to_display to false when
>> setting the state to WALRCV_WAITING in WalRcvWaitForStartPosition(),
>> and set it back to true when switching back to
>> WALRCV_CONNECTING. However, that would make the WALRCV_WAITING and
>> WALRCV_RESTARTING states invisible in pg_stat_wal_receiver.
>
> Nah, we should not do that. We want to track the waiting and
> restarting states in the view.
>
>> I ended up with a solution that copies the primary connection info
>> to walrcv->conninfo only when RequestXLogStreaming() is switching to
>> WALRCV_STARTING. In the WALRCV_WAITING reuse path, the WAL receiver
>> keeps using the existing wrconn, so it does not need raw conninfo to
>> be copied into shared memory again. See the attached
>> nocfbot_walreceiverfuncs.c.diff.
>
> Ah, yeah. This solution to this problem makes sense. We should not
> clobber conninfo either in this case, or we'd lose the
> user-displayable string returned by walrcv_get_conninfo() (conninfo
> cannot be NULL based on the in-core callers of RequestXLogStreaming()
> AFAIK, but who knows for things out there). As mentioned above, this
> is a different issue than the visibility of the connection information
> while we are connecting, and it should be backpatched. Would you like
> to send a patch?
> --
> Michael
Sorry for missing the attachments. Please take a look first. It’s late here, I can spend more time tomorrow.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
Attachments:
[application/octet-stream] nocfbot_test.diff (2.1K, 2-nocfbot_test.diff)
download | inline diff:
diff --git a/src/test/recovery/t/004_timeline_switch.pl b/src/test/recovery/t/004_timeline_switch.pl
index 5afd2f44466..85df9ad422c 100644
--- a/src/test/recovery/t/004_timeline_switch.pl
+++ b/src/test/recovery/t/004_timeline_switch.pl
@@ -48,10 +48,11 @@ $node_standby_1->psql(
is($psql_out, 't', "promotion of standby with pg_promote");
# Switch standby 2 to replay from standby 1
+my $secret = 'dont_show_me';
my $connstr_1 = $node_standby_1->connstr;
$node_standby_2->append_conf(
'postgresql.conf', qq(
-primary_conninfo='$connstr_1'
+primary_conninfo='$connstr_1 password=$secret'
));
# Rotate logfile before restarting, for the log checks done below.
@@ -62,9 +63,9 @@ $node_standby_2->restart;
# verify that after reconnection, the walreceiver stays alive during
# the timeline switch.
$node_standby_2->poll_query_until('postgres',
- "SELECT EXISTS(SELECT 1 FROM pg_stat_wal_receiver)");
+ "SELECT EXISTS(SELECT 1 FROM pg_stat_activity WHERE backend_type = 'walreceiver')");
my $wr_pid_before_switch = $node_standby_2->safe_psql('postgres',
- "SELECT pid FROM pg_stat_wal_receiver");
+ "SELECT pid FROM pg_stat_activity WHERE backend_type = 'walreceiver'");
# Insert some data in standby 1 and check its presence in standby 2
# to ensure that the timeline switch has been done.
@@ -88,11 +89,19 @@ ok( !$node_standby_2->log_contains(
# Verify that the walreceiver process stayed alive across the timeline
# switch, check its PID.
my $wr_pid_after_switch = $node_standby_2->safe_psql('postgres',
- "SELECT pid FROM pg_stat_wal_receiver");
+ "SELECT pid FROM pg_stat_activity WHERE backend_type = 'walreceiver'");
is($wr_pid_before_switch, $wr_pid_after_switch,
'WAL receiver PID matches across timeline jumps');
+my $raw_conninfo_count = $node_standby_2->safe_psql(
+ 'postgres',
+ "SELECT count(*) FROM pg_stat_wal_receiver WHERE conninfo LIKE '%$secret%'");
+
+is(
+ $raw_conninfo_count, '0',
+ 'raw primary_conninfo password is not visible after timeline jumps');
+
# Ensure that a standby is able to follow a primary on a newer timeline
# when WAL archiving is enabled.
[application/octet-stream] nocfbot_walreceiverfuncs.c.diff (1.0K, 3-nocfbot_walreceiverfuncs.c.diff)
download | inline diff:
diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c
index a0ed853e2f6..279c6c8a7e1 100644
--- a/src/backend/replication/walreceiverfuncs.c
+++ b/src/backend/replication/walreceiverfuncs.c
@@ -281,11 +281,6 @@ RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr, const char *conninfo,
Assert(walrcv->walRcvState == WALRCV_STOPPED ||
walrcv->walRcvState == WALRCV_WAITING);
- if (conninfo != NULL)
- strlcpy(walrcv->conninfo, conninfo, MAXCONNINFO);
- else
- walrcv->conninfo[0] = '\0';
-
/*
* Use configured replication slot if present, and ignore the value of
* create_temp_slot as the slot name should be persistent. Otherwise, use
@@ -307,6 +302,10 @@ RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr, const char *conninfo,
{
launch = true;
walrcv->walRcvState = WALRCV_STARTING;
+ if (conninfo != NULL)
+ strlcpy(walrcv->conninfo, conninfo, MAXCONNINFO);
+ else
+ walrcv->conninfo[0] = '\0';
}
else
walrcv->walRcvState = WALRCV_RESTARTING;
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: pgsql-hackers@postgresql.org
Cc: li.evan.chao@gmail.com, michael@paquier.xyz, michael.paquier@gmail.com, xunengzhou@gmail.com
Subject: Re: Fix pg_stat_wal_receiver to show CONNECTING status
In-Reply-To: <1B9D2BAF-C73A-4A79-852C-17FB2A474AC0@gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox