public inbox for pgsql-hackers@postgresql.org  
help / color / mirror / Atom feed
From: Chao Li <li.evan.chao@gmail.com>
To: Michael Paquier <michael@paquier.xyz>
Cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
Cc: Michael Paquier <michael.paquier@gmail.com>
Cc: Xuneng Zhou <xunengzhou@gmail.com>
Subject: Re: Fix pg_stat_wal_receiver to show CONNECTING status
Date: Thu, 21 May 2026 20:29:35 +0800
Message-ID: <1B9D2BAF-C73A-4A79-852C-17FB2A474AC0@gmail.com> (raw)
In-Reply-To: <ag711gjcxuaBqpp8@paquier.xyz>
References: <EF91FF76-1E2B-4F3B-9162-290B4DC517FF@gmail.com>
	<agxr29Hsz7FjxzlN@paquier.xyz>
	<1F153E64-B791-42FA-A60A-64813B20B81E@gmail.com>
	<ag00OeH1sbt5ie_6@paquier.xyz>
	<75CDE990-29D5-4D5C-BFE1-3840F19C0163@gmail.com>
	<ag4dCGAPBc5VFhCi@paquier.xyz>
	<1B695040-F544-447C-A6A8-C8BFF7F799D1@gmail.com>
	<E2A0E5FB-E841-4D4B-96F8-B4016359CDF0@gmail.com>
	<ag711gjcxuaBqpp8@paquier.xyz>



> On May 21, 2026, at 20:08, Michael Paquier <michael@paquier.xyz> wrote:
> 
> On Thu, May 21, 2026 at 03:20:13PM +0800, Chao Li wrote:
>> I spent more time here, and found that it is still possible to leak
>> conninfo in the WAL receiver reuse path:
>> 
>> * WalRcvWaitForStartPosition() sets the state to WALRCV_WAITING.
>> * Then RequestXLogStreaming() copies raw conninfo into
>> * walrcv->conninfo and sets the state to WALRCV_RESTARTING.
>> * WalRcvWaitForStartPosition() then moves the state to
>> * WALRCV_CONNECTING, but this path does not clear walrcv->conninfo
>> * again.
>> 
>> The attached nocfbot_test.diff demonstrates the leak.
> 
> File is missing, but I get it.  This is a legit bug from what I can
> see, that also affects all the stable branches, not only HEAD.
> 
>> Initially I thought we could also set ready_to_display to false when
>> setting the state to WALRCV_WAITING in WalRcvWaitForStartPosition(),
>> and set it back to true when switching back to
>> WALRCV_CONNECTING. However, that would make the WALRCV_WAITING and
>> WALRCV_RESTARTING states invisible in pg_stat_wal_receiver.
> 
> Nah, we should not do that.  We want to track the waiting and
> restarting states in the view.
> 
>> I ended up with a solution that copies the primary connection info
>> to walrcv->conninfo only when RequestXLogStreaming() is switching to
>> WALRCV_STARTING. In the WALRCV_WAITING reuse path, the WAL receiver
>> keeps using the existing wrconn, so it does not need raw conninfo to
>> be copied into shared memory again. See the attached
>> nocfbot_walreceiverfuncs.c.diff.
> 
> Ah, yeah.  This solution to this problem makes sense.  We should not
> clobber conninfo either in this case, or we'd lose the
> user-displayable string returned by walrcv_get_conninfo() (conninfo
> cannot be NULL based on the in-core callers of RequestXLogStreaming()
> AFAIK, but who knows for things out there).  As mentioned above, this
> is a different issue than the visibility of the connection information
> while we are connecting, and it should be backpatched.  Would you like
> to send a patch?
> --
> Michael

Sorry for missing the attachments. Please take a look first. It’s late here, I can spend more time tomorrow.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/






Attachments:

  [application/octet-stream] nocfbot_test.diff (2.1K, 2-nocfbot_test.diff)
  download | inline diff:
diff --git a/src/test/recovery/t/004_timeline_switch.pl b/src/test/recovery/t/004_timeline_switch.pl
index 5afd2f44466..85df9ad422c 100644
--- a/src/test/recovery/t/004_timeline_switch.pl
+++ b/src/test/recovery/t/004_timeline_switch.pl
@@ -48,10 +48,11 @@ $node_standby_1->psql(
 is($psql_out, 't', "promotion of standby with pg_promote");
 
 # Switch standby 2 to replay from standby 1
+my $secret = 'dont_show_me';
 my $connstr_1 = $node_standby_1->connstr;
 $node_standby_2->append_conf(
 	'postgresql.conf', qq(
-primary_conninfo='$connstr_1'
+primary_conninfo='$connstr_1 password=$secret'
 ));
 
 # Rotate logfile before restarting, for the log checks done below.
@@ -62,9 +63,9 @@ $node_standby_2->restart;
 # verify that after reconnection, the walreceiver stays alive during
 # the timeline switch.
 $node_standby_2->poll_query_until('postgres',
-	"SELECT EXISTS(SELECT 1 FROM pg_stat_wal_receiver)");
+	"SELECT EXISTS(SELECT 1 FROM pg_stat_activity WHERE backend_type = 'walreceiver')");
 my $wr_pid_before_switch = $node_standby_2->safe_psql('postgres',
-	"SELECT pid FROM pg_stat_wal_receiver");
+	"SELECT pid FROM pg_stat_activity WHERE backend_type = 'walreceiver'");
 
 # Insert some data in standby 1 and check its presence in standby 2
 # to ensure that the timeline switch has been done.
@@ -88,11 +89,19 @@ ok( !$node_standby_2->log_contains(
 # Verify that the walreceiver process stayed alive across the timeline
 # switch, check its PID.
 my $wr_pid_after_switch = $node_standby_2->safe_psql('postgres',
-	"SELECT pid FROM pg_stat_wal_receiver");
+	"SELECT pid FROM pg_stat_activity WHERE backend_type = 'walreceiver'");
 
 is($wr_pid_before_switch, $wr_pid_after_switch,
 	'WAL receiver PID matches across timeline jumps');
 
+my $raw_conninfo_count = $node_standby_2->safe_psql(
+	'postgres',
+	"SELECT count(*) FROM pg_stat_wal_receiver WHERE conninfo LIKE '%$secret%'");
+
+is(
+	$raw_conninfo_count, '0',
+	'raw primary_conninfo password is not visible after timeline jumps');
+
 # Ensure that a standby is able to follow a primary on a newer timeline
 # when WAL archiving is enabled.
 


  [application/octet-stream] nocfbot_walreceiverfuncs.c.diff (1.0K, 3-nocfbot_walreceiverfuncs.c.diff)
  download | inline diff:
diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c
index a0ed853e2f6..279c6c8a7e1 100644
--- a/src/backend/replication/walreceiverfuncs.c
+++ b/src/backend/replication/walreceiverfuncs.c
@@ -281,11 +281,6 @@ RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr, const char *conninfo,
 	Assert(walrcv->walRcvState == WALRCV_STOPPED ||
 		   walrcv->walRcvState == WALRCV_WAITING);
 
-	if (conninfo != NULL)
-		strlcpy(walrcv->conninfo, conninfo, MAXCONNINFO);
-	else
-		walrcv->conninfo[0] = '\0';
-
 	/*
 	 * Use configured replication slot if present, and ignore the value of
 	 * create_temp_slot as the slot name should be persistent.  Otherwise, use
@@ -307,6 +302,10 @@ RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr, const char *conninfo,
 	{
 		launch = true;
 		walrcv->walRcvState = WALRCV_STARTING;
+		if (conninfo != NULL)
+			strlcpy(walrcv->conninfo, conninfo, MAXCONNINFO);
+		else
+			walrcv->conninfo[0] = '\0';
 	}
 	else
 		walrcv->walRcvState = WALRCV_RESTARTING;


reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: pgsql-hackers@postgresql.org
  Cc: li.evan.chao@gmail.com, michael@paquier.xyz, michael.paquier@gmail.com, xunengzhou@gmail.com
  Subject: Re: Fix pg_stat_wal_receiver to show CONNECTING status
  In-Reply-To: <1B9D2BAF-C73A-4A79-852C-17FB2A474AC0@gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox