public inbox for pgsql-hackers@postgresql.org
Fix pg_stat_wal_receiver to show CONNECTING status
11+ messages / 2 participants
* Fix pg_stat_wal_receiver to show CONNECTING status
@ 2026-05-19 05:55 Chao Li <li.evan.chao@gmail.com>
2026-05-19 13:55 ` Re: Fix pg_stat_wal_receiver to show CONNECTING status Michael Paquier <michael@paquier.xyz>
0 siblings, 1 reply; 11+ messages in thread
From: Chao Li @ 2026-05-19 05:55 UTC
To: pgsql-hackers; +Cc: Michael Paquier <michael.paquier@gmail.com>; Xuneng Zhou <xunengzhou@gmail.com>
Hi,
I just tested "Add WALRCV_CONNECTING state to the WAL receiver" and found an issue.
Commit a36164e74 added the feature, and the commit message says:
```
...
the WAL receiver is ready to stream changes. This change is useful for
monitoring purposes, especially in environments with a high latency
where a connection could take some time to be established, giving some
room between the [re]start phase and the streaming activity.
```
However, I failed to see the CONNECTING status. To simulate a high-latency primary connection, I shut down the real primary server and created a fake socket server:
```
chaol@ChaodeMacBook-Air ~ % perl -MIO::Socket::INET -e '
$s = IO::Socket::INET->new(
LocalAddr => "127.0.0.1",
LocalPort => 5432,
Listen => 1,
ReuseAddr => 1
) or die $!;
$c = $s->accept;
sleep 600;
'
```
Then pg_stat_wal_receiver only shows an empty result:
```
evantest=# SELECT * FROM pg_stat_wal_receiver;
pid | status | receive_start_lsn | receive_start_tli | written_lsn | flushed_lsn | received_tli | last_msg_send_time | last_msg_receipt_time | latest_end_lsn | latest_end_time | slot_name | sender_host | sender_port | conninfo
-----+--------+-------------------+-------------------+-------------+-------------+--------------+--------------------+-----------------------+----------------+-----------------+-----------+-------------+-------------+----------
(0 rows)
```
I also tried restarting the standby server, and the result was the same.
The problem is that pg_stat_wal_receiver is gated by WalRcv->ready_to_display, and when the status is CONNECTING, WalRcv->ready_to_display is false.
Given that the original commit message explicitly mentions “monitoring purposes”, I think hiding this status during the connecting phase is a bug. I tried to fix it by showing only the PID and CONNECTING status when WalRcv->ready_to_display is false, like this:
```
evantest=# SELECT * FROM pg_stat_wal_receiver;
pid | status | receive_start_lsn | receive_start_tli | written_lsn | flushed_lsn | received_tli | last_msg_send_time | last_msg_receipt_time | latest_end_lsn | latest_end_time | slot_name | sender_host | sender_port | conninfo
------+------------+-------------------+-------------------+-------------+-------------+--------------+--------------------+-----------------------+----------------+-----------------+-----------+-------------+-------------+----------
3256 | connecting | | | | | | | | | | | | |
(1 row)
```
See the attached patch for details.
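To make the gating concrete, here is a minimal standalone C sketch; it is not PostgreSQL source. The struct `SketchGate` and both functions are hypothetical stand-ins for the shared WalRcv state and the early-return check in pg_stat_get_wal_receiver(). Only the condition `pid == 0 || !ready_to_display` is taken from the patch context.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two WalRcv fields that matter here. */
typedef struct SketchGate
{
	int			pid;				/* 0 means no WAL receiver is running */
	bool		ready_to_display;	/* set once conninfo has been obfuscated */
} SketchGate;

/*
 * Mirrors the early-return condition before the patch: the view returns a
 * row only when a receiver exists AND it is ready to display.
 */
bool
sketch_row_visible_before(const SketchGate *rcv)
{
	if (rcv->pid == 0 || !rcv->ready_to_display)
		return false;
	return true;
}

/* With the v1 approach, only the PID check decides whether a row exists. */
bool
sketch_row_visible_v1(const SketchGate *rcv)
{
	return rcv->pid != 0;
}
```

For a receiver with a valid PID that is still connecting (ready_to_display is false), the first function reports no row and the second reports a row, which matches the two query results shown above.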
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
Attachments:
[application/octet-stream] v1-0001-Fix-pg_stat_wal_receiver-to-show-CONNECTING-statu.patch (2.8K, 2-v1-0001-Fix-pg_stat_wal_receiver-to-show-CONNECTING-statu.patch)
From 817d3c881dd6d3912635c89d7aa24407576a85d8 Mon Sep 17 00:00:00 2001
From: "Chao Li (Evan)" <lic@highgo.com>
Date: Tue, 19 May 2026 13:32:15 +0800
Subject: [PATCH v1] Fix pg_stat_wal_receiver to show CONNECTING status
Commit a36164e7465 added a CONNECTING status for the WAL receiver, but
pg_stat_wal_receiver still returned no row while ready_to_display was
false. That made the new status invisible during the connection setup
phase.
Allow pg_stat_wal_receiver to show the WAL receiver PID and status once
they have been advertised, even before connection details are ready to
display. Keep the remaining fields NULL until conninfo has been
obfuscated.
Author: Chao Li <lic@highgo.com>
---
src/backend/replication/walreceiver.c | 36 +++++++++++++++++----------
1 file changed, 23 insertions(+), 13 deletions(-)
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 07eac07b9ce..78d948adc49 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1474,21 +1474,10 @@ pg_stat_get_wal_receiver(PG_FUNCTION_ARGS)
 	strlcpy(conninfo, WalRcv->conninfo, sizeof(conninfo));
 	SpinLockRelease(&WalRcv->mutex);
 
-	/*
-	 * No WAL receiver (or not ready yet), just return a tuple with NULL
-	 * values
-	 */
-	if (pid == 0 || !ready_to_display)
+	/* No WAL receiver, just return a tuple with NULL values */
+	if (pid == 0)
 		PG_RETURN_NULL();
 
-	/*
-	 * Read "writtenUpto" without holding a spinlock. Note that it may not be
-	 * consistent with the other shared variables of the WAL receiver
-	 * protected by a spinlock, but this should not be used for data integrity
-	 * checks.
-	 */
-	written_lsn = pg_atomic_read_u64(&WalRcv->writtenUpto);
-
 	/* determine result type */
 	if (get_call_result_type(fcinfo, NULL, &tupdesc) != TYPEFUNC_COMPOSITE)
 		elog(ERROR, "return type must be a row type");
@@ -1512,6 +1501,27 @@ pg_stat_get_wal_receiver(PG_FUNCTION_ARGS)
 	{
 		values[1] = CStringGetTextDatum(WalRcvGetStateString(state));
 
+		/*
+		 * The WAL receiver advertises its PID and state before connection
+		 * details are safe to display. Show the state, but keep all other
+		 * details hidden until conninfo has been obfuscated.
+		 */
+		if (!ready_to_display)
+		{
+			memset(&nulls[2], true, sizeof(bool) * (tupdesc->natts - 2));
+
+			PG_RETURN_DATUM(HeapTupleGetDatum(heap_form_tuple(tupdesc, values,
+															  nulls)));
+		}
+
+		/*
+		 * Read "writtenUpto" without holding a spinlock. Note that it may
+		 * not be consistent with the other shared variables of the WAL
+		 * receiver protected by a spinlock, but this should not be used for
+		 * data integrity checks.
+		 */
+		written_lsn = pg_atomic_read_u64(&WalRcv->writtenUpto);
+
 		if (!XLogRecPtrIsValid(receive_start_lsn))
 			nulls[2] = true;
 		else
--
2.50.1 (Apple Git-155)
* Re: Fix pg_stat_wal_receiver to show CONNECTING status
From: Michael Paquier @ 2026-05-19 13:55 UTC
To: Chao Li <li.evan.chao@gmail.com>; +Cc: pgsql-hackers; Michael Paquier <michael.paquier@gmail.com>; Xuneng Zhou <xunengzhou@gmail.com>
On Tue, May 19, 2026 at 01:55:14PM +0800, Chao Li wrote:
> I also tried restarting the standby server, and the result was the same.
>
> The problem is that pg_stat_wal_receiver is gated by
> WalRcv->ready_to_display, and when the status is CONNECTING,
> WalRcv->ready_to_display is false.
Initially, I was thinking that the walrcv_connect() delay would not be
that important to track in this context, but you are right that this
stands for improvement before the release.
@@ -1474,21 +1474,10 @@ pg_stat_get_wal_receiver(PG_FUNCTION_ARGS)
- if (pid == 0 || !ready_to_display)
+ /* No WAL receiver, just return a tuple with NULL values */
+ if (pid == 0)
PG_RETURN_NULL();
This suggestion is making the SQL function call feebler, IMO,
impacting the readability around ready_to_display that we want to act
as a gate to the data provided in the view. This flag is important to
check at an early state of the function call, and I don't really want
to change that. A better thing to do would be to split into two steps
how the WAL receiver data is filled between the walrcv_connect() call:
1) Before the call, reset all the connection-related fields because
they are not relevant before the connection to the remote is
completed, set ready_to_display to true to make the connecting state
visible in the view. The connection information does not matter
anyway here: we cannot be sure which point we are connected to until
the connection is fully established.
2) After the call, fill in the connection-related fields.
This means taking twice the WAL receiver spinlock instead of once,
which is not going to matter in practice as the latency of the
connection attempt is much larger than that.
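The two steps above can be sketched in standalone C as follows. This is an illustration under stated assumptions, not the actual patch: `SketchWalRcv`, the buffer sizes, and both function names are hypothetical, and snprintf() stands in for strlcpy() so that the sketch builds outside the PostgreSQL tree. In the real code each step would run while holding the WAL receiver spinlock.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAXCONNINFO 1024		/* hypothetical buffer size */
#define SKETCH_MAXHOST 1025			/* hypothetical buffer size */

/* Hypothetical stand-in for the connection-related part of WalRcv. */
typedef struct SketchWalRcv
{
	char		conninfo[SKETCH_MAXCONNINFO];
	char		sender_host[SKETCH_MAXHOST];
	int			sender_port;
	bool		ready_to_display;
} SketchWalRcv;

/*
 * Step 1, before the connection attempt: wipe the connection fields so
 * nothing sensitive can leak, then open the gate for the view.
 */
void
sketch_before_connect(SketchWalRcv *rcv)
{
	memset(rcv->conninfo, 0, sizeof(rcv->conninfo));
	memset(rcv->sender_host, 0, sizeof(rcv->sender_host));
	rcv->sender_port = 0;
	rcv->ready_to_display = true;
}

/*
 * Step 2, after the connection is established: fill in the user-visible
 * (already obfuscated) connection string and the sender details.
 */
void
sketch_after_connect(SketchWalRcv *rcv, const char *conninfo,
					 const char *host, int port)
{
	if (conninfo)
		snprintf(rcv->conninfo, sizeof(rcv->conninfo), "%s", conninfo);
	if (host)
		snprintf(rcv->sender_host, sizeof(rcv->sender_host), "%s", host);
	rcv->sender_port = port;
}
```

Because step 1 clears the fields before flipping ready_to_display, a reader of the view between the two steps sees empty connection details rather than the original connection string.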
What do you think about the attached, then?
--
Michael
From 3c381a90b1270fdd3f1b01e8eefb85f1ac4af3d8 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Tue, 19 May 2026 22:52:38 +0900
Subject: [PATCH v2] Improve pg_stat_wal_receiver for CONNECTING status
Commit a36164e7465 added a CONNECTING status for the WAL receiver, but
pg_stat_wal_receiver returned no information while the connection to the
primary was attempted, limiting the usability of the feature in
high-latency environments where the connection attempt to the primary
could take time.
This commit improves the report of the status by splitting the way the
shared memory state of the WAL receiver is filled before and after the
connection to the primary is attempted:
- Before the attempt, reset all the connection fields, switch
ready_to_display to true.
- After the attempt, fill in the connection fields.
This change means two spinlock acquisitions instead of one, but at least
monitoring tools can know about the connection attempt before its
completion, enlarging the usability of the feature.
Reported-by: Chao Li <li.evan.chao@gmail.com>
Author: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/XXX
---
src/backend/replication/walreceiver.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 07eac07b9ce4..d19317703c1f 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -267,6 +267,20 @@ WalReceiverMain(const void *startup_data, size_t startup_data_len)
 	/* Unblock signals (they were blocked when the postmaster forked us) */
 	sigprocmask(SIG_SETMASK, &UnBlockSig, NULL);
 
+	/*
+	 * Switch the WAL receiver state as ready for display before doing a
+	 * connection attempt, so as its connecting state is visible before
+	 * attempting to contact the primary server. Note that this resets the
+	 * original conninfo, sender_port and sender_host, for security. These
+	 * fields are filled once the connection is fully established.
+	 */
+	SpinLockAcquire(&walrcv->mutex);
+	memset(walrcv->conninfo, 0, MAXCONNINFO);
+	memset(walrcv->sender_host, 0, NI_MAXHOST);
+	walrcv->sender_port = 0;
+	walrcv->ready_to_display = true;
+	SpinLockRelease(&walrcv->mutex);
+
 	/* Establish the connection to the primary for XLOG streaming */
 	appname = cluster_name[0] ? cluster_name : "walreceiver";
 	wrconn = walrcv_connect(conninfo, true, false, false, appname, &err);
@@ -277,23 +291,17 @@ WalReceiverMain(const void *startup_data, size_t startup_data_len)
 						appname, err)));
 
 	/*
-	 * Save user-visible connection string. This clobbers the original
-	 * conninfo, for security. Also save host and port of the sender server
-	 * this walreceiver is connected to.
+	 * Save user-visible connection string, now that the connection has been
+	 * achieved.
 	 */
 	tmp_conninfo = walrcv_get_conninfo(wrconn);
 	walrcv_get_senderinfo(wrconn, &sender_host, &sender_port);
 	SpinLockAcquire(&walrcv->mutex);
-	memset(walrcv->conninfo, 0, MAXCONNINFO);
 	if (tmp_conninfo)
 		strlcpy(walrcv->conninfo, tmp_conninfo, MAXCONNINFO);
-
-	memset(walrcv->sender_host, 0, NI_MAXHOST);
 	if (sender_host)
 		strlcpy(walrcv->sender_host, sender_host, NI_MAXHOST);
-
 	walrcv->sender_port = sender_port;
-	walrcv->ready_to_display = true;
 	SpinLockRelease(&walrcv->mutex);
 
 	if (tmp_conninfo)
--
2.54.0
Attachments:
[text/plain] v2-0001-Improve-pg_stat_wal_receiver-for-CONNECTING-statu.patch (3.4K, 2-v2-0001-Improve-pg_stat_wal_receiver-for-CONNECTING-statu.patch)
* Re: Fix pg_stat_wal_receiver to show CONNECTING status
From: Chao Li @ 2026-05-20 01:47 UTC
To: Michael Paquier <michael@paquier.xyz>; +Cc: pgsql-hackers; Michael Paquier <michael.paquier@gmail.com>; Xuneng Zhou <xunengzhou@gmail.com>
> On May 19, 2026, at 21:55, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, May 19, 2026 at 01:55:14PM +0800, Chao Li wrote:
>> I also tried restarting the standby server, and the result was the same.
>>
>> The problem is that pg_stat_wal_receiver is gated by
>> WalRcv->ready_to_display, and when the status is CONNECTING,
>> WalRcv->ready_to_display is false.
>
> Initially, I was thinking that the walrcv_connect() delay would not be
> that important to track in this context, but you are right that this
> stands for improvement before the release.
>
> @@ -1474,21 +1474,10 @@ pg_stat_get_wal_receiver(PG_FUNCTION_ARGS)
> - if (pid == 0 || !ready_to_display)
> + /* No WAL receiver, just return a tuple with NULL values */
> + if (pid == 0)
> PG_RETURN_NULL();
>
> This suggestion is making the SQL function call feebler, IMO,
> impacting the readability around ready_to_display that we want to act
> as a gate to the data provided in the view. This flag is important to
> check at an early state of the function call, and I don't really want
> to change that. A better thing to do would be to split into two steps
> how the WAL receiver data is filled between the walrcv_connect() call:
> 1) Before the call, reset all the connection-related fields because
> they are not relevant before the connection to the remote is
> completed, set ready_to_display to true to make the connecting state
> visible in the view. The connection information does not matter
> anyway here: we cannot be sure which point we are connected to until
> the connection is fully established.
> 2) After the call, fill in the connection-related fields.
>
> This means taking twice the WAL receiver spinlock instead of once,
> which is not going to matter in practice as the latency of the
> connection attempt is much larger than that.
>
> What do you think about the attached, then?
> --
> Michael
> <v2-0001-Improve-pg_stat_wal_receiver-for-CONNECTING-statu.patch>
Hi Michael,
Thanks for your patch.
I just read v2, and it is actually the first solution I tried. The reason I gave up on that approach and switched to the implementation in v1 is that it may wrongly report last_msg_send_time, last_msg_receipt_time, and latest_end_time. See my test with v2:
```
evantest=# SELECT * FROM pg_stat_wal_receiver;
pid | status | receive_start_lsn | receive_start_tli | written_lsn | flushed_lsn | received_tli | last_msg_send_time | last_msg_receipt_time | latest_end_lsn | latest_end_time | slot_name | sender_host | sender_port | conninfo
-------+------------+-------------------+-------------------+-------------+-------------+--------------+-------------------------------+-------------------------------+----------------+-------------------------------+-----------+-------------+-------------+----------
83930 | connecting | 0/03000000 | 1 | 0/03000000 | 0/03000000 | 1 | 2026-05-20 09:24:09.121679+08 | 2026-05-20 09:24:09.121679+08 | | 2026-05-20 09:24:09.121679+08 | | | |
(1 row)
evantest=# \c
You are now connected to database "evantest" as user "chaol".
evantest=# SELECT * FROM pg_stat_wal_receiver;
pid | status | receive_start_lsn | receive_start_tli | written_lsn | flushed_lsn | received_tli | last_msg_send_time | last_msg_receipt_time | latest_end_lsn | latest_end_time | slot_name | sender_host | sender_port | conninfo
-------+------------+-------------------+-------------------+-------------+-------------+--------------+-------------------------------+-------------------------------+----------------+-------------------------------+-----------+-------------+-------------+----------
84709 | connecting | 0/03000000 | 1 | 0/03000000 | 0/03000000 | 1 | 2026-05-20 09:27:37.407117+08 | 2026-05-20 09:27:37.407117+08 | | 2026-05-20 09:27:37.407117+08 | | | |
(1 row)
evantest=# \c
You are now connected to database "evantest" as user "chaol".
evantest=# SELECT * FROM pg_stat_wal_receiver;
pid | status | receive_start_lsn | receive_start_tli | written_lsn | flushed_lsn | received_tli | last_msg_send_time | last_msg_receipt_time | latest_end_lsn | latest_end_time | slot_name | sender_host | sender_port | conninfo
-------+------------+-------------------+-------------------+-------------+-------------+--------------+-------------------------------+-------------------------------+----------------+-------------------------------+-----------+-------------+-------------+----------
84805 | connecting | 0/03000000 | 1 | 0/03000000 | 0/03000000 | 1 | 2026-05-20 09:28:03.251298+08 | 2026-05-20 09:28:03.251298+08 | | 2026-05-20 09:28:03.251298+08 | | | |
(1 row)
```
As shown above, every time I restarted the standby server, last_msg_send_time, last_msg_receipt_time, and latest_end_time were updated to the standby server start time. But in this test, the standby was connecting to a fake primary, so no WAL receiver message had been sent or received.
I tried to avoid more complicated changes, so I ended up with the v1 approach. I think it's okay to leave the other columns NULL while the receiver is still connecting, because at that point the only reliable information available is the receiver process's PID and status.
For v1, maybe we could clarify the meaning of ready_to_display with a comment. It seems to be intended to indicate that the connection-related information, such as LSNs and timestamps, is ready to display. In that sense, pid and status don't need to be gated by it.
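As a rough illustration of the v1 behaviour described here, the following standalone C sketch marks every column after pid and status as NULL. The names `SKETCH_NATTS` and `sketch_hide_details` are hypothetical; the column count of 15 is taken from the query output shown earlier in this thread. The patch itself uses memset() on the nulls array; the loop below is a substitution made only so the sketch does not rely on sizeof(bool) being 1.

```c
#include <stdbool.h>

/* Number of columns in the pg_stat_wal_receiver output shown in this thread. */
#define SKETCH_NATTS 15

/*
 * Keep column 0 (pid) and column 1 (status) as they are, and mark all
 * remaining columns as NULL while the receiver is not ready to display.
 */
void
sketch_hide_details(bool nulls[SKETCH_NATTS])
{
	int			i;

	for (i = 2; i < SKETCH_NATTS; i++)
		nulls[i] = true;
}
```

Applied to an array that starts out all false, this leaves only the first two entries false, which corresponds to a row showing just the PID and the connecting status.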
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
* Re: Fix pg_stat_wal_receiver to show CONNECTING status
From: Michael Paquier @ 2026-05-20 04:10 UTC
To: Chao Li <li.evan.chao@gmail.com>; +Cc: pgsql-hackers; Michael Paquier <michael.paquier@gmail.com>; Xuneng Zhou <xunengzhou@gmail.com>
On Wed, May 20, 2026 at 09:47:40AM +0800, Chao Li wrote:
> I just read v2, and it is actually the first solution I tried. The
> reason I gave up on that approach and switched to the implementation
> in v1 is that it may wrongly report last_msg_send_time,
> last_msg_receipt_time, and latest_end_time.
As of the code, we have the following at the top of WalReceiverMain()
before the first connection attempt:
/* Initialise to a sanish value */
now = GetCurrentTimestamp();
walrcv->lastMsgSendTime =
walrcv->lastMsgReceiptTime = walrcv->latestWalEndTime = now;
And the state of v2 is actually fine, because we finish by reporting
in the SQL calls values that represent the state the WAL receiver is
initialized at based on what the code does. It would be IMO an issue
to hide this information, as they can offer hints about the moment when
we've begun a connection.
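The effect being discussed can be shown with a small standalone C sketch, given here as an assumption-laden model rather than the real code. `SketchTimestamp`, `SketchRcvTimes`, and both functions are hypothetical; only the three field names and the "initialise all three to now" assignment come from the snippet quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t SketchTimestamp;	/* hypothetical stand-in for TimestampTz */

/* Hypothetical stand-in for the time-related part of WalRcv. */
typedef struct SketchRcvTimes
{
	SketchTimestamp lastMsgSendTime;
	SketchTimestamp lastMsgReceiptTime;
	SketchTimestamp latestWalEndTime;
	bool		ready_to_display;
} SketchRcvTimes;

/* Startup: all three timestamps get the same initial value. */
void
sketch_startup(SketchRcvTimes *rcv, SketchTimestamp now)
{
	rcv->lastMsgSendTime =
		rcv->lastMsgReceiptTime = rcv->latestWalEndTime = now;
	rcv->ready_to_display = false;
}

/*
 * What a monitoring query could read: nothing while the gate is closed,
 * otherwise whatever value the field currently holds.
 */
bool
sketch_read_send_time(const SketchRcvTimes *rcv, SketchTimestamp *out)
{
	if (!rcv->ready_to_display)
		return false;
	*out = rcv->lastMsgSendTime;
	return true;
}
```

Once ready_to_display is set before the connection attempt, as in v2, the reader gets the startup time even though no message has been exchanged, which is consistent with the three identical timestamps in the test output earlier in the thread.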
--
Michael
* Re: Fix pg_stat_wal_receiver to show CONNECTING status
From: Chao Li @ 2026-05-20 07:53 UTC
To: Michael Paquier <michael@paquier.xyz>; +Cc: pgsql-hackers; Michael Paquier <michael.paquier@gmail.com>; Xuneng Zhou <xunengzhou@gmail.com>
> On May 20, 2026, at 12:10, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, May 20, 2026 at 09:47:40AM +0800, Chao Li wrote:
>> I just read v2, and it is actually the first solution I tried. The
>> reason I gave up on that approach and switched to the implementation
>> in v1 is that it may wrongly report last_msg_send_time,
>> last_msg_receipt_time, and latest_end_time.
>
> As of the code, we have the following at the top of WalReceiverMain()
> before the first connection attempt:
> /* Initialise to a sanish value */
> now = GetCurrentTimestamp();
> walrcv->lastMsgSendTime =
> walrcv->lastMsgReceiptTime = walrcv->latestWalEndTime = now;
>
Was that okay because walrcv->ready_to_display was false, so the sane initial value would not show up through pg_stat_wal_receiver?
> And the state of v2 is actually fine, because we finish by reporting
> in the SQL calls values that represent the state the WAL receiver is
> initialized at based on what the code does. It would be IMO an issue
> to hide this information, as they can offer hints about the moment when
> we've begun a connection.
> --
> Michael
With v2, slot_name, sender_host, sender_port, and conninfo are already left NULL while the receiver is in CONNECTING state. I feel we don't have to show the timestamp fields either. Since the columns are named last_msg_send_time and last_msg_receipt_time, users may naturally interpret them as the last time a message was sent to or received from the primary. If we show the standby server start time in those columns, I am afraid that could be confusing.
But I think it might be useful to show the *_lsn and *_tli values in CONNECTING state if they are available.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
* Re: Fix pg_stat_wal_receiver to show CONNECTING status
From: Michael Paquier @ 2026-05-20 20:43 UTC
To: Chao Li <li.evan.chao@gmail.com>; +Cc: pgsql-hackers; Michael Paquier <michael.paquier@gmail.com>; Xuneng Zhou <xunengzhou@gmail.com>
On Wed, May 20, 2026 at 03:53:38PM +0800, Chao Li wrote:
> With v2, slot_name, sender_host, sender_port, and conninfo are
> already left NULL while the receiver is in CONNECTING state. I feel
> we don't have to show the timestamp fields either. Since the columns
> are named last_msg_send_time and last_msg_receipt_time, users may
> naturally interpret them as the last time a message was sent to or
> received from the primary. If we show the standby server start time
> in those columns, I am afraid that could be confusing.
>
> But I think it might be useful to show the *_lsn and *_tli values in
> CONNECTING state if they are available.
The original reason why ready_to_display has been introduced is this
one, where we wanted to have a strict control over the connection
information across multiple calls of pg_stat_get_wal_receiver():
https://www.postgresql.org/message-id/CAB7nPqQNbHQ7F7wDD_2qvGA_FUW-Leds9HQNM6kJnto7RFNhUg@mail.gmail...
With v2, ready_to_display is still able to do the job it is defined
for. This does not need to apply on the time fields, so IMO showing
them to the values they are initialized is not a big deal, and they
can actually be useful to know even in the early stage of connection
as they reveal the state of the code.
Note also that the time values could still show up based on their
initial values at the early connection stage, even after completing
walrcv_connect() and after ready_to_display is switched to true, so
it's not like these values are that confusing: we just expose them a
bit more at an earlier stage of the connection attempt process. As a
whole v2 is fine, and addresses your issue.
--
Michael
* Re: Fix pg_stat_wal_receiver to show CONNECTING status
From: Chao Li @ 2026-05-20 23:06 UTC
To: Michael Paquier <michael@paquier.xyz>; +Cc: pgsql-hackers; Michael Paquier <michael.paquier@gmail.com>; Xuneng Zhou <xunengzhou@gmail.com>
> On May 21, 2026, at 04:43, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, May 20, 2026 at 03:53:38PM +0800, Chao Li wrote:
>> With v2, slot_name, sender_host, sender_port, and conninfo are
>> already left NULL while the receiver is in CONNECTING state. I feel
>> we don't have to show the timestamp fields either. Since the columns
>> are named last_msg_send_time and last_msg_receipt_time, users may
>> naturally interpret them as the last time a message was sent to or
>> received from the primary. If we show the standby server start time
>> in those columns, I am afraid that could be confusing.
>>
>> But I think it might be useful to show the *_lsn and *_tli values in
>> CONNECTING state if they are available.
>
> The original reason why ready_to_display has been introduced is this
> one, where we wanted to have a strict control over the connection
> information across multiple calls of pg_stat_get_wal_receiver():
> https://www.postgresql.org/message-id/CAB7nPqQNbHQ7F7wDD_2qvGA_FUW-Leds9HQNM6kJnto7RFNhUg@mail.gmail...
>
> With v2, ready_to_display is still able to do the job it is defined
> for. This does not need to apply on the time fields, so IMO showing
> them to the values they are initialized is not a big deal, and they
> can actually be useful to know even in the early stage of connection
> as they reveal the state of the code.
>
> Note also that the time values could still show up based on their
> initial values at the early connection stage, even after completing
> walrcv_connect() and after ready_to_display is switched to true, so
> it's not like these values are that confusing: we just expose them a
> bit more at an earlier stage of the connection attempt process. As a
> whole v2 is fine, and addresses your issue.
> --
> Michael
Thanks for the detailed explanation.
Now I see that, based on the original discussion you pointed out, as long as v2 clears conninfo before setting ready_to_display to true, it is okay to do that earlier while the state is still CONNECTING. On that point, I’m good with v2.
I’m still not fully convinced about displaying the *_time fields, but I don’t have a stronger argument either, so I’m fine with that. Maybe we can add a brief description in the doc like the attached diff?
Overall, v2 looks good to me now.
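The ordering discussed above (wipe the connection fields and expose the row before the connection attempt, then fill in the sanitized values afterwards) can be sketched as a small standalone C model. This is only an illustration under stated assumptions, not the v2 patch: ToyDisplayState and the toy_* helpers are hypothetical names, the spinlock taken around each update is left out, and any other checks the real view performs are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAXCONNINFO 1024
#define TOY_MAXHOST 1025

/* Hypothetical stand-in for the display-related part of the shared state. */
typedef struct
{
	bool		ready_to_display;
	char		conninfo[TOY_MAXCONNINFO];
	char		sender_host[TOY_MAXHOST];
	int			sender_port;
} ToyDisplayState;

/* Bounded copy that always NUL-terminates (portable stand-in for strlcpy). */
static void
toy_copy(char *dst, const char *src, size_t dstsize)
{
	assert(dstsize > 0);
	strncpy(dst, src, dstsize - 1);
	dst[dstsize - 1] = '\0';
}

/* Phase 1, before connecting: wipe connection fields, then expose the row. */
static void
toy_publish_before_connect(ToyDisplayState *st)
{
	memset(st->conninfo, 0, sizeof(st->conninfo));
	memset(st->sender_host, 0, sizeof(st->sender_host));
	st->sender_port = 0;
	st->ready_to_display = true;
}

/* Phase 2, after connecting: store only the user-displayable values. */
static void
toy_publish_after_connect(ToyDisplayState *st, const char *display_conninfo,
						  const char *host, int port)
{
	if (display_conninfo)
		toy_copy(st->conninfo, display_conninfo, sizeof(st->conninfo));
	if (host)
		toy_copy(st->sender_host, host, sizeof(st->sender_host));
	st->sender_port = port;
}

/* Would the modeled view return a row at all? */
static bool
toy_row_visible(const ToyDisplayState *st)
{
	return st->ready_to_display;
}

/* conninfo column as the modeled view shows it: NULL when the field is empty. */
static const char *
toy_view_conninfo(const ToyDisplayState *st)
{
	return st->conninfo[0] != '\0' ? st->conninfo : NULL;
}
```

In this model the row becomes visible right after phase 1 with conninfo reported as NULL, and a connection string only appears once phase 2 has stored the sanitized one.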
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
Attachments:
[application/octet-stream] nocfbot_monitoring.sgml.diff (697B, 2-nocfbot_monitoring.sgml.diff)
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 08d5b824552..d2c8d547a91 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -1743,6 +1743,12 @@ description | Waiting for a newly initialized WAL file to reach durable storage
connected server.
</para>
+ <para>
+ When the WAL receiver is in <literal>connecting</literal> state, the time
+ fields in this view may reflect their initial values rather than values
+ received from the WAL sender.
+ </para>
+
<table id="pg-stat-wal-receiver-view" xreflabel="pg_stat_wal_receiver">
<title><structname>pg_stat_wal_receiver</structname> View</title>
<tgroup cols="1">
* Re: Fix pg_stat_wal_receiver to show CONNECTING status
From: Chao Li @ 2026-05-21 07:20 UTC (permalink / raw)
To: Michael Paquier <michael@paquier.xyz>; +Cc: pgsql-hackers; Michael Paquier <michael.paquier@gmail.com>; Xuneng Zhou <xunengzhou@gmail.com>
> On May 21, 2026, at 07:06, Chao Li <li.evan.chao@gmail.com> wrote:
>
>
>
>> On May 21, 2026, at 04:43, Michael Paquier <michael@paquier.xyz> wrote:
>>
>> On Wed, May 20, 2026 at 03:53:38PM +0800, Chao Li wrote:
>>> With v2, slot_name, sender_host, sender_port, and conninfo are
>>> already left NULL while the receiver is in CONNECTING state. I feel
>>> we don't have to show the timestamp fields either. Since the columns
>>> are named last_msg_send_time and last_msg_receipt_time, users may
>>> naturally interpret them as the last time a message was sent to or
>>> received from the primary. If we show the standby server start time
>>> in those columns, I am afraid that could be confusing.
>>>
>>> But I think it might be useful to show the *_lsn and *_tli values in
>>> CONNECTING state if they are available.
>>
>> The original reason why ready_to_display has been introduced is this
>> one, where we wanted to have a strict control over the connection
>> information across multiple calls of pg_stat_get_wal_receiver():
>> https://www.postgresql.org/message-id/CAB7nPqQNbHQ7F7wDD_2qvGA_FUW-Leds9HQNM6kJnto7RFNhUg@mail.gmail...
>>
>> With v2, ready_to_display is still able to do the job it is defined
>> for. This does not need to apply on the time fields, so IMO showing
>> them to the values they are initialized is not a big deal, and they
>> can actually be useful to know even in the early stage of connection
>> as they reveal the state of the code.
>>
>> Note also that the time values could still show up based on their
>> initial values at the early connection stage, even after completing
>> walrcv_connect() and after ready_to_display is switched to true, so
>> it's not like these values are that confusing: we just expose them a
>> bit more at an earlier stage of the connection attempt process. As a
>> whole v2 is fine, and addresses your issue.
>> --
>> Michael
>
> Thanks for the detailed explanation.
>
> Now I see that, based on the original discussion you pointed out, as long as v2 clears conninfo before setting ready_to_display to true, it is okay to do that earlier while the state is still CONNECTING. On that point, I’m good with v2.
>
> I’m still not fully convinced about displaying the *_time fields, but I don’t have a stronger argument either, so I’m fine with that. Maybe we can add a brief description in the doc like the attached diff?
>
> Overall, v2 looks good to me now.
>
> Best regards,
> --
> Chao Li (Evan)
> HighGo Software Co., Ltd.
> https://www.highgo.com/
>
>
>
>
> <nocfbot_monitoring.sgml.diff>
I spent more time here, and found that it is still possible to leak conninfo in the WAL receiver reuse path:
* WalRcvWaitForStartPosition() sets the state to WALRCV_WAITING.
* Then RequestXLogStreaming() copies raw conninfo into walrcv->conninfo and sets the state to WALRCV_RESTARTING.
* WalRcvWaitForStartPosition() then moves the state to WALRCV_CONNECTING, but this path does not clear walrcv->conninfo again.
The attached nocfbot_test.diff demonstrates the leak.
Initially I thought we could also set ready_to_display to false when setting the state to WALRCV_WAITING in WalRcvWaitForStartPosition(), and set it back to true when switching back to WALRCV_CONNECTING. However, that would make the WALRCV_WAITING and WALRCV_RESTARTING states invisible in pg_stat_wal_receiver.
I ended up with a solution that copies the primary connection info to walrcv->conninfo only when RequestXLogStreaming() is switching to WALRCV_STARTING. In the WALRCV_WAITING reuse path, the WAL receiver keeps using the existing wrconn, so it does not need raw conninfo to be copied into shared memory again. See the attached nocfbot_walreceiverfuncs.c.diff.
With that change, the new test passes. I also ran "make check-world" successfully.
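The reuse path described above can be modeled with a small standalone C sketch. It is a toy under stated assumptions, not the attached diff: ToyWalRcv, toy_request_streaming() and toy_reuse_path_leaks() are hypothetical names, the enum only mirrors the state names used in this thread, locking is omitted, and the "sanitized" string is a made-up stand-in for what walrcv_get_conninfo() would return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAXCONNINFO 1024

/* State names as used in this thread; values are arbitrary here. */
typedef enum
{
	WALRCV_STOPPED,
	WALRCV_STARTING,
	WALRCV_CONNECTING,
	WALRCV_STREAMING,
	WALRCV_WAITING,
	WALRCV_RESTARTING
} ToyWalRcvState;

/* Hypothetical cut-down version of the shared WAL receiver struct. */
typedef struct
{
	ToyWalRcvState walRcvState;
	bool		ready_to_display;
	char		conninfo[TOY_MAXCONNINFO];
} ToyWalRcv;

/*
 * Toy RequestXLogStreaming().  copy_only_on_launch = false always copies the
 * raw string (old behavior); true copies it only when launching a new
 * receiver from WALRCV_STOPPED (the proposed behavior).
 */
static void
toy_request_streaming(ToyWalRcv *walrcv, const char *conninfo,
					  bool copy_only_on_launch)
{
	bool		launch;

	assert(walrcv->walRcvState == WALRCV_STOPPED ||
		   walrcv->walRcvState == WALRCV_WAITING);
	launch = (walrcv->walRcvState == WALRCV_STOPPED);

	if (launch || !copy_only_on_launch)
	{
		if (conninfo != NULL)
		{
			/* strncpy plus explicit terminator, as a portable strlcpy stand-in */
			strncpy(walrcv->conninfo, conninfo, TOY_MAXCONNINFO - 1);
			walrcv->conninfo[TOY_MAXCONNINFO - 1] = '\0';
		}
		else
			walrcv->conninfo[0] = '\0';
	}

	walrcv->walRcvState = launch ? WALRCV_STARTING : WALRCV_RESTARTING;
}

/* Walk the reuse path; report whether the raw secret is visible at the end. */
static bool
toy_reuse_path_leaks(bool copy_only_on_launch)
{
	ToyWalRcv	walrcv;
	const char *raw = "host=primary password=dont_show_me";

	/* Receiver already connected: sanitized string stored, row visible. */
	memset(&walrcv, 0, sizeof(walrcv));
	walrcv.walRcvState = WALRCV_STREAMING;
	walrcv.ready_to_display = true;
	strcpy(walrcv.conninfo, "host=primary");

	/* End of timeline reached: the receiver waits for a new start position. */
	walrcv.walRcvState = WALRCV_WAITING;

	/* The startup process requests streaming again with the raw string. */
	toy_request_streaming(&walrcv, raw, copy_only_on_launch);

	/* The receiver resumes; nothing on this path clears conninfo. */
	walrcv.walRcvState = WALRCV_CONNECTING;

	return walrcv.ready_to_display &&
		strstr(walrcv.conninfo, "dont_show_me") != NULL;
}
```

With the old behavior, toy_reuse_path_leaks(false) reports a leak because the raw string overwrites the sanitized one while ready_to_display is still true. With the copy restricted to the launch path, toy_reuse_path_leaks(true) reports none, and the previously stored sanitized string is kept.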
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
* Re: Fix pg_stat_wal_receiver to show CONNECTING status
From: Michael Paquier @ 2026-05-21 12:08 UTC (permalink / raw)
To: Chao Li <li.evan.chao@gmail.com>; +Cc: pgsql-hackers; Michael Paquier <michael.paquier@gmail.com>; Xuneng Zhou <xunengzhou@gmail.com>
On Thu, May 21, 2026 at 03:20:13PM +0800, Chao Li wrote:
> I spent more time here, and found that it is still possible to leak
> conninfo in the WAL receiver reuse path:
>
> * WalRcvWaitForStartPosition() sets the state to WALRCV_WAITING.
> * Then RequestXLogStreaming() copies raw conninfo into
>   walrcv->conninfo and sets the state to WALRCV_RESTARTING.
> * WalRcvWaitForStartPosition() then moves the state to
>   WALRCV_CONNECTING, but this path does not clear walrcv->conninfo
>   again.
>
> The attached nocfbot_test.diff demonstrates the leak.
File is missing, but I get it. This is a legit bug from what I can
see, that also affects all the stable branches, not only HEAD.
> Initially I thought we could also set ready_to_display to false when
> setting the state to WALRCV_WAITING in WalRcvWaitForStartPosition(),
> and set it back to true when switching back to
> WALRCV_CONNECTING. However, that would make the WALRCV_WAITING and
> WALRCV_RESTARTING states invisible in pg_stat_wal_receiver.
Nah, we should not do that. We want to track the waiting and
restarting states in the view.
> I ended up with a solution that copies the primary connection info
> to walrcv->conninfo only when RequestXLogStreaming() is switching to
> WALRCV_STARTING. In the WALRCV_WAITING reuse path, the WAL receiver
> keeps using the existing wrconn, so it does not need raw conninfo to
> be copied into shared memory again. See the attached
> nocfbot_walreceiverfuncs.c.diff.
Ah, yeah. This solution to this problem makes sense. We should not
clobber conninfo either in this case, or we'd lose the
user-displayable string returned by walrcv_get_conninfo() (conninfo
cannot be NULL based on the in-core callers of RequestXLogStreaming()
AFAIK, but who knows for things out there). As mentioned above, this
is a different issue than the visibility of the connection information
while we are connecting, and it should be backpatched. Would you like
to send a patch?
--
Michael
Attachments:
[application/pgp-signature] signature.asc (833B, 2-signature.asc)
* Re: Fix pg_stat_wal_receiver to show CONNECTING status
From: Chao Li @ 2026-05-21 12:29 UTC (permalink / raw)
To: Michael Paquier <michael@paquier.xyz>; +Cc: pgsql-hackers; Michael Paquier <michael.paquier@gmail.com>; Xuneng Zhou <xunengzhou@gmail.com>
> On May 21, 2026, at 20:08, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, May 21, 2026 at 03:20:13PM +0800, Chao Li wrote:
>> I spent more time here, and found that it is still possible to leak
>> conninfo in the WAL receiver reuse path:
>>
>> * WalRcvWaitForStartPosition() sets the state to WALRCV_WAITING.
>> * Then RequestXLogStreaming() copies raw conninfo into
>>   walrcv->conninfo and sets the state to WALRCV_RESTARTING.
>> * WalRcvWaitForStartPosition() then moves the state to
>>   WALRCV_CONNECTING, but this path does not clear walrcv->conninfo
>>   again.
>>
>> The attached nocfbot_test.diff demonstrates the leak.
>
> File is missing, but I get it. This is a legit bug from what I can
> see, that also affects all the stable branches, not only HEAD.
>
>> Initially I thought we could also set ready_to_display to false when
>> setting the state to WALRCV_WAITING in WalRcvWaitForStartPosition(),
>> and set it back to true when switching back to
>> WALRCV_CONNECTING. However, that would make the WALRCV_WAITING and
>> WALRCV_RESTARTING states invisible in pg_stat_wal_receiver.
>
> Nah, we should not do that. We want to track the waiting and
> restarting states in the view.
>
>> I ended up with a solution that copies the primary connection info
>> to walrcv->conninfo only when RequestXLogStreaming() is switching to
>> WALRCV_STARTING. In the WALRCV_WAITING reuse path, the WAL receiver
>> keeps using the existing wrconn, so it does not need raw conninfo to
>> be copied into shared memory again. See the attached
>> nocfbot_walreceiverfuncs.c.diff.
>
> Ah, yeah. This solution to this problem makes sense. We should not
> clobber conninfo either in this case, or we'd lose the
> user-displayable string returned by walrcv_get_conninfo() (conninfo
> cannot be NULL based on the in-core callers of RequestXLogStreaming()
> AFAIK, but who knows for things out there). As mentioned above, this
> is a different issue than the visibility of the connection information
> while we are connecting, and it should be backpatched. Would you like
> to send a patch?
> --
> Michael
Sorry for the missing attachments. Please take a look first. It’s late here, so I can spend more time on this tomorrow.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
Attachments:
[application/octet-stream] nocfbot_test.diff (2.1K, 2-nocfbot_test.diff)
diff --git a/src/test/recovery/t/004_timeline_switch.pl b/src/test/recovery/t/004_timeline_switch.pl
index 5afd2f44466..85df9ad422c 100644
--- a/src/test/recovery/t/004_timeline_switch.pl
+++ b/src/test/recovery/t/004_timeline_switch.pl
@@ -48,10 +48,11 @@ $node_standby_1->psql(
is($psql_out, 't', "promotion of standby with pg_promote");
# Switch standby 2 to replay from standby 1
+my $secret = 'dont_show_me';
my $connstr_1 = $node_standby_1->connstr;
$node_standby_2->append_conf(
'postgresql.conf', qq(
-primary_conninfo='$connstr_1'
+primary_conninfo='$connstr_1 password=$secret'
));
# Rotate logfile before restarting, for the log checks done below.
@@ -62,9 +63,9 @@ $node_standby_2->restart;
# verify that after reconnection, the walreceiver stays alive during
# the timeline switch.
$node_standby_2->poll_query_until('postgres',
- "SELECT EXISTS(SELECT 1 FROM pg_stat_wal_receiver)");
+ "SELECT EXISTS(SELECT 1 FROM pg_stat_activity WHERE backend_type = 'walreceiver')");
my $wr_pid_before_switch = $node_standby_2->safe_psql('postgres',
- "SELECT pid FROM pg_stat_wal_receiver");
+ "SELECT pid FROM pg_stat_activity WHERE backend_type = 'walreceiver'");
# Insert some data in standby 1 and check its presence in standby 2
# to ensure that the timeline switch has been done.
@@ -88,11 +89,19 @@ ok( !$node_standby_2->log_contains(
# Verify that the walreceiver process stayed alive across the timeline
# switch, check its PID.
my $wr_pid_after_switch = $node_standby_2->safe_psql('postgres',
- "SELECT pid FROM pg_stat_wal_receiver");
+ "SELECT pid FROM pg_stat_activity WHERE backend_type = 'walreceiver'");
is($wr_pid_before_switch, $wr_pid_after_switch,
'WAL receiver PID matches across timeline jumps');
+my $raw_conninfo_count = $node_standby_2->safe_psql(
+ 'postgres',
+ "SELECT count(*) FROM pg_stat_wal_receiver WHERE conninfo LIKE '%$secret%'");
+
+is(
+ $raw_conninfo_count, '0',
+ 'raw primary_conninfo password is not visible after timeline jumps');
+
# Ensure that a standby is able to follow a primary on a newer timeline
# when WAL archiving is enabled.
[application/octet-stream] nocfbot_walreceiverfuncs.c.diff (1.0K, 3-nocfbot_walreceiverfuncs.c.diff)
diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c
index a0ed853e2f6..279c6c8a7e1 100644
--- a/src/backend/replication/walreceiverfuncs.c
+++ b/src/backend/replication/walreceiverfuncs.c
@@ -281,11 +281,6 @@ RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr, const char *conninfo,
Assert(walrcv->walRcvState == WALRCV_STOPPED ||
walrcv->walRcvState == WALRCV_WAITING);
- if (conninfo != NULL)
- strlcpy(walrcv->conninfo, conninfo, MAXCONNINFO);
- else
- walrcv->conninfo[0] = '\0';
-
/*
* Use configured replication slot if present, and ignore the value of
* create_temp_slot as the slot name should be persistent. Otherwise, use
@@ -307,6 +302,10 @@ RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr, const char *conninfo,
{
launch = true;
walrcv->walRcvState = WALRCV_STARTING;
+ if (conninfo != NULL)
+ strlcpy(walrcv->conninfo, conninfo, MAXCONNINFO);
+ else
+ walrcv->conninfo[0] = '\0';
}
else
walrcv->walRcvState = WALRCV_RESTARTING;
* Re: Fix pg_stat_wal_receiver to show CONNECTING status
From: Chao Li @ 2026-05-22 02:06 UTC (permalink / raw)
To: Michael Paquier <michael@paquier.xyz>; +Cc: pgsql-hackers; Michael Paquier <michael.paquier@gmail.com>; Xuneng Zhou <xunengzhou@gmail.com>
> On May 21, 2026, at 20:29, Chao Li <li.evan.chao@gmail.com> wrote:
>
>
>
>> On May 21, 2026, at 20:08, Michael Paquier <michael@paquier.xyz> wrote:
>>
>> On Thu, May 21, 2026 at 03:20:13PM +0800, Chao Li wrote:
>>> I spent more time here, and found that it is still possible to leak
>>> conninfo in the WAL receiver reuse path:
>>>
>>> * WalRcvWaitForStartPosition() sets the state to WALRCV_WAITING.
>>> * Then RequestXLogStreaming() copies raw conninfo into
>>>   walrcv->conninfo and sets the state to WALRCV_RESTARTING.
>>> * WalRcvWaitForStartPosition() then moves the state to
>>>   WALRCV_CONNECTING, but this path does not clear walrcv->conninfo
>>>   again.
>>>
>>> The attached nocfbot_test.diff demonstrates the leak.
>>
>> File is missing, but I get it. This is a legit bug from what I can
>> see, that also affects all the stable branches, not only HEAD.
>>
>>> Initially I thought we could also set ready_to_display to false when
>>> setting the state to WALRCV_WAITING in WalRcvWaitForStartPosition(),
>>> and set it back to true when switching back to
>>> WALRCV_CONNECTING. However, that would make the WALRCV_WAITING and
>>> WALRCV_RESTARTING states invisible in pg_stat_wal_receiver.
>>
>> Nah, we should not do that. We want to track the waiting and
>> restarting states in the view.
>>
>>> I ended up with a solution that copies the primary connection info
>>> to walrcv->conninfo only when RequestXLogStreaming() is switching to
>>> WALRCV_STARTING. In the WALRCV_WAITING reuse path, the WAL receiver
>>> keeps using the existing wrconn, so it does not need raw conninfo to
>>> be copied into shared memory again. See the attached
>>> nocfbot_walreceiverfuncs.c.diff.
>>
>> Ah, yeah. This solution to this problem makes sense. We should not
>> clobber conninfo either in this case, or we'd lose the
>> user-displayable string returned by walrcv_get_conninfo() (conninfo
>> cannot be NULL based on the in-core callers of RequestXLogStreaming()
>> AFAIK, but who knows for things out there). As mentioned above, this
>> is a different issue than the visibility of the connection information
>> while we are connecting, and it should be backpatched. Would you like
>> to send a patch?
>> --
>> Michael
>
> Sorry for missing the attachments. Please take a look first. It’s late here, I can spend more time tomorrow.
>
> Best regards,
> --
> Chao Li (Evan)
> HighGo Software Co., Ltd.
> https://www.highgo.com/
>
>
>
>
> <nocfbot_test.diff><nocfbot_walreceiverfuncs.c.diff>
Here comes the patch set:
* v3-0001 is exactly the same as v2-0001
* In v3-0002, the change in walreceiverfuncs.c is the same as the previous diff, and I tuned the test change a little bit.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
Attachments:
[application/octet-stream] v3-0001-Improve-pg_stat_wal_receiver-for-CONNECTING-statu.patch (3.4K, 2-v3-0001-Improve-pg_stat_wal_receiver-for-CONNECTING-statu.patch)
From f0091c46ddfba03cfc7f7a1e4154041694edcda6 Mon Sep 17 00:00:00 2001
From: Michael Paquier <michael@paquier.xyz>
Date: Tue, 19 May 2026 22:52:38 +0900
Subject: [PATCH v3 1/2] Improve pg_stat_wal_receiver for CONNECTING status
Commit a36164e7465 added a CONNECTING status for the WAL receiver, but
pg_stat_wal_receiver returned no information while the connection to the
primary was attempted, limiting the usability of the feature in
high-latency environments where the connection attempt to the primary
could take time.
This commit improves the report of the status by splitting the way the
shared memory state of the WAL receiver is filled before and after the
connection to the primary is attempted:
- Before the attempt, reset all the connection fields, switch
ready_to_display to true.
- After the attempt, fill in the connection fields.
This change means two spinlock acquisitions instead of one, but at least
monitoring tools can know about the connection attempt before its
completion, enlarging the usability of the feature.
Reported-by: Chao Li <li.evan.chao@gmail.com>
Author: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/XXX
---
src/backend/replication/walreceiver.c | 24 ++++++++++++++++--------
1 file changed, 16 insertions(+), 8 deletions(-)
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 07eac07b9ce..d19317703c1 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -267,6 +267,20 @@ WalReceiverMain(const void *startup_data, size_t startup_data_len)
/* Unblock signals (they were blocked when the postmaster forked us) */
sigprocmask(SIG_SETMASK, &UnBlockSig, NULL);
+ /*
+ * Switch the WAL receiver state as ready for display before doing a
+ * connection attempt, so as its connecting state is visible before
+ * attempting to contact the primary server. Note that this resets the
+ * original conninfo, sender_port and sender_host, for security. These
+ * fields are filled once the connection is fully established.
+ */
+ SpinLockAcquire(&walrcv->mutex);
+ memset(walrcv->conninfo, 0, MAXCONNINFO);
+ memset(walrcv->sender_host, 0, NI_MAXHOST);
+ walrcv->sender_port = 0;
+ walrcv->ready_to_display = true;
+ SpinLockRelease(&walrcv->mutex);
+
/* Establish the connection to the primary for XLOG streaming */
appname = cluster_name[0] ? cluster_name : "walreceiver";
wrconn = walrcv_connect(conninfo, true, false, false, appname, &err);
@@ -277,23 +291,17 @@ WalReceiverMain(const void *startup_data, size_t startup_data_len)
appname, err)));
/*
- * Save user-visible connection string. This clobbers the original
- * conninfo, for security. Also save host and port of the sender server
- * this walreceiver is connected to.
+ * Save user-visible connection string, now that the connection has been
+ * achieved.
*/
tmp_conninfo = walrcv_get_conninfo(wrconn);
walrcv_get_senderinfo(wrconn, &sender_host, &sender_port);
SpinLockAcquire(&walrcv->mutex);
- memset(walrcv->conninfo, 0, MAXCONNINFO);
if (tmp_conninfo)
strlcpy(walrcv->conninfo, tmp_conninfo, MAXCONNINFO);
-
- memset(walrcv->sender_host, 0, NI_MAXHOST);
if (sender_host)
strlcpy(walrcv->sender_host, sender_host, NI_MAXHOST);
-
walrcv->sender_port = sender_port;
- walrcv->ready_to_display = true;
SpinLockRelease(&walrcv->mutex);
if (tmp_conninfo)
--
2.50.1 (Apple Git-155)
[application/octet-stream] v3-0002-Avoid-exposing-raw-WAL-receiver-conninfo-during-t.patch (3.9K, 3-v3-0002-Avoid-exposing-raw-WAL-receiver-conninfo-during-t.patch)
From 4dfab1d904ac6b86a7452d011e86379fba86623b Mon Sep 17 00:00:00 2001
From: "Chao Li (Evan)" <lic@highgo.com>
Date: Fri, 22 May 2026 09:46:16 +0800
Subject: [PATCH v3 2/2] Avoid exposing raw WAL receiver conninfo during
timeline jumps
When reusing an existing WAL receiver after it has reached
WALRCV_WAITING, RequestXLogStreaming() copied PrimaryConnInfo into
WalRcv->conninfo before switching the state to WALRCV_RESTARTING. At
that point ready_to_display could still be true, so pg_stat_wal_receiver
could expose the raw connection string, including sensitive fields.
WALRCV_RESTARTING does not establish a new connection. The waiting WAL
receiver reuses its existing connection and only needs a new startpoint
and timeline, so there is no need to copy the raw connection string into
shared memory again. Only copy conninfo when launching a new WAL receiver
from WALRCV_STOPPED.
Add coverage to the timeline-switch test to verify that the WAL receiver
process remains visible in pg_stat_wal_receiver across the jump, while a
raw password in primary_conninfo is not exposed.
Author: Chao Li <lic@highgo.com>
Reviewed-by:
Discussion: https://postgr.es/m/EF91FF76-1E2B-4F3B-9162-290B4DC517FF@gmail.com
---
src/backend/replication/walreceiverfuncs.c | 9 ++++-----
src/test/recovery/t/004_timeline_switch.pl | 16 ++++++++++++++--
2 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c
index a0ed853e2f6..279c6c8a7e1 100644
--- a/src/backend/replication/walreceiverfuncs.c
+++ b/src/backend/replication/walreceiverfuncs.c
@@ -281,11 +281,6 @@ RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr, const char *conninfo,
Assert(walrcv->walRcvState == WALRCV_STOPPED ||
walrcv->walRcvState == WALRCV_WAITING);
- if (conninfo != NULL)
- strlcpy(walrcv->conninfo, conninfo, MAXCONNINFO);
- else
- walrcv->conninfo[0] = '\0';
-
/*
* Use configured replication slot if present, and ignore the value of
* create_temp_slot as the slot name should be persistent. Otherwise, use
@@ -307,6 +302,10 @@ RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr, const char *conninfo,
{
launch = true;
walrcv->walRcvState = WALRCV_STARTING;
+ if (conninfo != NULL)
+ strlcpy(walrcv->conninfo, conninfo, MAXCONNINFO);
+ else
+ walrcv->conninfo[0] = '\0';
}
else
walrcv->walRcvState = WALRCV_RESTARTING;
diff --git a/src/test/recovery/t/004_timeline_switch.pl b/src/test/recovery/t/004_timeline_switch.pl
index 5afd2f44466..eb7cfa9f8e0 100644
--- a/src/test/recovery/t/004_timeline_switch.pl
+++ b/src/test/recovery/t/004_timeline_switch.pl
@@ -47,11 +47,15 @@ $node_standby_1->psql(
stdout => \$psql_out);
is($psql_out, 't', "promotion of standby with pg_promote");
-# Switch standby 2 to replay from standby 1
+# Switch standby 2 to replay from standby 1. During the timeline switch,
+# the WAL receiver process on standby 2 should not be stopped, and the
+# new primary connection string should not be visible
+# in pg_stat_wal_receiver.
+my $secret = 'dont_show_me';
my $connstr_1 = $node_standby_1->connstr;
$node_standby_2->append_conf(
'postgresql.conf', qq(
-primary_conninfo='$connstr_1'
+primary_conninfo='$connstr_1 password=$secret'
));
# Rotate logfile before restarting, for the log checks done below.
@@ -93,6 +97,14 @@ my $wr_pid_after_switch = $node_standby_2->safe_psql('postgres',
is($wr_pid_before_switch, $wr_pid_after_switch,
'WAL receiver PID matches across timeline jumps');
+my $raw_conninfo_count = $node_standby_2->safe_psql(
+ 'postgres',
+ "SELECT count(*) FROM pg_stat_wal_receiver WHERE conninfo LIKE '%$secret%'");
+
+is(
+ $raw_conninfo_count, '0',
+ 'raw primary_conninfo password is not visible after timeline jumps');
+
# Ensure that a standby is able to follow a primary on a newer timeline
# when WAL archiving is enabled.
--
2.50.1 (Apple Git-155)