From: Chao Li <li.evan.chao@gmail.com>
To: Michael Paquier <michael@paquier.xyz>
Cc: PostgreSQL-development <pgsql-hackers@postgresql.org>
Cc: Michael Paquier <michael.paquier@gmail.com>
Cc: Xuneng Zhou <xunengzhou@gmail.com>
Subject: Re: Fix pg_stat_wal_receiver to show CONNECTING status
Date: Thu, 21 May 2026 15:20:13 +0800
Message-ID: <E2A0E5FB-E841-4D4B-96F8-B4016359CDF0@gmail.com> (raw)
In-Reply-To: <1B695040-F544-447C-A6A8-C8BFF7F799D1@gmail.com>
References: <EF91FF76-1E2B-4F3B-9162-290B4DC517FF@gmail.com>
	<agxr29Hsz7FjxzlN@paquier.xyz>
	<1F153E64-B791-42FA-A60A-64813B20B81E@gmail.com>
	<ag00OeH1sbt5ie_6@paquier.xyz>
	<75CDE990-29D5-4D5C-BFE1-3840F19C0163@gmail.com>
	<ag4dCGAPBc5VFhCi@paquier.xyz>
	<1B695040-F544-447C-A6A8-C8BFF7F799D1@gmail.com>



> On May 21, 2026, at 07:06, Chao Li <li.evan.chao@gmail.com> wrote:
> 
> 
> 
>> On May 21, 2026, at 04:43, Michael Paquier <michael@paquier.xyz> wrote:
>> 
>> On Wed, May 20, 2026 at 03:53:38PM +0800, Chao Li wrote:
>>> With v2, slot_name, sender_host, sender_port, and conninfo are
>>> already left NULL while the receiver is in CONNECTING state. I feel
>>> we don't have to show the timestamp fields either. Since the columns
>>> are named last_msg_send_time and last_msg_receipt_time, users may
>>> naturally interpret them as the last time a message was sent to or
>>> received from the primary. If we show the standby server start time
>>> in those columns, I am afraid that could be confusing.
>>> 
>>> But I think it might be useful to show the *_lsn and *_tli values in
>>> CONNECTING state if they are available.
>> 
>> The original reason why ready_to_display has been introduced is this
>> one, where we wanted to have a strict control over the connection
>> information across multiple calls of pg_stat_get_wal_receiver():
>> https://www.postgresql.org/message-id/CAB7nPqQNbHQ7F7wDD_2qvGA_FUW-Leds9HQNM6kJnto7RFNhUg@mail.gmail...
>> 
>> With v2, ready_to_display is still able to do the job it is defined
>> for.  This does not need to apply on the time fields, so IMO showing
>> them to the values they are initialized is not a big deal, and they
>> can actually be useful to know even in the early stage of connection
>> as they reveal the state of the code.  
>> 
>> Note also that the time values could still show up based on their
>> initial values at the early connection stage, even after completing
>> walrcv_connect() and after ready_to_display is switched to true, so
>> it's not like these values are that confusing: we just expose them a
>> bit more at an earlier stage of the connection attempt process.  As a
>> whole v2 is fine, and addresses your issue.
>> --
>> Michael
> 
> Thanks for the detailed explanation.
> 
> Now I see that, based on the original discussion you pointed out, as long as v2 clears conninfo before setting ready_to_display to true, it is okay to do that earlier while the state is still CONNECTING. On that point, I’m good with v2.
> 
> I’m still not fully convinced about displaying the *_time fields, but I don’t have a stronger argument either, so I’m fine with that. Maybe we can add a brief description in the doc like the attached diff?
> 
> Overall, v2 looks good to me now.
> 
> Best regards,
> --
> Chao Li (Evan)
> HighGo Software Co., Ltd.
> https://www.highgo.com/
> 
> 
> 
> 
> <nocfbot_monitoring.sgml.diff>

I spent more time on this and found that it is still possible to leak conninfo in the WAL receiver reuse path:

* WalRcvWaitForStartPosition() sets the state to WALRCV_WAITING.
* Then RequestXLogStreaming() copies raw conninfo into walrcv->conninfo and sets the state to WALRCV_RESTARTING.
* WalRcvWaitForStartPosition() then moves the state to WALRCV_CONNECTING, but this path does not clear walrcv->conninfo again.

The attached nocfbot_test.diff demonstrates the leak.
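
As a rough illustration of the sequence above, here is a toy model in C. It is not the PostgreSQL source and not the attached test: ToyWalRcv, the TOY_* states, and the toy_* functions are simplified stand-ins invented for this sketch, and the real code runs the two sides in different processes under a spinlock. It only shows why a raw conninfo copied during the reuse path stays visible once the state reaches CONNECTING with ready_to_display already true.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-ins for the shared WAL receiver state; hypothetical names. */
typedef enum
{
    TOY_STOPPED,
    TOY_STARTING,
    TOY_CONNECTING,
    TOY_STREAMING,
    TOY_WAITING,
    TOY_RESTARTING
} ToyState;

typedef struct
{
    ToyState state;
    char     conninfo[64];
    int      ready_to_display;   /* stays true after the first connection */
} ToyWalRcv;

/* Bounded copy that always NUL-terminates the destination. */
static void
toy_copy_conninfo(ToyWalRcv *w, const char *src)
{
    strncpy(w->conninfo, src, sizeof(w->conninfo) - 1);
    w->conninfo[sizeof(w->conninfo) - 1] = '\0';
}

/* Models RequestXLogStreaming() as described: raw conninfo is always copied. */
static void
toy_request_streaming(ToyWalRcv *w, const char *raw_conninfo)
{
    assert(w->state == TOY_STOPPED || w->state == TOY_WAITING);
    toy_copy_conninfo(w, raw_conninfo);
    w->state = (w->state == TOY_STOPPED) ? TOY_STARTING : TOY_RESTARTING;
}

/* Models the reuse path of WalRcvWaitForStartPosition(). */
static void
toy_wait_for_start_position(ToyWalRcv *w, const char *raw_conninfo)
{
    w->state = TOY_WAITING;
    /* In reality another process makes this call while the receiver waits. */
    toy_request_streaming(w, raw_conninfo);
    assert(w->state == TOY_RESTARTING);
    /* The state moves on, but conninfo is not cleared on this path. */
    w->state = TOY_CONNECTING;
}

/* What a monitoring view would expose in this model. */
static const char *
toy_visible_conninfo(const ToyWalRcv *w)
{
    return w->ready_to_display ? w->conninfo : NULL;
}
```

Starting from a streaming receiver whose conninfo was already sanitized, calling toy_wait_for_start_position() with a raw string leaves that raw string readable through toy_visible_conninfo().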

Initially I thought we could also set ready_to_display to false when setting the state to WALRCV_WAITING in WalRcvWaitForStartPosition(), and set it back to true when switching back to WALRCV_CONNECTING. However, that would make the WALRCV_WAITING and WALRCV_RESTARTING states invisible in pg_stat_wal_receiver.

I ended up with a solution that copies the primary connection info to walrcv->conninfo only when RequestXLogStreaming() is switching to WALRCV_STARTING. In the WALRCV_WAITING reuse path, the WAL receiver keeps using the existing wrconn, so it does not need raw conninfo to be copied into shared memory again. See the attached nocfbot_walreceiverfuncs.c.diff.
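
The idea can be sketched as follows. This is only a toy model of what the paragraph above describes, not the contents of the attached diff; ToyWalRcv, the TOY_* states, and toy_request_streaming_fixed() are hypothetical names, and locking and the other fields set by RequestXLogStreaming() are left out.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-ins for the shared WAL receiver state; hypothetical names. */
typedef enum
{
    TOY_STOPPED,
    TOY_STARTING,
    TOY_WAITING,
    TOY_RESTARTING
} ToyState;

typedef struct
{
    ToyState state;
    char     conninfo[64];
    int      ready_to_display;
} ToyWalRcv;

/*
 * Copy the raw conninfo only when a new WAL receiver is being launched
 * (STOPPED -> STARTING).  In the reuse path (WAITING -> RESTARTING) the
 * receiver keeps its existing connection, so the shared conninfo is
 * left untouched.
 */
static void
toy_request_streaming_fixed(ToyWalRcv *w, const char *raw_conninfo)
{
    assert(w->state == TOY_STOPPED || w->state == TOY_WAITING);

    if (w->state == TOY_STOPPED)
    {
        strncpy(w->conninfo, raw_conninfo, sizeof(w->conninfo) - 1);
        w->conninfo[sizeof(w->conninfo) - 1] = '\0';
        w->state = TOY_STARTING;
    }
    else
        w->state = TOY_RESTARTING;
}
```

In this model, a receiver in the WAITING state keeps whatever sanitized string it already had, while a STOPPED receiver still gets the raw string it needs to connect.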

With that change, the new test passes. I also ran "make check-world" successfully.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/









