From: Priya V
Date: Thu, 9 Apr 2026 17:39:06 -0500
Subject: Async standby lag + physical slot + hot_standby_feedback=on appeared to degrade primary performance
To: pgsql-performance@lists.postgresql.org

Hi all,

I’m looking for insight into a behavior we observed in a PostgreSQL physical replication setup.

Environment:

  • PostgreSQL version: 15.14
  • DB size: 282 GB
  • Environment: AWS EC2
  • PR = primary
  • HA = synchronous standby
  • DP = asynchronous standby
  • DP used a physical replication slot
  • DP had hot_standby_feedback = on

Observed behavior:

  • DP fell behind PR, with replication lag reaching about 400 GB
  • There were no user queries running on DP
  • During this period, query performance on PR degraded and an application backlog built up on PR
  • After removing DP from replication, PR performance improved gradually over about 1 to 2 hours, not immediately
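
As a minimal sketch of how a lag figure like the one above can be derived, assuming lag is measured as the byte distance between two pg_lsn values (for example the primary's current WAL position and the standby's replay position): a pg_lsn is printed as two hexadecimal numbers separated by a slash, which together form a 64-bit byte position. The function names and the sample LSNs below are hypothetical and are not values from the incident.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a pg_lsn string such as '16/B374D848' to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(primary_lsn: str, standby_lsn: str) -> int:
    """Bytes of WAL between the primary's position and the standby's position."""
    return lsn_to_int(primary_lsn) - lsn_to_int(standby_lsn)


if __name__ == "__main__":
    # Hypothetical LSN values, chosen only to illustrate the arithmetic.
    lag = lag_bytes("1A4/0", "140/0")
    print(f"{lag / 1024**3:.0f} GiB behind")
```

On a live primary the same difference can be obtained in SQL with pg_wal_lsn_diff(); the Python version is only meant for working with LSNs copied out of logs or monitoring after the fact.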

Why this is confusing:

  • DP was async, so this does not appear to be synchronous commit wait
  • There were no active queries on DP at the time we checked
  • The delayed recovery on PR makes me wonder whether cleanup on PR had been held back for some time, causing dead tuple accumulation, bloat, and an autovacuum backlog, so that removing DP only let PR recover gradually afterward

My questions:

  1. In an async physical standby setup, can a lagging standby with a physical slot and hot_standby_feedback=on still hold back VACUUM cleanup on the primary even when no queries are currently running on the standby?
  2. Can an old or stale slot xmin on the primary explain this kind of behavior?
  3. Does the 1–2 hour gradual recovery after removing DP point more toward cleanup debt / dead tuple buildup / bloat on PR, WAL retention / storage pressure, or a combination of both?
  4. What PR-side evidence would best confirm the root cause after the fact? For example:
    • pg_stat_replication.backend_xmin
    • pg_replication_slots.xmin
    • pg_replication_slots.restart_lsn
    • pg_stat_user_tables.n_dead_tup
    • autovacuum activity on heavily updated tables
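
To make the list above concrete, here is a rough sketch of how those PR-side checks could be collected in one pass. It assumes a DB-API 2.0 connection to the primary (for instance from psycopg2.connect(), which is an assumption and not something stated above). The views and columns are the PostgreSQL 15 ones named in the list; the CHECKS dictionary, the run_checks() helper, and the LIMIT of 20 are hypothetical choices for illustration.

```python
# Catalog queries for the evidence listed above (views and columns as in PostgreSQL 15).
CHECKS = {
    "slots": """
        SELECT slot_name, slot_type, active, xmin, age(xmin) AS xmin_age,
               restart_lsn,
               pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_wal_bytes
        FROM pg_replication_slots
    """,
    "replication": """
        SELECT application_name, state, sync_state, backend_xmin,
               age(backend_xmin) AS backend_xmin_age, replay_lsn
        FROM pg_stat_replication
    """,
    "dead_tuples": """
        SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
        FROM pg_stat_user_tables
        ORDER BY n_dead_tup DESC
        LIMIT 20
    """,
}


def run_checks(conn):
    """Run each query on a DB-API 2.0 connection; return {name: list of row dicts}."""
    results = {}
    for name, sql in CHECKS.items():
        cur = conn.cursor()
        cur.execute(sql)
        # cursor.description holds one tuple per column; item 0 is the column name.
        columns = [col[0] for col in cur.description]
        results[name] = [dict(zip(columns, row)) for row in cur.fetchall()]
        cur.close()
    return results
```

These views only show the current state, so a sketch like this is mainly useful while the slot still exists or if the situation recurs.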

Any insights would be appreciated.

Thanks.

