Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vKoH7-009PcA-21 for pgsql-hackers@arkaria.postgresql.org; Mon, 17 Nov 2025 01:43:24 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vKoH5-00Bm2P-1N for pgsql-hackers@arkaria.postgresql.org; Mon, 17 Nov 2025 01:43:23 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vKoH4-00Bm2H-3C for pgsql-hackers@lists.postgresql.org; Mon, 17 Nov 2025 01:43:23 +0000 Received: from mail-ed1-x52f.google.com ([2a00:1450:4864:20::52f]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vKoH2-007UDl-1o for pgsql-hackers@lists.postgresql.org; Mon, 17 Nov 2025 01:43:21 +0000 Received: by mail-ed1-x52f.google.com with SMTP id 4fb4d7f45d1cf-640c6577120so6492452a12.1 for ; Sun, 16 Nov 2025 17:43:20 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1763343798; x=1763948598; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=HgoN/wtGUTYKYSpLhoX3NhelqvEjZ427OfnFc0Vq4MQ=; b=EBpJIcAJmgQ6mwx7/+kcM+urb41EbLxpUZU6D+yjArukybtc0uPGkPWt65zc2N7k3p 5mYXMclq3ypErZaVFt6EDne1K3g7r/HpUOU5lSHuZMqU6Xvxt+/ucTTWpznQUUvxbU80 yxpbnWRcFqs/sZ4EC+8E4lssEXo4fpNr3Zs+PNDHSzirnzYGWxo3TKX8VZeT2f6n9Fxn oBqx5+QlZj2g5T8lKn4IO0Tw9TvaRM3dthdvfLDolIb2Ty/vuiuGlPuBG07KHi+KyZsF Hy+/Bu4alzrPUvVkWDF6TDAnNJi8zS5Ar14pvkC8V0j0ko7HwSgaVMPMFrUtGWkWiOAn wphg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1763343798; x=1763948598; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=HgoN/wtGUTYKYSpLhoX3NhelqvEjZ427OfnFc0Vq4MQ=; b=ofb40gPIW03apw1clNJjKVkVIEHIWMYrQ0Y1Nq0p+U/i7xDRBPJ/wI2eQJJLAo7kEH 3sp7zK4oIDB9IsA2dBqYc3wqnvDL4bxr5cvG5hUjZSOc6czmRtpV+ebJkUVYAwsQkQvH 4IdI/uhUgrveuE/2D09iRR5PNH4OGaPjxRx4GSFAMSh5uMOgmsvjKv7jt1heJtOGloCm eTvw233sowTTNusPitFi6yIwCToBO/0DcZQ9lWRMxYC1OwC7w/K5iSSZ5M2zJQRcqiL+ g2oS52rE4ulvvKMV3NwLKpuwJJuIYbLW4kESu62tq0NrwC2qcaZjRcZm/MJmf97xl0gf A3pQ== X-Forwarded-Encrypted: i=1; AJvYcCUbQVIkuzto2TcmiL5HD6kSiyDTonuCtXP3Lcjjv1fAzUzzO3D6Va3hUhs3o8PmjBlHA06N0GPjD6SRH/V0@lists.postgresql.org X-Gm-Message-State: AOJu0Yz5GQuXJRiijPE/ptLL+DQPk647eRU2+GKA9XWJKvv09pUd7cfc 5TNHrbLLLEN0tWU5aDzJ7cjjLzks//Qg+5pF8JzeoO2DLs7CaZWvLWq0/aMi1Lcw2LhS1EKd3bw p/Up59oH91NaqTM0deH4Qszso4yCYYcA= X-Gm-Gg: ASbGncuZ4P9jmtR0DaAugisrl+UsrS5PTLVur1Aju9a3WivrdEbbfRkqqIsj6dgel4j QtRSq+9NDNQ5qIwGFFRe3c8D7hMVbU+p4trdkCFOfqPflCQDiLdiOAwZlsFbMTHNm6t8elUuGM9 GEXMoZqT1KqgZxEH+QLQjIq38fz/7WUYlsjtD8GHMa4Wq5Qcf1nEGZqiUtUwHnItf4X+sfd4dr7 YjVWLw5Ou8mmpMWuEu95n3+ZeHhKHjN8dQn3ctASoX7ydNjuxCw2/PC41JYUx/LlJb2DQ== X-Google-Smtp-Source: AGHT+IEIBxgw6P085dOGNzAUxA6rp8gXOaUgx5X2qw+/ybXeHHLIAm4J5s+KfzwZRHOjaXDmpPPuRSmR8mW8ZZypwl4= X-Received: by 2002:a05:6402:51c7:b0:640:9aed:6ab6 with SMTP id 4fb4d7f45d1cf-64350e8a031mr9005876a12.24.1763343797608; Sun, 16 Nov 2025 17:43:17 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: wenhui qiu Date: Mon, 17 Nov 2025 09:43:05 +0800 X-Gm-Features: AWmQ_bm7GAY-ujHrTpN7B4-s9sRUKqjg5r8VpcbD_a6s8Fyev2mvHB_k-BZC1NU Message-ID: Subject: Re: Report oldest xmin source when autovacuum cannot remove tuples To: Sami Imseih Cc: Shinya Kato , PostgreSQL Hackers Content-Type: multipart/alternative; boundary="000000000000ac63660643c07873" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000ac63660643c07873 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi Sami > Thinking about point 3 above, I began to wonder if this > whole thing can be simplified with inspiration. Looking at the > existing BackendXidGetPid(), I think it can. > Based on BackendXidGetPid(), I tried a new routine called > BackendXidFindCutOffReason() which can take in the cutoff xmin, > passed in by vacuum and can walk though the proc array and > determine the reason. We don't need to touch ComputeXidHorizons() > to make this work, it seems to me. This comes with an additional > walk though the procarray holding a shared lock, but I don't think > this will be an issue. > Attached is a rough sketch of BackendXidFindCutOffReason() > For now, I just added NOTICE messages which will log with > VACUUM (verbose) for testing. I like your idea , I think we also could consider introducing a GUC parameter in the future, which would terminate sessions blocking vacuum operations when the table's age reaches vacuum_failsafe_age. Thanks On Sat, Nov 15, 2025 at 8:25=E2=80=AFAM Sami Imseih w= rote: > Thanks for starting this thread! This is a very useful > feature that users will find beneficial to easily narrow > down the reason the xmin horizon is being held back, > and take action. > > Adding this information to the vacuum logging is useful, but > I can see this information being exposed in a view as well in > the future. > > I have a few comments: > > A few minor ones: > > 1/ pid should be declared as "pid_t" > > 2/ last value of an enum should be have a traling comma > +typedef enum OldestXminSource > +{ > + OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION, > + OLDESTXMIN_SOURCE_HOT_STANDBY_FEEDBACK, > + OLDESTXMIN_SOURCE_PREPARED_TRANSACTION, > + OLDESTXMIN_SOURCE_REPLICATION_SLOT, > + OLDESTXMIN_SOURCE_OTHER > +} OldestXminSource; > > More importantly: > > 3/ As mentioned earlier in the thread, the "idle-in-transaction" > transactions is not being reported correctly, particularly for write > tansactions. I think that is an important missing case. The reason > for this is the cutoff xmin is not being looked up against the current > list of xid's, so we are not blaming the correct pid. > > 4/ > Thinking about point 3 above, I began to wonder if this > whole thing can be simplified with inspiration. Looking at the > existing BackendXidGetPid(), I think it can. > > Based on BackendXidGetPid(), I tried a new routine called > BackendXidFindCutOffReason() which can take in the cutoff xmin, > passed in by vacuum and can walk though the proc array and > determine the reason. We don't need to touch ComputeXidHorizons() > to make this work, it seems to me. This comes with an additional > walk though the procarray holding a shared lock, but I don't think > this will be an issue. > > Attached is a rough sketch of BackendXidFindCutOffReason() > For now, I just added NOTICE messages which will log with > VACUUM (verbose) for testing. > > This takes what you are doing in v1 inside ComputeXidHorizons() > into a new routine. I think this is a cleaner approach. > > 5/ Also, I think we should also include tests for serializable > transactions > > > What do you think? > > -- > > Sami Imseih > Amazon Web Services (AWS) > --000000000000ac63660643c07873 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi Sami=C2=A0

> Thinking about point 3 above, I= began to wonder if this
>=C2=A0whole thing can be simplified with inspiration. Looking at th= e
>=C2=A0existing= =C2=A0 BackendXidGetPid(), I think it can.

>=C2=A0Based on BackendXidGetPid(), I tried a = new routine called
>=C2= =A0BackendXidFindCutOffReason() which can take in the cutoff xmin,>=C2=A0passed in by= vacuum and can walk though the proc array and
>=C2=A0determine the reason. We don't need= to touch ComputeXidHorizons()
>=C2=A0to make this work, it seems to me. This comes with an a= dditional
>=C2=A0w= alk though the procarray holding a shared lock, but I don't think
&g= t;=C2=A0this will be an = issue.

>=C2=A0= Attached is a rough sketch of BackendXidFindCutOffReason()
>=C2=A0For now, I just added NOTIC= E messages which will log with
>=C2=A0VACUUM (verbose) for testing.
I like your= idea , I think we also could consider introducing a GUC parameter in the f= uture, which would terminate sessions blocking vacuum operations when the t= able's age reaches vacuum_failsafe_age.

Thanks=C2=A0

On Sat, Nov 15, 2025 a= t 8:25=E2=80=AFAM Sami Imseih <sa= mimseih@gmail.com> wrote:
Thanks for start= ing this thread! This is a very useful
feature that users will find beneficial to easily narrow
down the reason the xmin horizon is being held back,
and take action.

Adding this information to the vacuum logging is useful, but
I can see this information being exposed in a view as well in
the future.

I have a few comments:

A few minor ones:

1/ pid should be declared as "pid_t"

2/ last value of an enum should be have a traling comma
+typedef enum OldestXminSource
+{
+=C2=A0 =C2=A0 =C2=A0 =C2=A0OLDESTXMIN_SOURCE_ACTIVE_TRANSACTION,
+=C2=A0 =C2=A0 =C2=A0 =C2=A0OLDESTXMIN_SOURCE_HOT_STANDBY_FEEDBACK,
+=C2=A0 =C2=A0 =C2=A0 =C2=A0OLDESTXMIN_SOURCE_PREPARED_TRANSACTION,
+=C2=A0 =C2=A0 =C2=A0 =C2=A0OLDESTXMIN_SOURCE_REPLICATION_SLOT,
+=C2=A0 =C2=A0 =C2=A0 =C2=A0OLDESTXMIN_SOURCE_OTHER
+} OldestXminSource;

More importantly:

3/ As mentioned earlier in the thread, the "idle-in-transaction"<= br> transactions is not being reported correctly, particularly for write
tansactions. I think that is an important missing case. The reason
for this is the cutoff xmin is not being looked up against the current
list of xid's, so we are not blaming the correct pid.

4/
Thinking about point 3 above, I began to wonder if this
whole thing can be simplified with inspiration. Looking at the
existing=C2=A0 BackendXidGetPid(), I think it can.

Based on BackendXidGetPid(), I tried a new routine called
BackendXidFindCutOffReason() which can take in the cutoff xmin,
passed in by vacuum and can walk though the proc array and
determine the reason. We don't need to touch ComputeXidHorizons()
to make this work, it seems to me. This comes with an additional
walk though the procarray holding a shared lock, but I don't think
this will be an issue.

Attached is a rough sketch of BackendXidFindCutOffReason()
For now, I just added NOTICE messages which will log with
VACUUM (verbose) for testing.

This takes what you are doing in v1 inside ComputeXidHorizons()
into a new routine. I think this is a cleaner approach.

5/ Also, I think we should also include tests for serializable
transactions


What do you think?

--

Sami Imseih
Amazon Web Services (AWS)
--000000000000ac63660643c07873--