public inbox for pgsql-docs@postgresql.org
help / color / mirror / Atom feedFrom: Bernice Southey <bernice.southey@gmail.com>
To: Bruce Momjian <bruce@momjian.us>
Cc: pgsql-docs@lists.postgresql.org
Subject: Re: More guidance on ctid
Date: Sat, 29 Nov 2025 22:45:45 +0000
Message-ID: <CAEDh4nzDMoRDZMDGGxKHzJ5hZscmvT2FWioTzQ6A0aJVtCO8NQ@mail.gmail.com> (raw)
In-Reply-To: <aSTL3rgmztLq1UIH@momjian.us>
References: <CAEDh4nyn5swFYuSfcnGAbpQrKOc47Hh_ZyKVSPYJcu2P=51Luw@mail.gmail.com>
<aSTL3rgmztLq1UIH@momjian.us>
Bruce Momjian <bruce@momjian.us> wrote:
> I think the reporter above extrapolated this example to use ctid in
> other, non-appropriate cases. Should we add a warning to the docs to
> explain that the general use of ctid is discouraged?
I've been explaining this badly. Here's an example that I hope will
make clear why I'm so concerned.
(Read Committed default isolation level)
drop table if exists b1, b2;
create table b1 as select 1 id, 100 balance;
create table b2 as select 1 id, 100 balance;
With an immutable column self-join, it works as expected and
serializes the balance changes.
--session1
begin;
with x as (select id from b1 order by id for update)
update b1 set balance = balance + 1 from x where b1.id = x.id;
--session2
with x as (select id from b1 order by id for update)
update b1 set balance = balance - 100 from x where b1.id = x.id;
select * from b1;
--session1
commit;
--session2
UPDATE 1
id | balance
----+---------
1 | 1
(1 row)
But with ctid, the second update is lost.
--session1
begin;
with x as (select ctid from b2 order by id for update)
update b2 set balance = balance + 1 from x where b2.ctid = x.ctid;
--session2
with x as (select ctid from b2 order by id for update)
update b2 set balance = balance - 100 from x where b2.ctid = x.ctid;
select * from b2;
--session1
commit;
--session2
UPDATE 0
id | balance
----+---------
1 | 101
(1 row)
The session2 CTE blocks until it gets the updated version of the row
from session 1 commit. It gets b2.ctid = (0,2) giving x.ctid = (0,2).
But the UPDATE gets b2.ctid = (0,1) at the start of the transaction.
This doesn't change even after session 1 commits. The join fails. This
is correct, but not obvious.
This is why I think the docs should say something different to what
they currently do. The UPDATE example is fine only because it's run
repeatedly until there are no more rows to find.
I learned the primary key self-join pattern for order by/limit in
delete/update soon after I began with postgres. I'm pretty sure ctid
would have confused me. So I think the doc examples with an id column
will be very helpful, and safer.
If users discover ctid on internet forums or LLMs and then check the
docs, I think they should get some notice of the concurrency risk.
Here's a patch with an attempt to do the above.
Forum examples:
https://stackoverflow.com/questions/10245560/deadlocks-in-postgresql-when-running-update/71163671#71...
https://www.reddit.com/r/PostgreSQL/comments/1cn0q1c/comment/l36ppve/
Thanks, Bernice
Attachments:
[text/x-patch] v1-add-ctid-guidance-and-remove-from-examples.patch (3.1K, 2-v1-add-ctid-guidance-and-remove-from-examples.patch)
download | inline diff:
From 69595aeb9f1e6eda74fa29fde22bd117a8044812 Mon Sep 17 00:00:00 2001
From: Bernice Southey <bernice.southey@gmail.com>
Date: Thu, 20 Nov 2025 11:53:57 +0000
Subject: [PATCH] Add ctid guidance and remove from examples
---
doc/src/sgml/ddl.sgml | 5 ++++-
doc/src/sgml/ref/delete.sgml | 10 +++++-----
doc/src/sgml/ref/update.sgml | 9 +++------
3 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 09ad8400fd0..99c8facfe0b 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -1559,7 +1559,10 @@ CREATE TABLE circles (
<structfield>ctid</structfield> will change if it is
updated or moved by <command>VACUUM FULL</command>. Therefore
<structfield>ctid</structfield> is useless as a long-term row
- identifier. A primary key should be used to identify logical rows.
+ identifier. Be aware that ctids will change in data-modifying
+ concurrent transactions, and they are not unique across table partitions.
+ A primary key should be used to identify logical rows.
+
</para>
</listitem>
</varlistentry>
diff --git a/doc/src/sgml/ref/delete.sgml b/doc/src/sgml/ref/delete.sgml
index 5b52f77e28f..48eb1b6755a 100644
--- a/doc/src/sgml/ref/delete.sgml
+++ b/doc/src/sgml/ref/delete.sgml
@@ -313,15 +313,15 @@ DELETE FROM tasks WHERE CURRENT OF c_tasks;
documentation of <command>UPDATE</command></link>:
<programlisting>
WITH delete_batch AS (
- SELECT l.ctid FROM user_logs AS l
- WHERE l.status = 'archived'
- ORDER BY l.creation_date
+ SELECT a.id FROM accounts AS a
+ WHERE a.status = 'archived'
+ ORDER BY a.id
FOR UPDATE
LIMIT 10000
)
-DELETE FROM user_logs AS dl
+DELETE FROM accounts AS dl
USING delete_batch AS del
- WHERE dl.ctid = del.ctid;
+ WHERE dl.id = del.id;
</programlisting>
</para>
</refsect1>
diff --git a/doc/src/sgml/ref/update.sgml b/doc/src/sgml/ref/update.sgml
index 40cca063946..7374b884080 100644
--- a/doc/src/sgml/ref/update.sgml
+++ b/doc/src/sgml/ref/update.sgml
@@ -486,13 +486,10 @@ UPDATE films SET kind = 'Dramatic' WHERE CURRENT OF c_films;
no <literal>LIMIT</literal> clause for <command>UPDATE</command>, it is
possible to get a similar effect through the use of
a <link linkend="queries-with">Common Table Expression</link> and a
- self-join. With the standard <productname>PostgreSQL</productname>
- table access method, a self-join on the system
- column <link linkend="ddl-system-columns-ctid">ctid</link> is very
- efficient:
+ self-join:
<programlisting>
WITH exceeded_max_retries AS (
- SELECT w.ctid FROM work_item AS w
+ SELECT w.id FROM work_item AS w
WHERE w.status = 'active' AND w.num_retries > 10
ORDER BY w.retry_timestamp
FOR UPDATE
@@ -500,7 +497,7 @@ WITH exceeded_max_retries AS (
)
UPDATE work_item SET status = 'failed'
FROM exceeded_max_retries AS emr
- WHERE work_item.ctid = emr.ctid;
+ WHERE work_item.id = emr.id;
</programlisting>
This command will need to be repeated until no rows remain to be updated.
Use of an <literal>ORDER BY</literal> clause allows the command to
--
2.43.0
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: pgsql-docs@postgresql.org
Cc: bernice.southey@gmail.com, bruce@momjian.us, pgsql-docs@lists.postgresql.org
Subject: Re: More guidance on ctid
In-Reply-To: <CAEDh4nzDMoRDZMDGGxKHzJ5hZscmvT2FWioTzQ6A0aJVtCO8NQ@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox