public inbox for pgsql-novice@postgresql.org  
From: Chris Papademetrious <Christopher.Papademetrious@synopsys.com>
To: pgsql-novice@lists.postgresql.org <pgsql-novice@lists.postgresql.org>
Subject: is there a way to automate deduplication of strings?
Date: Sat, 27 Dec 2025 12:36:20 +0000
Message-ID: <DM4PR12MB603953767048EE1B8A39283ADDB1A@DM4PR12MB6039.namprd12.prod.outlook.com>

Hello everyone! First time poster here.

I have a question about deduplicating text strings stored in a database. I am aware of the pattern of creating a separate table for unique values, then referencing those values by key. But this adds transactional complexity to storage and retrieval, and it requires cleaning up no-longer-referenced values over time. This complexity also grows with the number of primary-table columns that use the indirection.
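
For reference, the pattern described above looks roughly like the following. This is only a minimal sketch, written against Python's built-in sqlite3 module so that it runs standalone; a Postgres version would differ in syntax details. The table, column, and function names (strings, docs, note_id, intern_string, and so on) are hypothetical and chosen purely for illustration.

```python
import sqlite3

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One row per distinct string value (hypothetical table name).
    CREATE TABLE strings (
        id    INTEGER PRIMARY KEY,
        value TEXT NOT NULL UNIQUE
    );
    -- Primary table stores only the key of the string.
    CREATE TABLE docs (
        id      INTEGER PRIMARY KEY,
        note_id INTEGER NOT NULL REFERENCES strings(id)
    );
""")

def intern_string(conn, value):
    """Return the key for value, inserting it first if it is not stored yet."""
    conn.execute("INSERT OR IGNORE INTO strings (value) VALUES (?)", (value,))
    row = conn.execute(
        "SELECT id FROM strings WHERE value = ?", (value,)
    ).fetchone()
    return row[0]

def add_doc(conn, note):
    """Store a row whose note is held indirectly through the strings table."""
    with conn:  # one transaction covers both the lookup/insert and the row
        key = intern_string(conn, note)
        cur = conn.execute("INSERT INTO docs (note_id) VALUES (?)", (key,))
        return cur.lastrowid

def get_note(conn, doc_id):
    """Retrieval needs a join back to the strings table."""
    row = conn.execute(
        "SELECT s.value FROM docs d JOIN strings s ON s.id = d.note_id "
        "WHERE d.id = ?",
        (doc_id,),
    ).fetchone()
    return row[0] if row else None

def delete_doc(conn, doc_id):
    """Deleting a row does not remove the string it referenced."""
    with conn:
        conn.execute("DELETE FROM docs WHERE id = ?", (doc_id,))

def cleanup_orphans(conn):
    """Delete strings that no row references any more; return how many."""
    with conn:
        cur = conn.execute(
            "DELETE FROM strings WHERE id NOT IN (SELECT note_id FROM docs)"
        )
        return cur.rowcount
```

Each deduplicated column needs its own version of the insert, join, and cleanup steps, which is where the extra complexity comes from.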

I would only use this for (1) seldom-referenced columns that (2) have a high rate of duplication and (3) have an average string length that makes deduplication worthwhile.

Are there any native or extension-based methods to simplify this in Postgres? I searched and came up empty, but maybe I'm not searching with the right terms.

Thanks!

- Chris