Re: scaling up from ten to 60 million records

public inbox for pgsql-general@postgresql.org  

From: Ron Johnson <ronljohnsonjr@gmail.com>
To: pgsql-general@postgresql.org <pgsql-general@postgresql.org>
Subject: Re: scaling up from ten to 60 million records
Date: Tue, 19 May 2026 10:41:42 -0400
Message-ID: <CANzqJaA4S080-9tSOLZaWnfY6QbxZc8WQHv2JB4dOveWOkZh4g@mail.gmail.com>
In-Reply-To: <CY8PR05MB1010861EAD48ED098786C9690C4002@CY8PR05MB10108.namprd05.prod.outlook.com>
References: <CY8PR05MB1010861EAD48ED098786C9690C4002@CY8PR05MB10108.namprd05.prod.outlook.com>

On Tue, May 19, 2026 at 10:27 AM Martin Mueller <
martinmueller@northwestern.edu> wrote:

> I use Postgres with a GUI frontend (Aquafold) as a very large spreadsheet
> on steroids that analyzes rare or defective spellings in a corpus of 65,000
> texts and 1.5 billion words.  I typically extract data from the corpus with
> python scripts, turn them into tables and load them into the database.
>
>
> On my Mac with 32 GB of memory performance is OK with queries that
> typically within seconds extract data rows from tables with up to ten
> million rows.  If the result set is large, I suspect that most of the
> machine's time is spent displaying result sets. I have used indexing
> sparingly. While it helps, the time savings often don't matter much.
>
>
> I am thinking about scaling up to a table with about 60 million rows.  Are
> there things to do or watch out for?
>

Use the correct tool for the task at hand, even if you are not a carpenter
and thus only know how to use a hammer.

> Or should I proceed on the assumption that 60 million records are
> within scope and that the added timecost is roughly linear?
>

In my experience, database performance shows a hockey stick graph: good
while stuff fits in memory, and then suddenly not so good.
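
As a rough illustration of the "fits in memory" point, here is a
back-of-envelope sketch in Python. The 200-byte average row width, the 30%
overhead allowance, and the guess that half of the 32 GB is available for
caching are all assumptions made up for the example, not measurements;
substitute real numbers from your own tables.

```python
GIB = 1024 ** 3


def estimated_table_bytes(rows: int, avg_row_bytes: int, overhead_percent: int = 30) -> int:
    """Rough on-disk size: rows * width, plus a percentage for headers and padding."""
    return rows * avg_row_bytes * (100 + overhead_percent) // 100


def fits_in_cache(table_bytes: int, cache_bytes: int) -> bool:
    """True if the whole table could plausibly stay cached."""
    return table_bytes <= cache_bytes


# Assumption: about half of a 32 GB machine is available for caching data.
CACHE_BYTES = 16 * GIB

for rows in (10_000_000, 60_000_000):
    # Assumption: 200 bytes per row on average (hypothetical figure).
    size = estimated_table_bytes(rows, avg_row_bytes=200)
    print(f"{rows:>11,} rows: ~{size / GIB:.1f} GiB, fits in cache: {fits_in_cache(size, CACHE_BYTES)}")
```

The point of the exercise is only to see which side of the bend you are on:
once the tables and indexes you query together outgrow the cache, the cost
stops looking linear.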

The correct tool for full text search is PG's Full Text Search (tsvector)
facility, paired with GIN indexes.  Do you use them?  Probably not, based
on your comments, but that would "keep 'everything' in memory", thus
staving off performance degradation.
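
A minimal sketch of what that setup could look like, driven from Python
since that is what already loads the data. The table name "tokens", the
column names "spelling" and "spelling_tsv", and the index name are all
hypothetical. The 'simple' text search configuration is also an assumption:
it skips stemming, which seems preferable when the odd spellings themselves
are the thing being studied. The functions take any DB-API connection (for
example one from psycopg2, which uses %s placeholders); generated columns
need PostgreSQL 12 or later.

```python
# Hypothetical schema: table "tokens" with a text column "spelling".
SETUP_STATEMENTS = [
    # Stored generated column holding the tsvector for each row.
    """
    ALTER TABLE tokens
        ADD COLUMN spelling_tsv tsvector
        GENERATED ALWAYS AS (to_tsvector('simple', spelling)) STORED
    """,
    # GIN index over the tsvector column so @@ searches can use it.
    "CREATE INDEX tokens_spelling_tsv_idx ON tokens USING GIN (spelling_tsv)",
]

SEARCH_QUERY = """
    SELECT spelling
    FROM tokens
    WHERE spelling_tsv @@ to_tsquery('simple', %s)
"""


def create_fts_index(conn):
    """Run the one-time setup statements on a DB-API connection and commit."""
    cur = conn.cursor()
    for stmt in SETUP_STATEMENTS:
        cur.execute(stmt)
    conn.commit()


def search(conn, tsquery):
    """Return all rows whose tsvector matches the given tsquery string."""
    cur = conn.cursor()
    cur.execute(SEARCH_QUERY, (tsquery,))
    return cur.fetchall()
```

Running the query under EXPLAIN ANALYZE should show whether the GIN index is
actually being used.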

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
