Re: scaling up from t1n to 60 million records

From: Adrian Klaver <adrian.klaver@aklaver.com>
To: Martin Mueller <martinmueller@northwestern.edu>
To: pgsql-general@postgresql.org <pgsql-general@postgresql.org>
Subject: Re: scaling up from t1n to 60 million records
Date: Tue, 19 May 2026 07:44:57 -0700
Message-ID: <ecd7305e-888b-43bb-9e16-4297c93e4904@aklaver.com>
In-Reply-To: <CY8PR05MB1010861EAD48ED098786C9690C4002@CY8PR05MB10108.namprd05.prod.outlook.com>
References: <CY8PR05MB1010861EAD48ED098786C9690C4002@CY8PR05MB10108.namprd05.prod.outlook.com>

On 5/19/26 7:27 AM, Martin Mueller wrote:
> I use Postgres with a GUI frontend (Aquafold) as a very large
> spreadsheet on steroids that analyzes rare or defective spellings in a
> corpus of 65,000 texts and 1.5 billion words. I typically extract data
> from the corpus with Python scripts, turn them into tables, and load them
> into the database.
> 
> 
> On my Mac with 32 GB of memory, performance is OK: queries typically
> extract rows within seconds from tables with up to ten million rows.
> If the result set is large, I suspect that most of the machine's time
> is spent displaying result sets. I have used indexing
> sparingly. While it helps, the time savings often don't matter much.

This is going to need more information:

1) Postgres version.

2) The table schema including indexes.

3) An example of the query.

4) Where you are measuring the time.

5) The client you are displaying the results in.
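
One way to collect items 1 through 4 is to run a few statements in psql and paste the output into a reply. The sketch below only builds those input lines; it does not connect to anything. The function name, the table name `spellings`, and the sample query are hypothetical placeholders, not taken from the original message. Note that EXPLAIN with ANALYZE actually executes the query, so it is best used on SELECT statements.

```python
def diagnostic_statements(table_name: str, query: str) -> list[str]:
    """Return psql input lines that gather the details requested above.

    table_name and query are placeholders supplied by the caller;
    nothing here talks to a database.
    """
    # Drop surrounding whitespace and any trailing semicolon so the
    # query can be embedded in an EXPLAIN statement.
    bare_query = query.strip().rstrip(";")
    return [
        # 1) Server version string.
        "SELECT version();",
        # 2) psql meta-command: table columns plus its indexes.
        f"\\d+ {table_name}",
        # 4) psql meta-command: print elapsed time after each statement.
        "\\timing on",
        # 3) and 4) The query plan with timings measured on the server,
        #    which excludes the time a GUI client spends rendering rows.
        f"EXPLAIN (ANALYZE, BUFFERS) {bare_query};",
    ]


if __name__ == "__main__":
    # Hypothetical table and query, for illustration only.
    for line in diagnostic_statements(
        "spellings", "SELECT * FROM spellings WHERE form = 'teh';"
    ):
        print(line)
```

Comparing the time reported by EXPLAIN with the wall-clock time seen in the GUI client gives a rough idea of how much is spent on display rather than in the database.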

> 
> 
> I am thinking about scaling up to a table with about 60 million rows. Are
> there things to do or watch out for? Or should I proceed on the
> assumption that 60 million records are within scope and that the
> added time cost is roughly linear?
> 
> Martin Mueller
> 
> Professor emeritus of English and Classics
> 
> Northwestern University
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com
