MIME-Version: 1.0
References: 
 <CY8PR05MB1010861EAD48ED098786C9690C4002@CY8PR05MB10108.namprd05.prod.outlook.com>
 <ecd7305e-888b-43bb-9e16-4297c93e4904@aklaver.com>
 <CY8PR05MB10108C25DA7344760DAA414FEC4002@CY8PR05MB10108.namprd05.prod.outlook.com>
In-Reply-To: 
 <CY8PR05MB10108C25DA7344760DAA414FEC4002@CY8PR05MB10108.namprd05.prod.outlook.com>
From: Ron Johnson <ronljohnsonjr@gmail.com>
Date: Tue, 19 May 2026 16:18:04 -0400
Message-ID: 
 <CANzqJaBAf+Kf80dDXv-VvKpq2r1Gnwoxq3pQP1Fi+y+PJCbKDQ@mail.gmail.com>
Subject: Re: scaling up from t1n to 60 million records
To: pgsql-general <pgsql-general@postgresql.org>
Content-Type: multipart/alternative; boundary="00000000000017ad1706523161fc"
Archived-At: 
 <https://www.postgresql.org/message-id/CANzqJaBAf%2BKf80dDXv-VvKpq2r1Gnwoxq3pQP1Fi%2By%2BPJCbKDQ%40mail.gmail.com>
Precedence: bulk

--00000000000017ad1706523161fc
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Indices are your friend!  (Except when loading data.)

Add them on any relevant column.
https://www.postgresql.org/docs/16/sql-createindex.html
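
On the loading caveat: a common pattern is to drop the index before a bulk
load and rebuild it afterwards, since building an index once is usually
cheaper than maintaining it row by row during the load. A rough sketch,
assuming the kwics16 table and keyword column from the quoted message; the
index name and the CSV path are placeholders, not taken from your setup:

```sql
-- Assumed index name; check the real one with \d kwics16 in psql.
DROP INDEX IF EXISTS kwics16_keyword_idx;

-- Hypothetical path. COPY reads the file on the server side;
-- psql's \copy is the client-side alternative.
COPY kwics16 FROM '/path/to/kwics16.csv' WITH (FORMAT csv, HEADER true);

-- Rebuild the index once, after all rows are in place.
CREATE INDEX kwics16_keyword_idx ON kwics16 (keyword);

-- Refresh planner statistics for the newly loaded table.
ANALYZE kwics16;
```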

The "( expression )" form and the "WHERE predicate" clause might be useful
in your situation: an expression index can index upper-case versions of the
word, and a partial index (WHERE) can exclude some words or empty cells.
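
A minimal sketch of both ideas, again assuming the kwics16 table and keyword
column from the quoted message; the index names are made up. The first
statement indexes the upper-case form of the word, and the second uses a
WHERE clause on CREATE INDEX to leave empty cells out of the index:

```sql
-- Expression index: can serve case-insensitive lookups such as
--   SELECT * FROM kwics16 WHERE upper(keyword) = 'TBE';
CREATE INDEX kwics16_keyword_upper_idx
    ON kwics16 (upper(keyword));

-- Partial index: only rows matching the predicate are indexed,
-- so NULL and empty cells take up no space in it.
CREATE INDEX kwics16_keyword_nonempty_idx
    ON kwics16 (keyword)
    WHERE keyword IS NOT NULL AND keyword <> '';
```

The planner can only consider the partial index for queries whose WHERE
clause implies the index predicate, so it pays to check with EXPLAIN.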


On Tue, May 19, 2026 at 2:53 PM Martin Mueller <
martinmueller@northwestern.edu> wrote:

> Here is a more detailed version.
>
> I work on the curation of a corpus of some 65,000 Early Modern texts with
> 1.5 billion words.  They exist as TEI-XML files and each word is wrapped
> in a <w> element.  Here are the first and last two words in the corpus:
>
> <w lemma="here" pos="av" xml:id="a73abc-001-b-0010">HEre</w>
>
> <w lemma="begin" pos="vvz" reg="beginneth" xml:id="a73abc-001-b-0020">begynneth</w>
>
> ...
>
> <w lemma="mercy" pos="n1" xml:id="e20ady-0008-3120">mercy</w>
>
> <pc unit="sentence" xml:id="e20ady-0008-3130">.</pc>
>
>
> The corpus has many corrupt spellings and errors in the linguistic
> annotation. Most of them are low-frequency phenomena and occur in no more
> than 64 documents. In nearly all cases you have enough evidence to correct
> a word or its annotation if you can see the word in the middle of a text
> string that includes
>
>
>    1. the spelling
>    2. the lemma
>    3. the part of speech tag
>    4. a standard spelling (e.g. 'loue' for 'love')
>    5. up to seven previous words
>    6. up to seven next words
>    7. the spelling and POS tag of the previous word
>    8. the lemma and POS tag of the next word
>    9. the XPath of the current work
>
>
> My goal is to involve users of the corpus in identifying and correcting
> corrupt readings. I call this a "philological shopping cart" since the
> offering of a correction can be thought of as a sale. Instead of buying
> something, with the machine registering the who, what, when, and where of
> the purchase, I offer an emendation, with the machine registering the who,
> what, when, and where of my emendation.
>
> My hunch is that it would not be particularly difficult to build such a
> philological shopping cart and that in terms of scale it would not be a big
> thing.
>
> I am trying to mirror that "shopping cart" on my Mac.  There are about 60
> million word occurrences that occur in no more than 64 texts. The basic
> table has the columns described above, and half a dozen other columns for
> data entry and various counts.  There are some helper tables. The most
> important of them is a simple case-insensitive list of spellings with their
> document frequencies.  This is very useful for finding suspect spellings
> with queries like "show me all spellings in a low frequency range that
> contain 'tb' and look for words where replacing 'tb' with 'th' will find a
> word with a higher document frequency". That picks up spellings like 'tbe',
> 'tbat', 'autboritie', etc.
>
> I've worked with KWIC tables of this kind for several years.  I have Aqua
> Data Studio as a front end for Postgres, currently version 17, running on a
> five-year-old Mac with an Intel processor and 32 GB of memory.  I know a
> lot less about the innards of a SQL database than I should.
>
> My largest KWIC table has about 15 million rows with a dozen columns for
> each row. Except for the left and right context, the columns consist of
> single words or numbers. The left and right context columns rarely add up
> to more than 35 characters each.  I have used plain indexes for some
> columns, with commands like "Create index on kwics16(keyword)", where
> 'kwics16' is the table name. My typical routine takes a single-user
> interactive form: ask a query, wait for the results (typically seconds,
> sometimes a minute or more), and do something with the results.  I know
> next to nothing about the size of the database or tables, and it's not
> something I have needed to worry about. There are occasional memory
> bottlenecks, because Aqua Data Studio isn't particularly good at releasing
> memory once it's no longer used. Closing and reopening the client fixes that.
>
> It takes an hour or so to upload a table of this kind into the database.
> Several tables of that size exist on my database and don't cause any
> trouble. I don't know at what point I would be running into constraints of
> an aging Mac with 32 GB of memory and a 2 TB hard drive.
>
> I could comfortably live with what I'm doing now, dividing the data into
> three or four frequency ranges.
>
> Given this information, should I try and create a single table or am I
> likely to run into serious constraints if I move beyond my current maximum
> table size of 15 million records?
>
> Perhaps there is no clear answer, and I should just experiment.  But if
> any reader with more knowledge of Postgres thinks that in my environment I
> would be skating on thin ice if I move beyond current limits, I'd be
> grateful to be told so.
>
>
> Martin Mueller
> Professor emeritus of English and Classics
> Northwestern University
> *From: *Adrian Klaver <adrian.klaver@aklaver.com>
> *Date: *Tuesday, May 19, 2026 at 09:45
> *To: *Martin Mueller <martinmueller@northwestern.edu>;
> pgsql-general@postgresql.org <pgsql-general@postgresql.org>
> *Subject: *Re: scaling up from t1n to 60 million records
>
> On 5/19/26 7:27 AM, Martin Mueller wrote:
> > I use Postgres with a GUI frontend (Aquafold) as a very large
> > spreadsheet on steroids that analyzes rare or defective spellings in a
> > corpus of 65,000 texts and 1.5 billion words.  I typically extract data
> > from the corpus with Python scripts, turn them into tables and load them
> > into the database.
> >
> >
> > On my Mac with 32 GB of memory performance is OK with queries that
> > typically within seconds extract data rows from tables with up to ten
> > million rows.  If the result set is large, I suspect that most of the
> > machine's time is spent displaying result sets. I have used indexing
> > sparingly. While it helps, the time savings often don't matter much.
>
> This is going to need more information:
>
> 1) Postgres version.
>
> 2) The table schema including indexes.
>
> 3) An example of the query.
>
> 4) Where you are measuring the time.
>
> 5) The client you are displaying the results in.
>
> >
> >
> > I am thinking about scaling up to a table with about 60 million rows.  Are
> > there things to do or watch out for? Or should I proceed on the
> > assumption that 60 million records are within scope and that the
> > added time cost is roughly linear?
> >
> > Martin Mueller
> >
> > Professor emeritus of English and Classics
> >
> > Northwestern University
> >
>
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com
>


-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

--00000000000017ad1706523161fc--