Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wPLdu-000fZ8-0k for pgsql-general@arkaria.postgresql.org; Tue, 19 May 2026 14:41:58 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wPLdr-004bXp-2n for pgsql-general@arkaria.postgresql.org; Tue, 19 May 2026 14:41:56 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wPLdr-004bXg-1N for pgsql-general@lists.postgresql.org; Tue, 19 May 2026 14:41:56 +0000 Received: from mail-oi1-x22d.google.com ([2607:f8b0:4864:20::22d]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1wPLdp-00000000LE8-3WpK for pgsql-general@postgresql.org; Tue, 19 May 2026 14:41:55 +0000 Received: by mail-oi1-x22d.google.com with SMTP id 5614622812f47-479d9b155deso1082707b6e.3 for ; Tue, 19 May 2026 07:41:54 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1779201714; cv=none; d=google.com; s=arc-20240605; b=bAC0CuIBYcyUuZN/rlHxettnj4kn4PwRkCIvizSNdaDE3mCbvdK1G7O+NBHtysoVcG g+J3Q2P6NZKmVxlKGhboZiqYv76tKU/9XZ5u556SLA+2GWRIGdL5weC+T3W385rbgWsS 2YPaGoyTpnsA1L03Bym0CWF/SVTGSmEHPz6snr3IVNlvTVaJHNMrV72Lm9md94YSm6XE DmqeWR7qGCoE3LqD42nfQqV/lZ/uZvNQ69fuNXWqhHzbzwzRrFJHPjj2hByrn5Mo+37q ue44gr0BccqFGBDxmhhjeznG9K8FMBfst+ilF0KLWX0/nidsBvHgmd7H/L77bxD0WIVb 8tCg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :dkim-signature; bh=TAmTEJ42ErFs0lyIs43NjeWfUZsQ/06obSTjBGUHLTg=; fh=xlEumQ/ItFTuseMU8abXWvw+r9sDum0uiRjXXzYxDTI=; b=EINboB6qpsVlx+T2oyBwimskWgmickTy0qhKFCtaQ2LOymUJVME0fCD7lRVAGEXwUH wM0iu92AHqTB8K5PNZDz/TQuHfJ2JkWrH1vnHjnkOgnGOv/E+xXLDnHXCz7PWRKVKfka MTLdpZ3jhMXacCfgaeM3js4SxRSjaP5iuyWCONyvFpWO3hyFOMrnJrNJ5fg1lvPGYnKq pTx/JvdYktpM6frV0Im+O/+8HidC4pP3M4lEaxHYqAu1e7S9CpzLQXDvhezjyZVu5w/r nJBygIfYTL87w1ARlng/E/95o781D1LaBgUQF8ZSxR9+fetrAFO+yS0k/YsngUrX8Nrc m/bg==; darn=postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1779201714; x=1779806514; darn=postgresql.org; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=TAmTEJ42ErFs0lyIs43NjeWfUZsQ/06obSTjBGUHLTg=; b=Y+v9SWjNOsPXKZAk+kG9SVrt4tCkj/oambbhMZ+sOBbvghc1qkDFY4SHeP9x9nS+Q2 MQZ+UM/vbjPmYffTvl8jsva9k3RzXVx4BgtP/DSRJZtqJwYcrdJT2tOC3AlL9FXeAdmQ xXEPvLaJwr2CcU5x+s8EEZGWhoWbx/SyWphNi42Xu853uK4Q8Uuf7v4InRbUHd8teqrX vR95Ln5wPSlKPUIr2+9MyIieusWwqOQa1N+kaJavPN/jo7G7vxPU7sOcU6Pb8nGL2Yqz CF1aJhQSg31q143ADjKH52Udzu1e77ttIIVCcCzYZKlWwHTnj0QAdZP9VkKWeF1NXJHu sSLA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1779201714; x=1779806514; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-gg:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=TAmTEJ42ErFs0lyIs43NjeWfUZsQ/06obSTjBGUHLTg=; b=srIZqYajgsEyHyORyVr8s2eJp7EM3il5OivByJ8G37+LXWXafHhpFjhpe9SIwGJ4OL LBNVFGY5YfaVPVrbAChCAGY/QMZkp0xodHgKIoc5oTKfXHjEh71XVwvy5NlBaWnSo9QP xfudkpPh01W4jvwd2tL8mMol+1MHDeEQA8fUwaMCB3A8UpT6fuGvcfE8Orc0giyBwmY1 8Et+b4Ap46qSKU74jsCY7/VbwbOYMd05Wjfs55bPyB5YIFhOx9BKCoat9hwqXEmyOw02 WEysF3aZc1GP7QIkY3slBlnaNQ/cA/dMbjZPHFbTLl20Nl7xrP0SaqS6hi/X6CHzcxrg 7EBA== X-Gm-Message-State: AOJu0Yz7dEAF4WrUm9uPDLT6/t6VuB26/IWJCdITwDT9BPEFuJ7Z/zmp AAMVg6p/ctrsDs4ojG5lFyPcqCYw+xj7ZrBeJg91K+HMB9H1m6ArYMocUmAZ6qF8tqqtmQT46C4 T5KyPKxGFzgtDF7yC4ivSclXTg9vEGi5pIZus X-Gm-Gg: Acq92OHpgDqm6WURxKOd0sQX2nNXtUG7oSe4Ep9l6QC+KMQSCdNvFxbltk6tuj4rZ39 nrwHPKT2FIIX6dWoFzHT4CZuFQUWyZqwi2EM90GRGnzHts8ThNY68KjKhrGGFRjDrUINmF5iETG PECNb0gLNtAsrB7WcXoaDxnV+4m80gk8eKa7jdIf302O89hTBjWIN/Fvt5gtKHiTXEt3hrPRoG3 6Uv7mJf7tcojJ5tYOP1OLEucjmo1IaVyYxupBI409EFBZIBaL453J+W8YLFYZou4/vu9kR0yuql gWRRkRe9XmGcGb52sZI= X-Received: by 2002:a05:6809:243:20b0:484:afc2:eecc with SMTP id 5614622812f47-484afc30775mr5261804b6e.5.1779201713824; Tue, 19 May 2026 07:41:53 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Ron Johnson Date: Tue, 19 May 2026 10:41:42 -0400 X-Gm-Features: AVHnY4JCZDnqbq5VGwBXT7KT98kqW79dexR9_J-gKMfJnMmEGcpOOgQAaycnmKY Message-ID: Subject: Re: scaling up from t1n to 60 million records To: "pgsql-general@postgresql.org" Content-Type: multipart/alternative; boundary="00000000000022eaca06522cae72" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000022eaca06522cae72 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Tue, May 19, 2026 at 10:27=E2=80=AFAM Martin Mueller < martinmueller@northwestern.edu> wrote: > I use Postgres with a GUI frontend (Aquafold) as a very large spreadsheet > on steroids that analyzes rare or defective spellings in a corpus of 65,0= 00 > texts and1.5 billion words. I typically extract data from the corpus wi= th > python scripts, turn them into tables and load them into the database. > > > On my Mac with 32 GB of memory performance is OK with queries that > typically within seconds extract data rows from tables with up to ten > million rows. If the result set is large, I suspect that most of time > machine's time is spent displaying result sets. I have used indexing > sparingly. While it helps, the time savings often don't matter much. > > > I am thinking about scaling up to table with about 60 million rows. Are > there things to do or watch out for? > Use the correct tool for the task at hand, even if you are not a carpenter and thus only know how to use a hammer. Or should I proceed on the assumption that that 60 million records are > within scope and that the added timecost is roughly linear? > In my experience, database performance shows a hockey stick graph: good while stuff fits in memory, and then suddenly not so good. The correct tool for full text search is PG's Full Text Search (ts_vector) facility, paired with GIN indexes. Do you use them? Probably not, based on your comments, but that would "keep 'everything' in memory", thus staving off performance degradation. --=20 Death to , and butter sauce. Don't boil me, I'm still alive. lobster! --00000000000022eaca06522cae72 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Tue, May 19, 2026 at 10:27=E2=80=AFAM = Martin Mueller <martin= mueller@northwestern.edu> wrote:

I use Postgres with a GUI f= rontend (Aquafold) as a very large spreadsheet on steroids that analyzes ra= re or defective spellings in a corpus of 65,000 texts and1.5 billion words.= =C2=A0 I typically extract =C2=A0data from the corpus with python scripts, turn them into tables and load them i= nto the database.


On my Mac with 32 GB of mem= ory performance is OK with queries that typically within seconds extract da= ta rows from tables =C2=A0with up to ten million rows.=C2=A0 If the result = set is large, I suspect that most of time machine's time is spent displaying result sets. I have used indexing s= paringly. While it helps, the time savings often don't matter much.=C2= =A0


I am thinking about scaling= up to table with about 60 million rows.=C2=A0 Are there things to do or wa= tch out for?

=C2=A0
Use the co= rrect tool for the task at hand, even if you are not a carpenter and thus o= nly know how to use a hammer.

Or should I proceed on the assumption that that 60 million = records are within scope and that the added timecost is roughly linear?

=

In my experience, database performance shows a ho= ckey stick graph: good while stuff fits in memory, and then suddenly not so= good.

The correct = tool for full text search is PG's Full Text Search (ts_vector) facility= , paired with GIN indexes.=C2=A0 Do you use them?=C2=A0 Probably not, based= on your comments, but that would "keep 'everything' in memory= ", thus staving off performance degradation.

--
Death to <Redacted>, and butter sauc= e.
Don't boil me, I'm still alive.
<Redacted>= ; lobster!
--00000000000022eaca06522cae72--