From: Jan Karremans
To: Martin Mueller
Cc: "pgsql-general@postgresql.org"
Subject: Re: scaling up from t1n to 60 million records
Date: Tue, 19 May 2026 16:32:39 +0200

Dear Martin,

I think you should be fine just going ahead with this.
You might check the size of your tables, but I expect them all to be well within safe ranges.
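
For checking table sizes, something along these lines may help. It is only a rough sketch: the query uses the built-in PostgreSQL functions pg_total_relation_size and pg_size_pretty, while the helper name projected_size_bytes and the 2 GiB figure are made up for illustration. The projection also assumes that storage grows roughly in proportion to row count, which holds only if the new rows are about as wide as the existing ones.

```python
# Query to paste into Aquafold or psql. Replace 'my_table' (a placeholder
# name) with a real table. pg_total_relation_size counts the table plus
# its indexes and TOAST data; pg_size_pretty formats the byte count.
TABLE_SIZE_SQL = (
    "SELECT pg_size_pretty(pg_total_relation_size('my_table'::regclass)) "
    "AS total_size;"
)


def projected_size_bytes(current_bytes: int, current_rows: int, target_rows: int) -> int:
    """Scale a measured table size linearly to a new row count.

    Hypothetical helper: assumes the average row width stays the same.
    """
    if current_rows <= 0:
        raise ValueError("current_rows must be positive")
    return current_bytes * target_rows // current_rows


if __name__ == "__main__":
    GIB = 1024 ** 3
    # Illustrative numbers only: suppose a 10-million-row table measures 2 GiB.
    estimate = projected_size_bytes(2 * GIB, 10_000_000, 60_000_000)
    print(TABLE_SIZE_SQL)
    print(f"Projected size at 60 million rows: {estimate / GIB:.1f} GiB")
```

If the projected size stays comfortably below the 32 GB of memory on the machine, much of the table can remain cached, which is one reason to expect little trouble.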

Cheers,
Jan

On 19 May 2026, at 16:27, Martin Mueller <martinmueller@northwestern.edu> wrote:

I use Postgres with a GUI frontend (Aquafold) as a very large spreadsheet on steroids that analyzes rare or defective spellings in a corpus of 65,000 texts and 1.5 billion words. I typically extract data from the corpus with Python scripts, turn them into tables, and load them into the database.

On my Mac with 32 GB of memory, performance is OK: queries typically extract data rows within seconds from tables with up to ten million rows. If the result set is large, I suspect that most of the machine's time is spent displaying result sets. I have used indexing sparingly. While it helps, the time savings often don't matter much.

I am thinking about scaling up to a table with about 60 million rows. Are there things to do or watch out for? Or should I proceed on the assumption that 60 million records are within scope and that the added time cost is roughly linear?

 

Martin Mueller
Professor emeritus of English and Classics
Northwestern University

 

 

 

