From: Holger Jakobs <holger@jakobs.com>
Date: Fri, 22 May 2026 20:09:32 +0200
To: pgsql-admin@lists.postgresql.org
Subject: Re: Request For Feature: pg_dump

On 22.05.26 at 19:20, Ron Johnson wrote:
> On Fri, May 22, 2026 at 12:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Ron Johnson <ronljohnsonjr@gmail.com> writes:
> > > In --format=directory mode, remove .dat files with zero data records, and
> > > mark that table's toc.dat entry to show that it's an empty table.
> > >
> > > Justification: *lots* of empty tables means *lots* of teeny-tiny files in
> > > the DB's dump directory.  That unnecessarily bloats the fs, and makes
> > > "du -c" really, really slow.
> >
> > Evidence please?  Most file systems that I've looked at optimize
> > zero-size files pretty well.
>
> They aren't zero bytes.
> It's those pesky 5-byte (or 14-byte, or whatever size gzip and lz4 produce) files.  66 thousand tiny files plus 8 thousand files with data in them makes for a 2.4MB directory.  That's big and slow.
>
> $ find . -size 14c | wc
>   66180   66180 1191240
>
> $ zstd -dk 2115841.dat.zst
> 2115841.dat.zst     : 5 bytes
>
> $ cat 2115841.dat
> \.
>
> $ dir | grep " 14 " | head -n20
> -rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115841.dat.zst
> -rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115842.dat.zst
> -rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115843.dat.zst
> -rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115844.dat.zst
> -rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115845.dat.zst
> -rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115851.dat.zst
> -rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115899.dat.zst
> -rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115901.dat.zst
> -rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115902.dat.zst
> -rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115903.dat.zst
> -rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115905.dat.zst
> -rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115907.dat.zst
> -rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115909.dat.zst
> -rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115913.dat.zst
> -rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115915.dat.zst
> -rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115917.dat.zst
> -rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115919.dat.zst
> -rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115923.dat.zst
> -rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115926.dat.zst
> -rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115931.dat.zst
>
> --
> Death to <Redacted>, and butter sauce.
> Don't boil me, I'm still alive.
> <Redacted> lobster!

Maybe simply not compressing empty files would already do the job. I think any file below a certain size isn't worth compressing.
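The size-threshold idea above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not pg_dump code (pg_dump is written in C). Python's standard-library gzip stands in for zstd. The 64-byte cut-off and the helpers `write_table_data` and `read_table_data` are hypothetical names chosen for the example. The file suffix tells the reader whether to decompress.

```python
import gzip
import tempfile
from pathlib import Path

# Hypothetical cut-off for illustration only; not an existing pg_dump option.
MIN_COMPRESS_BYTES = 64


def write_table_data(base: Path, payload: bytes,
                     threshold: int = MIN_COMPRESS_BYTES) -> Path:
    """Write one table's data file, compressing it only if it is big enough.

    Payloads shorter than `threshold` bytes are stored as plain `<base>`;
    everything else is gzip-compressed into `<base>.gz`.
    """
    if len(payload) < threshold:
        path = base
        path.write_bytes(payload)
    else:
        path = base.with_name(base.name + ".gz")
        with gzip.open(path, "wb") as f:
            f.write(payload)
    return path


def read_table_data(path: Path) -> bytes:
    """Inverse of write_table_data: decompress only files ending in .gz."""
    if path.suffix == ".gz":
        with gzip.open(path, "rb") as f:
            return f.read()
    return path.read_bytes()


if __name__ == "__main__":
    # COPY end-of-data marker like the one in the `cat` output quoted above;
    # the exact bytes pg_dump writes may differ (zstd reported 5 bytes).
    empty_table = b"\\.\n"
    # gzip's fixed header and trailer make the "compressed" file larger.
    print("raw:", len(empty_table), "bytes; gzip:",
          len(gzip.compress(empty_table)), "bytes")

    with tempfile.TemporaryDirectory() as d:
        small = write_table_data(Path(d) / "2115841.dat", empty_table)
        big = write_table_data(Path(d) / "2115842.dat", b"1\tfoo\n" * 100)
        print(small.name, big.name)  # 2115841.dat 2115842.dat.gz
        assert read_table_data(small) == empty_table
        assert read_table_data(big) == b"1\tfoo\n" * 100
```

This sketch still writes one file per table. It avoids the compression overhead on tiny payloads, but it does not reduce the file count that the original proposal targets.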

Regards,

Holger

--

Holger Jakobs
