public inbox for pgsql-admin@postgresql.org  
Request For Feature: pg_dump
5+ messages / 3 participants

* Request For Feature: pg_dump

From: Ron Johnson @ 2026-05-22 13:32 UTC
  To: Pgsql-admin <pgsql-admin@lists.postgresql.org>

In --format=directory mode, remove .dat files with zero data records, and
mark each such table's toc.dat entry as an empty table.

Justification: *lots* of empty tables means *lots* of teeny-tiny files in
the DB's dump directory.  That unnecessarily bloats the fs, and makes "du
-c" really really slow.
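A rough way to put numbers on this, sketched as a shell function: it counts the table-data files in a directory-format dump and how many of them are at most a given size on disk. The function name, the 16-byte default, and the assumption that table-data files are named like 2115841.dat or 2115841.dat.zst (digits first, which keeps toc.dat out of the count) are illustrative choices, not anything pg_dump provides.

```shell
# Hypothetical helper: print "<data files> <tiny files>" for a directory-format dump.
# Assumes table-data files are named <digits>.dat, optionally with a compression
# suffix (.gz, .lz4, .zst); toc.dat does not start with a digit, so it is skipped.
dump_tiny_file_summary() {
    local dir=$1
    local max_bytes=${2:-16}
    local total tiny
    total=$(find "$dir" -maxdepth 1 -type f -name '[0-9]*.dat*' | wc -l)
    # With the "c" unit, find compares exact byte counts, and "-size -Nc" means
    # "fewer than N bytes", so N = max_bytes + 1 selects files of at most max_bytes.
    tiny=$(find "$dir" -maxdepth 1 -type f -name '[0-9]*.dat*' \
        -size -"$((max_bytes + 1))"c | wc -l)
    # $((...)) drops the padding some wc implementations add.
    echo "$((total)) $((tiny))"
}

# Example: dump_tiny_file_summary /path/to/dumpdir 16
```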

But why are there sooo many empty tables?  You shouldn't have so many empty
tables!

Yeah, well, software (especially 3rd-party software that must be generic to
satisfy the varying needs of a large and varied customer base) can't always
be perfectly tuned to the precise and immediate needs of a particular
site.  Partitioning makes that much much worse.

We've survived this long without it, but pgbackrest has a similar feature
(though implemented differently from how pg_dump would do it), and it's
*really* handy.

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!


* Re: Request For Feature: pg_dump

From: Tom Lane @ 2026-05-22 16:52 UTC
  To: Ron Johnson <ronljohnsonjr@gmail.com>; +Cc: Pgsql-admin <pgsql-admin@lists.postgresql.org>

Ron Johnson <ronljohnsonjr@gmail.com> writes:
> In --format=directory mode, remove .dat files with zero data records, and
> mark each such table's toc.dat entry as an empty table.

> Justification: *lots* of empty tables means *lots* of teeny-tiny files in
> the DB's dump directory.  That unnecessarily bloats the fs, and makes "du
> -c" really really slow.

Evidence please?  Most file systems that I've looked at optimize
zero-size files pretty well.

			regards, tom lane

* Re: Request For Feature: pg_dump

From: Ron Johnson @ 2026-05-22 17:20 UTC
  To: Tom Lane <tgl@sss.pgh.pa.us>; +Cc: Pgsql-admin <pgsql-admin@lists.postgresql.org>

On Fri, May 22, 2026 at 12:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Ron Johnson <ronljohnsonjr@gmail.com> writes:
> > In --format=directory mode, remove .dat files with zero data records, and
> > mark each such table's toc.dat entry as an empty table.
>
> > Justification: *lots* of empty tables means *lots* of teeny-tiny files in
> > the DB's dump directory.  That unnecessarily bloats the fs, and makes "du
> > -c" really really slow.
>
> Evidence please?  Most file systems that I've looked at optimize
> zero-size files pretty well.
>

They aren't zero bytes.
It's those pesky 5-byte files (14 bytes once zstd-compressed, or whatever
size gzip and lz4 produce).  66 thousand tiny files plus 8 thousand files
with data in them make for a 2.4MB directory.  That's big and slow.

$ find . -size 14c | wc
  66180   66180 1191240

$ zstd -dk 2115841.dat.zst
2115841.dat.zst     : 5 bytes

$ cat 2115841.dat
\.

$ dir | grep " 14 " | head -n20
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115841.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115842.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115843.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115844.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115845.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115851.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115899.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115901.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115902.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115903.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115905.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115907.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115909.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115913.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115915.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115917.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115919.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115923.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115926.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115931.dat.zst

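The gap between the 5-byte payload and the 14-byte .zst file is compressor overhead: headers and trailers cost more than a 5-byte input can ever save. A small sketch reproduces the effect with gzip, assuming the payload is the COPY end marker "\." followed by three newlines. That fits the 5 bytes zstd reports above, but it is an inference, not something shown in the thread; the function name is made up.

```shell
# Sketch: compare the (assumed) empty-table payload with its gzip'ed size.
# The payload "\." plus three newlines is an assumption inferred from the thread.
empty_payload() {
    printf '\\.\n\n\n'
}

raw_size=$(empty_payload | wc -c)
gz_size=$(empty_payload | gzip -c | wc -c)
# gzip adds at least a 10-byte header and an 8-byte trailer, so the
# "compressed" file is bigger than the data it holds.
echo "raw=$((raw_size)) bytes, gzip=$((gz_size)) bytes"
```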

* Re: Request For Feature: pg_dump

From: Holger Jakobs @ 2026-05-22 18:09 UTC
  To: pgsql-admin@lists.postgresql.org

On 22.05.26 at 19:20, Ron Johnson wrote:
> They aren't zero bytes.
> It's those pesky 5 (or 14 or whatever size that gzip and lz4 produces) 
> byte files.  66 thousand tiny files plus 8 thousand files with data in 
> them makes for a 2.4MB directory.  That's big and slow.

Maybe just not compressing empty files would already do the job. I
think any file below a certain size isn't worth compressing.
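The size-threshold idea could look roughly like the following hypothetical shell function, which gzips a data file only when it is larger than a threshold and otherwise leaves it as plain text. The function name and the 64-byte default are assumptions; nothing in this thread says pg_dump has such an option.

```shell
# Hypothetical sketch of "don't compress tiny files": compress FILE with gzip
# only if it is larger than THRESHOLD bytes, and print the resulting file name.
maybe_compress() {
    local file=$1
    local threshold=${2:-64}
    local size
    size=$(wc -c < "$file")
    if [ "$((size))" -gt "$threshold" ]; then
        gzip -f "$file"     # replaces FILE with FILE.gz
        echo "$file.gz"
    else
        echo "$file"        # too small to be worth compressing; left as is
    fi
}
```

Whatever reads the directory back would then have to accept both forms, for example by checking for the .gz suffix.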

Regards,

Holger

-- 

Holger Jakobs

* Re: Request For Feature: pg_dump

From: Ron Johnson @ 2026-05-22 18:41 UTC
  To: Pgsql-admin <pgsql-admin@lists.postgresql.org>

On Fri, May 22, 2026 at 2:09 PM Holger Jakobs <holger@jakobs.com> wrote:

> Maybe just not compressing empty files would already do the job.
>

The files aren't empty, though, since they have the terminating "\."
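Because such a file still holds the end-of-data marker, a zero-length test such as [ -s FILE ] will not find it. A hypothetical alternative is to decompress each file and compare it with the terminator-only payload, again assuming that payload is "\." followed by three newlines. The helper names are made up, the .zst branch needs the zstd CLI installed, and the result is only a list of candidates: deleting those files by hand would presumably leave toc.dat pointing at files that no longer exist.

```shell
# Hypothetical helpers: detect data files whose (decompressed) content is only
# the COPY end marker, assumed to be "\." plus three newlines.
empty_copy_sum() {
    printf '\\.\n\n\n' | cksum
}

is_empty_copy_data() {
    local file=$1
    local sum
    # cksum prints a CRC and a byte count; matching both is a cheap stand-in
    # for a byte-for-byte comparison. Other suffixes (e.g. .lz4) are compared
    # as-is, so they are never reported.
    case $file in
        *.gz)  sum=$(gzip -dc "$file" | cksum) ;;
        *.zst) sum=$(zstd -d -c -q "$file" | cksum) ;;
        *)     sum=$(cksum < "$file") ;;
    esac
    [ "$sum" = "$(empty_copy_sum)" ]
}

# Print every table-data file in directory $1 that holds no rows.
list_empty_copy_data() {
    local f
    for f in "$1"/[0-9]*.dat*; do
        [ -e "$f" ] || continue     # the glob matched nothing
        if is_empty_copy_data "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```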


> I think any file below a certain size isn't worth compressing.
>
