public inbox for pgsql-hackers@postgresql.org  
help / color / mirror / Atom feed
From: Atsushi Torikoshi <torikoshia.tech@gmail.com>
To: Jelte Fennema-Nio <postgres@jeltef.nl>
To: rjuju123@gmail.com
Cc: torikoshia <torikoshia@oss.nttdata.com>
Cc: tgl@sss.pgh.pa.us
Cc: Bruce Momjian <bruce@momjian.us>
Cc: Pgsql Hackers <pgsql-hackers@postgresql.org>
Subject: Re: RFC: Allow EXPLAIN to Output Page Fault Information
Date: Tue, 7 Jan 2025 22:41:47 +0900
Message-ID: <CAM6-o=BE=oewSsdNKrbbNjR2muzQR49STXsuta+Wpq6CSyxTag@mail.gmail.com> (raw)
In-Reply-To: <D6V9GJ2LOLC5.1ZF3BWKKEBZWC@jeltef.nl>
References: <c20f6340eb26f3b736abc59471bfada8@oss.nttdata.com>
	<1614321.1735055528@sss.pgh.pa.us>
	<D6MJOHS7HZ80.3B17NDGUV6T47@jeltef.nl>
	<Z3LII9YEgp3kAZG-@momjian.us>
	<3861403.1735576764@sss.pgh.pa.us>
	<D6PENGW1PL79.266GQNPSMRIGA@jeltef.nl>
	<1f22794321b745549d54359d343e37b8@oss.nttdata.com>
	<D6V9GJ2LOLC5.1ZF3BWKKEBZWC@jeltef.nl>

On Mon, Jan 6, 2025 at 6:59 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

> Hi,
>
> On Mon, Jan 06, 2025 at 06:49:06PM +0900, torikoshia wrote:
> >
> > > **However, I think the general direction has merit**: Changing this
> > > patch to
> > > use `ru_inblock`/`ru_oublock` gives very useful insights. `ru_inblock`
> > > is 0 when everything is in page cache, and it is very high when stuff
> is
> > > not. I was only hacking around and basically did this:
> > >
> > > s/ru_minflt/ru_inblock/g
> > > s/ru_majflt/ru_oublock/g
> > > [...]
> > >  Also, maybe there's some better way
> > > of getting read/write numbers for the current process than
> > > ru_inblock/ru_oublock (but this one seems to work at least reasonably
> > > well).
> >
> > Maybe, but as far as using getrusage(), ru_inblock and ru_outblock seem
> the
> > best.
>
> FWIW that's the counters we've been using in pg_stat_kcache / powa to
> compute
> disk IO and "postgres shared buffers hit VS OS cache hit VS disk read" hit
> ratio for many years and we didn't get any problem report for that.


Thanks for sharing the information!

On Tue, Jan 7, 2025 at 5:09 AM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:

> On Mon Jan 6, 2025 at 10:49 AM CET, torikoshia wrote:
> > On Tue, Dec 31, 2024 at 7:57 AM Jelte Fennema-Nio <postgres@jeltef.nl>
> > Updated the PoC patch to calculate them by KB:
> >
> >    =# EXPLAIN (ANALYZE, STORAGEIO) SELECT * FROM pgbench_accounts;
> >                                                               QUERY PLAN
> >
> >
> ---------------------------------------------------------------------------------------------------------------------------------
> >     Seq Scan on pgbench_accounts  (cost=0.00..263935.35 rows=10000035
> > width=97) (actual time=1.447..3900.279 rows=10000000 loops=1)
> >       Buffers: shared hit=2587 read=161348
> >     Planning Time: 0.367 ms
> >     Execution:
> >       Storage I/O: read=1291856 KB write=0 KB
> >     Execution Time: 4353.253 ms
> >    (6 rows)
> >
> >
>
> The core functionality works well in my opinion. I think it makes sense
> to spend the effort to move this from PoC quality to something
> committable.


Thanks for the comment and review!
If there are no other opinions, I will make a patch based on the direction
of the current PoC patch.


> 4. I think this setting should be enabled by default for ANALYZE, just
>    like BUFFERS is now since c2a4078e[1].
>
> 5. I'm wondering whether this option deserves its own EXPLAIN option, or
>    if it should simply be made part of BUFFERS.


Although this is not information about PostgreSQL buffers, I now feel that
making this addition part of BUFFERS is attractive, since people using
BUFFERS would also sometimes want to know about the storage I/O.

Since BUFFERS is now ON by default with EXPLAIN ANALYZE, I am concerned
about the performance impact.
However, if it is limited to just twice—once at the start and once at the
end—for the planning phase, execution phase, and each parallel worker, I
believe the impact is relatively small.


> 9. I think this division by 2 could use some explanation in a comment. I
>    understand that you're doing this because linux divides its original
>    bytes using 512 bytes[2] and your additional factor of 2 gets that to
>    1024 bytes. But that's not clear immediately from the code.
>
>    I'm also not convinced that 512 is the blocksize if this logic is
>    even correct on every platform. I'm wondering if maybe we should
>    simply show the blocks after all.
>

Maybe so. I'll look into this and then decide the unit.

For the other comments, I plan to address them as you suggested.


Regards,
Atsushi Torikoshi


reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: pgsql-hackers@postgresql.org
  Cc: torikoshia.tech@gmail.com, postgres@jeltef.nl, rjuju123@gmail.com, torikoshia@oss.nttdata.com, tgl@sss.pgh.pa.us, bruce@momjian.us
  Subject: Re: RFC: Allow EXPLAIN to Output Page Fault Information
  In-Reply-To: <CAM6-o=BE=oewSsdNKrbbNjR2muzQR49STXsuta+Wpq6CSyxTag@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox