public inbox for pgsql-hackers@postgresql.org
help / color / mirror / Atom feedFrom: Atsushi Torikoshi <torikoshia.tech@gmail.com>
To: Jelte Fennema-Nio <postgres@jeltef.nl>
To: rjuju123@gmail.com
Cc: torikoshia <torikoshia@oss.nttdata.com>
Cc: tgl@sss.pgh.pa.us
Cc: Bruce Momjian <bruce@momjian.us>
Cc: Pgsql Hackers <pgsql-hackers@postgresql.org>
Subject: Re: RFC: Allow EXPLAIN to Output Page Fault Information
Date: Tue, 7 Jan 2025 22:41:47 +0900
Message-ID: <CAM6-o=BE=oewSsdNKrbbNjR2muzQR49STXsuta+Wpq6CSyxTag@mail.gmail.com> (raw)
In-Reply-To: <D6V9GJ2LOLC5.1ZF3BWKKEBZWC@jeltef.nl>
References: <c20f6340eb26f3b736abc59471bfada8@oss.nttdata.com>
<1614321.1735055528@sss.pgh.pa.us>
<D6MJOHS7HZ80.3B17NDGUV6T47@jeltef.nl>
<Z3LII9YEgp3kAZG-@momjian.us>
<3861403.1735576764@sss.pgh.pa.us>
<D6PENGW1PL79.266GQNPSMRIGA@jeltef.nl>
<1f22794321b745549d54359d343e37b8@oss.nttdata.com>
<D6V9GJ2LOLC5.1ZF3BWKKEBZWC@jeltef.nl>
On Mon, Jan 6, 2025 at 6:59 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> Hi,
>
> On Mon, Jan 06, 2025 at 06:49:06PM +0900, torikoshia wrote:
> >
> > > **However, I think the general direction has merit**: Changing this
> > > patch to
> > > use `ru_inblock`/`ru_oublock` gives very useful insights. `ru_inblock`
> > > is 0 when everything is in page cache, and it is very high when stuff
> is
> > > not. I was only hacking around and basically did this:
> > >
> > > s/ru_minflt/ru_inblock/g
> > > s/ru_majflt/ru_oublock/g
> > > [...]
> > > Also, maybe there's some better way
> > > of getting read/write numbers for the current process than
> > > ru_inblock/ru_oublock (but this one seems to work at least reasonably
> > > well).
> >
> > Maybe, but as far as using getrusage(), ru_inblock and ru_outblock seem
> the
> > best.
>
> FWIW that's the counters we've been using in pg_stat_kcache / powa to
> compute
> disk IO and "postgres shared buffers hit VS OS cache hit VS disk read" hit
> ratio for many years and we didn't get any problem report for that.
Thanks for sharing the information!
On Tue, Jan 7, 2025 at 5:09 AM Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
> On Mon Jan 6, 2025 at 10:49 AM CET, torikoshia wrote:
> > On Tue, Dec 31, 2024 at 7:57 AM Jelte Fennema-Nio <postgres@jeltef.nl>
> > Updated the PoC patch to calculate them by KB:
> >
> > =# EXPLAIN (ANALYZE, STORAGEIO) SELECT * FROM pgbench_accounts;
> > QUERY PLAN
> >
> >
> ---------------------------------------------------------------------------------------------------------------------------------
> > Seq Scan on pgbench_accounts (cost=0.00..263935.35 rows=10000035
> > width=97) (actual time=1.447..3900.279 rows=10000000 loops=1)
> > Buffers: shared hit=2587 read=161348
> > Planning Time: 0.367 ms
> > Execution:
> > Storage I/O: read=1291856 KB write=0 KB
> > Execution Time: 4353.253 ms
> > (6 rows)
> >
> >
>
> The core functionality works well in my opinion. I think it makes sense
> to spend the effort to move this from PoC quality to something
> committable.
Thanks for the comment and review!
If there are no other opinions, I will make a patch based on the direction
of the current PoC patch.
> 4. I think this setting should be enabled by default for ANALYZE, just
> like BUFFERS is now since c2a4078e[1].
>
> 5. I'm wondering whether this option deserves its own EXPLAIN option, or
> if it should simply be made part of BUFFERS.
Although this is not information about PostgreSQL buffers, I now feel that
making this addition part of BUFFERS is attractive, since people using
BUFFERS would also sometimes want to know about the storage I/O.
Since BUFFERS is now ON by default with EXPLAIN ANALYZE, I am concerned
about the performance impact.
However, if it is limited to just twice—once at the start and once at the
end—for the planning phase, execution phase, and each parallel worker, I
believe the impact is relatively small.
> 9. I think this division by 2 could use some explanation in a comment. I
> understand that you're doing this because linux divides its original
> bytes using 512 bytes[2] and your additional factor of 2 gets that to
> 1024 bytes. But that's not clear immediately from the code.
>
> I'm also not convinced that 512 is the blocksize if this logic is
> even correct on every platform. I'm wondering if maybe we should
> simply show the blocks after all.
>
Maybe so. I'll look into this and then decide the unit.
For the other comments, I plan to address them as you suggested.
Regards,
Atsushi Torikoshi
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: pgsql-hackers@postgresql.org
Cc: torikoshia.tech@gmail.com, postgres@jeltef.nl, rjuju123@gmail.com, torikoshia@oss.nttdata.com, tgl@sss.pgh.pa.us, bruce@momjian.us
Subject: Re: RFC: Allow EXPLAIN to Output Page Fault Information
In-Reply-To: <CAM6-o=BE=oewSsdNKrbbNjR2muzQR49STXsuta+Wpq6CSyxTag@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox