Re: RFC: Allow EXPLAIN to Output Page Fault Information

public inbox for pgsql-hackers@postgresql.org  
help / color / mirror / Atom feed

From: vellaipandiyan sm <vellaipandiyan.sm@gmail.com>
To: torikoshia <torikoshia@oss.nttdata.com>
Cc: Pgsql Hackers <pgsql-hackers@postgresql.org>
Subject: Re: RFC: Allow EXPLAIN to Output Page Fault Information
Date: Thu, 21 May 2026 11:08:33 +0530
Message-ID: <CAGXjcj=hnEZCCDDMRv06EQnDcv0mRe6+Ph1gv8=Hb7NEc51y5A@mail.gmail.com> (raw)
In-Reply-To: <c20f6340eb26f3b736abc59471bfada8@oss.nttdata.com>
References: <c20f6340eb26f3b736abc59471bfada8@oss.nttdata.com>

Hello hackers,

I reviewed the EXPLAIN storage I/O patch and the overall direction seems
useful, especially for distinguishing shared-buffer hits from actual
storage reads during query analysis.

One concern that stood out to me from the later discussion is the
interaction with asynchronous I/O and worker-based I/O accounting.

Since the patch currently relies on per-process getrusage() statistics, it
seems possible that the reported values could become partial or misleading
once I/O is performed outside the backend process context. In particular,
worker-based AIO could undercount storage reads/writes while still
returning non-zero values, which may make the output appear more accurate
than it actually is.

I also wonder whether this should affect the semantics/documentation of the
feature itself — i.e. whether the reported numbers should be described more
explicitly as “backend-observed synchronous storage I/O” rather than total
query I/O.

That said, the examples showing differences between OS cache hits and
actual storage reads were quite compelling, especially for query
performance debugging.

Thanks for working on the patch.

Regards,
Vellaipandiyan

On Thu, May 21, 2026 at 11:07 AM torikoshia <torikoshia@oss.nttdata.com>
wrote:

> Hi,
>
> When reading the output of EXPLAIN (ANALYZE) to diagnose slow queries
> for our customers, I often want to know how many page faults occurred,
> especially major page faults, which involve disk access.
>
> Currently, the BUFFERS option in EXPLAIN provides information on whether
> a page was found in the shared buffers, but it doesn't provide insight
> into whether the page was found in the OS cache or not and disk I/O
> occurred.
>
> Since page faults especially major one impact performance compared to
> shared buffer and OS cache hits, it would be helpful to track these
> events.
>
> I have attached a PoC patch that modifies EXPLAIN to include page fault
> information during both the planning and execution phases of a query.
> The output would look like this:
>
>    =# EXPLAIN (ANALYZE, PAGEFAULTS)
>            SELECT * FROM pgbench_branches b
>              JOIN pgbench_accounts a ON b.bid = a.bid ORDER BY a.aid;
>
>    QUERY PLAN
>
>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------
>      Nested Loop  (cost=0.58..335386.98 rows=4999917 width=197) (actual
> time=3.785..5590.294 rows=5000000 loops=1)
>
>        ... (omitted)
>
>      Planning:
>        Buffers: shared hit=50 read=48 dirtied=4 written=2
>        Page Faults: minor=30 major=19 <-ADDED
>      Planning Time: 22.080 ms
>      Execution:
>        Page Faults: minor=49 major=5 <-ADDED
>      Execution Time: 5794.356 ms
>
> The patch has not yet added this functionality to auto_explain, but I
> believe this feature would be more useful in auto_explain than in the
> EXPLAIN command itself. I plan to add it to auto_explain if there's
> interest in including page fault information in EXPLAIN.
>
> While GUCs like log_statement_stats allow logging of getrusage(2)
> information, including page faults, always enabling this can lead to
> excessive log output. It would be better to log this data only when
> queries are slow. Therefore, adding this feature to auto_explain seems
> like a good solution.
>
>
> The patch introduces a new option, PAGEFAULTS, but it may be more
> appropriate to include the page fault information in another option,
> such as SUMMARY, especially if there are other useful resources that can
> be obtained from getrusage(2).
>
>
> The patch cannot be applied to Windows because getrusage() in PostgreSQL
> ported for Windows currently only tracks CPU times. I'm not sure if
> information on major and minor page faults is accessible on Windows, but
> it might be acceptable to treat it similarly to log_statement_stats and
> exclude it from Windows support.
>
>
> I also tried to add page faults information for each plan node, similar
> to the BUFFERS option, but I decided against this approach due to the
> performance impact. The frequent calls to getrusage(2), i.e. each time
> getting one row, created significant overhead.
>
>
> Any feedback would be greatly appreciated.
>
>
> --
> Regards,
>
> --
> Atsushi Torikoshi
> Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: pgsql-hackers@postgresql.org
  Cc: vellaipandiyan.sm@gmail.com, torikoshia@oss.nttdata.com
  Subject: Re: RFC: Allow EXPLAIN to Output Page Fault Information
  In-Reply-To: <CAGXjcj=hnEZCCDDMRv06EQnDcv0mRe6+Ph1gv8=Hb7NEc51y5A@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox