Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tUtPT-003cMz-Hf for pgsql-hackers@arkaria.postgresql.org; Mon, 06 Jan 2025 20:09:12 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tUtPS-00CISi-NW for pgsql-hackers@arkaria.postgresql.org; Mon, 06 Jan 2025 20:09:10 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tUtPS-00CISM-8D for pgsql-hackers@lists.postgresql.org; Mon, 06 Jan 2025 20:09:10 +0000 Received: from mail-ej1-x62d.google.com ([2a00:1450:4864:20::62d]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1tUtPO-000DWk-2a for pgsql-hackers@postgresql.org; Mon, 06 Jan 2025 20:09:09 +0000 Received: by mail-ej1-x62d.google.com with SMTP id a640c23a62f3a-aa67f31a858so2659067666b.2 for ; Mon, 06 Jan 2025 12:09:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=jeltef.nl; s=google; t=1736194146; x=1736798946; darn=postgresql.org; h=in-reply-to:references:subject:cc:from:to:message-id:date :content-transfer-encoding:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=zcAuPHrjXKl3HBl/6/P8youx8cyQxib4wodG+jqnfcQ=; b=VrtuSZr/yifvG2D67iOT/DbLZhqQIqo3W2T1jxp6lHG1fYnCvyoh4pbhTBHOtXap9n wKbiOERUBT16gLAtgdxV1oiC4GdkmnycVbgT6FE3tx4ekW3VutRywViVeWrbIEvBrqsG SCMwxLDTPsQE+zI9l0H12bX9cGgDzqfNf7eEGOEw9KGimFgX+Pu3fJD4FThxj70IFG2F w9hUidL400r+CmaO+dCV1lFAvr+46jOeogIKCx9s9pIhjZoUxvPIjWI9i4Aea/gqVP9i 51Vgl4ndI7EKl/Vh5Aucn5qpNGU2CXiMuGptlfIATZVdsty5Gkx4lfMnmNAQPuy4o7QO JL9g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1736194146; x=1736798946; h=in-reply-to:references:subject:cc:from:to:message-id:date :content-transfer-encoding:mime-version:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=zcAuPHrjXKl3HBl/6/P8youx8cyQxib4wodG+jqnfcQ=; b=vfZh7Yw4nNBKQBxD/1ErG0f/gvJBGl8Q9vZcghKTT/4pxgp4i8oQgSqTumcZ0fFAWg P6SH7nIKfSFUX+iBwRATNO2+aji8izIB1uacJnrrqEednXacJa7t9Mj3fXncRwmEPbrJ uwz1RpjB3JFRR0NSVVyub+cSjfL70BadUkoR3+GLKAAGSXuT8CQOA/JGAy4X2RYBf4/n LXQzMBQKBnWbtHPxJBo6gxfLTAS/swjAPjRuTaGrJe43cGcdyvgxUfxBs3HvXjgDZOFt FqM6MEcDoFz+rmMl1EoQGDgVaFWg4su8tDYwvpaLrBngxK4M775hgl+QrJyQIbm0yYp7 LToA== X-Forwarded-Encrypted: i=1; AJvYcCXmzZg8QJqOVUBRLP4zt+0Yx9t/LiT+Ln3aizInBz4FbNDn8e68E7rHy+tKvC0XGF26rZVcX4iXT8q8y0zx@postgresql.org X-Gm-Message-State: AOJu0YwhGmuZODI/B7n+ClGuE03BW0R1/NB7G+8Dtrjc3SJVbSayFRrg 6tmuqLH4hKq68Ct5iprVt8TlceD+po0zUGdwMQJuJgUEqS7VNE8ZI7+yT1Ls05U= X-Gm-Gg: ASbGnctjZ+APICZ1regTvrsS2jCdi2i4QU1+oomgzoELiCnj0aYtjAJ/jvHMoaISDTt Tf9IRQkL8TgIz5+85sVqGdojKlRk4YydzZV9nk5iuBakdqHZju6Ud2Z/r6jvKnIrTBzpQgXAobi WzMgUEaKDLQMO7ozQjd/3mgo7ZSRwpe4CkxIKfEZ7nnFmECw1aWKJ0EfOOEhZHvA048J9+BzG+T k5DXM4GcGSr/W4+T0DEerY+7w+o9HY2qraZ8h7nvbn7uUmdcMs1AsXinvYvDrD9//oldA9xVHBY pRSYNg== X-Google-Smtp-Source: AGHT+IGmsHGWyQF3UDnPQrdXd99UZAe5BIaskMvp+i8tYpX+yErbSEB/a4j1zTWuDQWmvLOOoIiIYw== X-Received: by 2002:a17:907:1b93:b0:aae:8307:ec69 with SMTP id a640c23a62f3a-aae8307eecemr3617973966b.7.1736194145567; Mon, 06 Jan 2025 12:09:05 -0800 (PST) Received: from localhost (145-53-221-196.fixed.kpn.net. [145.53.221.196]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-aac0e82f138sm2287567466b.17.2025.01.06.12.09.04 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Mon, 06 Jan 2025 12:09:05 -0800 (PST) Mime-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=UTF-8; format=Flowed Date: Mon, 06 Jan 2025 21:09:04 +0100 Message-Id: To: "torikoshia" , From: "Jelte Fennema-Nio" Cc: "Bruce Momjian" , "Pgsql Hackers" Subject: Re: RFC: Allow EXPLAIN to Output Page Fault Information X-Mailer: aerc 0.18.2-131-g6c17c2000fa3 References: <1614321.1735055528@sss.pgh.pa.us> <3861403.1735576764@sss.pgh.pa.us> <1f22794321b745549d54359d343e37b8@oss.nttdata.com> In-Reply-To: <1f22794321b745549d54359d343e37b8@oss.nttdata.com> List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Mon Jan 6, 2025 at 10:49 AM CET, torikoshia wrote: > On Tue, Dec 31, 2024 at 7:57=E2=80=AFAM Jelte Fennema-Nio =20 > Updated the PoC patch to calculate them by KB: > > =3D# EXPLAIN (ANALYZE, STORAGEIO) SELECT * FROM pgbench_accounts; > QUERY PLAN > =20 > -------------------------------------------------------------------------= -------------------------------------------------------- > Seq Scan on pgbench_accounts (cost=3D0.00..263935.35 rows=3D10000035= =20 > width=3D97) (actual time=3D1.447..3900.279 rows=3D10000000 loops=3D1) > Buffers: shared hit=3D2587 read=3D161348 > Planning Time: 0.367 ms > Execution: > Storage I/O: read=3D1291856 KB write=3D0 KB > Execution Time: 4353.253 ms > (6 rows) > > The core functionality works well in my opinion. I think it makes sense to spend the effort to move this from PoC quality to something committable. Below some of the things that are necessary to do that after an initial pass over the code (and trying it out): >> One other thing that I noticed when playing around with this, which >> would need to be addressed: Parallel workers need to pass these values >> to the main process somehow, otherwise the IO from those processes gets= =20 >> lost. > > Yes. > I haven't developed it yet but I believe we can pass them like=20 > buffer/WAL usage. 1. Yeah, makes sense to do this the same way as we do for buffers. Let's do this now. > + if (es->storageio) > + { > + getrusage(RUSAGE_SELF, &rusage); > + > + storageio.inblock =3D rusage.ru_inblock - storageio_star= t.inblock; > + storageio.outblock =3D rusage.ru_oublock - storageio_sta= rt.outblock; > + > + if (es->format =3D=3D EXPLAIN_FORMAT_TEXT) > + { > + ExplainIndentText(es); > + appendStringInfoString(es->str, "Execution:\n"); > + es->indent++; > + } > + show_storageio(es, &storageio); > + > + if (es->format =3D=3D EXPLAIN_FORMAT_TEXT) > + es->indent--; > + ExplainCloseGroup("Execution", "Execution", true, es); > + } 2. The current code always shows "Execution: " in the explain analyze output, even if no storageio is done. I think this should use peek_storageio() to check if any storageio was done and only show the "Execution: " line if that is the case. 3. FORMAT JSON seems to be broken with this patch. With the following simple query: explain (ANALYZE, STORAGEIO, FORMAT JSON) select max(a), max(b) from t_big; I get this this assert failure: TRAP: failed Assert("es->indent =3D=3D 0"), File: "../src/backend/commands/= explain.c", Line: 375, PID: 199034 postgres: jelte postgres 127.0.0.1(49034) EXPLAIN(ExceptionalCondition+0x74= )[0x5ad72872b464] postgres: jelte postgres 127.0.0.1(49034) EXPLAIN(ExplainQuery+0x75b)[0x5ad= 7283c87bb] postgres: jelte postgres 127.0.0.1(49034) EXPLAIN(standard_ProcessUtility+0= x595)[0x5ad7285e97f5] postgres: jelte postgres 127.0.0.1(49034) EXPLAIN(+0x4daadf)[0x5ad7285e7adf= ] postgres: jelte postgres 127.0.0.1(49034) EXPLAIN(+0x4dafc4)[0x5ad7285e7fc4= ] postgres: jelte postgres 127.0.0.1(49034) EXPLAIN(PortalRun+0x32d)[0x5ad728= 5e834d] postgres: jelte postgres 127.0.0.1(49034) EXPLAIN(+0x4d70a2)[0x5ad7285e40a2= ] postgres: jelte postgres 127.0.0.1(49034) EXPLAIN(PostgresMain+0x16e9)[0x5a= d7285e5b39] postgres: jelte postgres 127.0.0.1(49034) EXPLAIN(BackendMain+0x5f)[0x5ad72= 85e02df] postgres: jelte postgres 127.0.0.1(49034) EXPLAIN(postmaster_child_launch+0= xe1)[0x5ad72853cde1] postgres: jelte postgres 127.0.0.1(49034) EXPLAIN(+0x433758)[0x5ad728540758= ] postgres: jelte postgres 127.0.0.1(49034) EXPLAIN(PostmasterMain+0xddd)[0x5= ad72854223d] postgres: jelte postgres 127.0.0.1(49034) EXPLAIN(main+0x1d0)[0x5ad72828b60= 0] /lib/x86_64-linux-gnu/libc.so.6(+0x2a1ca)[0x714aa222a1ca] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x8b)[0x714aa222a28b] postgres: jelte postgres 127.0.0.1(49034) EXPLAIN(_start+0x25)[0x5ad72828bc= 05] FORMAT JSON should obviously not crash the server, but apart from that it should also actually show this new data in the json output. 4. I think this setting should be enabled by default for ANALYZE, just like BUFFERS is now since c2a4078e[1]. 5. I'm wondering whether this option deserves its own EXPLAIN option, or if it should simply be made part of BUFFERS. 6. Windows compilation is currently failing on the CFbot. Looking at the output, that's because rusage does not contain these fields there. I think you'll need some #ifdefs=20 7. The result of getrusage() should be checked for errors and we should report the error. (eventhough it's very unexpected to ever fail). 8. This needs docs > + appendStringInfo(es->str, " read=3D%ld KB", (long) usage= ->inblock / 2); > + appendStringInfo(es->str, " write=3D%ld KB", (long) usag= e->outblock / 2); 9. I think this division by 2 could use some explanation in a comment. I understand that you're doing this because linux divides its original bytes using 512 bytes[2] and your additional factor of 2 gets that to 1024 bytes. But that's not clear immediately from the code. I'm also not convinced that 512 is the blocksize if this logic is even correct on every platform. I'm wondering if maybe we should simply show the blocks after all. [1]: https://github.com/postgres/postgres/commit/c2a4078ebad71999dd451ae7d4= 358be3c9290b07 [2]: https://github.com/torvalds/linux/blob/fbfd64d25c7af3b8695201ebc85efe9= 0be28c5a3/include/linux/task_io_accounting_ops.h#L16-L23