From: Tom Lane
To: exclusion@gmail.com
cc: pgsql-bugs@lists.postgresql.org
Subject: Re: BUG #18374: Printing memory contexts on OOM condition might lead to segmentation fault
In-reply-to: <18374-ebb8113ce4d02f0d@postgresql.org>
References: <18374-ebb8113ce4d02f0d@postgresql.org>
Date: Sat, 02 Mar 2024 11:11:27 -0500
Message-ID: <3120721.1709395887@sss.pgh.pa.us>

PG Bug reporting form writes:
> When a backend with deeply nested memory contexts hits out-of-memory
> condition and logs the contexts, it might lead to a
> segmentation fault
> (due to the lack of free memory again).

Hmph.  That's not an out-of-memory crash, that's a stack-too-deep
crash.  Seems like we ought to do one or both of these:

1. Put a CHECK_STACK_DEPTH() call in MemoryContextStatsInternal.

2. Teach MemoryContextStatsInternal to refuse to recurse more
than N levels, for N perhaps around 100.

Neither of these is very attractive though, as they'd obscure
the OOM situation that we're trying to help debug.

It strikes me that we don't actually need recursion in order to
traverse the context tree: since the nodes have parent pointers,
it'd be possible to visit them all using only iteration.  The
recursion seems necessary though to manage the child summarization
logic as we have it (in particular, we must have a local_totals
per level to produce summarization like this).  Maybe we could
modify solution #2 into

2a. Once we get more than say 100 levels deep, summarize everything
below that in a single line, obtained in an iterative rather than
recursive traversal.

I wonder whether MemoryContextDelete and other cleanup methods
also need to be rewritten to avoid recursion.  In the infinite_recurse
test case I think we escape trouble because we longjmp out of most
of the stack before we try to clean up --- but you could probably
devise a test case that tries to do a subtransaction abort at a
deep call level, and then maybe kaboom?

			regards, tom lane
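
For illustration, the iterative traversal suggested above (walking a
subtree with parent pointers instead of recursion, as option 2a would
need below the depth cutoff) might look roughly like the sketch below.
It is not PostgreSQL code: Node, node_add_child and subtree_total_bytes
are hypothetical names, and the struct is a simplified stand-in that
assumes each context has parent, first-child and next-sibling links.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a memory context node. */
typedef struct Node
{
	struct Node *parent;		/* NULL for the top of the tree */
	struct Node *firstchild;	/* head of the list of children */
	struct Node *nextchild;		/* next sibling under the same parent */
	size_t		bytes;			/* space attributed to this node alone */
} Node;

/* Link child in as the first child of parent. */
static void
node_add_child(Node *parent, Node *child)
{
	child->parent = parent;
	child->nextchild = parent->firstchild;
	parent->firstchild = child;
}

/*
 * Sum "bytes" over root and all its descendants using only iteration.
 *
 * This is a pre-order walk: descend to the first child when there is
 * one; otherwise climb through parent links until a node with an
 * unvisited sibling is found, or until we are back at root.  Stack
 * usage is constant no matter how deep the tree is.
 */
static size_t
subtree_total_bytes(const Node *root, size_t *ncontexts)
{
	const Node *cur = root;
	size_t		total = 0;
	size_t		count = 0;

	assert(root != NULL);

	while (cur != NULL)
	{
		total += cur->bytes;
		count++;

		if (cur->firstchild != NULL)
		{
			cur = cur->firstchild;
			continue;
		}

		/* Leaf: climb until a sibling exists or we return to root. */
		while (cur != root && cur->nextchild == NULL)
			cur = cur->parent;
		if (cur == root)
			break;				/* never follow root's own siblings */
		cur = cur->nextchild;
	}

	if (ncontexts != NULL)
		*ncontexts = count;
	return total;
}
```

A summarizing caller could recurse normally for the first N levels and
then call something like subtree_total_bytes() on each node at level N
to produce the single summary line, so that stack depth stays bounded
by N however deep the context tree is.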