From: Alexander Lakhin
To: Tom Lane
Cc: pgsql-bugs@lists.postgresql.org
Date: Sat, 2 Mar 2024 20:00:00 +0300
Subject: Re: BUG #18374: Printing memory contexts on OOM condition might lead to segmentation fault
In-Reply-To: <3120721.1709395887@sss.pgh.pa.us>
References: <18374-ebb8113ce4d02f0d@postgresql.org> <3120721.1709395887@sss.pgh.pa.us>

Hello Tom,

02.03.2024 19:11, Tom Lane wrote:
> PG Bug reporting form writes:
>> When a backend with deeply nested memory contexts hits out-of-memory
>> condition and logs the contexts, it might lead to a segmentation fault
>> (due to the lack of free memory again).
> Hmph. That's not an out-of-memory crash, that's a stack-too-deep
> crash.

I tried decreasing the memory limit and still got the failure, this time with a much shorter stack:
ulimit -Sv 200000; TESTS=infinite_recurse make -s check-tests

(gdb) p $rsp
$1 = (void *) 0x7ffcc83d4ff0
(gdb) frame 13269
#13269 0x000056289bc2685a in main (argc=8, argv=0x56289d3b4930) at main.c:198
198                     PostmasterMain(argc, argv);
(gdb) p $rsp
$2 = (void *) 0x7ffcc84834d0
(gdb) p $rsp - 0x7ffcc83d4ff0
$3 = (void *) 0xae4e0

(0xae4e0 is 713952 bytes, about 700 kB, far less than the 8 MB allowed by ulimit -s.)

This makes me think that it's not a stack overflow issue, but maybe I'm missing something.

> Seems like we ought to do one or both of these:
>
> 1. Put a CHECK_STACK_DEPTH() call in MemoryContextStatsInternal.
>
> 2. Teach MemoryContextStatsInternal to refuse to recurse more
> than N levels, for N perhaps around 100.
>
> Neither of these are very attractive though, as they'd obscure
> the OOM situation that we're trying to help debug.
>
> It strikes me that we don't actually need recursion in order to
> traverse the context tree: since the nodes have parent pointers,
> it'd be possible to visit them all using only iteration. The
> recursion seems necessary though to manage the child summarization
> logic as we have it (in particular, we must have a local_totals
> per level to produce summarization like this). Maybe we could
> modify solution #2 into
>
> 2a. Once we get more than say 100 levels deep, summarize everything
> below that in a single line, obtained in an iterative rather than
> recursive traversal.
>
> I wonder whether MemoryContextDelete and other cleanup methods
> also need to be rewritten to avoid recursion.
In the infinite_recurse > test case I think we escape trouble because we longjmp out of most > of the stack before we try to clean up --- but you could probably > devise a test case that tries to do a subtransaction abort at a > deep call level, and then maybe kaboom? Exploiting and protecting MemoryContextStatsInternal() were discussed before: https://www.postgresql.org/message-id/flat/1661334672.728714027%40f473.i.mail.ru (It looks like the function got no stack-overflow protection at the end.) But I'm still not sure that we deal here with the same issue. Best regards, Alexander
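For illustration, option 2a above could look roughly like the minimal sketch below. It is hypothetical and not PostgreSQL code: `Ctx`, `summarize_subtree()` and `print_stats()` are made-up names, and `Ctx` models only the `parent`, `firstchild` and `nextchild` links that the real `MemoryContextData` also has (the real struct has more fields). The printer recurses at most `max_depth` levels (2a suggests about 100) and collapses everything deeper into one line, counted by a walk that descends via `firstchild`, moves sideways via `nextchild` and climbs back up via `parent` instead of using the call stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for MemoryContextData: only the tree links
 * and a byte count are modelled here. */
typedef struct Ctx
{
    const char *name;
    size_t      bytes;
    struct Ctx *parent;
    struct Ctx *firstchild;
    struct Ctx *nextchild;
} Ctx;

/* Count and sum a whole subtree without recursion: descend via
 * firstchild, move sideways via nextchild, climb back via parent. */
static void
summarize_subtree(const Ctx *root, size_t *ncontexts, size_t *total)
{
    const Ctx *node = root;

    assert(root != NULL);
    *ncontexts = 0;
    *total = 0;
    for (;;)
    {
        (*ncontexts)++;
        *total += node->bytes;
        if (node->firstchild != NULL)
        {
            node = node->firstchild;
            continue;
        }
        /* Climb until an ancestor below root has an unvisited sibling. */
        while (node != root && node->nextchild == NULL)
            node = node->parent;
        if (node == root)
            break;              /* never wander into root's own siblings */
        node = node->nextchild;
    }
}

/* Print contexts recursively, but only max_depth levels deep; all
 * contexts below that are reported as one aggregate line computed
 * iteratively, so the recursion depth stays bounded.
 * Returns the number of lines printed. */
static int
print_stats(FILE *out, const Ctx *ctx, int depth, int max_depth)
{
    int lines = 1;

    fprintf(out, "%*s%s: %zu bytes\n", depth * 2, "", ctx->name, ctx->bytes);
    if (ctx->firstchild == NULL)
        return lines;

    if (depth + 1 < max_depth)
    {
        for (const Ctx *c = ctx->firstchild; c != NULL; c = c->nextchild)
            lines += print_stats(out, c, depth + 1, max_depth);
    }
    else
    {
        size_t n_all = 0, bytes_all = 0;

        for (const Ctx *c = ctx->firstchild; c != NULL; c = c->nextchild)
        {
            size_t n, bytes;

            summarize_subtree(c, &n, &bytes);
            n_all += n;
            bytes_all += bytes;
        }
        fprintf(out, "%*s%zu more contexts below: %zu bytes total\n",
                (depth + 1) * 2, "", n_all, bytes_all);
        lines++;
    }
    return lines;
}
```

With this shape, stack use is bounded by `max_depth` frames no matter how deeply the contexts are nested; the cost is that contexts below the cut-off show up only as an aggregate count and size.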