public inbox for pgsql-bugs@postgresql.org  
From: Alexander Lakhin <exclusion@gmail.com>
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: pgsql-bugs@lists.postgresql.org
Cc: Heikki Linnakangas <hlinnaka@iki.fi>
Subject: Re: BUG #18374: Printing memory contexts on OOM condition might lead to segmentation fault
Date: Sat, 16 May 2026 22:00:00 +0300
Message-ID: <89884b8a-2a50-4f6d-9e9c-223c6741643e@gmail.com>
In-Reply-To: <3399097.1709501969@sss.pgh.pa.us>
References: <18374-ebb8113ce4d02f0d@postgresql.org>
	<3120721.1709395887@sss.pgh.pa.us>
	<b1a1eaf3-d5b7-da52-6bb7-c5b3fbe47f3e@gmail.com>
	<3140126.1709405398@sss.pgh.pa.us>
	<3148162.1709409519@sss.pgh.pa.us>
	<3399097.1709501969@sss.pgh.pa.us>

Hello Tom,

03.03.2024 23:39, Tom Lane wrote:
> I wrote:
>> I find this in [1]:
>>
>>    The C language stack growth does an implicit mremap. If you want absolute
>>    guarantees and run close to the edge you MUST mmap your stack for the
>>    largest size you think you will need. For typical stack usage this does
>>    not matter much but it's a corner case if you really really care
>>
>> Seems like we need to do some more work at startup to enforce that
>> we have the amount of stack we think we do, if we're on Linux.
> After thinking about that some more, I'm really quite unenthused about
> trying to remap the stack for ourselves.  It'd be both platform- and
> architecture-dependent, and I'm afraid it'd introduce as many failure
> modes as it removes.  (Notably, I'm not sure we could guarantee
> there's a guard page below the stack.)  Since we've not seen reports
> of this failure from the wild, I doubt it's worth the trouble.

I'm not too excited either, but I have observed such SIGSEGVs in a
memory-restricted cloud (Neon) environment (which could perhaps be considered
the wild), and what looks bad to me is that there is no protection against
it at all. That is, if you can get an out-of-memory error in some
environment, you can also bring the whole server down with a segfault,
accidentally or intentionally.

I researched the subject and found only one way to prevent this: allocate
the stack memory (up to max_stack_depth) when a postmaster child process
starts.
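A minimal standalone sketch of that idea, assuming a POSIX system and GCC or Clang: recursively grow the stack in fixed-size frames and write one byte per page so the kernel has to back every page now, while a failure can still be handled, stopping once a byte budget (standing in for max_stack_depth) is reached. All `sketch_*` names, the assumed 4096-byte page, and the 64kB frame size are hypothetical; the patch below uses 256kB frames and stack_is_too_deep() instead.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE  4096          /* assumed page size */
#define SKETCH_FRAME_SIZE (64 * 1024)   /* hypothetical chunk per recursion level */

static uintptr_t sketch_stack_base;

/* Distance from the recorded base; works whichever way the stack grows. */
static __attribute__((noinline)) size_t
sketch_stack_depth(void)
{
    volatile char here;
    uintptr_t cur = (uintptr_t) &here;

    return sketch_stack_base > cur ? sketch_stack_base - cur
                                   : cur - sketch_stack_base;
}

/* Returns the number of frames whose pages were touched. */
static __attribute__((noinline)) int
sketch_occupy_stack(size_t budget)
{
    volatile char chunk[SKETCH_FRAME_SIZE];
    int frames = 1;

    /* Write one byte per page so each page is actually faulted in. */
    for (size_t i = 0; i < sizeof(chunk); i += SKETCH_PAGE_SIZE)
        chunk[i] = 0;

    if (sketch_stack_depth() < budget)
        frames += sketch_occupy_stack(budget);

    /* Reading chunk after the call keeps this frame live (no tail-call reuse). */
    return frames + chunk[0];
}

int
sketch_preallocate_stack(size_t budget)
{
    volatile char base;
    int frames;

    sketch_stack_base = (uintptr_t) &base;
    frames = sketch_occupy_stack(budget);
    sketch_stack_base = 0;
    return frames;
}
```

For example, `sketch_preallocate_stack(2 * 1024 * 1024)` would touch roughly the default 2MB max_stack_depth worth of stack pages before any real work starts.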

Please look at the attached test, which triggers the server crash, and a
possible protection against it.

When running this test on Linux, I'm getting:
PROVE_TESTS="t/099*" make -s check -C src/test/modules/test_misc/
# Testing ulimit -Sv 280000
# out of memory
# Testing ulimit -Sv 1140000
# stack depth limit exceeded
...
# Boundary between 'out of memory' and 'stack depth limit exceeded' found: 283779
...
# Testing ulimit -Sv 275587
# psql:<stdin>:17: server closed the connection unexpectedly
#       This probably means the server terminated abnormally
#       before or while processing the request.

2026-05-16 20:33:30.724 EEST postmaster[4101481] LOG:  client backend (PID 4101496) was terminated by signal 11: Segmentation fault
2026-05-16 20:33:30.724 EEST postmaster[4101481] DETAIL:  Failed process was running: select explainer('execute stmt');

Whereas with
echo "preallocate_stack=on" >/tmp/temp.config;
TEMP_CONFIG=/tmp/temp.config PROVE_TESTS="t/099*" make -s check -C src/test/modules/test_misc/
the server survives the test.

ulimit -Sv is convenient for the test, but in the wild the restriction
would more likely apply to the total amount of memory for all (postgres)
processes (e.g., a cgroup limit), so more interesting scenarios are possible...
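For reference, `ulimit -Sv N` sets the soft RLIMIT_AS (address-space) limit of the shell and its children to N kilobytes, so it caps each process separately rather than a group of processes. The sketch below, assuming Linux (where RLIMIT_AS is enforced when mmap() reserves address space), applies the same limit in a forked child and checks whether an anonymous mapping of a given size still succeeds. `sketch_map_fails_under_limit()` is a hypothetical helper, not part of the attached test.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE                 /* for MAP_ANONYMOUS with strict -std= */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child, apply the equivalent of "ulimit -Sv limit_kb" there, and try
 * to map map_mb megabytes.  Returns 1 if the mapping failed (the situation
 * that surfaces as "out of memory"), 0 if it succeeded, -1 on other errors.
 */
int
sketch_map_fails_under_limit(rlim_t limit_kb, size_t map_mb)
{
    pid_t pid = fork();
    int status;

    if (pid == -1)
        return -1;
    if (pid == 0)
    {
        struct rlimit rl;
        size_t len = map_mb * 1024 * 1024;
        void *p;

        if (getrlimit(RLIMIT_AS, &rl) == -1)
            _exit(2);
        rl.rlim_cur = limit_kb * 1024;      /* ulimit -Sv counts in kB */
        if (rl.rlim_max != RLIM_INFINITY && rl.rlim_cur > rl.rlim_max)
            rl.rlim_cur = rl.rlim_max;
        if (setrlimit(RLIMIT_AS, &rl) == -1)
            _exit(2);

        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        _exit(p == MAP_FAILED ? 1 : 0);
    }

    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    switch (WEXITSTATUS(status))
    {
        case 0:
            return 0;
        case 1:
            return 1;
        default:
            return -1;
    }
}
```

With the 280000 kB limit from the run above, a 1GB mapping fails while a 1MB one still fits; a cgroup memory limit, by contrast, is shared by all processes in the group, so no per-process check like this captures it.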

Catching SIGSEGV in allocate_stack() is needed to correctly handle the
even worse situation in which there is not enough memory to preallocate
the stack even at process start.
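The same mechanism can be illustrated outside PostgreSQL with a minimal POSIX sketch, assuming Linux or FreeBSD: install a temporary SIGSEGV handler with SA_ONSTACK on a sigaltstack()-provided stack (the handler cannot run on the main stack if that stack is what could not be extended), call sigsetjmp() before the risky work, and siglongjmp() back from the handler so the caller can report an error instead of crashing. Here the fault is simulated by writing to a PROT_NONE page; all `sketch_*` names are hypothetical, and SIGBUS is caught as well because some platforms report such accesses that way.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE                 /* sigaltstack, MAP_ANONYMOUS */
#endif
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

static sigjmp_buf sketch_jump;
static volatile char *sketch_guard_page;

/* Runs on the alternate stack; unwinds to the sigsetjmp() point. */
static void
sketch_fault_handler(int signo)
{
    (void) signo;
    siglongjmp(sketch_jump, 1);
}

/*
 * Run fn() with temporary SIGSEGV/SIGBUS handlers on an alternate stack.
 * Returns 0 if fn() completed, -1 if it faulted, -2 if setup failed.
 */
int
sketch_run_guarded(void (*fn)(void))
{
    stack_t ss, old_ss;
    struct sigaction act, old_segv, old_bus;
    int result;

    ss.ss_sp = malloc(SIGSTKSZ);
    if (ss.ss_sp == NULL)
        return -2;
    ss.ss_size = SIGSTKSZ;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, &old_ss) == -1)
    {
        free(ss.ss_sp);
        return -2;
    }

    memset(&act, 0, sizeof(act));
    act.sa_handler = sketch_fault_handler;
    act.sa_flags = SA_ONSTACK;          /* deliver on the alternate stack */
    sigemptyset(&act.sa_mask);
    sigaction(SIGSEGV, &act, &old_segv);
    sigaction(SIGBUS, &act, &old_bus);

    /* savemask = 1 so the signal blocked during the handler is unblocked */
    if (sigsetjmp(sketch_jump, 1) == 0)
    {
        fn();
        result = 0;
    }
    else
        result = -1;

    /* Restore the previous handlers and signal stack. */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    sigaltstack(&old_ss, NULL);
    free(ss.ss_sp);
    return result;
}

/* Demo helpers: one inaccessible page, a faulting callback, a harmless one. */
int
sketch_setup_guard_page(void)
{
    void *p = mmap(NULL, 4096, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return -1;
    sketch_guard_page = p;
    return 0;
}

static void
sketch_touch_guard_page(void)
{
    sketch_guard_page[0] = 1;           /* PROT_NONE page: always faults */
}

static void
sketch_do_nothing(void)
{
}
```

In allocate_stack() the caller's reaction to the -1 case is elog(FATAL), which ends the child cleanly instead of letting a later stack extension kill it with SIGSEGV.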

The test and the fix work for me on Linux and FreeBSD.

Yes, this protection has its price (max_stack_depth * number of processes),
but perhaps anyone who wants to avoid server crashes should have the choice.
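As a rough estimate of that price: with the patch, every postmaster child calls allocate_stack() from InitPostmasterChild(), so each keeps about max_stack_depth of stack resident, plus at most one occupy_stack() frame (256kB in the patch) of overshoot, because each frame is touched before stack_is_too_deep() is checked. With the default max_stack_depth of 2MB and, say, 100 processes, that comes to about 200MB to 225MB. The hypothetical helper below just spells out the multiplication.

```c
#include <stdint.h>

/*
 * Upper-bound estimate of stack memory kept resident by preallocation:
 * (max_stack_depth + one frame of overshoot) for each child process.
 */
uint64_t
sketch_prealloc_bytes(uint64_t max_stack_depth_kb, uint64_t frame_kb,
                      uint64_t nprocs)
{
    return (max_stack_depth_kb + frame_kb) * UINT64_C(1024) * nprocs;
}
```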

Thanks to Heikki for help with making the solution robust.

Best regards,
Alexander

Attachments:

  [application/x-perl] 099_stack_overflow.pl (2.3K, 2-099_stack_overflow.pl)

  [text/x-patch] prevent-segfaults-under-memory-pressure.patch (6.3K, 3-prevent-segfaults-under-memory-pressure.patch)
  inline diff:
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 90c7c4528e8..46517c6b092 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -256,6 +256,7 @@ bool		remove_temp_files_after_crash = true;
  */
 bool		send_abort_for_crash = false;
 bool		send_abort_for_kill = false;
+bool		preallocate_stack = false;
 
 /* special child processes; NULL when not running */
 static PMChild *StartupPMChild = NULL,
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index dbef734a93f..21775456c35 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3619,6 +3619,106 @@ ProcessInterrupts(void)
 		ProcessRepackMessages();
 }
 
+#ifndef WIN32
+/*
+ * occupy_stack - recursively touch stack pages to force physical allocation
+ *
+ * Each frame allocates 256kB (half of STACK_DEPTH_SLOP). Recursion stops
+ * when stack_is_too_deep() fires.
+ *
+ * Returns the number of frames successfully allocated.
+ */
+static pg_noinline int
+occupy_stack(void)
+{
+	/* Occupy half of STACK_DEPTH_SLOP at once */
+	volatile char	stack_data[256 * 1024];
+	volatile char	*p = stack_data;
+
+	/* Touch each page to force kernel to back it with physical memory */
+	for (long i = 0; i < sizeof(stack_data); i += 4096)
+		p[i] = 0;
+
+	if (!stack_is_too_deep())
+		return occupy_stack() + 1;
+	return 0;
+}
+
+static sigjmp_buf alt_sigsegv_jump;
+
+/* SIGSEGV handler for allocate_stack() */
+#if defined(USE_SIGACTION) && defined(USE_SIGINFO)
+static void
+alt_sigsegv_handler(int postgres_signal_arg, siginfo_t *info, void *context)
+#else                           /* no USE_SIGINFO */
+static void
+alt_sigsegv_handler(int postgres_signal_arg)
+#endif
+{
+	siglongjmp(alt_sigsegv_jump, 1);
+}
+#endif
+
+/*
+ * allocate_stack - acquire stack pages to prevent SIGSEGV under memory pressure
+ *
+ * When the system is under memory pressure (e.g., cgroup limits), the kernel
+ * may fail to extend the stack on demand, killing the backend with SIGSEGV
+ * before check_stack_depth() can intervene.  This function proactively acquires
+ * physical pages of stack up to max_stack_depth via occupy_stack(), while it
+ * can still handle failures gracefully, using a temporary SIGSEGV handler on
+ * an alternate signal stack.
+ *
+ * Returns the number of frames successfully allocated (0 on Windows).
+ */
+int
+allocate_stack(void)
+{
+	int			result = 0;
+
+#ifndef WIN32
+	struct sigaction	act, oldact;
+	stack_t			ss, old_ss;
+
+	ss.ss_sp = palloc(SIGSTKSZ);
+	ss.ss_size = SIGSTKSZ;
+	ss.ss_flags = 0;
+	if (sigaltstack(&ss, &old_ss) == -1)
+		elog(FATAL, "sigaltstack failed: %m");
+
+	act.sa_flags = SA_ONSTACK;
+	act.sa_handler = alt_sigsegv_handler;
+	sigemptyset(&act.sa_mask);
+	if (sigaction(SIGSEGV, &act, &oldact) == -1)
+		elog(FATAL, "sigaction failed: %m");
+
+	/*
+	 * Use sigsetjmp to intercept SIGSEGV that can happen even at the process
+	 * start, and in this case occupy_stack() can't handle it.
+	 *
+	 * Terminating the process with FATAL is the only way to avoid
+	 * segfaults later.
+	 */
+	if (sigsetjmp(alt_sigsegv_jump, 1) == 0)
+	{
+		result = occupy_stack();
+	}
+	else
+	{
+		/* We got the SIGSEGV trap */
+		elog(FATAL, "could not allocate stack");
+	}
+
+	if (sigaltstack(&old_ss, NULL) == -1)
+		elog(FATAL, "sigaltstack restore failed: %m");
+	if (sigaction(SIGSEGV, &oldact, NULL) == -1)
+		elog(FATAL, "sigaction failed: %m");
+	pfree(ss.ss_sp);
+#endif
+
+	return result;
+}
+
 /*
  * GUC check_hook for client_connection_check_interval
  */
diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
index 7ffc808073a..a0dd7c6f901 100644
--- a/src/backend/utils/init/miscinit.c
+++ b/src/backend/utils/init/miscinit.c
@@ -107,6 +107,9 @@ InitPostmasterChild(void)
 	pgwin32_signal_initialize();
 #endif
 
+	if (preallocate_stack)
+		(void) allocate_stack();
+
 	InitProcessGlobals();
 
 	/*
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index afaa058b046..b2d5642f572 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -2390,6 +2390,12 @@
   max => '60',
 },
 
+{ name => 'preallocate_stack', type => 'bool', context => 'PGC_POSTMASTER', group => 'RESOURCES_MEM',
+  short_desc => 'Preallocate stack on subprocess start.',
+  variable => 'preallocate_stack',
+  boot_val => 'false',
+},
+
 { name => 'primary_conninfo', type => 'string', context => 'PGC_SIGHUP', group => 'REPLICATION_STANDBY',
   short_desc => 'Sets the connection string to be used to connect to the sending server.',
   flags => 'GUC_SUPERUSER_ONLY',
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index ac38cddaaf9..0c112f51684 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -150,6 +150,7 @@
 #autovacuum_work_mem = -1               # min 64kB, or -1 to use maintenance_work_mem
 #logical_decoding_work_mem = 64MB       # min 64kB
 #max_stack_depth = 2MB                  # min 100kB
+#preallocate_stack = off                # allocate max_stack_depth on subprocess start
 #shared_memory_type = mmap              # the default is the first option
                                         # supported by the operating system:
                                         #   mmap
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 8ccdf61246b..b0efa042ec5 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -306,6 +306,7 @@ extern void restore_stack_base(pg_stack_base_t base);
 extern void check_stack_depth(void);
 extern bool stack_is_too_deep(void);
 extern ssize_t get_stack_depth_rlimit(void);
+extern int allocate_stack(void);
 
 /* in tcop/utility.c */
 extern void PreventCommandIfReadOnly(const char *cmdname);
diff --git a/src/include/postmaster/postmaster.h b/src/include/postmaster/postmaster.h
index 716b4c912b3..7898c021b12 100644
--- a/src/include/postmaster/postmaster.h
+++ b/src/include/postmaster/postmaster.h
@@ -70,6 +70,7 @@ extern PGDLLIMPORT bool restart_after_crash;
 extern PGDLLIMPORT bool remove_temp_files_after_crash;
 extern PGDLLIMPORT bool send_abort_for_crash;
 extern PGDLLIMPORT bool send_abort_for_kill;
+extern PGDLLIMPORT bool preallocate_stack;
 
 #ifdef WIN32
 extern PGDLLIMPORT HANDLE PostmasterHandle;

