Subject: BUG #19480: PL/Python SRF crashes (SIGSEGV) when function is replaced mid-iteration: use-after-free in PLy_funct
To: pgsql-bugs@lists.postgresql.org
From: PG Bug reporting form
Cc: adoros@starfishstorage.com
Reply-To: adoros@starfishstorage.com, pgsql-bugs@lists.postgresql.org
Date: Fri, 15 May 2026 11:11:37 +0000
Message-ID: <19480-f1f9fdce30462fc4@postgresql.org>

The following bug has been logged on the website:

Bug reference:      19480
Logged by:          Andrzej Doros
Email address:      adoros@starfishstorage.com
PostgreSQL version: 17.9
Operating system:   Ubuntu 22.04.5 LTS (x86_64), kernel 5.15, glibc 2.
Description:

PostgreSQL version: 17.9 (production crash), confirmed identical on 17.10
OS: Ubuntu 22.04.5 LTS, x86_64, kernel 5.15, glibc 2.35
Package: postgresql-plpython3-17 from pgdg apt repository

DESCRIPTION
-----------

A PL/Python set-returning function (SRF) crashes the backend with SIGSEGV
when another session executes CREATE OR REPLACE FUNCTION (or ALTER
FUNCTION) on the same function while the SRF is mid-iteration.

This is a use-after-free. srfstate->savedargs is allocated inside
proc->mcxt by PLy_function_save_args() (plpy_exec.c:503). On each per-call
SRF invocation, plpython3_call_handler calls PLy_procedure_get(), which may
call PLy_procedure_delete(old_proc) -> MemoryContextDelete(old_proc->mcxt)
if the function's pg_proc row has changed (different xmin or ctid). After
that, srfstate->savedargs is a dangling pointer: it is not cleared.
The next PLy_function_restore_args() reads freed memory:

    if (srfstate->savedargs)    /* non-NULL dangling pointer */
        PLy_function_restore_args(proc, srfstate->savedargs);    /* reads freed mem */

Inside PLy_function_restore_args (plpy_exec.c:551):

    for (i = 0; i < savedargs->nargs; i++)    /* nargs from freed memory */
    {
        if (proc->argnames[i] && ...)
            PyDict_SetItemString(..., proc->argnames[i], ...);

When savedargs->nargs is garbage (e.g. 2056017128 in two production core
dumps), proc->argnames[i] for large i reads an invalid pointer, which is
passed to PyDict_SetItemString -> PyUnicode_FromString -> strlen -> SIGSEGV.

CRASH STACK (two identical core dumps from production, PG 17.9, Ubuntu 22.04)
------------------------------------------------------------------------------

    #0 __strlen_evex()
    #1 PyUnicode_FromString(u=0x69ffff0000)
    #2 PyDict_SetItemString(...)
    #3 PLy_function_restore_args(proc=..., savedargs=...)
    #4 PLy_exec_function(...)
    #5 plpython3_call_handler(...)
    #6 fmgr_security_definer(...)
    #7 ExecMakeTableFunctionResult(...)

State from the newer core dump:

    proc->proname           = "tags_report_plpython"
    proc->nargs             = 1
    proc->argnames[0]       = "flavour"
    savedargs->nargs        = 2056017128  <- should be 1; contains garbage
    savedargs->namedargs[0] = 'tags'      <- still valid (not yet overwritten)
    i                       = 4           <- loop has iterated far past argnames[]

TRIGGER CONDITION
-----------------

The pg_proc invalidation reaches Session A's backend when
AcceptInvalidationMessages() is called. This happens when Session A's
Python code calls plpy.execute() with a statement that acquires a NEW
relation lock (e.g. CREATE TEMP TABLE, any table not previously locked in
this statement). Simply calling plpy.execute("SELECT 1") is not sufficient
because the lock on pg_proc is already held and subsequent requests are
served from the per-process lock table without invoking
AcceptInvalidationMessages.

In production the trigger is autovacuum on pg_proc (which moves the tuple's
ctid) or any concurrent DDL from another session. Long-running SRFs (hours)
are much more likely to hit this window.

STEPS TO REPRODUCE
------------------

Requires two concurrent sessions and PostgreSQL with plpython3u.

Session A - start and leave running:

CREATE EXTENSION IF NOT EXISTS plpython3u;

CREATE OR REPLACE FUNCTION repro_srf(flavour VARCHAR)
RETURNS TABLE (i BIGINT) AS $$
import time
for i in range(100):
    # CREATE TEMP TABLE acquires a new relation lock each iteration,
    # which causes AcceptInvalidationMessages to be called.
    plpy.execute(f"CREATE TEMP TABLE _rt_{i} (x int)")
    plpy.execute(f"DROP TABLE _rt_{i}")
    time.sleep(0.3)
    yield i
$$ LANGUAGE plpython3u VOLATILE;

SELECT count(*) FROM repro_srf('test');

Session B - while Session A is running (after ~2 seconds):

CREATE OR REPLACE FUNCTION repro_srf(flavour VARCHAR)
RETURNS TABLE (i BIGINT) AS $$
import time
for i in range(100):
    plpy.execute(f"CREATE TEMP TABLE _rt_{i} (x int)")
    plpy.execute(f"DROP TABLE _rt_{i}")
    time.sleep(0.3)
    yield i
$$ LANGUAGE plpython3u VOLATILE;

NOTE: In a minimal test without memory pressure, the freed savedargs memory
is often not overwritten quickly enough to produce a crash: savedargs->nargs
accidentally retains its correct value of 1 and restore_args succeeds.
Under production load (long-running SRF, many Python allocations), the
freed region is overwritten and the crash occurs.

The crash can be triggered deterministically with gdb by setting
savedargs->nargs to a large value immediately after PLy_procedure_delete
fires (see gdb script below). This produces the identical crash stack seen
in production.

GDB CONFIRMATION (PostgreSQL 17.10)
-------------------------------------

The following gdb session was used to confirm the exact sequence:

    (gdb) b PLy_procedure_delete
    (gdb) commands 1
    > printf "DELETE proname=%s mcxt=%p\n", proc->proname, proc->mcxt
    > set $corrupt_next = 1
    > c
    > end
    (gdb) b PLy_function_restore_args
    (gdb) commands 2
    > if $corrupt_next
    >   set {int}((long)savedargs + 24) = 2056017128
    >   set $corrupt_next = 0
    > end
    > c
    > end

Output:

    DELETE proname=repro_srf mcxt=0x5686641e1b20
    [PLy_function_restore_args fires with savedargs=0x5686641e28e8]
    [nargs set to 2056017128]
    Program received signal SIGSEGV, Segmentation fault.
    __strlen_avx2 ()

PostgreSQL log:

    server process (PID 366) was terminated by signal 11: Segmentation fault
    all server processes terminated; reinitializing

AFFECTED CODE
-------------

src/pl/plpython/plpy_exec.c, lines 503-506:
  PLy_function_save_args allocates savedargs in proc->mcxt

src/pl/plpython/plpy_exec.c, lines 117-119:
  PLy_function_restore_args is called with potentially dangling savedargs
  (no check whether proc was rebuilt since savedargs was created)

src/pl/plpython/plpy_procedure.c, line 405 (PLy_procedure_delete):
  MemoryContextDelete(proc->mcxt) frees savedargs without nulling
  srfstate->savedargs

PROPOSED FIX
------------

The root cause is that srfstate->savedargs is tied to proc->mcxt (which can
be deleted at any per-call boundary) rather than to
funcctx->multi_call_memory_ctx (which lives for the entire SRF lifetime).

Option A - allocate savedargs in funcctx->multi_call_memory_ctx:
Change PLy_function_save_args to accept a MemoryContext parameter and pass
funcctx->multi_call_memory_ctx from PLy_exec_function. The saved PyObject*
references are valid regardless of which MemoryContext holds the struct.

Option B - detect proc rebuild and discard stale savedargs:
After PLy_procedure_get returns a new proc, check whether it differs from
the proc that created srfstate->savedargs.
If so, discard savedargs (PLy_function_drop_args or simply set to NULL) and skip the restore.
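
To make the Option B check concrete, here is a self-contained toy model in
C. It is only a sketch and not a patch: it uses no PostgreSQL or PL/Python
headers, and ToyProc, ToySrfState, ToySavedArgs, the generation counter and
every toy_* function are hypothetical names invented for illustration. The
"owned" pointer stands in for proc->mcxt, so deleting a proc also frees the
saved arguments it allocated. The sketch assumes each rebuilt proc can be
told apart by a unique stamp; it compares that stamp rather than the proc
address, since a rebuilt proc could in principle be allocated at the address
of the freed one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the saved-arguments struct. */
typedef struct ToySavedArgs
{
    int nargs;
} ToySavedArgs;

/* Hypothetical stand-in for a cached procedure. "owned" plays the role of
 * proc->mcxt: whatever it points to is freed together with the proc. */
typedef struct ToyProc
{
    unsigned long generation;   /* assumption: unique per (re)build */
    int           nargs;
    ToySavedArgs *owned;
} ToyProc;

/* Hypothetical stand-in for the SRF state, which outlives a single proc. */
typedef struct ToySrfState
{
    ToySavedArgs *savedargs;
    unsigned long saved_generation; /* generation of the proc that saved */
} ToySrfState;

static unsigned long next_generation = 1;   /* 0 means "nothing saved" */

static void *xmalloc(size_t n)
{
    void *p = malloc(n);
    if (p == NULL)
        abort();
    return p;
}

ToyProc *toy_proc_build(int nargs)
{
    ToyProc *proc = xmalloc(sizeof(*proc));
    proc->generation = next_generation++;
    proc->nargs = nargs;
    proc->owned = NULL;
    return proc;
}

/* Models PLy_procedure_delete: everything owned by the proc is freed, and
 * nothing resets the pointer held in the SRF state. */
void toy_proc_delete(ToyProc *proc)
{
    free(proc->owned);
    free(proc);
}

void toy_save_args(ToyProc *proc, ToySrfState *srf)
{
    ToySavedArgs *saved = xmalloc(sizeof(*saved));
    saved->nargs = proc->nargs;
    free(proc->owned);
    proc->owned = saved;
    srf->savedargs = saved;
    srf->saved_generation = proc->generation;
}

/* Returns true if the saved arguments were restored, false if there were
 * none or they were stale and had to be discarded. */
bool toy_restore_args(const ToyProc *proc, ToySrfState *srf)
{
    /* Check the stamp first, so a stale pointer is never even inspected. */
    if (srf->saved_generation != proc->generation)
    {
        srf->savedargs = NULL;      /* forget it; the memory is gone */
        srf->saved_generation = 0;
        return false;
    }
    if (srf->savedargs == NULL)
        return false;
    assert(srf->savedargs->nargs == proc->nargs);
    return true;
}

/* One SRF call saves its arguments, the function is optionally replaced,
 * then the next call tries to restore. Returns 1 if a restore happened. */
int toy_scenario(bool replace_mid_iteration)
{
    ToySrfState srf = {NULL, 0};
    ToyProc *proc = toy_proc_build(1);
    int restored;

    toy_save_args(proc, &srf);
    if (replace_mid_iteration)
    {
        toy_proc_delete(proc);      /* frees the saved arguments too */
        proc = toy_proc_build(1);   /* rebuilt proc, new generation */
    }
    restored = toy_restore_args(proc, &srf) ? 1 : 0;
    toy_proc_delete(proc);
    return restored;
}
```

In this model toy_scenario(false) restores the saved arguments, while
toy_scenario(true) skips the restore without touching the freed struct.
Whether skipping the restore is semantically acceptable in the real
PLy_exec_function is a separate question that this sketch does not answer.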