Return-Path: pg_adm@postgres.berkeley.edu
Received: by postgres.Berkeley.EDU (5.61/1.29)
	id AA18691; Wed, 17 Mar 93 00:10:01 -0800
Date: Wed, 17 Mar 93 00:10:01 -0800
Message-Id: <9303170810.AA18691@postgres.Berkeley.EDU>
From: mcquaig!postgres@uunet.UU.NET (Postgres System User)
Subject: Core Dumps
To: postgres@postgres.berkeley.edu
Sender: pg_adm@postgres.berkeley.edu


There have been several questions lately about postgres systems
aborting.  The following may or may not be the cause, but it is a
definite problem in SVR4 (actually sysv in general).  In sysv, when a
signal handler is called, the signal is reset to its default action,
so the handler no longer catches the signal (probably an attempt to
defeat endless loops if an error occurs while in a handler).
Evidently bsd does not reset signals; either way, postgres fails to
re-establish the handler in at least 2 places.
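
To make the reset behaviour described above concrete, here is a minimal
sketch, not taken from the postgres sources.  It assumes a POSIX system
and uses sigaction() with SA_RESETHAND to imitate the sysv one-shot
behaviour portably.  The names on_usr1() and handler_survives_delivery()
are made up for illustration, and SIGUSR1 is used only because it is
harmless to raise.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* assumption: exposes sigaction() and SA_RESETHAND */
#endif
#include <signal.h>
#include <string.h>

/* Hypothetical handler: does nothing except return. */
static void on_usr1(int signo)
{
    (void) signo;
}

/*
 * Install on_usr1 for SIGUSR1, deliver the signal once, then look at
 * the current disposition.
 *   one_shot != 0 : sysv-style, handler is reset on delivery
 *   one_shot == 0 : bsd-style, handler stays installed
 * Returns 1 if the handler is still installed, 0 if it was reset to
 * the default, and -1 on error.
 */
int handler_survives_delivery(int one_shot)
{
    struct sigaction sa, cur;
    int survived;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = one_shot ? SA_RESETHAND : 0;

    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    if (raise(SIGUSR1) != 0)            /* handler runs before raise() returns */
        return -1;
    if (sigaction(SIGUSR1, NULL, &cur) != 0)
        return -1;

    survived = (cur.sa_handler == on_usr1);
    signal(SIGUSR1, SIG_DFL);           /* leave the process as we found it */
    return survived;
}
```

Calling handler_survives_delivery(1) should report that the handler is
gone after one delivery, while handler_survives_delivery(0) should
report that it is still in place.  A second delivery in the first case
would take the default action, which for most signals kills the process.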

	* postgres.c: about line 1220
	Within the setjmp(Warn_restart) condition I added a call to
	signal(SIGHUP, handle_warn) to reinstate the error loop, since
	sysv loses the handler once it is called!  The symptom is that
	two elog(WARN,...)s cause a core dump.  The first warn goes to
	handle_warn() and does a longjmp to the setjmp(Warn_restart);
	the second warning causes the process to die with a HANGUP.
	The HANGUP causes the postmaster to hit a segment violation
	trying to free shared memory.

	* postgres.c:reaper()
	Just prior to exit, the handler must be re-established, or
	else svr4 leaves zombies lying around.
	#if defined(PORTNAME_svr4)
		signal(SIGCHLD, reaper);
	#endif /* PORTNAME_svr4 */
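
The restart-point fix from the first item can be sketched roughly as
follows.  This is an illustration under stated assumptions, not the
postgres code: warn_restart, handle_warn_sketch(), install_one_shot()
and survive_warnings() are hypothetical names, SIGUSR1 stands in for the
signal postgres uses, raise() stands in for elog(WARN,...), and
SA_RESETHAND imitates the sysv reset.  It uses sigsetjmp/siglongjmp so
that the signal mask is restored when jumping out of the handler.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* assumption: exposes sigaction() and sigsetjmp() */
#endif
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf warn_restart;     /* hypothetical stand-in for Warn_restart */

/* Hypothetical stand-in for handle_warn(): jump back to the restart point. */
static void handle_warn_sketch(int signo)
{
    (void) signo;
    siglongjmp(warn_restart, 1);
}

/* Install fn so that it is reset to the default after one delivery. */
static int install_one_shot(int signo, void (*fn)(int))
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fn;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;
    return sigaction(signo, &sa, NULL);
}

/*
 * Raise `warnings` signals one after another, re-arming the handler at
 * the restart point each time.  Returns the number of warnings that
 * were recovered from, or -1 on error.
 */
int survive_warnings(int warnings)
{
    volatile int recovered = 0;     /* volatile: modified between setjmp and longjmp */

    if (install_one_shot(SIGUSR1, handle_warn_sketch) != 0)
        return -1;

    if (sigsetjmp(warn_restart, 1) != 0)
    {
        /* We arrived here from the handler, which has been reset: reinstate it. */
        recovered++;
        if (install_one_shot(SIGUSR1, handle_warn_sketch) != 0)
            return -1;
    }

    if (recovered < warnings)
        raise(SIGUSR1);             /* stands in for elog(WARN, ...) */

    signal(SIGUSR1, SIG_DFL);       /* leave the process as we found it */
    return recovered;
}
```

With the re-arm inside the restart branch, survive_warnings(2) gets
through both warnings.  Without that call, the second raise() would
take the default action and terminate the process, which matches the
symptom described in the first item.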

For some reason pgv4r0r1 and earlier did not dump core; they just quit.
So there may be more trouble lying around waiting to be discovered, or
perhaps it was trouble that caused pgv4r0r1 to not dump core.  Time
will tell.  I hope this helps.
nmm

Neil M. McQuaig, III      344 Millicent Way,  Shreveport, LA  71106
VOICE: (318)868-5611 UUCP: mcquaig!nmm (318)861-1051 or uunet!mcquaig!nmm
