Return-Path: pg_adm@postgres.berkeley.edu
Received: by postgres.Berkeley.EDU (5.61/1.29)
	id AA00013; Thu, 25 Mar 93 16:08:49 -0800
Date: Thu, 25 Mar 93 16:08:49 -0800
Message-Id: <9303260008.AA00013@postgres.Berkeley.EDU>
From: mcquaig!postgres@uunet.UU.NET (Postgres System User)
Subject: Core dump elucidate?
To: postgres@postgres.berkeley.edu
Sender: pg_adm@postgres.berkeley.edu


Perhaps a shared memory guru amongst us (or a non-guru with any idea at
all, which would also be most appreciated ;-) could elucidate the
behavior I am experiencing.

Scenario: postgres aborts for some reason.  The postmaster then runs
its cleanup, as in the following stack trace:
   #0  0x810d303 in hash_search ()
   #1  0x80e0e80 in ShmemPIDLookup ()
   #2  0x80e71fa in ProcSemaphoreKill ()
   #3  0x804b4ac in CleanupProc ()
   #4  0x804b322 in reaper ()
   #5  0x8001e301 in IpcPrivateMem ()
   #6  0x804aa73 in ServerLoop ()
   #7  0x804a8c4 in main ()
   #8  0x804a377 in _start ()

At or near line 1160 of hash_search() is the statement
hctl = hashp->hctl; the value of hctl is an address in shared (virtual)
memory.  That address is in the 2 GB range and has not changed since it
was created.  At creation, and up until the abort, it was accessible
and valid.  After the abort it is inaccessible, and accessing it causes
the following:

   Core was generated by `/usr/local/postgres/bin/postmaster'.
   Program terminated with signal 11, Segmentation fault.
   #0  0x810d303 in hash_search ()

Both the postmaster and postgres have and use the same HTAB hashp
value and the same hctl member value.  pgv4r0r1 suffers the same kind
of crashes (at least the one I reported last week, involving NULL
varlenas within a rule), but there the shared memory does not cause a
segmentation violation on cleanup; it remains accessible and valid.
The addresses generated in both releases are similar (in the 2 GB
range).
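Identical hashp and hctl values in both processes are what one would
expect if the segment is attached before the backend is forked:
fork() duplicates the parent's attachments at the same virtual
addresses.  Each process nevertheless holds its own attachment, so a
pointer value being valid in one process says nothing about whether it
is still valid in another.  A minimal sketch of the inheritance,
assuming System V IPC and fork(); struct shared_block and
fork_inherits_attachment() are hypothetical names, not postgres code:

```c
#include <assert.h>
#include <stdint.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a shared control block such as hctl. */
struct shared_block {
    uintptr_t child_addr;   /* where the child found the block */
    int       child_value;  /* written by the child */
};

/* Attach a segment, fork, and let the child write through the
 * inherited pointer without calling shmat() itself.
 * Returns 1 if the child saw the block at the parent's address and the
 * parent sees the child's write, 0 if not, -1 on setup failure. */
static int fork_inherits_attachment(void)
{
    int id = shmget(IPC_PRIVATE, sizeof(struct shared_block),
                    IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    struct shared_block *blk = shmat(id, NULL, 0);
    if (blk == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }
    blk->child_addr = 0;
    blk->child_value = 0;

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: same virtual address, same physical pages. */
        blk->child_addr = (uintptr_t)blk;
        blk->child_value = 42;
        _exit(0);                  /* exit detaches the child's copy */
    }

    int ok = -1;
    int status;
    if (pid > 0 && waitpid(pid, &status, 0) == pid &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0)
        ok = blk->child_addr == (uintptr_t)blk && blk->child_value == 42;

    shmdt(blk);                    /* detaches only in this process */
    shmctl(id, IPC_RMID, NULL);    /* free the segment */
    return ok;
}
```

The child never calls shmat(); it simply uses the pointer it inherited,
which is why both processes report the same addresses.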

QUERY: Can anyone give me a clue as to what might make shared memory
inaccessible upon return to the postmaster?  I apologize for being
shamefully lacking in this area.
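One mechanism that produces exactly this symptom (a pointer whose value
never changes but which suddenly faults) is the segment no longer being
attached in the process that dereferences it.  Attachments are per
process: if the postmaster's cleanup path detached the segment with
shmdt(), or re-created it at a different address, any HTAB pointers it
kept would fault the way hash_search() does here.  Whether pgv4r1
actually does this is not confirmed here.  The minimal sketch below,
assuming a POSIX system with System V IPC, shows the effect in
isolation; page_is_mapped() and detach_invalidates_pointer() are
made-up names, and msync() is used only as a safe probe for whether a
page is still mapped.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/shm.h>
#include <unistd.h>

/* Probe whether the page at addr is mapped in this process.
 * POSIX msync() fails with ENOMEM if the range contains unmapped
 * pages; addr must be page-aligned (shmat() results are).
 * Returns 1 if mapped, 0 if not, -1 on any other error. */
static int page_is_mapped(void *addr)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    if (msync(addr, (size_t)pagesz, MS_ASYNC) == 0)
        return 1;
    return errno == ENOMEM ? 0 : -1;
}

/* Attach a private segment, use it, detach it, and report whether the
 * (unchanged) pointer value went from mapped to unmapped.
 * Returns 1 if so, 0 if not, -1 if System V IPC is unavailable. */
static int detach_invalidates_pointer(void)
{
    size_t size = (size_t)sysconf(_SC_PAGESIZE);
    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (id == -1)
        return -1;

    int *p = shmat(id, NULL, 0);
    /* Mark for removal; it is freed once the last process detaches. */
    shmctl(id, IPC_RMID, NULL);
    if (p == (void *)-1)
        return -1;

    *p = 42;                         /* fine: segment is attached */
    int before = page_is_mapped(p);

    shmdt(p);                        /* p keeps its value...         */
    int after = page_is_mapped(p);   /* ...but nothing is mapped there,
                                        so *p would now be SIGSEGV   */
    return before == 1 && after == 0;
}
```

On a typical POSIX system detach_invalidates_pointer() is expected to
return 1: the pointer is bit-for-bit the same before and after, only
the mapping behind it is gone.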

THE DUMP: Someone recently found that a NOFILES setting (or perhaps
SFNOLIM and HFNOLIM) that is too small would cause postgres to abort.
However, I have used this configuration (which is maxed out) since at
least pgv3r0.  Has something changed with respect to this tunable in
pgv4r1?
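One way to rule the tunable in or out is to have the process report the
limits it actually runs with, since a shell or startup script can lower
the soft limit below the configured value.  SFNOLIM and HFNOLIM are, as
far as I know, SVR4's default soft and hard values for the per-process
open-file limit, which a program can read back with the POSIX
getrlimit(RLIMIT_NOFILE) call.  A small sketch; nofile_limits(),
print_limit() and log_nofile_limits() are hypothetical helpers:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Read the soft and hard open-file limits of the calling process.
 * Returns 0 on success, -1 on failure. */
static int nofile_limits(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;
    *hard = rl.rlim_max;
    return 0;
}

/* Print one limit, spelling out RLIM_INFINITY. */
static void print_limit(FILE *out, const char *name, rlim_t v)
{
    if (v == RLIM_INFINITY)
        fprintf(out, "%s=unlimited\n", name);
    else
        fprintf(out, "%s=%llu\n", name, (unsigned long long)v);
}

/* Log both limits. Returns 0 on success, -1 if they could not be read. */
static int log_nofile_limits(FILE *out)
{
    rlim_t soft, hard;
    if (nofile_limits(&soft, &hard) != 0)
        return -1;
    print_limit(out, "nofile soft", soft);
    print_limit(out, "nofile hard", hard);
    return 0;
}
```

Calling log_nofile_limits(stderr) early in the server's startup would be
one way to see whether the soft limit really matches the maxed-out
configuration.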

I must admit that at present I am at a loss as to the cause of the
intermittent aborts (core dumps would be preferable ;).  My
development rig is only a 486DX (32 MB) running SVR4, but for all
practical purposes I am the only traffic on it.  I applied the
'if (value == NULL) return(NULL);' fix suggested earlier, and it
stabilized things considerably; THANKS.  However, I am still
suspicious of an errant NULL pointer somewhere.  If it looks, feels
and smells like it ...  but then ya never know til ya find it.
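The guard only works with the comparison operator: with a single =, as
in if (value = NULL), the condition assigns NULL to value and then
tests it, so the branch is never taken and the original pointer is
lost.  The sketch below contrasts the two forms; struct varlena_like,
datum_data() and typo_guard_taken() are hypothetical stand-ins, not the
real postgres varlena code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a variable-length datum; this is not the
 * real postgres struct varlena. */
struct varlena_like {
    int  vl_len;
    char vl_dat[4];
};

/* Intended guard: hand back NULL rather than dereference a NULL datum. */
static const char *datum_data(const struct varlena_like *value)
{
    if (value == NULL)
        return NULL;
    return value->vl_dat;
}

/* The "=" typo. Returns 1 if the mistyped guard's branch is taken and
 * stores the pointer as it stands after the test in *after. */
static int typo_guard_taken(const struct varlena_like *value,
                            const struct varlena_like **after)
{
    int taken = 0;
    if ((value = NULL))   /* assignment, not comparison: always false */
        taken = 1;
    *after = value;       /* always NULL: the caller's pointer is gone */
    return taken;
}
```

gcc with -Wall warns about the single-= form (the doubled parentheses
above exist only to silence that warning), which is a good reason to
build with warnings enabled.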

salutations from the swamp
nmm

Neil M. McQuaig      344 Millicent Way,  Shreveport, LA  71106
VOICE: (318)868-5611 UUCP: mcquaig!nmm (318)861-1051 or uunet!mcquaig!nmm
