Return-Path: mer
Received: by postgres.Berkeley.EDU (5.61/1.29)
	id AA16725; Tue, 18 Aug 92 18:03:38 -0700
Message-Id: <9208190103.AA16725@postgres.Berkeley.EDU>
From: mer@postgres.Berkeley.EDU
Subject: Re: more info on nastiness
To: postgres@postgres.berkeley.edu
Sender: pg_adm@postgres.berkeley.edu
In-Reply-To: Your message of "Tue, 18 Aug 92 11:22:46 PDT."
             <9208181822.AA11124@postgres.Berkeley.EDU> 
Date: Tue, 18 Aug 92 18:06:33 -0700
X-Mts: smtp

In message <9208181822.AA11124@postgres.Berkeley.EDU> you write:

> program terminated by signal BUS (alignment error)
> (dbx) where
> ExecHashOverflowInsert() at 0x1f630
> ExecHashTableInsert() at 0x1f38c
> ExecHash() at 0x1eca8
...

> Yuck. Does *this* ring any bells? (V3R1, remember); it vaguely did for
> me, but I don't remember if the error was V3R1-specific or made its way
> into V4 too. Okay, so now I want to zero out the db and try loading in
> the tables sans what I thought were the offending objects. I thought
> maybe my extreme method of zeroing it out wasn't necessary now, so I ran
> the backend directly and did "delete tablename" and got unending streams
> of the following messages

I fixed this bug recently.  Someone forgot to LONGALIGN the result after
doing pointer-plus-constant arithmetic in ExecHashOverflowInsert(), so the
pointer could end up misaligned.
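
For illustration, here is a minimal sketch of that class of fix. It is not
the actual ExecHashOverflowInsert() code: SKETCH_LONGALIGN is a hypothetical
stand-in for a LONGALIGN-style macro, and next_record() is an invented
helper. It assumes records of arbitrary byte length are packed one after
another, so the address computed by adding a length to a pointer has to be
rounded up before anything long-sized is stored there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a LONGALIGN-style macro: round a value up to
 * the next multiple of sizeof(long). Not the real Postgres definition. */
#define SKETCH_LONGALIGN(LEN) \
    (((uintptr_t)(LEN) + (sizeof(long) - 1)) & ~((uintptr_t)(sizeof(long) - 1)))

/* Hypothetical helper: given the start of a record and its byte length,
 * return the address where the next record may safely begin. */
static char *next_record(char *record, size_t len)
{
    /* Pointer plus constant: this can land on an odd address. */
    char *unaligned = record + len;

    /* The step that was forgotten: realign before using the pointer. */
    char *aligned = (char *)SKETCH_LONGALIGN(unaligned);

    assert((uintptr_t)aligned % sizeof(long) == 0);
    return aligned;
}
```

Without the rounding step, storing a long through the returned pointer can
raise SIGBUS on machines that enforce alignment.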

> NOTICE:Aug 18 13:50:11:AbortTransaction and not in in-progress state
> WARN:Aug 18 13:50:11:out of free buffers: time to abort !
>         AbortCurrentTransaction() at Tue Aug 18 13:50:11 1992
> 
> from the backend. ^C and back to the old but trusty method (rm -rf ....).

This is related to the many buffer leaks that were plugged as of v4.
Postgres used to pin buffer pool buffers and never unpin them, which caused
the backend to eventually exhaust all of the free buffers.  It looks like
when it tries to abort the transaction, the abort processing also tries to
acquire a free buffer, and you get this endless recursion (reminds me of a
bad episode of Star Trek).
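
A toy model of the leak, as a sketch only: the pool size, pin_count array,
and function names below are all invented for illustration and are not the
Postgres buffer manager. It assumes a buffer can be handed out only while
its pin count is zero, so a code path that pins without unpinning eventually
leaves nothing free, and any later request (including one made during abort
processing) fails.

```c
#define SKETCH_NBUFFERS 4           /* hypothetical, tiny pool size */

static int pin_count[SKETCH_NBUFFERS];

/* Pin the first unpinned buffer and return its index, or -1 if every
 * buffer is pinned ("out of free buffers"). */
static int pin_free_buffer(void)
{
    for (int i = 0; i < SKETCH_NBUFFERS; i++) {
        if (pin_count[i] == 0) {
            pin_count[i]++;
            return i;
        }
    }
    return -1;
}

/* Release one pin on a buffer so it can be reused. */
static void unpin_buffer(int buf)
{
    if (buf >= 0 && buf < SKETCH_NBUFFERS && pin_count[buf] > 0)
        pin_count[buf]--;
}

/* Leaky pattern: pins one buffer per page and never unpins it. */
static int scan_leaky(int npages)
{
    for (int page = 0; page < npages; page++) {
        if (pin_free_buffer() < 0)
            return -1;              /* pool exhausted */
    }
    return 0;
}

/* Balanced pattern: every pin is matched by an unpin. */
static int scan_balanced(int npages)
{
    for (int page = 0; page < npages; page++) {
        int buf = pin_free_buffer();
        if (buf < 0)
            return -1;
        /* ... read or modify the page here ... */
        unpin_buffer(buf);
    }
    return 0;
}
```

In this model scan_balanced() can process any number of pages, while
scan_leaky() fails once it has touched more pages than there are buffers.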

> Zo. All of a sudden, nothing workee. Of course, my supposition that the
> problem was somewhere in the (thousands of) new objects added by the
> user is probably wrong. Still, this don't seem right. I'm zapping the
> whole db and starting from scratch, something I *really* don't want to
> ever have to do, since I'm (planning on?) putting my application into
> beta test locally, and we all need to share the same rdbms. I can have a
> high degree of concurrency, that is, many users connecting to postgres
> at the same time; is this what caused the problem? (I'm the only one
> doing it now). Should I switch to V4R0, buggy though it may be?

I would strongly recommend waiting for 4.0.1 before attempting this.  3.1
has quite a few corruption problems that were directly related to running
multiple users.


Jeff Meredith
mer@postgres.berkeley.edu
