Return-Path: pg_adm@postgres.berkeley.edu
Received: by postgres.Berkeley.EDU (5.61/1.29)
	id AA14555; Tue, 18 Aug 92 14:57:53 -0700
Date: Tue, 18 Aug 92 14:57:53 -0700
Message-Id: <9208182157.AA14555@postgres.Berkeley.EDU>
From: Sean Levy <snl+@cs.cmu.edu>
Subject: Re: Bizarre problems with V3R1 on DS5000/200 (Ultrix 4.2)
To: postgres@postgres.berkeley.edu
Sender: pg_adm@postgres.berkeley.edu
Cc: postgres@postgres.berkeley.edu
In-Reply-To: <9208182109.AA05897@triplerock.CS.Berkeley.EDU>
References: <9208182109.AA05897@triplerock.CS.Berkeley.EDU>

Excerpts from mail: 18-Aug-92 Re: Bizarre problems with V..
mao@postgres.berkeley.ed (828)

> it looks like your database was corrupted.  the most likely explanation
> is that a bug in the hashing code contaminated a data page in shared
> memory, and the data page made it down to disk.  the easiest fix is to
> destroy and recreate your db.  if you can't roll back to a version of
> the data/base directory off dump tape, you have probably lost the stuff
> you loaded.


Um, I've totally scratched out the data/ dir and reloaded from ascii
files umpteen times now. The sequence was as follows (I did the copy-out
step initially, before I started this whole dance):
	* copy object to "/tmp/object.dump" \g
	* copy link to "/tmp/link.dump" \g
	* delete object \g
at this point I get the NOTICE/WARN messages about tables overflowing so
I say screw it, ^C, kill the postmaster and
	csh>cd ~postgres
	csh>rm -rf data 	# EVERYTHING GOES
	csh>initdb			# NEW DEAL
	csh>createdb ndim
	csh>create-schemas	# this is a csh script with monitor -c cmds to
					# create new object and link classes
at this point I am starting TOTALLY from scratch. I load in the old tables, e.g.:
	monitor ndim
	* copy object from "/tmp/object.dump" \g
	COPY
	* copy link from "/tmp/link.dump" \g
	COPY
Everyone is happy. Now, however, when I start up my application which, I
must stress, WORKED JUST FINE YESTERDAY, we get the core dump; likewise,
if I issue queries that do anything at all (e.g. a join) from the
monitor, the backend dies as described. This happens on two different
machines, which happen to be two different platforms (sun4 and ultrix).
As I said, I've gone through this cycle umpteen times, the only
differences generally being that I massage the *.dump files offline to
try and get rid of offending records. Since there are ~6000 records in
object.dump and ~4000 in link.dump, it is sort of hard to pick out an
offending line by scanning. But, anyway, I shouldn't be able to corrupt
the database with data I'm loading from an ascii file, should I? I
*have* noticed that the copy command seems to blithely ignore errors in
the structure of the input table (too many columns, anyway) and other
problems that it should probably at least flag for you, but I didn't
think that was causing a core dump (the sun4 dumps core, while the pmax
produces alignment error messages ad infinitum. The core dump on the sun4
was a bus error from an alignment problem, by the way).
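
Since copy doesn't seem to flag records with the wrong number of columns,
one way to hunt for them offline is to count fields per line in the dump
files. What follows is only a rough sketch in Python, not anything that
ships with postgres. It assumes the dump files are tab-delimited with one
record per line, which I have not verified for this release, and it does
not understand escaped delimiters inside a field. The function names
(column_counts, suspect_records) are made up for this example.

```python
from collections import Counter


def field_count(line, delimiter="\t"):
    """Number of delimiter-separated fields on one dump line."""
    return len(line.rstrip("\n").split(delimiter))


def column_counts(lines, delimiter="\t"):
    """Map field count -> number of records having that many fields.

    Completely empty lines are skipped.
    """
    return Counter(
        field_count(line, delimiter) for line in lines if line.rstrip("\n")
    )


def suspect_records(lines, delimiter="\t"):
    """Return (line_number, field_count) for records that look malformed.

    ASSUMPTION: the most common field count in the file is the correct
    one for the class, so anything else is reported as suspect.
    """
    lines = list(lines)
    counts = column_counts(lines, delimiter)
    if not counts:
        return []
    expected = counts.most_common(1)[0][0]
    suspects = []
    for number, line in enumerate(lines, start=1):
        if not line.rstrip("\n"):
            continue  # skip empty lines
        n = field_count(line, delimiter)
        if n != expected:
            suspects.append((number, n))
    return suspects


if __name__ == "__main__":
    import sys

    # Usage: python check_dump.py /tmp/object.dump /tmp/link.dump
    for path in sys.argv[1:]:
        with open(path) as f:
            for number, n in suspect_records(f):
                print(f"{path}:{number}: {n} fields")
```

Run against /tmp/object.dump and /tmp/link.dump, this would list the line
numbers whose field count differs from the majority, which is a lot
quicker than scanning ~6000 records by eye. It only catches structural
problems of the too-many/too-few-columns kind, not bad values inside a
field.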

--
Sean Levy, n-dim Group, EDRC, CMU, 5000 Forbes Ave, PGH, PA 15213
Email: snl+@cmu.edu, Phone: +1 412 268 5221, Fax: +1 412 268 5229
