From: tuttle@crl.dec.com
Date: Wed, 27 Mar 96 17:36:46 -0500
Subject: [PG95]: Regression test errors on DEC Alpha/DEC OSF1?

To report a bug, please complete the following form and send it by
email to postgres95@postgres95.vnet.net

============================================================================
                        POSTGRES95 BUG REPORT TEMPLATE
============================================================================


Your name		:	Mark Tuttle
Your email address	:	tuttle@crl.dec.com


System Configuration
---------------------
  Architecture (example: Intel Pentium)  	: DEC AlphaStation 400 4/233
  Operating System (example: Linux 1.3.42 ELF) 	: DEC OSF1 V3.2.
  Postgres95 version (example: Postgres95-1.01) : Postgres95-1.01
  Compiler used (example:  gcc 2.7.0)		: cc 3.2 and gcc 2.6.3

Here are some notes on my experience building and testing Postgres95
version 1.01 on a DEC alpha workstation running OSF1.  

I ran the recommended regression tests, and I am concerned about two
discrepancies between my results and the expected results given in
sample.regress.out.  The two differences are given below; search for
lines starting with "++".  Are they significant?

Thanks
Mark

Mark R. Tuttle                                       tuttle@crl.dec.com
DEC Cambridge Research Lab, Cambridge, MA, USA        (+1) 617-692-7635

================================================================
== Compilation notes
================================================================

I am using Postgres95-1.01 grabbed from s2k-ftp.CS.Berkeley.edu 3/25/96.
I am compiling on a DEC AlphaStation 400 4/233 running DEC OSF1 V3.2.
"cc -V" returns the string "The DEC OSF/1 AXP Compiler Driver 3.11"
"gcc --version" returns "2.6.3".
"gnumake --version" returns "GNU Make version 3.68".

----------------

There is apparently a bug in the DEC loader that causes it to dump core when
compiling postgres in postgres95/src/backend:

	cc -g  -o obj/postgres obj/ACCESS.o obj/BOOTSTRAP.o obj/COMMANDS.o \
	obj/EXECUTOR.o obj/MAIN.o obj/MISC.o obj/NODES.o obj/PARSER.o \
	obj/OPTIMIZER.o obj/REGEX.o obj/REWRITE.o obj/STORAGE.o obj/TCOP.o \
	obj/UTILS.o  -lln -lm
	Fatal error in: /usr/lib/cmplrs/cc/ld Segmentation fault - core dumped
	gnumake[1]: *** [postgres] Error 11

Even "cc -g -o obj/postgres obj/ACCESS.o" dumps core.  The problem seems
related to the -g flag, since omitting it or replacing it with -g1 fixes
the problem (-g is shorthand for -g2 which includes more symbol table
information "for full symbolic debugging and suppress optimizations that
limit full symbolic debugging").

I just scanned the archives, and I now see that Chris Maeda
(maeda@parc.xerox.com) reported a similar problem.

So I ended up compiling with gcc instead of cc.

----------------

In postgres95/src:

There is a minor bug in Makefile: it tests USE_TCL before it includes
Makefile.global, so definitions made there come too late.  In particular, I
set USE_TCL=true in Makefile.global, but since Makefile tests for USE_TCL=true
before Makefile.global is included, libpgtcl is not added to the list $SUBDIR
of subdirectories to make.

I ended up doing all compilation, installation, and testing with 
"gnumake CC=gcc USE_TCL=true".

----------------

In postgres95/src/backend:

Various pairs of the gcc and osf include files 
/usr/include/sys/siginfo.h
/usr/local/lib/gcc-lib/alpha-dec-osf3.0/2.6.3/include/signal.h
/usr/local/lib/gcc-lib/alpha-dec-osf3.0/2.6.3/include/sys/signal.h
are included together.  

Unfortunately, they all contain a "typedef ... sigval_t" and don't
protect themselves from each other.  I had to make local copies of the
files in postgres95/src/backend{/.,/sys} and surround the typedefs with
#ifdefs as in:

#if !defined(SIGVALT)
#define SIGVALT
typedef union sigval {
	int 	sival_int;
	void	*sival_ptr;
} sigval_t;
#endif

----------------

In postgres95/bin/pgtclsh:

I had to add libpgtcl to the include search path

CFLAGS+=  -I$(TCL_INCDIR) -I$(TK_INCDIR) -I$(srcdir)/libpgtcl

in the Makefile.

----------------

In postgres95/libpgtcl:

I had to add libpq to the include search path

CFLAGS+= -I$(HEADERDIR) \
         -I$(srcdir)/backend/include \
         -I$(srcdir)/backend \
         -I$(CURDIR) \
         -I$(TCL_INCDIR) \
         -I$(srcdir)/libpq

in the Makefile.

================================================================
== Installation notes
================================================================

No surprises.

================================================================
== Regression testing notes
================================================================

I started as tuttle, then su'd to postgres to run "initdb" and
"postmaster -S", then returned to being tuttle and did 
"gnumake CC=gcc USE_TCL=true all runtest".

There were a few things that surprised me:

----------------

+ Immediately I was told:

  =============== destroying old regression database... =================
  Connection to database 'template1' failed.
  FATAL 1:SetUserId: user "tuttle" is not in "pg_user"
  destroydb: database destroy failed on regression.
  =============== creating new regression database... =================
  Connection to database 'template1' failed.
  FATAL 1:SetUserId: user "tuttle" is not in "pg_user"
  createdb: database creation failed on regression.
  createdb failed
  RESULTS OF REGRESSION ARE SAVED IN obj/regress.out

so I su'd to postgres, and did "createuser tuttle" and gave myself
permission to create databases and add users.

----------------

+ I cleaned up and ran the test and was told among other things that

  QUERY: COPY onek TO 
	'/tmp_mnt/winnie/e1/src/postgres95/src/test/regress/obj/onek.data';
  WARN:COPY: file
	/tmp_mnt/winnie/e1/src/postgres95/src/test/regress/obj/onek.data 
	could not be open for writing

I realized that the source tree was owned by tuttle, but that postgres95
was trying to write into ./obj/onek.data as user postgres.  I gave postgres
write permission for ./obj.  I cleaned up and ran the test again, and
found onek.data and stud_emp.data in ./obj owned by postgres.  Are the 
files onek.data and stud_emp.data supposed to be owned by postgres or by
tuttle?

----------------

+ The documentation says that my results ./obj/regress.out and the
sample results ./sample.regress.out should differ only in pathnames,
but there are two classes of unexpected differences:

++ timestamps:

./obj/regress.out included lines like

	QUERY: SELECT '' AS eleven, ABSTIME_TBL.*;
	eleven  f1                            
	------- ----------------------------- 
	        Sun Jan 14 03:14:21 1973 EST  
	        Mon May 01 03:30:30 1995 EDT  
 
./sample.regress.out included corresponding lines like

	QUERY: SELECT '' AS eleven, ABSTIME_TBL.*;
	eleven  f1                            
	------- ----------------------------- 
	        Sun Jan 14 03:14:21 1973 PST  
	        Mon May 01 00:30:30 1995 PDT  

The timezones differ and the times in the second lines differ by three
hours.  (I live in Eastern time, which differs by three hours from
Pacific time.  But why does only one of the two lines differ by three
hours?)

++ a single true error:

./obj/regress.out includes

	QUERY: SELECT user_relns() AS user_relns
	   ORDER BY user_relns;
	user_relns     
	-------------- 
	ABSTIME_TBL    
	BOOLTBL1       
	BOOLTBL2       
	...
	TINTERVAL_TBL  
	a,764348       
	...

where "..." represents deleted lines, but sample.regress.out contains
"a,276956" in place of "a,764348".  I don't know what "a,764348"
represents, but this does look like a true error.

================================================================
== End of notes
================================================================

------------------------------

From: Jolly Chen <jolly@postgres.berkeley.edu>
Date: Wed, 27 Mar 1996 16:57:51 -0800
Subject: Re: [PG95]: Regression test errors on DEC Alpha/DEC OSF1? 

Thank you Mark for a lucid piece of mail that clearly described your
experience with postgres95.   It really helps us a lot in diagnosing
problems.  (In general, the probability of a bug report being heeded is
much higher if a clear bug template form has been submitted.  Even so,
the probability is still < 1. =) )

> There is apparently a bug in the DEC loader that causes it to dump core when
> compiling postgres in postgres95/src/backend:
>
> related to the -g flag, since omitting it or replacing it with -g1 fixes
> the problem (-g is shorthand for -g2 which includes more symbol table
> information "for full symbolic debugging and suppress optimizations that
> limit full symbolic debugging").

This is good to know.  If anyone else out there is using DEC's cc on
OSF1, please report if you have a similar experience.

By default, the only user that's installed in pg_user is the user who
ran 'make install'.   I recommend doing the build and install as the
'postgres' user and then adding normal users as desired.  The files
should be owned by 'postgres'.

> I realized that the source tree was owned by tuttle, but that postgres95
> was trying to write into ./obj/onek.data as user postgres.  I gave postgres
> write permission for ./obj.  I cleaned up and ran the test again, and
> found onek.data and stud_emp.data in ./obj owned by postgres.  Are the 
> files onek.data and stud_emp.data supposed to be owned by postgres or by
> tuttle?

Files written by COPY TO are owned by the user who started the
postmaster process, typically the 'postgres' user.
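
A rough illustration of why that is, as a Python sketch rather than
anything from the postgres95 sources: the output file is created by the
server process, so it gets that process's user id, and the target
directory has to be writable by that user.  The function name
write_copy_output and the sample row are made up for the example.

```python
import os
import tempfile


def write_copy_output(path, rows):
    """Stand-in for the server side of COPY TO: the calling process creates the file."""
    with open(path, "w") as f:
        for row in rows:
            f.write(row + "\n")
    # Return the numeric owner of the file that was just created.
    return os.stat(path).st_uid


with tempfile.TemporaryDirectory() as workdir:
    # The directory must be writable by whoever runs this process.
    writable = os.access(workdir, os.W_OK)
    owner = write_copy_output(os.path.join(workdir, "onek.data"), ["0\t998"])
    # The owner is the user running the writing process,
    # not whoever asked for the copy.
    print(writable, owner == os.geteuid())
```

The same reasoning explains the earlier "could not be open for writing"
warning: the process doing the write had no write permission on ./obj.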

> ++ timestamps:
> 
> ./obj/regress.out included lines like
> 
> 	QUERY: SELECT '' AS eleven, ABSTIME_TBL.*;
> 	eleven  f1                            
> 	------- ----------------------------- 
> 	        Sun Jan 14 03:14:21 1973 EST  
> 	        Mon May 01 03:30:30 1995 EDT  
>  
> ./sample.regress.out included corresponding lines like
> 
> 	QUERY: SELECT '' AS eleven, ABSTIME_TBL.*;
> 	eleven  f1                            
> 	------- ----------------------------- 
> 	        Sun Jan 14 03:14:21 1973 PST  
> 	        Mon May 01 00:30:30 1995 PDT  
> 
> The timezones differ and the times in the second lines differ by three
> hours.  (I live in Eastern time, which differs by three hours from
> Pacific time.  But why does only one of the two lines differ by three
> hours?)

Only one of the two lines differs because, in the regression test, the May
1st date was inserted with an explicit PDT timezone.  The Jan 14th date
was inserted without an explicit timezone and so picked up your
local timezone.
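
To make the arithmetic concrete, here is a small Python sketch (not the
postgres95 date code).  It assumes fixed UTC offsets for the four zone
names instead of consulting a timezone database, and the helper names
show and jan_value are invented for the example.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-ins for the zones in the two outputs (assumed offsets).
PST = timezone(timedelta(hours=-8), "PST")
PDT = timezone(timedelta(hours=-7), "PDT")
EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")

FMT = "%a %b %d %H:%M:%S %Y %Z"


def show(instant, zone):
    """Format an instant the way the regression output displays it in `zone`."""
    return instant.astimezone(zone).strftime(FMT)


# Inserted with an explicit PDT zone: one fixed instant, whatever the server's zone.
may_instant = datetime(1995, 5, 1, 0, 30, 30, tzinfo=PDT)


def jan_value(local_zone):
    """Inserted without a zone: the wall-clock time is read in the local zone."""
    return datetime(1973, 1, 14, 3, 14, 21, tzinfo=local_zone)


print(show(may_instant, PDT))     # Mon May 01 00:30:30 1995 PDT
print(show(may_instant, EDT))     # Mon May 01 03:30:30 1995 EDT
print(show(jan_value(PST), PST))  # Sun Jan 14 03:14:21 1973 PST
print(show(jan_value(EST), EST))  # Sun Jan 14 03:14:21 1973 EST
```

The May value is the same instant shown in two zones, so the clock time
shifts by three hours.  The January value is a different instant on each
machine, so the clock time stays put and only the zone label changes.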

> ++ a single true error:
> 
> ./obj/regress.out includes
> 
> 	QUERY: SELECT user_relns() AS user_relns
> 	   ORDER BY user_relns;
> 	user_relns     
> 	-------------- 
> 	ABSTIME_TBL    
> 	BOOLTBL1       
> 	BOOLTBL2       
> 	...
> 	TINTERVAL_TBL  
> 	a,764348       
> 	...
> 
> where "..." represents deleted lines, but sample.regress.out contains
> "a,276956" in place of "a,764348".  I don't know what "a,764348"
> represents, but this does look like a true error.

a,XXXXXX is the name of an archived relation.  In the regression test,
one of the temporary relations was created with ARCHIVE = light.
Because the XXXXXX part is the oid, it will differ for each run of the
regression test.
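
If you want to compare the two output files without tripping over this,
one option is to mask the oid before comparing.  The following is only a
Python sketch of that idea, not part of the regression suite; the names
normalize and differing_lines and the "<oid>" placeholder are made up
here, and it assumes the archive name sits alone on its output line, as
in the excerpt above.

```python
import re

# An archive relation name alone on a line: "a," followed by the oid digits.
ARCHIVE_NAME = re.compile(r"^(\s*)a,\d+(\s*)$")


def normalize(line):
    """Replace the run-specific oid in an archive relation name with a placeholder."""
    return ARCHIVE_NAME.sub(r"\1a,<oid>\2", line)


def differing_lines(actual, expected):
    """Return the (actual, expected) line pairs that still differ after masking oids.

    Lines are compared pairwise; any extra lines in the longer list are ignored.
    """
    return [(a, e) for a, e in zip(actual, expected)
            if normalize(a) != normalize(e)]


ours = ["TINTERVAL_TBL  ", "a,764348       "]
sample = ["TINTERVAL_TBL  ", "a,276956       "]
print(differing_lines(ours, sample))  # []
```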

I'm glad to see you didn't have any more serious problems.  Thanks for
taking time out to play with postgres95 and point out potential problems.

- Jolly

