Return-Path: postarch
Received: by postgres.Berkeley.EDU (5.61/1.29)
	id AA18121; Thu, 18 Jun 92 16:12:42 -0700
Message-Id: <9206182312.AA18121@postgres.Berkeley.EDU>
From: postarch (Postgres Mailing Archive)
Subject: Re: internal storage???
To: postgres@postgres.berkeley.edu
Sender: pg_adm@postgres.berkeley.edu
Reply-To: mer@postgres.berkeley.edu
In-Reply-To: Your message of "Mon, 15 Jun 92 08:59:51 PDT."
             <9206151559.AA04543@postgres.Berkeley.EDU> 
Date: Thu, 18 Jun 92 16:12:34 PDT

you write:

> 	I'm using postgres to store the signal data from many time
> series from different data-collections.
> 
> 	Each signal representation is a vector ranging somewhere
> between 128-512 elements. One byte per element is enough to store the
> information. 
> 	I'm storing approximately 4 vectors per sample. Postgres kept
> barfing when I tried to define and store them all as a single class,
> so I ended up defining them as 2 classes:

I suspect that maybe the tuple size was growing too big.  Tuples in
postgtes can be at most 8K - 60 bytes.  If you require something larger
you have to move to the large object interface.  If you have a core dump
from the barf days I would be interested in seeing a stack trace and some
sample data.

> create sig_amp (
> 	sig_id = int4,
> 	samp_number = int2,
> 	proc_code = int2,
> 	real_amp = bytea[],
> 	im_amp = bytea[]
> 	)
> 
> create sig_phase (
> 	sig_id = int4,
>         samp_number = int2,
> 	proc_code = int2,
> 	phase_ang = bytea[],
> 	phase_diff = bytea[]
> 	)

One thing is the way you have this declared uses bytea[] a variable length
array of byte arrays.  Do you really want the 2-dimensions? Could you get by
with just bytea?  The extra variable length dimension costs you 4 bytes per
row.

> 	Anyway... The purpose of this message is that after storing
> some 6,000 signal entries into a database I happened to notice that
> the disk storage being eaten up by this storage format is over 100 Mb.

wow! I don't think this is right. If we do the math:

	100 Mb / 6000 tuples = 17476 bytes per tuple

which in Postgres is impossible.  If it were 12000 tuples then we near
the realm of possibility.  Could it be that you populated then deleted
and repopulated (possibly several times)?  Postgres has a no overwrite
storage manager so classes are grow only until you run the vacuum cleaner.
For big databases we recommend running it nightly.

Doing a little more math we can figure out how much space you relations
should be taking up per tuple:

  space = 60 +                                 tuple header size
          4  +                                 size of int4
          2  +                                 size of int2
          2  +                                 size of int2
          (4 + rows*(4 + cols)) +              size of 2-dimensional bytea
          (4 + rows*(4 + cols))                size of 2-dimensional bytea

        = 68 + 8 + 2*rows*(4 + cols)           roughly

        = 76 + 8*rows + 2*rows*cols

> Since I've got a few more signals to store (over 100,000) I can
> easily see that I'll run out of disk space before I can store them
> all.
> 	For the current signals I've stored, I'm only storing 256
> entries per sample array. I just hand't planned for all of the extra
> overhead in storing the samples into postgres and I expected to find
> that I had used less than 10Mb instead of over 100 Mb.

given these parameters (assuming 16 rows by 16 cols):

	tuple size = 76 + 8*16 + 2*256

	           = 716 bytes

So finally to make 100 Mb you must have appended:

	100 Mb / 716 = 146449 tuples

My math isn't very good, can anyone see something that I'm missing?


Jeff Meredith
mer@postgres.berkeley.edu