Return-Path: linux-postgres-request@native-ed.bc.ca
Received: from raven.native-ed.bc.ca  (raven.native-ed.bc.ca [134.87.106.1]) by nobozo.CS.Berkeley.EDU (8.6.10/8.6.3) with ESMTP id NAA03041 for <aoki@postgres.Berkeley.EDU>; Thu, 2 Mar 1995 13:27:20 -0800
Received: from Relay1.Austria.EU.net (relay1.Austria.EU.net [192.92.138.47]) by  raven.native-ed.bc.ca  (8.6.4/8.6.4) with SMTP id MAA11725 for <linux-postgres@native-ed.bc.ca>; Thu, 2 Mar 1995 12:24:37 -0800
Received: from zen.UUCP by Relay1.Austria.EU.net with UUCP id AA24463
  (5.67b/IDA-1.5 for native-ed.bc.ca!linux-postgres); Thu, 2 Mar 1995 21:23:55 +0100
Message-Id: <m0rkHOh-000I3jC@eka>
To: Kai Petzke <wpp@marie.physik.TU-Berlin.DE>
Cc: linux-postgres@native-ed.bc.ca
Subject: Re: S_LOCK (tas()) endless loops in 4r2 
In-Reply-To: Your message of "Sun, 26 Feb 1995 22:01:14 +0100."
             <199502270029.BAA01239@marie.physik.tu-berlin.de> 
X-Mailer: exmh version 1.4 6/24/94
Date: Thu, 02 Mar 1995 21:22:53 +0100
From: "V.Grabner" <zen@eka.gklw.co.at>


Kai Petzke wrote:

> > Hi,
> > 
> > We have been writing stress test suites for postgres here, in order to 
> > reproduce a strange effect where a backend loops and blocks all others.
> > We don't know exactly what causes this effect, but it's somehow 
> > reproducible on machines with heavy load (swapping etc.).
> 
> Please give away a copy of your test suite.  If it includes
> proprietary data, give away at least a copy of the queries that
> cause the problem.  Otherwise it will be hard to hunt down the
> problems.
> 

Not necessary; I found a remedy that avoids this deadlock effect:

We wrote a perl script that filters the debug logs of the backends and
sorts them by time. The sequence always looked like this:

...
BACKEND 3>...
BACKEND 2>...
BACKEND 12>NOTICE: LockTableLookup failed no lock ....
and then, sometimes, a deadlock of another backend .......
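
The filter-and-sort step described above can be sketched roughly as
follows. This is not the perl script itself (it was not posted); it is a
minimal C sketch that assumes a hypothetical log layout of
"<seconds> <backend-id> <message>" per line. The names LogLine,
parse_line, by_time and load_and_sort are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One debug-log line. ASSUMED layout: "<seconds> <backend-id> <message>". */
typedef struct {
    long   stamp;      /* leading timestamp, in seconds */
    int    backend;    /* backend number */
    size_t seq;        /* input order, used to break timestamp ties */
    char   text[256];  /* rest of the line (truncated if longer) */
} LogLine;

/* Parse one raw line; returns 1 on success, 0 if it does not match. */
static int parse_line(const char *s, size_t seq, LogLine *out)
{
    int used = 0;

    if (sscanf(s, "%ld %d %n", &out->stamp, &out->backend, &used) < 2)
        return 0;

    const char *rest = (used > 0) ? s + used : "";
    strncpy(out->text, rest, sizeof out->text - 1);
    out->text[sizeof out->text - 1] = '\0';
    out->text[strcspn(out->text, "\n")] = '\0';   /* drop trailing newline */
    out->seq = seq;
    return 1;
}

/* Order by timestamp; equal timestamps keep their input order. */
static int by_time(const void *a, const void *b)
{
    const LogLine *x = a, *y = b;

    if (x->stamp != y->stamp)
        return (x->stamp > y->stamp) - (x->stamp < y->stamp);
    return (x->seq > y->seq) - (x->seq < y->seq);
}

/*
 * Filter out lines that do not match the assumed layout, then sort the
 * rest by time. 'out' must have room for n entries. Returns the number
 * of lines kept.
 */
size_t load_and_sort(const char *const *raw, size_t n, LogLine *out)
{
    size_t kept = 0;

    assert(raw != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        if (parse_line(raw[i], i, &out[kept]))
            kept++;
    qsort(out, kept, sizeof *out, by_time);
    return kept;
}
```

Feeding the concatenated logs of all backends through load_and_sort gives
one merged, time-ordered view, which is what makes a pattern like the one
above visible across backends.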

I increased the following #defines: 

RCS file: /usr/local/CVS/postgres/postgres-release/src/backend/storage/lock.h,v
retrieving revision 1.1
diff -r1.1 lock.h
17c17
< #define MAX_TABLE_SIZE                1000
---
> #define MAX_TABLE_SIZE                10000
29c29
< #define NLOCKS_PER_XACT 40
---
> #define NLOCKS_PER_XACT 100

Deadlocks do not occur anymore (the test ran 20 times; knock on wood ......)

> So it is almost surely not a problem specific to this or that port.
> I bet race conditions, which are triggered by the swapping, because
> swapping causes unusually long halts during execution.
> 

I think it's a problem of array ranges (a fixed-size table running out
of room), but I did not really debug into it ...
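
To illustrate what an array-range problem of this kind could look like,
here is a fixed-size table whose insert routine refuses new entries once
it is full instead of writing past the end. This is only a sketch under
that assumption: the actual cause was not debugged, and DemoXactLocks,
demo_add_lock and demo_find_lock are hypothetical names, not postgres
code. Only the capacity of 40 is taken from the diff above.

```c
#include <assert.h>
#include <stddef.h>

/* Same value as the old NLOCKS_PER_XACT in the diff; the name is hypothetical. */
#define DEMO_NLOCKS_PER_XACT 40

/* Hypothetical per-transaction list of held lock ids. */
typedef struct {
    int    ids[DEMO_NLOCKS_PER_XACT];
    size_t used;                      /* number of valid entries in ids[] */
} DemoXactLocks;

/* Returns 1 if the lock id was stored, 0 if the table is already full. */
int demo_add_lock(DemoXactLocks *t, int lock_id)
{
    assert(t != NULL);
    if (t->used >= DEMO_NLOCKS_PER_XACT)
        return 0;                     /* full: report it, never write past ids[] */
    t->ids[t->used++] = lock_id;
    return 1;
}

/* Returns the index of lock_id, or -1 if it is not in the table. */
int demo_find_lock(const DemoXactLocks *t, int lock_id)
{
    assert(t != NULL);
    for (size_t i = 0; i < t->used; i++)
        if (t->ids[i] == lock_id)
            return (int) i;
    return -1;
}
```

With a check like this, the 41st lock of a transaction produces a clear
"table full" error at insert time. Without it, the entry is either lost
or overwrites neighbouring memory, and a later lookup failure would be
one plausible symptom.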

Why not start a project: "Let's increase the fault tolerance of postgres"

2.) Does using the -o-S option of the postmaster (no fsync) have any effect
    when the machine does not crash? (postgres gets about 500% faster.)

--zen


 -----------------------------------------------------------------------------
 Vinzenz Grabner				Voice	: + 43 1 817 62 30-11  
 GKL&W GmbH.					Fax	: + 43 1 817 62 30-17
 Schoenbrunnerstr. 179/II/4.St. 
 A-1120 Vienna/Austria/Europe			e-mail	: zen@gklw.co.at      
 -----------------------------------------------------------------------------
			--->>>> We moved <<<<---
 -----------------------------------------------------------------------------
 Be aware: The rat will always win, because it's a rat and what else has he 
           got to do but thinking about you ..
 -----------------------------------------------------------------------------

